OpenSandbox: Secure AI Code Execution for Self-Hosted Agents
How OpenSandbox brings isolated, containerized code execution to your self-hosted AI stack. Run Python, JavaScript, Go, and Bash safely with resource limits, network isolation, and VNC desktop preview.
AI agents that generate code face a fundamental problem: where do you safely run it? When an LLM produces a Python script to analyze your data or a shell command to scaffold a project, executing that code on your host machine is a security nightmare. One malicious or buggy output could wipe files, exfiltrate data, or consume all your resources.
This is why we're adding OpenSandbox to Better OpenClaw as a first-class service. OpenSandbox (Apache 2.0, by Alibaba) provides lifecycle-managed, containerized execution environments that your AI agents can use to safely run code, manipulate files, and even preview GUI applications—all within resource-limited, network-isolated Docker containers.
The Problem: Code Generation Without Execution
Today's self-hosted AI stacks have a gap. Your OpenClaw instance can run LLMs, search the web, automate workflows, and manage knowledge bases. But when a skill generates code, it has nowhere safe to execute it. The options are grim:
- Run on the host — Dangerous. Generated code has full access to your VPS.
- Don't run it at all — The agent shows code but can't verify it works.
- Use a SaaS sandbox (E2B, CodeSandbox) — Adds external dependencies and per-minute costs that conflict with the self-hosted value proposition.
OpenSandbox closes this gap by running as a Docker container alongside your existing stack.
How OpenSandbox Works
OpenSandbox runs a lightweight FastAPI control plane that manages sandbox containers through the Docker socket. Each sandbox is an ephemeral Docker container with an injected execution daemon (execd) that provides a uniform API for code execution, shell commands, and file operations.
[Figure: OpenSandbox architecture]
What You Can Do With It
Safe AI Code Execution
The primary use case. When your OpenClaw agent generates code, it creates an ephemeral sandbox, executes the code, captures the output, and returns the results—then cleans up automatically.
User: "Write a Python script to analyze my CSV file"
Agent: generates script
→ creates sandbox (opensandbox/code-interpreter:python)
→ uploads CSV, executes script
→ returns stdout/stderr + exit code
→ sandbox auto-terminates after 30min idle
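The flow above can be sketched as a create, upload, execute, terminate sequence with cleanup guaranteed by try/finally. This is a minimal illustration, not the OpenSandbox SDK: `FakeBackend`, its method names, and the `sbx-1` identifier are hypothetical stand-ins that only record calls, so the lifecycle can be shown without a running server.

```python
from dataclasses import dataclass, field


@dataclass
class ExecResult:
    stdout: str
    stderr: str
    exit_code: int


@dataclass
class FakeBackend:
    """Hypothetical stand-in for a sandbox client; it only records calls."""
    calls: list = field(default_factory=list)

    def create(self, image: str) -> str:
        self.calls.append(("create", image))
        return "sbx-1"

    def upload(self, sandbox_id: str, path: str, data: bytes) -> None:
        self.calls.append(("upload", path))

    def execute(self, sandbox_id: str, code: str) -> ExecResult:
        self.calls.append(("execute", sandbox_id))
        # Canned result; a real backend would run the code in the sandbox.
        return ExecResult(stdout="rows: 1\n", stderr="", exit_code=0)

    def terminate(self, sandbox_id: str) -> None:
        self.calls.append(("terminate", sandbox_id))


def run_generated_code(backend, image: str, files: dict, code: str) -> ExecResult:
    """Create a sandbox, upload inputs, run the code, and always clean up."""
    sandbox_id = backend.create(image)
    try:
        for path, data in files.items():
            backend.upload(sandbox_id, path, data)
        return backend.execute(sandbox_id, code)
    finally:
        # Explicit cleanup; the idle timeout is only a safety net.
        backend.terminate(sandbox_id)


backend = FakeBackend()
result = run_generated_code(
    backend,
    image="opensandbox/code-interpreter:python",
    files={"data.csv": b"a,b\n1,2\n"},
    code="print('rows: 1')",
)
print(result.exit_code, [name for name, _ in backend.calls])
```

Terminating in a `finally` block means a failed upload or execution still releases the sandbox instead of leaving it to the idle timeout.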
Multi-Language Support
OpenSandbox supports Python 3.12, JavaScript/TypeScript (Node.js 22), Java 21, Go 1.24, and Bash out of the box. Each language runs in a pre-built image optimized for size and startup speed.
Desktop Preview with noVNC
This is where it gets interesting. OpenSandbox ships GUI-capable images that run a full XFCE desktop with noVNC, enabling a browser-accessible live preview of agent work. Your agent can create a React app, start the dev server, and you can watch it happen in real time through an embedded iframe.
Available GUI images:
- opensandbox/desktop:latest — Full XFCE desktop with noVNC (port 6080)
- opensandbox/chrome:latest — Chromium + DevTools Protocol (port 9222)
- opensandbox/vscode:latest — VS Code Web (code-server) for in-browser editing
Multi-Step Workflows
Sandboxes persist across multiple API calls (until idle timeout), enabling workflows like:
- Create a sandbox
- Upload project files
- Install dependencies (npm install)
- Run tests (npm test)
- Download the results
- Terminate the sandbox
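To see why a multi-step workflow survives between calls, it helps to model the idle clock. The sketch below is a hypothetical model, not OpenSandbox code: it assumes the 30-minute idle window mentioned earlier, assumes every API call resets the clock, and treats the boundary as inclusive.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_SECONDS = 30 * 60  # assumed 30-minute idle window


@dataclass
class SandboxSession:
    """Hypothetical model of one sandbox's idle clock."""
    last_activity: float

    def touch(self, now: float) -> None:
        # Assumption: every API call against the sandbox counts as activity.
        self.last_activity = now

    def is_expired(self, now: float) -> bool:
        # Assumption: the sandbox expires once the full window has elapsed.
        return now - self.last_activity >= IDLE_TIMEOUT_SECONDS


steps = ["create", "upload files", "npm install", "npm test", "download results"]
session = SandboxSession(last_activity=0.0)
now = 0.0
for step in steps:
    now += 10 * 60  # assume each step starts 10 minutes after the previous one
    assert not session.is_expired(now), f"sandbox expired before {step!r}"
    session.touch(now)

print(session.is_expired(now + 29 * 60))  # still inside the idle window
print(session.is_expired(now + 30 * 60))  # idle window has elapsed
```

Under these assumptions, a workflow can run far longer than 30 minutes in total as long as no single gap between calls reaches the idle window.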
Security Model
Running untrusted code demands defense-in-depth. OpenSandbox provides multiple layers:
- Container isolation — Each sandbox is a separate Docker container with its own filesystem and network namespace
- gVisor runtime — Sandboxes run under gVisor, which intercepts system calls in a user-space kernel instead of passing them straight to the host kernel
- Capability dropping — NET_ADMIN, SYS_ADMIN, SYS_PTRACE, MKNOD, NET_RAW, and SYS_RAWIO are all dropped
- PID limits — Max 512 PIDs per sandbox (fork bomb protection)
- Memory caps — 512MB default per sandbox
- Network isolation — Bridge mode, no outbound access by default
- No privilege escalation — no_new_privileges: true
- API key authentication — 32-byte cryptographic key for the lifecycle API
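For readers who want to relate these layers to plain Docker, here is a rough sketch of how the same limits could be written as a Docker Engine API `HostConfig`. It is an illustration of the list above, not OpenSandbox's actual configuration: the helper names are hypothetical, `runsc` is the conventional gVisor runtime name and depends on how the Docker daemon is configured, and hex encoding of the API key is an assumption.

```python
import secrets

# Capabilities the list above says are dropped.
DROPPED_CAPS = ["NET_ADMIN", "SYS_ADMIN", "SYS_PTRACE", "MKNOD", "NET_RAW", "SYS_RAWIO"]


def sandbox_host_config(memory_mb: int = 512, pids_limit: int = 512) -> dict:
    """Build a Docker Engine API HostConfig mirroring the limits listed above."""
    return {
        "Runtime": "runsc",                 # conventional gVisor runtime name (assumption)
        "CapDrop": DROPPED_CAPS,
        "PidsLimit": pids_limit,            # fork bomb protection
        "Memory": memory_mb * 1024 * 1024,  # Docker expects bytes
        "NetworkMode": "bridge",            # blocking outbound traffic needs extra rules, not shown
        "SecurityOpt": ["no-new-privileges"],
    }


def new_api_key() -> str:
    """Generate 32 random bytes, hex-encoded (the encoding is an assumption)."""
    return secrets.token_hex(32)


config = sandbox_host_config()
print(config["Memory"], config["PidsLimit"], len(new_api_key()))
```

Note that bridge mode alone does not block outbound traffic, so the "no outbound access" default has to come from additional network rules that this sketch does not cover.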
Deploying with Better OpenClaw
OpenSandbox is available as an optional addon service. Add it to your stack with a single selection in the CLI wizard or API call:
# CLI
openclaw generate --services opensandbox,n8n,grafana
# API
POST /api/v1/generate
{
  "services": ["opensandbox", "n8n", "grafana"]
}
Better OpenClaw handles everything: Docker Compose generation with the Docker socket mount and config file, API key generation, reverse proxy route at /sandbox, health check polling, and pre-pulling the 8 required images across 3 priority tiers.
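If you prefer to script the API call, the request can be assembled with the Python standard library. This is a sketch under assumptions: `BASE_URL` is a hypothetical address for your Better OpenClaw instance, and any authentication headers your deployment requires are not shown. Only the path and JSON body come from the example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # hypothetical; replace with your own host and port


def build_generate_request(services: list[str]) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/generate request."""
    body = json.dumps({"services": services}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request(["opensandbox", "n8n", "grafana"])
print(request.get_method(), request.full_url)
# Sending is left out on purpose: urllib.request.urlopen(request) would perform the call.
```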
Resource Requirements
The OpenSandbox server itself is lightweight (~256MB RAM). Each sandbox adds ~512MB (configurable). The practical limit depends on how much RAM your VPS has left once the rest of the stack is running:
| VPS RAM | Max Concurrent Sandboxes |
|---|---|
| 4 GB | 1 |
| 8 GB | 3 |
| 16 GB | 8 |
| 32 GB | 20+ |
Pre-pulled images require ~8GB of disk space total. On constrained VPS plans, Better OpenClaw prioritizes essential images (server, execd, desktop, chrome) and defers optional ones.
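To estimate capacity for your own VPS, a back-of-the-envelope calculation is enough. The sketch below uses the ~256MB server and ~512MB per-sandbox figures from this section. The `reserved_mb` values are illustrative numbers picked by hand for the OS and the rest of the stack, not Better OpenClaw's sizing rule.

```python
SERVER_MB = 256       # OpenSandbox server footprint quoted above
PER_SANDBOX_MB = 512  # default per-sandbox memory cap


def max_sandboxes(total_mb: int, reserved_mb: int) -> int:
    """Sandboxes that fit after the server and the rest of the stack.

    reserved_mb is an assumption you supply: RAM for the OS and other services.
    """
    available = total_mb - reserved_mb - SERVER_MB
    return max(0, available // PER_SANDBOX_MB)


print(max_sandboxes(4096, reserved_mb=3072))  # 768 MB left over -> 1
print(max_sandboxes(8192, reserved_mb=6144))  # 1792 MB left over -> 3
```

Measure what your other services actually use before relying on a number like this, since the reserve dominates the result.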
Why Self-Hosted Beats SaaS Sandboxes
Services like E2B charge $0.05/min per sandbox. For a team running 100 sandboxes per day at 5 minutes each, that is 500 sandbox-minutes, or $25, per day and roughly $750 over a 30-day month, on top of your existing VPS costs. OpenSandbox runs on your own hardware with no per-minute fee. The only cost is the VPS resources you're already paying for.
More importantly, your code and data never leave your infrastructure. No third-party sees your proprietary scripts, API keys, or datasets. This is especially critical for enterprises with data residency requirements.
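The cost comparison is simple enough to check by hand. The snippet below works it through from the quoted rate and usage, assuming sandboxes run every day of a 30-day month. Amounts are kept in integer cents to avoid floating-point rounding.

```python
RATE_CENTS_PER_MINUTE = 5  # $0.05/min, the rate quoted above
SANDBOXES_PER_DAY = 100
MINUTES_PER_SANDBOX = 5
DAYS_PER_MONTH = 30        # assumption: usage every day of a 30-day month

daily_cents = RATE_CENTS_PER_MINUTE * SANDBOXES_PER_DAY * MINUTES_PER_SANDBOX
monthly_cents = daily_cents * DAYS_PER_MONTH
print(f"${daily_cents / 100:.2f} per day, ${monthly_cents / 100:.2f} per month")
# prints "$25.00 per day, $750.00 per month"
```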
What's Next
OpenSandbox integration is available starting in Better OpenClaw v1.0.26. The initial release includes the core code-sandbox skill with 8 actions covering code execution, shell commands, file operations, desktop sandbox creation, and VNC preview. Future work includes Homespace live preview integration, Chrome DevTools Protocol support, and sandbox resource monitoring in the dashboard.
If you're building AI agents that generate code, OpenSandbox gives them a safe place to run it—without leaving your infrastructure.