AI agents that generate code face a fundamental problem: where do you safely run it? When an LLM produces a Python script to analyze your data or a shell command to scaffold a project, executing that code on your host machine is a security nightmare. One malicious or buggy output could wipe files, exfiltrate data, or consume all your resources.

This is why we're adding OpenSandbox to Better OpenClaw as a first-class service. OpenSandbox (Apache 2.0, by Alibaba) provides lifecycle-managed, containerized execution environments that your AI agents can use to safely run code, manipulate files, and even preview GUI applications—all within resource-limited, network-isolated Docker containers.

The Problem: Code Generation Without Execution

Today's self-hosted AI stacks have a gap. Your OpenClaw instance can run LLMs, search the web, automate workflows, and manage knowledge bases. But when a skill generates code, it has nowhere safe to execute it. The options are grim:

Run on the host — Dangerous. Generated code has full access to your VPS.
Don't run it at all — The agent shows code but can't verify it works.
Use a SaaS sandbox (E2B, CodeSandbox) — Adds external dependencies and per-minute costs that conflict with the self-hosted value proposition.

OpenSandbox eliminates this gap entirely by running as a Docker container alongside your existing stack.

How OpenSandbox Works

OpenSandbox runs a lightweight FastAPI control plane that manages sandbox containers through the Docker socket. Each sandbox is an ephemeral Docker container with an injected execution daemon (execd) that provides a uniform API for code execution, shell commands, and file operations.

OpenSandbox Architecture

OpenClaw Gateway (:18789)
  |
  +-- Skill: "Run this Python script"
        |
        v
OpenSandbox Server (:8080)  <-- lifecycle API
  |
  +-- POST /v1/sandboxes --> create container
  +-- Container: [base image + execd :44772]
  |     +-- /code     --> execute Python/JS/Go/Bash
  |     +-- /command  --> shell commands
  |     +-- /files    --> file operations
  +-- Bridge network (isolated from host)
  +-- TTL auto-expiration (30min idle)

What You Can Do With It

Safe AI Code Execution

The primary use case. When your OpenClaw agent generates code, it creates an ephemeral sandbox, executes the code, captures the output, and returns the results—then cleans up automatically.

User: "Write a Python script to analyze my CSV file"
Agent: generates script
     → creates sandbox (opensandbox/code-interpreter:python)
     → uploads CSV, executes script
     → returns stdout/stderr + exit code
     → sandbox auto-terminates after 30min idle
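The create, execute, clean-up sequence maps naturally onto a Python context manager. With it, the sandbox is deleted even if execution raises, and the 30-minute idle TTL becomes a backstop rather than the main cleanup path. This is a sketch under stated assumptions:

- `SandboxClient` is a hypothetical interface standing in for whatever SDK or HTTP client you use.
- The `/workspace/...` paths are made up.
- `FakeClient` exists only so the example runs without a server.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Protocol


@dataclass
class ExecResult:
    stdout: str
    stderr: str
    exit_code: int


class SandboxClient(Protocol):
    """Hypothetical client interface; not OpenSandbox's actual SDK."""

    def create(self, image: str) -> str: ...
    def upload(self, sandbox_id: str, path: str, content: bytes) -> None: ...
    def run(self, sandbox_id: str, command: str) -> ExecResult: ...
    def delete(self, sandbox_id: str) -> None: ...


@contextmanager
def ephemeral_sandbox(client: SandboxClient, image: str) -> Iterator[str]:
    """Create a sandbox and always delete it, even if the body raises."""
    sandbox_id = client.create(image)
    try:
        yield sandbox_id
    finally:
        client.delete(sandbox_id)


@dataclass
class FakeClient:
    """In-memory stand-in so the sketch runs without a sandbox server."""
    live: set = field(default_factory=set)
    files: dict = field(default_factory=dict)

    def create(self, image: str) -> str:
        sandbox_id = f"sbx-{len(self.live) + 1}"
        self.live.add(sandbox_id)
        return sandbox_id

    def upload(self, sandbox_id: str, path: str, content: bytes) -> None:
        self.files[(sandbox_id, path)] = content

    def run(self, sandbox_id: str, command: str) -> ExecResult:
        return ExecResult(stdout=f"ran: {command}\n", stderr="", exit_code=0)

    def delete(self, sandbox_id: str) -> None:
        self.live.discard(sandbox_id)


def analyze_csv(client: SandboxClient, csv_bytes: bytes, script: str) -> ExecResult:
    """Agent flow: upload data and script, execute, return stdout/stderr/exit code."""
    with ephemeral_sandbox(client, "opensandbox/code-interpreter:python") as sid:
        client.upload(sid, "/workspace/data.csv", csv_bytes)  # hypothetical path
        client.upload(sid, "/workspace/analyze.py", script.encode("utf-8"))
        return client.run(sid, "python /workspace/analyze.py /workspace/data.csv")
```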

Multi-Language Support

OpenSandbox supports Python 3.12, JavaScript/TypeScript (Node.js 22), Java 21, Go 1.24, and Bash out of the box. Each language runs in a pre-built image optimized for size and startup speed.

Desktop Preview with noVNC

This is where it gets interesting. OpenSandbox ships GUI-capable images that run a full XFCE desktop with noVNC, enabling browser-accessible live preview of agent work. Your agent can create a React app, start the dev server, and you watch it happen in real-time through an embedded iframe.

Available GUI images:

opensandbox/desktop:latest — Full XFCE desktop with noVNC (port 6080)
opensandbox/chrome:latest — Chromium + DevTools Protocol (port 9222)
opensandbox/vscode:latest — VS Code Web (code-server) for in-browser editing
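Embedding the desktop preview mostly means pointing an iframe at the noVNC client. The sketch below assumes the `opensandbox/desktop` image serves the stock noVNC page at `/vnc.html` on port 6080. It uses noVNC's standard `autoconnect`, `resize` and `view_only` query parameters. The `host` argument is hypothetical and depends on how your deployment exposes sandbox ports, for example through the reverse proxy.

```python
from html import escape
from urllib.parse import urlencode

NOVNC_PORT = 6080  # noVNC port of the desktop image, per the list above


def novnc_url(host: str, port: int = NOVNC_PORT, view_only: bool = True) -> str:
    """Build a noVNC client URL; `host` is however your setup exposes the sandbox."""
    params = {
        "autoconnect": "true",                          # connect when the page loads
        "resize": "scale",                              # scale the desktop to the frame
        "view_only": "true" if view_only else "false",  # watch without sending input
    }
    return f"http://{host}:{port}/vnc.html?{urlencode(params)}"


def preview_iframe(host: str, width: int = 1280, height: int = 800) -> str:
    """Return an <iframe> tag embedding a read-only live preview."""
    src = escape(novnc_url(host), quote=True)  # '&' becomes '&amp;' inside the attribute
    return f'<iframe src="{src}" width="{width}" height="{height}"></iframe>'


print(novnc_url("sandbox.local"))
# http://sandbox.local:6080/vnc.html?autoconnect=true&resize=scale&view_only=true
```

Defaulting to `view_only` suits the "watch the agent work" use case. Pass `view_only=False` when a human should be able to take over the desktop.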

Multi-Step Workflows

Sandboxes persist across multiple API calls (until idle timeout), enabling workflows like:

Create a sandbox
Upload project files
Install dependencies (npm install)
Run tests (npm test)
Download the results
Terminate the sandbox
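Because a sandbox keeps its filesystem between calls, the workflow above is a sequence of commands against one sandbox ID. The sketch below does three things:

- runs the steps in order;
- stops at the first non-zero exit code, since there is no point running tests if `npm install` failed;
- always terminates the sandbox.

The `run` and `terminate` callables are hypothetical stand-ins for your client, and the file upload/download steps (which would go through `/files`) are omitted. They are faked here so the example runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    command: str
    exit_code: int
    output: str


def run_workflow(
    sandbox_id: str,
    commands: list[str],
    run: Callable[[str, str], tuple[int, str]],
    terminate: Callable[[str], None],
) -> list[StepResult]:
    """Run commands in one sandbox; stop on first failure; always terminate."""
    results: list[StepResult] = []
    try:
        for command in commands:
            exit_code, output = run(sandbox_id, command)
            results.append(StepResult(command, exit_code, output))
            if exit_code != 0:
                break  # later steps depend on earlier ones succeeding
    finally:
        terminate(sandbox_id)
    return results


# Fake transport for demonstration: "npm test" fails, so the final step never runs.
terminated: list[str] = []


def fake_run(sandbox_id: str, command: str) -> tuple[int, str]:
    return (1, "1 test failed") if command == "npm test" else (0, "ok")


steps = run_workflow(
    "sbx-1",
    ["npm install", "npm test", "tar czf results.tgz coverage/"],
    run=fake_run,
    terminate=terminated.append,
)
print([(s.command, s.exit_code) for s in steps])  # [('npm install', 0), ('npm test', 1)]
print(terminated)  # ['sbx-1']
```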

Security Model

Running untrusted code demands defense-in-depth. OpenSandbox provides multiple layers:

Container isolation — Each sandbox is a separate Docker container with its own filesystem and network namespace
gVisor runtime — Sandboxes run under gVisor for kernel-level syscall filtering
Capability dropping — NET_ADMIN, SYS_ADMIN, SYS_PTRACE, MKNOD, NET_RAW, and SYS_RAWIO are all dropped
PID limits — Max 512 PIDs per sandbox (fork bomb protection)
Memory caps — 512MB default per sandbox
Network isolation — Bridge mode, no outbound access by default
No privilege escalation — no_new_privileges: true
API key authentication — 32-byte cryptographic key for the lifecycle API
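Most of these layers correspond to ordinary Docker container options. To show that mapping, here is a sketch of keyword arguments for `client.containers.run()` in the Docker SDK for Python. It is an illustration, not OpenSandbox's own container spec, and it makes two assumptions:

- gVisor is installed and registered under its usual runtime name, `runsc`.
- "No outbound access" is modelled as a user-defined bridge network created with `internal=True`. The hypothetical name `sandbox-internal` is used here; the text above does not say how OpenSandbox blocks egress.

```python
# Capabilities listed in the security model above.
DROPPED_CAPS = ["NET_ADMIN", "SYS_ADMIN", "SYS_PTRACE", "MKNOD", "NET_RAW", "SYS_RAWIO"]


def sandbox_run_kwargs(image: str, network: str = "sandbox-internal") -> dict:
    """Keyword arguments for docker-py's containers.run(), one per security layer."""
    return {
        "image": image,
        "detach": True,
        "runtime": "runsc",                          # gVisor's OCI runtime (if installed)
        "cap_drop": list(DROPPED_CAPS),              # capability dropping
        "pids_limit": 512,                           # fork-bomb protection
        "mem_limit": "512m",                         # default memory cap
        "security_opt": ["no-new-privileges:true"],  # no privilege escalation
        "network": network,                          # assumed internal bridge (no egress)
    }


# With the Docker SDK installed and a daemon available, usage would look like:
#   import docker
#   client = docker.from_env()
#   client.networks.create("sandbox-internal", driver="bridge", internal=True)
#   container = client.containers.run(**sandbox_run_kwargs("opensandbox/code-interpreter:python"))
```

Keeping the options in one function makes the security posture easy to review and unit-test without a Docker daemon.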

Deploying with Better OpenClaw

OpenSandbox is available as an optional addon service. Add it to your stack with a single selection in the CLI wizard or API call:

# CLI
openclaw generate --services opensandbox,n8n,grafana

# API
POST /api/v1/generate
{
  "services": ["opensandbox", "n8n", "grafana"]
}
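The API call works from any HTTP client. A minimal standard-library sketch follows. `BASE_URL` is a placeholder for wherever your Better OpenClaw API is served. Any authentication the endpoint requires is not covered here. The request is built but not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://openclaw.example"  # placeholder; use your Better OpenClaw API address


def generate_request(services: list[str]) -> Request:
    """Build the POST /api/v1/generate request with the selected services."""
    payload = json.dumps({"services": services}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/v1/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = generate_request(["opensandbox", "n8n", "grafana"])
# To send it: urllib.request.urlopen(req), then read the JSON response body.
```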

Better OpenClaw handles the rest: it generates the Docker Compose service with the Docker socket mount and config file, creates the API key, adds a reverse proxy route at /sandbox, polls the health check, and pre-pulls the 8 sandbox images in 3 priority tiers.
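You may want to generate or rotate the lifecycle API key yourself. A 32-byte key from the operating system's CSPRNG is one call in Python. How Better OpenClaw encodes and stores its key is not stated here. Hex encoding and the `OPEN_SANDBOX_API_KEY` variable name are assumptions. The constant-time comparison shows how a server would typically check a presented key.

```python
import hmac
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Hex-encoded key from the OS CSPRNG (32 bytes -> 64 hex characters)."""
    return secrets.token_hex(num_bytes)


def key_matches(presented: str, expected: str) -> bool:
    """Constant-time comparison so response timing does not leak key prefixes."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


key = generate_api_key()
# Hypothetical env-file line for the generated compose stack:
print(f"OPEN_SANDBOX_API_KEY={key}")
```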

Resource Requirements

The OpenSandbox server itself is lightweight (~256MB RAM). Each sandbox adds ~512MB (configurable). The practical limits depend on your VPS:

VPS RAM	Max Concurrent Sandboxes
4 GB	1 sandbox
8 GB	3 sandboxes
16 GB	8 sandboxes
32 GB	20+ sandboxes
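The table leaves room for the OS and the rest of your stack. Without that headroom, 4 GB alone would fit about seven 512MB sandboxes. To estimate for your own stack:

1. Subtract the memory your other services use.
2. Subtract the ~256MB server.
3. Divide by the per-sandbox limit.

In this sketch `reserved_mb` is the number you need to measure yourself, for example with `docker stats`. The reserved values in the usage lines are illustrative, not the figures used to build the table.

```python
def max_concurrent_sandboxes(
    vps_ram_mb: int,
    reserved_mb: int,
    server_mb: int = 256,
    per_sandbox_mb: int = 512,
) -> int:
    """Sandboxes that fit after reserving RAM for the OS, other services and the server."""
    available = vps_ram_mb - reserved_mb - server_mb
    return max(0, available // per_sandbox_mb)


print(max_concurrent_sandboxes(8192, reserved_mb=6144))  # 3
print(max_concurrent_sandboxes(4096, reserved_mb=3072))  # 1
```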

Pre-pulled images require ~8GB of disk space total. On constrained VPS plans, Better OpenClaw prioritizes essential images (server, execd, desktop, chrome) and defers optional ones.
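The tiering can be sketched as a strict-priority plan: pull whole tiers in order while they fit on disk, and defer everything from the first tier that does not fit onward. This is an illustration, not Better OpenClaw's actual logic. The image labels echo names used in this article, but the tier numbers beyond the essential set and every size below are made up.

```python
from dataclasses import dataclass


@dataclass
class Image:
    name: str
    tier: int       # 1 = essential; higher numbers are more optional
    size_gb: float  # on-disk size; measure for your registry


def plan_prepull(images: list[Image], free_disk_gb: float) -> tuple[list[str], list[str]]:
    """Return (pull_now, deferred); a tier is pulled only if all of it fits."""
    pull_now: list[str] = []
    deferred: list[str] = []
    remaining = free_disk_gb
    for tier in sorted({img.tier for img in images}):
        tier_images = [img for img in images if img.tier == tier]
        tier_size = sum(img.size_gb for img in tier_images)
        # Once a tier is deferred, every lower-priority tier is deferred too.
        if not deferred and tier_size <= remaining:
            pull_now.extend(img.name for img in tier_images)
            remaining -= tier_size
        else:
            deferred.extend(img.name for img in tier_images)
    return pull_now, deferred


# Hypothetical tiers and sizes, for illustration only.
images = [
    Image("server", 1, 0.3), Image("execd", 1, 0.1),
    Image("desktop", 1, 2.0), Image("chrome", 1, 1.5),
    Image("code-interpreter", 2, 1.0), Image("vscode", 3, 1.5),
]
print(plan_prepull(images, free_disk_gb=5.0))
# (['server', 'execd', 'desktop', 'chrome', 'code-interpreter'], ['vscode'])
```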

Why Self-Hosted Beats SaaS Sandboxes

Services like E2B charge $0.05/min per sandbox. For a team running 100 sandboxes per day at 5 minutes each, that's 500 sandbox-minutes, or $25, per day: roughly $750/month on top of your existing VPS costs. OpenSandbox runs on your hardware with no per-minute fee. The only cost is the VPS resources you're already paying for.
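The comparison is simple arithmetic: sandbox-minutes per month times the per-minute rate. A small helper makes it easy to plug in your own volumes. The $0.05/min rate quoted above is used as the example input. Check the provider's current pricing page, since rates change. The 30-day month is an assumption.

```python
def monthly_sandbox_cost(
    sandboxes_per_day: int,
    minutes_each: float,
    price_per_minute: float,
    days_per_month: int = 30,
) -> float:
    """Usage-billed cost: total sandbox-minutes per month times the per-minute rate."""
    minutes_per_month = sandboxes_per_day * minutes_each * days_per_month
    return round(minutes_per_month * price_per_minute, 2)


# 100 sandboxes x 5 min x 30 days = 15,000 sandbox-minutes
print(monthly_sandbox_cost(100, 5, 0.05))  # 750.0
```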

More importantly, your code and data never leave your infrastructure. No third-party sees your proprietary scripts, API keys, or datasets. This is especially critical for enterprises with data residency requirements.

What's Next

OpenSandbox integration is available starting in Better OpenClaw v1.0.26. The initial release includes the core code-sandbox skill with 8 actions covering code execution, shell commands, file operations, desktop sandbox creation, and VNC preview. Future work includes Homespace live preview integration, Chrome DevTools Protocol support, and sandbox resource monitoring in the dashboard.

If you're building AI agents that generate code, OpenSandbox gives them a safe place to run it—without leaving your infrastructure.

OpenSandbox: Secure AI Code Execution for Self-Hosted Agents
