OpenSandbox

opensandbox · alibaba/OpenSandbox · ★ 11k · last commit 2026-05-26

Primitive shape: 4 total (Skills: 1, MCP tools: 3)
00

Summary

OpenSandbox (Alibaba) — Summary

OpenSandbox is an Alibaba-originated general-purpose sandbox platform for AI applications, listed in the CNCF Landscape. It provides multi-language SDKs (Python, Java/Kotlin, JavaScript/TypeScript, C#/.NET, Go), a unified REST protocol (the Sandbox Protocol) for lifecycle and execution APIs, and pluggable Docker and Kubernetes runtimes with optional secure container runtimes (gVisor, Kata Containers, Firecracker microVM). Its key architectural choice is the separation of the sandbox protocol (the API contract), the sandbox server (a FastAPI control plane), and the sandbox runtime (the actual container or VM), which lets operators plug in different isolation backends without changing client code. The osb CLI and the opensandbox-mcp MCP server expose sandbox operations to human operators and to MCP-capable clients (Claude Code, Cursor). The CNCF Landscape lists OpenSandbox under "Scheduling & Orchestration," positioning it as cloud-native infrastructure rather than a developer tool.

Differs from seeds: OpenSandbox is categorically distinct from all 11 seeds — every seed operates at the LLM agent-loop layer while OpenSandbox is a sandbox control plane. The closest parallel within the batch is AgentTier (K8s-native sandbox lifecycle) and CubeSandbox (microVM sandbox), but OpenSandbox has the widest multi-language SDK surface (5 languages) and the most explicit "protocol specification" philosophy (defining sandbox lifecycle APIs as a standard others can implement). The opensandbox-mcp server parallels ccmemory's (seed) MCP pattern but exposes sandbox operations rather than memory operations.

01

Overview

OpenSandbox — Overview

Origin

Created by Alibaba's engineering team and listed in the CNCF Landscape under "Scheduling & Orchestration." The project targets enterprise-scale AI infrastructure requirements — multi-language SDKs reflect Alibaba's polyglot engineering culture (Java/Kotlin for Android, Go for cloud infra, Python for ML, TypeScript for frontend).

Philosophy

From the README:

"OpenSandbox is a general-purpose sandbox platform for AI applications, offering multi-language SDKs, unified sandbox APIs, and Docker/Kubernetes runtimes for scenarios like Coding Agents, GUI Agents, Agent Evaluation, AI Code Execution, and RL Training."

The key philosophical commitment is protocol-first design: the Sandbox Protocol defines the API contract for sandbox lifecycle and execution independently of any implementation. Runtime backends (Docker and Kubernetes today, with secure runtimes such as Firecracker as opt-in options) can therefore be substituted without changing client code.
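
To make the protocol-first idea concrete, here is a minimal sketch of client code that depends only on a contract and never on a specific backend. The names `RuntimeBackend`, `InMemoryBackend`, and `run_once` are hypothetical and exist only for illustration; they are not part of the OpenSandbox codebase or the Sandbox Protocol.

```python
from typing import Dict, Protocol


class RuntimeBackend(Protocol):
    """Hypothetical contract that any runtime backend would satisfy."""

    def create(self, image: str) -> str: ...

    def delete(self, sandbox_id: str) -> None: ...


class InMemoryBackend:
    """Stand-in backend; a real one would talk to Docker or Kubernetes."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.sandboxes: Dict[str, str] = {}

    def create(self, image: str) -> str:
        sandbox_id = f"{self.prefix}-{len(self.sandboxes) + 1}"
        self.sandboxes[sandbox_id] = image
        return sandbox_id

    def delete(self, sandbox_id: str) -> None:
        self.sandboxes.pop(sandbox_id, None)


def run_once(backend: RuntimeBackend, image: str) -> str:
    """Client logic written against the contract, not a backend type."""
    sandbox_id = backend.create(image)
    backend.delete(sandbox_id)
    return sandbox_id
```

Swapping `InMemoryBackend("docker")` for `InMemoryBackend("k8s")` changes nothing in `run_once`, which is the property the protocol split is meant to provide.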

Manifesto-style statements

From the CLAUDE.md / AGENTS.md:

"Think before coding: state assumptions, surface ambiguity, and ask or push back when the request has conflicting interpretations."

"Simplicity first: implement the smallest solution that satisfies the request; avoid speculative features, one-off abstractions, and unnecessary configurability."

"Surgical changes: touch only files and lines needed for the task, match local style, and do not refactor or delete unrelated pre-existing code."

(These reflect the CLAUDE.md operating instructions for contributors working on the OpenSandbox repo itself — not instructions for agents running in sandboxes.)

Use cases

From the README:

  • Coding Agents (e.g., Claude Code execution environments)
  • GUI Agents (browser automation, VNC, desktop environments)
  • Agent Evaluation (running agent tasks in isolated environments)
  • AI Code Execution (safe execution of LLM-generated code)
  • RL Training (parallel isolated environments for reinforcement learning)

CNCF Landscape listing

Listed in the CNCF Landscape under "Scheduling & Orchestration." A Landscape listing is not the same as being a CNCF-hosted project, but together with the OpenSSF Best Practices badge, OpenAPI specs, and contributor governance documentation it signals enterprise production-readiness ambitions and attention to governance.

02

Architecture

OpenSandbox — Architecture

Sandbox primitive

Pluggable — the sandbox runtime is an implementation detail behind the Sandbox Protocol:

Runtime | Isolation
--- | ---
Docker (default, production-ready) | Container namespaces
Kubernetes (production-ready) | Pod-level isolation
gVisor | User-space kernel that intercepts syscalls
Kata Containers | Hardware-virtualized VM (kata-runtime)
Firecracker microVM | Dedicated kernel, sub-second startup

Startup time: not benchmarked in README for standard runtimes. Firecracker (when configured) provides sub-second cold starts comparable to CubeSandbox.

Distribution

  • Python SDK: pip install opensandbox
  • MCP server: pip install opensandbox-mcp
  • CLI: pip install opensandbox-cli (or uv tool install opensandbox-cli)
  • Server: uvx opensandbox-server
  • Java/Kotlin: Gradle/Maven dependency
  • TypeScript: npm install @alibaba-group/opensandbox
  • C#/.NET: dotnet add package Alibaba.OpenSandbox
  • Go: go get github.com/alibaba/OpenSandbox/sdks/sandbox/go

Directory tree (repo)

OpenSandbox/
├── server/               # FastAPI control plane (Python)
├── components/
│   ├── execd/            # In-sandbox execution daemon
│   ├── egress/           # Per-sandbox network egress policy sidecar
│   └── ingress/          # Ingress gateway and endpoint routing
├── sdks/
│   ├── sandbox/          # Python, Java, TypeScript, C#, Go SDKs
│   ├── code-interpreter/ # Code interpreter SDK
│   └── mcp/              # MCP server (opensandbox-mcp)
├── specs/                # OpenAPI contracts and examples
├── kubernetes/           # Kubernetes operator, CRDs, task-executor, Helm, Kind e2e tests
├── cli/                  # osb CLI
├── skills/               # Claude Code skill: troubleshoot-sandbox
├── tests/                # Cross-language end-to-end SDK tests
├── docs/                 # Documentation
├── examples/             # SDK usage examples, agent integrations
├── sandboxes/            # Sandbox images/environments
├── oseps/                # OpenSandbox Enhancement Proposals
├── AGENTS.md             # Canonical routing for AI agent contributors
└── CLAUDE.md             # Claude Code context file

Required runtime

  • Python 3.10+ (server, Python SDK)
  • Docker Engine 20.10+ (Docker runtime)
  • Kubernetes 1.21.1+ (Kubernetes runtime)
  • Any SDK language runtime (Java/Kotlin, Node.js, .NET, Go as needed)

Key architectural components

Component | Purpose
--- | ---
opensandbox-server (FastAPI) | Control plane: lifecycle APIs (create, start, pause, resume, delete), pluggable runtime backends
execd | In-sandbox execution daemon: runs commands, manages file I/O inside the container
egress | Per-sandbox network egress policy sidecar (eBPF-based or iptables)
ingress | Ingress gateway with host-based and path-based routing modes
kubernetes/ | Kubernetes operator + CRDs + Helm chart for k8s-native deployment
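
The two ingress routing modes can be illustrated with a small sketch. The URL layouts used here (`<sandbox-id>.<base-domain>` for host-based routing and `/sandboxes/<sandbox-id>/...` for path-based routing) are assumptions chosen for illustration, as are the function names; the actual layouts are defined by the ingress component.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def route_host_based(url: str, base_domain: str) -> Optional[str]:
    """Read the sandbox id from the host, assuming <id>.<base_domain>."""
    host = urlsplit(url).hostname or ""
    suffix = "." + base_domain
    if host.endswith(suffix):
        label = host[: -len(suffix)]
        return label or None
    return None


def route_path_based(
    url: str, prefix: str = "/sandboxes/"
) -> Optional[Tuple[str, str]]:
    """Read (sandbox id, remaining path), assuming /sandboxes/<id>/..."""
    path = urlsplit(url).path
    if not path.startswith(prefix):
        return None
    sandbox_id, _, rest = path[len(prefix):].partition("/")
    return (sandbox_id, "/" + rest) if sandbox_id else None
```

Host-based routing keeps application paths untouched but needs wildcard DNS; path-based routing works behind a single hostname at the cost of rewriting the path.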

Config format

TOML (~/.sandbox.toml) for server configuration:

uvx opensandbox-server init-config ~/.sandbox.toml --example docker

Host-OS posture

Low host exposure for standard Docker/k8s runtimes. gVisor/Kata/Firecracker paths provide progressively stronger isolation at the cost of performance. Network egress is controlled by the egress component.

Claude Code integration

  • skills/troubleshoot-sandbox/ — one Claude Code skill file
  • CLAUDE.md — contributor operating instructions
  • AGENTS.md — canonical routing guide for AI agent contributors
  • opensandbox-mcp — MCP server for sandbox operations

03

Components

OpenSandbox — Components

Server (FastAPI control plane)

Feature | Description
--- | ---
Lifecycle APIs | create, start, pause, resume, delete
Pluggable runtimes | Docker (production), Kubernetes (production)
Cleanup modes | TTL with renewal, or manual cleanup with explicit delete
Auth | API key (OPEN-SANDBOX-API-KEY); can be disabled for dev
Networking | Host (shared) or Bridge (isolated with HTTP routing)
Resource quotas | CPU/memory limits (Kubernetes-style specs)
State backend | SQLite at ~/.opensandbox/opensandbox.db for metadata/snapshots
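
As a rough sketch of how a client might attach the API key when calling a lifecycle endpoint, the snippet below builds (but does not send) a request using only the standard library. It assumes that OPEN-SANDBOX-API-KEY is sent as an HTTP header; the `/sandboxes` path, the JSON body shape, and the function name `build_create_request` are likewise assumptions for illustration. The OpenAPI files in specs/ are the authoritative contract.

```python
import json
from urllib.request import Request


def build_create_request(domain: str, api_key: str, image: str) -> Request:
    """Build a create-sandbox request without sending it.

    The URL path and body fields are illustrative assumptions.
    """
    body = json.dumps({"image": image}).encode("utf-8")
    return Request(
        url=f"http://{domain}/sandboxes",
        data=body,
        method="POST",
        headers={
            # Key name taken from the feature table; header usage assumed.
            "OPEN-SANDBOX-API-KEY": api_key,
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; that step is left out because it needs a running server.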

Python SDK (pip install opensandbox)

sandbox = await Sandbox.create(image, entrypoint, env, timeout)
execution = await sandbox.commands.run("shell command")
await sandbox.files.write_files([WriteEntry(path, data, mode)])
content = await sandbox.files.read_file(path)

Code Interpreter SDK (pip install opensandbox-code-interpreter)

Wraps the sandbox SDK with language-specific code execution:

interpreter = await CodeInterpreter.create(sandbox)
result = await interpreter.codes.run("print(1+2)", language=SupportedLanguage.PYTHON)

CLI (osb)

osb config init
osb config set connection.domain localhost:8080
osb config set connection.api_key <key>
osb sandbox create --image python:3.12 --timeout 30m -o json
osb command run <sandbox-id> -o raw -- python -c "print(1+1)"

Subcommands: config, sandbox, command, plus additional subcommands for file operations, egress policy, and diagnostics.
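
For scripting, the two commands shown above can be assembled as argument lists in Python. This is a minimal sketch that only mirrors the flags already shown; the helper names are hypothetical, and no other osb flags are assumed.

```python
import shlex
from typing import List


def sandbox_create_argv(image: str, timeout: str) -> List[str]:
    """Mirror of: osb sandbox create --image <image> --timeout <t> -o json"""
    return [
        "osb", "sandbox", "create",
        "--image", image,
        "--timeout", timeout,
        "-o", "json",
    ]


def command_run_argv(sandbox_id: str, command: List[str]) -> List[str]:
    """Mirror of: osb command run <sandbox-id> -o raw -- <command...>"""
    return ["osb", "command", "run", sandbox_id, "-o", "raw", "--", *command]


# Print a shell-safe rendering of the command for logging or review.
argv = command_run_argv("sbx-123", ["python", "-c", "print(1+1)"])
print(shlex.join(argv))
```

Building the command as a list and handing it to `subprocess.run(argv, capture_output=True, text=True, check=True)` avoids shell quoting problems; that call requires the osb binary to be installed.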

MCP server (opensandbox-mcp)

pip install opensandbox-mcp
opensandbox-mcp --domain localhost:8080 --protocol http

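Exposes sandbox creation, command execution, and text file operations as MCP tools. The MCP transport is stdio; the --protocol flag appears to select how the MCP server reaches the OpenSandbox server (here plain HTTP), not the MCP transport itself.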

Claude Code skill

Skill | Location | Purpose
--- | --- | ---
troubleshoot-sandbox | skills/troubleshoot-sandbox/ | Diagnose and fix sandbox runtime issues

Multi-language SDKs

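Language | Package | Install
--- | --- | ---
Python | opensandbox | pip install opensandbox
Java/Kotlin | com.alibaba.opensandbox:sandbox | Gradle/Maven dependency
TypeScript/JS | @alibaba-group/opensandbox | npm install @alibaba-group/opensandbox
C#/.NET | Alibaba.OpenSandbox | dotnet add package Alibaba.OpenSandbox
Go | github.com/alibaba/OpenSandbox/sdks/sandbox/go | go get github.com/alibaba/OpenSandbox/sdks/sandbox/go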

Kubernetes components

  • Operator: CRD-based sandbox lifecycle management
  • Task-executor: batch/parallel task execution
  • Helm chart: full deployment (operator + server + CRDs)
  • Kind e2e tests: in-repo Kubernetes end-to-end tests

Sandbox environments (pre-built images)

From sandboxes/ directory and examples:

  • opensandbox/code-interpreter:v1.0.2
  • Browser automation (Chrome, Playwright)
  • Desktop environments (VNC, VS Code)
  • Custom images via BYOI

AGENTS.md routing system

OpenSandbox uses a hierarchical AGENTS.md system as a canonical router for AI agent contributors:

  • Root AGENTS.md: overall routing
  • server/AGENTS.md, sdks/AGENTS.md, specs/AGENTS.md, kubernetes/AGENTS.md: component-specific rules

05

Prompts

OpenSandbox — Prompts

OpenSandbox ships minimal prompt content. Its primary LLM-facing artifacts are CLAUDE.md (contributor instructions), AGENTS.md (AI contributor routing), and one Claude Code skill.

Excerpt 1: CLAUDE.md — Working Principles (verbatim)

## Working Principles

- Think before coding: state assumptions, surface ambiguity, and ask or push back
  when the request has conflicting interpretations.
- Simplicity first: implement the smallest solution that satisfies the request;
  avoid speculative features, one-off abstractions, and unnecessary configurability.
- Surgical changes: touch only files and lines needed for the task, match local
  style, and do not refactor or delete unrelated pre-existing code.
- Goal-driven execution: translate non-trivial work into verifiable success
  criteria, add or update focused tests when behavior changes, and loop until
  checks pass or blockers are clear.

Prompting technique: Principle injection — a brief set of behavioral principles injected via CLAUDE.md into Claude Code's session context. These are not Iron Laws (no rationalization tables, no numbered rules with consequences) but high-level guidelines. The principles read as good engineering practice rather than agent-specific behavioral constraints.

Excerpt 2: AGENTS.md — Routing system (verbatim excerpt)

## Read First

- Root rules: `AGENTS.md`
- Server changes: `server/AGENTS.md`
- SDK changes: `sdks/AGENTS.md`
- Spec changes: `specs/AGENTS.md`
- Kubernetes changes: `kubernetes/AGENTS.md`
- Areas without a local `AGENTS.md`: read the nearest `README.md`,
  `DEVELOPMENT.md`, and relevant CI workflow.

Prompting technique: Hierarchical context routing — the AGENTS.md system directs AI contributors to the nearest local context document for any given subtask. This reduces the chance of an AI agent making changes with insufficient context. The "Never" section is the strongest directive:

Never:
- Edit generated output as the only fix.
- Mix unrelated component work into the same change.
- Refactor adjacent code just because it is nearby.
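
The routing rule ("read the nearest local context document") can be sketched as a walk up the directory tree. This is a simplified illustration: the function name is hypothetical, and it falls back to the root AGENTS.md, whereas the excerpt above also points to README.md, DEVELOPMENT.md, and CI workflows for areas without a local file.

```python
from pathlib import Path
from typing import Optional


def nearest_agents_file(repo_root: Path, changed_file: Path) -> Optional[Path]:
    """Walk up from the changed file and return the first AGENTS.md found."""
    repo_root = repo_root.resolve()
    current = (repo_root / changed_file).resolve().parent
    while True:
        candidate = current / "AGENTS.md"
        if candidate.is_file():
            return candidate
        # Stop at the repo root, or if the path escaped the repo entirely.
        if current == repo_root or repo_root not in current.parents:
            return None
        current = current.parent
```

For a change under `server/`, this resolves to `server/AGENTS.md` when it exists, and to the root `AGENTS.md` otherwise.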

Note on skill files

The skills/troubleshoot-sandbox/ directory exists but the full SKILL.md content was not fetched in this analysis pass. From context, it likely provides instructions for diagnosing sandbox runtime failures using CLI commands and log inspection.

09

Uniqueness

OpenSandbox — Uniqueness & Positioning

Differs from seeds

OpenSandbox is categorically distinct from all 11 seeds: every seed operates at the LLM agent-loop layer, while OpenSandbox is a sandbox control plane for isolated code execution. Within this batch it most resembles AgentTier (K8s-native lifecycle) and CubeSandbox (microVM options), but differs in three ways: (1) the widest multi-language SDK surface (5 languages vs. 1-2 for others), (2) the most explicit protocol-specification philosophy (the Sandbox Protocol defines a standard API contract that others can implement, not just one implementation), and (3) a CNCF Landscape listing that positions it as cloud-native infrastructure. The opensandbox-mcp server structurally parallels the MCP pattern of ccmemory (a seed) but exposes sandbox lifecycle operations rather than memory operations. No seed has anything comparable to OpenSandbox's oseps/ (a formal RFC process) or its CNCF-oriented governance signals.

Positioning

OpenSandbox targets enterprise platform engineering teams and AI infrastructure teams — particularly those already operating Kubernetes and wanting a production-grade sandbox platform with formal governance, multi-language SDKs, and a clear protocol specification. The Alibaba origin and CNCF listing give it credibility in enterprise Java/Go/cloud-native contexts that other sandbox platforms (targeting Python-first AI developers) may lack.

Key architectural bets

  1. Protocol-first — separating the Sandbox Protocol (API contract) from the implementation enables community-driven runtime backends
  2. Widest SDK surface in the field — 5-language SDKs reduce adoption friction in polyglot organizations
  3. Pluggable isolation — operators choose container / gVisor / Kata / Firecracker based on their security requirements
  4. CNCF alignment — formal governance, OpenSSF badge, OSEPs signal enterprise production readiness

Observable failure modes

  • Docker-first, Firecracker second — the README presents Docker as "production-ready" but Firecracker as a documented option requiring separate setup; enterprises expecting sub-100ms cold starts must configure Firecracker explicitly
  • 5-language SDK maintenance burden — 5 SDK languages means 5 independent API client implementations to keep in sync with server changes; OpenAPI code generation helps but drift is a real risk
  • SQLite backend default — the default SQLite state store is not HA; enterprises need to configure an external database for production clusters
  • No built-in dashboard — operators who want a GUI must build one or use the CLI/SDK exclusively

Cross-references

  • Used with: Claude Code (via MCP server and CLAUDE.md), Cursor (via MCP server)
  • Listed in CNCF Landscape alongside agent-sandbox-k8s (kubernetes-sigs/agent-sandbox)
  • Examples reference kubernetes-sigs/agent-sandbox as a related Kubernetes project
  • Similar to E2B in API philosophy (sandbox create/run/delete); similar to AgentTier in K8s-native approach

04

Workflow

OpenSandbox — Workflow

Basic sandbox workflow (Python SDK)

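import asyncio
from datetime import timedelta

# Assumed import paths (not shown in the original snippet); check the SDK docs.
from code_interpreter import CodeInterpreter, SupportedLanguage
from opensandbox import Sandbox
from opensandbox.models import WriteEntry

async def main() -> None:
    # Create a sandbox from the pre-built code-interpreter image.
    sandbox = await Sandbox.create(
        "opensandbox/code-interpreter:v1.0.2",
        entrypoint=["/opt/opensandbox/code-interpreter.sh"],
        env={"PYTHON_VERSION": "3.11"},
        timeout=timedelta(minutes=10),
    )
    async with sandbox:
        execution = await sandbox.commands.run("echo 'Hello OpenSandbox!'")  # shell command
        await sandbox.files.write_files([WriteEntry(path="/tmp/hello.txt", data="Hello World")])
        content = await sandbox.files.read_file("/tmp/hello.txt")  # read the file back
        interpreter = await CodeInterpreter.create(sandbox)  # code interpreter session
        result = await interpreter.codes.run("2 + 2", language=SupportedLanguage.PYTHON)
    await sandbox.kill()  # destroy the sandbox

asyncio.run(main())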

Phase-to-artifact map

Phase | Artifact
--- | ---
Sandbox.create() | Container/pod started
commands.run() | Shell command output
files.write_files() | Files written in sandbox
files.read_file() | File content returned
CodeInterpreter.create() | Code interpreter session
codes.run() | Code execution result
sandbox.kill() | Sandbox destroyed

CLI workflow

osb config init
osb sandbox create --image python:3.12 --timeout 30m -o json
osb command run <sandbox-id> -- python -c "print(1 + 1)"

MCP workflow (Claude Code integration)

{
  "mcpServers": {
    "opensandbox": {
      "command": "opensandbox-mcp",
      "args": ["--domain", "localhost:8080", "--protocol", "http"]
    }
  }
}

Claude Code calls MCP tools to create sandboxes, run commands, and read/write files — enabling sandboxed code execution from within a Claude Code session.

Sandbox lifecycle

create → running ⇄ paused (via pause / resume) → delete

With TTL-based cleanup: sandboxes can be set to auto-expire with optional renewal.
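
The lifecycle can be read as a small state machine. The sketch below is an interpretation inferred from the lifecycle line above, not taken from the Sandbox Protocol spec; the state names, the transition table, and the assumption that a paused sandbox can be deleted directly are all illustrative.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    DELETED = "deleted"


# Allowed (state, action) -> next state. Assumed, not taken from the spec.
TRANSITIONS = {
    (State.RUNNING, "pause"): State.PAUSED,
    (State.PAUSED, "resume"): State.RUNNING,
    (State.RUNNING, "delete"): State.DELETED,
    (State.PAUSED, "delete"): State.DELETED,
}


def apply(state: State, action: str) -> State:
    """Return the next state, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"cannot {action} a sandbox that is {state.value}"
        ) from None
```

Encoding transitions as a lookup table keeps invalid operations, such as resuming a deleted sandbox, as explicit errors rather than silent no-ops.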

Kubernetes workflow

opensandbox-server init-config ~/.sandbox.toml --example k8s
opensandbox-server

Server runs as control plane; Kubernetes operator manages pod lifecycle.

Approval gates

None — infrastructure platform, no human approval gates.

06

Memory Context

OpenSandbox — Memory & Context

Server-side state (SQLite)

The opensandbox-server uses SQLite at ~/.opensandbox/opensandbox.db for server-managed metadata:

  • Snapshot records
  • Sandbox metadata
  • TTL and renewal tracking
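
To show what TTL and renewal tracking in SQLite could look like, here is a minimal sketch using an in-memory database. The table name, columns, and helper functions are hypothetical; the actual schema of opensandbox.db is not documented in the material reviewed here.

```python
import sqlite3

# In-memory database for illustration; the real server uses a file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sandboxes ("
    "id TEXT PRIMARY KEY, image TEXT NOT NULL, expires_at REAL NOT NULL)"
)


def register(sandbox_id: str, image: str, now: float, ttl_seconds: float) -> None:
    """Record a new sandbox with its initial expiry time."""
    conn.execute(
        "INSERT INTO sandboxes VALUES (?, ?, ?)",
        (sandbox_id, image, now + ttl_seconds),
    )


def renew(sandbox_id: str, now: float, ttl_seconds: float) -> None:
    """Push the expiry time forward from the moment of renewal."""
    conn.execute(
        "UPDATE sandboxes SET expires_at = ? WHERE id = ?",
        (now + ttl_seconds, sandbox_id),
    )


def expired(now: float) -> list:
    """Return ids of sandboxes whose expiry time has passed."""
    rows = conn.execute(
        "SELECT id FROM sandboxes WHERE expires_at <= ? ORDER BY id", (now,)
    )
    return [row[0] for row in rows]
```

Storing an absolute expiry timestamp, rather than a remaining duration, means the record stays meaningful across server restarts.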

Sandbox-level persistence

Within a sandbox's lifetime:

  • Filesystem: files written via sandbox.files.write_files() persist for the sandbox's duration
  • Pause/resume: filesystem state is preserved across pause/resume cycles
  • Delete: all sandbox state is destroyed

There is no cross-sandbox or cross-session persistence in the standard configuration (sandboxes are ephemeral by design).

Snapshot capability (experimental)

From server documentation: snapshot records are stored in SQLite. The experimental features section mentions snapshot functionality, suggesting some form of point-in-time state capture, but this is not fully documented in the publicly visible README.

Agent context management

OpenSandbox has no built-in LLM context management. It is a sandbox runtime, not a memory system. The agent framework using the OpenSandbox SDK manages its own context window.

Memory type classification

  • Primary: sqlite (server metadata)
  • Secondary: file-based (in-sandbox filesystem during runtime)
  • No vector DB, graph DB, or cross-session memory

07

Orchestration

OpenSandbox — Orchestration

Multi-agent support

Yes — multiple sandboxes can run concurrently. The Kubernetes backend is designed for large-scale distributed scheduling. The task-executor in kubernetes/ suggests batch parallel task execution capability.

Orchestration pattern

Parallel fan-out at the infrastructure level (multiple sandboxes for parallel tasks). No built-in coordinator — the caller manages task distribution.
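
Because the caller owns task distribution, a fan-out typically lives in the calling code. The sketch below shows one way to do it with asyncio and a concurrency cap; `fan_out` and `fake_sandbox` are hypothetical names, and `fake_sandbox` stands in for a real create, run, delete cycle against the SDK.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fan_out(
    tasks: List[str],
    run_in_sandbox: Callable[[str], Awaitable[str]],
    max_parallel: int = 4,
) -> List[str]:
    """Run each task in its own sandbox, at most max_parallel at a time."""
    limit = asyncio.Semaphore(max_parallel)

    async def guarded(task: str) -> str:
        async with limit:
            return await run_in_sandbox(task)

    # gather returns results in the same order as the input tasks.
    return await asyncio.gather(*(guarded(t) for t in tasks))


async def fake_sandbox(task: str) -> str:
    """Stand-in for create -> run -> delete against a real sandbox."""
    await asyncio.sleep(0)
    return task.upper()
```

The semaphore keeps the number of live sandboxes bounded, which matters when each one consumes a container or pod.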

Isolation mechanism

Pluggable — operator chooses:

  • Container (Docker / Kubernetes pod): the default
  • gVisor: user-space kernel that intercepts syscalls
  • Kata Containers: hardware-virtualized VM (kata-runtime)
  • Firecracker microVM: dedicated kernel, sub-second startup

This is the widest isolation option menu in the batch (4 distinct mechanisms).

Execution mode

One-shot (sandbox created, task runs, sandbox destroyed or paused). TTL-based lifecycle.

Multi-model routing

None at the platform level. Model selection is the responsibility of the agent framework.

Network isolation

Per-sandbox egress policy via the egress component:

  • eBPF-based or iptables filtering
  • Fine-grained egress traffic filtering policies
  • Unified ingress gateway with multiple routing strategies
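
The decision logic behind an egress policy can be sketched as a default-deny allowlist. This is only a model of the idea: the `EgressPolicy` class and its pattern syntax are hypothetical, and the real egress component enforces its rules at the network level (eBPF or iptables), not in application code like this.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List


@dataclass
class EgressPolicy:
    """Hypothetical default-deny policy with hostname allow patterns."""

    allow: List[str] = field(default_factory=list)

    def permits(self, hostname: str) -> bool:
        """Return True only if the hostname matches an allow pattern."""
        # Normalize case and strip a trailing dot from fully qualified names.
        host = hostname.lower().rstrip(".")
        return any(fnmatchcase(host, p.lower()) for p in self.allow)
```

For example, `EgressPolicy(allow=["pypi.org", "*.pythonhosted.org"])` would let a sandbox install packages while denying every other destination.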

Crash recovery

Server restarts restore TTL timers (documented: "Timer restoration: Expiration timers restored after restart"). No documented pod-level crash recovery.
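
Timer restoration after a restart amounts to comparing persisted expiry timestamps with the current time. The sketch below illustrates that step under the assumption that expiries are stored as absolute timestamps; the function name and data shapes are hypothetical.

```python
from typing import Dict, List, Tuple


def restore_timers(
    expiries: Dict[str, float], now: float
) -> Tuple[List[str], Dict[str, float]]:
    """Split persisted expiries into overdue ids and remaining delays."""
    # Sandboxes that expired while the server was down need cleanup now.
    overdue = sorted(sid for sid, at in expiries.items() if at <= now)
    # The rest get a fresh timer for the time they have left.
    pending = {sid: at - now for sid, at in expiries.items() if at > now}
    return overdue, pending
```

A server could delete the overdue sandboxes immediately and schedule one timer per pending entry using the remaining delay.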

Streaming output

The README does not mention streaming (for example SSE) for the core API; standard command execution appears to be synchronous request/response.

Consensus mechanism

None.

Cross-tool portability

High — 5-language SDKs + MCP server + CLI + REST API. Any language can use the sandbox platform.

08

UI CLI Surface

OpenSandbox — UI & CLI Surface

CLI (osb)

Yes — binary name: osb

  • Install: pip install opensandbox-cli or uv tool install opensandbox-cli
  • Not a thin wrapper — it's a full REST client for the OpenSandbox API
  • Key subcommands:
    • osb config init / osb config set — connection configuration
    • osb sandbox create — create a sandbox with image, timeout, resource specs
    • osb command run <sandbox-id> -- <command> — execute a command
    • File operations, egress policy management, diagnostics

Output format: -o json for machine-readable output, -o raw for plain text.

Local web dashboard

None — no web UI in the OpenSandbox repo. Management is via CLI or SDK.

MCP server (opensandbox-mcp)

Yes — exposes sandbox operations as MCP tools over stdio

opensandbox-mcp --domain localhost:8080 --protocol http

MCP tools: sandbox creation, command execution, text file operations. Used with Claude Code and Cursor.

IDE integration

Claude Code and Cursor via MCP server registration. No native IDE extension.

Observability

  • Unified status with transition tracking (server-side)
  • SQLite audit trail for sandbox metadata
  • Kubernetes Events (when using k8s backend)
  • Structured error codes and messages

API documentation

  • OpenAPI specs in specs/ directory
  • osb -h for CLI reference
  • SDK reference documentation on open-sandbox.ai

Developer experience surface

  • CLAUDE.md and AGENTS.md for AI-assisted contribution
  • OSEPs (OpenSandbox Enhancement Proposals) in oseps/ directory — formal RFC process for platform changes
  • OpenSSF Best Practices badge — documents security and maintenance practices

Related frameworks

Daytona ★ 72k

Provide secure, elastic, sub-90ms sandbox compute infrastructure for running AI-generated code, accessible via multi-language…

CUA ★ 17k

Unified SDK for building, benchmarking, and deploying agents that interact with full OS GUIs via isolated VMs.

E2B ★ 12k

Run AI-generated code safely in cloud-hosted isolated sandboxes via a 3-line SDK integration.

Microsandbox ★ 6.3k

Spawn hardware-isolated microVMs as child processes directly from application code, with no server setup, in under 100ms.

CubeSandbox ★ 5.9k

Sub-60ms KVM microVM sandboxes for AI agents with E2B drop-in compatibility and <5MB memory overhead.

sandcastle (mattpocock) ★ 5.1k

Container-isolated TypeScript SDK for orchestrating AI coding agents with Docker/Podman/Vercel Firecracker sandboxes and…