Phase B Batch 33

Batch 33 — Sandbox Runtimes Overflow (microVM / WASM / K8s / CUA)

Roster (8 frameworks)

| slug | stars | distribution | cli_binary | local_ui | orchestration | multi_model | tier |
|---|---|---|---|---|---|---|---|
| agenttier | 19 | standalone-repo (Helm) | agenttier (Go+Python) | web-dashboard (Next.js) | none (infra only) | no | A |
| cua-sandbox | 17,104 | standalone-repo (pip + curl) | lume/cua-driver/cuabot/cb | native-desktop-H265 | parallel-fan-out | no | A |
| sandboxed-sh | 438 | standalone-repo (Docker/native) | none | web-dashboard (Next.js, :3000) | hierarchical | yes | A |
| opensandbox | 10,828 | standalone-repo (pip/npm/go/etc) | osb | none | parallel-fan-out | no | A |
| cubesandbox | 5,940 | standalone-repo (one-click sh) | cubemastercli | web (??) | parallel-fan-out | no | A |
| swe-rex | 508 | pip package | none | none | parallel-fan-out | no | A |
| capsule | 285 | pip + npm | capsule | none | none | no | A |
| open-agent-thorgal | 438 | (duplicate of sandboxed-sh) | | | | | C (duplicate) |

Intra-batch patterns

All 8 entries sit below the agent loop rather than at it: they are execution environments, execution protocols, or infrastructure platforms that run underneath whatever agent framework operates above them. None ships slash-commands, Claude Code hooks, or skill files as its primary artifact. sandboxed.sh comes closest, shipping 2 orchestrator skills, but its core value is the Rust server plus workspace isolation. The batch divides into six sub-categories by isolation primitive: K8s pod (AgentTier), macOS/QEMU VM (CUA), systemd-nspawn/Docker (sandboxed.sh), container plus pluggable secure runtime (OpenSandbox, CubeSandbox), persistent bash session abstraction (SWE-ReX), and WebAssembly function sandbox (Capsule). Two entries originate at large Asian technology companies (OpenSandbox at Alibaba, CubeSandbox at Tencent) and target Chinese cloud-native deployments alongside global markets.
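
Of these primitives, the persistent bash session is the easiest to picture in a few lines. The sketch below is a minimal illustration of the idea, not SWE-ReX's actual API: `ShellSession` and `run` are hypothetical names, and the code assumes a `bash` binary on `PATH`. One long-lived shell process receives every command, so working directory and environment variables carry over between calls, and a unique marker line delimits each command's output and exit code.

```python
import subprocess
import uuid


class ShellSession:
    """Hypothetical wrapper: one long-lived shell whose state persists across run() calls."""

    def __init__(self, shell="bash"):
        self.proc = subprocess.Popen(
            [shell],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # interleave stderr with stdout
            text=True,
            bufsize=1,
        )

    def run(self, command):
        """Run one command in the shared shell; return (output, exit_code)."""
        marker = f"__DONE_{uuid.uuid4().hex}__"
        # The leading \n guarantees the marker starts on its own line even
        # when the command's output lacks a trailing newline.
        self.proc.stdin.write(f"{command}\nprintf '\\n%s %s\\n' {marker} \"$?\"\n")
        self.proc.stdin.flush()
        lines = []
        for line in iter(self.proc.stdout.readline, ""):
            if line.startswith(marker):
                exit_code = int(line.split()[1])
                # Drop the single newline that printf added before the marker.
                return "".join(lines)[:-1], exit_code
            lines.append(line)
        raise RuntimeError("shell exited before the command finished")

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()
```

A caller might run `session.run("cd / && export GREETING=hi")` followed by `session.run('echo "$GREETING from $PWD"')`; the second call sees the directory and variable set by the first, which is the property that separates a session abstraction from one-shot command execution.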

Most interesting finds

  1. CubeSandbox: achieves sub-60 ms KVM microVM cold starts via snapshot plus copy-on-write (CoW) with <5 MB memory overhead, enabling thousands of sandboxes per node. E2B drop-in compatibility is a clever go-to-market move that targets the entire E2B user base. This is the strongest isolation-at-speed tradeoff in the corpus.

  2. Capsule: the only WASM-native isolation in the corpus. Using WebAssembly fuel metering for CPU control (instruction counting, not cgroups) is a novel architecture. Being cross-platform (no Linux/KVM requirement) gives it a deployment envelope no other batch entry has. Its function-level (not session-level) granularity is a unique design choice.
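
Fuel metering, mentioned for Capsule above, can be illustrated with a toy interpreter. This is a sketch of the general technique only; it is not Capsule's implementation and not a real WebAssembly runtime. `MeteredVM`, `OutOfFuel`, and `run_with_budget` are hypothetical names, and the three-opcode instruction set is invented for the example. Each executed instruction costs one unit of fuel, and execution traps when the budget reaches zero, so CPU use is bounded by counting instructions rather than by OS-level mechanisms such as cgroups.

```python
from dataclasses import dataclass, field


class OutOfFuel(Exception):
    """Raised when the guest program exhausts its instruction budget."""


@dataclass
class MeteredVM:
    fuel: int                                  # remaining instruction budget
    stack: list = field(default_factory=list)

    def run(self, program):
        pc = 0
        while pc < len(program):
            if self.fuel <= 0:
                raise OutOfFuel(f"budget exhausted at pc={pc}")
            self.fuel -= 1                     # charge one unit per instruction
            op, arg = program[pc]
            if op == "push":
                self.stack.append(arg)
            elif op == "add":
                b, a = self.stack.pop(), self.stack.pop()
                self.stack.append(a + b)
            elif op == "jmp":
                pc = arg                       # unconditional jump
                continue
            else:
                raise ValueError(f"unknown op {op!r}")
            pc += 1
        return self.stack[-1] if self.stack else None


def run_with_budget(program, fuel):
    """Run a program under a fuel limit; return (status, result, fuel_left)."""
    vm = MeteredVM(fuel=fuel)
    try:
        return ("ok", vm.run(program), vm.fuel)
    except OutOfFuel:
        return ("out_of_fuel", None, vm.fuel)
```

With this sketch, a three-instruction program that adds two numbers finishes with fuel to spare, while a program consisting of a single jump to itself is stopped once its budget is spent instead of spinning forever. Because the limit is enforced inside the interpreter, the same behavior holds on any host operating system, which matches the cross-platform point made above.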

Items written as Tier C

  • open-agent-thorgal: Duplicate of sandboxed-sh (same GitHub repo, Th0rgal/openagent self-describes as "Formerly known as Open Agent"). Only summary + METRICS written; canonical: false. Full 11-file analysis lives in sandboxed-sh.

Cross-references discovered

  • sandboxed-sh and open-agent-thorgal are the same repo (Th0rgal/openagent) — marked non-canonical.
  • sandboxed.sh orchestrator-boss skill's multi-model guidance explicitly names codex/gpt-5.5, gemini-3.1-pro-preview, claudecode, and opencode — real multi-model routing encoded in a persona-md skill file, which is unusual.
  • CUA ships a skills/ directory that integrates with Claude Code — making it a hybrid: primarily infrastructure, secondarily a skills provider.
  • OpenSandbox references kubernetes-sigs/agent-sandbox as a related project (already analyzed in another batch under the slug agent-sandbox-k8s).
  • SWE-ReX was extracted from SWE-agent and is used by Mini-SWE-agent — the SWE-agent ecosystem is a family of related projects.
  • AgentTier references E2B and Daytona as implicit competitors, CubeSandbox explicitly targets E2B migration, and OpenSandbox references the E2B API shape. E2B is the de facto reference API for this sandbox layer.