sandboxed.sh

sandboxed-sh · Th0rgal/openagent · ★ 438 · last commit 2026-05-26

Primitive shape 4 total
Skills 2 Subagents 2
00

Summary

sandboxed.sh — Summary

sandboxed.sh (formerly Open Agent) is a self-hosted cloud orchestrator for AI coding agents — a Rust backend + Next.js dashboard that provisions isolated Linux workspaces using systemd-nspawn containers and runs multiple agent runtimes (Claude Code, OpenCode, Codex, Gemini, Grok) inside them. Each "mission" gets a per-mission workspace directory and a dedicated harness process that streams JSON events back to the dashboard; the backend handles orchestration, workspace isolation, Library-based configuration management, provider health checks, and rate-limit fallback chains. The "Library" is a Git-backed repository of skills, tools, rules, agents, and MCPs that gets synced into each workspace's agent config files (.claude/skills/, CLAUDE.md, .opencode/, etc.) at mission start. It ships two orchestrator Claude Code skills (orchestrator-boss and orchestrator-worker) that implement a hierarchical parallel work delegation pattern with worktrees and state JSON recovery. sandboxed.sh also ships an iOS app (SwiftUI with Picture-in-Picture) and integrates with Telegram for chat-based mission creation.

Differs from seeds: sandboxed.sh is architecturally closest to agent-os (seed, bash-script-bundle) in that both provide a scaffold for wiring multiple agents together with shared configuration. But sandboxed.sh is far more operational: it has a running Rust server, a Next.js dashboard, systemd-nspawn workspace isolation, provider fallback chains, and live streaming — while agent-os is just bash scripts and markdown. The orchestrator-boss/worker skills echo claude-flow's hive-mind worker pattern but are implemented as Claude Code skill files (persona-md style) rather than code classes.

Note: sandboxed-sh and open-agent-thorgal point to the same GitHub repository (Th0rgal/openagent), which self-describes as "formerly known as Open Agent". The canonical slug for this analysis is sandboxed-sh; open-agent-thorgal is marked non-canonical.

01

Overview

sandboxed.sh — Overview

Origin

Originally named "Open Agent", the project was renamed to "sandboxed.sh"; the README describes it as "formerly known as Open Agent". It was built by Th0rgal and grew from a vision of fully autonomous, multi-day agent operations:

Vision (verbatim from README)

"What if you could:"

"Hand off entire dev cycles. Point an agent at a GitHub issue, let it write code, test by launching desktop applications, and open a PR when tests pass. You review the diff, not the process."

"Run multi-day operations unattended. Give an agent SSH access to your home GPU through a VPN. It reads Nvidia docs, sets up training, fine-tunes models while you sleep."

"Keep sensitive data local. Analyze your sequenced DNA against scientific literature. Local inference, isolated containers, nothing leaves your machines."

Core philosophy

The project treats the AI coding agent runtime as a commodity that should be swappable — the same orchestration infrastructure runs Claude Code, OpenCode, Codex, Gemini, or Grok. The value is in the orchestration layer: workspace isolation, Library sync, mission streaming, provider fallback, and the multi-agent boss/worker pattern.

The README's "Build X-Agent Hackathon" section (targeting OKX security) reveals another design principle: skills become operational infrastructure — a Library item synced into every agent environment rather than a one-off prompt.

On-chain / security agent use case

The repo's top-level text (for a hackathon) highlights using sandboxed.sh for on-chain AI agents: running OKX security skills unattended inside isolated workspaces so the agent can inspect token/DApp/transaction risk without exposing host files or wallet secrets.

Architecture philosophy

From agents.md (verbatim):

"The core change: agent harnesses run inside the target workspace, so native bash and file effects are scoped to the correct environment. The host proxy bash tools are no longer required for normal missions."

This reflects a deliberate move from a host-proxy architecture to a per-workspace harness architecture — each agent runs inside its workspace container, making bash commands and file operations naturally scoped.

02

Architecture

sandboxed.sh — Architecture

Sandbox primitive

systemd-nspawn containers (native Ubuntu 24.04, bare metal) or Docker containers (with privileged: true for nested nspawn). Neither path uses a microVM.

  • Each mission gets an isolated Linux workspace via systemd-nspawn
  • The agent harness runs inside the workspace, so native bash and file operations are container-scoped
  • Desktop automation: X11/Xvfb for headless GUI applications

Distribution

  • Self-hosted via git clone + docker compose up or native Ubuntu install
  • Docker (recommended): docker compose up -d → http://localhost:3000
  • Native (Ubuntu 24.04): cargo build + bun install

Install

git clone https://github.com/Th0rgal/sandboxed.sh.git
cd sandboxed.sh
cp .env.example .env
docker compose up -d

Directory tree (repo)

sandboxed.sh/
├── backend/              # Rust server (Axum or similar)
├── dashboard/            # Next.js web dashboard (port 3000)
├── skills/
│   ├── orchestrator-boss/SKILL.md
│   └── orchestrator-worker/SKILL.md
├── .claude/
│   ├── CLAUDE.md         # Claude Code context for repo development
│   └── settings.json     # Claude Code permissions
├── agents.md             # Harness architecture documentation
├── docs/
│   ├── HARNESS_SYSTEM.md
│   ├── WORKSPACES.md
│   ├── MISSION_API.md
│   ├── WORKSPACE_API.md
│   ├── BACKEND_API.md
│   ├── install-docker.md
│   ├── install-native.md
│   └── getting-started.md
├── android_dashboard/    # Android companion app
├── ios_dashboard/        # iOS companion app (SwiftUI)
├── shared/               # Shared Rust/TS types
├── docker-compose.yml
├── Dockerfile
├── Cargo.toml            # Rust workspace
└── capabilities/         # Agent capability definitions

Required runtime

  • Docker (recommended) or Ubuntu 24.04 LTS (native)
  • Rust (for native build)
  • Bun (for dashboard)
  • systemd-nspawn (for container workspace isolation, native path)
  • Xvfb (for desktop automation, headless)

Harnesses supported

| Harness | Config files written per workspace |
|---|---|
| Claude Code | .claude/settings.local.json, .claude/skills/<name>/SKILL.md, CLAUDE.md |
| OpenCode | opencode.json, .opencode/opencode.json, .opencode/oh-my-opencode.json |
| Codex | .codex/config.toml, .codex/skills/<name>/SKILL.md |
| Gemini | OpenCode-style config path |
| Grok | OpenCode-style config path |

Library system

A Git-backed repository of skills, tools, rules, agents, and MCPs. At mission start, Library content is synced into each workspace's agent config files. This ensures all agents see the same Library items regardless of which harness they use.
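
The sync step can be sketched as follows. This is a minimal illustration, not the backend's Rust implementation: the Library layout (skills/<name>/SKILL.md), the harness keys, and the function name sync_skills are assumptions made for the example. Only the two destination directories come from the per-workspace config file list.

```python
import shutil
from pathlib import Path

# Destination skill directories per harness. The paths match the config
# files listed for Claude Code and Codex; the dictionary keys are
# hypothetical names chosen for this sketch.
SKILL_DIRS = {
    "claudecode": Path(".claude/skills"),
    "codex": Path(".codex/skills"),
}


def sync_skills(library: Path, workspace: Path, harness: str) -> list[Path]:
    """Copy every <library>/skills/<name>/SKILL.md into the harness's skills dir.

    Assumes the Library repo keeps one directory per skill; the real
    repository layout may differ.
    """
    target_root = workspace / SKILL_DIRS[harness]
    written = []
    for skill_file in sorted((library / "skills").glob("*/SKILL.md")):
        dest = target_root / skill_file.parent.name / "SKILL.md"
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(skill_file, dest)
        written.append(dest)
    return written
```

Running the same function once per harness is what would give every runtime an identical view of the Library, which is the property the paragraph above describes.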

Host-OS posture

Moderate — Docker requires privileged: true for nested systemd-nspawn. Native install uses systemd-nspawn directly. The agent runs inside the container, but container escape is a real risk with privileged mode.

Config files

  • .env (API keys, configuration)
  • docker-compose.yml
  • Library repo (external git repo pointed to by dashboard settings)

03

Components

sandboxed.sh — Components

Backend services

| Service | Stack | Purpose |
|---|---|---|
| Backend server | Rust | REST API: mission lifecycle, workspace management, Library sync, provider routing |
| Dashboard | Next.js | Web UI: real-time monitoring, mission control, Library editor |
| iOS app | SwiftUI | Remote monitoring, Picture-in-Picture |
| Android app | (Android) | Remote monitoring |

Claude Code skills (shipped in repo)

| Skill | Location | Purpose |
|---|---|---|
| orchestrator-boss | skills/orchestrator-boss/SKILL.md | Boss skill for parallel worker orchestration: coordinates worker missions, manages task graph, enforces delegation over direct work |
| orchestrator-worker | skills/orchestrator-worker/SKILL.md | Worker skill: stays within assigned scope, verifies, reports blockers |

These are persona-md format skill files with YAML frontmatter.

Library system components

Library items (synced from a Git repo into each workspace):

  • Skills — Claude Code / OpenCode / Codex skill files
  • Tools — MCP server configs or custom tool definitions
  • Rules — behavior rules injected into agent context
  • Agents — agent configurations
  • MCPs — MCP server entries

Harness system

The harness system writes per-workspace config files at mission start:

  • CLAUDE.md — workspace-level context for Claude Code
  • .claude/settings.local.json — MCP servers + permissions for Claude Code
  • .claude/skills/<name>/SKILL.md — skills synced from Library
  • opencode.json — OpenCode configuration
  • .codex/config.toml — Codex configuration

Mission API

Per docs/MISSION_API.md:

  • Create, start, stop, monitor missions
  • Real-time streaming of JSON events from harness
  • Worker mission creation (for orchestrator-boss pattern)

MCP Registry (optional)

Extra MCP tool servers available when needed: desktop automation, Playwright, etc.

OpenAI-compatible proxy (optional)

Queue mode for deferred execution via /v1/chat/completions when all providers are temporarily rate-limited.

Provider routing

Model routing with:

  • Provider fallback chains (health checks + rate-limit handling)
  • Supports: Claude Code backends, OpenCode backends, Codex, Gemini, Grok
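
A fallback chain of this kind can be sketched in a few lines. The Provider type, the RateLimited exception, and run_with_fallback are hypothetical names for illustration; the source does not document how the Rust backend models providers or detects rate limits.

```python
from dataclasses import dataclass
from typing import Callable


class RateLimited(Exception):
    """Raised by a provider call when the upstream API throttles the request."""


@dataclass
class Provider:
    name: str
    healthy: Callable[[], bool]   # health check, e.g. a cheap status probe
    call: Callable[[str], str]    # send a prompt, return the completion


def run_with_fallback(chain: list[Provider], prompt: str) -> tuple[str, str]:
    """Try providers in order; skip unhealthy ones and fall through on rate limits.

    Returns (provider_name, completion). Raises RuntimeError when the whole
    chain is exhausted, which is where a queue mode could take over.
    """
    skipped = []
    for provider in chain:
        if not provider.healthy():
            skipped.append(f"{provider.name}: unhealthy")
            continue
        try:
            return provider.name, provider.call(prompt)
        except RateLimited:
            skipped.append(f"{provider.name}: rate-limited")
    raise RuntimeError("all providers unavailable: " + "; ".join(skipped))
```

Keeping the health check separate from the call lets the chain skip a known-bad provider without spending a request on it, while rate limits are only discovered at call time.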

Automations

Cron-like triggers for scheduling recurring agent runs.

Telegram integration

Connect bots to missions for chat-based AI assistants with auto-mission creation per Telegram chat.

.claude/settings.json (repo development permissions)

{
  "permissions": {
    "allow": [
      "Bash(cargo build:*)", "Bash(cargo run:*)", "Bash(cargo test:*)",
      "Bash(cargo fmt:*)", "Bash(cargo clippy:*)", "Bash(bun:*)",
      "Bash(cd dashboard && bun:*)", "Bash(git:*)", "Bash(ls:*)",
      "Bash(cat:*)", "Bash(mkdir:*)", "Bash(rm:*)", "Bash(cp:*)",
      "Bash(mv:*)", "Bash(jq:*)", "Bash(ssh:*)", "Bash(scp:*)",
      "Bash(python:*)", "Bash(curl:*)", "Read", "Write", "Edit", "Glob", "Grep"
    ],
    "deny": []
  }
}

05

Prompts

sandboxed.sh — Prompts

Two real skill files are shipped in skills/. Both are Claude Code SKILL.md format with YAML frontmatter.

Excerpt 1: orchestrator-boss/SKILL.md (verbatim)

---
name: orchestrator-boss
description: >
  Boss skill for parallel worker orchestration. Analyze, split, delegate, monitor,
  integrate. Do not implement directly.
---

# Orchestrator Boss

You coordinate worker missions. Prefer delegation over direct work.

## Hard Rules

1. Never edit implementation files or run the main fix loop yourself.
2. If a task can be delegated, delegate it.
3. Keep the worker pool full: `active_workers = min(max_parallel, ready_tasks)`.
4. Use `batch_create_workers` whenever 2+ ready tasks exist.
5. Use `wait_for_any_worker` for concurrent workers. Do not wait on one worker while others are still running.
6. Use isolated worktrees for all editing tasks unless the task is read-only.
7. Never trust a worker summary by itself. Verify actual files, diffs, or commits before accepting the result.
8. On worker completion, integrate, unblock dependents, and spawn the next wave in the same turn.
9. On `failed` or `interrupted`, inspect once, then either `resume_worker` to recover or replace the worker immediately.
10. If you choose not to delegate something, state the blocker explicitly.
11. Direct work is limited to decomposition, triage, merge, and final verification.

## Required Loop

1. Call `get_workspace_layout` once. Use its paths in worker prompts and worktree setup.
2. If backend choice matters, call `get_backend_auth_status` once before spawning. Do not infer auth from shell env vars, CLI login status, or missing `*_API_KEY` in Bash.
3. Build a task graph with `ready`, `blocked`, and `depends_on`.
4. Spawn every ready task now.
5. Wait with `wait_for_any_worker`.
6. React immediately...
7. Update `orchestrator-state.json` after every state change.

## Default Behavior

Assume the user wants maximum safe parallelism. Do not sit on idle worker capacity.

Prompting technique: Iron Law + Required Loop — this is the same pattern as superpowers' "Iron Law" skills: numbered hard constraints that cannot be overridden, combined with a mandatory execution loop. The "Never trust a worker summary" rule is an adversarial-subagent-review pattern embedded in the boss instructions.

Excerpt 2: orchestrator-worker/SKILL.md (verbatim)

---
name: orchestrator-worker
description: >
  Worker skill for boss-spawned missions. Stay within scope, verify, and report
  blockers quickly.
---

# Orchestrator Worker

You are a worker spawned by a boss mission. You run in the same workspace as the boss...

## Rules

1. Stay inside the assigned scope. Do not widen the task on your own.
2. Work only in the provided working directory or branch.
3. Do not modify files outside your scope unless the boss explicitly expands it.
4. Verify with the command from the prompt before finishing.
5. Do not report `DONE` unless the files on disk actually match your claimed result.
6. If the prompt is wrong, the task is impossible, or scope is insufficient, report that immediately.
7. Be concise. Prefer changes, verification, and a short status over long explanation.

## Completion

When done, make the result easy to integrate:
- commit on your branch if you changed files
- include the verification result
- include the changed file paths
- report one of: `DONE`, `BLOCKED`, or `NOT_FEASIBLE`

Prompting technique: Scope containment + structured completion — the worker is constrained to its assigned scope with explicit reporting tokens (DONE, BLOCKED, NOT_FEASIBLE). The boss verifies these claims before integration. This is a hierarchical multi-agent coordination protocol encoded in natural language.
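
The boss-side check implied by this protocol might look like the sketch below. The WorkerReport shape and the accept_report function are assumptions: the skill files define only the three status tokens and the requirement that files on disk match the claim, not a report schema.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Status tokens named in the orchestrator-worker skill.
VALID_STATUSES = {"DONE", "BLOCKED", "NOT_FEASIBLE"}


@dataclass
class WorkerReport:
    # Hypothetical structure; the skill asks for these items in prose only.
    status: str
    changed_files: list[str] = field(default_factory=list)
    verification: str = ""


def accept_report(report: WorkerReport, worktree: Path) -> tuple[bool, str]:
    """Return (accepted, reason); accept only a DONE report whose files exist.

    Simplification: a file the worker deleted would be flagged as missing.
    A fuller check would compare against the actual git diff.
    """
    if report.status not in VALID_STATUSES:
        return False, f"unknown status {report.status!r}"
    if report.status != "DONE":
        return False, f"worker reported {report.status}"
    missing = [f for f in report.changed_files if not (worktree / f).is_file()]
    if missing:
        return False, "claimed files missing: " + ", ".join(missing)
    return True, "ok"
```

The point of the sketch is the ordering: the status token is only a claim, and acceptance depends on what is on disk.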

09

Uniqueness

sandboxed.sh — Uniqueness & Positioning

Differs from seeds

sandboxed.sh is closest to agent-os (seed) in that both provide a "configuration scaffold" for AI coding agents. But agent-os is a pure bash-script bundle that writes markdown files to disk, whereas sandboxed.sh is a running server stack (Rust backend + Next.js dashboard + systemd-nspawn isolation + streaming + iOS app + Telegram bot) that its README still labels "Work in Progress". The orchestrator-boss/worker skill pattern echoes claude-flow's hive-mind worker pattern, but is implemented as persona-md SKILL.md files rather than code classes or SQLite state. The Library's Git-backed skill sync is closest in spirit to spec-driver's (seed) approach of treating skills as versioned artifacts, but sandboxed.sh applies this across multiple agent runtimes simultaneously. No seed provides multi-runtime support (Claude Code + OpenCode + Codex + Gemini + Grok behind one dashboard) or provider fallback chains with health checks.

Positioning

sandboxed.sh targets individual developers or small teams who want to run AI coding agents on their own hardware (a personal server, a cloud VM) with production-quality isolation, multi-runtime support, and remote monitoring. It's a "self-hosted Anthropic Console for your own agents" — complete with mobile apps, Telegram integration, and Library versioning.

Key architectural bets

  1. Multi-runtime as a first-class concern — treating Claude Code, OpenCode, Codex, Gemini, and Grok as interchangeable harnesses with a shared Library sync layer
  2. Git-backed Library — skills/tools/rules/MCPs are versioned in a git repo, giving them the same change management as code
  3. Per-workspace harness architecture — agents run inside the workspace container (not via host proxies), so bash/file operations are naturally scoped
  4. Hierarchical boss/worker orchestration in skills — the orchestrator pattern is encoded in SKILL.md files, so it works with any Claude Code session that has the skills loaded

Observable failure modes

  • systemd-nspawn + Docker = privileged container — the Docker path requires privileged: true, which significantly weakens isolation guarantees
  • Library git sync as single point of failure — if the Library repo is unavailable, mission config writes fail
  • Model mixing without type safety — the boss skill guides model choice in prose ("codex for code changes, gemini for proofs"), not enforced routing; wrong model selection is silent
  • orchestrator-state.json is hand-maintained — the boss skill mandates updating this file, but there is no platform enforcement; an interrupted boss may leave stale state
  • Work-in-progress status — the README explicitly states "Work in Progress"

Cross-references

  • Duplicate repo: open-agent-thorgal is the same GitHub repo (Th0rgal/openagent); sandboxed.sh is the canonical slug
  • Integrates with: Claude Code, OpenCode, Codex, Gemini, Grok
  • Telegram integration for chat-driven missions

04

Workflow

sandboxed.sh — Workflow

Primary workflow: single-agent mission

  1. User creates a workspace (template, agent backend, library config)
  2. System syncs Library content into workspace (skills, tools, rules, MCPs)
  3. System writes per-workspace harness config files (CLAUDE.md, settings.local.json, etc.)
  4. Mission runner launches chosen harness inside workspace container (systemd-nspawn or Docker)
  5. Harness streams JSON events → dashboard receives and displays
  6. User monitors in dashboard or iOS/Android app
  7. Mission completes; workspace data persists in per-mission directory

Phase-to-artifact map

| Phase | Artifact |
|---|---|
| Workspace creation | Per-mission directory on disk |
| Library sync | Skills/tools/rules/MCPs in workspace config dirs |
| Harness config write | CLAUDE.md, settings.local.json, .codex/config.toml, etc. |
| Mission execution | JSON event stream, logs, git commits/PRs from agent |
| Mission completion | Completed code changes, open PR (if agent creates one) |

Multi-agent workflow (orchestrator-boss pattern)

  1. Boss mission created with orchestrator-boss skill
  2. Boss calls get_workspace_layout to understand structure
  3. Boss builds task graph (ready/blocked/depends_on)
  4. Boss calls batch_create_workers for all ready tasks
  5. Boss calls wait_for_any_worker; on completion: verify, integrate, spawn next wave
  6. Workers run in parallel (same container with dedicated worktrees, or separate containers)
  7. Boss maintains orchestrator-state.json as recovery log
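
The scheduling rule behind steps 3 to 5 (spawn every ready task, capped by the worker pool) can be simulated without any real workers. In this sketch the function names are hypothetical, and a FIFO queue stands in for wait_for_any_worker by assuming the oldest worker always finishes first. The real boss is an agent following prose rules, not this code.

```python
def ready_tasks(depends_on: dict[str, set[str]], done: set[str], running: set[str]) -> list[str]:
    """Tasks that are not started yet and whose dependencies are all done."""
    return sorted(
        task for task, deps in depends_on.items()
        if task not in done and task not in running and deps <= done
    )


def simulate(depends_on: dict[str, set[str]], max_parallel: int) -> list[list[str]]:
    """Return the batches of workers spawned, in order.

    Assumes max_parallel >= 1 and that the oldest running worker
    completes first (a stand-in for wait_for_any_worker).
    """
    done: set[str] = set()
    running: list[str] = []
    batches: list[list[str]] = []
    while len(done) < len(depends_on):
        # Keep the pool full: spawn up to the number of free slots.
        free = max_parallel - len(running)
        batch = ready_tasks(depends_on, done, set(running))[:free]
        if batch:
            running.extend(batch)
            batches.append(batch)
        if not running:
            raise ValueError("dependency cycle or unknown dependency")
        # One worker completes; the next loop iteration reacts immediately.
        done.add(running.pop(0))
    return batches
```

For a graph where ui depends on api, and e2e depends on api, db, and ui, simulate(graph, 2) spawns api and db together, then ui as soon as api finishes, then e2e last.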

State maintenance (boss skill excerpt)

The orchestrator-boss skill mandates:

"Maintain orchestrator-state.json as your recovery log. Record task IDs, worker IDs, branches, worktrees, attempts, and blockers."

Approval gates

None enforced by the platform — gate logic is delegated to the agent (e.g., the orchestrator-boss skill says "Never trust a worker summary by itself. Verify actual files, diffs, or commits before accepting the result.").

Automation workflow

Cron-like triggers → mission created automatically → harness executes → results stream to dashboard/Telegram.

Telegram workflow

Telegram message → auto-mission creation per chat → agent executes in isolated workspace → results streamed to Telegram bot.

06

Memory Context

sandboxed.sh — Memory & Context

Library (primary configuration memory)

The Library is a Git-backed repository of skills, tools, rules, agents, and MCPs. It is the "institutional memory" of the sandboxed.sh installation — everything that should persist across missions lives here. At mission start, Library content is synced into the workspace, ensuring every agent session starts with the same baseline configuration.

Per-mission workspace (execution memory)

Each mission gets a per-mission directory on the host filesystem. This persists between mission start/stop cycles (unlike ephemeral containers). Files created by the agent (code, git commits, partial work) remain in the workspace directory.

orchestrator-state.json (multi-agent recovery log)

The orchestrator-boss skill mandates maintaining orchestrator-state.json as a recovery log:

  • Task IDs, worker IDs, branches, worktrees
  • Attempts and blockers
  • The file is updated after every state change

This enables the boss to recover from interruptions without losing track of which workers completed which tasks.
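
A recovery log is only useful if a crash cannot leave it half-written. The sketch below shows one way to get that property with an atomic rename. The JSON schema (a top-level "tasks" map with a "status" field) and both function names are assumptions, since the skill lists what to record but not a file format.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state atomically: a crash mid-write leaves the previous file intact."""
    # Write to a temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(state, handle, indent=2)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_name, path)


def unfinished_tasks(path: Path) -> list[str]:
    """After a restart, list task IDs that were not recorded as done."""
    state = json.loads(path.read_text(encoding="utf-8"))
    return [task_id for task_id, task in state["tasks"].items() if task["status"] != "done"]
```

Because the boss is an agent editing the file through its own tools, nothing guarantees it writes this way. The sketch only illustrates what a safe update would involve.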

Agent config files (workspace-level context)

At mission start, the harness writes:

  • CLAUDE.md — general workspace context for Claude Code
  • .claude/settings.local.json — permissions and MCP servers
  • .claude/skills/<name>/SKILL.md — Library skills
  • opencode.json / .opencode/ — OpenCode config
  • .codex/config.toml — Codex config

These are re-written at each mission start from Library content, so they reflect the current Library state.

Cross-session handoff

Yes — workspace directories persist across mission restarts. An agent can resume work in the same directory after a stop/restart cycle.

Context compaction

No explicit compaction mechanism in the platform. Delegated to the individual agent runtime (e.g., Claude Code's native compaction).

Memory type classification

  • Primary: file-based (workspace directories, Library git repo)
  • Secondary: json-store (orchestrator-state.json)
  • No SQLite, vector DB, or graph DB

07

Orchestration

sandboxed.sh — Orchestration

Multi-agent support

Yes — the orchestrator-boss/worker skill pattern enables hierarchical parallel orchestration:

  • Boss mission coordinates N worker missions
  • Workers run in parallel (same container with worktrees, or separate containers)
  • Boss uses batch_create_workers for multiple ready tasks
  • Boss uses wait_for_any_worker for non-blocking parallel wait
  • Boss verifies each worker's output before accepting it

Multiple agent backends can be mixed in the same boss session:

"codex + gpt-5.5: default for code changes. gemini + gemini-3.1-pro-preview: good for proofs and parallel analysis. claudecode + Claude models: careful broad edits. opencode: cheap redundancy"

Orchestration pattern

Hierarchical (boss-worker with task graph and dependency tracking)

Isolation mechanism

systemd-nspawn containers for per-mission workspace isolation (native path). Docker containers (with privileged: true) for Docker path.

Workers inherit the boss's workspace by default. A workspace_id override places a worker in a separate container instead.

Multi-model routing

Yes — the platform supports provider fallback chains:

  • Health checks detect unavailable providers
  • Rate-limit handling with fallback to next provider
  • The orchestrator-boss skill explicitly guides model choice per task type (Codex for code, Gemini for analysis, Claude for broad edits)

Execution mode

Three modes: continuous (missions run until completion, with streaming event output), scheduled (cron-like automations), and event-driven (Telegram bot triggers).

Crash recovery

The orchestrator-boss skill's orchestrator-state.json enables recovery:

"On failed or interrupted, inspect once, then either resume_worker to recover or replace the worker immediately."

Platform-level crash recovery: mission state persists in workspace directory; missions can be restarted.

Streaming output

Yes — harnesses stream JSON events to the backend; dashboard displays in real time.
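
On the consuming side, a stream like this is typically read line by line. The sketch below assumes newline-delimited JSON objects with a "type" field. That framing and the "raw" fallback event are guesses for illustration, as the source does not specify the event schema.

```python
import json
from typing import Iterable, Iterator


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSON line; wrap anything else so no output is dropped."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank keep-alive lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            event = {"type": "raw", "text": line}
        if not isinstance(event, dict):
            # Valid JSON that is not an object (e.g. a bare number) is kept as text.
            event = {"type": "raw", "text": line}
        yield event
```

Passing a subprocess's stdout (opened in text mode) as lines would let a consumer handle events as they arrive instead of waiting for the mission to finish.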

Consensus mechanism

None formally. The boss verifies worker claims by inspecting actual files/diffs/commits. This is an agent-driven adversarial verification pattern embedded in the boss skill (the checker is the boss agent, not a human) rather than a distributed consensus protocol.

Worktree per feature

Yes — orchestrator-boss skill mandates: "Use isolated worktrees for all editing tasks unless the task is read-only."
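
One way to give each editing task its own worktree is a dedicated branch and directory per task ID. The helper below only builds the git command. The branch prefix worker/, the sibling <repo>-worktrees directory, and the default base branch main are naming assumptions, not conventions taken from the skill.

```python
from pathlib import Path


def worktree_command(repo: Path, task_id: str, base: str = "main") -> list[str]:
    """Build `git worktree add -b <branch> <path> <base>` for one worker task."""
    branch = f"worker/{task_id}"                             # hypothetical branch naming
    path = repo.parent / f"{repo.name}-worktrees" / task_id  # kept outside the main checkout
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]
```

The resulting list can be passed to subprocess.run(cmd, check=True). Returning the argument list instead of running it keeps the naming logic easy to inspect before any repository is touched.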

08

UI CLI Surface

sandboxed.sh — UI & CLI Surface

Web dashboard (Next.js)

Yes — Next.js dashboard at http://localhost:3000

  • Real-time monitoring: CPU, memory, network graphs, mission timeline
  • Mission control: start, stop, monitor missions remotely
  • Library editor: inline editing of skills, commands, rules with Git-backed versioning
  • MCP server management: runtime status, Library integration
  • Backend configuration: provider routing, API key management

iOS app

Yes — SwiftUI with Picture-in-Picture support

  • Remote monitoring of missions
  • Chat-based mission creation via Telegram integration

Android app

Yes — Android companion app (in android_dashboard/)

CLI binary

No dedicated CLI binary: management is via the web dashboard and REST API. The project ships a Rust server binary, not a user-facing CLI.

OpenCode / Claude Code agent harnesses (not a CLI, but worth noting)

The harnesses are invoked by the backend as subprocesses inside workspace containers. Users do not invoke them directly — the dashboard handles this.

Observability

  • Real-time CPU/memory/network graphs per mission in dashboard
  • Structured JSON event streaming from harnesses
  • Mission timeline view
  • Log streaming to dashboard (and Telegram)

API surface

From docs:

  • MISSION_API.md — mission lifecycle (create, start, stop, monitor)
  • WORKSPACE_API.md — workspace management
  • BACKEND_API.md — backend provider configuration

Telegram integration surface

  • Connect Telegram bots to missions
  • Chat message → auto-mission creation
  • Mission output streamed back to Telegram chat
