Batch 18 — Sandboxed/isolated personal harnesses: container/WASM/scoped-FS isolation

Roster (10)

| slug | stars | distribution | cli_binary | local_ui | isolation_mechanism | multi_agent | tier |
|---|---|---|---|---|---|---|---|
| nanoclaw | 29,431 | npm-package | ncl | none | container (Docker, per session) | yes (multi-channel swarm) | A |
| ironclaw | 12,351 | cargo crate | ironclaw | ratatui TUI | WASM capability sandbox | no | A |
| paseo | 6,754 | npm monorepo | paseo | Expo mobile + Electron desktop | git-worktree | yes (multi-provider) | A |
| osaurus | 5,509 | macOS app | osaurus | SwiftUI native macOS app | Apple Container Linux VM | no | A |
| clawmanager | 1,351 | k8s-native | cm | React 19 dashboard | Kubernetes Pod | yes (K8s swarm) | A |
| stakpak-agent | 1,563 | cargo crate | stp | ratatui TUI | Docker + Warden network | no | A |
| agentbox-mattolson | 174 | go binary | agentbox | none | mitmproxy + iptables | no | A |
| scion-gcp | 1,548 | go binary | scion | React web dashboard (Hub mode) | container + git-worktree | yes (harness-agnostic swarm) | A |
| terminal-bench-env | 82 | script | none | none | container (Docker, benchmark isolation) | no | B |
| code-yeongyu-my-cc-harness | 13 | config-files | none | none | none | yes (Sonnet+Haiku hierarchy) | A |

Intra-batch patterns

The batch theme, isolation mechanism, reveals a spectrum of isolation philosophies. Four groups emerge, with the sharpest contrast between security isolation and evaluation isolation:

True security isolation (stakpak-agent, agentbox-mattolson, ironclaw, osaurus): The framework's primary isolation goal is preventing the agent from performing unsafe operations such as unrestricted network egress, filesystem access, or credential exfiltration. agentbox-mattolson is the most extreme: two-layer enforcement (mitmproxy sidecar + iptables) designed to intercept requests and rewrite credentials so that the agent never handles the real ones. ironclaw's WASM capability sandbox prevents tool code from accessing anything not explicitly granted. osaurus uses Apple Container Linux VMs (macOS 26+) for OS-level isolation with a privacy filter.

Execution environment isolation (nanoclaw, scion-gcp, clawmanager): Containers provide agent isolation for state management and reproducibility, not primarily for security. Each agent/session gets a clean environment. The threat model is "contamination between agents," not "agent doing something dangerous."

Evaluation isolation (terminal-bench-env): Docker containers provide reproducibility for benchmark tasks — same starting state every time. Security is irrelevant; isolation serves measurement validity.

Minimal or no isolation (paseo via git-worktree, code-yeongyu): paseo separates work at the file-system level only (worktrees); code-yeongyu has no isolation at all (a personal harness trusting itself).
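The "nothing not explicitly granted" rule attributed to ironclaw above can be modeled as a deny-by-default capability check. The following is a minimal sketch of that idea in plain Python, not ironclaw's implementation (which runs tools inside a WASM sandbox); the `Grants` type, `check_read`, and `check_host` are hypothetical names introduced for illustration.

```python
import posixpath
from dataclasses import dataclass


class CapabilityError(PermissionError):
    """Raised when a tool asks for something it was not granted."""


@dataclass(frozen=True)
class Grants:
    # Hypothetical grant set: nothing is allowed unless it is listed here.
    read_roots: frozenset = frozenset()
    hosts: frozenset = frozenset()


def check_read(grants: Grants, path: str) -> str:
    """Return the normalised path if it lies under a granted root, else raise."""
    # normpath collapses "a/../b", so ".." segments cannot escape a granted root.
    norm = posixpath.normpath(path)
    for root in grants.read_roots:
        base = root.rstrip("/")
        if norm == base or norm.startswith(base + "/"):
            return norm
    raise CapabilityError(f"read not granted: {norm}")


def check_host(grants: Grants, host: str) -> str:
    """Return the lower-cased host if it was granted, else raise."""
    name = host.lower()
    if name in grants.hosts:
        return name
    raise CapabilityError(f"network access not granted: {host}")
```

With `Grants(read_roots=frozenset({"/work"}))`, a request for `/work/../etc/passwd` normalises to `/etc/passwd` and is refused. The design point is that the default is refusal: an empty `Grants()` permits nothing.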

A secondary pattern: multi-provider neutrality is more common than expected. scion-gcp explicitly supports Claude/Gemini/Codex/OpenCode equally. paseo supports Claude/Codex/Copilot/OpenCode/Pi. nanoclaw's neutrality is of a different kind: it spans multiple messaging channels rather than multiple providers. This batch has a higher provider-neutrality rate than most other batches, possibly because isolated/containerized setups naturally abstract away the agent binary.

Hook-as-QA appears only in code-yeongyu: running ruff, type checking, import analysis, and comment language enforcement on every PostToolUse event. No other isolated harness in this batch uses hooks this aggressively for quality enforcement.
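A hook of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not code-yeongyu's actual hook: it assumes the PostToolUse payload is a JSON object with `tool_name` and `tool_input.file_path` fields, that the file-writing tools are named `Write`, `Edit`, and `MultiEdit`, and that "comment language enforcement" can be approximated by flagging non-ASCII comments. `plan_checks` and `non_ascii_comments` are hypothetical helper names.

```python
import io
import tokenize

# Assumed names of the tools that write files; adjust to the harness in use.
FILE_WRITE_TOOLS = {"Write", "Edit", "MultiEdit"}


def plan_checks(payload: dict) -> list:
    """Return the linter commands to run for one PostToolUse payload."""
    if payload.get("tool_name") not in FILE_WRITE_TOOLS:
        return []
    path = payload.get("tool_input", {}).get("file_path", "")
    if not path.endswith(".py"):
        return []
    # `ruff check <path>` lints a single file; further checkers could be appended here.
    return [["ruff", "check", path]]


def non_ascii_comments(source: str) -> list:
    """Return line numbers of comments that contain non-ASCII characters."""
    flagged = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and not tok.string.isascii():
            flagged.append(tok.start[0])
    return flagged
```

A wrapper script would read the payload from stdin, run each planned command with `subprocess.run`, and report failures back to the agent. Keeping the planning step as a pure function makes the hook's behavior testable without invoking any linter.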

Most interesting finds

  1. agentbox-mattolson: The mitmproxy + iptables two-layer network isolation architecture is the most sophisticated network security approach in the entire catalog. The design insight has two parts. First, a MITM proxy intercepts requests and rewrites the credentials in them, so the agent never sees real credentials. Second, iptables blocks any traffic not routed through the proxy. Together these create a credential isolation system in which the agent cannot exfiltrate API keys even if it tries. This pattern is applicable to any container-based agent deployment. The framework is early-stage (174 stars), but the architecture is production-relevant.

  2. code-yeongyu/my-claude-code-harness: The executor.md agent uses profane, aggressive language as intentional "attention weight engineering" to enforce single-task focus. The developer explicitly theorizes that strong emotional language creates stronger attention patterns for critical constraints. Combined with comprehensive PostToolUse Python static analysis hooks (ruff, typing, import style, comment language enforcement) that run after every file write, this is the most unusual combination of prompt engineering and automated QA in the batch. The profane executor prompt is likely unique in the entire catalog of 330+ frameworks.
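The credential-rewriting half of the agentbox-mattolson design can be illustrated with a small pure-Python sketch. It is not taken from the project: the placeholder value, the `UPSTREAM_SECRETS` map, and `rewrite_headers` are all hypothetical. The assumption is that the agent is handed only a dummy token and that the proxy swaps in the real key for known upstream hosts. In a mitmproxy deployment, logic like this would typically sit in an addon's `request` hook; the firewall layer that forces all traffic through the proxy is not shown.

```python
from typing import Dict

# Hypothetical dummy value that the agent is given instead of a real key.
PLACEHOLDER = "PLACEHOLDER_KEY"

# Hypothetical map of upstream host -> real credential, held only by the proxy.
UPSTREAM_SECRETS = {"api.example.com": "real-key-held-by-proxy"}


def rewrite_headers(host: str, headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the headers with the placeholder swapped for the real key.

    The swap happens only for hosts listed in UPSTREAM_SECRETS, so a request
    sent anywhere else still carries the useless placeholder.
    """
    out = dict(headers)  # never mutate the caller's headers
    secret = UPSTREAM_SECRETS.get(host.lower())
    auth = out.get("Authorization")
    if secret is not None and auth is not None and PLACEHOLDER in auth:
        out["Authorization"] = auth.replace(PLACEHOLDER, secret)
    return out
```

Because the real key exists only on the proxy side, an agent that dumps its own environment or request headers can leak nothing more than the placeholder.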

Items written as Tier B

  1. terminal-bench-env: Not an agent harness. It is evaluation infrastructure: 3,500+ verified Docker task environments for measuring terminal agent capability, accompanied by a minimal ReAct BashAgent. It has no workflow methodology, skills, hooks, or persistent memory. It is treated as Tier B because it has research value and a published paper (arXiv 2602.07274) but does not fit the harness archetype. The full set of 11 files was written.

Cross-references discovered

  • scion-gcp explicitly targets the same harnesses that appear in other batches: claude-code (batch 1/9 seeds), gemini-cli, codex/opencode. scion is the orchestrator layer above all of them.
  • nanoclaw (nanocoai/nanoclaw) was listed as qwibitai/nanoclaw in the batch manifest. The canonical repo is under the nanocoai organization; qwibitai may be the founder's personal account, from which the repo was later transferred.
  • ironclaw's NEAR AI authentication suggests it may be positioning for decentralized AI agent networks (NEAR Protocol ecosystem), which is unusual for a terminal tool.
  • osaurus depends on macOS 26 (Apple Container), a 2026 OS not yet widely deployed; the framework is gated on an OS release that most current machines do not yet run.
  • The paseo daemon on port 6767 is a local relay service enabling mobile (Expo) ↔ desktop (Electron) ↔ CLI coordination; this multi-surface design pattern is unique in this batch.

Isolation Mechanism Taxonomy (batch contribution)

This batch clarifies a taxonomy of isolation mechanisms by their primary goal:

| Goal | Mechanism | Example |
|---|---|---|
| Credential isolation | mitmproxy + iptables | agentbox-mattolson |
| Tool capability control | WASM capability sandbox | ironclaw |
| OS-level privacy | Apple Container Linux VM | osaurus |
| Network egress control | Docker + Warden | stakpak-agent |
| State separation between agents | Container per session | nanoclaw, scion-gcp |
| Scalable multi-agent isolation | Kubernetes pods | clawmanager |
| Benchmark reproducibility | Docker per task | terminal-bench-env |
| File-system branch separation | Git worktree | paseo |
| None (personal trust) | none | code-yeongyu |
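For anyone who wants to query the taxonomy programmatically, it can be encoded as a small lookup structure. This is one possible encoding, offered as a sketch: the `Entry` type and `goal_of` helper are hypothetical, the rows are copied from the taxonomy above, and the mechanism for the last row is written as "none" on the assumption that the empty cell means no mechanism.

```python
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    goal: str
    mechanism: str
    examples: Tuple[str, ...]


TAXONOMY: List[Entry] = [
    Entry("Credential isolation", "mitmproxy + iptables", ("agentbox-mattolson",)),
    Entry("Tool capability control", "WASM capability sandbox", ("ironclaw",)),
    Entry("OS-level privacy", "Apple Container Linux VM", ("osaurus",)),
    Entry("Network egress control", "Docker + Warden", ("stakpak-agent",)),
    Entry("State separation between agents", "Container per session", ("nanoclaw", "scion-gcp")),
    Entry("Scalable multi-agent isolation", "Kubernetes pods", ("clawmanager",)),
    Entry("Benchmark reproducibility", "Docker per task", ("terminal-bench-env",)),
    Entry("File-system branch separation", "Git worktree", ("paseo",)),
    Entry("None (personal trust)", "none", ("code-yeongyu",)),  # mechanism assumed to be "none"
]

# Reverse index: harness slug -> its taxonomy entry.
BY_HARNESS: Dict[str, Entry] = {slug: entry for entry in TAXONOMY for slug in entry.examples}


def goal_of(slug: str) -> str:
    """Return the primary isolation goal recorded for a harness slug."""
    return BY_HARNESS[slug].goal
```

For example, `goal_of("ironclaw")` returns `"Tool capability control"`. Each harness appears under exactly one goal, which is why a flat reverse index is enough.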