
Batch 16 — Multi-agent workflow loops: Ralph variants, YAML-driven workflows, director+coder splits


Roster (10)

| slug | stars | distribution | cli_binary | local_ui | orchestration | multi_model | tier |
|---|---|---|---|---|---|---|---|
| prodigy-iepathos | 9 | cli-tool (Rust) | prodigy | partial (axum HTTP, no documented dashboard) | parallel-fan-out (MapReduce) | no | A |
| switchboard | 209 | vscode-extension | none | VS Code kanban webview + MCP HTTP | hierarchical | yes (complexity routing) | A |
| slate-v1 | unknown | unknown | unknown | unknown | unknown | unknown | C |
| blackbox-code | unknown | unknown (404) | unknown | unknown | unknown | unknown | C |
| cestdone | 2 | npm-package | cestdone | none | hierarchical (Director+Worker) | yes (--director-model / --worker-model) | A |
| ralph-snarktank | 19,612 | bash-script-bundle | ralph.sh (bash) | none | sequential (continuous-ralph) | no | A |
| ralph-claude-code | 9,205 | bash-script-bundle | ralph (bash, global) | terminal-tui (tmux) | sequential (continuous-ralph) | no | A |
| evo | 770 | cli-tool (Python PyPI) | evo | web-dashboard (port 8080) | task-decomposition-tree | no | A |
| pi-mono | 55,497 | npm-package | pi | terminal-tui (@pi-tui) | none (extensible) | yes (8 providers) | A |
| pi-ralph-orch | 13 | standalone-repo (pi ext) | none (slash commands in pi) | terminal-tui (pi modals) | sequential (continuous-ralph) | no | A |

Intra-batch patterns

All 8 Tier-A frameworks in this batch implement some form of autonomous loop: the shared theme is "keep going until done" rather than "do one thing and stop." The loops diverge sharply in how they isolate work, however. Prodigy, evo, and cestDone use process-level isolation (separate OS processes or git worktrees per agent or experiment), while the Ralph variants (ralph-snarktank, ralph-claude-code, pi-ralph-orch) run one sequential loop driven from a single script or CLI session. File-based communication through Markdown or JSON files is nearly universal: every framework except pi-mono, which prescribes no state format, stores its cross-iteration state in human-readable files rather than databases. The director+worker split appears in three distinct forms: cestDone's thin-Director/fresh-Worker subprocess pattern, Switchboard's Planner→Lead/Coder/Intern hierarchy, and evo's orchestrator→subagent tree. Quality-gate enforcement is what separates the loops on reliability: evo adds formal pass/fail gates, ralph-claude-code (frankbria) adds a circuit breaker and a dual-condition exit, and ralph-snarktank relies on CI feedback loops, while pi-ralph-orch and cestDone take more advisory approaches.
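As a rough illustration of the file-based state pattern described above, the sketch below keeps all cross-iteration state in one JSON file that each loop pass reloads from disk. The file name `state.json`, the field names, and the task list are hypothetical and are not taken from any framework in this batch; the "work" step is a stand-in for invoking an agent.

```python
import json
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Read loop state from a JSON file, or start fresh if none exists."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    # Hypothetical initial state: a small task queue.
    return {"iteration": 0, "done": [], "todo": ["parse", "build", "test"]}


def save_state(path: Path, state: dict) -> None:
    """Write state as indented JSON so a human can inspect or edit it."""
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def run_iteration(path: Path) -> bool:
    """One pass: reload state from disk, do one task, persist. True when finished."""
    state = load_state(path)
    if state["todo"]:
        task = state["todo"].pop(0)  # stand-in for running an agent on the task
        state["done"].append(task)
    state["iteration"] += 1
    save_state(path, state)
    return not state["todo"]


def run_loop(path: Path, max_iterations: int = 10) -> dict:
    """Keep iterating until the queue is empty or the iteration cap is hit."""
    for _ in range(max_iterations):
        if run_iteration(path):
            break
    return load_state(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_loop(Path(tmp) / "state.json"))
```

Because nothing survives between passes except the file, each iteration could just as well run in a fresh process, which is what makes this pattern compatible with both the isolated and the single-session loops.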

Most interesting finds

  1. evo — The [EVO DIRECTIVE] banner injection via hook channels (PreToolUse, UserPromptSubmit, SessionStart) for orchestrator→subagent mid-run communication is a novel architectural pattern not seen in any other framework in the catalog. Hooks are typically used for monitoring or validation; using them as a downward communication channel to in-flight subagents is genuinely new. Combined with the tree-search frontier strategies (pareto_per_task inspired by GEPA) and remote cloud backends, evo is the most sophisticated autonomous optimization framework analyzed to date.

  2. Switchboard — Cross-provider heterogeneous routing (deliberately routing cheap tasks to Gemini Flash and expensive tasks to Claude Opus) via a visual kanban is a unique architectural stance — treating token cost optimization as a first-class feature of the orchestration layer. No other framework in the catalog routes across competing AI providers at runtime.
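To make the complexity-routing idea concrete, here is a minimal sketch of a router that sends low-complexity tasks to a cheap model tier and high-complexity tasks to an expensive one. The 1 to 10 complexity scale, the thresholds, and the model labels are all assumptions for illustration; Switchboard's actual scoring and configuration are not documented in this report.

```python
from dataclasses import dataclass

# Hypothetical routing table: (complexity ceiling, model label).
# The labels are placeholders, not real model identifiers.
ROUTES = [
    (3, "cheap-fast-model"),
    (7, "mid-tier-model"),
    (10, "frontier-model"),
]


@dataclass
class Task:
    title: str
    complexity: int  # assumed scale: 1 (trivial) to 10 (hard)


def route(task: Task) -> str:
    """Return the first model whose ceiling covers the task's complexity."""
    for ceiling, model in ROUTES:
        if task.complexity <= ceiling:
            return model
    # Anything above the scale falls through to the most capable tier.
    return ROUTES[-1][1]


if __name__ == "__main__":
    for t in [Task("rename variable", 2), Task("refactor module", 6), Task("design schema", 9)]:
        print(t.title, "->", route(t))
```

Keeping the table as data rather than branching logic means the cost trade-off can be retuned without touching the routing code.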

Items written as Tier C

  1. slate-v1 — URL points to a single-page React app (randomlabs.ai) with no public GitHub repository and no navigable documentation. No material available.

  2. blackbox-code — GitHub URL (https://github.com/blackboxaicode/cli) returns HTTP 404. Repository not found.

Cross-references discovered

  • pi-ralph-orch explicitly builds on pi-mono (@earendil-works/pi-coding-agent Extension API) — direct platform dependency
  • ralph-claude-code and ralph-snarktank both implement the Geoffrey Huntley Ralph pattern (ghuntley.com/ralph/) — same technique, radically different engineering depth (200 lines bash vs. multi-module system with 566 tests)
  • pi-ralph-orch credits ralph-orchestrator (mikeyobrien) as its inspiration, placing it in the snarktank/ralph lineage
  • evo lists pi as a supported host and uses pi's subagent spawning mechanism (pi-subagents package) — consumer relationship with pi-mono
  • snarktank/ralph is explicitly cited as the inspiration for ralph-claude-code (frankbria credits Geoffrey Huntley's technique and references snarktank)

Two Ralph Namesakes — Comparison

Both snarktank/ralph and ralph-claude-code implement the Geoffrey Huntley "Ralph Wiggum" autonomous loop pattern, but they represent opposite ends of the engineering spectrum:

snarktank/ralph (19,612 stars, dormant):

  • ~200-line bash script + 2 SKILL.md files
  • Always-fresh context per iteration (new AI process)
  • Minimal exit gate: plain string detection of <promise>COMPLETE</promise>
  • No rate limiting, no circuit breaker, no monitoring
  • Supports Amp (primary) and Claude Code
  • Designed for simplicity; copy-paste deployment
  • The reference implementation
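The loop shape listed above can be sketched in a few lines. This is an illustrative Python rendering, not the project's bash script: `run_agent` is a hypothetical callable standing in for launching a fresh AI process, and `make_fake_agent` exists only so the sketch runs without any external tool. The completion marker string is the one named above.

```python
from typing import Callable, Tuple

COMPLETE_MARKER = "<promise>COMPLETE</promise>"


def ralph_loop(
    run_agent: Callable[[str], str],
    prompt: str,
    max_iterations: int = 10,
) -> Tuple[int, bool]:
    """Re-run the agent with the same prompt until its output contains the marker.

    Returns (iterations_run, completed). Each call receives only the prompt,
    so no conversation history carries over between iterations.
    """
    for iteration in range(1, max_iterations + 1):
        output = run_agent(prompt)
        if COMPLETE_MARKER in output:
            return iteration, True
    return max_iterations, False


def make_fake_agent(finish_on_call: int) -> Callable[[str], str]:
    """Hypothetical stand-in agent that reports completion on the Nth call."""
    calls = {"n": 0}

    def agent(prompt: str) -> str:
        calls["n"] += 1
        if calls["n"] >= finish_on_call:
            return "all stories pass " + COMPLETE_MARKER
        return "still working"

    return agent


if __name__ == "__main__":
    print(ralph_loop(make_fake_agent(3), "work on the next story"))
```

The only safeguards in this shape are the marker check and the iteration cap, which is the gap the rate limiting and circuit breaker in ralph-claude-code are meant to close.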

ralph-claude-code (9,205 stars, active):

  • Multi-module bash system (9 main scripts + 9 lib modules) with 566 tests + CI
  • Session-based context with configurable 24-hour expiry (optional continuity)
  • Dual-condition exit gate (EXIT_SIGNAL: true + completion indicators)
  • Rate limiting (100 calls/hour), circuit breaker (CLOSED/HALF_OPEN/OPEN), 3-layer API limit detection
  • Claude Code only (not Amp)
  • tmux monitoring dashboard
  • Interactive setup wizard (ralph-enable)
  • Designed for production reliability
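Two of the safeguards listed above, the dual-condition exit gate and the three-state circuit breaker, can be sketched as follows. The state names come from the list; everything else is an assumption made for illustration, including the stall thresholds, the "no progress" trigger, and the minimum indicator count. The real transition rules in ralph-claude-code may differ.

```python
from dataclasses import dataclass
from enum import Enum


class BreakerState(Enum):
    CLOSED = "CLOSED"        # loop runs normally
    HALF_OPEN = "HALF_OPEN"  # loop is stalling; under observation
    OPEN = "OPEN"            # loop should halt


@dataclass
class CircuitBreaker:
    half_open_after: int = 2  # hypothetical threshold
    open_after: int = 3       # hypothetical threshold
    stalled: int = 0
    state: BreakerState = BreakerState.CLOSED

    def record(self, made_progress: bool) -> BreakerState:
        """Update the breaker after one loop iteration."""
        if made_progress:
            self.stalled = 0
            self.state = BreakerState.CLOSED
        else:
            self.stalled += 1
            if self.stalled >= self.open_after:
                self.state = BreakerState.OPEN
            elif self.stalled >= self.half_open_after:
                self.state = BreakerState.HALF_OPEN
        return self.state


def should_exit(exit_signal: bool, completion_indicators: int, min_indicators: int = 2) -> bool:
    """Dual-condition gate: require the explicit signal AND enough indicators."""
    return exit_signal and completion_indicators >= min_indicators


if __name__ == "__main__":
    breaker = CircuitBreaker()
    for progressed in [True, False, False, False]:
        print(breaker.record(progressed).value)
    print(should_exit(exit_signal=True, completion_indicators=1))
```

Requiring both conditions guards against a single false positive, such as a stray completion phrase in the output, ending the loop early.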

ralph-claude-code (the frankbria version) is essentially "snarktank/ralph made production-grade": the same conceptual pattern, engineered to handle the edge cases that appear when autonomous loops run unattended in real projects. Its 566-test suite and changelog (which tracks bugs such as checkbox-regex false positives and session hijacking) show how much complexity sits behind the simple snarktank loop.