Batch 23 Notes — Codex orchestration (oh-my-codex family + risk-routing governance)

Roster

| Slug | Stars | Archetype | Primary Innovation |
|---|---|---|---|
| oh-my-codex-yeachan | 29,662 | MCP-anchored + skills | Canonical Codex skills platform; ultragoal ledger; tmux team mode; quantitative ambiguity gating (0.15/0.20/0.30); 46+ skills; 20+ personas |
| oh-my-codex-scalarian | 65 (archived) | npm monorepo | v2 monorepo fork; Codex-only; archived April 2026; local path leakage in README |
| oh-my-codex-sigridjineth | 14 | Claude Code fork | Ambassador fork; tier routing (Haiku/Sonnet/Opus); keyword-detector hooks; Discord/Telegram notifications; LSP/AST tooling |
| oh-my-openagent | 59,562 | Platform-class agent OS | OpenCode plugin (TypeScript+Bun); Sisyphus/Hephaestus named agents; Hashline LINE#ID; Boulder state machine; 52 hooks; SUL-1.0 license |
| am-will-swarms | 203 | skills-only behavioral | Dependency-aware DAG (T1/T2/T3 depends_on arrays); wave-based parallel execution; zero CLI/hooks/MCP; npx skills add install |
| vnx-orchestration | 34 | governance-first orchestrator | NDJSON ledger (1,400+ entries); deterministic triple gate (Codex+Gemini+CI); tmux 2x2 grid operator mode; web dashboard; Python |
| do-it | 20 | skills+hooks behavioral | Risk-tier routing (Light/Standard/Heavy); 23 TOML agent definitions; 7 hook scripts; 5 DIM boolean session-state flags; git worktree for Heavy tier |
| hotl-plugin | 22 | skills-only behavioral | 8-phase workflow; triple HOTL contracts (intent/verification/governance); .hotl/state/ resumability; 5-tool support; smart routing |
| sandcastle-mattpocock | 5,103 | TypeScript SDK | Container isolation (Docker/Podman/Vercel Firecracker); npm library API; Effect library; merge-to-head branch strategy; mattpocock brand |
| metaswarm | 284 | hierarchical orchestrator | 9-phase + 4-phase inner loop; 19 agents; BEADS knowledge base; self-improving via /self-reflect; cross-model adversarial review; 3-tool support |

oh-my-codex Side-by-Side Comparison

The "oh-my-codex" naming meme was adopted by three different authors with very different implementations:

| Dimension | yeachan (canonical) | scalarian (fork) | sigridjineth (ambassador) |
|---|---|---|---|
| GitHub stars | 29,662 | 65 (archived) | 14 |
| Contributors | 30 | 3 | 1 |
| Status | Active (May 2026) | Archived (April 2026) | Active (May 2026) |
| Canonical flag | YES (explicitly warns against others) | NO | NO (self-describes as Ambassador) |
| Target tool | Codex CLI (primary) | Codex CLI only (no Claude bridge) | Claude Code (primary), Codex (secondary) |
| Skill count | 46+ | 14 | 41 |
| Persona count | 20+ | ~8 | ~20 |
| Ultragoal system | YES: append-only ledger, lifecycle tracking | NO | YES (inherited from canonical) |
| Deep-interview | YES: quantitative ambiguity scores (0.15/0.20/0.30 thresholds) | NO: simplified version without scoring | YES (inherited, no quantitative scoring) |
| Tier routing | NO: single model per run | NO | YES: Haiku/Sonnet/Opus by task complexity |
| Hooks | SessionStart, PreToolUse, PostToolUse, UserPromptSubmit, Stop | Minimal | UserPromptSubmit (keyword-detector + skill-injector), SessionStart (project-memory) |
| tmux team mode | YES: multiple concurrent sessions | NO | NO |
| Notifications | NO | NO | YES: Discord and Telegram |
| LSP/AST tooling | NO | NO | YES: language server protocol integration |
| State persistence | .omx/ directory | Minimal | .omx/ + project-memory |
| CLI binary | omx (TypeScript+Rust) | omx (packages/cli/dist/bin.js) | omx (inherited) |
| Monorepo structure | Single package | v2 monorepo (packages/cli/core/mcp-server) | Single package |
| Notable defect | None documented | Local paths leaked in README (/Users/staticpayload/...) | None documented |
| "oh-my-codex" meaning | Production Codex skills platform; Codex's missing workflow layer | npm monorepo experiment; Codex-only rebuild; abandoned | Codex/Claude skills platform; tier-aware routing with notifications |

Key Differentiator Summary

  • yeachan = the platform: The canonical, highest-starred, most feature-rich project of the three. It defines what oh-my-codex means: tmux team mode for parallelism, quantitative deep-interview gating, 20+ personas, and a TypeScript+Rust binary.
  • scalarian = the failed experiment: A v2 monorepo rebuild that went Codex-only, leaked a local developer path in its README, and was archived after ~6 months. The monorepo split (packages/cli/core/mcp-server) is its only architectural innovation.
  • sigridjineth = the evolved fork: An Ambassador (explicitly acknowledged by the canonical project) that adds tier routing (Haiku/Sonnet/Opus), notification webhooks (Discord/Telegram), and LSP/AST tooling, all features the canonical project lacks. It targets Claude Code first and Codex second. This is the most architecturally interesting of the three derivatives.

Intra-Batch Patterns

Risk-Tier Routing Convergence

Three frameworks independently developed risk-aware routing:

  1. do-it: Light/Standard/Heavy by DIM session booleans (7 hooks enforce routing)
  2. hotl-plugin: smart routing (question/fix/debug/build) before workflow selection
  3. oh-my-codex-sigridjineth: Haiku/Sonnet/Opus by task complexity (tier routing)

None of the 11 seed frameworks has this. That three unrelated projects arrived at it separately suggests the field is converging on cost/complexity-aware agent routing as a natural evolution.
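
The routing idea shared by these three frameworks can be sketched as a small pure function. This is a minimal illustration only: the `TaskSignals` fields and the cut-off of three files are hypothetical and are not taken from do-it, hotl-plugin, or oh-my-codex-sigridjineth. Only the Light/Standard/Heavy tier names come from the notes above.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LIGHT = "light"
    STANDARD = "standard"
    HEAVY = "heavy"


@dataclass(frozen=True)
class TaskSignals:
    # Hypothetical inputs; real frameworks derive their own signals.
    files_touched: int
    touches_public_api: bool
    has_migration: bool


def route(signals: TaskSignals) -> Tier:
    """Pick the cheapest tier whose safeguards still cover the task's risk."""
    # High-blast-radius changes always get the heaviest workflow.
    if signals.has_migration or signals.touches_public_api:
        return Tier.HEAVY
    # Assumed cut-off: more than three files means a standard workflow.
    if signals.files_touched > 3:
        return Tier.STANDARD
    return Tier.LIGHT
```

The point of the pattern is that the check is cheap and runs before any agent is dispatched, so small tasks never pay for the Heavy-tier machinery.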

Knowledge Persistence Spectrum

| Framework | Persistence Mechanism | Granularity |
|---|---|---|
| metaswarm | JSONL knowledge base (BEADS) | Per-PR, self-improving |
| hotl-plugin | .hotl/state/ + reports | Per-run resumable |
| do-it | .do-it/session/ (DIM booleans) | Per-session |
| vnx-orchestration | NDJSON ledger (1,400+ entries) | Per-decision |
| oh-my-codex-yeachan | .omx/ + ultragoal ledger | Per-task |
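
Several rows above rely on an append-only, line-delimited JSON file. A minimal sketch of that data structure follows, assuming one JSON object per line. The entry fields (`ts`, `kind`, `payload`) and the function names are hypothetical and do not reflect the actual schema of vnx-orchestration or any other framework listed.

```python
import json
import time
from pathlib import Path


def append_entry(ledger: Path, kind: str, payload: dict) -> dict:
    """Append one record as a single JSON line; earlier lines are never rewritten."""
    entry = {"ts": time.time(), "kind": kind, "payload": payload}
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def read_entries(ledger: Path) -> list[dict]:
    """Replay the ledger in write order, skipping blank lines."""
    if not ledger.exists():
        return []
    with ledger.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each record is one line, the file can be tailed, grepped, and replayed without parsing the whole history, which is what makes per-decision granularity practical.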

Governance Philosophy Split

| Approach | Frameworks |
|---|---|
| Governance-first (audit trail, receipts) | vnx-orchestration (NDJSON ledger), metaswarm (BEADS + knowledge base) |
| Quality-gate-first (approval gates, reviews) | hotl-plugin (3 HOTL contracts), metaswarm (Design Review Gate) |
| Risk-routing-first (avoid unnecessary overhead) | do-it (DIM tiers), hotl-plugin (smart routing), oh-my-codex-sigridjineth (tier routing) |

Container Isolation Outlier

sandcastle is the only framework in batch 23 (and potentially all 33 batches) using actual container isolation (Docker/Podman/Vercel Firecracker microVM). All other frameworks use git worktrees, file system isolation, or no isolation. This represents a categorically different threat model — sandcastle assumes agent code is potentially hostile to the host; other frameworks assume trusted agents with access to the full repo.

External Tool Delegation

Two frameworks explicitly delegate to competing AI models:

  1. metaswarm: Codex CLI and Gemini CLI as adversarial review delegates (cross-model review)
  2. vnx-orchestration: Codex + Gemini + CI as triple gate

This pattern (use cheaper/different model for specific phases) is absent from all 11 seeds and most other batches.
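
A deterministic multi-reviewer gate of this kind can be sketched as follows. The sketch assumes that every named reviewer must report a pass and that a missing verdict blocks the merge. That rule, the `GateResult` type, and the reviewer names used as keys are illustrative assumptions, not the documented behavior of vnx-orchestration or metaswarm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    passed: bool
    detail: str = ""


def triple_gate(
    results: list[GateResult],
    required: tuple[str, ...] = ("codex", "gemini", "ci"),
) -> tuple[bool, list[str]]:
    """Return (ok, blockers); ok is True only if every required gate passed."""
    by_name = {r.name: r for r in results}
    blockers: list[str] = []
    for name in required:
        result = by_name.get(name)
        if result is None:
            # A reviewer that never reported is treated as a blocker.
            blockers.append(f"{name}: missing")
        elif not result.passed:
            blockers.append(f"{name}: failed {result.detail}".rstrip())
    return (not blockers, blockers)
```

Keeping the decision rule free of model calls is what makes it deterministic: the models produce verdicts, but the same verdicts always yield the same outcome.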

Most Interesting Find

metaswarm's selective knowledge priming: bd prime --files --keywords --work-type is the most elegant solution to the "knowledge base bloat" problem seen in any framework. Because entries are stored as JSONL and retrieved by metadata filter, institutional memory can scale to thousands of entries across hundreds of PRs without overflowing the context window: only the entries that match the current files, keywords, and work type are loaded. No other framework across the 33 batches has a comparable solution to this scaling problem.
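
Metadata-filtered retrieval over a JSONL store can be sketched roughly as below. This is not metaswarm's implementation: the entry schema (`files`, `keywords`, `work_type`), the overlap-count scoring, and the `limit` parameter are all assumptions made for illustration.

```python
import json
from typing import Iterable


def prime(
    jsonl_lines: Iterable[str],
    files: set[str],
    keywords: set[str],
    work_type: str,
    limit: int = 5,
) -> list[dict]:
    """Return at most `limit` entries relevant to the current task."""
    scored: list[tuple[int, dict]] = []
    for line in jsonl_lines:
        if not line.strip():
            continue
        entry = json.loads(line)
        # Hard filter: entries for a different kind of work are skipped.
        if entry.get("work_type") != work_type:
            continue
        # Soft score: number of overlapping files plus overlapping keywords.
        score = len(files & set(entry.get("files", [])))
        score += len(keywords & set(entry.get("keywords", [])))
        if score > 0:
            scored.append((score, entry))
    # Stable sort: ties keep their original (oldest-first) order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:limit]]
```

The `limit` bound is what keeps prompt size constant as the store grows: the knowledge base can keep accumulating entries while each task sees only a fixed-size slice.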

Runner-up: oh-my-codex-yeachan's quantitative ambiguity scoring (0.15/0.20/0.30 thresholds in deep-interview) — treating ambiguity as a measurable quantity rather than a binary go/no-go is genuinely novel in this space.
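
One way to read "ambiguity as a measurable quantity" is a score mapped onto bands by the three thresholds. The sketch below is purely illustrative. The notes record only the threshold values (0.15/0.20/0.30), so the band names, the inclusive comparisons, and the assumption that scores lie in [0, 1] are hypothetical and may not match how oh-my-codex-yeachan uses them.

```python
# Threshold values as recorded in the notes; their exact semantics are assumed.
THRESHOLDS = (0.15, 0.20, 0.30)


def ambiguity_band(score: float) -> str:
    """Map an ambiguity score in [0, 1] to a hypothetical action band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    low, mid, high = THRESHOLDS
    if score <= low:
        return "proceed"
    if score <= mid:
        return "confirm-assumptions"
    if score <= high:
        return "ask-clarifying-questions"
    return "full-interview"
```

Compared with a binary go/no-go check, graded bands let a workflow spend a little clarification effort on slightly ambiguous requests instead of either blocking or guessing.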

Tier C Items

None. All 10 frameworks had sufficient data for full analysis.

Cross-References

  • sandcastle is directly comparable to claude-flow (seed): both are multi-agent runtimes, sandcastle with container isolation and claude-flow with MCP tools
  • metaswarm credits Superpowers (seed framework) as foundational skills source
  • hotl-plugin is comparable to kiro (seed) for phased workflow with approval gates
  • do-it is comparable to BMAD-METHOD (seed) for role-based agent definitions with quality gates
  • oh-my-openagent (Hashline LINE#ID, Boulder state machine) has no seed parallel — platform-class agent OS unique in the catalog
  • am-will-swarms is comparable to spec-kit (seed) in minimalism but differs in the DAG dependency model
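
The DAG dependency model noted for am-will-swarms (tasks with depends_on arrays, executed in parallel waves) can be sketched as a layered topological sort. This is a generic illustration of the technique, not the project's code, and `plan_waves` is a hypothetical name.

```python
def plan_waves(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave can run in parallel."""
    remaining = {task: set(deps) for task, deps in depends_on.items()}
    unknown = {d for deps in remaining.values() for d in deps} - set(remaining)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    waves: list[list[str]] = []
    done: set[str] = set()
    while remaining:
        # A task is ready once all of its dependencies have completed.
        ready = sorted(t for t, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves
```

For example, with T2 and T3 both depending on T1, and T4 depending on T2 and T3, the plan has three waves: T1 alone, then T2 and T3 together, then T4.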