
metaswarm

metaswarm · dsifry/metaswarm · ★ 284 · last commit 2026-05-16

Production-tested multi-agent orchestration with self-improving knowledge base, recursive Swarm Coordinator architecture, and mandatory cross-model adversarial review — supporting Claude Code, Gemini CLI, and Codex CLI simultaneously.

Best when: Production-grade AI coding requires institutional memory, not just a smart agent. Knowledge from every PR becomes the foundation for the next task; trust not…
Skip if: Trusting subagent self-reports (orchestrators must verify independently), Same model reviewing its own work (cross-model review enforced)
vs seeds: bmad-method (role-based multi-agent, quality gates, phase-gated workflow). Exceeds BMAD in recursive orchestration depth, self-impro…
Primitive shape (61 total): Commands 27 · Skills 13 · Subagents 19 · Hooks 2
00

Summary

metaswarm — Summary

metaswarm by David Sifry (dsifry) is a multi-agent orchestration framework (metaswarm npm package, v0.11.0) that targets Claude Code, Gemini CLI, and Codex CLI simultaneously. It has 284 stars and 1 contributor, and is described as extracted from a production system. It is the most elaborate multi-tool, self-improving orchestration framework in this batch: 19 specialized agent personas, 13 orchestration skills, a 9-phase workflow (Research → Plan → Design Review Gate → Work Unit Decomposition → Orchestrated Execution → Final Review → PR Creation → PR Shepherd → Closure & Learning), and a 4-phase inner execution loop (IMPLEMENT → VALIDATE → ADVERSARIAL REVIEW → COMMIT).

Standout features: a parallel 5-agent Design Review Gate, recursive Swarm Coordinator → Issue Orchestrator → sub-orchestrator spawning, BEADS (bd CLI) for git-native task tracking with selective knowledge priming, a JSONL knowledge base that self-improves via /self-reflect after every PR merge, 100% coverage thresholds enforced as a blocking gate, and external tool delegation to Codex CLI and Gemini CLI for cross-model adversarial review.

The closest comparison is BMAD-METHOD (role-based multi-agent with quality gates); metaswarm extends that model with recursive orchestration depth, a self-learning knowledge base, and three-model support (Claude/Gemini/Codex) in a single install. It is inspired by Superpowers (foundational skills) and BEADS (git-native task coordination).

01

Overview

metaswarm — Overview

Origin

Created by David Sifry (dsifry). 284 stars, 1 contributor. Active to May 2026. Package: metaswarm v0.11.0 (npm). Described as "an extraction of a production-tested agentic orchestration system, proven in the field writing production-level code with 100% test coverage, mandatory TDD, multi-reviewed spec-driven development, and SDLC best practices across hundreds of PRs."

Philosophy

From README:

"A self-improving multi-agent orchestration framework for Claude Code, Gemini CLI, and Codex CLI. Coordinate 18 specialized AI agents and 13 orchestration skills through a complete software development lifecycle, from issue to merged PR, with recursive orchestration, parallel review gates, and a git-native knowledge base."

Design principles from README:

  1. Knowledge-Driven Development — agents prime from knowledge base before every task
  2. Trust Nothing, Verify Everything — orchestrators validate independently; never trust subagent self-reports
  3. Parallel Review Gates — independent specialist reviewers run concurrently, not sequentially
  4. Recursive Orchestration — orchestrators spawn sub-orchestrators for any complexity level
  5. Agent Ownership — each agent owns its lifecycle; orchestrator delegates, not micromanages
  6. BEADS as Source of Truth — all task state in BEADS; agents coordinate via database, not messages
  7. Test-First Always — TDD mandatory; coverage thresholds (100% lines/branches/functions/statements) block PR creation
  8. Human-in-the-Loop — checkpoints at planned review points, automatic escalation after 3 failed iterations

Key Design Decisions

  1. Multi-tool simultaneous support: Claude Code plugin + Gemini CLI extension + Codex CLI plugin, installed via npx metaswarm init
  2. BEADS integration: git-native issue/task tracking by Steve Yegge — bd CLI as coordination backbone
  3. Self-improving system: /self-reflect after PR merge writes patterns, gotchas, decisions back to JSONL knowledge base
  4. Recursive orchestration: Swarm Coordinators → Issue Orchestrators → sub-orchestrators (swarm of swarms)
  5. External tool delegation: Codex CLI and Gemini CLI as implementation or adversarial review delegates
  6. Superpowers-inspired: explicitly acknowledges Superpowers (Jesse Vincent) as foundation for skills methodology

Supported Platforms

| Platform | Install | Commands |
| --- | --- | --- |
| Claude Code | Plugin marketplace | /start-task, /setup, etc. |
| Gemini CLI | gemini extensions install | /metaswarm:start-task, etc. |
| Codex CLI | Plugin marketplace | $start, $setup, etc. |

Use Cases

  • Issue-to-PR autonomous execution with full 9-phase lifecycle
  • Complex feature development with spec-driven adversarial review
  • Parallel multi-specialist design review before any code is written
  • Cross-model adversarial review (Claude implements → Gemini reviews, or vice versa)
  • Knowledge accumulation across hundreds of PRs for organizational memory
02

Architecture

metaswarm — Architecture

Repository Layout

metaswarm/
├── .claude-plugin/plugin.json   — Claude Code plugin manifest (name, version, author)
├── gemini-extension.json        — Gemini CLI extension manifest (contextFileName: GEMINI.md)
├── .codex/install.sh            — Codex CLI install script
├── hooks/
│   ├── hooks.json               — SessionStart + PreCompact → session-start.sh
│   └── session-start.sh         — Context priming (platform-aware)
├── skills/ (13 dirs)            — Orchestration skills (Agent Skills standard)
├── commands/ (15 .md + 12 .toml)— Claude Code + Gemini CLI slash commands
├── agents/ (19 .md files)       — Agent persona definitions
├── rubrics/                     — Quality review standards (code, arch, security, testing)
├── guides/                      — Development patterns
├── knowledge/                   — JSONL knowledge base schema + templates
├── templates/                   — CLAUDE.md, AGENTS.md, GEMINI.md append templates
├── lib/                         — Platform detection, sync, setup scripts
├── cli/metaswarm.js             — Cross-platform installer entry point
├── bin/                         — Shell helpers
├── .coverage-thresholds.json    — Coverage enforcement config
├── CLAUDE.md                    — Claude Code project context
├── AGENTS.md                    — Codex CLI project context
├── GEMINI.md                    — Gemini CLI project context
└── ORCHESTRATION.md             — Redirect → skills/start/SKILL.md

Orchestration Architecture

Swarm Coordinator
  └─ spawns Issue Orchestrator (per issue)
       └─ runs 9-phase workflow:
            1. Research (Researcher Agent)
            2. Plan (Architect Agent)
            3. Design Review Gate (5 parallel: PM, Architect, Designer, Security Design, CTO)
            4. Work Unit Decomposition (DoD items, file scopes, dependency graph)
            5. Orchestrated Execution (per WU: 4-phase IMPLEMENT→VALIDATE→REVIEW→COMMIT)
            6. Final Comprehensive Review (cross-unit integration)
            7. PR Creation (Release Engineer)
            8. PR Shepherd (auto-monitors to merge)
            9. Closure & Learning (self-reflect → knowledge base)

4-Phase Execution Loop (per Work Unit)

Phase 1: IMPLEMENT
  - Coder Agent spawned (Team Mode: persistent; Task Mode: fresh Task())
  - Optional: delegate to Codex CLI or Gemini CLI via external-tools skill
  - Spec contract (DoD items) is the source of truth

Phase 2: VALIDATE
  - Orchestrator runs tests ITSELF (never trusts Coder's self-report)
  - Coverage checked against .coverage-thresholds.json (100% required)
  - If failing: loop back to IMPLEMENT (max 3 iterations before escalation)

Phase 3: ADVERSARIAL REVIEW
  - ALWAYS a fresh Task() — never a persistent teammate
  - Cross-model: if writer used Claude, reviewer may use Gemini/Codex
  - Reviewer checks DoD compliance with file:line evidence
  - If failing: loop back to IMPLEMENT

Phase 4: COMMIT
  - Orchestrator writes commit message
  - Pushes to feature branch
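
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not metaswarm's implementation: the four callables, `WorkUnitResult`, and `run_work_unit` are hypothetical names, and applying one shared 3-iteration cap to both VALIDATE and ADVERSARIAL REVIEW failures is an assumption based on the "3 failed iterations" escalation rule.

```python
from dataclasses import dataclass
from typing import Callable

MAX_ITERATIONS = 3  # from the text: escalate after 3 failed iterations


@dataclass
class WorkUnitResult:
    status: str      # "committed" or "escalated"
    iterations: int  # number of IMPLEMENT attempts used


def run_work_unit(
    implement: Callable[[], None],
    validate: Callable[[], bool],
    review: Callable[[], bool],
    commit: Callable[[], None],
) -> WorkUnitResult:
    """Run IMPLEMENT -> VALIDATE -> ADVERSARIAL REVIEW -> COMMIT for one work unit."""
    for attempt in range(1, MAX_ITERATIONS + 1):
        implement()
        # The orchestrator runs validation itself instead of asking the coder.
        if not validate():
            continue  # back to IMPLEMENT
        # The review is expected to run in a fresh context (hypothetical callable).
        if not review():
            continue  # back to IMPLEMENT
        commit()
        return WorkUnitResult("committed", attempt)
    # Neither gate passed within the cap: hand the work unit to a human.
    return WorkUnitResult("escalated", MAX_ITERATIONS)
```

Passing the phases in as callables keeps the control flow visible: the orchestrator owns the loop and the verdicts, and the coder only ever sees `implement()`.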

Knowledge Base Architecture

knowledge/
  *.jsonl         — JSONL-format fact entries (one JSON object per line)
  schema.md       — Entry format: {type, content, files, keywords, work_type, date}
  templates/      — Entry templates per category

bd prime --files "src/auth/**" --keywords "auth" --work-type implementation
  → Loads only matching JSONL entries into agent context
  → Selective retrieval: hundreds of entries without context overflow

BEADS Integration

BEADS (bd CLI by Steve Yegge) provides:

  • Issue/task management in git (no external service)
  • Dependency graph between tasks
  • bd prime — selective knowledge priming by files/keywords/work-type
  • bd doctor — system health check
  • All task state in git → survives context compaction + session restarts

Multi-Tool Manifest System

Three separate context files:

  • CLAUDE.md — Claude Code project instructions (injected automatically)
  • AGENTS.md — Codex CLI project instructions
  • GEMINI.md — Gemini CLI extension context

The templates/ directory contains append variants that add metaswarm context to an existing project's CLAUDE.md, AGENTS.md, or GEMINI.md without overwriting it.

Coverage Enforcement

// .coverage-thresholds.json
{
  "thresholds": { "lines": 100, "branches": 100, "functions": 100, "statements": 100 },
  "enforcement": {
    "command": "pnpm test:coverage",
    "blockPRCreation": true,
    "blockTaskCompletion": true
  }
}

Orchestrators MUST check coverage before marking any task complete or creating a PR.
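
As a rough sketch of what such a check could look like, the Python below compares measured percentages against the configured thresholds. The function names and the shape of the `measured` dictionary are assumptions for illustration; the document does not show how metaswarm parses the output of `pnpm test:coverage`.

```python
import json

METRICS = ("lines", "branches", "functions", "statements")


def failing_metrics(config_text: str, measured: dict[str, float]) -> list[str]:
    """Return the metrics whose measured percentage is below its threshold."""
    thresholds = json.loads(config_text)["thresholds"]
    # A metric missing from the report counts as 0 so the gate fails closed.
    return [m for m in METRICS if measured.get(m, 0.0) < thresholds[m]]


def may_create_pr(config_text: str, measured: dict[str, float]) -> bool:
    """Apply the blockPRCreation flag from the enforcement section."""
    config = json.loads(config_text)
    blocking = bool(config["enforcement"]["blockPRCreation"])
    return not (blocking and bool(failing_metrics(config_text, measured)))
```

Note that the config must be plain JSON for `json.loads` to accept it, so a leading `//` comment line would need to be removed first.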

03

Skills And Commands

metaswarm — Skills and Commands

Skills (13)

All skills use the Agent Skills standard (SKILL.md with YAML frontmatter).

| Skill | Description | Key Mechanism |
| --- | --- | --- |
| start | Main entry: workflow guide + 18 agent personas; auto_activate: true | Swarm Coordinator entry point |
| orchestrated-execution | 4-phase IMPLEMENT→VALIDATE→ADVERSARIAL REVIEW→COMMIT loop | Trust-nothing verification |
| design-review-gate | 5 parallel specialist agents review design before implementation | 3-iteration cap then human escalation |
| plan-review-gate | 3 adversarial reviewers (Feasibility, Completeness, Scope) on plan | All must PASS |
| setup | Interactive project setup (BEADS, gh, Playwright prereqs) | Creates CLAUDE.md, AGENTS.md, GEMINI.md |
| migrate | Migration from npm to plugin installation | Preserves knowledge base |
| status | Diagnostic checks (bd doctor, coverage, CI health) | Health report |
| pr-shepherd | PR lifecycle automation (CI monitoring, review handling, merge) | Autonomous CI monitoring |
| handling-pr-comments | Review comment workflow (parse → implement → respond) | Thread resolution |
| brainstorming-extension | Structured feature brainstorming → design document creation | Design doc → Design Review Gate |
| create-issue | Create BEADS issue from spec or conversation | JSONL task entry |
| external-tools | Delegate to Codex CLI / Gemini CLI for impl or review | Cross-model adversarial review |
| visual-review | Playwright-based screenshot capture for web UI review | Screenshot → agent analysis |

Claude Code Commands (15)

| Command | Purpose |
| --- | --- |
| /start-task | Start any task (auto-routes to full workflow if complex) |
| /start | Alias for start-task |
| /brainstorm | Structured brainstorm → design document |
| /prime | Manually trigger BEADS knowledge priming |
| /create-issue | Create tracked issue |
| /setup | Interactive project setup |
| /update | Update metaswarm to latest version |
| /status | System health check |
| /self-reflect | Post-PR reflection → knowledge base update |
| /pr-shepherd | Activate PR shepherd for open PR |
| /handle-pr-comments | Process open review comments |
| /review-design | Trigger design review gate on design doc |
| /metaswarm-setup | Framework-level setup (differs from project /setup) |
| /metaswarm-update-version | Update metaswarm version |
| /external-tools-health | Check Codex/Gemini CLI availability |

Gemini CLI Commands (12 .toml files)

| Command | Notes |
| --- | --- |
| /metaswarm:start-task | Gemini equivalent of /start-task |
| /metaswarm:brainstorm | |
| /metaswarm:prime | |
| /metaswarm:create-issue | |
| /metaswarm:setup | |
| /metaswarm:update | |
| /metaswarm:status | |
| /metaswarm:self-reflect | |
| /metaswarm:pr-shepherd | |
| /metaswarm:handle-pr-comments | |
| /metaswarm:review-design | |
| /metaswarm:external-tools-health | |

Codex CLI Commands

$start, $setup — and analogues for other commands using $ prefix.

Skill YAML Frontmatter Example

# skills/start/SKILL.md
---
name: start
description: Use when starting work on any task, when the user mentions metaswarm, or when the user wants to begin tracked development work
auto_activate: true
triggers:
  - "work on issue"
  - "start issue"
  - "start task"
  - "use metaswarm"
  - "@metaswarm"
  - "agent-ready label"
---

Rubrics

rubrics/ directory contains quality review standards:

  • Code review rubric
  • Architecture review rubric
  • Security review rubric
  • Testing/TDD rubric
  • Planning rubric
  • Adversarial spec compliance rubric

These are injected into reviewer agent contexts during Design Review Gate and Adversarial Review phases.

05

Agents And Subagents

metaswarm — Agents and Subagents

Agent Roster (19 Agents)

All defined as .md files in agents/:

| Agent | File | Role |
| --- | --- | --- |
| Issue Orchestrator | issue-orchestrator.md | Main coordinator per issue; runs 9-phase workflow |
| Swarm Coordinator | swarm-coordinator-agent.md | Assigns issues to worktrees, spawns Issue Orchestrators |
| Researcher | researcher-agent.md | Codebase exploration before planning |
| Architect | architect-agent.md | Implementation planning, design review |
| Product Manager | product-manager-agent.md | Use case & user benefit review (Design Review Gate) |
| Designer | designer-agent.md | UX/API design review (Design Review Gate) |
| Security Design | security-design-agent.md | Threat modeling (Design Review Gate) |
| CTO | cto-agent.md | TDD readiness & plan review (Design Review Gate) |
| Coder | coder-agent.md | TDD implementation (IMPLEMENT phase) |
| Code Review | code-review-agent.md | Internal code review post-implementation |
| Security Auditor | security-auditor-agent.md | Security review of code |
| Test Automator | test-automator-agent.md | Test coverage and quality |
| Release Engineer | release-engineer-agent.md | Safe delivery from merge to production |
| PR Shepherd | pr-shepherd-agent.md | PR lifecycle management |
| Knowledge Curator | knowledge-curator-agent.md | Knowledge base maintenance |
| Metrics | metrics-agent.md | Project metrics and reporting |
| Customer Service | customer-service-agent.md | User feedback integration |
| Slack Coordinator | slack-coordinator-agent.md | Async team coordination |
| SRE | sre-agent.md | Site reliability and production monitoring |

Spawning Mechanisms

Task-based Spawning (fresh context)

Most agents are spawned as fresh Task() instances:

orchestrator → Task(subagent_type="researcher") → returns result
orchestrator → Task(subagent_type="architect")  → returns result

Rule: Adversarial Review in Phase 3 is ALWAYS a fresh Task(), never a persistent teammate. This prevents the reviewer from being "contaminated" by the implementer's context.

Team Mode (persistent context)

In Team Mode, the Coder Agent can be a persistent teammate that retains context across work units (does not re-initialize between WUs in the same session). This is more efficient for long-running feature implementation.

  • Phase 1 (IMPLEMENT): may use persistent teammate
  • Phase 3 (ADVERSARIAL REVIEW): always fresh Task()

Recursive Spawning

Swarm Coordinator
  └─ spawns Issue Orchestrator (per issue)
       └─ Issue Orchestrator spawns sub-orchestrators for complex epics
            └─ Sub-orchestrators spawn sub-sub-orchestrators (swarm of swarms)

No fixed depth limit documented. Recursive spawning enables handling of arbitrarily complex epics.

Design Review Gate: Parallel Spawn Pattern

Issue Orchestrator
  ├─ spawns Task(PM Agent)           — in parallel
  ├─ spawns Task(Architect Agent)    — in parallel
  ├─ spawns Task(Designer Agent)     — in parallel
  ├─ spawns Task(Security Design)    — in parallel
  └─ spawns Task(CTO Agent)          — in parallel
  
All 5 must PASS.
If any FAIL: loop (max 3 iterations) → then human escalation.
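
The parallel spawn pattern above can be approximated with a thread pool. This is a sketch under stated assumptions: `review` and `revise` are hypothetical callables standing in for the reviewer Task() spawns and the design revision step, and the reviewer names are shorthand for the five agents listed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

REVIEWERS = ("pm", "architect", "designer", "security-design", "cto")
MAX_ITERATIONS = 3  # from the text: max 3 iterations, then human escalation


def design_review_gate(
    design: str,
    review: Callable[[str, str], bool],        # (reviewer name, design) -> PASS?
    revise: Callable[[str, list[str]], str],   # (design, failed reviewers) -> new design
) -> str:
    """Return "approved" if all reviewers pass, else "escalate-to-human"."""
    for _ in range(MAX_ITERATIONS):
        # All five reviewers run concurrently against the same design text.
        with ThreadPoolExecutor(max_workers=len(REVIEWERS)) as pool:
            verdicts = list(pool.map(lambda name: review(name, design), REVIEWERS))
        failed = [name for name, ok in zip(REVIEWERS, verdicts) if not ok]
        if not failed:
            return "approved"
        design = revise(design, failed)
    return "escalate-to-human"
```

A single failing reviewer is enough to trigger another round, which matches the "all 5 must PASS" rule.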

Plan Review Gate: Adversarial Plan Review

3 adversarial reviewers:
  - Feasibility reviewer
  - Completeness reviewer
  - Scope & Alignment reviewer

All 3 must PASS before plan presented to user.

External Tool Delegation (external-tools skill)

Agents can delegate to non-Claude models:

orchestrator → external-tools skill → runs Codex CLI subprocess → returns output
orchestrator → external-tools skill → runs Gemini CLI subprocess → returns output

Use cases:

  • Cost savings (delegate routine implementation to cheaper model)
  • Cross-model adversarial review (Claude implements → Gemini reviews, or vice versa)
  • Writer's output reviewed by a different model than the one that wrote it (the intended pattern, applied where the other CLIs are available)
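
One way to picture the reviewer selection is the small helper below. It is a hypothetical sketch, not the external-tools skill itself: `pick_reviewer` and the model labels are illustrative, and returning an explicit flag for the same-model fallback is a design choice made here so the caller can report it.

```python
def pick_reviewer(writer: str, available: list[str]) -> tuple[str, bool]:
    """Return (reviewer model, cross_model) for a given writer model.

    `available` is the list of models whose CLIs are installed and healthy.
    """
    for model in available:
        if model != writer:
            return model, True
    # No other model is available: fall back to the writer's model,
    # and say so through the second element instead of failing silently.
    return writer, False
```

For example, `pick_reviewer("claude", ["claude", "gemini", "codex"])` selects `"gemini"` with `cross_model=True`, while an empty `available` list yields the same-model fallback.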

Agent Context Priming

Before every agent spawn, the knowledge base is consulted:

bd prime --files <affected_files> --keywords <task_keywords> --work-type <implementation|review|planning>

Relevant JSONL entries are injected into the agent's context before it begins work.

09

Uniqueness

metaswarm — Uniqueness

Differs From Seeds

Most similar to BMAD-METHOD (role-based multi-agent with phase-gated workflow) and agent-os (multi-role agents with structured phases). Exceeds BMAD in: recursive orchestration depth (swarms of swarms), self-improving knowledge base, cross-model adversarial review, and three-tool simultaneous support (Claude/Gemini/Codex). Exceeds agent-os in the explicit 4-phase inner execution loop with independent orchestrator validation.

Positioning

  • Only framework with genuine self-improvement: /self-reflect extracts patterns from PR reviews and writes them back to a JSONL knowledge base — not just a memory dump but structured, queryable institutional knowledge
  • Only framework with selective knowledge priming: bd prime --files --keywords --work-type means the knowledge base can scale to thousands of entries without context overflow
  • Only framework with explicit cross-model adversarial review: writer's model ≠ reviewer's model, enforced by architecture (external-tools skill delegates to Codex/Gemini)
  • Only framework with trust-nothing orchestrator verification: orchestrator runs tests ITSELF, never trusts subagent self-reports — unusual explicit design principle
  • Only framework with 100% coverage as blocking gate: .coverage-thresholds.json enforces 100% lines/branches/functions/statements before PR creation
  • BEADS integration: the only framework building on Steve Yegge's git-native issue tracking as coordination backbone (vs external GitHub Issues API or custom state files)
  • Superpowers heritage: explicitly credits Superpowers (Jesse Vincent) as foundational skills source — the only framework in batch 23 with this lineage

Cross-Batch Comparisons

  • vs do-it: both have skill-defined quality gates. do-it uses TOML agent definitions + hook scripts for risk-tier routing. metaswarm uses SKILL.md + BEADS for knowledge-driven routing. do-it has 23 agents; metaswarm has 19. Key difference: metaswarm's knowledge base is self-improving; do-it's is static.
  • vs HOTL plugin: both have human-in-the-loop gates and smart routing (question vs feature). HOTL is tool-agnostic (5 tools); metaswarm supports 3 tools. HOTL has 3 HOTL contracts upfront; metaswarm has the Design Review Gate. Key difference: metaswarm has BEADS + self-learning; HOTL has .hotl/state/ resumability.
  • vs vnx-orchestration: both have multi-provider multi-model review. VNX has governance-first NDJSON ledger; metaswarm has knowledge-first JSONL base. VNX uses Python; metaswarm uses Shell. VNX has 21 agents; metaswarm has 19.
  • vs oh-my-codex-yeachan: both are large frameworks by star count. Yeachan focuses on a single tool (Codex) with 46+ skills; metaswarm covers 3 tools with 19 agents. Yeachan has an ultragoal ledger; metaswarm has the BEADS knowledge base.

Observable Failure Modes

  • BEADS dependency: if bd CLI not installed or incompatible version, knowledge priming silently degrades to empty context
  • 100% coverage threshold: strict 100% requirement will block PR creation on any project where 100% is infeasible (legacy codebases, third-party integration code)
  • 1 contributor: single-maintainer risk — framework has deep integration complexity (BEADS + Playwright + gh + 3 CLIs) but 0 co-maintainers
  • 5-agent Design Review Gate: spawning 5 parallel agents for every feature increases costs significantly; no cost-aware routing to skip gate for small changes
  • Recursive orchestration depth: no documented depth limit for Swarm Coordinator → Issue Orchestrator → sub-orchestrator chains; risk of unbounded token spend on complex epics
  • External tool availability: external-tools skill assumes Codex CLI and Gemini CLI are installed; /external-tools-health check exists but tool absence during adversarial review silently falls back to same-model review

Distinctive Opinion

"Production-grade AI coding requires institutional memory, not just a smart agent. Knowledge extracted from every PR becomes the foundation for the next task. Trust nothing self-reported — verify independently, review adversarially, and write it all back to the knowledge base."

metaswarm is the only framework in this batch making an explicit bet on organizational memory as the multiplier for multi-agent quality — not smarter prompts or better models, but accumulated patterns from hundreds of past PRs.

04

Hooks And Automation

metaswarm — Hooks and Automation

hooks.json

Two Claude Code lifecycle hooks defined:

{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "startup|resume|clear|compact",
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/hooks/session-start.sh",
            "async": false
          }
        ]
      }
    ],
    "PreCompact": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/hooks/session-start.sh",
            "async": false
          }
        ]
      }
    ]
  }
}

SessionStart fires on startup|resume|clear|compact triggers. PreCompact fires before every context compaction with empty matcher (always). Both call the same session-start.sh script.
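
The matcher behaviour described above can be modelled as follows. This is an illustration only: it assumes the matcher is treated as a regular expression tested against the whole trigger name and that an empty matcher matches everything; Claude Code's actual matching rules may differ, and `hook_fires` is a hypothetical helper.

```python
import re


def hook_fires(matcher: str, trigger: str) -> bool:
    """Decide whether a hook with this matcher runs for the given trigger."""
    if matcher == "":
        return True  # empty matcher: always fire
    # Assumption: the matcher is a regex that must match the full trigger name.
    return re.fullmatch(matcher, trigger) is not None
```

Under these assumptions the SessionStart matcher fires for `resume` but not for an unlisted trigger, and the PreCompact entry fires unconditionally.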

session-start.sh

Platform-aware context priming script:

  • Detects which tool is active (Claude Code, Codex, Gemini)
  • Runs bd prime if BEADS is installed (primes relevant knowledge base entries into context)
  • Loads BEADS status (active issues, blocked tasks)
  • Outputs session context summary for the agent

Critical use: The PreCompact hook ensures knowledge is re-primed before the context window is compacted. This prevents the agent from losing project knowledge when context fills up.

BEADS-Driven Automation

While not hook-based, BEADS provides automation for task state:

bd prime --files "src/**" --keywords "auth" --work-type implementation
  # Selective knowledge priming — loads only matching JSONL entries

bd ready     # Show tasks ready to work
bd list      # All tasks
bd stats     # Project statistics
bd doctor    # System health check

Coverage Enforcement Automation

.coverage-thresholds.json enforces blocking gates:

{
  "enforcement": {
    "command": "pnpm test:coverage",
    "blockPRCreation": true,
    "blockTaskCompletion": true
  }
}

Agents MUST run coverage check and verify 100% before proceeding. This is instruction-enforced (in skill definitions), not tool-hook-enforced.

PR Shepherd Automation

/pr-shepherd skill provides autonomous PR lifecycle management:

  • Monitors CI status (polling gh pr checks)
  • Handles review comments automatically
  • Resolves conversation threads
  • Attempts re-runs on flaky test failures
  • Escalates to human after N failures

Self-Reflect Automation

/self-reflect (triggered after PR merge) runs the self-learning pipeline:

  1. Reads PR review comments (human + automated)
  2. Reads build/test failure logs
  3. Extracts patterns, gotchas, anti-patterns
  4. Writes structured JSONL entries to knowledge/
  5. Optionally generates proposals for new skills or updated rubrics

No PostToolUse Hooks

metaswarm does NOT use:

  • PostToolUse hooks for validation
  • PreToolUse hooks for interception
  • Stop hooks for completion verification

All quality gates are skill-driven (agent instructions), not hook-enforced. The only hooks in use, SessionStart and PreCompact, handle context priming rather than quality enforcement.

Gemini CLI / Codex CLI Hooks

For Gemini CLI: gemini-extension.json points to GEMINI.md as context. No Gemini hook system. For Codex CLI: .codex/install.sh handles setup. No Codex hook definitions.

06

Workflow And Phases

metaswarm — Workflow and Phases

9-Phase Workflow

Phase 1: Research

  • Researcher Agent explores codebase
  • Reads existing architecture, relevant files, tests
  • Outputs research summary to Issue Orchestrator

Phase 2: Plan

  • Architect Agent drafts implementation plan
  • Decomposed into Work Units (WUs) with DoD items, file scopes, dependency graph
  • Plan submitted to Plan Review Gate (3 adversarial reviewers: Feasibility, Completeness, Scope)
  • All 3 must PASS before proceeding

Phase 3: Design Review Gate

For complex features (especially those from /brainstorm):

  • 5 specialist agents spawn in parallel:
    1. Architect (technical architecture)
    2. Product Manager (use case, user benefit)
    3. Designer (UX/API design)
    4. Security Design (threat modeling)
    5. CTO (TDD readiness)
  • All 5 must PASS. If any FAIL: iterate (max 3 iterations), then human escalation
  • Approval gate: HARD STOP — no code until Design Review Gate passes

Phase 4: Work Unit Decomposition

  • Issue Orchestrator creates BEADS epic
  • Decomposes plan into individual Work Units
  • Each WU has: DoD items list, file scope, dependencies, estimated complexity
  • Dependency graph determines execution order (serial vs parallel WUs)
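
To show how a dependency graph can yield serial and parallel stages, here is a minimal sketch using Python's standard `graphlib`. The work unit identifiers and `execution_batches` are hypothetical; the document does not say how metaswarm or BEADS computes the order.

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group work units into batches; units in one batch can run in parallel.

    `dependencies` maps each work unit to the set of units it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

With `{"wu-1": set(), "wu-2": {"wu-1"}, "wu-3": {"wu-1"}, "wu-4": {"wu-2", "wu-3"}}`, the batches are `wu-1`, then `wu-2` and `wu-3` together, then `wu-4`.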

Phase 5: Orchestrated Execution (per WU)

The 4-phase inner loop runs for each Work Unit:

IMPLEMENT

  • Coder Agent implements WU to spec (DoD items as contract)
  • Optional: delegate to Codex CLI or Gemini CLI via external-tools skill

VALIDATE

  • Orchestrator runs tests ITSELF (never trusts Coder self-report)
  • pnpm test:coverage → checks against .coverage-thresholds.json
  • Required: 100% lines, branches, functions, statements
  • Failure: return to IMPLEMENT (max 3 iterations)

ADVERSARIAL REVIEW

  • Fresh Task() spawned (never a persistent teammate)
  • Cross-model where possible (writer's model ≠ reviewer's model)
  • Reviewer checks DoD compliance with file:line evidence
  • Failure: return to IMPLEMENT

COMMIT

  • Orchestrator writes commit message
  • Pushes to feature branch

Phase 6: Final Comprehensive Review

  • Cross-unit integration review
  • Code Review Agent + Security Auditor review full set of changes
  • Checks for cross-WU integration issues not visible within single WU

Phase 7: PR Creation

  • Release Engineer Agent creates PR
  • PR description includes DoD checklist, test coverage, design doc link
  • BLOCKED if coverage below threshold

Phase 8: PR Shepherd

  • PR Shepherd Agent monitors CI
  • Handles review comments automatically
  • Resolves conversation threads
  • Auto-reruns flaky tests
  • Escalates after N failures

Phase 9: Closure & Learning

  • /self-reflect runs after merge
  • Extracts patterns, gotchas, decisions from PR review comments + test failures
  • Writes JSONL entries to knowledge/
  • May propose new skills or updated rubrics
  • BEADS epic closed, dependencies resolved

Smart Routing

/start-task auto-routes:

  • Simple task (bug fix, copy change) → direct implementation without full 9-phase workflow
  • Complex feature → full 9-phase workflow
  • Question → direct answer (no workflow overhead)

Approval Gates

  1. Plan Review Gate (Phase 2): 3-reviewer adversarial plan review — HARD STOP
  2. Design Review Gate (Phase 3): 5-parallel-reviewer design review — HARD STOP
  3. Coverage Gate (Phase 5, VALIDATE): 100% coverage required — blocks PR creation
  4. Human escalation (Phases 3+5): after 3 failed iterations in any gate

Resumability

BEADS task state persists to git. If a session is interrupted:

  • bd list shows current state
  • bd prime re-primes knowledge for context recovery
  • PreCompact hook re-primes before context window is compacted
  • Plan documents written to disk survive context loss
07

State And Memory

metaswarm — State and Memory

BEADS (bd CLI) — Primary State Backend

BEADS by Steve Yegge is the core state mechanism. All task state lives in BEADS (git-native, no external service):

bd ready          # Tasks ready to work
bd list           # All tasks
bd stats          # Project statistics
bd doctor         # System health check
bd start <id>     # Start working on a task
bd prime --files "src/**" --keywords "auth" --work-type implementation
                  # Selective knowledge priming

BEADS stores task data in git — survives context compaction, session interrupts, and tool restarts. Dependency graph between tasks is maintained in BEADS.

Knowledge Base

knowledge/ directory: JSONL-format fact store.

Entry Schema (from knowledge/schema.md)

{
  "type": "pattern|gotcha|decision|anti-pattern|lesson",
  "content": "The fact itself",
  "files": ["src/auth/**"],
  "keywords": ["authentication", "JWT"],
  "work_type": "implementation|review|planning",
  "date": "2026-05-16",
  "pr": "#123",
  "source": "pr-review|test-failure|user-correction|manual"
}

Selective Retrieval

bd prime --files "src/api/auth/**" --keywords "authentication" --work-type implementation
# Only loads JSONL entries matching these filters
# Knowledge base can grow to thousands of entries without context overflow
# Agents get the 5 critical gotchas for the files they're about to touch
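
The filtering idea can be sketched in Python. This is not how `bd prime` is implemented; `select_entries` is a hypothetical function, and the combination rule (work type must match, plus at least one file or keyword hit) is an assumption, since the document does not say how the three filters combine.

```python
import json
from fnmatch import fnmatchcase


def select_entries(
    jsonl_text: str,
    touched_files: list[str],
    keywords: list[str],
    work_type: str,
) -> list[dict]:
    """Return the knowledge entries relevant to the files and keywords given."""
    wanted = {k.lower() for k in keywords}
    selected = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        # An entry's "files" are glob patterns such as "src/auth/**".
        file_hit = any(
            fnmatchcase(path, pattern)
            for path in touched_files
            for pattern in entry.get("files", [])
        )
        keyword_hit = bool(wanted & {k.lower() for k in entry.get("keywords", [])})
        if entry.get("work_type") == work_type and (file_hit or keyword_hit):
            selected.append(entry)
    return selected
```

Because only matching lines are kept, the amount of text injected into an agent's context depends on the task's scope rather than on the total size of the knowledge base.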

Self-Improvement Loop

After every PR merge, /self-reflect writes new entries:

  1. PR review comments → patterns, anti-patterns
  2. Build/test failures → gotchas, lessons
  3. Architectural decisions → decisions, rationale
  4. User corrections → preferred approaches
  5. Repeated user instructions → skill/command proposals
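
A new knowledge entry following the schema above could be built and serialized like this. The helper names are hypothetical and the validation step is an addition for illustration; only the field names and allowed values come from the schema shown earlier.

```python
import json
from datetime import date
from typing import Optional

ENTRY_TYPES = {"pattern", "gotcha", "decision", "anti-pattern", "lesson"}
WORK_TYPES = {"implementation", "review", "planning"}
SOURCES = {"pr-review", "test-failure", "user-correction", "manual"}


def build_entry(
    entry_type: str,
    content: str,
    files: list[str],
    keywords: list[str],
    work_type: str,
    source: str,
    pr: str,
    today: Optional[date] = None,
) -> dict:
    """Assemble one knowledge entry, rejecting values outside the schema."""
    if entry_type not in ENTRY_TYPES:
        raise ValueError(f"unknown type: {entry_type}")
    if work_type not in WORK_TYPES:
        raise ValueError(f"unknown work_type: {work_type}")
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source}")
    return {
        "type": entry_type,
        "content": content,
        "files": files,
        "keywords": keywords,
        "work_type": work_type,
        "date": (today or date.today()).isoformat(),
        "pr": pr,
        "source": source,
    }


def entry_line(entry: dict) -> str:
    """Serialize an entry as a single JSONL line (no trailing newline)."""
    return json.dumps(entry, ensure_ascii=False)
```

Appending `entry_line(entry) + "\n"` to a file under knowledge/ keeps the one-object-per-line format that the selective retrieval step relies on.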

Session Priming (Hooks)

SessionStart hook (startup|resume|clear|compact):

  • session-start.sh runs bd prime on startup
  • Loads active issues, blocked tasks
  • Injects current BEADS state summary

PreCompact hook (always, before context compaction):

  • Same session-start.sh
  • Critical: re-primes knowledge before the LLM's context window is compacted
  • Prevents knowledge loss during long sessions

Plan and Design Documents

Plans written to disk (not just in agent context):

  • PLAN.md — implementation plan (architect output)
  • docs/designs/ — design documents from brainstorming
  • These survive context compaction and session interrupts

Note: The Plan subagent type has read-only file access by design. Orchestrators must write plan files themselves after receiving the plan text from the Architect subagent.

CLAUDE.md / AGENTS.md / GEMINI.md

Three tool-specific context files in project root (populated by /setup):

  • CLAUDE.md — injected automatically by Claude Code on every session
  • AGENTS.md — Codex CLI project context
  • GEMINI.md — Gemini CLI extension context

These contain project-specific metaswarm configuration, BEADS setup status, and active workflow state.

.coverage-thresholds.json

Persisted coverage requirement — read by orchestrators before PR creation:

{ "thresholds": { "lines": 100, "branches": 100, "functions": 100, "statements": 100 } }

No Vector Store / No Embeddings

Knowledge retrieval is file-system grep + BEADS filter — no vector search, no semantic embeddings. Selective retrieval is metadata-based (files, keywords, work_type).

Git as Audit Log

All task history lives in git:

  • BEADS issues in git
  • Knowledge base JSONL in git
  • Plan docs in git
  • All commits from execution phases on feature branch
  • PR merge as closure event

No separate audit log format. Git history IS the audit trail.

08

Ui Cli Surface

metaswarm — UI & CLI Surface

metaswarm CLI (Installer)

metaswarm npm binary at cli/metaswarm.js:

npx metaswarm init

This is a cross-platform installer, not a runtime CLI. It:

  • Detects installed AI tools (Claude Code, Codex CLI, Gemini CLI)
  • Installs metaswarm for all detected tools
  • Copies context files (CLAUDE.md, AGENTS.md, GEMINI.md) to project root
  • Sets up hooks/hooks.json for Claude Code

Not needed at runtime after initial setup. Later updates go through the /update or /metaswarm-update-version commands, or by re-running npx metaswarm init.
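
The tool-detection step can be approximated in Python by checking which executables are on PATH. This is a sketch under assumptions, not the logic in cli/metaswarm.js: the binary names claude, gemini, and codex are taken from the install commands on this page, and detect_tools with its injectable which parameter is a hypothetical helper.

```python
import shutil

# Executable names as they appear in the install commands on this page.
TOOL_BINARIES = {
    "Claude Code": "claude",
    "Gemini CLI": "gemini",
    "Codex CLI": "codex",
}


def detect_tools(which=shutil.which):
    """Return the names of the AI tools whose executable is found on PATH."""
    return [tool for tool, binary in TOOL_BINARIES.items() if which(binary)]
```

Calling detect_tools() with no arguments inspects the real PATH; passing a custom which function makes the check testable without any of the tools installed.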

Claude Code Interface

Claude Code slash commands (12 listed):

Command                  Invocation
Start task               /start-task or /start
Brainstorm feature       /brainstorm
Prime knowledge          /prime
Create issue             /create-issue
Project setup            /setup
Update framework         /update
Status/health            /status
Post-PR reflection       /self-reflect
PR lifecycle             /pr-shepherd
Handle comments          /handle-pr-comments
Design review            /review-design
External tools health    /external-tools-health

Gemini CLI Interface

12 TOML-defined commands, invoked as /metaswarm:<command>:

/metaswarm:start-task
/metaswarm:brainstorm
/metaswarm:prime
/metaswarm:setup
/metaswarm:self-reflect
... (12 total)

Codex CLI Interface

Commands use a $ prefix: $start, $setup, and others.
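
The three invocation conventions described above differ only in prefix, which the following Python sketch makes explicit. The prefixes come from this page; the tool keys ("claude", "gemini", "codex") and the invocation function are hypothetical, and the sketch does not check whether a given command actually exists for a given tool.

```python
# Prefix each host tool puts in front of a metaswarm command name.
PREFIXES = {
    "claude": "/",
    "gemini": "/metaswarm:",
    "codex": "$",
}


def invocation(tool, command):
    """Build the string a user would type to run `command` in `tool`."""
    if tool not in PREFIXES:
        raise ValueError(f"unknown tool: {tool}")
    return PREFIXES[tool] + command
```

For example, invocation("gemini", "setup") yields /metaswarm:setup, matching the Gemini CLI form shown earlier.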

BEADS (bd) CLI

BEADS is a separately installed CLI (recommended for task tracking and knowledge priming), not part of metaswarm:

bd ready        # What's next
bd list         # All tasks
bd prime ...    # Knowledge priming
bd doctor       # Health check

No Web Dashboard

metaswarm has no local web server, no dashboard UI, no browser interface.

No Dedicated TUI

The framework relies on the host tool's (Claude Code / Gemini / Codex) terminal interface.

Visual Review

visual-review skill uses Playwright to capture screenshots:

# Invoked via skill, not a separate UI:
# npx playwright install chromium  (one-time setup)
# Agent runs screenshot capture and analyzes result

Not a dashboard — it's a skill-driven tool call.

Installation Methods

# 1. Claude Code plugin (recommended)
claude plugin marketplace add dsifry/metaswarm-marketplace
claude plugin install metaswarm

# 2. Gemini CLI extension
gemini extensions install https://github.com/dsifry/metaswarm.git

# 3. Codex CLI plugin
codex plugin marketplace add dsifry/metaswarm-marketplace
# Then via /plugins UI in Codex

# 4. Cross-platform installer
npx metaswarm init

Update Mechanism

# In Claude Code:
/update              # Update metaswarm
/metaswarm-update-version   # Update metaswarm version (framework-level)

# Or: re-run npx metaswarm init

Platform Detection

lib/ contains platform detection scripts used by session-start.sh and cli/metaswarm.js to identify which AI tool is active and route accordingly.

10

Install And Config

metaswarm — Install and Config

Prerequisites

  • One of: Claude Code, Gemini CLI, or Codex CLI
  • Node.js 18+
  • bd (BEADS CLI) v0.40+ — recommended for task tracking and knowledge priming
  • gh (GitHub CLI) — recommended for PR automation
  • Playwright (optional, for visual-review skill): npx playwright install chromium

Installation

Claude Code (recommended)

claude plugin marketplace add dsifry/metaswarm-marketplace
claude plugin install metaswarm

Then run /setup in Claude Code to configure project context files.

Gemini CLI

gemini extensions install https://github.com/dsifry/metaswarm.git

Then run /metaswarm:setup in your project.

Codex CLI

codex plugin marketplace add dsifry/metaswarm-marketplace
codex
# Open /plugins → select metaswarm marketplace → install

Then run $setup.

Cross-Platform (All Tools)

npx metaswarm init

Detects installed CLIs and installs for all of them simultaneously.

Legacy: npm direct

npm install -g metaswarm

Use the /migrate command to move from the npm installation to the plugin installation.

Project Setup (/setup)

Interactive setup that creates:

project-root/
├── CLAUDE.md       — Claude Code project context (metaswarm config appended)
├── AGENTS.md       — Codex CLI project context
├── GEMINI.md       — Gemini CLI extension context
├── .coverage-thresholds.json — Coverage enforcement config (100% all metrics)
└── knowledge/      — Empty JSONL knowledge base (grows over time)

Also configures BEADS in the project if bd CLI is installed.

Configuration Files

CLAUDE.md (project root)

Contains:

  • metaswarm project instructions
  • BEADS status / active issues summary
  • Project-specific coding standards
  • Agent coordination preferences

.coverage-thresholds.json

{
  "thresholds": { "lines": 100, "branches": 100, "functions": 100, "statements": 100 },
  "enforcement": {
    "command": "pnpm test:coverage",
    "blockPRCreation": true,
    "blockTaskCompletion": true
  }
}

Adjust the values per project; 100% is the default, not a requirement.
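
How an orchestrator might apply this file as a gate can be sketched in Python as below. The keys thresholds, enforcement, and blockPRCreation are from the config above; the function names and the shape of the measured coverage dictionary (metric name to percentage) are assumptions, since the output format of the coverage command is not described here.

```python
import json


def failing_metrics(config_text, measured):
    """Return the sorted metric names whose measured value is below threshold.

    A metric missing from `measured` is treated as 0.0 and therefore fails.
    """
    thresholds = json.loads(config_text)["thresholds"]
    return sorted(
        metric
        for metric, required in thresholds.items()
        if measured.get(metric, 0.0) < required
    )


def blocks_pr(config_text, measured):
    """True when blockPRCreation is enabled and at least one metric fails."""
    enforcement = json.loads(config_text).get("enforcement", {})
    if not enforcement.get("blockPRCreation", False):
        return False
    return bool(failing_metrics(config_text, measured))
```

Treating a missing metric as a failure is a deliberately strict choice in this sketch: a coverage report that omits a metric should not pass the gate by accident.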

hooks/hooks.json (installed by Claude Code plugin)

SessionStart + PreCompact → session-start.sh (not user-edited).

knowledge/ directory

JSONL fact store, empty at install, grows via /self-reflect over time.
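
Growing a JSONL store amounts to appending one JSON object per line, as in the Python sketch below. The append_fact helper, the file name passed to it, and the fields of the fact are all hypothetical; the actual files and schema written by /self-reflect are not specified on this page.

```python
import json
from pathlib import Path


def append_fact(knowledge_dir, filename, fact):
    """Append `fact` as a single JSON line and return the file path."""
    path = Path(knowledge_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        # One object per line keeps the file grep-friendly and diff-friendly in git.
        handle.write(json.dumps(fact, sort_keys=True) + "\n")
    return path
```

Append-only writes suit a store that is versioned in git, because each new fact shows up as a single added line in the diff.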

Update

# In Claude Code:
/update
# or:
/metaswarm-update-version

# Cross-platform:
npx metaswarm init   # Re-runs installer, updates in place

Migration from npm to Plugin

/migrate    # In Claude Code — converts npm-installed metaswarm to plugin

Preserves knowledge base and CLAUDE.md content during migration.

BEADS Setup

# Install BEADS (separate project by Steve Yegge)
# See: https://github.com/steveyegge/beads

# After install:
bd doctor       # Verify setup
bd init         # Initialize in project

metaswarm's /setup checks for BEADS and guides through setup if missing.

Gemini CLI Context

gemini-extension.json in metaswarm root:

{
  "name": "metaswarm",
  "version": "0.11.0",
  "contextFileName": "GEMINI.md"
}

Gemini CLI reads GEMINI.md from the project root as extension context.
