metaswarm

metaswarm · dsifry/metaswarm · ★ 284 · last commit 2026-05-16

Production-tested multi-agent orchestration with self-improving knowledge base, recursive Swarm Coordinator architecture, and mandatory cross-model adversarial review — supporting Claude Code, Gemini CLI, and Codex CLI simultaneously.

Best when: Production-grade AI coding requires institutional memory, not just a smart agent. Knowledge from every PR becomes the foundation for the next task; trust not…

Skip if: Trusting subagent self-reports (orchestrators must verify independently); same model reviewing its own work (cross-model review enforced)

vs seeds

bmad-method (role-based multi-agent, quality gates, phase-gated workflow). Exceeds BMAD in recursive orchestration depth, self-impro…

Primitive shape 61 total

Commands 27 Skills 13 Subagents 19 Hooks 2

Summary

metaswarm — Summary

metaswarm by David Sifry (dsifry) is a multi-agent orchestration framework (metaswarm npm package, v0.11.0) that targets Claude Code, Gemini CLI, and Codex CLI simultaneously. With 284 stars and 1 contributor (it was extracted from a production system), it is the most elaborate multi-tool, self-improving orchestration framework in this batch: 19 specialized agent personas, 13 orchestration skills, a 9-phase workflow (Research → Plan → Design Review Gate → Work Unit Decomposition → Orchestrated Execution → Final Review → PR Creation → PR Shepherd → Closure & Learning), and a 4-phase inner execution loop (IMPLEMENT → VALIDATE → ADVERSARIAL REVIEW → COMMIT). Standout features: a parallel 5-agent Design Review Gate; recursive Swarm Coordinator → Issue Orchestrator → sub-orchestrator spawning; BEADS (bd CLI) for git-native task tracking with selective knowledge priming; a JSONL knowledge base that self-improves via /self-reflect after every PR merge; 100% coverage thresholds enforced as a blocking gate; and external tool delegation to Codex CLI and Gemini CLI for cross-model adversarial review. The closest comparison is BMAD-METHOD (role-based multi-agent with quality gates), which metaswarm extends with recursive orchestration depth, a self-learning knowledge base, and three-model support (Claude/Gemini/Codex) in a single install. It is inspired by Superpowers (foundational skills) and BEADS (git-native task coordination).

Overview

metaswarm — Overview

Origin

Created by David Sifry (dsifry). 284 stars, 1 contributor. Active to May 2026. Package: metaswarm v0.11.0 (npm). Described as "an extraction of a production-tested agentic orchestration system, proven in the field writing production-level code with 100% test coverage, mandatory TDD, multi-reviewed spec-driven development, and SDLC best practices across hundreds of PRs."

Philosophy

From README:

"A self-improving multi-agent orchestration framework for Claude Code, Gemini CLI, and Codex CLI. Coordinate 18 specialized AI agents and 13 orchestration skills through a complete software development lifecycle, from issue to merged PR, with recursive orchestration, parallel review gates, and a git-native knowledge base."

Design principles from README:

Knowledge-Driven Development — agents prime from knowledge base before every task
Trust Nothing, Verify Everything — orchestrators validate independently; never trust subagent self-reports
Parallel Review Gates — independent specialist reviewers run concurrently, not sequentially
Recursive Orchestration — orchestrators spawn sub-orchestrators for any complexity level
Agent Ownership — each agent owns its lifecycle; orchestrator delegates, not micromanages
BEADS as Source of Truth — all task state in BEADS; agents coordinate via database, not messages
Test-First Always — TDD mandatory; coverage thresholds (100% lines/branches/functions/statements) block PR creation
Human-in-the-Loop — checkpoints at planned review points, automatic escalation after 3 failed iterations

Key Design Decisions

Multi-tool simultaneous support: Claude Code plugin + Gemini CLI extension + Codex CLI plugin, installed via npx metaswarm init
BEADS integration: git-native issue/task tracking by Steve Yegge — bd CLI as coordination backbone
Self-improving system: /self-reflect after PR merge writes patterns, gotchas, decisions back to JSONL knowledge base
Recursive orchestration: Swarm Coordinators → Issue Orchestrators → sub-orchestrators (swarm of swarms)
External tool delegation: Codex CLI and Gemini CLI as implementation or adversarial review delegates
Superpowers-inspired: explicitly acknowledges Superpowers (Jesse Vincent) as foundation for skills methodology

Supported Platforms

Platform	Install	Commands
Claude Code	Plugin marketplace	`/start-task`, `/setup`, etc.
Gemini CLI	`gemini extensions install`	`/metaswarm:start-task`, etc.
Codex CLI	Plugin marketplace	`$start`, `$setup`, etc.

Use Cases

Issue-to-PR autonomous execution with full 9-phase lifecycle
Complex feature development with spec-driven adversarial review
Parallel multi-specialist design review before any code is written
Cross-model adversarial review (Claude implements → Gemini reviews, or vice versa)
Knowledge accumulation across hundreds of PRs for organizational memory

Architecture

metaswarm — Architecture

Repository Layout

metaswarm/
├── .claude-plugin/plugin.json   — Claude Code plugin manifest (name, version, author)
├── gemini-extension.json        — Gemini CLI extension manifest (contextFileName: GEMINI.md)
├── .codex/install.sh            — Codex CLI install script
├── hooks/
│   ├── hooks.json               — SessionStart + PreCompact → session-start.sh
│   └── session-start.sh         — Context priming (platform-aware)
├── skills/ (13 dirs)            — Orchestration skills (Agent Skills standard)
├── commands/ (15 .md + 12 .toml)— Claude Code + Gemini CLI slash commands
├── agents/ (19 .md files)       — Agent persona definitions
├── rubrics/                     — Quality review standards (code, arch, security, testing)
├── guides/                      — Development patterns
├── knowledge/                   — JSONL knowledge base schema + templates
├── templates/                   — CLAUDE.md, AGENTS.md, GEMINI.md append templates
├── lib/                         — Platform detection, sync, setup scripts
├── cli/metaswarm.js             — Cross-platform installer entry point
├── bin/                         — Shell helpers
├── .coverage-thresholds.json    — Coverage enforcement config
├── CLAUDE.md                    — Claude Code project context
├── AGENTS.md                    — Codex CLI project context
├── GEMINI.md                    — Gemini CLI project context
└── ORCHESTRATION.md             — Redirect → skills/start/SKILL.md

Orchestration Architecture

Swarm Coordinator
  └─ spawns Issue Orchestrator (per issue)
       └─ runs 9-phase workflow:
            1. Research (Researcher Agent)
            2. Plan (Architect Agent)
            3. Design Review Gate (5 parallel: PM, Architect, Designer, Security Design, CTO)
            4. Work Unit Decomposition (DoD items, file scopes, dependency graph)
            5. Orchestrated Execution (per WU: 4-phase IMPLEMENT→VALIDATE→REVIEW→COMMIT)
            6. Final Comprehensive Review (cross-unit integration)
            7. PR Creation (Release Engineer)
            8. PR Shepherd (auto-monitors to merge)
            9. Closure & Learning (self-reflect → knowledge base)

4-Phase Execution Loop (per Work Unit)

Phase 1: IMPLEMENT
  - Coder Agent spawned (Team Mode: persistent; Task Mode: fresh Task())
  - Optional: delegate to Codex CLI or Gemini CLI via external-tools skill
  - Spec contract (DoD items) is the source of truth

Phase 2: VALIDATE
  - Orchestrator runs tests ITSELF (never trusts Coder's self-report)
  - Coverage checked against .coverage-thresholds.json (100% required)
  - If failing: loop back to IMPLEMENT (max 3 iterations before escalation)

Phase 3: ADVERSARIAL REVIEW
  - ALWAYS a fresh Task() — never a persistent teammate
  - Cross-model: if writer used Claude, reviewer may use Gemini/Codex
  - Reviewer checks DoD compliance with file:line evidence
  - If failing: loop back to IMPLEMENT

Phase 4: COMMIT
  - Orchestrator writes commit message
  - Pushes to feature branch

Knowledge Base Architecture

knowledge/
  *.jsonl         — JSONL-format fact entries (one JSON object per line)
  schema.md       — Entry format: {type, content, files, keywords, work_type, date}
  templates/      — Entry templates per category

bd prime --files "src/auth/**" --keywords "auth" --work-type implementation
  → Loads only matching JSONL entries into agent context
  → Selective retrieval: hundreds of entries without context overflow
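
To make the idea of metadata-based selection concrete, here is a minimal Python sketch of filtering JSONL entries by file glob, keyword, and work type. It is not how bd prime is implemented; the function name `select_entries` is hypothetical, and requiring all three filters to match (AND semantics) is an assumption.

```python
import json
from fnmatch import fnmatch


def select_entries(jsonl_text, files, keywords, work_type):
    """Return entries whose metadata matches the given files, keywords, and work type."""
    selected = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry = json.loads(line)  # one JSON object per line
        # An entry is relevant if any touched path matches any of its glob patterns
        file_hit = any(
            fnmatch(path, pattern)
            for path in files
            for pattern in entry.get("files", [])
        )
        keyword_hit = bool(set(keywords) & set(entry.get("keywords", [])))
        type_hit = entry.get("work_type") == work_type
        if file_hit and keyword_hit and type_hit:  # assumed AND semantics
            selected.append(entry)
    return selected
```

Because only the matching entries are returned, the amount of text placed in an agent's context depends on the filters rather than on the total size of the knowledge base.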

BEADS Integration

BEADS (bd CLI by Steve Yegge) provides:

Issue/task management in git (no external service)
Dependency graph between tasks
bd prime — selective knowledge priming by files/keywords/work-type
bd doctor — system health check
All task state in git → survives context compaction + session restarts

Multi-Tool Manifest System

Three separate context files:

CLAUDE.md — Claude Code project instructions (injected automatically)
AGENTS.md — Codex CLI project instructions
GEMINI.md — Gemini CLI extension context

The templates/ directory contains append variants for adding metaswarm context to an existing project's CLAUDE.md/AGENTS.md/GEMINI.md without overwriting it.

Coverage Enforcement

// .coverage-thresholds.json
{
  "thresholds": { "lines": 100, "branches": 100, "functions": 100, "statements": 100 },
  "enforcement": {
    "command": "pnpm test:coverage",
    "blockPRCreation": true,
    "blockTaskCompletion": true
  }
}

Orchestrators MUST check coverage before marking any task complete or creating a PR.
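
A gate of this kind could be evaluated roughly as follows. This Python sketch is an assumption-laden illustration: the function names are hypothetical, the measured percentages are passed in as a plain dict (how metaswarm actually parses the output of the coverage command is not described here), and only the blockPRCreation flag is modeled.

```python
import json

# The configuration shown above, embedded as a string for a self-contained example
THRESHOLDS_JSON = """
{
  "thresholds": {"lines": 100, "branches": 100, "functions": 100, "statements": 100},
  "enforcement": {"command": "pnpm test:coverage", "blockPRCreation": true, "blockTaskCompletion": true}
}
"""


def coverage_shortfalls(config, measured):
    """Return {metric: (measured, required)} for every metric below its threshold."""
    return {
        metric: (measured.get(metric, 0.0), required)
        for metric, required in config["thresholds"].items()
        if measured.get(metric, 0.0) < required
    }


def may_create_pr(config, measured):
    """A PR is blocked only when blockPRCreation is set and some metric falls short."""
    blocked = config["enforcement"].get("blockPRCreation", False)
    return not (blocked and coverage_shortfalls(config, measured))
```

A missing metric is treated as 0.0, so an incomplete coverage report fails the gate instead of passing it by omission.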

Skills And Commands

metaswarm — Skills and Commands

Skills (13)

All skills use the Agent Skills standard (SKILL.md with YAML frontmatter).

Skill	Description	Key Mechanism
`start`	Main entry — workflow guide + 18 agent personas; auto_activate: true	Swarm Coordinator entry point
`orchestrated-execution`	4-phase IMPLEMENT→VALIDATE→ADVERSARIAL REVIEW→COMMIT loop	Trust-nothing verification
`design-review-gate`	5 parallel specialist agents review design before implementation	3-iteration cap then human escalation
`plan-review-gate`	3 adversarial reviewers (Feasibility, Completeness, Scope) on plan	All must PASS
`setup`	Interactive project setup (BEADS, gh, Playwright prereqs)	Creates CLAUDE.md, AGENTS.md, GEMINI.md
`migrate`	Migration from npm to plugin installation	Preserves knowledge base
`status`	Diagnostic checks (`bd doctor`, coverage, CI health)	Health report
`pr-shepherd`	PR lifecycle automation (CI monitoring, review handling, merge)	Autonomous CI monitoring
`handling-pr-comments`	Review comment workflow (parse → implement → respond)	Thread resolution
`brainstorming-extension`	Structured feature brainstorming → design document creation	Design doc → Design Review Gate
`create-issue`	Create BEADS issue from spec or conversation	JSONL task entry
`external-tools`	Delegate to Codex CLI / Gemini CLI for impl or review	Cross-model adversarial review
`visual-review`	Playwright-based screenshot capture for web UI review	Screenshot → agent analysis

Claude Code Commands (15)

Command	Purpose
`/start-task`	Start any task (auto-routes to full workflow if complex)
`/start`	Alias for start-task
`/brainstorm`	Structured brainstorm → design document
`/prime`	Manually trigger BEADS knowledge priming
`/create-issue`	Create tracked issue
`/setup`	Interactive project setup
`/update`	Update metaswarm to latest version
`/status`	System health check
`/self-reflect`	Post-PR reflection → knowledge base update
`/pr-shepherd`	Activate PR shepherd for open PR
`/handle-pr-comments`	Process open review comments
`/review-design`	Trigger design review gate on design doc
`/metaswarm-setup`	Framework-level setup (differs from project /setup)
`/metaswarm-update-version`	Update metaswarm version
`/external-tools-health`	Check Codex/Gemini CLI availability

Gemini CLI Commands (12 .toml files)

Command	Notes
`/metaswarm:start-task`	Gemini equivalent of /start-task
`/metaswarm:brainstorm`
`/metaswarm:prime`
`/metaswarm:create-issue`
`/metaswarm:setup`
`/metaswarm:update`
`/metaswarm:status`
`/metaswarm:self-reflect`
`/metaswarm:pr-shepherd`
`/metaswarm:handle-pr-comments`
`/metaswarm:review-design`
`/metaswarm:external-tools-health`

Codex CLI Commands

$start, $setup, and analogues of the other commands using the $ prefix.

Skill YAML Frontmatter Example

# skills/start/SKILL.md
---
name: start
description: Use when starting work on any task, when the user mentions metaswarm, or when the user wants to begin tracked development work
auto_activate: true
triggers:
  - "work on issue"
  - "start issue"
  - "start task"
  - "use metaswarm"
  - "@metaswarm"
  - "agent-ready label"
---

Rubrics

rubrics/ directory contains quality review standards:

Code review rubric
Architecture review rubric
Security review rubric
Testing/TDD rubric
Planning rubric
Adversarial spec compliance rubric

These are injected into reviewer agent contexts during Design Review Gate and Adversarial Review phases.

Agents And Subagents

metaswarm — Agents and Subagents

Agent Roster (19 Agents)

All defined as .md files in agents/:

Agent	File	Role
Issue Orchestrator	issue-orchestrator.md	Main coordinator per issue; runs 9-phase workflow
Swarm Coordinator	swarm-coordinator-agent.md	Assigns issues to worktrees, spawns Issue Orchestrators
Researcher	researcher-agent.md	Codebase exploration before planning
Architect	architect-agent.md	Implementation planning, design review
Product Manager	product-manager-agent.md	Use case & user benefit review (Design Review Gate)
Designer	designer-agent.md	UX/API design review (Design Review Gate)
Security Design	security-design-agent.md	Threat modeling (Design Review Gate)
CTO	cto-agent.md	TDD readiness & plan review (Design Review Gate)
Coder	coder-agent.md	TDD implementation (IMPLEMENT phase)
Code Review	code-review-agent.md	Internal code review post-implementation
Security Auditor	security-auditor-agent.md	Security review of code
Test Automator	test-automator-agent.md	Test coverage and quality
Release Engineer	release-engineer-agent.md	Safe delivery from merge to production
PR Shepherd	pr-shepherd-agent.md	PR lifecycle management
Knowledge Curator	knowledge-curator-agent.md	Knowledge base maintenance
Metrics	metrics-agent.md	Project metrics and reporting
Customer Service	customer-service-agent.md	User feedback integration
Slack Coordinator	slack-coordinator-agent.md	Async team coordination
SRE	sre-agent.md	Site reliability and production monitoring

Spawning Mechanisms

Task-based Spawning (fresh context)

Most agents are spawned as fresh Task() instances:

orchestrator → Task(subagent_type="researcher") → returns result
orchestrator → Task(subagent_type="architect")  → returns result

Rule: Adversarial Review in Phase 3 is ALWAYS a fresh Task(), never a persistent teammate. This prevents the reviewer from being "contaminated" by the implementer's context.

Team Mode (persistent context)

In Team Mode, the Coder Agent can be a persistent teammate that retains context across work units (does not re-initialize between WUs in the same session). This is more efficient for long-running feature implementation.

Phase 1 (IMPLEMENT): may use persistent teammate
Phase 3 (ADVERSARIAL REVIEW): always fresh Task()

Recursive Spawning

Swarm Coordinator
  └─ spawns Issue Orchestrator (per issue)
       └─ Issue Orchestrator spawns sub-orchestrators for complex epics
            └─ Sub-orchestrators spawn sub-sub-orchestrators (swarm of swarms)

No fixed depth limit documented. Recursive spawning enables handling of arbitrarily complex epics.

Design Review Gate: Parallel Spawn Pattern

Issue Orchestrator
  ├─ spawns Task(PM Agent)           — in parallel
  ├─ spawns Task(Architect Agent)    — in parallel
  ├─ spawns Task(Designer Agent)     — in parallel
  ├─ spawns Task(Security Design)    — in parallel
  └─ spawns Task(CTO Agent)          — in parallel
  
All 5 must PASS.
If any FAIL: loop (max 3 iterations) → then human escalation.
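
The all-must-pass, parallel shape of the gate can be sketched with a thread pool. This is a hypothetical Python illustration, not the framework's code: `run_gate`, `review`, and `revise` are invented names, and threads stand in for whatever mechanism the host tool uses to run Task() instances concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

REVIEWERS = ["PM", "Architect", "Designer", "Security Design", "CTO"]


def run_gate(review, revise, design, max_iterations=3):
    """Run all reviewers concurrently; pass only if every verdict is True."""
    for iteration in range(1, max_iterations + 1):
        with ThreadPoolExecutor(max_workers=len(REVIEWERS)) as pool:
            # pool.map preserves input order, so verdicts line up with REVIEWERS
            verdicts = list(pool.map(lambda name: review(name, design), REVIEWERS))
        failed = [name for name, ok in zip(REVIEWERS, verdicts) if not ok]
        if not failed:
            return {"status": "passed", "iterations": iteration}
        if iteration < max_iterations:
            design = revise(design, failed)  # address failing reviewers, then retry
    return {"status": "escalate_to_human", "iterations": max_iterations}
```

Every reviewer sees the same design revision within an iteration, and a single failing verdict is enough to force another round.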

Plan Review Gate: Adversarial Plan Review

3 adversarial reviewers:
  - Feasibility reviewer
  - Completeness reviewer
  - Scope & Alignment reviewer

All 3 must PASS before plan presented to user.

External Tool Delegation (external-tools skill)

Agents can delegate to non-Claude models:

orchestrator → external-tools skill → runs Codex CLI subprocess → returns output
orchestrator → external-tools skill → runs Gemini CLI subprocess → returns output

Use cases:

Cost savings (delegate routine implementation to cheaper model)
Cross-model adversarial review (Claude implements → Gemini reviews, or vice versa)
Writer's work is reviewed by a different model than the one that produced it (described as an enforced pattern)
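
The "writer's model differs from reviewer's model" rule amounts to a small selection step. The Python sketch below is a hypothetical illustration: `pick_reviewer`, the model labels, and the preference order are assumptions, and the explicit flag for the same-model case is a design suggestion, not documented behavior.

```python
MODELS = ["claude", "gemini", "codex"]  # assumed preference order


def pick_reviewer(writer, available):
    """Return (reviewer_model, is_cross_model) for a given writer model."""
    for candidate in MODELS:
        if candidate != writer and candidate in available:
            return candidate, True
    # No other model is available: report the same-model fallback explicitly
    return writer, False
```

Returning the boolean alongside the model lets a caller log or block a same-model review instead of falling back without notice.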

Agent Context Priming

Before every agent spawn, the knowledge base is consulted:

bd prime --files <affected_files> --keywords <task_keywords> --work-type <implementation|review|planning>

Relevant JSONL entries are injected into the agent's context before it begins work.

Uniqueness

metaswarm — Uniqueness

Differs From Seeds

Most similar to BMAD-METHOD (role-based multi-agent with phase-gated workflow) and agent-os (multi-role agents with structured phases). Exceeds BMAD in: recursive orchestration depth (swarms of swarms), self-improving knowledge base, cross-model adversarial review, and three-tool simultaneous support (Claude/Gemini/Codex). Exceeds agent-os in the explicit 4-phase inner execution loop with independent orchestrator validation.

Positioning

Only framework with genuine self-improvement: /self-reflect extracts patterns from PR reviews and writes them back to a JSONL knowledge base — not just a memory dump but structured, queryable institutional knowledge
Only framework with selective knowledge priming: bd prime --files --keywords --work-type means the knowledge base can scale to thousands of entries without context overflow
Only framework with explicit cross-model adversarial review: writer's model ≠ reviewer's model, enforced by architecture (external-tools skill delegates to Codex/Gemini)
Only framework with trust-nothing orchestrator verification: orchestrator runs tests ITSELF, never trusts subagent self-reports — unusual explicit design principle
Only framework with 100% coverage as blocking gate: .coverage-thresholds.json enforces 100% lines/branches/functions/statements before PR creation
BEADS integration: the only framework building on Steve Yegge's git-native issue tracking as coordination backbone (vs external GitHub Issues API or custom state files)
Superpowers heritage: explicitly credits Superpowers (Jesse Vincent) as foundational skills source — the only framework in batch 23 with this lineage

Cross-Batch Comparisons

vs do-it: both have skill-defined quality gates. do-it uses TOML agent definitions + hook scripts for risk-tier routing. metaswarm uses SKILL.md + BEADS for knowledge-driven routing. do-it has 23 agents; metaswarm has 19. Key difference: metaswarm's knowledge base is self-improving; do-it's is static.
vs HOTL plugin: both have human-in-the-loop gates and smart routing (HOTL routes question vs feature). HOTL is tool-agnostic (5 tools); metaswarm supports 3 tools. HOTL defines 3 HOTL contracts upfront; metaswarm has the Design Review Gate. Key difference: metaswarm has BEADS + self-learning; HOTL has .hotl/state/ resumability.
vs vnx-orchestration: both have multi-provider multi-model review. VNX has governance-first NDJSON ledger; metaswarm has knowledge-first JSONL base. VNX uses Python; metaswarm uses Shell. VNX has 21 agents; metaswarm has 19.
vs oh-my-codex-yeachan: both are large frameworks (stars-wise). Yeachan focuses on a single tool (Codex) with 46+ skills; metaswarm focuses on 3 tools with 19 agents. Yeachan has ultragoal ledger; metaswarm has BEADS knowledge base.

Observable Failure Modes

BEADS dependency: if the bd CLI is not installed or is an incompatible version, knowledge priming silently degrades to empty context
100% coverage threshold: strict 100% requirement will block PR creation on any project where 100% is infeasible (legacy codebases, third-party integration code)
1 contributor: single-maintainer risk — framework has deep integration complexity (BEADS + Playwright + gh + 3 CLIs) but 0 co-maintainers
5-agent Design Review Gate: spawning 5 parallel agents for every feature increases costs significantly; no cost-aware routing to skip gate for small changes
Recursive orchestration depth: no documented depth limit for Swarm Coordinator → Issue Orchestrator → sub-orchestrator chains; risk of unbounded token spend on complex epics
External tool availability: external-tools skill assumes Codex CLI and Gemini CLI are installed; /external-tools-health check exists but tool absence during adversarial review silently falls back to same-model review

Distinctive Opinion

"Production-grade AI coding requires institutional memory, not just a smart agent. Knowledge extracted from every PR becomes the foundation for the next task. Trust nothing self-reported — verify independently, review adversarially, and write it all back to the knowledge base."

metaswarm is the only framework in this batch making an explicit bet on organizational memory as the multiplier for multi-agent quality — not smarter prompts or better models, but accumulated patterns from hundreds of past PRs.

Hooks And Automation

metaswarm — Hooks and Automation

hooks.json

Two Claude Code lifecycle hooks defined:

{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "startup|resume|clear|compact",
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/hooks/session-start.sh",
            "async": false
          }
        ]
      }
    ],
    "PreCompact": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "${CLAUDE_PLUGIN_ROOT}/hooks/session-start.sh",
            "async": false
          }
        ]
      }
    ]
  }
}

SessionStart fires on startup|resume|clear|compact triggers. PreCompact fires before every context compaction with empty matcher (always). Both call the same session-start.sh script.
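
The matcher behavior described above can be mimicked in a few lines. This Python sketch rests on two assumptions that are not confirmed here: that the host treats a non-empty matcher as a regular expression matched against the whole trigger name, and that an empty matcher matches everything. The function name `hook_fires` is hypothetical.

```python
import re


def hook_fires(matcher, trigger):
    """Assumed semantics: an empty matcher always fires; otherwise full regex match."""
    if matcher == "":
        return True
    return re.fullmatch(matcher, trigger) is not None
```

Under these assumptions, "startup|resume|clear|compact" fires for each of the four named triggers and for nothing else, while the empty PreCompact matcher fires unconditionally.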

session-start.sh

Platform-aware context priming script:

Detects which tool is active (Claude Code, Codex, Gemini)
Runs bd prime if BEADS is installed (primes relevant knowledge base entries into context)
Loads BEADS status (active issues, blocked tasks)
Outputs session context summary for the agent

Critical use: The PreCompact hook ensures knowledge is re-primed before the context window is compacted. This prevents the agent from losing project knowledge when context fills up.

BEADS-Driven Automation

While not hook-based, BEADS provides automation for task state:

bd prime --files "src/**" --keywords "auth" --work-type implementation
  # Selective knowledge priming — loads only matching JSONL entries

bd ready     # Show tasks ready to work
bd list      # All tasks
bd stats     # Project statistics
bd doctor    # System health check

Coverage Enforcement Automation

.coverage-thresholds.json enforces blocking gates:

{
  "enforcement": {
    "command": "pnpm test:coverage",
    "blockPRCreation": true,
    "blockTaskCompletion": true
  }
}

Agents MUST run coverage check and verify 100% before proceeding. This is instruction-enforced (in skill definitions), not tool-hook-enforced.

PR Shepherd Automation

/pr-shepherd skill provides autonomous PR lifecycle management:

Monitors CI status (polling gh pr checks)
Handles review comments automatically
Resolves conversation threads
Attempts re-runs on flaky test failures
Escalates to human after N failures

Self-Reflect Automation

/self-reflect (triggered after PR merge) runs the self-learning pipeline:

Reads PR review comments (human + automated)
Reads build/test failure logs
Extracts patterns, gotchas, anti-patterns
Writes structured JSONL entries to knowledge/
Optionally generates proposals for new skills or updated rubrics

No PostToolUse Hooks

metaswarm does NOT use:

PostToolUse hooks for validation
PreToolUse hooks for interception
Stop hooks for completion verification

All quality gates are skill-driven (agent instructions), not hook-enforced. The only hooks defined, SessionStart and PreCompact, are used for context priming rather than validation.

Gemini CLI / Codex CLI Hooks

For Gemini CLI: gemini-extension.json points to GEMINI.md as context. No Gemini hook definitions. For Codex CLI: .codex/install.sh handles setup. No Codex hook definitions.

Workflow And Phases

metaswarm — Workflow and Phases

9-Phase Workflow

Phase 1: Research

Researcher Agent explores codebase
Reads existing architecture, relevant files, tests
Outputs research summary to Issue Orchestrator

Phase 2: Plan

Architect Agent drafts implementation plan
Decomposed into Work Units (WUs) with DoD items, file scopes, dependency graph
Plan submitted to Plan Review Gate (3 adversarial reviewers: Feasibility, Completeness, Scope)
All 3 must PASS before proceeding

Phase 3: Design Review Gate

For complex features (especially those from /brainstorm):

5 specialist agents spawn in parallel:
1. Architect (technical architecture)
2. Product Manager (use case, user benefit)
3. Designer (UX/API design)
4. Security Design (threat modeling)
5. CTO (TDD readiness)
All 5 must PASS. If any FAIL: iterate (max 3 iterations), then human escalation
Approval gate: HARD STOP — no code until Design Review Gate passes

Phase 4: Work Unit Decomposition

Issue Orchestrator creates BEADS epic
Decomposes plan into individual Work Units
Each WU has: DoD items list, file scope, dependencies, estimated complexity
Dependency graph determines execution order (serial vs parallel WUs)
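
Turning a dependency graph into serial and parallel steps is a topological sort. The sketch below uses Python's standard graphlib module; the function name `execution_batches` and the work-unit IDs are hypothetical, and this is not a description of how BEADS or metaswarm computes the order.

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies):
    """Group work units into batches; units in one batch have no pending dependencies."""
    # dependencies maps each work unit to the set of units it depends on
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        batches.append(ready)
        sorter.done(*ready)
    return batches


deps = {"WU-1": set(), "WU-2": {"WU-1"}, "WU-3": {"WU-1"}, "WU-4": {"WU-2", "WU-3"}}
print(execution_batches(deps))  # [['WU-1'], ['WU-2', 'WU-3'], ['WU-4']]
```

Units within one batch could run in parallel, while the batches themselves run in sequence.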

Phase 5: Orchestrated Execution (per WU)

The 4-phase inner loop runs for each Work Unit:

IMPLEMENT

Coder Agent implements WU to spec (DoD items as contract)
Optional: delegate to Codex CLI or Gemini CLI via external-tools skill

VALIDATE

Orchestrator runs tests ITSELF (never trusts Coder self-report)
pnpm test:coverage → checks against .coverage-thresholds.json
Required: 100% lines, branches, functions, statements
Failure: return to IMPLEMENT (max 3 iterations)

ADVERSARIAL REVIEW

Fresh Task() spawned (never a persistent teammate)
Cross-model where possible (writer's model ≠ reviewer's model)
Reviewer checks DoD compliance with file:line evidence
Failure: return to IMPLEMENT

COMMIT

Orchestrator writes commit message
Pushes to feature branch

Phase 6: Final Comprehensive Review

Cross-unit integration review
Code Review Agent + Security Auditor review full set of changes
Checks for cross-WU integration issues not visible within single WU

Phase 7: PR Creation

Release Engineer Agent creates PR
PR description includes DoD checklist, test coverage, design doc link
BLOCKED if coverage below threshold

Phase 8: PR Shepherd

PR Shepherd Agent monitors CI
Handles review comments automatically
Resolves conversation threads
Auto-reruns flaky tests
Escalates after N failures

Phase 9: Closure & Learning

/self-reflect runs after merge
Extracts patterns, gotchas, decisions from PR review comments + test failures
Writes JSONL entries to knowledge/
May propose new skills or updated rubrics
BEADS epic closed, dependencies resolved

Smart Routing

/start-task auto-routes:

Simple task (bug fix, copy change) → direct implementation without full 9-phase workflow
Complex feature → full 9-phase workflow
Question → direct answer (no workflow overhead)

Approval Gates

Plan Review Gate (Phase 2): 3-reviewer adversarial plan review — HARD STOP
Design Review Gate (Phase 3): 5-parallel-reviewer design review — HARD STOP
Coverage Gate (Phase 5, VALIDATE): 100% coverage required — blocks PR creation
Human escalation (Phases 3+5): after 3 failed iterations in any gate

Resumability

BEADS task state persists to git. If a session is interrupted:

bd list shows current state
bd prime re-primes knowledge for context recovery
PreCompact hook re-primes before context window is compacted
Plan documents written to disk survive context loss

State And Memory

metaswarm — State and Memory

BEADS (bd CLI) — Primary State Backend

BEADS by Steve Yegge is the core state mechanism. All task state lives in BEADS (git-native, no external service):

bd ready          # Tasks ready to work
bd list           # All tasks
bd stats          # Project statistics
bd doctor         # System health check
bd start <id>     # Start working on a task
bd prime --files "src/**" --keywords "auth" --work-type implementation
                  # Selective knowledge priming

BEADS stores task data in git — survives context compaction, session interrupts, and tool restarts. Dependency graph between tasks is maintained in BEADS.

Knowledge Base

knowledge/ directory: JSONL-format fact store.

Entry Schema (from knowledge/schema.md)

{
  "type": "pattern|gotcha|decision|anti-pattern|lesson",
  "content": "The fact itself",
  "files": ["src/auth/**"],
  "keywords": ["authentication", "JWT"],
  "work_type": "implementation|review|planning",
  "date": "2026-05-16",
  "pr": "#123",
  "source": "pr-review|test-failure|user-correction|manual"
}
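
An entry with this shape can be modeled and serialized as one line of JSONL. The Python sketch below is illustrative only: the class name `KnowledgeEntry` and its validation rules are assumptions derived from the pipe-separated values in the schema above, not code from the repository.

```python
import json
from dataclasses import dataclass, asdict, field

ENTRY_TYPES = {"pattern", "gotcha", "decision", "anti-pattern", "lesson"}
WORK_TYPES = {"implementation", "review", "planning"}


@dataclass
class KnowledgeEntry:
    type: str
    content: str
    work_type: str
    date: str
    source: str
    files: list = field(default_factory=list)
    keywords: list = field(default_factory=list)
    pr: str = ""

    def __post_init__(self):
        # Reject values outside the enumerations listed in the schema
        if self.type not in ENTRY_TYPES:
            raise ValueError(f"unknown type: {self.type}")
        if self.work_type not in WORK_TYPES:
            raise ValueError(f"unknown work_type: {self.work_type}")

    def to_jsonl(self):
        """Serialize as a single line, matching the one-object-per-line layout."""
        return json.dumps(asdict(self), ensure_ascii=False)
```

Appending the result of `to_jsonl()` plus a newline to a file under knowledge/ would keep the store line-oriented, which also keeps git diffs readable.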

Selective Retrieval

bd prime --files "src/api/auth/**" --keywords "authentication" --work-type implementation
# Only loads JSONL entries matching these filters
# Knowledge base can grow to thousands of entries without context overflow
# Agents get the 5 critical gotchas for the files they're about to touch

Self-Improvement Loop

After every PR merge, /self-reflect writes new entries:

PR review comments → patterns, anti-patterns
Build/test failures → gotchas, lessons
Architectural decisions → decisions, rationale
User corrections → preferred approaches
Repeated user instructions → skill/command proposals

Session Priming (Hooks)

SessionStart hook (startup|resume|clear|compact):

session-start.sh runs bd prime on startup
Loads active issues, blocked tasks
Injects current BEADS state summary

PreCompact hook (always, before context compaction):

Same session-start.sh
Critical: re-primes knowledge before the LLM's context window is compacted
Prevents knowledge loss during long sessions

Plan and Design Documents

Plans written to disk (not just in agent context):

PLAN.md — implementation plan (architect output)
docs/designs/ — design documents from brainstorming
These survive context compaction and session interrupts

Note: The Plan subagent type has read-only file access by design. Orchestrators must write plan files themselves after receiving the plan text from the Architect subagent.

CLAUDE.md / AGENTS.md / GEMINI.md

Three tool-specific context files in project root (populated by /setup):

CLAUDE.md — injected automatically by Claude Code on every session
AGENTS.md — Codex CLI project context
GEMINI.md — Gemini CLI extension context

These contain project-specific metaswarm configuration, BEADS setup status, and active workflow state.

.coverage-thresholds.json

Persisted coverage requirement — read by orchestrators before PR creation:

{ "thresholds": { "lines": 100, "branches": 100, "functions": 100, "statements": 100 } }

No Vector Store / No Embeddings

Knowledge retrieval is file-system grep + BEADS filter — no vector search, no semantic embeddings. Selective retrieval is metadata-based (files, keywords, work_type).

Git as Audit Log

All task history lives in git:

BEADS issues in git
Knowledge base JSONL in git
Plan docs in git
All commits from execution phases on feature branch
PR merge as closure event

No separate audit log format. Git history IS the audit trail.

UI & CLI Surface

metaswarm — UI & CLI Surface

metaswarm CLI (Installer)

metaswarm npm binary at cli/metaswarm.js:

npx metaswarm init

This is a cross-platform installer, not a runtime CLI. It:

Detects installed AI tools (Claude Code, Codex CLI, Gemini CLI)
Installs metaswarm for all detected tools
Copies context files (CLAUDE.md, AGENTS.md, GEMINI.md) to project root
Sets up hooks/hooks.json for Claude Code

It is not needed after initial setup; subsequent updates normally go through the /update or /metaswarm-update-version commands.
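
The detection step can be pictured as a PATH lookup per tool. The real installer is a JavaScript file (cli/metaswarm.js) whose logic is not shown here, so this Python sketch is only an assumption about the general approach; the function name `detect_tools` is hypothetical, and the binary names are taken from the install commands quoted in this document.

```python
import shutil

# Display name -> executable name (from the install commands in this document)
TOOL_BINARIES = {"Claude Code": "claude", "Codex CLI": "codex", "Gemini CLI": "gemini"}


def detect_tools(which=shutil.which):
    """Return the names of tools whose executable is found on PATH."""
    return [name for name, binary in TOOL_BINARIES.items() if which(binary)]
```

The `which` parameter exists only so the lookup can be replaced in tests; called with no arguments, the function checks the current PATH.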

Claude Code Interface

12 Claude Code slash commands are listed below:

Command	Invocation
Start task	`/start-task` or `/start`
Brainstorm feature	`/brainstorm`
Prime knowledge	`/prime`
Create issue	`/create-issue`
Project setup	`/setup`
Update framework	`/update`
Status/health	`/status`
Post-PR reflection	`/self-reflect`
PR lifecycle	`/pr-shepherd`
Handle comments	`/handle-pr-comments`
Design review	`/review-design`
External tools health	`/external-tools-health`

Gemini CLI Interface

12 TOML-defined commands, invoked as /metaswarm:<command>:

/metaswarm:start-task
/metaswarm:brainstorm
/metaswarm:prime
/metaswarm:setup
/metaswarm:self-reflect
... (12 total)

Codex CLI Interface

$start, $setup, and other $-prefixed commands.
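The three interfaces differ mainly in how a command name is prefixed. The sketch below captures that convention; `invocation` is a hypothetical helper, and for Codex only $start and $setup are confirmed above, so applying the `$` prefix to other command names is an assumption.

```python
def invocation(tool: str, command: str) -> str:
    """Format a metaswarm command name for the given host tool.

    Prefixes follow the interface sections above; not every command is
    guaranteed to exist under the same name in every tool.
    """
    if tool == "claude":
        return f"/{command}"
    if tool == "gemini":
        return f"/metaswarm:{command}"
    if tool == "codex":
        return f"${command}"
    raise ValueError(f"unknown tool: {tool}")
```

For example, `invocation("gemini", "setup")` yields `/metaswarm:setup`, matching the Gemini CLI listing above.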

BEADS (`bd`) CLI

BEADS is a separately installed CLI (recommended for task tracking and knowledge priming), not part of metaswarm:

bd ready        # What's next
bd list         # All tasks
bd prime ...    # Knowledge priming
bd doctor       # Health check

No Web Dashboard

metaswarm has no local web server, no dashboard UI, no browser interface.

No Dedicated TUI

The framework relies on the host tool's (Claude Code / Gemini / Codex) terminal interface.

Visual Review

visual-review skill uses Playwright to capture screenshots:

# Invoked via skill, not a separate UI:
# npx playwright install chromium  (one-time setup)
# Agent runs screenshot capture and analyzes result

Not a dashboard — it's a skill-driven tool call.

Installation Methods

# 1. Claude Code plugin (recommended)
claude plugin marketplace add dsifry/metaswarm-marketplace
claude plugin install metaswarm

# 2. Gemini CLI extension
gemini extensions install https://github.com/dsifry/metaswarm.git

# 3. Codex CLI plugin
codex plugin marketplace add dsifry/metaswarm-marketplace
# Then via /plugins UI in Codex

# 4. Cross-platform installer
npx metaswarm init

Update Mechanism

# In Claude Code:
/update              # Update metaswarm
/metaswarm-update-version   # Update metaswarm version (framework-level)

# Or: re-run npx metaswarm init

Platform Detection

lib/ contains platform detection scripts used by session-start.sh and cli/metaswarm.js to identify which AI tool is active and route accordingly.

Install and Config

metaswarm — Install and Config

Prerequisites

One of: Claude Code, Gemini CLI, or Codex CLI
Node.js 18+
bd (BEADS CLI) v0.40+ — recommended for task tracking and knowledge priming
gh (GitHub CLI) — recommended for PR automation
Playwright (optional, for visual-review skill): npx playwright install chromium

Installation

Claude Code (Recommended)

claude plugin marketplace add dsifry/metaswarm-marketplace
claude plugin install metaswarm

Then run /setup in Claude Code to configure project context files.

Gemini CLI

gemini extensions install https://github.com/dsifry/metaswarm.git

Then run /metaswarm:setup in your project.

Codex CLI

codex plugin marketplace add dsifry/metaswarm-marketplace
codex
# Open /plugins → select metaswarm marketplace → install

Then run $setup.

Cross-Platform (All Tools)

npx metaswarm init

Detects installed CLIs and installs for all of them simultaneously.

Legacy: npm direct

npm install -g metaswarm

Use the /migrate command to transition from the npm install to the plugin installation.

Project Setup (`/setup`)

Interactive setup that creates:

project-root/
├── CLAUDE.md       — Claude Code project context (metaswarm config appended)
├── AGENTS.md       — Codex CLI project context
├── GEMINI.md       — Gemini CLI extension context
├── .coverage-thresholds.json — Coverage enforcement config (100% all metrics)
└── knowledge/      — Empty JSONL knowledge base (grows over time)

Also configures BEADS in the project if bd CLI is installed.
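A quick way to confirm that setup produced the tree above is to check for each expected path. The helper below is a hypothetical sketch; the path list is copied from the tree, and nothing here reflects how /setup itself verifies its work.

```python
from pathlib import Path

# Paths listed in the /setup tree above.
EXPECTED_ARTIFACTS = [
    "CLAUDE.md",
    "AGENTS.md",
    "GEMINI.md",
    ".coverage-thresholds.json",
    "knowledge",
]


def missing_setup_artifacts(project_root: Path) -> list[str]:
    """Return the expected /setup outputs that are absent from project_root."""
    return [name for name in EXPECTED_ARTIFACTS if not (project_root / name).exists()]
```

An empty return value means every listed file and the knowledge/ directory are present.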

Configuration Files

CLAUDE.md (project root)

Contains:

metaswarm project instructions
BEADS status / active issues summary
Project-specific coding standards
Agent coordination preferences

.coverage-thresholds.json

{
  "thresholds": { "lines": 100, "branches": 100, "functions": 100, "statements": 100 },
  "enforcement": {
    "command": "pnpm test:coverage",
    "blockPRCreation": true,
    "blockTaskCompletion": true
  }
}

Adjust the values per project; 100% on every metric is the default, not a requirement.
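The blocking behaviour implied by this file can be sketched as a simple comparison. The config keys are taken from the JSON above; the `coverage_gate` function, the shape of `measured`, and the rule that a missing metric counts as 0% are assumptions for illustration. How metaswarm actually parses the coverage command's output is not specified here.

```python
import json


def coverage_gate(config_text: str, measured: dict) -> dict:
    """Compare measured coverage percentages against configured thresholds.

    `measured` maps metric name to a percentage, e.g. {"lines": 98.2}.
    """
    config = json.loads(config_text)
    thresholds = config["thresholds"]
    enforcement = config.get("enforcement", {})
    # A metric missing from `measured` counts as 0% so it cannot pass silently.
    failures = {
        metric: (measured.get(metric, 0.0), required)
        for metric, required in thresholds.items()
        if measured.get(metric, 0.0) < required
    }
    return {
        "passed": not failures,
        "failures": failures,
        "block_pr": bool(failures) and enforcement.get("blockPRCreation", False),
        "block_task": bool(failures) and enforcement.get("blockTaskCompletion", False),
    }
```

Each entry in `failures` pairs the measured value with the required one, which gives an orchestrator enough detail to report exactly which metric fell short.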

hooks/hooks.json (installed by Claude Code plugin)

Registers the SessionStart and PreCompact hooks, both of which run session-start.sh. The file is not meant to be edited by users.
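To make the wiring concrete, the sketch below builds a configuration in Claude Code's general hooks layout with both events pointing at one script. The actual contents of metaswarm's hooks/hooks.json are not shown on this page, so the script path and the `build_hooks_config` helper are placeholders.

```python
import json


def _command_entry(script: str) -> list:
    """One hook group that runs a single shell command."""
    return [{"hooks": [{"type": "command", "command": script}]}]


def build_hooks_config(script: str) -> dict:
    """Map both hook events to the same command."""
    return {
        "hooks": {
            "SessionStart": _command_entry(script),
            "PreCompact": _command_entry(script),
        }
    }


if __name__ == "__main__":
    # The script path is a placeholder, not metaswarm's real install path.
    print(json.dumps(build_hooks_config("hooks/session-start.sh"), indent=2))
```

Running the same script on PreCompact as on SessionStart is what lets knowledge be re-primed just before the context window is compacted.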

knowledge/ directory

JSONL fact store, empty at install, grows via /self-reflect over time.
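Growth of a JSONL store amounts to appending one JSON object per line. The following sketch shows that mechanic only; `append_fact`, the file name under knowledge/, and the record fields are hypothetical, and it does not represent what /self-reflect actually writes.

```python
import json
from pathlib import Path


def append_fact(kb_file: Path, fact: dict) -> None:
    """Append one fact as a single JSON line (JSONL)."""
    kb_file.parent.mkdir(parents=True, exist_ok=True)
    with kb_file.open("a", encoding="utf-8") as fh:
        # json.dumps escapes embedded newlines, so each fact stays on one line.
        fh.write(json.dumps(fact, ensure_ascii=False) + "\n")
```

Append-only writes keep git diffs small: each new fact shows up as one added line.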

Update

# In Claude Code:
/update
# or:
/metaswarm-update-version

# Cross-platform:
npx metaswarm init   # Re-runs installer, updates in place

Migration from npm to Plugin

/migrate    # In Claude Code — converts npm-installed metaswarm to plugin

Preserves knowledge base and CLAUDE.md content during migration.

BEADS Setup

# Install BEADS (separate project by Steve Yegge)
# See: https://github.com/steveyegge/beads

# After install:
bd doctor       # Verify setup
bd init         # Initialize in project

metaswarm's /setup checks for BEADS and guides through setup if missing.

Gemini CLI Context

gemini-extension.json in metaswarm root:

{
  "name": "metaswarm",
  "version": "0.11.0",
  "contextFileName": "GEMINI.md"
}

Gemini CLI reads GEMINI.md from the project root as extension context.

Distribution

Type: claude-plugin
License: MIT
Install: one-liner
Version: v0.11.0

Surfaces

CLI binary: metaswarm
CLI subcmds: 1
Local UI: No
Tech stack: null

Components

Commands: 27
Skills: 13
Subagents: 19
Hooks: 2
MCP servers: 0
MCP tools: 0
Scripts: 3
Templates: 6

Workflow

Phases: 9
Approval gates: 4
Spec format: markdown
Spec storage: per-feature-folder
Delta or full: whole-file

Orchestration

Multi-agent: Yes
Pattern: hierarchical
Isolation: git-worktree
Consensus: parallel-specialist-review
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: interactive-loop
Crash recovery: Yes
Compaction: Yes
Session handoff: Yes
Streaming: No

Memory

Type: file-based
Persistence: project
Search: metadata-filter
State files: 4 files

Quality

TDD: Yes
TDD mechanism: coverage-threshold-blocking
Validators: 4
Self-review: cross-model-review

Git / Observability

Auto commit: Yes
Auto PR: Yes
Auto merge: No
Worktree/feat: Yes
Audit log: Yes
Audit format: jsonl
Replay: No

Tools

Primary: claude-code
Targets: 3
Portability: high

Signals

Stars: 284
Last commit: 2026-05-16
Contributors: 1
Maintainer: active
Quality score: 9.1/10