Claude Swarm (affaan-m)

claude-swarm · affaan-m/claude-swarm · ★ 172 · last commit 2026-02-11

Decompose complex tasks via Opus into a dependency DAG, execute waves of Haiku workers in parallel, then Opus quality-gates the combined output — mirroring real engineering team structure.

Best when: Strategic model routing (Opus for planning+review, Haiku for execution) is 3x cheaper and architecturally sound; senior engineers shouldn't execute boilerpl…
Skip if: Using the same expensive model for both planning and execution; running tasks without dependency awareness (causes file conflicts)
vs seeds: claude-flow in multi-agent orchestration scope, but uses the Claude Agent SDK directly (not MCP) with a dependency-DAG wave schedule…
Primitive shape: 6 total (Commands: 3, Subagents: 3)
00

Summary

Claude Swarm (affaan-m) — Summary

Claude Swarm is a Python CLI tool for multi-agent orchestration of Claude Code: Opus 4.6 decomposes a task into a dependency graph of subtasks, Haiku worker agents execute them in parallel waves with file-locking for conflict prevention, and Opus 4.6 runs a Quality Gate review over all agent outputs before reporting results. Every execution is recorded as JSONL events and replayable.

Problem it solved: Complex software tasks have internal dependencies that make naive parallelism produce conflicts — Claude Swarm builds an explicit dependency DAG, respects ordering, prevents file conflicts with pessimistic locking, enforces a hard budget cap, and adds a model-appropriate quality gate (senior architect designs, junior engineers execute, senior reviews).

Distinctive traits: (1) Strategic multi-model routing: Opus for planning + quality gate, Haiku for worker execution — by explicit design; (2) Dependency-aware wave scheduling: tasks only start when dependencies complete; (3) File conflict detection via pessimistic locking across parallel workers; (4) JSONL session recording with claude-swarm replay <id>; (5) --demo flag for animated TUI preview without API key.

Target audience: Developers working on complex refactors or feature additions where subtask ordering matters and want automated parallelism with budget enforcement and quality review.

differs_from_seeds: Most similar to claude-flow in the MCP-anchored toolserver cluster (both spawn multiple agents, both use Claude), but architecturally different: claude-swarm uses the Claude Agent SDK directly (not MCP tools) and implements its own wave-based dependency scheduler. The Opus→Haiku→Opus quality-gate model closely mirrors the "senior engineer designs and reviews, junior executes" pattern. JSONL session recording with replay is unique to this framework in the entire batch.

01

Overview

Claude Swarm (affaan-m) — Overview

Origin

Created by Affaan Mustafa for the Claude Code Hackathon (Feb 10-16, 2026). MIT license. Python package (claude-swarm on PyPI). Version 0.2.0.

Philosophy

From README:

Phase 1:   Opus 4.6 decomposes task into dependency graph
Phase 2:   Parallel agents execute subtasks with live dashboard
Phase 2.5: Opus 4.6 Quality Gate reviews all agent outputs
Phase 3:   Results summary with costs and session replay

Core belief: Real engineering teams have a senior architect who designs the plan, junior engineers who execute in parallel, and the senior reviews the combined result — AI agents should mirror this structure.

Manifesto-style quote

"This mirrors real engineering team structure: a senior architect designs the plan, junior engineers execute in parallel, and the senior reviews the combined result."

Strategic model selection

"Opus 4.6 handles the hardest reasoning task — analyzing your codebase, understanding the architecture, identifying dependencies between subtasks, and producing a parallelizable execution plan. This requires deep understanding of code relationships and optimal task splitting." "Haiku handles the parallelizable work — each agent follows focused instructions from the plan. Using Haiku here is 3x cheaper while maintaining 90% of Sonnet's capability for focused tasks."

Target users

Developers with complex multi-step tasks (refactors, feature additions) who want automated parallelism with budget control, conflict prevention, and a formal quality gate.

02

Architecture

Claude Swarm (affaan-m) — Architecture

Distribution

  • PyPI package: pip install claude-swarm
  • Python 3.11+
  • CLI binary: claude-swarm

Install

pip install claude-swarm
# or from source:
git clone https://github.com/affaan-m/claude-swarm
cd claude-swarm
pip install -e .

Source structure

claude-swarm/
├── src/claude_swarm/
│   ├── __init__.py
│   ├── cli.py          # Click CLI: main command + sessions + replay
│   ├── config.py       # YAML config loader (swarm.yaml)
│   ├── decomposer.py   # Opus 4.6 task decomposition → dependency graph
│   ├── orchestrator.py # Wave scheduler, file locking, budget enforcement
│   ├── quality_gate.py # Opus 4.6 quality review
│   ├── session.py      # JSONL session recorder
│   ├── types.py        # SwarmPlan, SwarmTask, SwarmAgent, etc.
│   ├── ui.py           # Rich terminal TUI (htop-style)
│   └── demo.py         # Animated demo without API key
├── tests/              # 44 tests
└── pyproject.toml

Dependencies

  • claude-agent-sdk>=0.1.35 — Anthropic Claude Agent SDK
  • textual>=1.0.0 — TUI framework
  • rich>=13.0.0 — Terminal formatting
  • networkx>=3.0 — Dependency graph (DAG)
  • anyio>=4.0 — Async I/O
  • click>=8.0 — CLI
  • pydantic>=2.0 — Data validation

Required runtime

  • Python 3.11+
  • ANTHROPIC_API_KEY
  • Claude API access (Opus 4.6 + Haiku)
03

Components

Claude Swarm (affaan-m) — Components

CLI commands

Command | Purpose
claude-swarm TASK | Run a swarm: decompose, execute, quality gate, report
claude-swarm --dry-run TASK | Show plan without executing
claude-swarm --demo | Animated TUI preview without API key
claude-swarm sessions | List past sessions
claude-swarm replay <session-id> | Replay a session's JSONL events

CLI options

Option | Default | Purpose
--max-agents / -n | 4 | Max concurrent workers
--model / -m | opus | Decomposition model
--budget / -b | $5.00 | Hard cost cap
--retry / -r | 1 | Max retries per task
--config / -c | swarm.yaml | Custom agent topology
--quality-gate / --no-quality-gate | on | Enable/disable Opus review
--no-ui | off | Disable Rich TUI

Python modules

Module | Purpose
decomposer.py | Opus 4.6 → dependency DAG via Claude Agent SDK
orchestrator.py | Wave scheduler: reads DAG, dispatches ready tasks, enforces file locks + budget
quality_gate.py | Opus 4.6 review of combined agent output → score + verdict
session.py | JSONL recording of all events; enables replay
ui.py | Rich htop-style TUI with agent progress, tools, costs, file conflicts
config.py | Parse swarm.yaml for custom agent types
demo.py | Animated simulation (no API key needed)

YAML configuration (swarm.yaml)

swarm:
  name: full-stack-review
  max_concurrent: 4
  budget_usd: 5.0
  model: opus

agents:
  security-reviewer:
    description: Reviews code for OWASP vulnerabilities
    model: opus
    tools: [Read, Grep, Glob]
    prompt: |
      Analyze the code for SQL injection, XSS, CSRF...
  tester:
    description: Writes and runs tests
    model: haiku
    tools: [Read, Write, Edit, Bash]

connections:
  - from: coder
    to: [security-reviewer, tester]
  - from: [security-reviewer, tester]
    to: reviewer
05

Prompts

Claude Swarm (affaan-m) — Prompts

Verbatim excerpt 1: decomposer.py — DECOMPOSE_SYSTEM_PROMPT

DECOMPOSE_SYSTEM_PROMPT = """You are a task decomposition expert. \
Given a complex software engineering task, break it down into \
independent subtasks that can be executed by separate Claude Code \
agents in parallel.

RULES:
1. Each subtask should be as independent as possible
2. Specify dependencies between tasks (task IDs)
3. Each task should specify which files it will modify
4. Tasks should be small enough for one agent to complete in a few minutes
5. Include a "reviewer" task at the end that depends on all implementation tasks

OUTPUT FORMAT (strict JSON):
{
  "tasks": [
    {
      "id": "task-1",
      "description": "Short description of what to do",
      "agent_type": "coder|reviewer|tester|refactorer|documenter",
      "dependencies": [],
      "files_to_modify": ["src/auth.ts", "src/middleware.ts"],
      "tools": ["Read", "Write", "Edit", "Bash", "Grep", "Glob"],
      "prompt": "Detailed instructions for the agent..."
    }
  ]
}

AGENT TYPES:
- coder: Writes new code or modifies existing code
- reviewer: Reviews code changes for quality/security
- tester: Writes and runs tests
- refactorer: Refactors existing code
- documenter: Updates documentation

IMPORTANT:
- Minimize file overlap between tasks to prevent conflicts
- If two tasks MUST edit the same file, make one depend on the other
- Keep the total number of tasks between 2 and 8
- Each task's prompt should be self-contained with all context needed"""

Prompting technique: Structured JSON output enforcement with explicit conflict-prevention rules. The decomposer is prompted to think about file overlaps explicitly, which feeds the file locking system. This is prompt-chaining — the decomposer's JSON output directly becomes the orchestrator's execution plan.
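
As an illustration of that hand-off, the strict-JSON plan could be parsed and sanity-checked before scheduling. This is a minimal sketch, not the project's code: `PlannedTask`, `parse_plan`, and `unordered_overlaps` are hypothetical names, the sample plan is invented, and the overlap check only looks at direct dependencies.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PlannedTask:
    """Illustrative stand-in for one entry of the "tasks" array (not the tool's SwarmTask)."""
    id: str
    description: str
    agent_type: str
    prompt: str
    dependencies: list[str] = field(default_factory=list)
    files_to_modify: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)


def parse_plan(raw: str) -> list[PlannedTask]:
    """Parse the strict-JSON plan and reject dependencies on unknown task IDs."""
    tasks = [PlannedTask(**item) for item in json.loads(raw)["tasks"]]
    known = {t.id for t in tasks}
    for t in tasks:
        missing = [d for d in t.dependencies if d not in known]
        if missing:
            raise ValueError(f"{t.id} depends on unknown tasks: {missing}")
    return tasks


def unordered_overlaps(tasks: list[PlannedTask]) -> list[tuple[str, str]]:
    """Pairs that share a file but have no direct dependency in either direction."""
    pairs = []
    for i, a in enumerate(tasks):
        for b in tasks[i + 1:]:
            shared = set(a.files_to_modify) & set(b.files_to_modify)
            ordered = a.id in b.dependencies or b.id in a.dependencies
            if shared and not ordered:
                pairs.append((a.id, b.id))
    return pairs


# Invented two-task plan in the OUTPUT FORMAT shown above.
raw_plan = json.dumps({"tasks": [
    {"id": "task-1", "description": "Add auth middleware", "agent_type": "coder",
     "dependencies": [], "files_to_modify": ["src/auth.ts"],
     "tools": ["Read", "Write", "Edit"], "prompt": "..."},
    {"id": "task-2", "description": "Test auth middleware", "agent_type": "tester",
     "dependencies": ["task-1"], "files_to_modify": ["src/auth.ts"],
     "tools": ["Read", "Bash"], "prompt": "..."},
]})
tasks = parse_plan(raw_plan)
conflicts = unordered_overlaps(tasks)  # empty: the shared file is ordered by a dependency
```

Both tasks touch src/auth.ts, but task-2 depends on task-1, so the plan satisfies the prompt's "make one depend on the other" rule.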


Verbatim excerpt 2: YAML config agent definition (from README)

agents:
  security-reviewer:
    description: Reviews code for OWASP vulnerabilities
    model: opus
    tools: [Read, Grep, Glob]
    prompt: |
      Analyze the code for SQL injection, XSS, CSRF,
      insecure deserialization, and other OWASP Top 10 vulnerabilities.
      Report findings with severity (Critical/High/Medium/Low) and
      specific remediation steps.

  tester:
    description: Writes and runs tests
    model: haiku
    tools: [Read, Write, Edit, Bash]
    prompt: |
      Write comprehensive tests. Ensure 80% coverage minimum.
      Run tests and fix any failures before completing.

Prompting technique: Role-specific prompt injection per agent type. Each agent gets its own system-level behavioral instructions. The connections graph then defines the dependency order.
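
One way to read the connections graph is as a dependency map in which each `to` agent waits on every `from` agent. The sketch below assumes that interpretation and works on an already-parsed config (plain dicts rather than a YAML file); `connections_to_dependencies` is a hypothetical helper, not part of claude-swarm.

```python
def as_list(value) -> list[str]:
    """The config allows a single name or a list of names on either side."""
    return [value] if isinstance(value, str) else list(value)


def connections_to_dependencies(connections: list[dict]) -> dict[str, list[str]]:
    """Map each agent to the agents it must wait for."""
    deps: dict[str, set[str]] = {}
    for edge in connections:
        sources, targets = as_list(edge["from"]), as_list(edge["to"])
        for source in sources:
            deps.setdefault(source, set())
        for target in targets:
            deps.setdefault(target, set()).update(sources)
    return {agent: sorted(waits_on) for agent, waits_on in deps.items()}


# The connections block from the swarm.yaml example, as parsed data.
connections = [
    {"from": "coder", "to": ["security-reviewer", "tester"]},
    {"from": ["security-reviewer", "tester"], "to": "reviewer"},
]
dependencies = connections_to_dependencies(connections)
```

Under this reading, security-reviewer and tester both wait on coder and can run side by side, while reviewer waits on both of them.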

09

Uniqueness

Claude Swarm (affaan-m) — Uniqueness & Positioning

differs_from_seeds

Claude Swarm is the only pure CLI (no desktop UI) in this batch that implements true multi-model orchestration with dependency-aware wave scheduling. Among the 11 seeds, it is closest to claude-flow (MCP-anchored multi-agent with sqlite-backed memory and hive-mind consensus), but differs architecturally: claude-swarm uses the Claude Agent SDK directly (not MCP tools), has no persistent memory store, implements its own wave-based DAG scheduler rather than a consensus protocol, and adds an explicit multi-model quality gate (Opus→Haiku→Opus) as a first-class design principle. JSONL session recording with CLI replay is unique in this entire corpus — no seed or other batch framework offers command-line session replay. The --demo flag (animated TUI without API key) is also a distinctive onboarding innovation.

Positioning

  • Hackathon project (Feb 2026) that demonstrates strategic model selection as an architectural principle
  • "Senior architect designs, junior engineers execute, senior reviews" as software-team metaphor
  • Pure Python: accessible to the Python/data-science developer community, not the Node.js/Electron crowd
  • BYOK (just need ANTHROPIC_API_KEY), no cloud platform, no subscription

Observable failure modes

  • No worktree isolation: concurrent agents edit the same directory; file locking prevents conflicts but doesn't provide rollback if locking fails
  • Quality Gate is a single Opus call, not multi-agent consensus — a single point of bias
  • Budget enforcement cancels remaining tasks without cleanup or rollback of partial changes
  • No CLAUDE.md/skills injection: agents receive only the decomposer-generated task prompt
  • Alpha-stage (v0.2.0, 172 stars): not production-proven

Inspired by

  • Claude Code Hackathon (Feb 10-16, 2026)
  • Claude Agent SDK
  • Real engineering team structure (architect/engineer/reviewer)

Cross-references

  • Built on claude-agent-sdk (Anthropic's Python SDK)
  • swarm.yaml auto-detection similar to docker-compose's convention
04

Workflow

Claude Swarm (affaan-m) — Workflow

Phases

Phase | Actor | Action | Artifact
1. Decompose | Opus 4.6 | Analyzes codebase, identifies dependencies, produces execution plan | SwarmPlan (JSON dependency DAG)
2. Execute (waves) | Haiku workers (parallel) | Wave 1: independent tasks run in parallel; Wave N: tasks whose dependencies are complete | File edits, test runs, code changes
2.5. Quality Gate | Opus 4.6 | Reviews combined output for correctness, consistency, security | Quality score (0-10) + verdict (PASS/FAIL)
3. Results | CLI | Report: N/N tasks completed, total cost, session ID | swarm-{id} JSONL session file

Dependency scheduling

  • networkx DAG tracks task dependencies
  • Tasks are dispatched in "waves": a task starts only when all its listed dependencies have status COMPLETED
  • Tasks with no dependencies run in Wave 1 simultaneously
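
To make the wave idea concrete, the sketch below groups task IDs into waves. It uses the standard library's `graphlib` rather than networkx so it runs without extra packages; `compute_waves` and the sample plan are illustrative assumptions, not the project's scheduler.

```python
from graphlib import TopologicalSorter


def compute_waves(dependencies: dict[str, list[str]]) -> list[list[str]]:
    """Group task IDs into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave complete
    return waves


# Hypothetical plan: task-3 needs task-1 and task-2; task-4 reviews task-3.
plan = {
    "task-1": [],
    "task-2": [],
    "task-3": ["task-1", "task-2"],
    "task-4": ["task-3"],
}
waves = compute_waves(plan)
# waves == [["task-1", "task-2"], ["task-3"], ["task-4"]]
```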

File conflict prevention

  • Pessimistic file locking: _file_locks: dict[str, str] maps file path → agent_id
  • If Agent B requests a file that Agent A holds, Agent B is blocked until Agent A releases
  • files_to_modify specified per task in the decomposition plan
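
A lock table of this shape might look like the following sketch. Only the `_file_locks: dict[str, str]` mapping comes from the source; the `FileLockTable` class, its method names, and the all-or-nothing acquisition policy are assumptions made for illustration.

```python
class FileLockTable:
    """Pessimistic lock table: file path -> agent_id currently holding it."""

    def __init__(self) -> None:
        self._file_locks: dict[str, str] = {}

    def try_acquire(self, agent_id: str, files: list[str]) -> list[str]:
        """Lock all files for agent_id, or lock none and return the contested paths."""
        contested = [f for f in files if self._file_locks.get(f, agent_id) != agent_id]
        if contested:
            return contested  # caller stays blocked and tries again later
        for f in files:
            self._file_locks[f] = agent_id
        return []

    def release(self, agent_id: str) -> None:
        """Drop every lock held by agent_id."""
        self._file_locks = {f: a for f, a in self._file_locks.items() if a != agent_id}


locks = FileLockTable()
first = locks.try_acquire("agent-a", ["src/auth.ts", "src/middleware.ts"])
blocked = locks.try_acquire("agent-b", ["src/auth.ts", "src/routes.ts"])
locks.release("agent-a")
retry = locks.try_acquire("agent-b", ["src/auth.ts", "src/routes.ts"])
```

Here agent-b is refused while agent-a holds src/auth.ts and succeeds once agent-a releases. Taking all of a task's files at once avoids two agents each holding half of what the other needs.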

Budget enforcement

  • max_budget_usd hard cap (default $5.00)
  • When total_cost >= max_budget_usd, _budget_exceeded = True
  • Remaining tasks are cancelled; partial results reported
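
The cap check above can be sketched as follows. The field names mirror the ones quoted in this section, but `BudgetTracker` and `cancel_remaining` are hypothetical, and leaving already-running tasks untouched is an assumption; the source only says remaining tasks are cancelled.

```python
from dataclasses import dataclass


@dataclass
class BudgetTracker:
    max_budget_usd: float = 5.00
    total_cost: float = 0.0
    _budget_exceeded: bool = False

    def record(self, cost_usd: float) -> bool:
        """Add a task's cost; return True once the cap has been reached."""
        self.total_cost += cost_usd
        if self.total_cost >= self.max_budget_usd:
            self._budget_exceeded = True
        return self._budget_exceeded


def cancel_remaining(statuses: dict[str, str]) -> dict[str, str]:
    """Mark tasks that have not started as CANCELLED; leave the rest as they are."""
    return {
        task_id: ("CANCELLED" if status in ("PENDING", "BLOCKED") else status)
        for task_id, status in statuses.items()
    }


budget = BudgetTracker(max_budget_usd=5.00)
hit_after_first = budget.record(2.0)   # total 2.0, under the cap
hit_after_second = budget.record(3.5)  # total 5.5, cap reached
statuses = {"task-1": "COMPLETED", "task-2": "RUNNING", "task-3": "PENDING"}
if hit_after_second:
    statuses = cancel_remaining(statuses)
```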

Retry logic

  • Failed tasks retry up to max_retries times (default: 1 retry)

Approval gates

  • --dry-run: shows plan without executing (implicit approval gate)
  • No interactive approval; once started, the swarm runs to completion or budget exhaustion

Session replay

After execution, claude-swarm replay <session-id> replays the JSONL event log — shows what each agent did, in sequence.

06

Memory Context

Claude Swarm (affaan-m) — Memory & Context

Session recording

Every swarm execution is recorded as a JSONL file, swarm-{id}, containing all agent events, tool calls, outputs, costs, and status changes. This is the primary persistence artifact.

Session replay

claude-swarm replay <session-id> streams the JSONL events to reconstruct exactly what each agent did. This makes Claude Swarm the only CLI tool in this batch with a formal replay capability.
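
The record-then-replay pattern can be illustrated with an append-only JSONL file. The event fields used here (`agent`, `type`, `status`, `cost_usd`) and the helper names are invented for the example; the actual session schema is not documented on this page.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator


def record_event(path: Path, event: dict) -> None:
    """Append one event as a single JSON line."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def replay(path: Path) -> Iterator[dict]:
    """Yield events back in the order they were recorded."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


# Hypothetical session file in a temporary directory.
session_file = Path(tempfile.mkdtemp()) / "swarm-demo.jsonl"
record_event(session_file, {"agent": "agent-a", "type": "status", "status": "RUNNING"})
record_event(session_file, {"agent": "agent-a", "type": "status",
                            "status": "COMPLETED", "cost_usd": 0.02})
events = list(replay(session_file))
```

Because each line is a complete JSON document, a crashed run still leaves a readable log up to the last event written.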

State during execution (in-memory)

  • SwarmOrchestrator maintains:
    • agents: dict[str, SwarmAgent] — live agent state
    • completed_task_ids: set[str] — which tasks have finished
    • conflicts: list[FileConflict] — detected file conflicts
    • total_cost: float — accumulated cost
    • _file_locks: dict[str, str] — file path → agent_id
    • _retry_counts: dict[str, int] — task → attempt count
  • All in-memory; only the JSONL session file is persisted

Cross-session memory

No shared memory between swarm runs. Each swarm starts fresh. The only cross-session knowledge is what's in the actual codebase files.

Context per agent

Each worker agent receives its task description and prompt (both generated by the Opus decomposer) and works against the project's file system. There is no injected CLAUDE.md or session history.

Context compaction

Not applicable — each worker is a short-lived Claude Agent SDK call (a few turns at most).

07

Orchestration

Claude Swarm (affaan-m) — Orchestration

Multi-agent support

Yes — core feature. N worker agents (default up to 4 concurrent).

Orchestration pattern

Hierarchical with dependency-aware wave scheduling:

  • Phase 1: Opus 4.6 (planner) decomposes task → dependency DAG
  • Phase 2: Worker pool (Haiku) executes tasks in waves; ready tasks (all deps complete) run in parallel
  • Phase 2.5: Opus 4.6 (reviewer) runs quality gate on all outputs
  • This is a task-decomposition-tree pattern with parallel execution at each wave

Wave scheduling algorithm

while tasks_remaining:
    ready_tasks = [t for t in pending if all(dep in completed for dep in t.dependencies)]
    dispatch ready_tasks in parallel (up to max_concurrent)
    wait for any to complete
    update completed_task_ids
    handle file conflicts, budget check, retries
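
A runnable approximation of that loop, using asyncio, is sketched below. It is not the orchestrator's code: `run_swarm` is a hypothetical function, the worker is a stand-in that only yields to the event loop, and file locking, budget checks, and retries are left out to keep the scheduling logic visible.

```python
import asyncio


async def run_swarm(dependencies: dict[str, list[str]], max_concurrent: int = 4) -> list[str]:
    """Dispatch tasks as their dependencies complete; return the completion order."""
    pending = dict(dependencies)
    completed: set[str] = set()
    running: dict[asyncio.Task, str] = {}
    order: list[str] = []

    async def worker(task_id: str) -> None:
        await asyncio.sleep(0)  # stand-in for a real worker agent call

    while pending or running:
        # A task is ready once every one of its dependencies has completed.
        ready = sorted(t for t, deps in pending.items() if all(d in completed for d in deps))
        for task_id in ready[: max_concurrent - len(running)]:
            del pending[task_id]
            running[asyncio.create_task(worker(task_id))] = task_id
        if not running:
            raise ValueError(f"unsatisfiable dependencies: {sorted(pending)}")
        # Wake up as soon as any worker finishes, then look for newly ready tasks.
        done, _ = await asyncio.wait(set(running), return_when=asyncio.FIRST_COMPLETED)
        for finished in done:
            finished.result()  # re-raise worker errors instead of hiding them
            task_id = running.pop(finished)
            completed.add(task_id)
            order.append(task_id)
    return order


plan = {"task-1": [], "task-2": [], "task-3": ["task-1", "task-2"], "task-4": ["task-3"]}
order = asyncio.run(run_swarm(plan, max_concurrent=2))
```

The order of task-1 and task-2 is not fixed, but task-3 always follows both and task-4 always comes last.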

Isolation mechanism

None (in-place) — all agents operate in the same working directory. File conflicts are managed via pessimistic locking, not directory isolation.

Multi-model routing

Yes — explicit strategic model selection:

Role | Model | Rationale
Decomposer | Opus 4.6 | Best reasoning for dependency analysis
Worker agents | Haiku | 3x cheaper, "90% of Sonnet's capability for focused tasks"
Quality Gate | Opus 4.6 | Senior review of combined output

File conflict prevention

Pessimistic locking via _file_locks: dict[str, str]. If two agents need the same file simultaneously, the second agent waits. This is the only framework in this batch with explicit file-level locking.

Quality gate

Opus 4.6 reviews all agent outputs after Phase 2:

  • Score: 0-10
  • Verdict: PASS/FAIL
  • Checks: integration issues, missed edge cases, security concerns, task completeness
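
The score and verdict could be held in a small validated record, as in this sketch. The JSON response shape and the names `QualityReport` and `parse_review` are assumptions; the page does not show how quality_gate.py actually parses the reviewer's reply.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityReport:
    score: int    # 0-10
    verdict: str  # "PASS" or "FAIL"

    def __post_init__(self) -> None:
        if not 0 <= self.score <= 10:
            raise ValueError(f"score out of range: {self.score}")
        if self.verdict not in ("PASS", "FAIL"):
            raise ValueError(f"unknown verdict: {self.verdict}")


def parse_review(raw: str) -> QualityReport:
    """Parse a hypothetical JSON reply such as {"score": 8, "verdict": "PASS"}."""
    data = json.loads(raw)
    return QualityReport(score=int(data["score"]), verdict=str(data["verdict"]).upper())


report = parse_review('{"score": 8, "verdict": "pass"}')
```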

Budget enforcement

Hard cap: when total cost >= max_budget_usd, remaining tasks are cancelled.

Execution mode

One-shot: claude-swarm TASK runs to completion (or budget exhaustion).

Max concurrent agents

Configurable via --max-agents (default: 4).

Consensus mechanism

None formal — quality gate is a single reviewer, not a multi-agent consensus.

08

UI CLI Surface

Claude Swarm (affaan-m) — UI & CLI Surface

CLI binary: claude-swarm

Full Python CLI built with Click.

Command | Purpose
claude-swarm TASK | Main execution
claude-swarm --dry-run TASK | Show plan without executing
claude-swarm --demo | Animated TUI without API key
claude-swarm sessions | List past sessions
claude-swarm replay <id> | Replay session events

Terminal UI

Built with the rich library as an "htop-style dashboard" showing:

  • Agent progress bars
  • Tool usage per agent
  • Real-time cost tracking ($X.XX / $Y.YY budget)
  • File conflict notifications
  • Task status (PENDING/BLOCKED/RUNNING/COMPLETED/FAILED/CANCELLED)

Status indicators:

... = PENDING
BLK = BLOCKED
RUN = RUNNING
OK  = COMPLETED
ERR = FAILED
CXL = CANCELLED
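
The six states and their short labels map naturally onto an enum. The labels below are taken from the list above; the `TaskStatus` class and the `render` helper are illustrative names, not identifiers from ui.py.

```python
from enum import Enum


class TaskStatus(Enum):
    PENDING = "..."
    BLOCKED = "BLK"
    RUNNING = "RUN"
    COMPLETED = "OK"
    FAILED = "ERR"
    CANCELLED = "CXL"


def render(task_id: str, status: TaskStatus) -> str:
    """Format one dashboard row with a fixed-width status label."""
    return f"[{status.value:<3}] {task_id}"


rows = [
    render("task-1", TaskStatus.COMPLETED),
    render("task-2", TaskStatus.RUNNING),
    render("task-3", TaskStatus.BLOCKED),
]
# rows == ["[OK ] task-1", "[RUN] task-2", "[BLK] task-3"]
```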

Demo mode

claude-swarm --demo runs an animated TUI simulation without an API key — shows the full workflow with fake agents for presentations or evaluation.

No web dashboard

CLI + TUI only. No Electron, no web UI, no mobile monitor.

Observability

  • JSONL session files per execution
  • claude-swarm replay <id> for post-execution review
  • Real-time TUI with cost tracking during execution
  • Per-agent and total cost reported at completion

Configuration surface

swarm.yaml (auto-detected in project root or .claude/swarm.yaml) for declarative agent topology.
