Skip to content
/

claude-codepro (Pilot Shell)

claude-codepro · maxritter/claude-codepro · ★ 1.7k · last commit 2026-05-26

Adds spec-driven planning, enforced TDD, persistent memory, multi-agent review, and a local web dashboard to Claude Code — making it production-grade for engineering teams.

Best whenEvery component of an AI development harness should solve a real, documented problem — not add architectural complexity for its own sake.
Skip ifUsing Claude Code's built-in plan mode (use /spec instead), Frameworks that add complexity without measurably better output
vs seeds
superpowersin behavioral philosophy (TDD enforcement, worktree isolation, skill-based orchestration) but exceeds it in every dimens…
Primitive shape 40 total
Skills 16 Subagents 5 Hooks 12 MCP tools 7
00

Summary

claude-codepro (Pilot Shell) — Summary

Pilot Shell (marketed as claude-codepro / pilot-shell) is a comprehensive, production-grade harness for Claude Code built by maxritter that provides spec-driven development (/spec), enforced TDD, a local web dashboard (Console), persistent memory, 7 MCP servers, multi-agent subagents (spec-review, changes-review, web-search), Python hooks for quality gates and context management, and a pilot bot daemon for 24/7 scheduled automation. It requires a paid license (available at polar.sh) for the pilot CLI binary; the CLI source is git-crypt encrypted in the public repo.

Problem it solves: Claude Code out-of-the-box skips tests, loses context across sessions, and produces inconsistent results without structured planning. Pilot Shell adds spec-driven planning, enforced quality gates (lint/format/type-check/test auto-run on every file write), persistent memory, semantic code search, and a collaborative spec review workflow.

Distinctive trait: The Console — a local web dashboard that provides real-time notifications, spec management, session management, and a collaborative spec-review feature where teammates annotate specs at pilot-shell.com/s/<id> and feedback syncs back automatically. This cloud-assisted local workflow is unique in the corpus.

Target audience: Professional development teams using Claude Code who need production-grade quality automation, team collaboration on specs, and 24/7 background automation (pilot bot). Requires Claude Max 5x/20x subscription plus a Pilot Shell license.

Differs from seeds: Most similar to superpowers (comprehensive behavioral framework for Claude Code with skills, hooks, TDD enforcement, worktree isolation) but substantially more complex: Pilot Shell adds 7 MCP servers, Python hooks with spec lifecycle awareness, a local web dashboard, a pilot bot daemon mode, semantic code search (Semble), knowledge graph (CodeGraph), RTK token compression, and a licensed CLI binary — features superpowers explicitly avoids by design. Closer to a full platform than a framework.

01

Overview

claude-codepro (Pilot Shell) — Overview

Origin

Published by maxritter as claude-codepro on GitHub (repo references pilot-shell throughout). 1728 stars, 144 forks, 9 contributors. License: Other (not standard OSS — repo contains a licensing mechanism via polar.sh). Last commit: 2026-05-26. Language: TypeScript (console), Python (hooks), JavaScript (scripts).

The repo is the public face of a commercial product at pilot-shell.com. The CLI source (launcher/) is git-crypt encrypted and not publicly readable. The framework components (skills, hooks, agents, rules, MCP config) are public.

Philosophy

"Claude Code writes code fast — but without structure, it skips tests, loses context, and produces inconsistent results. Other frameworks add complexity (dozens of agents, thousands of lines of config) without meaningfully better output."

"Pilot Shell is different. Every component solves a real problem with an engineered solution."

"Run pilot for Spec-Driven Development with /spec, or pilot bot for 24/7 automations."

Key Architecture Principles

  1. Spec-Driven Development/spec replaces Claude Code's built-in plan mode; every feature goes through Discuss → Plan → Approve → Implement (TDD) → Verify loop
  2. Quality hooks on every file write — PostToolUse hooks on Write/Edit/MultiEdit auto-run file_checker.py (lint, format, type-check); no file is saved without automatic validation
  3. Context preservation — PreCompact hook saves state; PostCompact hook restores it; context_monitor.py tracks window usage across all tool calls
  4. Worktree isolation — each /spec creates an isolated git worktree, implements in it, squash-merges on success
  5. Multi-model — Codex plugin provides adversarial review via OpenAI Codex during spec planning and verification (opt-in)
  6. 24/7 automationpilot bot is a persistent background agent that executes scheduled jobs (health checks, deployments, monitoring) with heartbeat monitoring

Licensing

Requires a paid license key (pilot activate <key>) available at polar.sh/max-ritter/portal. The installer is free; the pilot binary enforces license validation. No free tier mentioned.

02

Architecture

claude-codepro (Pilot Shell) — Architecture

Distribution

  • Type: CLI tool (bash installer → ~/.pilot/ + ~/.claude/)
  • License: Proprietary (requires license key)
  • Language: TypeScript (console), Python 3.12+ (hooks), JavaScript/CJS (scripts)

Install Method

curl -fsSL https://raw.githubusercontent.com/maxritter/pilot-shell/main/install.sh | bash

7-step installer: checks/installs prerequisites (Homebrew, Node.js, Python 3.12+, uv, git, jq), installs into ~/.claude/ and ~/.pilot/, configures shell aliases.

# After install
pilot          # Launch spec-driven development session
pilot activate <license-key>
pilot bot      # Launch 24/7 automation daemon

Directory Tree (pilot/ subdirectory — the public framework layer)

pilot/
├── .mcp.json          # 7 MCP server configurations
├── agents/            # 5 subagent definitions
│   ├── spec-review.md
│   ├── spec-review-codex.md
│   ├── changes-review.md
│   ├── changes-review-codex.md
│   └── web-search-agent.md
├── hooks/             # 17 Python hook scripts + hooks.json
│   ├── hooks.json
│   ├── context_monitor.py
│   ├── file_checker.py
│   ├── pre_compact.py
│   ├── post_compact_restore.py
│   ├── spec_handoff_resume.py
│   ├── spec_mode_guard.py
│   ├── spec_plan_validator.py
│   ├── spec_stop_guard.py
│   ├── tool_redirect.py
│   ├── tool_token_saver.py
│   ├── session_clear.py
│   └── session_end.py
├── skills/            # 16 skills (multi-step orchestrators)
│   ├── spec-plan/
│   ├── spec-implement/
│   ├── spec-verify/
│   ├── spec-bugfix-plan/
│   ├── spec-bugfix-verify/
│   ├── fix/
│   ├── prd/
│   ├── setup-rules/
│   ├── create-skill/
│   ├── benchmark/
│   └── bot-{boot,channel-task,defaults,heartbeat,jobs}/
├── rules/             # 14 rule files (.md)
│   ├── browser-automation.md
│   ├── cli-tools.md
│   ├── code-review-reception.md
│   ├── development-practices.md
│   ├── documentation-sync.md
│   ├── mcp-servers.md
│   ├── standards-{backend,frontend,golang,python,typescript}.md
│   ├── task-and-workflow.md
│   ├── testing.md
│   └── verification.md
├── scripts/           # 4 Node.js CJS worker scripts
│   ├── worker-service.cjs
│   ├── mcp-server.cjs
│   ├── context-generator.cjs
│   └── worker-wrapper.cjs
├── settings.json      # Claude Code settings
└── claude.json

Required Runtime

  • Claude Code (native installer version)
  • Node.js >= 18
  • Python >= 3.12 + uv
  • Bun (for scripts/worker-service.cjs)
  • git, jq
  • Paid license key (polar.sh)

Console (Local Web Dashboard)

Built with: Vite + vanilla JS (pre-compiled bundles in pilot/ui/)

Access: Local — exact port not documented in public README (Console accessible from pilot command)

Features: spec management, real-time notifications, session management, annotation panel, Extensions view, plan viewer with diff

03

Components

claude-codepro (Pilot Shell) — Components

Skills (16)

All skills use multi-step orchestrator format with manifest.json + orchestrator.md + steps/ directory.

Skill Purpose
spec-plan Phase 1 of /spec: codebase exploration, plan creation, spec-review subagent, user approval gate
spec-implement Phase 2 of /spec: git worktree creation, TDD implementation, quality hooks
spec-verify Phase 3 of /spec: full test suite + browser E2E + unified review subagent + squash merge
spec-bugfix-plan /fix planning phase: investigate bug, write failing test
spec-bugfix-verify /fix verify phase: Behavior Contract audit
fix One-shot bugfix workflow orchestrator
prd /prd: brainstorm → PRD generation, saves to docs/prd/
setup-rules Explore codebase, discover conventions, generate rules and MCP config
create-skill Build a reusable skill from a topic interactively
benchmark Prompt benchmarking with falsifiable assertions and improvement plan
bot-boot Initialize Pilot Bot daemon
bot-channel-task Bot task execution (individual task)
bot-defaults Bot default configuration
bot-heartbeat Bot heartbeat monitoring
bot-jobs Bot scheduled job definitions

Subagents (5)

Agent Purpose Model
spec-review Verify plan alignment with user requirements, adversarial risk check. Returns structured JSON. sonnet, background: true
spec-review-codex Codex-powered adversarial spec review (OpenAI). Opt-in. unknown
changes-review Unified code review after implementation (compliance + quality + goal). unknown
changes-review-codex Codex-powered adversarial changes review (opt-in). unknown
web-search-agent Web search capability for research during spec planning. unknown

Hooks (12 events across 17 scripts)

Event Scripts
SessionStart worker-service (context inject), worker-service (user-message async), session_clear.py (on clear), post_compact_restore.py (on compact)
UserPromptSubmit spec_mode_guard.py, spec_handoff_resume.py, worker-service (session-init async)
PreToolUse tool_redirect.py (Bash/WebSearch/WebFetch/Grep/Glob/Agent), tool_token_saver.py (Bash)
PostToolUse file_checker.py (Write/Edit/MultiEdit), context_monitor.py (all tool types), worker-service observation (async, *)
Stop spec_stop_guard.py, worker-service summarize (async)
SessionEnd session_end.py
PreCompact pre_compact.py

Rules (14)

Modular rule files activated by file type or always-on: browser-automation, cli-tools, code-review-reception, development-practices, documentation-sync, mcp-servers, standards-backend, standards-frontend, standards-golang, standards-python, standards-typescript, task-and-workflow, testing, verification

MCP Servers (7)

Server Purpose
context7 @upstash/context7-mcp@2.2.4 — library documentation lookup
codegraph codegraph serve --mcp — code knowledge graph
mem-search ~/.pilot/scripts/mcp-server.cjs — persistent memory search
web-search open-websearch@2.1.9 — DuckDuckGo/Bing/Exa
grep-mcp https://mcp.grep.app — remote grep
web-fetch fetcher-mcp@0.3.9 — web content fetching
semble uvx --from semble[mcp] semble — semantic code search

Scripts (4)

Script Purpose
worker-service.cjs Main background worker: context injection, session init, observation recording, summarization
mcp-server.cjs Local MCP server for persistent memory
context-generator.cjs Context generation for sessions
worker-wrapper.cjs Worker process wrapper
05

Prompts

claude-codepro (Pilot Shell) — Prompt Excerpts

Excerpt 1: spec-review subagent — Adversarial Plan Reviewer

---
name: spec-review
description: Spec review agent that verifies alignment with user requirements and
  challenges dangerous assumptions. Returns structured JSON findings.
tools: Read, Grep, Glob, Write
model: sonnet
background: true
permissionMode: plan
---

# Spec Review

## ⛔ Performance Budget

**Hard limit: ≤ 7 tool calls total** (excluding the final Write). Pattern: Read plan (1)
→ 2-4 targeted Grep calls for riskiest assumptions → Write output (1). Do NOT read every
file mentioned in the plan. Flag unverifiable claims as `untested_assumption` rather than
spending tool calls.

**⛔ MANDATORY: Write output.** Your LAST action MUST be `Write` to `output_path`.
At 5+ tool calls without writing → STOP exploring, write what you have.

## Output Format

Output ONLY valid JSON (no markdown wrapper):

{
  "plan_file": "<path>",
  "review_summary": "1-2 sentence summary",
  "alignment_score": "high | medium | low",
  "risk_level": "high | medium | low",
  "issues": [
    {
      "severity": "must_fix | should_fix | suggestion",
      "category": "requirement_coverage | scope_alignment | ...",
      "title": "Brief title",
      "description": "What's wrong and why it matters",
      "suggested_fix": "Specific fix"
    }
  ]
}

Prompting technique: Hard budget enforcement + structured JSON output contract. The "⛔ MANDATORY: Write output" and "At 5+ tool calls without writing → STOP" pattern prevents the subagent from over-exploring and stalling the orchestrator. The structured JSON output enables the orchestrator to programmatically parse review findings.


Excerpt 2: spec-plan orchestrator — Phase 1 Constraints

---
name: spec-plan
description: "Spec planning phase - explore codebase, design plan, get approval"
argument-hint: "<task description> or <path/to/plan.md>"
user-invocable: false
hooks:
  Stop:
    - command: uv run --no-project python "$HOME/.claude/hooks/spec_plan_validator.py"
---

# /spec-plan - Planning Phase

## ⛔ Critical Constraints

- **NO sub-agents during planning** except Step 10 (spec-review, when enabled)
- **NEVER write code during planning** — planning and implementation are separate phases
- **NEVER assume — verify by reading files**
- **ONLY stopping point is plan approval** — everything else is automatic.
  Never ask "Should I fix these?"
- **Re-read plan after user edits** — before asking for approval again
- **Plan file is source of truth** — survives across auto-compaction cycles
- **Quality over speed** — never rush due to context pressure

Prompting technique: Phase boundary enforcement via constraint list. The "ONLY stopping point is plan approval" instruction prevents spurious mid-task approval requests. The hooks.Stop field attaches spec_plan_validator.py to the skill's own Stop event — so even if the skill ends, the validator runs to check plan completeness before the orchestrator proceeds.

09

Uniqueness

claude-codepro (Pilot Shell) — Uniqueness & Positioning

Differs From Seeds

Most similar to superpowers in ambition (comprehensive behavioral framework for Claude Code with TDD, worktree isolation, and subagent review) but substantially more complex in every dimension: Pilot Shell adds 7 bundled MCP servers (vs superpowers' 0), 12 Python lifecycle hooks (vs superpowers' 1 SessionStart hook), a local web dashboard with cloud-assisted spec collaboration (unique in corpus), a pilot bot 24/7 daemon mode, semantic code search (Semble), knowledge graph (CodeGraph), RTK token compression, and a paid license. Unlike spec-kit (9 commands × 9 skills × 18 hooks, closest to parity on hook count), Pilot Shell's hooks are Python scripts rather than simple shell commands, enabling rich state tracking. Unlike claude-flow (MCP-anchored, 305 tools), Pilot Shell's MCP servers are externally-sourced utilities, not a monolithic tool server.

The Kiro Comparison

Of all seeds, Pilot Shell has the most in common with kiro (full IDE integration with spec lifecycle, requirements.md/design.md/tasks.md artifacts, approval gates) — but Pilot Shell is Claude Code native rather than an IDE fork. The cloud spec-collaboration feature (annotate at pilot-shell.com/s/<id>) is the closest functional analogue to Kiro's built-in collaboration UX.

CLAUDE.md vs AGENTS.md Schism — Pilot Shell Position

Pilot Shell ships alongside AGENTS.md support (via setup-rules which can re-export to AGENTS.md). It is Claude Code-first but acknowledges Codex via the Codex plugin integration and optional Codex adversarial review.

Observable Failure Modes

  • Proprietary license is a hard dependency — if polar.sh goes offline or pricing changes, adoption risk
  • git-crypt encrypted launcher means community cannot audit or contribute to the CLI binary
  • 7 MCP servers, Python hooks, Bun, uv all need to be installed and working — complex setup failure surface
  • Cloud spec review (pilot-shell.com/s/<id>) introduces a cloud dependency for a collaboration feature
  • Bot mode complexity — scheduled jobs, heartbeat monitoring, Telegram integration add surface area

Market Positioning

"Other frameworks add complexity (dozens of agents, thousands of lines of config) without meaningfully better output. Pilot Shell is different. Every component solves a real problem with an engineered solution."

Positioned against frameworks perceived as over-engineered (likely claude-flow) while being one of the most feature-rich frameworks in the corpus.

04

Workflow

claude-codepro (Pilot Shell) — Workflow

/spec Workflow (Primary)

Discuss  →  Plan  →  Approve  →  Implement (TDD)  →  Verify  →  Done
                                                        ↑         ↓
                                                        └── Loop──┘

Phase Details

Phase Skill Artifact
Plan spec-plan docs/plans/YYYY-MM-DD-<slug>.md
(optional) Spec Review spec-review subagent JSON findings at output_path
(optional) Codex adversarial spec-review-codex subagent JSON findings
Implement spec-implement Code in git worktree + full test suite
Verify spec-verify E2E test results, review subagent findings, squash merge

Approval Gates

Gate Type When
Plan approval yes-no (user) After spec-plan produces plan file
Codex review enable choice Before spec-plan (Console Settings)
Worktree merge confirm yes-no After spec-verify passes

/fix Workflow (Bugfix)

  1. Investigate bug, write failing test (RED)
  2. Fix at root cause
  3. Single-pass audit via changes-review subagent
  4. Done — no separate verify phase, no plan file

/prd Workflow

  1. Vague problem → clarifying questions → PRD document
  2. Saved to docs/prd/
  3. Reviewable in Console's Requirements tab
  4. Feed to /spec when ready

/setup-rules

  1. Explore codebase, detect conventions
  2. Generate modular rule files
  3. Auto-configure MCP servers
  4. Optional: re-export to AGENTS.md (user consent required)

Quick Mode (default)

No plan, no approval gate. Quality hooks and TDD still enforce. For small tasks and exploration.

Bot Mode (pilot bot)

Persistent background daemon. Scheduled jobs defined in bot-jobs skill. Heartbeat monitoring. Telegram plugin for mobile task dispatch (opt-in).

State Files

File Purpose
docs/plans/YYYY-MM-DD-<slug>.md Spec plan (source of truth across compaction)
docs/prd/ PRD documents
.annotations/<spec>.json Teammate annotation feedback
~/.pilot/ Global Pilot state
06

Memory Context

claude-codepro (Pilot Shell) — Memory & Context

State Storage

Multi-layer memory system:

Layer Mechanism Persistence
Plan files docs/plans/YYYY-MM-DD-<slug>.md Project, survives compaction
PRD documents docs/prd/ Project
Session learnings worker-service.cjs (background) Global (~/.pilot/)
Spec annotations .annotations/<spec>.json Project
Worker state ~/.pilot/ Global

Context Window Management

context_monitor.py runs on every tool use (PostToolUse hook matching all tools) and tracks context window usage. When approaching limits, it triggers managed state transitions.

pre_compact.py saves critical context before Claude's auto-compaction. post_compact_restore.py restores saved context after compaction. spec_handoff_resume.py detects spec-in-progress state at session start and resumes from the plan file.

Cross-Session Handoff

Plan file (docs/plans/YYYY-MM-DD-<slug>.md) is the source of truth. The spec_handoff_resume.py hook runs on UserPromptSubmit and detects if a spec was in progress, injecting the plan context back automatically.

Semantic Memory

  • Semble (MCP server): semantic code search across the codebase — "60-90% cost reduction" via query-based lookup instead of reading large files
  • CodeGraph (MCP server): code knowledge graph for dependency and relationship queries
  • mem-search (local MCP server): persistent memory search via mcp-server.cjs

RTK Token Compression

tool_token_saver.py (PreToolUse hook on Bash) implements RTK compression for Bash command output — reduces token cost of large tool responses.

07

Orchestration

claude-codepro (Pilot Shell) — Orchestration

Multi-Agent

Yes. Pilot Shell uses 5 named subagents with explicit capability constraints.

Orchestration Pattern

Hierarchical: the main agent (running /spec-plan, /spec-implement, /spec-verify) orchestrates specialist subagents at defined checkpoints:

  • spec-review invoked by spec-plan when enabled (plan → review → plan update loop)
  • changes-review invoked by spec-verify (implementation → review → auto-fix loop)
  • Codex subagents (spec-review-codex, changes-review-codex) are opt-in adversarial reviewers using OpenAI Codex

Isolation Mechanism

Git worktree per feature: spec-implement creates an isolated git worktree, implements there, and squash-merges to main on spec-verify success.

Multi-Model

Yes. Pilot Shell is the only framework in this batch that routes different models to different roles:

  • Main agent: Claude (model configured by Claude Code, typically sonnet)
  • Spec-review / changes-review subagents: model: sonnet (specified in agent frontmatter)
  • Codex subagents: OpenAI Codex (via ChatGPT Plus subscription, opt-in)

Model Role Mapping

Role Model
Main planning/implementation Claude (user's subscription model)
Spec review claude-sonnet
Changes review unknown (not specified in public agents)
Adversarial review OpenAI Codex (opt-in)

Execution Modes

Two modes:

  1. Interactive (pilot): Spec-driven development with user approval gates
  2. Background daemon (pilot bot): 24/7 automation, scheduled jobs, heartbeat monitoring

Consensus

None. The Codex subagent provides an independent second opinion but there is no formal consensus mechanism — the main agent decides whether to act on review findings.

Prompt Chaining

Yes. The spec workflow is a classic prompt chain:

  • spec-plan output (plan file) → spec-implement input
  • spec-review output (JSON) → spec-plan amendment
  • spec-implement output (plan with implementation notes) → spec-verify input

Browser Automation

Pilot Shell auto-detects and routes browser automation:

  1. Claude Code Chrome extension (preferred)
  2. Chrome DevTools MCP (fallback)
  3. playwright-cli (fallback)
  4. agent-browser (fallback)
08

Ui Cli Surface

claude-codepro (Pilot Shell) — UI / CLI Surface

CLI Binary

Yes — pilot (and ccp alias).

  • Name: pilot
  • Type: Own runtime (Python launcher, git-crypt encrypted source)
  • Thin wrapper: No — it is a full runtime, not a thin wrapper over claude CLI
  • Subcommands:
    • pilot — launch interactive spec-driven Claude Code session
    • pilot bot — launch 24/7 background automation daemon
    • pilot activate <key> — license activation
    • pilot --version — version

Local Web Dashboard (Console)

Yes — a local web dashboard.

  • Type: Web dashboard
  • Tech stack: Vite + vanilla JS (pre-compiled; UI bundles in pilot/ui/)
  • Port: Not documented publicly (accessible via pilot command)
  • Features:
    • Real-time notifications
    • Spec management (list, view, annotate)
    • Requirements tab (PRD documents)
    • Session management
    • Extensions view (MCP/plugin configuration)
    • Collaborative spec review (annotation panel, author-grouped feedback)
    • Agent monitor / activity log

Cloud-Assisted Feature: Collaborative Spec Review

Specs are shareable at pilot-shell.com/s/<id> — teammates annotate in browser, feedback syncs back to Console via 60-second poll. No Pilot Shell install required for reviewers. Annotations land in .annotations/<spec>.json.

Observability

  • worker-service.cjs records observations on every tool use (PostToolUse async hook)
  • Stop hook runs async summarization
  • Console bell notifications for annotation events
  • No explicit audit log format documented for offline replay

Related frameworks

same archetype · same primary tool · same memory type

Claude-Flow / Ruflo ★ 55k

Eliminates single-agent context limits and sequential bottlenecks by orchestrating fault-tolerant swarms of specialized AI agents…

Hermes Agent (NousResearch) ★ 168k

Self-improving personal AI agent with closed learning loop, 7 terminal backends, and messaging gateway — not tied to any AI…

OpenCode ★ 165k

Terminal-first AI coding agent with multi-model routing, native desktop app, and a typed .opencode/ configuration system for…

OpenHands ★ 75k

Open-source AI software development platform (open-source Devin alternative) with Docker sandbox isolation, 77.6% SWE-bench…

DeerFlow ★ 70k

Long-horizon superagent that researches, codes, and creates by orchestrating parallel sub-agents with isolated contexts in Docker…

oh-my-openagent (omo) ★ 60k

Multi-provider AI agent orchestration for OpenCode: escape vendor lock-in by routing Sisyphus (Claude/Kimi/GLM) and Hephaestus…