Forge (LucasDuys)

forge-lucasduys · LucasDuys/forge · ★ 29 · last commit 2026-05-02

Replace the developer as the state machine by running tasks in isolated git worktrees with TDD, automatic backpropagation of runtime failures into specs, and a stop-hook loop that drives everything.

Best when: A runtime failure is not a bug to fix; it is a spec gap to encode as a new acceptance criterion.
Skip if: You want to skip the interactive brainstorm Q&A, or to write spec files directly without user approval.
vs seeds: spec-driver (both implement the Ralph stop-hook loop with structured spec artifacts and task-by-task execution), but Forge uniquely …
Primitive shape (35 total): Commands 13 · Skills 8 · Subagents 8 · Hooks 6
00

Summary

Forge (LucasDuys) — Summary

Forge is a native Claude Code plugin that turns a one-line idea into reviewed, tested, and committed code through a five-phase autonomous loop: brainstorm → plan → execute → review+verify → backprop. Its most distinctive feature is per-task git worktree isolation: each task runs in .forge/worktrees/{task-id}/ and is squash-merged atomically on success, so a failing task cannot corrupt the branch. The backprop phase is unique in the corpus: when a runtime failure exposes a spec gap, that gap automatically becomes a new acceptance criterion plus a regression test, closing the loop on specification quality. State lives entirely on disk in .forge/ with per-task checkpoints, which makes runs crash-recoverable. The plugin ships 13 slash commands, 8 named agent files, and 6 hooks (Stop, PreToolUse, and PostToolUse) that drive the loop, progress tracking, token monitoring, tool caching, and automatic backpropagation detection. An optional Ably integration enables real-time cross-machine coordination in multiplayer mode.

Compared to seeds, Forge is most similar to spec-driver (both implement the Ralph loop with structured spec artifacts and task-by-task execution) but uniquely differentiates via: (1) per-task git worktree isolation absent from all seeds; (2) a backpropagation phase that feeds runtime failures back into the spec as acceptance criteria; (3) a token budget enforcement mechanism with budget_exhausted fallback; and (4) an R-number verification system (R001–RN each verified at four levels: existence, substantive, wired, runtime).

01

Overview

Forge (LucasDuys) — Overview

Origin

Created by LucasDuys (Lucas Duys) and published on GitHub under the MIT license. Current version: 0.3.0 (745 passing tests according to the README badge). Under active development: last push 2026-05-02, 3 contributors.

Philosophy

From the README:

"The problem: You start a feature in Claude Code. You write the prompt. It writes the code. You review it. You re-prompt. It tries again. It loses context. You re-explain. You watch the 'context: 87%' warning crawl up. You restart. You re-explain again. Three hours in, half a feature done, and you are the one keeping the whole thing from falling apart. You are the project manager. You are the state machine. You are the glue. Forge replaces you as the glue."

The core philosophy is: specifications should be verifiable contracts (R-numbered requirements with four verification levels), and when the code fails to satisfy a requirement at runtime, the failure should feed back into the spec automatically (backprop).

Key Manifesto Excerpts

From the README:

"You describe what you want in one line. Forge writes the spec, plans the tasks, runs them in parallel git worktrees with TDD, reviews the code, verifies it against the acceptance criteria, and commits atomically. You read the diffs in the morning."

From execute.md (v2.1 changes):

"Token budgets are now enforced as hard ceilings, not warnings. Exhaustion transitions to budget_exhausted phase with a handoff doc at .forge/resume.md." "Git worktree isolation per task: each task runs in .forge/worktrees/{task-id}/ and is squash-merged on success." "Lock file acquisition at session start: refuses to run if another valid lock is held (unless stale >5min)."

From execute.md (backprop):

"When a runtime failure exposes a spec gap, the gap becomes a new acceptance criterion + regression test, and the loop resumes."

Target Users

Developers who want full autonomy without losing reviewability. The README pitches this as "Walk away. This is what you actually see while it runs." and shows a live terminal output format in which users can follow task-level progress without interrupting the run.

02

Architecture

Forge (LucasDuys) — Architecture

Distribution

  • Type: Claude Code plugin
  • Install: claude plugin marketplace add LucasDuys/forge then claude plugin install forge@forge-marketplace
  • License: MIT
  • Language: JavaScript (scripts/hooks), Markdown (commands/agents/skills)
  • Version: 0.3.0

Directory Structure

.claude-plugin/         ← Plugin manifest (marketplace.json, plugin.json)
commands/               ← 13 slash commands
  brainstorm.md, plan.md, execute.md, review-branch.md, backprop.md,
  help.md, resume.md, status.md, collaborate.md, update.md, watch.md,
  skills-audit.md, setup-tools.md
  forge-complexity.md, forge-executor.md, forge-planner.md,
  forge-researcher.md, forge-reviewer.md, forge-speccer-validator.md,
  forge-speccer.md, forge-verifier.md, forge-visual-verifier.md
agents/                 ← 8 named agents (inline in commands/ directory)
hooks/
  hooks.json            ← 6 hooks (Stop, PreToolUse, PostToolUse×4)
  stop-hook.sh          ← Autonomous loop driver
  auto-backprop.js      ← Detects runtime failures → triggers backprop
  progress-tracker.js   ← Task progress tracking
  token-monitor.sh      ← Token budget enforcement
  tool-cache.js         ← Pre-read caching
  tool-cache-store.js   ← Cache writes
  output-filter.js      ← PostToolUse output filtering
  test-output-filter.js ← Test result filtering
  brainstorming/        ← Brainstorming skill
  planning/             ← Planning skill
  executing/            ← Execution skill
  reviewing/            ← Review skill
  backpropagation/      ← Backprop skill
  collaborating/        ← Ably multiplayer skill
  design-system/        ← Design system skill
  graphify-integration/ ← Knowledge graph integration
  karpathy-guardrails/  ← Karpathy-style safety checks
  caveman-internal/     ← Internal debug skill
scripts/
  setup.sh              ← Project initialization
  forge-tools.cjs       ← CLI binary (forge-tools)
  forge-wizard.cjs      ← First-run wizard
  forge-status-block.cjs ← Status dashboard renderer

.forge/                 ← Per-project state (created at runtime)
  specs/                ← R-numbered spec files (spec-*.md)
  plans/                ← Task frontiers (*-frontier.md)
  worktrees/            ← Per-task git worktrees (task execution isolation)
  capabilities.json     ← Discovered MCP servers, skills, plugins
  config.json           ← Forge configuration
  resume.md             ← Budget exhaustion handoff document

Required Runtime

  • Claude Code v1.0.33+
  • Node.js (for scripts/forge-tools.cjs)
  • Git (for worktree operations)
  • Optional: Ably account for multiplayer mode

Target AI Tools

  • Claude Code (primary, only supported tool currently)
03

Components

Forge (LucasDuys) — Components

Commands (13)

All commands prefixed /forge:.

Command Purpose
brainstorm Interactive Q&A → R-numbered spec with testable acceptance criteria
plan Spec → dependency-ordered task DAG with token estimates
execute Autonomous implementation loop with worktree isolation, TDD, stop-hook
review-branch Code review pass against spec R-numbers
backprop Feed runtime failure back into spec as new acceptance criterion
status Show current phase, task progress, token budget
resume Resume from last per-task checkpoint
watch Full-screen ANSI status dashboard in separate terminal
collaborate Enable Ably real-time cross-machine coordination
update Update Forge plugin
help Command reference
skills-audit Audit available Claude skills for compatibility
setup-tools Initialize project tools and capabilities

Agents (8, defined as skill files in hooks/)

Agent/Skill Purpose
forge:brainstorming Interactive spec creation (min 3 clarifying questions, 2-3 approach proposals)
forge:planning Task DAG creation from approved specs
forge:executing Task implementation with TDD enforcement
forge:reviewing Code review against R-numbers
forge:backpropagation Failure analysis → spec amendment
forge:collaborating Ably multiplayer coordination
forge:design-system Design system reference integration
forge:graphify-integration Knowledge graph (codebase graph.json) integration

Hooks (6)

Hook Event   Matcher               Script                  Purpose
Stop         *                     stop-hook.sh            Autonomous loop: drives execute iterations
PreToolUse   Bash|Grep|Glob|Read   tool-cache.js           Pre-read cache lookup (avoid redundant reads)
PostToolUse  *                     token-monitor.sh        Token budget monitoring and enforcement
PostToolUse  Bash                  test-output-filter.js   Filter test output for relevant failures
PostToolUse  Bash                  output-filter.js        General output filtering
PostToolUse  Bash                  auto-backprop.js        Detect runtime failures → trigger backprop phase
PostToolUse  Bash|Grep|Glob|Read   tool-cache-store.js     Write to tool cache
PostToolUse  *                     progress-tracker.js     Track per-task step progress

Scripts (CLI)

  • scripts/forge-tools.cjs — CLI binary: node scripts/forge-tools.cjs headless execute --help
  • scripts/forge-wizard.cjs — One-time first-run wizard (token reduction guidance)
  • scripts/forge-status-block.cjs — Status dashboard renderer (called by stop-hook each iteration)

State Files

  • .forge/specs/spec-*.md — R-numbered specs with status: approved gate
  • .forge/plans/*-frontier.md — Task frontier files
  • .forge/worktrees/{task-id}/ — Per-task isolated git worktrees
  • .forge/capabilities.json — Discovered tools (MCP, skills, plugins)
  • .forge/config.json — Configuration (execute.status_header, token budgets, etc.)
  • .forge/resume.md — Budget exhaustion handoff document
05

Prompts

Forge (LucasDuys) — Prompt Excerpts

Excerpt 1: brainstorm.md — Mandatory Q&A + Approach Proposal Pattern

Source: commands/brainstorm.md

## Step 3: Start brainstorming

**IMPORTANT: The brainstorm phase is mandatory before planning and execution.** The spec files
produced by this workflow are the ONLY way to get `status: approved` specs, which are required
by `/forge plan` and `/forge execute`. Do NOT skip this step. Do NOT write spec files directly
without going through the full interactive brainstorm flow.

Now invoke the `forge:brainstorming` skill with the user's arguments, plus any auto-detected
context (design system path, graph summary).

**Workflow enforcement:** The brainstorming skill MUST:
1. Ask clarifying questions (minimum 3, even for simple topics)
2. Present 2-3 approach proposals with trade-offs

Prompting technique: Gate enforcement via metadata (status: approved required by downstream commands). This makes the approval gate verifiable by code rather than by conversational contract.


Excerpt 2: execute.md — Pre-flight Gate Enforcement

Source: commands/execute.md

## Pre-flight Check

**The Forge workflow is strictly sequential: brainstorm -> plan -> execute.** These pre-flight
checks enforce that the prior phases completed correctly.

3. **Verify ALL spec files have `status: approved` in their YAML frontmatter.** Read each
   `spec-*.md` file and parse its frontmatter. If ANY spec does not have `status: approved`,
   stop and tell the user:
   > Unapproved specs found: {list}. The brainstorm workflow must complete with explicit user
   > approval before execution. Run `/forge brainstorm` and approve an approach.

   This is the critical gate that prevents skipping the interactive brainstorm Q&A. A spec only
   gets `status: approved` when the brainstorming skill writes it after the user explicitly
   approves an approach.

Prompting technique: Machine-readable approval gate — the status: approved frontmatter field is parsed programmatically, not checked conversationally. Unusual: the gate is enforced by the receiving command, not by the producing command.
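
The gate described in this excerpt can be sketched in a few lines of Node.js. This is a minimal illustration, assuming each spec starts with a `---`-delimited frontmatter block containing a flat `status:` key; the function names (`parseFrontmatter`, `findUnapprovedSpecs`) and the simple line-based parser are hypothetical and are not taken from Forge's code.

```javascript
// Hypothetical sketch: names and parsing strategy are assumptions, not Forge's code.

// Extract flat `key: value` pairs from a leading `---` frontmatter block.
function parseFrontmatter(text) {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(text);
  if (!match) return {};
  const fields = {};
  for (const line of match[1].split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return fields;
}

// Return the names of specs whose frontmatter lacks `status: approved`.
function findUnapprovedSpecs(specs) {
  return Object.entries(specs)
    .filter(([, text]) => parseFrontmatter(text).status !== "approved")
    .map(([name]) => name);
}

// Example input: file name -> file contents (made-up specs).
const exampleSpecs = {
  "spec-auth.md": "---\nstatus: approved\n---\n# Auth\n",
  "spec-billing.md": "---\nstatus: draft\n---\n# Billing\n",
};

const unapproved = findUnapprovedSpecs(exampleSpecs);
if (unapproved.length > 0) {
  console.log(`Unapproved specs found: ${unapproved.join(", ")}`);
}
```

Because the check reads a field rather than the conversation history, a spec written by hand without the `status: approved` line would be rejected by the consumer regardless of what the producing step claimed.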


Excerpt 3: execute.md — v2.1 Feature Block (Worktree Isolation)

Source: commands/execute.md

## New in v2.1

- **Token budgets** are now enforced as hard ceilings, not warnings. Exhaustion transitions to
  `budget_exhausted` phase with a handoff doc at `.forge/resume.md`.
- **Git worktree isolation** per task: each task runs in `.forge/worktrees/{task-id}/` and is
  squash-merged on success.
- **Lock file acquisition** at session start: refuses to run if another valid lock is held
  (unless stale >5min).
- **Per-task checkpoints** at each step: resume picks up exactly where the last checkpoint
  was written.
- **Headless mode** for CI: see `node scripts/forge-tools.cjs headless execute --help`

Prompting technique: Changelog-in-prompt — the "New in v2.1" block inside the command prompt ensures Claude knows which behaviors are current, preventing it from describing deprecated behavior.


Excerpt 4: execute.md — Live Status Header

Source: commands/execute.md

**Live status header.** Every iteration of `/forge:execute` is automatically prefixed with a
compact dashboard-style status block (phase, current task + step, agent, progress bar, tokens,
per-task budget, lock status). You see what forge is doing without leaving Claude Code or running
a separate command. The block is generated by `scripts/forge-status-block.cjs`, called from the
Stop hook on every iteration. Opt out by setting `execute.status_header: false` in
`.forge/config.json`.

Prompting technique: Observability contract defined in the prompt — the status block is mandatory behavior described in the command, with an explicit opt-out path.

09

Uniqueness

Forge (LucasDuys) — Uniqueness

Differs From Seeds

Forge is most similar to spec-driver (both implement the Ralph loop with structured spec artifacts and task-by-task execution with a stop-hook loop) but adds three capabilities absent from all 11 seeds: (1) per-task git worktree isolation — each task executes in .forge/worktrees/{task-id}/ and squash-merges atomically, making it the only framework in this batch that completely isolates task execution at the filesystem level; (2) automatic backpropagation — the auto-backprop.js PostToolUse hook detects runtime failures and feeds them back into the spec as new R-numbered acceptance criteria plus regression tests, closing the spec→code→spec loop; and (3) a token budget hard ceiling with graceful budget_exhausted fallback state. The R-number verification system (four levels: existence, substantive, wired, runtime) is also more rigorous than any seed's verification approach.

Positioning

  • "One idea in. Tested, reviewed, committed code out."
  • For teams willing to invest in the brainstorm interactive Q&A in exchange for fully autonomous execution
  • More opinionated than Smart Ralph (hard sequential gates, status: approved contract)
  • Less broad tool support than GSD (Claude Code only vs GSD's 9+ tools)

Notable Patterns

  1. status: approved frontmatter gate: The brainstorm output is only usable by execute if its YAML frontmatter contains status: approved — a machine-verifiable approval contract
  2. Changelog-in-prompt: "New in v2.1" block inside execute.md ensures Claude doesn't describe deprecated behavior
  3. Karpathy guardrails skill: Named after Andrej Karpathy's code quality philosophy — the only framework in the corpus with a named-after-a-researcher guardrails component
  4. gsd-nyquist-auditor parallel: Like GSD's Nyquist auditor, Forge uses named-pattern components from other disciplines

Observable Failure Modes

  1. Worktree accumulation: Failed tasks leave orphaned worktrees in .forge/worktrees/; no automatic cleanup on failure
  2. Brainstorm gate friction: The mandatory 3+ question interactive Q&A makes quick one-off tasks feel heavyweight
  3. Claude Code only: No cross-tool portability
  4. Auto-backprop false positives: Heuristic failure detection may trigger backprop for non-spec-gap failures (e.g., environment issues, transient errors)

Cross-References

  • References Karpathy guardrails and graphify-integration (the same knowledge graph pattern used in GSD's gsd-graphify-update.sh)
  • The design-system skill integrates with DESIGN.md conventions, suggesting awareness of design-system-first workflows
04

Workflow

Forge (LucasDuys) — Workflow

Phases

brainstorm → plan → execute → [review + verify] → [backprop → execute again]

Phase       Command                                              Agent                   Artifact                                       Approval Gate
Brainstorm  /forge:brainstorm [idea]                             forge:brainstorming     .forge/specs/spec-*.md with status: approved   Yes (user explicitly approves approach)
Plan        /forge:plan                                          forge:planning          .forge/plans/*-frontier.md                     No (automatic after specs approved)
Execute     /forge:execute [--autonomy full|gated|supervised]    forge:executing
Review      /forge:review-branch                                 forge:reviewing         review notes                                   Yes (human reviews)
Verify      (part of execute phase)                              forge:verifier          R-number verification at 4 levels              Implicit
Backprop    /forge:backprop (or auto-triggered)                  forge:backpropagation   amended spec + regression test                 Automatic

Brainstorm Phase

  1. First-run wizard runs (once per install)
  2. Project capabilities are discovered (MCP servers, skills, plugins → capabilities.json)
  3. Design system and knowledge graph auto-detected
  4. Agent asks minimum 3 clarifying questions, presents 2-3 approach proposals with tradeoffs
  5. User approves one approach
  6. Spec written with R-numbered requirements (R001, R002, ...)
  7. Spec gets status: approved in YAML frontmatter — required gate for execution

Execute Phase — Autonomy Levels

  • --autonomy full: no per-task gates, loop runs until ALL_TASKS_COMPLETE
  • --autonomy gated: pauses after each task for review
  • --autonomy supervised: pauses at each step

Execute Phase — Pre-flight Checks

Before execution, the command verifies:

  1. .forge/ exists
  2. Spec files exist and ALL have status: approved
  3. Frontier files exist for each spec
  4. No Ralph Loop conflict (.claude/ralph-loop.local.md)
  5. No stale lock file held by another session

Verification (R-Number System)

Each requirement R001–RN is verified at four levels:

  1. Existence: the feature code exists
  2. Substantive: the code actually implements the requirement
  3. Wired: the feature is connected to the rest of the system
  4. Runtime: the feature works end-to-end at runtime
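
One way to picture the four levels is as an ordered checklist per requirement. The sketch below is illustrative only: the record shape, the `firstFailingLevel` name, and the idea that levels are evaluated strictly in order are assumptions, not something the Forge source confirms.

```javascript
// Hypothetical sketch: the record shape and function name are assumptions.
const LEVELS = ["existence", "substantive", "wired", "runtime"];

// Return the first level a requirement fails, or null if all four pass.
// Levels are checked in order, so a later level only counts if earlier ones passed.
function firstFailingLevel(results) {
  for (const level of LEVELS) {
    if (results[level] !== true) return level;
  }
  return null;
}

// Made-up verification report for two requirements.
const exampleReport = {
  R001: { existence: true, substantive: true, wired: true, runtime: true },
  R002: { existence: true, substantive: true, wired: false, runtime: false },
};

for (const [id, results] of Object.entries(exampleReport)) {
  const failed = firstFailingLevel(results);
  console.log(failed ? `${id}: failed at ${failed}` : `${id}: verified`);
}
```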

Backprop Phase

When a runtime verification fails:

  1. Either auto-backprop.js (PostToolUse hook) detects the failure pattern, or the user runs /forge:backprop manually
  2. The agent identifies the spec gap
  3. The gap becomes a new acceptance criterion in the spec
  4. A regression test is written
  5. The loop resumes from the failing task
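
The spec-amendment step can be sketched as a pure function that appends a new R-numbered requirement. The spec object shape, the `source: "backprop"` marker, and both helper names are hypothetical; Forge stores specs as Markdown files, so this only illustrates the numbering and append logic.

```javascript
// Hypothetical sketch: the spec shape and helper names are assumptions.

// Compute the next free R-number, e.g. ["R001", "R002"] -> "R003".
function nextRNumber(existingIds) {
  const max = existingIds
    .map((id) => Number.parseInt(id.slice(1), 10))
    .reduce((a, b) => Math.max(a, b), 0);
  return `R${String(max + 1).padStart(3, "0")}`;
}

// Return a copy of the spec with the gap recorded as a new requirement.
function amendSpec(spec, gapDescription, regressionTest) {
  const id = nextRNumber(spec.requirements.map((r) => r.id));
  return {
    ...spec,
    requirements: [
      ...spec.requirements,
      { id, criterion: gapDescription, regressionTest, source: "backprop" },
    ],
  };
}

// Made-up spec and gap.
const exampleSpec = { requirements: [{ id: "R001" }, { id: "R002" }] };
const amendedSpec = amendSpec(
  exampleSpec,
  "Reject empty passwords",
  "test/auth-empty-password.test.js"
);
console.log(amendedSpec.requirements[amendedSpec.requirements.length - 1].id);
```

Returning a new object instead of mutating the input keeps the pre-failure spec available for comparison, which is convenient when a human later reviews what backprop added.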

Token Budget Enforcement

  • token-monitor.sh tracks per-task token usage
  • Hard ceiling: when exhausted, transitions to budget_exhausted phase
  • Writes .forge/resume.md with state for handoff to next session
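
A hard ceiling of this kind reduces to a comparison plus a phase transition. The sketch below assumes a hypothetical state shape and resume layout; the real token-monitor.sh is a shell script and its accounting is not shown in the source.

```javascript
// Hypothetical sketch: field names and the resume document layout are assumptions.

// Decide the next phase for a task given its token usage and hard ceiling.
// At or above the ceiling, the phase becomes "budget_exhausted" and a handoff
// document is produced for the next session.
function checkBudget(state, tokensUsed, ceiling) {
  if (tokensUsed < ceiling) {
    return { phase: state.phase, resumeDoc: null };
  }
  const resumeDoc = [
    "# Resume",
    `phase: ${state.phase}`,
    `task: ${state.taskId}`,
    `remaining: ${state.remainingTasks.join(", ")}`,
  ].join("\n");
  return { phase: "budget_exhausted", resumeDoc };
}

// Made-up state: the task has reached its ceiling.
const exampleState = { phase: "execute", taskId: "T2", remainingTasks: ["T2", "T3"] };
const budgetResult = checkBudget(exampleState, 50000, 50000);
console.log(budgetResult.phase);
```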
06

Memory Context

Forge (LucasDuys) — Memory & Context

State Storage

All state is file-based in .forge/ at the project root.

Spec Files (.forge/specs/spec-*.md)

  • YAML frontmatter includes status: draft | approved
  • R-numbered requirements (R001, R002, ...) with acceptance criteria
  • Source-of-truth for what must be verified

Frontier Files (.forge/plans/*-frontier.md)

  • Task DAGs derived from approved specs
  • Each task has: ID, description, dependencies, token estimate, status
  • Status values: pending | in_progress | tests_written | tests_passing | complete
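
Selecting the next runnable work from such a DAG amounts to filtering for pending tasks whose dependencies are all complete. This is a minimal sketch with a hypothetical in-memory task shape; Forge's frontier files are Markdown, and their parsing is not shown here.

```javascript
// Hypothetical sketch: the task object shape is an assumption.

// Return pending tasks whose dependencies are all complete.
function readyTasks(tasks) {
  const done = new Set(
    tasks.filter((t) => t.status === "complete").map((t) => t.id)
  );
  return tasks.filter(
    (t) => t.status === "pending" && t.dependencies.every((d) => done.has(d))
  );
}

// Made-up frontier: T2 depends on T1, T3 depends on T2.
const exampleFrontier = [
  { id: "T1", dependencies: [], status: "complete" },
  { id: "T2", dependencies: ["T1"], status: "pending" },
  { id: "T3", dependencies: ["T2"], status: "pending" },
];
console.log(readyTasks(exampleFrontier).map((t) => t.id));
```

Tasks returned together have no unmet dependencies on each other, which is what would make them candidates for parallel worktrees.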

Worktrees (.forge/worktrees/{task-id}/)

  • Each task gets an isolated git worktree
  • Task commits are squash-merged on success
  • Worktrees are removed after a successful merge (worktrees of failed tasks are not cleaned up automatically)

Configuration (.forge/config.json)

  • execute.status_header: enable/disable status block
  • Token budget settings per task type
  • Autonomy level defaults

Resume Handoff (.forge/resume.md)

  • Written when token budget is exhausted
  • Contains current phase, task index, remaining tasks
  • Used by /forge:resume to pick up where the last session ended

Capabilities (.forge/capabilities.json)

  • Written by forge-tools.cjs discover during brainstorm initialization
  • Lists available MCP servers, Claude skills, and plugins

Persistence

  • Scope: project-level, git-tracked
  • Per-task checkpoints written at every step (not just task completion)
  • State survives token budget exhaustion via .forge/resume.md

Context Compaction

  • No built-in compaction
  • The tool cache (tool-cache.js / tool-cache-store.js hooks) deduplicates repeated file reads within a session to reduce token usage
  • Token monitor enforces hard ceilings, transitioning to budget_exhausted state when exceeded
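
The read-deduplication idea can be illustrated with a small in-memory cache keyed on the tool name and its input. Everything here is an assumption made for illustration (the class, the hashing scheme, the in-memory Map); the actual hooks may key, persist, and invalidate entries differently.

```javascript
// Hypothetical sketch: keying and API are assumptions, not the hooks' actual logic.
const { createHash } = require("node:crypto");

class ToolCache {
  constructor() {
    this.entries = new Map();
  }

  // Key on the tool name plus its JSON-serialized input.
  // Note: JSON.stringify is sensitive to property order.
  static key(tool, input) {
    return createHash("sha256")
      .update(tool + "\0" + JSON.stringify(input))
      .digest("hex");
  }

  // PreToolUse side: return a cached result, or undefined on a miss.
  lookup(tool, input) {
    return this.entries.get(ToolCache.key(tool, input));
  }

  // PostToolUse side: remember the result of a completed call.
  store(tool, input, output) {
    this.entries.set(ToolCache.key(tool, input), output);
  }
}

const exampleCache = new ToolCache();
exampleCache.store("Read", { file_path: "src/app.js" }, "console.log('hi');");
console.log(exampleCache.lookup("Read", { file_path: "src/app.js" }) !== undefined);
```

A production cache would also need invalidation when a file changes between reads; that is deliberately left out of this sketch.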

Cross-Session Handoff

  • /forge:resume reads .forge/resume.md and restarts from the last checkpoint
  • Per-task checkpoints mean resume is granular (step-level, not just task-level)
  • Lock file mechanism prevents concurrent session conflicts (5-minute stale timeout)
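
The stale-lock rule can be expressed as a single predicate. The lock record fields (`sessionId`, `updatedAtMs`) and the function name are hypothetical; only the five-minute threshold comes from the execute.md excerpt.

```javascript
// Hypothetical sketch: the lock record shape is an assumption.
const STALE_AFTER_MS = 5 * 60 * 1000; // "stale >5min" per the execute.md excerpt

// A session may proceed if there is no lock, it already owns the lock,
// or the existing lock has not been refreshed for more than five minutes.
function canAcquireLock(lock, sessionId, nowMs) {
  if (!lock) return true;
  if (lock.sessionId === sessionId) return true;
  return nowMs - lock.updatedAtMs > STALE_AFTER_MS;
}

// Made-up lock held by another session, refreshed six minutes ago.
const exampleNow = Date.UTC(2026, 4, 2, 12, 0, 0);
const exampleLock = { sessionId: "session-a", updatedAtMs: exampleNow - 6 * 60 * 1000 };
console.log(canAcquireLock(exampleLock, "session-b", exampleNow));
```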
07

Orchestration

Forge (LucasDuys) — Orchestration

Multi-Agent Pattern

Yes — task-decomposition-tree with isolated worktree execution per task.

  • Orchestration pattern: task-decomposition-tree (spec → frontier → task-level workers)
  • Coordinator: execute command + stop-hook loop
  • Workers: forge:executing agent instances, one per task, each in its own git worktree

Worktree Isolation

Each task runs in .forge/worktrees/{task-id}/, a separate git worktree created from the main branch (a worktree is an additional checkout of the same repository, not a clone). On task success, the task's commits are squash-merged atomically back into the main branch. This is the strongest isolation mechanism in this batch; the only seed-adjacent pattern is in vibe-kanban.
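
The lifecycle of one task's worktree can be sketched as three lists of git commands. The git subcommands used (`worktree add -b`, `merge --squash`, `worktree remove`, `branch -D`) are standard git, but the branch naming scheme, the commit message, and the exact sequence are assumptions rather than Forge's actual invocations.

```javascript
// Hypothetical sketch: branch naming and command order are assumptions.

// Build the git command lists for one task's worktree lifecycle.
function worktreePlan(taskId, baseBranch) {
  const path = `.forge/worktrees/${taskId}`;
  const branch = `forge/${taskId}`;
  return {
    // Create an isolated checkout on a new branch starting at baseBranch.
    start: [["git", "worktree", "add", "-b", branch, path, baseBranch]],
    // On success: stage the squashed changes on baseBranch, commit once, clean up.
    success: [
      ["git", "checkout", baseBranch],
      ["git", "merge", "--squash", branch],
      ["git", "commit", "-m", `forge: ${taskId}`],
      ["git", "worktree", "remove", path],
      ["git", "branch", "-D", branch],
    ],
    // Manual cleanup after a failure; baseBranch is never touched.
    // (The Observable Failure Modes section notes Forge does not do this itself.)
    failure: [
      ["git", "worktree", "remove", "--force", path],
      ["git", "branch", "-D", branch],
    ],
  };
}

const examplePlan = worktreePlan("T2", "main");
console.log(examplePlan.start[0].join(" "));
```

Each inner array could be passed to `child_process.execFileSync(cmd[0], cmd.slice(1))`. Because `merge --squash` only stages changes and the commit happens as one separate step, the base branch sees either the whole task or nothing.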

Stop-Hook Autonomous Loop

The Stop hook fires stop-hook.sh after every iteration. The script:

  1. Renders the status block via forge-status-block.cjs
  2. Reads the frontier to determine next pending task
  3. If --autonomy full: re-invokes execute for the next task
  4. If budget exhausted: writes .forge/resume.md and halts
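
The decision the script makes each iteration can be sketched as a pure function. The state fields are hypothetical, and the treatment of non-full autonomy levels (simply letting the session stop) is a simplifying assumption. The `{ decision: "block", reason }` object follows Claude Code's hook output convention for a Stop hook that wants the session to continue; stop-hook.sh itself is a shell script and may be structured quite differently.

```javascript
// Hypothetical sketch: state fields and the handling of gated/supervised
// modes are assumptions, not Forge's actual stop-hook logic.

function stopHookDecision(state) {
  // Budget exhausted: halt and ask the caller to write .forge/resume.md.
  if (state.budgetExhausted) {
    return { halt: true, writeResume: true, output: {} };
  }
  // No pending work left: let the session stop normally.
  const next = state.tasks.find((t) => t.status === "pending");
  if (!next) {
    return { halt: true, writeResume: false, output: {} };
  }
  // Anything other than full autonomy pauses for the user.
  if (state.autonomy !== "full") {
    return { halt: true, writeResume: false, output: {} };
  }
  // Full autonomy: block the stop and hand the next task back to the session.
  return {
    halt: false,
    writeResume: false,
    output: { decision: "block", reason: `Continue with task ${next.id}` },
  };
}

// Made-up state: one task done, one pending, full autonomy.
const exampleLoopState = {
  autonomy: "full",
  budgetExhausted: false,
  tasks: [
    { id: "T1", status: "complete" },
    { id: "T2", status: "pending" },
  ],
};
console.log(JSON.stringify(stopHookDecision(exampleLoopState).output));
```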

Backprop Mechanism

The auto-backprop.js PostToolUse hook monitors Bash outputs:

  • Detects runtime failure patterns
  • If a failure matches a spec gap heuristic: triggers the backprop phase
  • Backprop phase: gap → new R-number → regression test → loop resumes

This creates a closed-loop feedback from runtime to spec — no other framework in this batch does this automatically.
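
A detection heuristic of this kind might look like the following classifier over Bash output. Every pattern below is an illustrative assumption, not taken from auto-backprop.js; the point is that separating environment problems from test or runtime failures is one way to limit the false positives noted under Observable Failure Modes.

```javascript
// Hypothetical sketch: every pattern below is an illustrative assumption.
const ENVIRONMENT_PATTERNS = [
  /ECONNREFUSED/,
  /ETIMEDOUT/,
  /ENOSPC/,
  /command not found/,
];
const FAILURE_PATTERNS = [
  /AssertionError/,
  /TypeError/,
  /^\s*FAIL\b/m,
  /Unhandled.*Rejection/i,
];

// Classify output as an environment problem, a possible spec gap, or ok.
// Environment patterns are checked first so that transient errors do not
// trigger a spec amendment.
function classifyBashOutput(output) {
  if (ENVIRONMENT_PATTERNS.some((p) => p.test(output))) return "environment";
  if (FAILURE_PATTERNS.some((p) => p.test(output))) return "possible_spec_gap";
  return "ok";
}

console.log(classifyBashOutput("FAIL src/auth.test.js\nAssertionError: expected 401"));
```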

Multiplayer (Optional)

The collaborate command + forge:collaborating skill enable real-time cross-machine coordination via Ably:

  • Sub-second coordination between multiple Claude Code instances
  • Documented in docs/collaborate.md

Execution Mode

  • --autonomy full: continuous-ralph — stop-hook drives loop without interruption
  • --autonomy gated: interactive-loop — pause after each task
  • --autonomy supervised: interactive-loop — pause after each step

Multi-Model

No multi-model routing. The planning phase labels tasks with model hints ([haiku, quick] vs [sonnet, standard]) visible in the README terminal output, but this is informational metadata — there is no programmatic routing to different models from the framework.

Consensus

None. The forge:reviewing agent reviews code, but it's a single-pass review, not consensus.

Prompt Chaining

Yes — brainstorm produces spec → plan produces frontier → execute consumes frontier tasks → verify checks R-numbers → backprop amends spec → loop.

08

UI / CLI Surface

Forge (LucasDuys) — UI / CLI Surface

CLI Binary

Yes — forge-tools (scripts/forge-tools.cjs):

  • node scripts/forge-tools.cjs discover --forge-dir .forge — Discovers capabilities
  • node scripts/forge-tools.cjs headless execute --help — Headless CI mode
  • Called internally by brainstorm and execute commands

Also:

  • forge-wizard.cjs — One-time first-run token reduction wizard
  • forge-status-block.cjs — Status block renderer (called by stop-hook each iteration)

Local UI

  • No web dashboard
  • /forge:watch command: a full-screen ANSI dashboard in a separate terminal window (not a web UI, but the richest in-terminal observability in this batch)
  • Status block: every execute iteration prints a compact dashboard with phase, current task, step, progress bar, tokens, per-task budget, lock status

IDE Integration

Claude Code only. Installed as a native Claude Code plugin.

Observability

  • Live status block: rendered every iteration by the stop-hook (opt-out via config)
  • /forge:watch: separate terminal full-screen ANSI dashboard
  • /forge:status: current phase, task progress, token budget
  • Per-task checkpoints: written at every implementation step
  • .forge/resume.md: budget exhaustion handoff document

Headless / CI Mode

node scripts/forge-tools.cjs headless execute --help

Non-interactive execution with exit codes, suitable for CI pipelines.

Multiplayer Observability

When Ably is connected (/forge:collaborate):

  • Real-time task status visible across machines
  • Sub-second coordination for distributed teams
