sourcebook

sourcebook · maroondlabs/sourcebook · ★ 3 · last commit 2026-04-16

Generates AI context files from codebase git history and import graph, capturing hidden co-change couplings and hub file blast radius that agents consistently miss.

Best when: Auto-generated obvious context hurts agent performance (ETH Zurich). Context files should contain only non-discoverable information, placed at the beginning …
Skip if: Including discoverable patterns in CLAUDE.md (standard test frameworks, obvious directories); placing critical constraints in the middle of context files
vs seeds: spec-kit's PostToolUse hooks but checks change completeness rather than test passage. The research-backed CLAUDE.md generation (…
Primitive shape: 5 total (Hooks: 1, MCP tools: 4)

sourcebook — Summary

sourcebook is an npm CLI tool and MCP server that generates AI context files (CLAUDE.md, AGENTS.md, Cursor rules) from a codebase's actual git history and import graph, capturing conventions, constraints, and hidden couplings that agents consistently miss. It operates in two layers. Layer A (rules-based, <1 second, zero cost) mines git co-change history and static import graphs to identify test file mappings, sibling modules, and hub files with high fan-in. Layer B (AI-powered, ~$0.012/run) uses Claude Sonnet to catch semantic cross-module dependencies invisible to static analysis. The sourcebook check command flags files that should have been changed alongside a given diff, acting as a completeness gate for AI agent edits. The project has 3 stars and a BSL-1.1 license (converts to MIT in 2030). Given its research-backed design and its reported accuracy (0% false positives on clean diffs), it is the most rigorous "context quality" tool in the corpus.

Compared to seeds: no seed generates CLAUDE.md from codebase analysis. The closest seed is agent-os (which ships conventions as pre-written markdown) but agent-os's content is generic templates; sourcebook generates project-specific content from actual git history. The sourcebook check completeness gate is architecturally similar to spec-kit's PostToolUse hooks that run tests after edits, but applied to change completeness rather than test passage.

sourcebook — Overview

Origin

Created by maroondlabs (GitHub). Under active development as of May 2026 (v0.14.0). Published on npm as sourcebook. BSL-1.1 licensed (source-available, converts to MIT on 2030-03-25). Available on the official MCP registry.

Philosophy

From README:

"Catches the files your AI agent forgot to change."

"A safety layer for code changes. sourcebook analyzes git diffs for completeness — flags files that should've been modified but weren't. Rules-based structural detection plus AI-powered semantic analysis. Zero false positives on clean diffs."

Research foundation:

Design implication: sourcebook generates ONLY non-discoverable information. Standard patterns (test frameworks, obvious directories) are stripped unless --verbose is passed.

Generator Output Philosophy

From source code comment:

/**
 * Design principles (from research):
 * 1. ONLY non-discoverable information (ETH Zurich: auto-generated obvious context hurts by 2-3%)
 * 2. Context-rot-aware formatting (Chroma Research: 30%+ accuracy drop for info in the middle)
 *    → Critical info at BEGINNING and END of file
 * 3. Karpathy's program.md pattern: constraints, gotchas, and autonomy boundaries
 */

Key Stats

| Metric | Result |
| --- | --- |
| Completeness gate accuracy | 100% (30/30 diffs) |
| False positive rate | 0% on clean diffs |
| Test file detection | 73% |
| Sibling detection | 71% |
| AI analysis cost | ~$0.012/run |

sourcebook — Architecture

Distribution

  • Type: npm-package + MCP server
  • Binary: sourcebook
  • Version: 0.14.0
  • Install: npx sourcebook init (zero global install required)

Runtime Requirements

  • Node.js
  • Git (for co-change analysis)
  • ANTHROPIC_API_KEY (optional, for --ai flag)

Layer Architecture

Layer A — Rules-based (no LLM, <1 second)
├── Co-change analysis (git history mining)
│   └── Files that change together >= threshold times
├── Test file detection (naming conventions + co-change)
├── Import graph (dependency graph via static analysis)
│   └── PageRank for hub file importance ranking
└── Hub detection (files with 50+ dependents)

Layer B — AI-powered (~$0.012/run)
└── Claude Sonnet semantic analysis
    ├── Cross-module semantic relationships
    ├── Field renames requiring migrations
    └── Stale validation logic
    ✓ Requires dependency citation — no hallucinated paths
    ✓ Completeness gate: silent if diff is actually clean
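
The co-change step in Layer A can be illustrated with a small sketch. It assumes each commit has already been parsed into the list of paths it touched (for example from `git log --name-only`) and counts how often every unordered pair of files appears in the same commit. The type names, the function name, and the `minCoCommits` parameter are hypothetical; this is not sourcebook's actual implementation.

```typescript
// Sketch only: the types and function name are hypothetical, not sourcebook's API.
type Commit = string[]; // paths touched by one commit

interface CoChangePair {
  a: string;
  b: string;
  coCommits: number;
}

const SEP = "\u0000"; // NUL cannot appear in a file path, so it is a safe key separator

function mineCoChangePairs(commits: Commit[], minCoCommits: number): CoChangePair[] {
  const counts = new Map<string, number>();
  for (const files of commits) {
    // Deduplicate and sort so each unordered pair maps to exactly one key.
    const unique = Array.from(new Set(files)).sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = unique[i] + SEP + unique[j];
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }
  const pairs: CoChangePair[] = [];
  counts.forEach((coCommits, key) => {
    if (coCommits >= minCoCommits) {
      const [a, b] = key.split(SEP);
      pairs.push({ a, b, coCommits });
    }
  });
  // Strongest couplings first.
  return pairs.sort((x, y) => y.coCommits - x.coCommits);
}
```

Counting pairs is quadratic in the number of files per commit, so a real miner would probably skip very large commits (bulk renames, formatting sweeps) before counting.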

Source Structure

src/
├── cli.ts             # Entry point
├── commands/
│   ├── check.ts       # Completeness analysis
│   ├── init.ts        # Claude Code hooks + CLAUDE.md
│   ├── ask.ts         # NL query
│   ├── serve.ts       # MCP server
│   ├── truth.ts       # Repo Truth Map
│   ├── watch.ts       # Auto-regenerate on changes
│   └── ...
├── scanner/
│   ├── index.ts       # Two-pass scanner (structural first, git second)
│   ├── graph.ts       # Import graph + PageRank
│   └── git.ts         # Co-change analysis
└── generators/
    ├── claude.ts      # CLAUDE.md generator
    ├── agents.ts      # AGENTS.md generator
    ├── cursor.ts      # Cursor rules generator
    └── truth.ts       # 2.5D visualization
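
Since graph.ts is described as "Import graph + PageRank", a minimal sketch of textbook PageRank over an import graph may help. It assumes edges point from the importing file to the imported file, so rank accumulates on heavily imported hub files. The `ImportGraph` type, the function name, and the default damping factor of 0.85 are assumptions for illustration, not details taken from the sourcebook source.

```typescript
// Sketch only: standard power-iteration PageRank, not sourcebook's actual code.
// graph[importer] lists the files that importer imports.
type ImportGraph = Record<string, string[]>;

function pageRank(graph: ImportGraph, damping = 0.85, iterations = 50): Map<string, number> {
  // Collect every file that appears as an importer or as an import target.
  const nodes = new Set<string>(Object.keys(graph));
  Object.keys(graph).forEach((importer) => graph[importer].forEach((d) => nodes.add(d)));
  const list = Array.from(nodes);
  const n = list.length;

  let rank = new Map<string, number>();
  list.forEach((node) => rank.set(node, 1 / n));

  for (let it = 0; it < iterations; it++) {
    const next = new Map<string, number>();
    list.forEach((node) => next.set(node, (1 - damping) / n));
    let dangling = 0; // rank held by files that import nothing
    for (const node of list) {
      const out = graph[node] ?? [];
      const r = rank.get(node) ?? 0;
      if (out.length === 0) {
        dangling += r;
        continue;
      }
      for (const target of out) {
        next.set(target, (next.get(target) ?? 0) + (damping * r) / out.length);
      }
    }
    // Spread dangling rank evenly so the scores keep summing to 1.
    list.forEach((node) => next.set(node, (next.get(node) ?? 0) + (damping * dangling) / n));
    rank = next;
  }
  return rank;
}
```

Compared with a plain fan-in count, PageRank also rewards files imported by other important files, which is one plausible reason to use it for hub ranking.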

Language Support

| Language | Import Graph | Git Analysis |
| --- | --- | --- |
| TypeScript/JavaScript | Full | Full |
| Python | Full | Full |
| Go | Full | Full |
| Rust | Full | Partial |

sourcebook — Components

CLI Binary: sourcebook

| Command | Description |
| --- | --- |
| sourcebook check | Analyze current diff for completeness |
| sourcebook check --ai | Add AI semantic analysis (requires ANTHROPIC_API_KEY) |
| sourcebook check --quiet | Exit code only: 1 if findings, 0 if clean |
| sourcebook check --json | Structured JSON output |
| sourcebook check --branch main | Compare HEAD against a branch |
| sourcebook check --threshold 0.9 | Custom co-change coupling threshold (0-1) |
| sourcebook init | Set up Claude Code hooks + generate CLAUDE.md |
| sourcebook scan-history | Retrospective scan of recent commits |
| sourcebook hooks | Install or check Claude Code hooks |
| sourcebook truth | Generate a Repo Truth Map (2.5D visualization) |
| sourcebook serve | Start MCP server |
| sourcebook update | Re-analyze while preserving manual edits |
| sourcebook diff | Show what would change (exit code 1 if changes) |
| sourcebook watch | Auto-regenerate context files on source changes |
| sourcebook ask <query> | Query codebase knowledge in NL |

MCP Server

  • sourcebook serve starts the MCP server
  • Published on official MCP registry
  • Tools exposed:
    • get_blast_radius — what breaks if you edit a hub file
    • query_conventions — confirm the right pattern before adding code
    • get_pressing_questions — briefing before editing an unfamiliar file
    • get_co_change_pairs — files that historically change together
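
To make the idea behind get_blast_radius concrete, here is a sketch of one way to compute "what breaks if you edit a file": invert the import graph and walk it breadth-first to collect every direct and transitive dependent. The `ImportGraph` shape and the `blastRadius` function are hypothetical; the source does not document how the MCP tool computes its answer.

```typescript
// Sketch only: a hypothetical helper, not the MCP tool's real implementation.
// graph[importer] lists the files that importer imports.
type ImportGraph = Record<string, string[]>;

function blastRadius(graph: ImportGraph, target: string): string[] {
  // Invert the edges: for each file, who imports it?
  const dependents = new Map<string, string[]>();
  for (const importer of Object.keys(graph)) {
    for (const imported of graph[importer]) {
      const list = dependents.get(imported) ?? [];
      list.push(importer);
      dependents.set(imported, list);
    }
  }

  // Breadth-first walk over dependents, direct and transitive.
  const seen = new Set<string>();
  const queue: string[] = [target];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dep of dependents.get(current) ?? []) {
      if (dep !== target && !seen.has(dep)) {
        seen.add(dep);
        queue.push(dep);
      }
    }
  }
  return Array.from(seen).sort();
}
```

The `seen` set also makes the walk safe on circular imports, which are common in real codebases.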

Generators (4)

| Generator | Output |
| --- | --- |
| claude.ts | CLAUDE.md |
| agents.ts | AGENTS.md |
| cursor.ts | Cursor rules |
| truth.ts | Repo Truth Map (2.5D visualization) |

Claude Code Hooks (installed by sourcebook init)

  • PostToolUse hook that triggers completeness check after file edits
  • Writes findings to context before the agent proceeds

sourcebook — Prompts

Excerpt 1: Generated CLAUDE.md (from sourcebook's own CLAUDE.md)

## Critical Constraints

- **Publishable library:** This is a CLI tool published to npm. Avoid breaking changes 
  to exported types and function signatures in `src/types.ts` and the public CLI surface.
- **Hidden dependencies:** Files that change together across directories (invisible 
  coupling): `clone.js` ↔ `webhook.js` (3 co-commits); `license.ts` ↔ `activate.ts` 
  (4 co-commits); `index.ts` ↔ `types.ts` (5 co-commits). When modifying any of these, 
  check its co-change partner for required updates.
- **Core modules:** Hub files (most depended on): `src/types.ts` (imported by 19 files); 
  `src/scanner/index.ts` (imported by 10 files). Changes here have the widest blast radius.

## Architecture

The scanner runs in two passes: structural (import graph via PageRank) first, then git 
history. PageRank results inform pattern sampling — don't merge the passes. The tool is 
stateless: no caching layer, no framework, no ORM.

## Deprecated Patterns

- **Pro gate** — removed in v0.13.0. Don't reach for license-gated feature checks.
- **Verbose-by-default** — the old behavior before the ETH Zurich research finding.

Technique: Research-backed constraint writing. "Hidden dependencies" are machine-discovered from git history, not human-authored. "Critical Constraints" at top (LLMs retain beginning best), "Deprecated Patterns" at bottom.


Excerpt 2: Claude generator comment (design principles)

From src/generators/claude.ts:

/**
 * Design principles (from research):
 * 1. ONLY non-discoverable information 
 *    (ETH Zurich: auto-generated obvious context hurts by 2-3%)
 * 2. Context-rot-aware formatting
 *    (Chroma Research: 30%+ accuracy drop for info in the middle)
 *    → Critical info at BEGINNING and END of file
 * 3. Karpathy's program.md pattern: constraints, gotchas, and autonomy boundaries
 */

Technique: The generator itself is documented with the research findings that justify each design decision. This is "prompts about prompts" — the generator's own source code is a meta-prompt for future maintainers.

sourcebook — Uniqueness

Differs from seeds

No seed generates AI context files from codebase analysis. Every seed uses human-authored templates (agent-os, claude-conductor), framework-specific schema files (openspec, spec-kit), or no context injection at all (ccmemory). sourcebook inverts the pattern: it mines the codebase's actual git history and import graph to discover project-specific conventions and constraints, then generates a CLAUDE.md encoding those discoveries. The closest seed is agent-os (which ships CLAUDE.md-style templates) but agent-os content is generic; sourcebook content is project-specific and machine-generated. The completeness gate (sourcebook check) is architecturally similar to spec-kit's PostToolUse hooks (which run tests after edits), but checks change completeness (missing sibling files) rather than code correctness. The research-backed CLAUDE.md generation design — placing critical constraints at beginning and end, stripping discoverable context — is unique in the corpus.

Positioning

A "meta-tool" for AI agents — it doesn't guide how agents write code, it ensures agents have accurate project context before writing. Where other frameworks tell agents how to work (process, workflow, phases), sourcebook tells agents what to know (hidden couplings, hub files, deprecated patterns). The tool also actively guards against over-contextualization (the ETH Zurich finding that verbose auto-generated context hurts performance).

Observable Failure Modes

  1. Context staleness: CLAUDE.md is a snapshot of the codebase at generation time. In fast-moving projects, it can go stale within days. sourcebook watch mitigates but adds process overhead.
  2. False negative risk: Layer A catches 73% of test file needs and 71% of siblings — the 27-29% miss rate means agents may miss some updates even with sourcebook running.
  3. AI layer cost accumulation: at $0.012/run, frequent usage in CI across a large team can add up. The --quiet flag without --ai avoids this but loses semantic analysis.
  4. BSL-1.1 license: cannot be offered as a hosted service. Teams wanting a hosted version must wait until 2030 or use the planned GitHub App (waitlist).
  5. No spec layer: sourcebook describes existing code conventions but doesn't help plan new features. It's a context quality tool, not a planning framework.

sourcebook — Workflow

Two Usage Modes

Mode 1: Context Generation

npx sourcebook init      # generates CLAUDE.md + installs hooks

After init, every sourcebook check or hook trigger regenerates the CLAUDE.md with:

  • Hidden co-change couplings
  • Hub files and their blast radius
  • Fragile files (high churn rate)
  • Architecture notes (two-pass scanner design)
  • Deprecated patterns
  • Active development areas

Mode 2: Completeness Gate

sourcebook check              # Check staged/unstaged changes
sourcebook check --ai         # + AI semantic analysis
sourcebook check --quiet      # Exit code 1 if findings (for CI)

Hook-Integrated Workflow

After sourcebook init, Claude Code hooks trigger sourcebook check automatically when agents edit files:

Agent edits src/handler.ts
→ PostToolUse hook fires
→ sourcebook check runs
→ Finds: "test/handler.test.ts likely needs updating (co-change: 94%)"
→ Agent sees warning before proceeding
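
A figure such as "co-change: 94%" can be read as a conditional frequency: of the commits that touched the edited file, the share that also touched the partner. The sketch below applies that reading and flags partners whose coupling meets a threshold but which are absent from the diff. The formula, the `Finding` shape, and the function name are assumptions; the source does not specify how sourcebook computes its percentage or applies `--threshold`.

```typescript
// Sketch only: one plausible coupling formula, not sourcebook's documented one.
type Commit = string[]; // paths touched by one commit

interface Finding {
  changed: string;
  missing: string;
  coupling: number; // share of commits touching `changed` that also touched `missing`
}

function findMissingPartners(history: Commit[], diffFiles: string[], threshold: number): Finding[] {
  const inDiff = new Set(diffFiles);
  const findings: Finding[] = [];
  for (const changed of diffFiles) {
    const touching = history.filter((commit) => commit.indexOf(changed) !== -1);
    if (touching.length === 0) continue; // new file: no history to learn from

    const partnerCounts = new Map<string, number>();
    for (const commit of touching) {
      Array.from(new Set(commit)).forEach((file) => {
        if (file !== changed) partnerCounts.set(file, (partnerCounts.get(file) ?? 0) + 1);
      });
    }

    partnerCounts.forEach((count, partner) => {
      const coupling = count / touching.length;
      // A strongly coupled partner already in the diff is not a finding.
      if (coupling >= threshold && !inDiff.has(partner)) {
        findings.push({ changed, missing: partner, coupling });
      }
    });
  }
  return findings;
}
```

When every strongly coupled partner is already part of the diff, the function returns an empty list, which mirrors the "silent if diff is actually clean" behavior described for the gate.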

Approval Gates

| Gate | Type |
| --- | --- |
| sourcebook check exit code 1 (CI usage) | typed-confirm |
| Human review of generated CLAUDE.md | file-review |

Generated CLAUDE.md Structure

  1. Header (always)
  2. Commands (build/test/lint)
  3. Critical Constraints — at top (LLMs retain start best)
  4. Hidden co-change couplings
  5. Hub files + blast radius
  6. Architecture notes
  7. Deprecated patterns — at bottom (secondary retention)
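
The structure above keeps important material at the edges of the file. One generic way to produce that shape from importance scores is to sort sections by score and fill the output alternately from the front and the back, so the least important sections end up in the middle. This is a sketch of the general idea only: the `Section` type, the scores, and the function name are hypothetical, and the generator may simply use the fixed order listed above.

```typescript
// Sketch only: a generic "edges first" ordering, not sourcebook's generator logic.
interface Section {
  title: string;
  importance: number; // higher means more critical (hypothetical score)
}

function arrangeForRetention(sections: Section[]): Section[] {
  const ranked = sections.slice().sort((a, b) => b.importance - a.importance);
  const front: Section[] = [];
  const back: Section[] = [];
  ranked.forEach((section, index) => {
    if (index % 2 === 0) {
      front.push(section); // 1st, 3rd, 5th most important go top-down
    } else {
      back.unshift(section); // 2nd, 4th most important go bottom-up
    }
  });
  return front.concat(back);
}
```

With five sections scored 5 down to 1, the result order by score is 5, 3, 1, 2, 4: the two most important sections sit first and last, and the least important one sits in the middle.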

AI Layer (--ai flag)

Sends the diff plus dependency context to Claude Sonnet. It activates only when Layer A reports findings, so clean diffs incur no API cost.
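
Because the AI layer is billed only on runs where Layer A reports findings, expected spend scales with that fraction rather than with the total number of checks. The sketch below makes the arithmetic explicit. The function name and the `fractionWithFindings` parameter are hypothetical, and the per-run figure is the approximate ~$0.012 quoted in this document.

```typescript
// Sketch only: back-of-the-envelope cost arithmetic under stated assumptions.
const COST_PER_AI_RUN_USD = 0.012; // approximate per-run figure quoted in this document

function estimateAiCostUsd(checkRuns: number, fractionWithFindings: number): number {
  if (fractionWithFindings < 0 || fractionWithFindings > 1) {
    throw new RangeError("fractionWithFindings must be between 0 and 1");
  }
  // Clean diffs never reach the AI layer, so only runs with findings are billed.
  const billedRuns = checkRuns * fractionWithFindings;
  return billedRuns * COST_PER_AI_RUN_USD;
}
```

For example, 1,000 checks of which a quarter have Layer A findings would bill 250 AI runs, or roughly $3 in total.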

sourcebook — Memory & Context

State Storage

sourcebook is fundamentally stateless between runs. It does not maintain a database or persistent session state. Each run re-analyzes the codebase from scratch.

Artifacts it generates (which other tools then use):

CLAUDE.md            # Generated context for Claude Code
AGENTS.md            # Generated context for multi-agent tools
.cursor/rules/       # Generated Cursor rules

What It Mines (not stored)

  • Git history: co-change pairs, churn rates, recent activity
  • Import graph: PageRank hub files, dependency chains
  • File naming conventions: test file mappings
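
As an illustration of mapping test files by naming convention, the sketch below lists candidate test paths for a source file: colocated `.test`/`.spec` files, a sibling `__tests__` directory, and a mirrored top-level `test/` tree. These particular conventions and the function name are assumptions chosen for the example; the source does not list which conventions sourcebook recognizes.

```typescript
// Sketch only: the conventions below are common ones, assumed for illustration.
function candidateTestPaths(sourcePath: string): string[] {
  const slash = sourcePath.lastIndexOf("/");
  const dir = slash === -1 ? "" : sourcePath.slice(0, slash);
  const file = sourcePath.slice(slash + 1);
  const dot = file.lastIndexOf(".");
  const stem = dot > 0 ? file.slice(0, dot) : file;
  const ext = dot > 0 ? file.slice(dot) : "";
  const join = (d: string, f: string): string => (d === "" ? f : d + "/" + f);

  // Mirror "src/..." into a top-level "test/..." tree.
  const segments = dir === "" ? [] : dir.split("/");
  const mirrored =
    segments[0] === "src"
      ? ["test"].concat(segments.slice(1)).join("/")
      : ["test"].concat(segments).join("/");

  const candidates: string[] = [];
  for (const suffix of ["test", "spec"]) {
    const name = stem + "." + suffix + ext;
    candidates.push(join(dir, name)); // colocated: src/handler.test.ts
    candidates.push(join(join(dir, "__tests__"), name)); // src/__tests__/handler.test.ts
    candidates.push(join(mirrored, name)); // mirrored: test/handler.test.ts
  }
  return candidates;
}
```

A detector could then intersect these candidates with the files that actually exist in the repository, and fall back to co-change history when no candidate matches.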

Regeneration

sourcebook update   # Re-analyze + regenerate, preserving manual edits
sourcebook watch    # Auto-regenerate on source changes

The update command is additive — human-edited sections in CLAUDE.md are preserved while machine-generated sections are refreshed.
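
One common way to implement this kind of additive update is to wrap the machine-generated region in marker comments and replace only what sits between them. The sketch below shows that technique. The marker strings and the function name are hypothetical, and the source does not say how sourcebook distinguishes human-edited from generated sections.

```typescript
// Sketch only: marker-based refresh; the markers are hypothetical, not sourcebook's.
const BEGIN = "<!-- generated:begin -->";
const END = "<!-- generated:end -->";

function refreshGeneratedBlock(existing: string, generated: string): string {
  const block = BEGIN + "\n" + generated.trim() + "\n" + END;
  const start = existing.indexOf(BEGIN);
  const end = existing.indexOf(END);
  if (start === -1 || end === -1 || end < start) {
    // No markers yet: keep everything the human wrote and append the block.
    const kept = existing.replace(/\s+$/, "");
    return kept === "" ? block + "\n" : kept + "\n\n" + block + "\n";
  }
  // Replace only the marked region; text before and after is left untouched.
  return existing.slice(0, start) + block + existing.slice(end + END.length);
}
```

Anything a human writes outside the markers survives a refresh, while edits made inside the markers are overwritten, so the markers double as a visible "do not edit here" boundary.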

Memory Type

File-based. The generated CLAUDE.md IS the memory — it encodes the codebase's hidden knowledge into a format the AI can consume.

No Cross-Session State

sourcebook does not track what AI agents have done. It tracks what the codebase looks like. The generated context files (CLAUDE.md, AGENTS.md, Cursor rules) are the only persistent artifacts.

Context Budget Enforcement

The claude.ts generator enforces a token budget:

  • Critical findings at beginning and end (never middle)
  • Non-verbose mode strips discoverable patterns
  • Section ordering by importance (not by file structure)

sourcebook — Orchestration

Multi-Agent

No. sourcebook is a single-process CLI/MCP server.

Orchestration Pattern

None. sourcebook is a support tool for other agents, not an orchestrator.

Isolation Mechanism

None — reads from git history and file system, no modifications to project files except generated context files.

Multi-Model

Effectively yes. Layer A makes zero LLM calls; Layer B uses Claude Sonnet (requires ANTHROPIC_API_KEY). Different analysis types route to different processing:

  • Structural analysis (import graph, git history) → deterministic algorithms
  • Semantic analysis (cross-module relationships) → Claude Sonnet
  • Context generation → Claude Sonnet

Hook Integration

When installed via sourcebook init, a Claude Code PostToolUse hook fires after every file edit:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit",
      "hooks": [{
        "type": "command",
        "command": "sourcebook check --quiet"
      }]
    }]
  }
}

This makes sourcebook a passive observer on the agent's editing workflow, not an active orchestrator.

Execution Mode

One-shot per CLI invocation. The exception is the MCP server (sourcebook serve), which keeps running so clients can query codebase knowledge without re-running the scanner.

Crash Recovery

N/A — stateless. Re-run produces the same output.

sourcebook — UI & CLI Surface

Dedicated CLI Binary

Yes.

  • Binary: sourcebook
  • Package: sourcebook on npm
  • Version: 0.14.0

Subcommands (10, plus check flag variants)

| Command | Description |
| --- | --- |
| check | Analyze diff for completeness |
| check --ai | + AI semantic analysis |
| check --quiet | Exit code only (CI mode) |
| check --json | Structured JSON output |
| check --branch <b> | Compare vs branch |
| check --threshold <n> | Custom coupling threshold |
| init | Set up hooks + generate CLAUDE.md |
| scan-history | Retrospective scan |
| hooks | Install/check hooks |
| truth | Generate Repo Truth Map (2.5D viz) |
| serve | Start MCP server |
| update | Re-analyze + preserve manual edits |
| diff | Show what would change |
| watch | Auto-regenerate on source changes |
| ask <query> | NL query of codebase knowledge |

MCP Server

Published on the official MCP registry.

Configuration:

{
  "mcpServers": {
    "sourcebook": {
      "command": "npx",
      "args": ["-y", "sourcebook", "serve", "--dir", "/path/to/project"]
    }
  }
}

MCP tools: get_blast_radius, query_conventions, get_pressing_questions, get_co_change_pairs

No Local Web UI

No web dashboard. The sourcebook truth command generates a 2.5D visualization as a static file, but there's no live server.

CI/CD Integration

# GitHub Actions
- run: npx sourcebook check --quiet
# Exit 1 if incomplete changes found

Planned: GitHub App

Coming soon — automated completeness checks on every PR. Join the waitlist.
