sourcebook

sourcebook · maroondlabs/sourcebook · ★ 3 · last commit 2026-04-16

Generates AI context files from codebase git history and import graph, capturing hidden co-change couplings and hub file blast radius that agents consistently miss.

Best when: Auto-generated obvious context hurts agent performance (ETH Zurich). Context files should contain only non-discoverable information, placed at the beginning …

Skip if: Including discoverable patterns in CLAUDE.md (standard test frameworks, obvious directories); placing critical constraints in the middle of context files.

vs seeds

spec-kit's PostToolUse hooks but checks change completeness rather than test passage. The research-backed CLAUDE.md generation (…

Primitive shape 5 total

Hooks 1 MCP tools 4

Summary

sourcebook — Summary

sourcebook is an npm CLI tool and MCP server that generates AI context files (CLAUDE.md, AGENTS.md, Cursor rules) from a codebase's actual git history and import graph — capturing conventions, constraints, and hidden couplings that agents consistently miss. It operates in two layers: Layer A (rules-based, <1 second, zero cost) mines git co-change history and static import graphs to identify test file mappings, sibling modules, and hub files with high fan-in; Layer B (AI-powered, ~$0.012/run) uses Claude Sonnet to catch semantic cross-module dependencies invisible to static analysis. The sourcebook check command flags files that should have been changed alongside a given diff — a completeness gate for AI agent edits. With 3 stars, BSL-1.1 license (converts to MIT in 2030), and research-backed accuracy claims (0% false positives on clean diffs), it is the most rigorous "context quality" tool in the corpus.

Compared to seeds: no seed generates CLAUDE.md from codebase analysis. The closest seed is agent-os (which ships conventions as pre-written markdown) but agent-os's content is generic templates; sourcebook generates project-specific content from actual git history. The sourcebook check completeness gate is architecturally similar to spec-kit's PostToolUse hooks that run tests after edits, but applied to change completeness rather than test passage.

Overview

sourcebook — Overview

Origin

Created by maroondlabs (GitHub). In active development as of May 2026 (v0.14.0). Published on npm as sourcebook. BSL-1.1 licensed (source-available, converts to MIT on 2030-03-25). Available on the official MCP registry.

Philosophy

From README:

"Catches the files your AI agent forgot to change."

"A safety layer for code changes. sourcebook analyzes git diffs for completeness — flags files that should've been modified but weren't. Rules-based structural detection plus AI-powered semantic analysis. Zero false positives on clean diffs."

Research foundation:

ETH Zurich finding: "auto-generated obvious context hurts agent performance by 2-3%"
Chroma Research: "30%+ accuracy drop for info in the middle of context"

Design implication: sourcebook generates ONLY non-discoverable information. Standard patterns (test frameworks, obvious directories) are stripped unless --verbose is passed.

Generator Output Philosophy

From source code comment:

/**
 * Design principles (from research):
 * 1. ONLY non-discoverable information (ETH Zurich: auto-generated obvious context hurts by 2-3%)
 * 2. Context-rot-aware formatting (Chroma Research: 30%+ accuracy drop for info in the middle)
 *    → Critical info at BEGINNING and END of file
 * 3. Karpathy's program.md pattern: constraints, gotchas, and autonomy boundaries
 */

Key Stats

Metric	Result
Completeness gate accuracy	100% (30/30 diffs)
False positive rate	0% on clean diffs
Test file detection	73%
Sibling detection	71%
AI analysis cost	~$0.012/run

Architecture

sourcebook — Architecture

Distribution

Type: npm-package + MCP server
Binary: sourcebook
Version: 0.14.0
Install: npx sourcebook init (zero global install required)

Runtime Requirements

Node.js
Git (for co-change analysis)
ANTHROPIC_API_KEY (optional, for --ai flag)

Layer Architecture

Layer A — Rules-based (no LLM, <1 second)
├── Co-change analysis (git history mining)
│   └── Files that change together >= threshold times
├── Test file detection (naming conventions + co-change)
├── Import graph (dependency graph via static analysis)
│   └── PageRank for hub file importance ranking
└── Hub detection (files with 50+ dependents)

Layer B — AI-powered (~$0.012/run)
└── Claude Sonnet semantic analysis
    ├── Cross-module semantic relationships
    ├── Field renames requiring migrations
    └── Stale validation logic
    ✓ Requires dependency citation — no hallucinated paths
    ✓ Completeness gate: silent if diff is actually clean
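
The co-change step in Layer A ("files that change together >= threshold times") can be illustrated with a minimal sketch. This is not sourcebook's code: the input shape (one array of touched paths per commit), the function names, and the `|` key separator are all assumptions made for illustration.

```typescript
// Hypothetical input shape: each commit is the list of file paths it touched.
type CommitFiles = string[];

/** Count how often each unordered pair of files appears in the same commit. */
function countCoChanges(commits: CommitFiles[]): Map<string, number> {
  const pairCounts = new Map<string, number>();
  for (const files of commits) {
    // De-duplicate and sort so that "a|b" and "b|a" share one key.
    const unique = Array.from(new Set(files)).sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]}|${unique[j]}`;
        pairCounts.set(key, (pairCounts.get(key) ?? 0) + 1);
      }
    }
  }
  return pairCounts;
}

/** Keep only the pairs that co-changed at least `minCoCommits` times. */
function frequentPairs(commits: CommitFiles[], minCoCommits: number): string[] {
  const result: string[] = [];
  countCoChanges(commits).forEach((count, pair) => {
    if (count >= minCoCommits) result.push(pair);
  });
  return result.sort();
}
```

In practice the commit lists would come from something like `git log --name-only`; very large commits (mass renames, formatting sweeps) would likely need to be filtered out first, since they inflate every pair count.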

Source Structure

src/
├── cli.ts             # Entry point
├── commands/
│   ├── check.ts       # Completeness analysis
│   ├── init.ts        # Claude Code hooks + CLAUDE.md
│   ├── ask.ts         # NL query
│   ├── serve.ts       # MCP server
│   ├── truth.ts       # Repo Truth Map
│   ├── watch.ts       # Auto-regenerate on changes
│   └── ...
├── scanner/
│   ├── index.ts       # Two-pass scanner (structural first, git second)
│   ├── graph.ts       # Import graph + PageRank
│   └── git.ts         # Co-change analysis
└── generators/
    ├── claude.ts      # CLAUDE.md generator
    ├── agents.ts      # AGENTS.md generator
    ├── cursor.ts      # Cursor rules generator
    └── truth.ts       # 2.5D visualization
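
As a rough illustration of what "Import graph + PageRank" in graph.ts could involve, here is a textbook PageRank over an import graph. It is a sketch under assumptions, not the tool's implementation: the `ImportGraph` shape, the function name, the damping factor of 0.85, and the fixed iteration count are all chosen for the example.

```typescript
// Hypothetical input: importer path -> list of paths it imports.
type ImportGraph = Record<string, string[]>;

/** Textbook PageRank: rank flows from each importer to the files it imports. */
function rankByImports(
  graph: ImportGraph,
  damping = 0.85,
  iterations = 50
): Map<string, number> {
  // Collect every file that appears as an importer or as an import target.
  const files = new Set<string>();
  for (const importer of Object.keys(graph)) {
    files.add(importer);
    for (const target of graph[importer] ?? []) files.add(target);
  }
  const n = files.size;
  let rank = new Map<string, number>();
  files.forEach((f) => rank.set(f, 1 / n));

  for (let step = 0; step < iterations; step++) {
    const next = new Map<string, number>();
    files.forEach((f) => next.set(f, (1 - damping) / n));
    let danglingMass = 0;
    files.forEach((f) => {
      const targets = graph[f] ?? [];
      const share = rank.get(f) ?? 0;
      if (targets.length === 0) {
        danglingMass += share; // file imports nothing
      } else {
        for (const t of targets) {
          next.set(t, (next.get(t) ?? 0) + (damping * share) / targets.length);
        }
      }
    });
    // Files with no imports spread their rank evenly so the total stays 1.
    files.forEach((f) => next.set(f, (next.get(f) ?? 0) + (damping * danglingMass) / n));
    rank = next;
  }
  return rank;
}
```

Files imported by many other files accumulate rank, which is the intuition behind treating high-rank files as hubs with a wide blast radius.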

Language Support

Language	Import Graph	Git Analysis
TypeScript/JavaScript	Full	Full
Python	Full	Full
Go	Full	Full
Rust	Full	Partial

Components

sourcebook — Components

CLI Binary: `sourcebook`

Command	Description
`sourcebook check`	Analyze current diff for completeness
`sourcebook check --ai`	Add AI semantic analysis (requires ANTHROPIC_API_KEY)
`sourcebook check --quiet`	Exit code only: 1 if findings, 0 if clean
`sourcebook check --json`	Structured JSON output
`sourcebook check --branch main`	Compare HEAD against a branch
`sourcebook check --threshold 0.9`	Custom co-change coupling threshold (0-1)
`sourcebook init`	Set up Claude Code hooks + generate CLAUDE.md
`sourcebook scan-history`	Retrospective scan of recent commits
`sourcebook hooks`	Install or check Claude Code hooks
`sourcebook truth`	Generate a Repo Truth Map (2.5D visualization)
`sourcebook serve`	Start MCP server
`sourcebook update`	Re-analyze while preserving manual edits
`sourcebook diff`	Show what would change (exit code 1 if changes)
`sourcebook watch`	Auto-regenerate context files on source changes
`sourcebook ask <query>`	Query codebase knowledge in NL

MCP Server

sourcebook serve starts the MCP server
Published on official MCP registry
Tools exposed:
- get_blast_radius — what breaks if you edit a hub file
- query_conventions — confirm the right pattern before adding code
- get_pressing_questions — briefing before editing an unfamiliar file
- get_co_change_pairs — files that historically change together
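
To make "blast radius" concrete, the sketch below computes the transitive dependents of a file by walking a reversed import graph. It only illustrates the idea behind a tool like get_blast_radius; the `ImportsByFile` shape and the `blastRadius` function are hypothetical and do not describe the MCP tool's actual inputs or outputs.

```typescript
// Hypothetical input: importer path -> list of paths it imports.
type ImportsByFile = Record<string, string[]>;

/** Return every file that directly or transitively imports `changed`. */
function blastRadius(imports: ImportsByFile, changed: string): string[] {
  // Invert the graph: imported file -> files that import it.
  const dependents = new Map<string, string[]>();
  for (const importer of Object.keys(imports)) {
    for (const target of imports[importer] ?? []) {
      const list = dependents.get(target) ?? [];
      list.push(importer);
      dependents.set(target, list);
    }
  }
  // Breadth-first walk over dependents; `seen` guards against import cycles.
  const seen = new Set<string>([changed]);
  const queue: string[] = [changed];
  const affected: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dep of dependents.get(current) ?? []) {
      if (!seen.has(dep)) {
        seen.add(dep);
        affected.push(dep);
        queue.push(dep);
      }
    }
  }
  return affected.sort();
}
```

For a graph where cli.ts imports scanner.ts and both scanner.ts and gen.ts import types.ts, editing types.ts would put all three other files in the affected set.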

Generators (4)

Generator	Output
`claude.ts`	CLAUDE.md
`agents.ts`	AGENTS.md
`cursor.ts`	Cursor rules
`truth.ts`	Repo Truth Map (2.5D visualization)

Claude Code Hooks (installed by `sourcebook init`)

PostToolUse hook that triggers completeness check after file edits
Writes findings to context before the agent proceeds

Prompts

sourcebook — Prompts

Excerpt 1: Generated CLAUDE.md (from sourcebook's own CLAUDE.md)

## Critical Constraints

- **Publishable library:** This is a CLI tool published to npm. Avoid breaking changes 
  to exported types and function signatures in `src/types.ts` and the public CLI surface.
- **Hidden dependencies:** Files that change together across directories (invisible 
  coupling): `clone.js` ↔ `webhook.js` (3 co-commits); `license.ts` ↔ `activate.ts` 
  (4 co-commits); `index.ts` ↔ `types.ts` (5 co-commits). When modifying any of these, 
  check its co-change partner for required updates.
- **Core modules:** Hub files (most depended on): `src/types.ts` (imported by 19 files); 
  `src/scanner/index.ts` (imported by 10 files). Changes here have the widest blast radius.

## Architecture

The scanner runs in two passes: structural (import graph via PageRank) first, then git 
history. PageRank results inform pattern sampling — don't merge the passes. The tool is 
stateless: no caching layer, no framework, no ORM.

## Deprecated Patterns

- **Pro gate** — removed in v0.13.0. Don't reach for license-gated feature checks.
- **Verbose-by-default** — the old behavior before the ETH Zurich research finding.

Technique: Research-backed constraint writing. "Hidden dependencies" are machine-discovered from git history, not human-authored. "Critical Constraints" at top (LLMs retain beginning best), "Deprecated Patterns" at bottom.

Excerpt 2: Claude generator comment (design principles)

From src/generators/claude.ts:

/**
 * Design principles (from research):
 * 1. ONLY non-discoverable information 
 *    (ETH Zurich: auto-generated obvious context hurts by 2-3%)
 * 2. Context-rot-aware formatting
 *    (Chroma Research: 30%+ accuracy drop for info in the middle)
 *    → Critical info at BEGINNING and END of file
 * 3. Karpathy's program.md pattern: constraints, gotchas, and autonomy boundaries
 */

Technique: The generator itself is documented with the research findings that justify each design decision. This is "prompts about prompts" — the generator's own source code is a meta-prompt for future maintainers.

Uniqueness

sourcebook — Uniqueness

differs_from_seeds

No seed generates AI context files from codebase analysis. Every seed uses human-authored templates (agent-os, claude-conductor), framework-specific schema files (openspec, spec-kit), or no context injection at all (ccmemory). sourcebook inverts the pattern: it mines the codebase's actual git history and import graph to discover project-specific conventions and constraints, then generates a CLAUDE.md encoding those discoveries. The closest seed is agent-os (which ships CLAUDE.md-style templates) but agent-os content is generic; sourcebook content is project-specific and machine-generated. The completeness gate (sourcebook check) is architecturally similar to spec-kit's PostToolUse hooks (which run tests after edits), but checks change completeness (missing sibling files) rather than code correctness. The research-backed CLAUDE.md generation design — placing critical constraints at beginning and end, stripping discoverable context — is unique in the corpus.

Positioning

A "meta-tool" for AI agents — it doesn't guide how agents write code, it ensures agents have accurate project context before writing. Where other frameworks tell agents how to work (process, workflow, phases), sourcebook tells agents what to know (hidden couplings, hub files, deprecated patterns). The tool also actively guards against over-contextualization (the ETH Zurich finding that verbose auto-generated context hurts performance).

Observable Failure Modes

Context staleness: CLAUDE.md is a snapshot of the codebase at generation time. In fast-moving projects, it can go stale within days. sourcebook watch mitigates but adds process overhead.
False negative risk: Layer A catches 73% of test file needs and 71% of siblings — the 27-29% miss rate means agents may miss some updates even with sourcebook running.
AI layer cost accumulation: at $0.012/run, frequent usage in CI across a large team can add up. The --quiet flag without --ai avoids this but loses semantic analysis.
BSL-1.1 license: cannot be offered as a hosted service. Teams wanting a hosted version must wait until 2030 or use the planned GitHub App (waitlist).
No spec layer: sourcebook describes existing code conventions but doesn't help plan new features. It's a context quality tool, not a planning framework.

Workflow

sourcebook — Workflow

Two Usage Modes

Mode 1: Context Generation

npx sourcebook init      # generates CLAUDE.md + installs hooks

After init, every sourcebook check or hook trigger regenerates the CLAUDE.md with:

Hidden co-change couplings
Hub files and their blast radius
Fragile files (high churn rate)
Architecture notes (two-pass scanner design)
Deprecated patterns
Active development areas

Mode 2: Completeness Gate

sourcebook check              # Check staged/unstaged changes
sourcebook check --ai         # + AI semantic analysis
sourcebook check --quiet      # Exit code 1 if findings (for CI)

Hook-Integrated Workflow

After sourcebook init, Claude Code hooks trigger sourcebook check automatically when agents edit files:

Agent edits src/handler.ts
→ PostToolUse hook fires
→ sourcebook check runs
→ Finds: "test/handler.test.ts likely needs updating (co-change: 94%)"
→ Agent sees warning before proceeding
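
The flow above can be sketched as a small completeness check. Everything here is an assumption for illustration: the `CouplingStat` and `Finding` shapes, the function names, and in particular the coupling formula (co-changes divided by the number of commits touching the edited file), which this profile does not specify. Only the exit-code contract (1 if findings, 0 if clean) comes from the documented `--quiet` behavior.

```typescript
// Hypothetical per-pair statistics mined from git history.
interface CouplingStat {
  file: string;        // the file being edited
  partner: string;     // a file that historically changes with it
  coChanges: number;   // commits touching both files
  fileChanges: number; // commits touching `file`
}

interface Finding {
  missing: string;
  because: string;
  coupling: number;
}

/** Flag partners of changed files that are absent from the diff. */
function findMissingPartners(
  changedFiles: string[],
  stats: CouplingStat[],
  threshold: number
): Finding[] {
  const changed = new Set(changedFiles);
  const findings: Finding[] = [];
  for (const s of stats) {
    // Skip pairs that are irrelevant to this diff or already satisfied.
    if (!changed.has(s.file) || changed.has(s.partner) || s.fileChanges === 0) continue;
    const coupling = s.coChanges / s.fileChanges; // assumed definition
    if (coupling >= threshold) {
      findings.push({ missing: s.partner, because: s.file, coupling });
    }
  }
  return findings;
}

/** Mirrors the documented --quiet contract: 1 if findings, 0 if clean. */
function exitCodeFor(findings: Finding[]): number {
  return findings.length > 0 ? 1 : 0;
}
```

With 47 co-changes out of 50 commits to src/handler.ts, the coupling is 0.94, so a diff that touches only src/handler.ts would be flagged at a threshold of 0.9, while a diff that also touches the test file would pass.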

Approval Gates

Gate	Type
`sourcebook check` exit code 1 (CI usage)	typed-confirm
Human review of generated CLAUDE.md	file-review

Generated CLAUDE.md Structure

Header (always)
Commands (build/test/lint)
Critical Constraints — at top (LLMs retain start best)
Hidden co-change couplings
Hub files + blast radius
Architecture notes
Deprecated patterns — at bottom (secondary retention)

AI Layer (--ai flag)

Sends the diff plus dependency context to Claude Sonnet. The AI layer activates only when Layer A reports findings, so clean diffs incur zero cost.
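
The gating rule and its cost implication can be sketched in a few lines. The function names and the input (a list of Layer A finding counts, one per run) are hypothetical; the $0.012 figure is the approximate per-run cost quoted in this profile, not a measured value.

```typescript
// Approximate per-run cost quoted in this profile (an estimate, not a price list).
const COST_PER_AI_RUN_USD = 0.012;

/** The AI layer runs only when --ai is set and Layer A produced findings. */
function shouldRunAiLayer(aiFlag: boolean, layerAFindingCount: number): boolean {
  return aiFlag && layerAFindingCount > 0;
}

/** Estimate spend for a batch of runs, given Layer A finding counts per run. */
function estimateAiSpendUsd(findingCountsPerRun: number[], aiFlag: boolean): number {
  const billableRuns = findingCountsPerRun.filter((count) =>
    shouldRunAiLayer(aiFlag, count)
  ).length;
  // Round to a tenth of a cent to avoid floating-point noise.
  return Math.round(billableRuns * COST_PER_AI_RUN_USD * 1000) / 1000;
}
```

Under these assumptions, five runs of which three have Layer A findings would cost about 3 × $0.012 = $0.036, and nothing at all without the --ai flag.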

Memory Context

sourcebook — Memory & Context

State Storage

sourcebook is fundamentally stateless between runs. It does not maintain a database or persistent session state. Each run re-analyzes the codebase from scratch.

Artifacts it generates (which other tools then use):

CLAUDE.md            # Generated context for Claude Code
AGENTS.md            # Generated context for multi-agent tools
.cursor/rules/       # Generated Cursor rules

What It Mines (not stored)

Git history: co-change pairs, churn rates, recent activity
Import graph: PageRank hub files, dependency chains
File naming conventions: test file mappings
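
A naming-convention mapping from source files to test files might look like the sketch below. The specific conventions tried (`.test`, `.spec`, `__tests__/`, and a `src/` to `test/` mirror) are common ones chosen for illustration, and `candidateTestPaths` is a hypothetical helper; the conventions sourcebook actually checks are not listed in this profile.

```typescript
/** Propose test-file paths for a source file using common naming conventions. */
function candidateTestPaths(sourcePath: string): string[] {
  const match = /^(.*\/)?([^\/]+)\.(ts|tsx|js|jsx)$/.exec(sourcePath);
  if (match === null) return [];
  const dir = match[1] ?? "";
  const base = match[2] ?? "";
  const ext = match[3] ?? "";
  // A file that is already a test or spec has no further test mapping.
  if (/\.(test|spec)$/.test(base)) return [];

  const candidates = [
    `${dir}${base}.test.${ext}`,
    `${dir}${base}.spec.${ext}`,
    `${dir}__tests__/${base}.test.${ext}`,
  ];
  // Mirror a leading src/ directory into test/, e.g. src/x.ts -> test/x.test.ts.
  if (/^src\//.test(dir)) {
    candidates.push(`${dir.replace(/^src\//, "test/")}${base}.test.${ext}`);
  }
  return candidates;
}
```

A caller would typically keep only the candidates that exist on disk, then use co-change history to confirm or rank them.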

Regeneration

sourcebook update   # Re-analyze + regenerate, preserving manual edits
sourcebook watch    # Auto-regenerate on source changes

The update command is additive — human-edited sections in CLAUDE.md are preserved while machine-generated sections are refreshed.
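
One common way to get this additive behavior is to fence the machine-generated region with marker comments and replace only what sits between them. The sketch below assumes that approach; the marker strings and the `refreshGeneratedBlock` function are invented for illustration, and this profile does not say how sourcebook actually distinguishes manual from generated sections.

```typescript
// Hypothetical markers delimiting the machine-generated region.
const GEN_START = "<!-- generated:start -->";
const GEN_END = "<!-- generated:end -->";

/** Replace the generated region, leaving all surrounding human text untouched. */
function refreshGeneratedBlock(existing: string, freshContent: string): string {
  const block = `${GEN_START}\n${freshContent}\n${GEN_END}`;
  const start = existing.indexOf(GEN_START);
  const end = existing.indexOf(GEN_END);
  if (start === -1 || end === -1 || end < start) {
    // No marker pair yet: append the block after the human-written text.
    return existing.length > 0
      ? `${existing.replace(/\s+$/, "")}\n\n${block}\n`
      : `${block}\n`;
  }
  return existing.slice(0, start) + block + existing.slice(end + GEN_END.length);
}
```

Because only the text between the markers is rewritten, running the refresh twice with the same content yields the same file.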

Memory Type

File-based. The generated CLAUDE.md IS the memory — it encodes the codebase's hidden knowledge into a format the AI can consume.

No Cross-Session State

sourcebook does not track what AI agents have done. It tracks what the codebase looks like. The CLAUDE.md file is the only persistent artifact.

Context Budget Enforcement

The claude.ts generator enforces a token budget:

Critical findings at beginning and end (never middle)
Non-verbose mode strips discoverable patterns
Section ordering by importance (not by file structure)
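
The "beginning and end, never middle" rule can be illustrated with a simple ordering function that places the most critical section first, the next most critical last, and pushes the least critical material toward the middle. This is a sketch of the principle only: the `ContextSection` shape, the numeric priorities, and the alternating strategy are assumptions, and it does not reproduce the section order that claude.ts actually emits.

```typescript
// Hypothetical section descriptor; a higher priority means more critical.
interface ContextSection {
  title: string;
  priority: number;
}

/** Order sections so the most critical sit at the edges of the file. */
function orderForRetention(sections: ContextSection[]): ContextSection[] {
  const byPriority = sections.slice().sort((a, b) => b.priority - a.priority);
  const head: ContextSection[] = [];
  const tail: ContextSection[] = [];
  byPriority.forEach((section, index) => {
    // Alternate: even ranks fill from the top, odd ranks fill from the bottom.
    if (index % 2 === 0) {
      head.push(section);
    } else {
      tail.unshift(section);
    }
  });
  return head.concat(tail);
}
```

With priorities 5 through 1, the resulting order is 5, 3, 1, 2, 4: the two most important sections bracket the file and the lowest-priority one lands in the middle.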

Orchestration

sourcebook — Orchestration

Multi-Agent

No. sourcebook is a single-process CLI/MCP server.

Orchestration Pattern

None. sourcebook is a support tool for other agents, not an orchestrator.

Isolation Mechanism

None — reads from git history and file system, no modifications to project files except generated context files.

Multi-Model

Effectively yes: Layer A makes zero LLM calls; Layer B uses Claude Sonnet (enabled with the --ai flag and authenticated via ANTHROPIC_API_KEY). Different analysis types route to different processing:

Structural analysis (import graph, git history) → deterministic algorithms
Semantic analysis (cross-module relationships) → Claude Sonnet
Context generation → Claude Sonnet

Hook Integration

When installed via sourcebook init, a Claude Code PostToolUse hook fires after every file edit:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit",
      "hooks": [{
        "type": "command",
        "command": "sourcebook check --quiet"
      }]
    }]
  }
}

This makes sourcebook a passive observer on the agent's editing workflow, not an active orchestrator.

Execution Mode

One-shot per CLI invocation. The MCP server (sourcebook serve) is the exception: it runs as a persistent process, so clients can query codebase knowledge without re-running the scanner.

Crash Recovery

N/A — stateless. Re-run produces the same output.

Ui Cli Surface

sourcebook — UI & CLI Surface

Dedicated CLI Binary

Yes.

Binary: sourcebook
Package: sourcebook on npm
Version: 0.14.0

Subcommands (14)

Command	Description
`check`	Analyze diff for completeness
`check --ai`	+ AI semantic analysis
`check --quiet`	Exit code only (CI mode)
`check --json`	Structured JSON output
`check --branch <b>`	Compare vs branch
`check --threshold <n>`	Custom coupling threshold
`init`	Set up hooks + generate CLAUDE.md
`scan-history`	Retrospective scan
`hooks`	Install/check hooks
`truth`	Generate Repo Truth Map (2.5D viz)
`serve`	Start MCP server
`update`	Re-analyze + preserve manual edits
`diff`	Show what would change
`watch`	Auto-regenerate on source changes
`ask <query>`	NL query of codebase knowledge

MCP Server

Published on the official MCP registry.

Configuration:

{
  "mcpServers": {
    "sourcebook": {
      "command": "npx",
      "args": ["-y", "sourcebook", "serve", "--dir", "/path/to/project"]
    }
  }
}

MCP tools: get_blast_radius, query_conventions, get_pressing_questions, get_co_change_pairs

No Local Web UI

No web dashboard. The sourcebook truth command generates a 2.5D visualization as a static file, but there's no live server.

CI/CD Integration

# GitHub Actions
- run: npx sourcebook check --quiet
# Exit 1 if incomplete changes found

Planned: GitHub App

Coming soon — automated completeness checks on every PR. Join the waitlist.

Related frameworks

same archetype · same primary tool · same memory type

Context-Engineering Handbook ★ 9.0k

A13 Methodology

Provides a first-principles, research-grounded vocabulary and learning path for context engineering — the discipline of designing…

walkinglabs/learn-harness-engineering ★ 6.6k

A13 Methodology

Teach harness engineering from first principles (12 lectures + 6 projects) and provide a scaffolding skill (harness-creator) that…

Awesome Harness Engineering (walkinglabs) ★ 2.7k

A13 Methodology

Curate the authoritative reference list of articles, benchmarks, and tools for harness engineering — the practice of shaping the…

cline-memory-bank (nickbaumann98) ★ 581

A13 Methodology

Custom instructions + 6-file hierarchical Markdown memory bank so Cline maintains full project context across sessions, with a…

FPF (First Principles Framework) ★ 372

A13 Methodology

Provides a formal pattern language for making reasoning explicit, traceable, and publishable in mixed human/AI engineering work —…

nexu-io/harness-engineering-guide ★ 134

A13 Methodology

Provide a practical, code-first reference guide to harness engineering — from first principles to production patterns —…

Distribution

Type: npm-package
License: NOASSERTION
Install: one-liner
Version: 0.14.0

Surfaces

CLI binary: sourcebook
CLI subcmds: 14
Local UI: No

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 1
MCP servers: 1
MCP tools: 4
Scripts: 0
Templates: 0

Workflow

Phases: 4
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: No
Pattern: none
Max concurrent: 1
Isolation: none
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: one-shot
Crash recovery: No
Compaction: No
Session handoff: No
Streaming: No

Memory

Type: file-based
Persistence: project
Search: full-text
State files: 2 files

Quality

TDD: No
TDD mechanism: none
Validators: 2
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: No

Tools

Primary: claude-code
Targets: 4
Portability: high

Signals

Stars: 3
Last commit: 2026-04-16
Contributors: 1
Maintainer: active
Quality score: 1.4/10
