TDD Guard

tdd-guard · nizos/tdd-guard · ★ 2.1k · last commit 2026-05-24

Primitive shape: 4 total (Skills 1, Hooks 3)
00

Summary

TDD Guard — Summary

TDD Guard is a Claude Code plugin (npm package + claude-plugin) that enforces Test-Driven Development principles in real time. A PreToolUse hook intercepts Write, Edit, MultiEdit, and TodoWrite tool calls, with additional hooks on UserPromptSubmit and SessionStart, and invokes an AI validator before any file operation is permitted.

Problem it solves: AI coding agents routinely skip the red-green-refactor cycle: they write implementation before tests, over-implement beyond a failing test, or refactor while tests are failing. TDD Guard uses a dedicated LLM call (Claude API) to analyze the proposed change against the current test results and block violations with a specific corrective message.

Distinctive trait: The enforcement mechanism is a PreToolUse hook that runs npx tdd-guard@latest before every write operation; the guard reads test output from .claude/tdd-guard/data/test.json (written by framework-specific reporters — Vitest, Jest, pytest, Go, Rust, etc.) and sends the change context plus TDD rules to a validation LLM, returning a block decision with rationale.

Target audience: Individual developers using Claude Code who want rigorous, automated TDD enforcement across multiple languages and test frameworks without embedding TDD rules into CLAUDE.md prompts.

Production-readiness: Actively maintained (v1.6.8), 2149 stars, MIT license, with community-contributed reporters for 9 test frameworks (Vitest, Jest, Storybook, pytest, PHPUnit, Go, Rust, RSpec, Minitest) and a related sibling project (Probity) for cross-tool coverage.

Differs from seeds: Most similar to superpowers (SessionStart hook + skill-based TDD enforcement), but TDD Guard intervenes before writes at the tool level via PreToolUse rather than relying on a skill activated at session start, and uses an external LLM validation call to make pass/fail decisions rather than prompt-level Iron Law instructions. Unlike spec-kit's optional TDD skill, TDD Guard makes the red-green-refactor cycle structurally enforced — violations are blocked, not advised.

01

Overview

TDD Guard — Overview

Origin

TDD Guard was created by Nizar Selander (nizos) and released on GitHub as an MIT-licensed Claude Code plugin. It emerged from the observation that Claude Code hooks provide a natural enforcement point for TDD discipline — the PreToolUse event fires synchronously before any file write, giving a validator the opportunity to block the operation if TDD rules are violated.

Philosophy

"TDD Guard ensures Claude Code follows Test-Driven Development principles. When your agent tries to skip tests or over-implement, TDD Guard blocks the action and explains what needs to happen instead."

The core philosophy treats TDD enforcement as an infrastructure concern rather than a prompt concern. Rather than asking the agent to "please follow TDD," the framework intercepts writes at the tool-use layer and runs an independent validation LLM that has no context about the agent's goals — only the proposed change and the current test state.

Key Principles

  1. Test-First Enforcement — No implementation may precede a failing test.
  2. Minimal Implementation — Code beyond what the current failing test requires is blocked.
  3. Lint Integration — Refactoring in the green phase can be enforced via ESLint rules.
  4. Red-Green-Refactor as Infrastructure — The three phases are mechanically enforced, not suggested.
  5. Session Control — The guard can be toggled on/off mid-session via /tdd-guard:setup.

Manifesto-Style Quotes (from source)

From rules.ts:

"The foundation of TDD is the Red-Green-Refactor cycle... Adding a single test to a test file is ALWAYS allowed — no prior test output needed."

"Before a failing test becomes a useful Red, it has to run far enough to evaluate an assertion. Some failures happen before that point... In both cases, the agent may adjust the impl: create missing stubs, change the signature to accept the test's call, or replace the body with a minimal form. This is part of reaching Red, not Refactoring."

From CLAUDE.md:

"TDD Guard is a Claude Code hook that enforces Test-Driven Development by intercepting file operations. This automated enforcement maintains code quality without cluttering prompts with TDD reminders."

The README explicitly recommends Probity for new projects, describing it as the evolution: "Enforces TDD and other policies across Claude Code, Codex, and Copilot. Works with any language and test runner. Uses session activity for validation, giving it a fuller picture when distinguishing refactors from new behavior."

02

Architecture

TDD Guard — Architecture

Distribution

  • Type: npm package + Claude Code plugin
  • Package: tdd-guard on npm (v1.6.8)
  • CLI binary: tdd-guard (bin entry dist/cli/tdd-guard.js)
  • Plugin manifest: plugin/.claude-plugin/plugin.json

Install Methods

# Via Claude Code plugin system
/plugin marketplace add nizos/tdd-guard
/plugin install tdd-guard@tdd-guard
/tdd-guard:setup

# CLI only
npm install -g tdd-guard

Required Runtime

  • Node.js 22+
  • A supported test framework (Vitest, Jest, Storybook, pytest, PHPUnit, Go, Rust, RSpec, Minitest)
  • Claude Code (for hooks)
  • Anthropic API key (for LLM validation calls)

Directory Structure

tdd-guard/
├── plugin/
│   ├── .claude-plugin/
│   │   └── plugin.json          # name=tdd-guard, version=1.3.0
│   ├── hooks/
│   │   └── hooks.json           # PreToolUse + UserPromptSubmit + SessionStart
│   └── skills/
│       └── setup/
│           └── SKILL.md         # /tdd-guard:setup skill
├── reporters/                   # Language-specific test reporters
│   ├── go/                      # tdd-guard-go
│   ├── jest/                    # tdd-guard-jest
│   ├── minitest/                # tdd-guard-minitest
│   ├── phpunit/                 # tdd-guard/phpunit
│   ├── pytest/                  # tdd-guard-pytest
│   ├── rspec/                   # tdd-guard-rspec
│   ├── rust/                    # tdd-guard-rust
│   ├── storybook/               # tdd-guard-storybook
│   └── vitest/                  # tdd-guard-vitest
└── src/
    ├── cli/                     # Hook entry point, context builder
    ├── config/                  # Configuration management
    ├── contracts/               # Types and Zod schemas
    ├── guard/                   # GuardManager (enable/disable)
    ├── hooks/                   # Claude Code hook parsing
    ├── linters/                 # ESLint integration
    ├── processors/              # Test result + lint processing
    ├── providers/               # Model + linter client factories
    ├── storage/                 # Storage abstractions (file/memory)
    └── validation/
        ├── validator.ts         # Sends context to AI model, parses response
        ├── context/             # Formats operations for AI validation
        ├── prompts/             # TDD rules + AI instructions
        │   ├── system-prompt.ts # "You are a TDD Guard..."
        │   ├── rules.ts         # Red/Green/Refactor rules
        │   └── operations/      # Per-operation prompts (write/edit/multi-edit/overwrite)
        └── models/              # Claude SDK and Anthropic API clients

State File

All reporters write test results to .claude/tdd-guard/data/test.json relative to project root.

Target AI Tools

Primary: Claude Code (via hooks). The CLI binary can be called from any hook-capable environment.

Hook Events

  • PreToolUse with matcher Write|Edit|MultiEdit|TodoWrite
  • UserPromptSubmit (no matcher — checks every prompt)
  • SessionStart with matcher startup|resume|clear
03

Components

TDD Guard — Components

Hooks (3 events)

Event              Matcher                          Purpose
PreToolUse         Write|Edit|MultiEdit|TodoWrite   Intercept all file write operations and validate TDD compliance before allowing the write
UserPromptSubmit   (none)                           Check TDD state at every prompt to catch violations before the agent plans a write
SessionStart       startup|resume|clear             Restore guard state and inject TDD context at session start/resume

All three events invoke npx tdd-guard@latest.
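
The three registrations above can be sketched as a typed object. This is a minimal illustration that assumes the general shape of Claude Code's hook configuration (an event name mapped to entries with an optional matcher and a list of commands); the type names are made up for the example, and the actual contents of plugin/hooks/hooks.json may differ.

```typescript
// Sketch only: type names are illustrative, and the real hooks.json may differ.
type HookCommand = { type: "command"; command: string };
type HookEntry = { matcher?: string; hooks: HookCommand[] };
type HooksConfig = { hooks: Record<string, HookEntry[]> };

// Every event runs the same command, as stated in the text above.
const guardCommand: HookCommand = { type: "command", command: "npx tdd-guard@latest" };

const hooksConfig: HooksConfig = {
  hooks: {
    // Gate the listed tool calls before they run.
    PreToolUse: [{ matcher: "Write|Edit|MultiEdit|TodoWrite", hooks: [guardCommand] }],
    // No matcher: runs on every submitted prompt.
    UserPromptSubmit: [{ hooks: [guardCommand] }],
    // Runs when a session starts, resumes, or is cleared.
    SessionStart: [{ matcher: "startup|resume|clear", hooks: [guardCommand] }],
  },
};

// Serialize to the JSON text that a hooks file would contain.
const hooksJson: string = JSON.stringify(hooksConfig, null, 2);
```

Keeping a single shared command object makes it obvious that the event type, not the command line, is what distinguishes the three behaviors.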

Skills (1)

Name    Purpose
setup   Detect test framework, install/update matching reporter, configure reporter with project root path. Invoked as /tdd-guard:setup. Allowed tools: Read, Glob, Grep only (no writes during setup).

CLI Binary

Binary      Source                  Purpose
tdd-guard   dist/cli/tdd-guard.js   Main hook entry point; reads Claude Code hook input JSON, builds context, calls validation LLM, returns block/allow decision

Reporters (9 packages, separate from core)

Reporter              Registry     Framework
tdd-guard-vitest      npm          Vitest
tdd-guard-jest        npm          Jest
tdd-guard-storybook   npm          Storybook
tdd-guard-pytest      PyPI         pytest
tdd-guard/phpunit     Packagist    PHPUnit (v9 listener / v10 extension)
tdd-guard-go          go install   Go
tdd-guard-rust        crates.io    Rust/cargo
tdd-guard-rspec       RubyGems     RSpec
tdd-guard-minitest    RubyGems     Minitest

All reporters write to .claude/tdd-guard/data/test.json.
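
To make the shared file concrete, here is a sketch of what a reader of test.json might do with it. The field names (testModules, moduleId, state, errors) are assumptions chosen for illustration, not the documented schema; the source only says the file carries pass/fail status, test names, and error messages.

```typescript
// HYPOTHETICAL shape: field names are assumptions for illustration, not the
// documented schema of .claude/tdd-guard/data/test.json.
type TestState = "passed" | "failed" | "skipped";

interface ReportedTest {
  name: string;
  state: TestState;
  errors?: { message: string }[];
}

interface ReportedModule {
  moduleId: string; // for example, the path of the test file
  tests: ReportedTest[];
}

interface ReporterOutput {
  testModules: ReportedModule[];
}

// List every failing test as "<module> > <test>: <first error message>".
function summarizeFailures(report: ReporterOutput): string[] {
  const lines: string[] = [];
  for (const mod of report.testModules) {
    for (const test of mod.tests) {
      if (test.state !== "failed") continue;
      const message = test.errors?.[0]?.message ?? "(no error message)";
      lines.push(`${mod.moduleId} > ${test.name}: ${message}`);
    }
  }
  return lines;
}

// Sample data, invented for the example.
const sampleReport: ReporterOutput = {
  testModules: [
    {
      moduleId: "src/cart.test.ts",
      tests: [
        { name: "starts empty", state: "passed" },
        { name: "adds an item", state: "failed", errors: [{ message: "expected 1, received 0" }] },
      ],
    },
  ],
};

const failureLines = summarizeFailures(sampleReport);
```

A common format like this is what lets nine language-specific reporters feed one validator without the validator knowing which framework produced the results.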

Validation Engine (src/validation/)

Component                  Purpose
validator.ts               Orchestrates validation: reads context, calls LLM, parses allow/block response
system-prompt.ts           System prompt defining the TDD Guard role
rules.ts                   Red/Green/Refactor rules, core violations, incremental development principles
operations/write.ts        Context formatter for Write operations
operations/edit.ts         Context formatter for Edit operations
operations/multi-edit.ts   Context formatter for MultiEdit
models/                    Claude SDK and Anthropic API client adapters

Storage

Component       Purpose
FileStorage     Reads/writes .claude/tdd-guard/data/test.json (test results)
MemoryStorage   In-memory storage for test data during validation
GuardManager    Manages enable/disable state across sessions

Configuration

Settings via tdd-guard.config.json or environment variables; configurable: validation model, ignore patterns, custom rules, lint integration.
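
As an illustration of how ignore patterns could decide which files skip validation, the sketch below implements a small glob matcher. The pattern syntax, the function names, and the sample patterns are all assumptions made for this example; the source does not show how TDD Guard actually parses or applies its ignore patterns.

```typescript
// Hypothetical matcher: pattern syntax and names are assumptions, not TDD Guard's API.
// Supports "*" (any run of characters except "/") and "**" (any run, including "/").
function globToRegExp(pattern: string): RegExp {
  let source = "";
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch === "*") {
      if (pattern.charAt(i + 1) === "*") {
        source += ".*";
        i++; // consume the second "*"
      } else {
        source += "[^/]*";
      }
    } else {
      // Escape regex metacharacters so they match literally.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

// A file is ignored when any pattern matches its full path.
function isIgnored(filePath: string, patterns: string[]): boolean {
  return patterns.some((p) => globToRegExp(p).test(filePath));
}
```

With this sketch, `isIgnored("docs/guide.md", ["**/*.md"])` is true while `isIgnored("src/deep/a.ts", ["src/*.ts"])` is false, because a single `*` does not cross directory separators.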

05

Prompts

TDD Guard — Prompts

Prompt 1: System Prompt (src/validation/prompts/system-prompt.ts)

Technique: Role-assignment + single-focus constraint (TDD compliance only, no code quality/style opinions)

export const SYSTEM_PROMPT = `# TDD-Guard

## Your Role
You are a Test-Driven Development (TDD) Guard - a specialized code reviewer who ensures developers follow the strict discipline required for true test-driven development.

Your purpose is to identify violations of TDD principles in real-time, helping agents maintain the Red-Green-Refactor cycle.

## What You're Reviewing
You are analyzing a code change to determine if it violates TDD principles. Focus only on TDD compliance, not code quality, style, or best practices.
`

Prompt 2: TDD Rules (src/validation/prompts/rules.ts)

Technique: Structured rule enumeration with numbered phases, exceptions, edge cases, and "helpful directions" directive to avoid blocking loops.

export const RULES = `## TDD Fundamentals

### The TDD Cycle
The foundation of TDD is the Red-Green-Refactor cycle:

1. **Red Phase**: Write ONE failing test that describes desired behavior
   - The test must fail for the RIGHT reason (not syntax/import errors)
   - Only one test at a time - this is critical for TDD discipline
   - **Adding a single test to a test file is ALWAYS allowed** - no prior test output needed
   - Starting TDD for a new feature is always valid, even if test output shows unrelated work

2. **Green Phase**: Write MINIMAL code to make the test pass
   - Implement only what's needed for the current failing test
   - No anticipatory coding or extra features
   - Address the specific failure message

3. **Refactor Phase**: Improve code structure while keeping tests green
   - Only allowed when relevant tests are passing
   - Requires proof that tests have been run and are green
   - Applies to BOTH implementation code and behavioral changes in test code (what assertions check)
   - No refactoring with failing tests - fix them first

### Core Violations

1. **Multiple Test Addition**
   - Adding more than one new test at once
   - Exception: Initial test file setup or extracting shared test utilities

2. **Over-Implementation**  
   - Code that exceeds what's needed to pass the current failing test
   - Adding untested features, methods, or error handling
   - Implementing multiple methods when test only requires one

3. **Premature Implementation**
   - Adding implementation before a test exists and fails properly
   - Adding implementation without running the test first
   - Behavioral refactoring when tests haven't been run or are failing

### Critical Principle: Incremental Development
Each step in TDD should address ONE specific issue:
- Test can't locate the impl (import/symbol unresolved) → Create empty stub only
- Test errors calling the impl (signature or call mismatch) → Adjust signature, stub body minimally
- Test fails on assertion (expected vs received) → Implement minimal logic only

...

### General Information
- In the refactor phase, it is perfectly fine to refactor both test and implementation code. That said, completely new functionality is not allowed.
- When a test-file diff restructures existing tests (new names, reordered, combined, split) and the intent isn't clearly "add many new tests," default to approval.
- During refactor (tests green), adding types, interfaces, or constant literals to an existing or new file is always allowed — they add no runtime behavior by construction.
- Provide the agent with helpful directions so that they do not get stuck when blocking them.
`

Prompt 3: Setup Skill (plugin/skills/setup/SKILL.md)

Technique: Instruction-set with constraint table (which reporters write to which paths), restricted toolset (allowed-tools: [Read, Glob, Grep]), explicit disable of model invocation (disable-model-invocation: true).

---
description: Set up or update TDD Guard for the current project. Detects the test framework, installs or updates the matching reporter, and configures or migrates its configuration to match the current specification.
disable-model-invocation: true
allowed-tools: [Read, Glob, Grep]
---

# TDD Guard Setup

Set up TDD Guard for the current project. Your goal is to:

1. Identify the test framework(s) used in this project
2. Install the matching TDD Guard reporter, or update it if already present
3. Configure the reporter, or migrate an existing configuration to match the current specification
09

Uniqueness

TDD Guard — Uniqueness

Differs from Seeds

TDD Guard is most similar to superpowers in that both enforce TDD practices for Claude Code. However, superpowers uses Iron Law skill prompts activated at SessionStart to instruct the agent to follow TDD, while TDD Guard uses PreToolUse hooks to intercept file writes and validate them with a separate LLM — the enforcement is mechanical, not persuasive. Unlike spec-kit, which has an optional TDD skill and uses PostToolUse hooks for test execution, TDD Guard's enforcement fires before the write occurs (not after), making it a true gate rather than a retrospective check. Unlike claude-flow's post-hook test runners, TDD Guard's validator understands TDD semantics (Red/Green/Refactor phases, what constitutes over-implementation) rather than simply running tests and checking pass/fail.

Key Differentiators

  1. Pre-write, not post-write enforcement — Blocks writes before they happen, unlike most test-running hooks.
  2. LLM-powered semantic validation — The validation model understands why something violates TDD, enabling nuanced rules (e.g., "refactoring types is allowed in green phase even without new tests").
  3. Framework-agnostic reporter system — 9 supported test frameworks with language-specific reporters all feeding a common JSON format.
  4. Separate validation LLM — The Claude Code session LLM and the TDD Guard validation LLM are distinct calls; the guard is not subject to the agent's own context manipulation.
  5. Sibling project exists — Probity extends TDD Guard's approach to Codex and Copilot.

Positioning

TDD Guard occupies the niche of "TDD compliance as infrastructure": it treats the red-green-refactor cycle as a constraint to be enforced at the tool layer, not a guideline to be remembered. It is the only framework in the corpus that hard-blocks detected TDD violations at write time (exit code 2), subject to the failure modes listed below.

Observable Failure Modes

  • Stale test.json: If the agent writes code but doesn't run tests, test.json is outdated and the guard may incorrectly allow or block based on old state.
  • Reporter misconfiguration: If the reporter's projectRoot doesn't match, test.json may not be found and the guard may default to allow.
  • Validation LLM cost: Every write incurs an API call. In large sessions with many writes, this adds latency and cost.
  • Over-blocking on refactors: Complex refactors that touch many files may appear as over-implementation to the validator.
  • TodoWrite matching: The PreToolUse matcher includes TodoWrite even though writing todos does not directly create code, so every todo update also triggers validation; the matcher may need tuning per project.

Explicit Antipatterns

  • Writing production code before a failing test exists
  • Adding more than one test at a time
  • Implementing features beyond what the current failing test requires
  • Refactoring with failing tests (green phase not reached)
  • Claiming refactor while actually adding new behavior
04

Workflow

TDD Guard — Workflow

Overview

TDD Guard enforces the Red-Green-Refactor cycle by intercepting every write before it happens. The workflow is reactive (not sequential like plan-based frameworks): every file write triggers a validation check.

Phase Flow

SessionStart → Guard restores state / injects context
     ↓
UserPromptSubmit → Guard checks current test state on every prompt
     ↓
Agent plans writes
     ↓
PreToolUse (Write/Edit/MultiEdit) → VALIDATION GATE
     │
     ├── Read test.json (reporter output)
     ├── Build context (proposed change + test state)
     ├── Send to validation LLM with TDD rules
     │
     ├── ALLOW → Write proceeds
     └── BLOCK → Exit code 2, corrective message shown to agent
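
The gate in the flow above can be sketched as a pure function from hook input to exit code. The `tool_name` and `hook_event_name` fields follow Claude Code's hook input JSON; `GateVerdict`, `GateValidator`, and `runGate` are hypothetical names, and the validator is a stand-in for the LLM call, kept synchronous here for brevity where a real model call would be asynchronous.

```typescript
// Sketch of the validation gate. The validator is a hypothetical stand-in
// for the LLM call; this is not TDD Guard's actual implementation.
interface GateHookInput {
  hook_event_name: string;
  tool_name?: string;
  tool_input?: Record<string, unknown>;
}

type GateVerdict = { allow: true } | { allow: false; reason: string };
type GateValidator = (input: GateHookInput, testJson: string | null) => GateVerdict;

interface GateResult {
  exitCode: 0 | 2; // 0 lets the write proceed, 2 blocks it
  stderr: string; // corrective message shown to the agent on block
}

// Tools named in the flow diagram above.
const GATED_TOOLS = new Set(["Write", "Edit", "MultiEdit"]);

function runGate(
  rawStdin: string,
  testJson: string | null,
  validate: GateValidator,
): GateResult {
  const input = JSON.parse(rawStdin) as GateHookInput;

  // Anything that is not a gated write passes straight through.
  if (input.hook_event_name !== "PreToolUse" || !GATED_TOOLS.has(input.tool_name ?? "")) {
    return { exitCode: 0, stderr: "" };
  }

  // Proposed change plus current test state go to the validator.
  const verdict = validate(input, testJson);
  return verdict.allow
    ? { exitCode: 0, stderr: "" }
    : { exitCode: 2, stderr: verdict.reason };
}
```

Returning a result object instead of calling the process exit function directly keeps the decision logic testable; a thin entry point would read stdin, call `runGate`, write `stderr`, and exit with `exitCode`.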

Phase-to-Artifact Map

Phase                                      Artifact
Reporter installation (/tdd-guard:setup)   .claude/tdd-guard/data/test.json path configured in test framework config
Test execution (by agent or CI)            .claude/tdd-guard/data/test.json: JSON with pass/fail, test names, error messages
Hook validation                            No persistent artifact; response is ephemeral stdout/stderr to Claude Code
Block decision                             Structured error message to agent with: what was violated, what TDD phase is required

Approval Gates

Gate                          Trigger                      Type
PreToolUse write validation   Every Write/Edit/MultiEdit   Automated (LLM): blocks if violation
UserPromptSubmit              Every prompt                 Automated (LLM): warns if TDD state wrong
SessionStart guard restore    Session start/resume         Automated: no user interaction

TDD Phase Rules (enforced)

Red phase (write one failing test):

  • Only one new test addition at a time
  • Test must fail for the right reason (not import error)
  • Starting TDD for new feature is always valid

Green phase (minimal implementation):

  • Only code needed to pass the current failing test
  • No anticipatory coding, extra features, or multiple methods

Refactor phase (improve while green):

  • Only when tests are passing and have been run
  • Types, interfaces, constants, helper extraction allowed
  • No new behavior — behavior-free refactors only
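
The phase rules above imply a simple mapping from test state to the step that is currently in scope. The sketch below is an illustrative heuristic only: TDD Guard delegates this judgment to the validation LLM, and `RunSummary`, `NextStep`, and `expectedStep` are hypothetical names.

```typescript
// Illustrative heuristic only: the real guard leaves this judgment to the
// validation LLM. All names here are hypothetical.
interface RunSummary {
  hasRun: boolean; // false when no test output exists yet
  failed: number;
  passed: number;
}

type NextStep =
  | "write-one-failing-test"
  | "minimal-implementation"
  | "refactor-or-new-test";

function expectedStep(run: RunSummary): NextStep {
  // No test output yet: adding a single test is the step that is always valid.
  if (!run.hasRun) return "write-one-failing-test";
  // Red: a test is failing, so only minimal code to pass it is in scope.
  if (run.failed > 0) return "minimal-implementation";
  // Green: behavior-free refactoring, or start the next cycle with a new test.
  return "refactor-or-new-test";
}
```

A coarse check like this cannot tell whether a failure is the "right" kind of red or whether a change over-implements, which is the gap the LLM-based validation is meant to fill.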

Session Control

The agent (or user) can disable enforcement mid-session:

  • Toggle via /tdd-guard:setup (reconfigure or disable)
  • Guard state persists across SessionStart events
06

Memory Context

TDD Guard — Memory & Context

State Storage

Storage        Path                                   Purpose
Test results   .claude/tdd-guard/data/test.json       Written by reporters after each test run; read by the hook before validation
Guard state    In-memory (session duration)           Enabled/disabled state managed by GuardManager
Config         tdd-guard.config.json (project root)   Validation model, ignore patterns, lint integration settings

Persistence Model

  • Session-scoped: Guard enable/disable state lives in memory; restored at SessionStart
  • Project-scoped: test.json persists across sessions and is updated by the test reporter on every test run
  • No cross-session memory: The validation LLM receives only the current change context + latest test results — no history of prior sessions

Context Injection

At SessionStart, the guard reads existing test results and validates whether the agent is mid-TDD-cycle, enabling correct phase detection immediately on session resume.

At PreToolUse, context built for the LLM includes:

  1. The proposed file change (diff/content)
  2. The current test.json state (pass/fail, specific errors)
  3. The file path and operation type
  4. The TDD rules and phase determination logic
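
The four ingredients listed above can be assembled into a single prompt along these lines. This is a minimal sketch: the section headings, the `ProposedChange` type, and `buildValidationPrompt` are assumptions for illustration and do not reproduce the formatters under src/validation/context/.

```typescript
// Hypothetical context builder; names and layout are illustrative only.
interface ProposedChange {
  operation: "Write" | "Edit" | "MultiEdit";
  filePath: string;
  content: string; // new file content or a diff
}

function buildValidationPrompt(
  rules: string,
  change: ProposedChange,
  testResults: string | null,
): string {
  return [
    rules.trim(),
    "## Operation",
    `${change.operation} ${change.filePath}`,
    "## Proposed change",
    change.content,
    "## Latest test results",
    // A missing test.json is stated explicitly rather than silently omitted.
    testResults ?? "(no test output available)",
  ].join("\n\n");
}
```

Stating the absence of test output explicitly matters here because the quoted rules treat "no prior test output" as a distinct case (adding a single test is still allowed).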

Compaction Handling

No explicit compaction handling. The framework operates at the tool-use layer, outside the agent's context window, so context compaction does not affect enforcement. Test results live in a file, not in context.

Cross-Session Handoff

Implicit: the test reporter continues to update test.json between sessions. When a new session starts, SessionStart reads the current test state and can determine what TDD phase the project is in.

07

Orchestration

TDD Guard — Orchestration

Multi-Agent

No — TDD Guard is a single-agent enforcement layer. It does not spawn subagents or coordinate multiple agents.

Orchestration Pattern

None. TDD Guard is reactive: it intercepts individual tool-use events from a single Claude Code session.

Execution Mode

Event-driven: the guard only runs when Claude Code fires PreToolUse, UserPromptSubmit, or SessionStart events.

Multi-Model Usage

Yes — in a limited sense. TDD Guard calls an external Anthropic API endpoint (separate from the Claude Code session) to perform TDD validation. The validation model is configurable (docs/validation-model.md). This is a secondary LLM call, not a different model for a different "role."

Isolation Mechanism

None at the framework level. The agent's edits land in place; TDD Guard does not create branches or worktrees. It prevents writes from happening, but does not sandbox them.

Consensus Mechanism

None.

Prompt Chaining

Yes — in the hook flow: the PreToolUse hook captures the proposed change, the reporter output (from prior test run) is read, and together they form the input prompt for the validation LLM. The validation LLM's output (allow/block + rationale) is the guard's response to Claude Code.
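
The last link in that chain, turning the model's text into an allow/block decision, might look like the sketch below. The JSON contract (`decision` and `reason` fields) is a hypothetical one chosen for the example; the source does not show the real response format that validator.ts parses.

```typescript
// HYPOTHETICAL response contract: the real wire format is not shown in the source.
type GuardDecision =
  | { decision: "allow" }
  | { decision: "block"; reason: string };

function parseDecision(modelOutput: string): GuardDecision {
  // Models often wrap JSON in prose or code fences; take the outermost braces.
  const start = modelOutput.indexOf("{");
  const end = modelOutput.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("no JSON object found in model output");
  }
  const parsed: unknown = JSON.parse(modelOutput.slice(start, end + 1));
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("model output is not a JSON object");
  }
  const { decision, reason } = parsed as { decision?: unknown; reason?: unknown };
  if (decision === "allow") return { decision: "allow" };
  // A block without a rationale is rejected: the agent needs directions.
  if (decision === "block" && typeof reason === "string" && reason.length > 0) {
    return { decision: "block", reason };
  }
  throw new Error("unrecognized decision in model output");
}
```

The function throws on malformed output instead of guessing, which leaves the choice between allowing and blocking on a parse failure to the caller.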

Crash Recovery

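No explicit crash recovery. Under the Claude Code hook specification, only exit code 2 is a blocking error; any other non-zero exit is reported as a non-blocking error and the tool call proceeds. An unexpected crash of the guard process therefore fails open unless the guard itself converts the failure into exit code 2.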

Streaming Output

No — the guard responds synchronously with a single JSON/text response to the hook.

08

UI CLI Surface

TDD Guard — UI / CLI Surface

CLI Binary

Binary      Invocation             Purpose
tdd-guard   npx tdd-guard@latest   Main entry point for Claude Code hooks; reads hook input from stdin, runs validation, exits 0 (allow) or 2 (block)

The CLI is thin but not a wrapper over the Claude CLI. It has its own TypeScript runtime that reads the hook event JSON, constructs TDD validation context, calls the Anthropic API, and emits the block/allow response.

Subcommands / Modes

Exposed via the setup skill (/tdd-guard:setup) rather than CLI flags. The setup flow:

  1. Detects test framework from project files
  2. Installs/updates appropriate reporter package
  3. Configures reporter with projectRoot path
  4. Writes or migrates existing config

No Local Web Dashboard

TDD Guard has no web UI, TUI, or dashboard. All interaction is through:

  • Claude Code's hook feedback (block messages shown in-context)
  • The /tdd-guard:setup skill
  • Standard npm config files

IDE Integration

Indirect — operates through Claude Code's hook system, which is IDE-agnostic. Works wherever Claude Code runs.

Observability

  • Block decisions include structured rationale (what phase is required, what violation was detected)
  • No persistent audit log of decisions
  • Test results in .claude/tdd-guard/data/test.json are the only persistent state

Configuration Docs

  • docs/configuration.md — all settings
  • docs/validation-model.md — choose validation model (faster vs more capable)
  • docs/enforcement.md — strengthen enforcement against bypass attempts
  • docs/linting.md — ESLint integration for refactor phase
  • docs/ignore-patterns.md — control which files are validated
  • docs/custom-instructions.md — customize TDD validation rules
