
Spec Kit

spec-kit · github/spec-kit · ★ 106k · last commit 2026-05-22

Turns a natural-language feature description into a complete, versioned, AI-executable specification pipeline installable for 30+ agents in one CLI command.

Best when: Specifications should be the source of truth from which code is continuously regenerated, not documentation that trails implementation.
Skip if: Vibe coding (prompting without structured specs); Specifications as disposable scaffolding
Primitive shape (37 total): Commands 9 · Skills 9 · Hooks 18 · MCP tools 1
00

Summary

Spec Kit — Summary

Spec Kit is GitHub's official open-source toolkit for Spec-Driven Development (SDD), a methodology in which structured specifications, not prompts, are the primary artifact that drives AI-assisted code generation. It addresses the chronic gap between intent and implementation by treating feature specs, technical plans, and task lists as first-class, versioned documents that the AI executes rather than guesses at. Three things set it apart: its 9-command slash-command pipeline (/speckit.constitution, /speckit.specify, /speckit.clarify, /speckit.plan, /speckit.tasks, /speckit.analyze, /speckit.implement, /speckit.checklist, /speckit.taskstoissues); its first-party Specify CLI (specify-cli, installed with uv tool install), which bootstraps a project for any of 30+ supported AI agents; and its layered extension/preset system, which lets teams override any template without forking the core. It is aimed at individual developers and engineering teams who want reproducible, auditable AI-assisted development across greenfield, brownfield, and parallel-exploration scenarios. With 105,966 stars, 30 contributors, an MIT license, and an active release cadence (last push May 2026), it is production-ready and, by star count, the most popular spec-driven development framework currently available.

01

Overview

Spec Kit — Overview

Tagline

"Build high-quality software faster."

An open source toolkit that allows you to focus on product scenarios and predictable outcomes instead of vibe coding every piece from scratch.

Origin and Problem Framing

Spec Kit was created by GitHub and open-sourced under the github organization. It emerged from the observation that most AI-assisted development today is "vibe coding": ad-hoc prompting that produces inconsistent, unreviewable, hard-to-maintain results. The project's canonical methodology document (spec-driven.md) frames this as a power inversion problem:

"For decades, code has been king — specifications were just scaffolding we built and discarded once the 'real work' of coding began. Spec-Driven Development changes this: specifications become executable, directly generating working implementations rather than just guiding them."

The authors argue that the gap between specification and implementation has plagued software since its inception and that previous attempts (better docs, stricter processes) fail because they accept the gap rather than eliminating it. SDD eliminates the gap by making the specification the source of truth from which code is continuously regenerated.

Philosophy

The README and spec-driven.md articulate five core principles:

  1. Specifications as the Lingua Franca — The specification is the primary artifact; code is its expression in a particular language and framework. Maintaining software means evolving specifications, not just patching code.
  2. Executable Specifications — Specs must be precise enough to generate working systems, eliminating ambiguity that derails generation.
  3. Continuous Refinement — Consistency validation is not a one-time gate but an ongoing AI-assisted process across spec/plan/tasks artifacts.
  4. Research-Driven Context — Research agents gather technical context (library compatibility, performance benchmarks, organizational constraints) throughout planning.
  5. Bidirectional Feedback — Production metrics and incidents feed back into specifications for the next regeneration cycle.

The authors explicitly frame this as "intent-driven development": the lingua franca moves to natural language and design assets; code is the "last-mile approach." They also position SDD as a team process — specs are versioned, created in branches, reviewed, and merged just like code.

What-If and Exploration

A notable philosophical position is support for parallel implementation exploration: generate multiple implementations from the same specification to explore different optimization targets (performance, cost, UX) without rewriting. The question "If we need to re-implement to sell more T-shirts, how would we experiment?" is treated as a routine workflow question, not an exceptional event.

Why Now

The README's "Why SDD Matters Now" section identifies three trends: AI capabilities have crossed a threshold for reliable spec-to-code generation; software complexity now exceeds what manual alignment can manage; and the pace of change has made pivoting an expectation rather than an exception. SDD is positioned as the structural response to all three.

02

Architecture

Spec Kit — Architecture

Distribution Type

cli-tool — distributed as a Python package (specify-cli) installed via uv or pipx. The CLI bootstraps project directories with agent-specific command/skill files copied from the package's bundled templates.

Install Methods

```bash
# Recommended (uv)
uv tool install specify-cli --from git+https://github.com/github/spec-kit.git@vX.Y.Z

# Alternative (pipx)
pipx install specify-cli --spec git+https://github.com/github/spec-kit.git@vX.Y.Z
```

Initialize a project:

```bash
specify init my-project --integration copilot
specify init . --integration claude
specify init . --integration codex --integration-options="--skills"
```

File / Directory Layout (top-level of the repo)

```text
spec-kit/
├── src/
│   └── specify_cli/          # Python package
│       ├── integrations/     # One subpackage per supported AI agent
│       │   ├── base.py       # MarkdownIntegration, SkillsIntegration, etc.
│       │   ├── claude/
│       │   ├── copilot/
│       │   ├── gemini/
│       │   ├── codex/
│       │   ├── cursor_agent/
│       │   ├── windsurf/
│       │   └── ... (30+ agents)
│       ├── workflows/        # CLI workflow logic
│       └── agents.py
├── templates/
│   ├── commands/             # Slash-command prompt files (Markdown)
│   │   ├── specify.md
│   │   ├── plan.md
│   │   ├── tasks.md
│   │   ├── implement.md
│   │   ├── clarify.md
│   │   ├── analyze.md
│   │   ├── checklist.md
│   │   ├── constitution.md
│   │   └── taskstoissues.md
│   ├── spec-template.md
│   ├── plan-template.md
│   ├── tasks-template.md
│   ├── checklist-template.md
│   └── constitution-template.md
├── scripts/
│   ├── bash/                 # Shell helper scripts
│   └── powershell/           # PowerShell equivalents
├── extensions/               # Extension system docs + catalog
├── presets/                  # Preset system (lean, scaffold, self-test)
├── integrations/             # Integration catalog (catalog.json)
├── workflows/                # Workflow definitions
├── docs/
├── AGENTS.md
├── spec-driven.md            # Full SDD methodology doc
└── pyproject.toml
```

Post-Init Project Directory Layout

After specify init, the project gets:

```text
.specify/
├── memory/
│   └── constitution.md       # Project governing principles
├── scripts/
│   └── bash/
│       ├── check-prerequisites.sh
│       ├── create-new-feature.sh
│       ├── setup-plan.sh
│       └── setup-tasks.sh
└── templates/
    ├── spec-template.md
    ├── plan-template.md
    ├── tasks-template.md
    └── overrides/            # Project-local template overrides
specs/
└── <NNN>-<feature-name>/     # One directory per feature
    ├── spec.md
    ├── plan.md
    ├── tasks.md
    ├── data-model.md
    ├── research.md
    ├── quickstart.md
    ├── contracts/
    └── checklists/
```

Agent command files are written to agent-specific locations (e.g., .claude/commands/speckit.*.md for Claude Code, .github/copilot-instructions.md for Copilot, skill dirs for Codex).

Template Resolution Order (Priority Stack)

| Priority | Layer | Location |
|---|---|---|
| 1 (highest) | Project-local overrides | .specify/templates/overrides/ |
| 2 | Presets | .specify/presets/templates/ |
| 3 | Extensions | .specify/extensions/templates/ |
| 4 (lowest) | Spec Kit core | .specify/templates/ |

Templates are resolved at runtime, whereas extension and preset command files are written once, at install time.

Required Dependencies

  • Python 3.11+
  • uv (recommended) or pipx
  • Git
  • A supported AI coding agent (see 07-target-tools.md)

Configuration Files

| File | Purpose |
|---|---|
| .specify/memory/constitution.md | Project governing principles (created by /speckit.constitution) |
| .specify/extensions.yml | Extension hook registrations |
| .specify/feature.json | Points downstream commands to the active feature directory |
| .specify/init-options.json | Branch numbering scheme (sequential or timestamp) |
| AGENTS.md | Agent-readable integration architecture reference |

No CLAUDE.md is generated by default; a CLAUDE-template.md is copied into .specify/templates/ and instantiated during the /speckit.plan step.

03

Components

Spec Kit — Components

Commands (Slash Commands)

9 commands total. All live in templates/commands/ and are copied to the agent's command directory by specify init.

| Command | Skill Equivalent | Purpose |
|---|---|---|
| /speckit.constitution | speckit-constitution | Create or update .specify/memory/constitution.md with project governing principles |
| /speckit.specify | speckit-specify | Transform a natural-language feature description into a structured spec.md with user stories, requirements, and a quality checklist |
| /speckit.clarify | speckit-clarify | Interactive structured ambiguity reduction (up to 5 questions) before planning; records answers back into spec.md |
| /speckit.plan | speckit-plan | Generate plan.md, data-model.md, contracts/, research.md, and quickstart.md from the spec plus a stated tech stack |
| /speckit.tasks | speckit-tasks | Produce a dependency-ordered, phase-structured tasks.md from plan.md and design artifacts |
| /speckit.analyze | speckit-analyze | Read-only cross-artifact consistency analysis across spec.md / plan.md / tasks.md; outputs severity-graded findings table |
| /speckit.implement | speckit-implement | Execute all tasks in tasks.md in phase/dependency order; marks tasks complete ([X]) as they finish |
| /speckit.checklist | speckit-checklist | Generate domain-specific requirement-quality checklists ("unit tests for English") saved to checklists/ |
| /speckit.taskstoissues | speckit-taskstoissues | Convert tasks.md items into GitHub Issues via the GitHub MCP server |

Skills

Skills mode is an alternate installation variant available for agents that support it (e.g., Codex CLI with --skills). In skills mode the same 9 prompt files are installed as agent skill directories (speckit-<name>/SKILL.md) rather than flat .md command files. The skill names match the table above under "Skill Equivalent". The content is identical; only the delivery format differs.
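
The difference between the two delivery formats comes down to file naming. The sketch below is illustrative only: install_path is a hypothetical helper, the base directory is agent-specific and passed in by the caller (the "skills" base in the usage lines is a placeholder), and the flat speckit.<name>.md naming follows the Claude Code example given earlier (.claude/commands/speckit.*.md).

```python
from pathlib import PurePosixPath

def install_path(base_dir: str, command: str, skills_mode: bool) -> PurePosixPath:
    """Illustrative target path for one Spec Kit command (hypothetical helper)."""
    base = PurePosixPath(base_dir)
    if skills_mode:
        # Skills delivery: one directory per command, holding a SKILL.md file.
        return base / f"speckit-{command}" / "SKILL.md"
    # Command delivery: one flat Markdown file per command.
    return base / f"speckit.{command}.md"

print(install_path(".claude/commands", "plan", skills_mode=False))  # .claude/commands/speckit.plan.md
print(install_path("skills", "plan", skills_mode=True))             # skills/speckit-plan/SKILL.md
```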

Subagents

(none — Spec Kit does not define named subagent personas. The framework is single-agent; parallelism within commands like /speckit.plan is described as dispatching "research agents" but these are ad-hoc prompts to the same session, not registered subagent definitions.)

Hooks

Spec Kit defines a runtime hook system via .specify/extensions.yml (not Claude Code hooks). Each command checks for hooks.before_<command> and hooks.after_<command> keys in that YAML file. Hooks can be:

  • Optional (optional: true) — presented to the user as a suggested follow-up command
  • Mandatory (optional: false) — automatically executed by the agent before or after the command body

Hook event names (as keys in extensions.yml):

| Hook Key | Fires |
|---|---|
| hooks.before_specify | Before /speckit.specify outline runs |
| hooks.after_specify | After /speckit.specify reports completion |
| hooks.before_clarify | Before /speckit.clarify outline runs |
| hooks.after_clarify | After clarification questions complete |
| hooks.before_plan | Before /speckit.plan outline runs |
| hooks.after_plan | After plan artifacts are written |
| hooks.before_tasks | Before task generation runs |
| hooks.after_tasks | After tasks.md is written |
| hooks.before_analyze | Before consistency analysis runs |
| hooks.after_analyze | After analysis report is output |
| hooks.before_implement | Before implementation execution starts |
| hooks.after_implement | After all tasks complete |
| hooks.before_checklist | Before checklist generation runs |
| hooks.after_checklist | After checklist file is written |
| hooks.before_constitution | Before constitution update runs |
| hooks.after_constitution | After constitution is saved |
| hooks.before_taskstoissues | Before GitHub issue creation starts |
| hooks.after_taskstoissues | After issues are created |
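
The key naming is regular: each of the 9 commands gets a before_ and an after_ key, which is where the 18 hooks counted at the top of this page come from (9 commands times 2 phases). A short Python sketch that derives the table; the COMMANDS tuple simply restates the command list above, and hook_keys is a hypothetical helper.

```python
COMMANDS = (
    "constitution", "specify", "clarify", "plan", "tasks",
    "analyze", "implement", "checklist", "taskstoissues",
)

def hook_keys(commands: tuple[str, ...] = COMMANDS) -> list[str]:
    """Return the before_/after_ hook key for each command, in COMMANDS order."""
    return [f"hooks.{phase}_{cmd}" for cmd in commands for phase in ("before", "after")]

keys = hook_keys()
print(len(keys))  # 18
print(keys[:2])   # ['hooks.before_constitution', 'hooks.after_constitution']
```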

These are distinct from Claude Code's PreToolUse/PostToolUse hook system. Spec Kit does not ship any Claude Code settings.json hooks.

MCP Servers

/speckit.taskstoissues declares tools: ['github/github-mcp-server/issue_write'] in its frontmatter, meaning it requires the GitHub MCP server to be available. No other MCP servers are bundled or required.

Scripts / Binaries

specify — the Specify CLI, the primary binary shipped by the package. Key subcommands:

| Subcommand | Purpose |
|---|---|
| `specify init <project>` | Bootstrap a project with templates and agent integration files |
| `specify integration list` | List all available agent integrations |
| `specify extension search` | Search the extensions catalog |
| `specify extension add <name>` | Install an extension into the project |
| `specify preset search` | Search the presets catalog |
| `specify preset add <name>` | Install a preset into the project |

Shell helper scripts (copied to .specify/scripts/bash/ and .specify/scripts/powershell/ at init time):

| Script | Purpose |
|---|---|
| check-prerequisites.sh | Verify active feature context; return JSON with paths |
| create-new-feature.sh / setup-plan.sh / setup-tasks.sh | Set up directory structures and copy templates for each pipeline phase |

PowerShell equivalents ship for Windows users.

05

Prompts

Spec Kit — Prompts (Verbatim Excerpts)

This file quotes verbatim the most important prompt files from templates/commands/. All files were fetched from raw.githubusercontent.com/github/spec-kit/main/templates/commands/.


Prompt 1 — /speckit.specify (templates/commands/specify.md) — First ~120 lines

This is the most architecturally significant prompt. It demonstrates: frontmatter handoffs, pre/post extension hook checking, automated feature directory naming, spec quality validation, and a hard limit of 3 [NEEDS CLARIFICATION] markers.

```markdown
---
description: Create or update the feature specification from a natural language feature description.
handoffs:
  - label: Build Technical Plan
    agent: speckit.plan
    prompt: Create a plan for the spec. I am building with...
  - label: Clarify Spec Requirements
    agent: speckit.clarify
    prompt: Clarify specification requirements
    send: true
---
```

## User Input

```text
$ARGUMENTS
```

You MUST consider the user input before proceeding (if not empty).

Pre-Execution Checks

Check for extension hooks (before specification):

  • Check if .specify/extensions.yml exists in the project root.
  • If it exists, read it and look for entries under the hooks.before_specify key
  • If the YAML cannot be parsed or is invalid, skip hook checking silently and continue normally
  • Filter out hooks where enabled is explicitly false. Treat hooks without an enabled field as enabled by default.
  • For each remaining hook, do not attempt to interpret or evaluate hook condition expressions:
    • If the hook has no condition field, or it is null/empty, treat the hook as executable
    • If the hook defines a non-empty condition, skip the hook and leave condition evaluation to the HookExecutor implementation
  • For each executable hook, output the following based on its optional flag: [...]
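
These filtering rules are mechanical enough to express in code. A minimal sketch, assuming extensions.yml has already been loaded into Python objects (for example with PyYAML's yaml.safe_load) and nests hook lists as hooks → before_specify → [...]. The mandatory/optional split follows the optional flag described under Hooks; treating a hook with no optional field as optional is an assumption of this sketch, as is the command field used in the checks.

```python
from typing import Any

def executable_hooks(config: Any, event: str) -> dict[str, list[dict]]:
    """Filter one event's hooks from a parsed extensions.yml and split them by `optional`."""
    result: dict[str, list[dict]] = {"mandatory": [], "optional": []}
    # Anything other than the expected shape counts as invalid: skip silently.
    hooks = config.get("hooks") if isinstance(config, dict) else None
    entries = hooks.get(event) if isinstance(hooks, dict) else None
    if not isinstance(entries, list):
        return result
    for hook in entries:
        if not isinstance(hook, dict):
            continue
        if hook.get("enabled") is False:
            continue  # explicitly disabled; a missing `enabled` means enabled
        if hook.get("condition"):
            continue  # non-empty condition: left to the HookExecutor
        # Assumption: a hook without an `optional` field is treated as optional.
        bucket = "optional" if hook.get("optional", True) else "mandatory"
        result[bucket].append(hook)
    return result
```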

Outline

...

  1. Generate a concise short name (2-4 words) for the feature:
    • Analyze the feature description and extract the most meaningful keywords
    • Create a 2-4 word short name that captures the essence of the feature
    • Use action-noun format when possible (e.g., "add-user-auth", "fix-payment-bug")
    • Preserve technical terms and acronyms (OAuth2, API, JWT, etc.)
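
The prompt leaves naming to the model. As a rough deterministic stand-in, a keyword filter produces names of the same shape. This is a swapped-in heuristic, not what Spec Kit runs: short_name is hypothetical, the stop-word list is made up for illustration, and acronyms such as OAuth2 survive only as lowercased tokens.

```python
import re

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {
    "a", "an", "the", "to", "for", "of", "in", "on", "and", "or", "with",
    "i", "we", "want", "need", "should", "be", "able", "so", "that",
}

def short_name(description: str, max_words: int = 4) -> str:
    """Build a kebab-case short name from the first meaningful words."""
    words = re.findall(r"[A-Za-z0-9]+", description)
    kept = [w.lower() for w in words if w.lower() not in STOP_WORDS]
    return "-".join(kept[:max_words])

print(short_name("Fix the payment bug"))                  # fix-payment-bug
print(short_name("Add user authentication with OAuth2"))  # add-user-authentication-oauth2
```

Unlike the prompt, this filter does not enforce the 2-word minimum or judge which words capture "the essence" of the feature.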

...

  1. Create the spec feature directory:

    Specs live under the default specs/ directory unless the user explicitly provides SPECIFY_FEATURE_DIRECTORY.

    Resolution order for SPECIFY_FEATURE_DIRECTORY:

    1. If the user explicitly provided SPECIFY_FEATURE_DIRECTORY (e.g., via environment variable, argument, or configuration), use it as-is
    2. Otherwise, auto-generate it under specs/:
      • Check .specify/init-options.json for branch_numbering
      • If "timestamp": prefix is YYYYMMDD-HHMMSS (current timestamp)
      • If "sequential" or absent: prefix is NNN (next available 3-digit number after scanning existing directories in specs/)
      • Construct the directory name: <prefix>-<short-name> (e.g., 003-user-auth or 20260319-143022-user-auth)
      • Set SPECIFY_FEATURE_DIRECTORY to specs/<directory-name>

...

  1. Follow this execution flow:
    1. Parse user description from arguments
       If empty: ERROR "No feature description provided"
    2. Extract key concepts from description
       Identify: actors, actions, data, constraints
    3. For unclear aspects:
      • Make informed guesses based on context and industry standards
      • Only mark with [NEEDS CLARIFICATION: specific question] if:
        • The choice significantly impacts feature scope or user experience
        • Multiple reasonable interpretations exist with different implications
        • No reasonable default exists
      • LIMIT: Maximum 3 [NEEDS CLARIFICATION] markers total
      • Prioritize clarifications by impact: scope > security/privacy > user experience > technical details
    4. Fill User Scenarios & Testing section
       If no clear user flow: ERROR "Cannot determine user scenarios"
    5. Generate Functional Requirements
       Each requirement must be testable
       Use reasonable defaults for unspecified details (document assumptions in Assumptions section)
    6. Define Success Criteria
       Create measurable, technology-agnostic outcomes
       Include both quantitative metrics (time, performance, volume) and qualitative measures (user satisfaction, task completion)
       Each criterion must be verifiable without implementation details
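
Inside Spec Kit the model itself enforces the three-marker cap. A post-hoc check over a generated spec.md could look like this sketch; MAX_MARKERS mirrors the LIMIT above, while the helper names are hypothetical and the regular expression assumes markers are written as [NEEDS CLARIFICATION: question].

```python
import re

MARKER = re.compile(r"\[NEEDS CLARIFICATION:?\s*([^\]]*)\]")
MAX_MARKERS = 3

def clarification_questions(spec_text: str) -> list[str]:
    """Return the question text of every [NEEDS CLARIFICATION: ...] marker."""
    return [m.group(1).strip() for m in MARKER.finditer(spec_text)]

def check_marker_limit(spec_text: str) -> list[str]:
    """Raise if the spec exceeds the marker cap; otherwise return the questions."""
    questions = clarification_questions(spec_text)
    if len(questions) > MAX_MARKERS:
        raise ValueError(
            f"{len(questions)} [NEEDS CLARIFICATION] markers found; the limit is {MAX_MARKERS}"
        )
    return questions
```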

...

  1. Specification Quality Validation: After writing the initial spec, validate it against quality criteria:

    a. Create Spec Quality Checklist: Generate a checklist file at SPECIFY_FEATURE_DIRECTORY/checklists/requirements.md ...

    b. Run Validation Check: Review the spec against each checklist item:

    • For each item, determine if it passes or fails
    • Document specific issues found (quote relevant spec sections)

    c. Handle Validation Results:

    • If all items pass: Mark checklist complete and proceed to step 8
    • If items fail (excluding [NEEDS CLARIFICATION]):
      1. List the failing items and specific issues
      2. Update the spec to address each issue
      3. Re-run validation until all items pass (max 3 iterations)
      4. If still failing after 3 iterations, document remaining issues in checklist notes and warn user
    • If [NEEDS CLARIFICATION] markers remain: ...
      • Present all questions together before waiting for responses
      • Wait for user to respond with their choices for all questions (e.g., "Q1: A, Q2: Custom - [details], Q3: B")
      • Update the spec by replacing each [NEEDS CLARIFICATION] marker with the user's selected or provided answer
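
The validate-fix-revalidate cycle in step c is a bounded retry loop. A minimal sketch, where validate and fix are hypothetical callables standing in for the model's checklist review and spec edits; counting one fix plus one re-validation as an "iteration" is an interpretation of "max 3 iterations".

```python
from typing import Callable

def validate_with_retries(
    spec: str,
    validate: Callable[[str], list[str]],
    fix: Callable[[str, list[str]], str],
    max_iterations: int = 3,
) -> tuple[str, list[str]]:
    """Fix and re-validate until no issues remain or the iteration budget runs out."""
    issues = validate(spec)
    for _ in range(max_iterations):
        if not issues:
            break
        spec = fix(spec, issues)
        issues = validate(spec)
    return spec, issues
```

A non-empty second return value corresponds to the "document remaining issues in checklist notes and warn user" branch.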

Technique highlights:

  • Frontmatter handoffs define which agents the user can chain to next (rendered as buttons in supporting UIs).
  • Extension hooks are checked at both the start and the end of every command, so the hook system is woven into every prompt.
  • Hard constraint: a maximum of 3 clarification markers forces the LLM to make informed guesses rather than asking endlessly.
  • Iterative self-validation loop: the LLM re-runs quality checks up to 3 times before surfacing issues to the user.

---

## Prompt 2 — `/speckit.clarify` (`templates/commands/clarify.md`) — First ~80 lines

This prompt demonstrates a sophisticated interactive clarification loop with a taxonomy-driven ambiguity scan, sequential one-question-at-a-time UX, and mandatory spec file updates after each accepted answer.

```markdown
---
description: Identify underspecified areas in the current feature spec by asking up to 5 highly targeted clarification questions and encoding answers back into the spec.
handoffs: 
  - label: Build Technical Plan
    agent: speckit.plan
    prompt: Create a plan for the spec. I am building with...
scripts:
   sh: scripts/bash/check-prerequisites.sh --json --paths-only
   ps: scripts/powershell/check-prerequisites.ps1 -Json -PathsOnly
---

...

2. Load the current spec file. Perform a structured ambiguity & coverage scan using this taxonomy. For each category, mark status: Clear / Partial / Missing. Produce an internal coverage map used for prioritization (do not output raw map unless no questions will be asked).

   Functional Scope & Behavior:
   - Core user goals & success criteria
   - Explicit out-of-scope declarations
   - User roles / personas differentiation

   Domain & Data Model:
   - Entities, attributes, relationships
   - Identity & uniqueness rules
   - Lifecycle/state transitions
   - Data volume / scale assumptions

   Interaction & UX Flow:
   - Critical user journeys / sequences
   - Error/empty/loading states
   - Accessibility or localization notes

   Non-Functional Quality Attributes:
   - Performance (latency, throughput targets)
   - Scalability (horizontal/vertical, limits)
   - Reliability & availability (uptime, recovery expectations)
   - Observability (logging, metrics, tracing signals)
   - Security & privacy (authN/Z, data protection, threat assumptions)
   - Compliance / regulatory constraints (if any)

   ...

3. Generate (internally) a prioritized queue of candidate clarification questions (maximum 5). Do NOT output them all at once. Apply these constraints:
    - Maximum of 5 total questions across the whole session.
    - Each question must be answerable with EITHER:
       - A short multiple‑choice selection (2–5 distinct, mutually exclusive options), OR
       - A one-word / short‑phrase answer (explicitly constrain: "Answer in <=5 words").
    - Only include questions whose answers materially impact architecture, data modeling, task decomposition, test design, UX behavior, operational readiness, or compliance validation.

4. Sequential questioning loop (interactive):
    - Present EXACTLY ONE question at a time.
    - For multiple‑choice questions:
       - **Analyze all options** and determine the **most suitable option** based on:
          - Best practices for the project type
          - Common patterns in similar implementations
          - Risk reduction (security, performance, maintainability)
          - Alignment with any explicit project goals or constraints visible in the spec
       - Present your **recommended option prominently** at the top with clear reasoning (1-2 sentences explaining why this is the best choice).
       - Format as: `**Recommended:** Option [X] - <reasoning>`
       - Then render all options as a Markdown table: ...
    - After the user answers:
       - If the user replies with "yes", "recommended", or "suggested", use your previously stated recommendation/suggestion as the answer.

5. Integration after EACH accepted answer (incremental update approach):
    - ...For the first integrated answer in this session:
       - Ensure a `## Clarifications` section exists (create it just after the highest-level contextual/overview section per the spec template if missing).
       - Under it, create (if not present) a `### Session YYYY-MM-DD` subheading for today.
    - Append a bullet line immediately after acceptance: `- Q: <question> → A: <final answer>`.
    - Then immediately apply the clarification to the most appropriate section(s): [...]
    - Save the spec file AFTER each integration to minimize risk of context loss (atomic overwrite).
```

Technique highlights:

  • Taxonomy-driven scan forces the LLM to audit 10 distinct categories before deciding what to ask.
  • The "recommend first" pattern surfaces a best-practice suggestion before showing options, reducing the decision burden on users.
  • Writing the spec to disk after every accepted answer means partial progress survives even if the session's context is lost.
  • The "yes"/"recommended"/"suggested" shortcut is an explicit UX pattern to reduce user friction.
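
The incremental write-back in step 5 can be sketched as a pure function over the spec text. This is a simplification under stated assumptions: record_clarification is hypothetical, it appends a missing ## Clarifications section at the end of the file (the prompt places it after the overview section), and saving the returned text to disk is left to the caller.

```python
from datetime import date

def _next_heading(lines: list[str], start: int, prefixes: tuple[str, ...]) -> int:
    """Index of the first line after `start` that begins with one of `prefixes`."""
    for i in range(start + 1, len(lines)):
        if lines[i].startswith(prefixes):
            return i
    return len(lines)

def record_clarification(spec: str, question: str, answer: str, today: date) -> str:
    """Append '- Q: ... → A: ...' under ## Clarifications / ### Session <today>."""
    lines = spec.splitlines()
    if "## Clarifications" not in lines:
        # Simplification: the prompt inserts this after the overview section instead.
        lines += ["", "## Clarifications"]
    start = lines.index("## Clarifications")
    section_end = _next_heading(lines, start, ("# ", "## "))
    session = f"### Session {today.isoformat()}"
    if session in lines[start:section_end]:
        s = lines.index(session, start, section_end)
    else:
        lines.insert(section_end, session)
        s = section_end
    # The session block ends at the next heading of any level.
    stop = _next_heading(lines, s, ("#",))
    # Keep the bullet attached to the session's list, not after trailing blanks.
    while stop > s + 1 and not lines[stop - 1].strip():
        stop -= 1
    lines.insert(stop, f"- Q: {question} → A: {answer}")
    return "\n".join(lines) + "\n"
```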

Prompt 3 — /speckit.checklist (templates/commands/checklist.md) — Core Concept Section

This prompt introduces the "unit tests for English" framing — the most philosophically distinctive piece of Spec Kit's approach to requirements quality.

## Checklist Purpose: "Unit Tests for English"

**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, and completeness of requirements in a given domain.

**NOT for verification/testing**:

- ❌ NOT "Verify the button clicks correctly"
- ❌ NOT "Test error handling works"
- ❌ NOT "Confirm the API returns 200"
- ❌ NOT checking if code/implementation matches the spec

**FOR requirements quality validation**:

- ✅ "Are visual hierarchy requirements defined for all card types?" (completeness)
- ✅ "Is 'prominent display' quantified with specific sizing/positioning?" (clarity)
- ✅ "Are hover state requirements consistent across all interactive elements?" (consistency)
- ✅ "Are accessibility requirements defined for keyboard navigation?" (coverage)
- ✅ "Does the spec define what happens when logo image fails to load?" (edge cases)

**Metaphor**: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works.

...

**🚫 ABSOLUTELY PROHIBITED** - These make it an implementation test, not a requirements test:
- ❌ Any item starting with "Verify", "Test", "Confirm", "Check" + implementation behavior
- ❌ References to code execution, user actions, system behavior
- ❌ "Displays correctly", "works properly", "functions as expected"
- ❌ "Click", "navigate", "render", "load", "execute"
- ❌ Test cases, test plans, QA procedures
- ❌ Implementation details (frameworks, APIs, algorithms)

**✅ REQUIRED PATTERNS** - These test requirements quality:
- ✅ "Are [requirement type] defined/specified/documented for [scenario]?"
- ✅ "Is [vague term] quantified/clarified with specific criteria?"
- ✅ "Are requirements consistent between [section A] and [section B]?"
- ✅ "Can [requirement] be objectively measured/verified?"
- ✅ "Are [edge cases/scenarios] addressed in requirements?"
- ✅ "Does the spec define [missing aspect]?"

Technique highlights:

  • The "unit tests for English" metaphor is a precise, memorable constraint that prevents the most common failure mode (writing verification steps instead of requirement quality checks).
  • Prohibited patterns include specific banned verbs ("Verify", "Test", "Confirm", "Click", "navigate", "render") — this is a hard constraint list, not guidance.
  • The quality dimensions taxonomy (Completeness, Clarity, Consistency, Measurability, Coverage, Edge Case Coverage, NFRs, Dependencies, Ambiguities & Conflicts) mirrors the clarify command's ambiguity scan.
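
Two of these rules are mechanical enough to lint: the banned leading verbs and the question-shaped required patterns. The sketch below checks only those, plus the banned vague phrases. It deliberately skips the single-word bans, because the prompt's own ✅ example "Does the spec define what happens when logo image fails to load?" contains "load", so a naive word filter would reject it. lint_checklist_item and the opener lists are hypothetical.

```python
BANNED_OPENERS = ("verify", "test", "confirm", "check")
REQUIRED_OPENERS = ("are", "is", "can", "does")
BANNED_PHRASES = ("displays correctly", "works properly", "functions as expected")

def lint_checklist_item(item: str) -> list[str]:
    """Return reasons why a checklist item is not a requirements-quality question."""
    text = item.strip()
    words = text.split()
    first = words[0].lower() if words else ""
    problems: list[str] = []
    if first in BANNED_OPENERS:
        problems.append(f"starts with banned verb {words[0]!r}")
    elif first not in REQUIRED_OPENERS:
        problems.append("does not open like a required pattern (Are/Is/Can/Does)")
    if not text.endswith("?"):
        problems.append("is not phrased as a question")
    lower = text.lower()
    problems += [f"uses vague phrase {p!r}" for p in BANNED_PHRASES if p in lower]
    return problems
```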

Prompt 4 — /speckit.analyze (templates/commands/analyze.md) — Operating Constraints + Detection Passes

## Operating Constraints

**STRICTLY READ-ONLY**: Do **not** modify any files. Output a structured analysis report. Offer an optional remediation plan (user must explicitly approve before any follow-up editing commands would be invoked manually).

**Constitution Authority**: The project constitution (`/memory/constitution.md`) is **non-negotiable** within this analysis scope. Constitution conflicts are automatically CRITICAL and require adjustment of the spec, plan, or tasks—not dilution, reinterpretation, or silent ignoring of the principle. If a principle itself needs to change, that must occur in a separate, explicit constitution update outside `__SPECKIT_COMMAND_ANALYZE__`.

...

### 4. Detection Passes (Token-Efficient Analysis)

Focus on high-signal findings. Limit to 50 findings total; aggregate remainder in overflow summary.

#### A. Duplication Detection
- Identify near-duplicate requirements
- Mark lower-quality phrasing for consolidation

#### B. Ambiguity Detection
- Flag vague adjectives (fast, scalable, secure, intuitive, robust) lacking measurable criteria
- Flag unresolved placeholders (TODO, TKTK, ???, `<placeholder>`, etc.)

#### C. Underspecification
- Requirements with verbs but missing object or measurable outcome
- User stories missing acceptance criteria alignment
- Tasks referencing files or components not defined in spec/plan

#### D. Constitution Alignment
- Any requirement or plan element conflicting with a MUST principle
- Missing mandated sections or quality gates from constitution

#### E. Coverage Gaps
- Requirements with zero associated tasks
- Tasks with no mapped requirement/story
- Success Criteria requiring buildable work (performance, security, availability) not reflected in tasks

#### F. Inconsistency
- Terminology drift (same concept named differently across files)
- Data entities referenced in plan but absent in spec (or vice versa)
- Task ordering contradictions (e.g., integration tasks before foundational setup tasks without dependency note)
- Conflicting requirements (e.g., one requires Next.js while other specifies Vue)

### 5. Severity Assignment

- **CRITICAL**: Violates constitution MUST, missing core spec artifact, or requirement with zero coverage that blocks baseline functionality
- **HIGH**: Duplicate or conflicting requirement, ambiguous security/performance attribute, untestable acceptance criterion
- **MEDIUM**: Terminology drift, missing non-functional task coverage, underspecified edge case
- **LOW**: Style/wording improvements, minor redundancy not affecting execution order

Technique highlights:

  • Hard read-only constraint prevents autonomous spec mutation — the agent is explicitly prohibited from editing during analysis.
  • Constitution violations are automatically CRITICAL with no escape hatch; this enforces governance at the prompt level.
  • Token-efficient analysis: 50-finding cap prevents unbounded output in large codebases.
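
The severity ordering, the constitution rule, and the 50-finding cap combine into a simple triage step. A sketch under assumptions: Finding, triage, and the "constitution" category name are hypothetical, and the overflow summary is reduced to a count per severity.

```python
from dataclasses import dataclass, replace

SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}
MAX_FINDINGS = 50

@dataclass
class Finding:
    finding_id: str
    category: str
    severity: str
    summary: str

def effective_severity(finding: Finding) -> str:
    # Constitution conflicts are always CRITICAL, whatever severity was proposed.
    return "CRITICAL" if finding.category == "constitution" else finding.severity

def triage(findings: list[Finding], limit: int = MAX_FINDINGS) -> tuple[list[Finding], dict[str, int]]:
    """Normalize severities, sort (stable), keep the first `limit`, count the rest."""
    normalized = [replace(f, severity=effective_severity(f)) for f in findings]
    ranked = sorted(normalized, key=lambda f: SEVERITY_ORDER[f.severity])
    shown, overflow = ranked[:limit], ranked[limit:]
    counts: dict[str, int] = {}
    for f in overflow:
        counts[f.severity] = counts.get(f.severity, 0) + 1
    return shown, counts
```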

Prompt 5 — /speckit.tasks — Task Checklist Format Rules (from templates/commands/tasks.md)

## Task Generation Rules

**CRITICAL**: Tasks MUST be organized by user story to enable independent implementation and testing.

### Checklist Format (REQUIRED)

Every task MUST strictly follow this format:

```text
- [ ] [TaskID] [P?] [Story?] Description with file path
```

Format Components:

  1. Checkbox: ALWAYS start with - [ ] (markdown checkbox)
  2. Task ID: Sequential number (T001, T002, T003...) in execution order
  3. [P] marker: Include ONLY if task is parallelizable (different files, no dependencies on incomplete tasks)
  4. [Story] label: REQUIRED for user story phase tasks only
    • Format: [US1], [US2], [US3], etc. (maps to user stories from spec.md)
    • Setup phase: NO story label
    • Foundational phase: NO story label
    • User Story phases: MUST have story label
    • Polish phase: NO story label
  5. Description: Clear action with exact file path

Examples:

  • ✅ CORRECT: - [ ] T001 Create project structure per implementation plan
  • ✅ CORRECT: - [ ] T005 [P] Implement authentication middleware in src/middleware/auth.py
  • ✅ CORRECT: - [ ] T012 [P] [US1] Create User model in src/models/user.py
  • ✅ CORRECT: - [ ] T014 [US1] Implement UserService in src/services/user_service.py
  • ❌ WRONG: - [ ] Create User model (missing ID and Story label)
  • ❌ WRONG: T001 [US1] Create model (missing checkbox)
  • ❌ WRONG: - [ ] [US1] Create User model (missing Task ID)
  • ❌ WRONG: - [ ] T001 [US1] Create model (missing file path)

**Technique highlights:**
- Machine-readable task format (T-prefixed IDs, `[P]` parallelism markers, `[USn]` story labels) enables the `/speckit.implement` command to parse and execute tasks deterministically.
- The explicit right/wrong examples with parenthetical explanations enforce the format more reliably than prose descriptions.
- `[P]` marker enables the implement command to execute independent tasks in parallel without requiring a separate dependency graph.
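To illustrate how `[P]` markers can stand in for a dependency graph, here is a sketch that groups pending tasks into batches. The batching rule is an assumption, one plausible reading of the markers rather than what /speckit.implement actually does: consecutive `[P]` tasks share a batch, while an unmarked task or a phase heading acts as a barrier.

```python
import re

TASK_RE = re.compile(r"^- \[(?P<done>[ xX])\] (?P<id>T\d{3,})(?P<rest>.*)$")

def parallel_batches(lines: list[str]) -> list[list[str]]:
    """Group pending task IDs into batches that could run concurrently."""
    batches: list[list[str]] = []
    current: list[str] = []

    def flush() -> None:
        # Close the current parallel batch, if any.
        if current:
            batches.append(current.copy())
            current.clear()

    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()  # assumption: a phase heading is also a barrier
            continue
        m = TASK_RE.match(stripped)
        if m is None or m.group("done") in "xX":
            continue  # skip non-task lines and completed tasks
        if m.group("rest").startswith(" [P]"):
            current.append(m.group("id"))
        else:
            flush()
            batches.append([m.group("id")])  # unmarked task runs alone
    flush()
    return batches
```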
09

Uniqueness

Spec Kit — Uniqueness & Positioning

What Spec Kit Does That No Other Seed Framework Does

1. First-party multi-agent CLI with 30+ integrations under one install. No other framework ships a Python CLI (specify) that writes the correct agent-specific command/skill files for 30+ tools from a single specify init command. Competitors either target a single tool (BMAD primarily targets Claude Code), require manual copy-paste, or are methodology docs without tooling.

2. The "unit tests for English" checklist system. The /speckit.checklist command enforces a strict distinction between requirement quality validation (what Spec Kit does) and implementation verification (what testing frameworks do). Together, the banned-verb list ("Verify", "Test", "Confirm", "Click", "navigate", "render") and the required-pattern list ("Are [X] defined for [Y]?", "Is [vague term] quantified?") form a novel prompting discipline not found in other spec frameworks.

3. Extension + preset layered override system with a community catalog. The four-layer template resolution stack (project-local overrides → presets → extensions → core) with specify extension add / specify preset add CLI commands and a published catalog.json is a unique packaging abstraction. No other framework in this category has an extensible plugin catalog.

4. Frontmatter handoffs in command files. The YAML frontmatter handoffs: key (present in specify.md, plan.md, tasks.md) defines the next-step agents and pre-filled prompts the UI can render as action buttons. This is a UI contract baked into each prompt file — an integration surface not seen in other markdown-prompt frameworks.

5. Cross-artifact consistency analysis as a standalone command. /speckit.analyze performs a multi-document semantic analysis (spec ↔ plan ↔ tasks ↔ constitution) with severity levels (CRITICAL/HIGH/MEDIUM/LOW), a 50-finding cap for token efficiency, and a hard read-only constraint. No other framework has an explicit pre-implementation consistency gate at this level of detail.
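The four-layer resolution in point 3 amounts to a first-match search over an ordered list of directories. This is a minimal sketch. It assumes each layer is a plain directory holding template files by name. The directory names in the test are hypothetical and do not reflect Spec Kit's actual on-disk layout.

```python
from pathlib import Path
from typing import Optional

def resolve_template(name: str, layers: list[Path]) -> Optional[Path]:
    """Return the highest-priority copy of template `name`, or None.

    `layers` must be ordered highest priority first: project-local
    overrides, then presets, then extensions, then core.
    """
    for layer in layers:
        candidate = layer / name
        if candidate.is_file():
            return candidate  # first match wins; lower layers are shadowed
    return None
```

Because the function takes a flat list, several installed presets or extensions can each be listed in their own priority order within their layer.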

What Spec Kit Explicitly Drops

  • No multi-agent orchestration. Spec Kit is a single-agent sequential pipeline. There are no named subagents, no worker pools, no parallel agent spawning. (Compare: BMAD has Developer/Architect/PO agents; claude-flow has multi-agent swarms.)
  • No automated TDD enforcement. Tests are optional and only generated if explicitly requested. (Compare: BMAD and some other frameworks make test-first mandatory.)
  • No built-in git worktree isolation. Branch creation is delegated to an optional git extension. (Compare: some frameworks use worktrees by design for parallel feature development.)
  • No memory beyond flat files. No SQLite, no vector DB, no MCP-bridged memory. Pure filesystem.
  • No built-in research agents as first-class entities. "Research agents" in /speckit.plan are ad-hoc suggestions to the LLM, not registered agent definitions.

One-Sentence Positioning

Spec Kit is the official GitHub-backed spec-driven development toolkit that turns a natural-language feature description into a complete, versioned, AI-executable specification pipeline — installable for 30+ agents in one command — and is the only framework in the category with a layered extension/preset system and a cross-artifact consistency analyzer.

Failure Modes and Criticisms

From Reddit (sourced from _index/wave-2b-reddit-discovery.md):

  • Methodology skepticism: "Spec-driven development for AI is a form of technical masturbation and frameworks like Spec-kit, bmad, Openspec are BS." (r/ChatGPTCoding). This reflects the broader criticism that heavyweight upfront specification conflicts with rapid iteration.
  • Reliability concerns: "believe me, even the spec kit made by GitHub themselves fail." (r/ChatGPTCoding). This suggests that the pipeline can break at the LLM execution layer despite structured prompts.
  • Competition: "Task master was alright but spec kit handles a lot of the context engineering and produces better task lists." This is a positive comparison, but it implies that users are actively evaluating alternatives and that the value proposition is task-list quality rather than the full SDD philosophy.
  • Complexity overhead: Community projects like "Flow (free framework)" were explicitly born from "Spec-Kit/Taskmaster frustrations," suggesting that the framework's multi-step pipeline feels heavyweight to some users.
  • Star inflation concern: The 105k star count is extraordinary for a toolkit of this type. GitHub's own brand, along with official-project signals such as the CITATION.cff file, likely drove discovery beyond what organic adoption alone would produce.

vs. Key Competitors

| Dimension | Spec Kit | BMAD Method | OpenSpec |
|---|---|---|---|
| Agent support | 30+ via CLI | Claude Code (primary) | unknown |
| Commands | 9 | ~8 | unknown |
| Extension system | Yes (catalog) | No | No |
| Multi-agent | No | Yes (personas) | unknown |
| TDD | Optional | Optional | unknown |
| Maintainer | GitHub (official) | Community | Community |
| Stars | 105,966 | ~20k range | ~51k |
04

Workflow

Spec Kit — Workflow

Workflow Phases

| # | Phase | Command | Artifact(s) Produced |
|---|---|---|---|
| 0 | Govern | /speckit.constitution | .specify/memory/constitution.md |
| 1 | Specify | /speckit.specify <description> | specs/<NNN>-<name>/spec.md, checklists/requirements.md |
| 2 | Clarify (optional, recommended) | /speckit.clarify | Updated spec.md (clarifications appended) |
| 3 | Plan | /speckit.plan <tech stack> | plan.md, data-model.md, contracts/, research.md, quickstart.md |
| 4 | Tasks | /speckit.tasks | tasks.md |
| 5 | Analyze (optional) | /speckit.analyze | Console report (no files written; read-only) |
| 6 | Checklist (optional) | /speckit.checklist <domain> | checklists/<domain>.md |
| 7 | Implement | /speckit.implement | All feature source files; tasks marked [X] in tasks.md |
| 8 | Track (optional) | /speckit.taskstoissues | GitHub Issues created in the remote repo |
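The table implies a simple rule for where a feature stands: the first missing required artifact names the next command. This sketch encodes that rule and assumes the file names shown above. `next_command` is a hypothetical helper, not a Spec Kit command. It ignores the constitution, which lives outside the feature directory, and it also ignores all optional phases.

```python
from pathlib import Path

# Required phases in order, paired with the artifact each one produces.
REQUIRED_PHASES = [
    ("/speckit.specify", "spec.md"),
    ("/speckit.plan", "plan.md"),
    ("/speckit.tasks", "tasks.md"),
]

def next_command(feature_dir: Path) -> str:
    """Suggest the next required command for one feature directory."""
    for command, artifact in REQUIRED_PHASES:
        if not (feature_dir / artifact).is_file():
            return command
    tasks = (feature_dir / "tasks.md").read_text(encoding="utf-8")
    # Any unchecked box means implementation is still in progress.
    if "- [ ] " in tasks:
        return "/speckit.implement"
    return "done"
```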

Human Approval Gates

The workflow has several explicit human approval gates:

  1. After /speckit.specify: The user reviews spec.md and the quality checklist. The command presents up to 3 [NEEDS CLARIFICATION] questions and waits for user responses before finalizing the spec.
  2. After /speckit.clarify: The command asks up to 5 sequential questions, one at a time, and waits for user responses. The session is explicitly interactive.
  3. After /speckit.plan: The README explicitly instructs the user to review research.md, validate the tech stack, and refine if needed before proceeding to tasks.
  4. Before /speckit.implement (if checklists exist): The implement command scans checklists/ and if any checklist item is incomplete, stops and asks: "Some checklists are incomplete. Do you want to proceed with implementation anyway? (yes/no)". This is the only hard gate that blocks execution.
  5. /speckit.analyze output: The analysis command produces a severity-graded findings table and asks "Would you like me to suggest concrete remediation edits?" It does not apply any changes automatically.
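The checklist gate in item 4 reduces to counting Markdown checkboxes. This sketch assumes checklist items use `- [ ]` / `- [x]` syntax and are stored as `*.md` files in the feature's checklists/ directory. The function names are hypothetical.

```python
import re
from pathlib import Path

# A checklist item: optional indentation, then "- [ ]" or "- [x]"/"- [X]".
ITEM_RE = re.compile(r"^\s*- \[(?P<mark>[ xX])\]", re.MULTILINE)

def checklist_status(checklists_dir: Path) -> dict[str, tuple[int, int]]:
    """Map each checklist file name to (completed, total) item counts."""
    status = {}
    for path in sorted(checklists_dir.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        marks = [m.group("mark") for m in ITEM_RE.finditer(text)]
        done = sum(1 for mark in marks if mark in "xX")
        status[path.name] = (done, len(marks))
    return status

def gate_passes(checklists_dir: Path) -> bool:
    """True when every checklist is complete, or when no checklists exist."""
    if not checklists_dir.is_dir():
        return True
    return all(done == total for done, total in checklist_status(checklists_dir).values())
```

When `gate_passes` returns False, the documented behavior is to ask the user whether to proceed, not to abort outright.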

TDD Enforcement

Optional. The /speckit.tasks command explicitly states: "Tests are OPTIONAL: Only generate test tasks if explicitly requested in the feature specification or if user requests TDD approach." The /speckit.implement command does describe TDD ordering ("Follow TDD approach: Execute test tasks before their corresponding implementation tasks"), but this applies only when test tasks are present in tasks.md. The constitution template offers "III. Test-First (NON-NEGOTIABLE)" as an example for its ### [PRINCIPLE_3_NAME] placeholder. The user must fill the placeholder in, so test-first is not enforced by default.

Evidence: tasks.md generation rules state — "Tests are OPTIONAL: Only generate test tasks if explicitly requested in the feature specification or if user requests TDD approach."

Multi-Agent Execution

No named multi-agent execution at the framework level. The /speckit.plan command describes dispatching "research agents" (parallel AI tasks for resolving technical unknowns) but this is a suggestion to the LLM within a single session, not a formal subagent definition. The extension system can add hooks that invoke other commands, enabling a loose chained-agent pattern, but this is not built in.

Git Worktrees / Isolated Workspaces

No native worktree isolation. Each feature gets its own specs/<NNN>-<name>/ directory. When the optional git extension (bundled under extensions/git/) is enabled, its before_specify hook also creates a Git branch per feature. Branch creation is therefore handled by an extension, not the core.

Spec Format

Markdown. All artifacts (spec.md, plan.md, tasks.md, data-model.md, research.md, quickstart.md, checklists) are Markdown files following explicit templates. Interface contracts under contracts/ may be JSON (e.g., api-spec.json) or Markdown depending on the project type.

Files Generated Per Feature

```text
specs/<NNN>-<feature-name>/
├── spec.md                   # User stories + functional requirements + success criteria
├── plan.md                   # Technical architecture + phase breakdown
├── tasks.md                  # Dependency-ordered task list with [P] parallel markers
├── data-model.md             # Entities, relationships, validation rules
├── research.md               # Technical decisions with rationale + alternatives
├── quickstart.md             # Key validation scenarios / integration guide
├── contracts/
│   ├── api-spec.json         # (if project has external API)
│   └── <other-contracts>.md
└── checklists/
    ├── requirements.md       # Auto-generated by /speckit.specify
    └── <domain>.md           # Additional checklists from /speckit.checklist
```

Feature Numbering

By default, feature directories are numbered sequentially (001-feature-name, 002-...). Setting the branch numbering scheme in .specify/init-options.json to "timestamp" switches to a YYYYMMDD-HHMMSS-feature-name prefix instead. In sequential mode, the next number is derived by scanning the existing specs/ subdirectories and incrementing the highest one.
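Both numbering schemes fit in a few lines of code. This sketch assumes three-digit sequential prefixes and passes the scheme name as a plain string argument. The names `next_feature_dir_name`, the `"sequential"` value, and the `now` parameter are all hypothetical, and Spec Kit's own scripts may differ in detail.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumption: sequential prefixes are exactly three digits ("001-").
NUMBERED_RE = re.compile(r"^(\d{3})-")

def next_feature_dir_name(specs_dir: Path, slug: str, scheme: str = "sequential",
                          now: Optional[datetime] = None) -> str:
    """Build the next feature directory name under either numbering scheme."""
    if scheme == "timestamp":
        stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
        return f"{stamp}-{slug}"
    highest = 0
    if specs_dir.is_dir():
        for child in specs_dir.iterdir():
            m = NUMBERED_RE.match(child.name)
            # Timestamp-named directories do not match the 3-digit pattern.
            if child.is_dir() and m:
                highest = max(highest, int(m.group(1)))
    return f"{highest + 1:03d}-{slug}"
```

Taking the maximum rather than counting directories keeps numbers unique even when an earlier feature directory has been deleted.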

06

Memory Context

Spec Kit — Memory & Context

Memory Model

file-based

All persistent state is stored as Markdown and JSON files in the project filesystem. There is no database, vector store, or external memory service.

Persistence Scope

project

Memory is scoped to the project directory. Nothing is stored globally (outside the project) or in a cloud service.

Primary Memory Files

| File | Contents | Created By |
|---|---|---|
| .specify/memory/constitution.md | Project governing principles; referenced by every subsequent command | /speckit.constitution |
| .specify/feature.json | Points to the active feature directory (specs/<NNN>-<name>/) | /speckit.specify |
| .specify/init-options.json | Branch numbering scheme; agent initialization preferences | specify init |
| specs/<NNN>-<name>/spec.md | Feature requirements, user stories, clarifications history | /speckit.specify, /speckit.clarify |
| specs/<NNN>-<name>/plan.md | Technical architecture and implementation plan | /speckit.plan |
| specs/<NNN>-<name>/tasks.md | Task list; tasks marked [X] as completed | /speckit.tasks, /speckit.implement |
| specs/<NNN>-<name>/research.md | Technical decisions with rationale and alternatives | /speckit.plan |

Cross-Session Handoffs

Yes — the framework is explicitly designed for cross-session continuity:

  1. Feature pointer: .specify/feature.json stores the resolved feature_directory path. Any subsequent command (in a new session) runs check-prerequisites.sh --json to read this file and locate all artifacts without relying on git branch name conventions.

  2. Clarification history: The /speckit.clarify command appends a ## Clarifications / ### Session YYYY-MM-DD section to spec.md with every accepted answer. This provides a timestamped audit trail readable in any future session.

  3. Task completion state: /speckit.implement marks completed tasks as [X] in tasks.md. A new session resuming implementation reads tasks.md and picks up from the first uncompleted task.

  4. Constitution: .specify/memory/constitution.md is loaded by /speckit.plan, /speckit.analyze, and /speckit.implement as an authoritative constraint document. Its version header (Version: X.Y.Z | Ratified: YYYY-MM-DD | Last Amended: YYYY-MM-DD) provides change history at a glance.
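Items 1 and 3 together are enough to resume work in a fresh session. This is a minimal sketch. It assumes .specify/feature.json is a JSON object whose `feature_directory` value is a path relative to the repository root; an absolute path also works, because pathlib keeps it unchanged when joined. It also assumes pending tasks use the `- [ ] T###` format. The helper names are hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Optional

# First unchecked task line, capturing its T-prefixed ID.
PENDING_RE = re.compile(r"^\s*- \[ \] (T\d{3,})\b", re.MULTILINE)

def active_feature_dir(repo_root: Path) -> Path:
    """Read the feature pointer written by /speckit.specify."""
    raw = (repo_root / ".specify" / "feature.json").read_text(encoding="utf-8")
    # Assumption: the path is stored under the "feature_directory" key.
    return repo_root / json.loads(raw)["feature_directory"]

def first_pending_task(repo_root: Path) -> Optional[str]:
    """Return the ID of the first uncompleted task, or None if all are done."""
    tasks = (active_feature_dir(repo_root) / "tasks.md").read_text(encoding="utf-8")
    m = PENDING_RE.search(tasks)
    return m.group(1) if m else None
```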

Context Compaction Strategy

No specific /compact or context-window-management strategy is described. The framework's mitigation is artifact externalization: all critical context (spec, plan, tasks, constitution) lives in files that commands reload from disk at the start of each invocation rather than relying on conversation history. This means context loss between sessions is handled by re-reading files, not by conversation summarization.

The /speckit.analyze command explicitly notes "Context Efficiency" as an operating principle: "Load artifacts incrementally; don't dump all content into analysis — prefer summarizing long sections into concise scenario/requirement bullets."

Memory Bank / Knowledge Base

The framework refers to the .specify/memory/ directory as its "memory" directory. The constitution.md file is the closest analog to a persistent knowledge base — it is the single file that governs all downstream generation and is explicitly checked for conflicts during every analysis pass.

There is no reference to external "memory bank" tools, RAG pipelines, or knowledge graph integrations.
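The constitution's version header is a single structured line, so tooling outside the agent can read it as well. This sketch assumes the header follows the `Version: X.Y.Z | Ratified: YYYY-MM-DD | Last Amended: YYYY-MM-DD` pattern quoted under Cross-Session Handoffs. It also tolerates optional `**` bold markers around each label, which is an assumption about how the template renders the header.

```python
import re
from datetime import date

# Labels may or may not be wrapped in Markdown bold (**...**).
HEADER_RE = re.compile(
    r"\**Version\**:\s*(?P<version>\d+\.\d+\.\d+)\s*\|\s*"
    r"\**Ratified\**:\s*(?P<ratified>\d{4}-\d{2}-\d{2})\s*\|\s*"
    r"\**Last Amended\**:\s*(?P<amended>\d{4}-\d{2}-\d{2})"
)

def parse_constitution_header(text: str) -> dict:
    """Extract the semantic version and dates from constitution.md text."""
    m = HEADER_RE.search(text)
    if m is None:
        raise ValueError("no version header found")
    return {
        # A tuple of ints compares correctly: (1, 10, 0) > (1, 9, 0).
        "version": tuple(int(part) for part in m.group("version").split(".")),
        "ratified": date.fromisoformat(m.group("ratified")),
        "last_amended": date.fromisoformat(m.group("amended")),
    }
```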

07

Target Tools

Spec Kit — Target Tools

Supported AI Coding Agent Integrations

Spec Kit officially supports 30+ AI coding agents as confirmed by the README ("Spec Kit works with 30+ AI coding agents") and the integration registry in src/specify_cli/integrations/. The specify integration list command shows all available integrations in the installed version.

The integrations listed below were enumerated via gh api /repos/github/spec-kit/contents/src/specify_cli/integrations:

| Key | Agent | Install Path (command files) | Format | Notes |
|---|---|---|---|---|
| claude | Claude Code | .claude/commands/speckit.*.md | SkillsIntegration | Supports both slash commands and --skills mode |
| copilot | GitHub Copilot | .github/copilot-instructions.md + agent files | Custom | Default integration when no --integration flag is passed in non-interactive CI |
| codex | OpenAI Codex CLI | Skill directories | Skills | Skills mode (--integration-options="--skills") uses $speckit-* syntax instead of /speckit.* |
| gemini | Gemini CLI | TOML format | TomlIntegration | |
| cursor_agent | Cursor | Agent files | MarkdownIntegration | |
| windsurf | Windsurf | .windsurf/workflows/ | MarkdownIntegration | .windsurf/rules/specify-rules.md as context file |
| roo | Roo | Markdown commands | MarkdownIntegration | |
| agy | AGY | Markdown | MarkdownIntegration | |
| amp | Amp | Markdown | MarkdownIntegration | |
| auggie | Auggie | Markdown | MarkdownIntegration | |
| bob | Bob | Markdown | MarkdownIntegration | |
| codebuddy | CodeBuddy | Markdown | MarkdownIntegration | |
| devin | Devin | Markdown | MarkdownIntegration | |
| forge | Forge | Markdown | MarkdownIntegration | |
| generic | Generic (fallback) | Markdown | MarkdownIntegration | |
| goose | Goose | Markdown | MarkdownIntegration | |
| iflow | iFlow | Markdown | MarkdownIntegration | |
| junie | Junie | Markdown | MarkdownIntegration | |
| kilocode | Kilo Code | Markdown | MarkdownIntegration | |
| kimi | Kimi | Markdown | MarkdownIntegration | |
| kiro_cli | Kiro CLI | Markdown | MarkdownIntegration | Key retains hyphen: kiro-cli |
| lingma | Lingma | Markdown | MarkdownIntegration | |
| opencode | opencode | Markdown | MarkdownIntegration | |
| pi | Pi | Markdown | MarkdownIntegration | |
| qodercli | Qoder CLI | Markdown | MarkdownIntegration | |
| qwen | Qwen | Markdown | MarkdownIntegration | |
| shai | Shai | Markdown | MarkdownIntegration | |
| tabnine | Tabnine | Markdown | MarkdownIntegration | |
| trae | Trae | Markdown | MarkdownIntegration | |
| vibe | Vibe | Markdown | MarkdownIntegration | |

Install Path Differences

  • Claude Code: Commands go to .claude/commands/speckit.*.md. Skills mode creates speckit-<name>/SKILL.md directories instead.
  • Codex CLI: Skills mode (--integration-options="--skills") uses $speckit-* command syntax rather than /speckit.*.
  • Gemini CLI: Uses TOML format (.toml extension) instead of Markdown.
  • Windsurf: Commands go to .windsurf/workflows/; a context/rules file is created at .windsurf/rules/specify-rules.md.
  • Copilot: Custom integration with more complex output (companion files, settings merge logic); default in CI environments.
  • Generic: Fallback for agents not explicitly listed; uses Markdown commands with no agent-specific customization.
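The invocation and file-naming differences above can be captured in a small lookup. This sketch models only what is stated here: the Codex skills-mode `$speckit-*` syntax and Claude Code's two file layouts. The parent directory for Claude skills is not stated above and is left out. The function names are hypothetical.

```python
def command_invocation(integration: str, command: str, skills: bool = False) -> str:
    """Return how a user would invoke a Spec Kit command in a given agent."""
    if integration == "codex" and skills:
        return f"$speckit-{command}"  # Codex skills mode
    return f"/speckit.{command}"      # default slash-command syntax

def claude_command_path(command: str, skills: bool = False) -> str:
    """Relative path of the generated Claude Code file for one command."""
    if skills:
        # Skills mode: one directory per command (parent directory not stated).
        return f"speckit-{command}/SKILL.md"
    return f".claude/commands/speckit.{command}.md"
```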

CLI Detection

The README documents that specify init checks whether the agent CLI is installed:

"The CLI will check if you have Claude Code, Gemini CLI, Cursor CLI, Qwen CLI, opencode, Codex CLI, Qoder CLI, Tabnine CLI, Kiro CLI, Pi, Forge, Goose, or Mistral Vibe installed. If you do not, or you prefer to get the templates without checking for the right tools, use --ignore-agent-tools with your command."

For IDE-based integrations (Windsurf, Cursor), requires_cli: False is set in the integration config and no CLI check is performed.
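Python's `shutil.which` is enough to approximate the detection behavior quoted above. This is a sketch, not the specify CLI's implementation. The `Integration` record mirrors the `requires_cli` flag mentioned above, but the `executable` field and the helper name are hypothetical.

```python
import shutil
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Integration:
    key: str
    requires_cli: bool
    executable: Optional[str] = None  # hypothetical: binary name to look for

def missing_agent_cli(integration: Integration, ignore_agent_tools: bool = False) -> bool:
    """True when init should stop because the agent CLI is not on PATH."""
    if ignore_agent_tools or not integration.requires_cli:
        # --ignore-agent-tools and IDE-based integrations skip the check.
        return False
    return shutil.which(integration.executable or integration.key) is None
```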

Primary Tool

The framework was originally created with Claude Code as the primary target (the detailed walkthrough in the README uses Claude Code as the reference implementation), but it is explicitly multi-agent and the copilot integration is the non-interactive default.

08

Signals

Spec Kit — Signals

GitHub Metrics

| Metric | Value | Source |
|---|---|---|
| Stars | 105,966 | gh api /repos/github/spec-kit (fetched 2026-05-26) |
| Forks | 9,379 | gh api /repos/github/spec-kit |
| Contributors | 30 | gh api /repos/github/spec-kit/contributors |
| Last commit (pushed_at) | 2026-05-22 | gh api /repos/github/spec-kit |
| License | MIT | repo metadata |
| Default branch | main | repo metadata |

Maintainer Status

Active. Last push was 2026-05-22 (4 days before this analysis). The project has a CHANGELOG.md, a CITATION.cff (formal citation metadata), and a newsletters/ directory, all signs of active, organized maintenance. The extension and preset ecosystem also shows ongoing community development.

Reddit Sentiment

Mixed. The following quotes are from _index/wave-2b-reddit-discovery.md:

  • Positive: "Spec-Kit is amazing. Learn how it works under the hood and then start modding and customizing it." (r/AI_Agents, 1p5778u)
  • Positive: "Task master was alright but spec kit handles a lot of the context engineering and produces better task lists." (r/ChatGPTCoding, 1nwcwoz)
  • Critical: "Spec-driven development for AI is a form of technical masturbation and frameworks like Spec-kit, bmad, Openspec are BS." (r/ChatGPTCoding, 1o6j1yr)
  • Critical: "believe me, even the spec kit made by GitHub themselves fail." (r/ChatGPTCoding, 1ne5nu8)

Community threads comparing Spec Kit directly with competitors (BMAD, OpenSpec, Agent OS, Superpowers) appear across r/ClaudeCode, r/ClaudeAI, and r/ChatGPTCoding, indicating high community awareness and active debate.

HN Sentiment

unknown — Two HN threads are referenced in the index (id=48091699 "Spec-Driven Development with Spec Kit and CC" and id=45577377 "What's the Deal with GitHub Spec Kit") but their content was not scraped in the available discovery files.

Community Ecosystem Signals

The ecosystem around Spec Kit is notably large:

  • acnlabs/awesome-spec-kits — community-curated list of Spec Kit derivatives
  • wolffy-au/spec-kit-aider — SDD toolkit port for Aider
  • JRedeker/cline-spec-kit-workflows — Cline workflow files for spec-kit
  • spec-kit-command-cursor — multiple Cursor ports (madebyaris, foxgod183)
  • spec-kitty (Priivacy-ai) — "SDD for serious developers" derivative
  • IBM/iac-spec-kit — IaC-to-spec AI workflows (1057 stars)
  • MetaSpec — spec-toolkit framework (944 stars)
  • Community extensions catalog and presets catalog maintained in the main repo
  • A CITATION.cff file enables academic citation, suggesting institutional adoption interest

Related frameworks

same archetype · same primary tool · same memory type

OpenSpec ★ 51k

Adds a lightweight spec layer so AI coding assistants and humans agree on what to build before any code is written.

ECC (Everything Claude Code) ★ 193k

Comprehensive harness-native operator system: 246 skills + 61 agents + continuous learning hooks + multi-model routing across 8…

Gemini CLI (Google) ★ 105k

Bring the full power of Gemini into the terminal with a free tier, Google Search grounding, and extensible MCP support.

OpenAI Codex CLI ★ 86k

Give developers a sandboxed, locally-running OpenAI coding agent with approval gates and skill orchestration.

cursorrules v5 (kinopeee) ★ 1.1k

Bilingual (ja/en) Cursor rule set with tricolor task classification, security-first prompt injection defense, and structured git…

windsurfrules v5 (kinopeee) ★ 364

Windsurf/Antigravity port of cursorrules v5 — same tricolor task classification and injection defense, translated to Windsurf's…