spec-compare (cameronsjo)

spec-compare · cameronsjo/spec-compare · ★ 39 · last commit 2026-02-25

Primitive shape

No installable primitives

Summary

spec-compare — Summary

spec-compare is a research and comparison documentation repository, not a deployable AI coding framework. It contains in-depth analysis of spec-driven development tools for AI-assisted coding, organized as 13 Markdown documents covering comparison matrices, 12-scenario use-case scoring, git worktree analysis, a landscape survey of 30+ tools, and practical decision frameworks. The original six tools compared are Spec-Kit, Spec Kitty, BMad Method, OpenSpec, Kiro, and Tessl; a February 2026 extension adds GSD, Ralph Loop, Zencoder, Kilo Code, Conductor, and PromptX. The repository's primary original contribution is the "SDD Maturity Levels" taxonomy — Spec-First (specs precede code but are discarded), Spec-Anchored (specs persist and evolve), and Spec-as-Source (only specs are edited, code auto-generates) — which provides a conceptual framework for categorizing all tools in the SDD space. With 39 stars and 2 contributors, spec-compare has more adoption signal than most deployable frameworks in this batch, reflecting its utility as a curated research entry point for practitioners choosing between SDD tools.

differs_from_seeds

spec-compare has no relationship to any seed framework — it does not extend, fork, or build on any of the five archetypes. It is a pure methodology documentation artifact: no commands, no skills, no agents, no MCP servers, no hooks, and no install script. Its value is analytical rather than operational: it tells you which seed-class framework to choose, rather than being a framework itself. In this batch it is the only Tier C item — it does not fit the "AI coding agent framework" definition, but it is included because it is the most comprehensive publicly available comparison of the exact tool category this research project covers.

Overview

spec-compare — Overview

Origin

spec-compare was created by cameronsjo and has one additional contributor; its last commit was on 2026-02-25. It is MIT-licensed and has 39 stars. The repository contains no source code; it is entirely documentation.

Purpose

From the README:

A comprehensive research and comparison of spec-driven development (SDD) tools for AI-assisted coding, including analysis of git worktree support, architectural approaches, and practical recommendations.

The repository was originally created to compare six tools (Spec-Kit, Spec Kitty, BMad Method, OpenSpec, Kiro, Tessl) and was extended in February 2026 to add GSD, Ralph Loop, Zencoder, Kilo Code, Conductor, and PromptX.

Primary Original Contribution

The "SDD Maturity Levels" taxonomy, from the README:

Spec-First: Specs precede coding but are discarded (Spec-Kit, Kiro, BMad)

Spec-Anchored: Specs persist and evolve (OpenSpec, Spec Kitty)

Spec-as-Source: Only specs are edited, code auto-generates (Tessl)

Key Findings

From the README:

Critical Gap: Most SDD tools excel when requirements are clear upfront but struggle with iterative changes like "change button from blue to green."

OpenSpec — Purpose-built for modifications with delta format (ADDED, MODIFIED, REMOVED)

Tessl — Spec-as-source enables edit-and-regenerate (but closed beta)

Spec-Kit — Requires /speckit.clarify workaround, not optimized for small changes

Kiro/BMad — "Sledgehammer to crack a nut" problem for trivial changes

Spec Kitty is the only tool with built-in git worktree support, enabling:

Automatic worktree creation per feature

Parallel feature isolation without branch switching

Automated cleanup on merge
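
The three capabilities above correspond to ordinary `git worktree` operations. The following is a minimal sketch of that lifecycle for one feature, not Spec Kitty's actual implementation: the function name `worktree_plan`, the `feature/<name>` branch naming, and the sibling-directory layout are all assumptions made for illustration. The function only builds the command lines and does not execute anything.

```python
from pathlib import PurePosixPath


def worktree_plan(repo_root: str, feature: str) -> dict[str, list[list[str]]]:
    """Build (but do not run) the git commands for one feature's worktree lifecycle."""
    root = PurePosixPath(repo_root)
    branch = f"feature/{feature}"                        # hypothetical branch naming
    path = str(root.parent / f"{root.name}-{feature}")   # hypothetical sibling-directory layout
    return {
        # Create a new branch and check it out in its own directory,
        # so parallel features never require switching branches in place.
        "create": [["git", "-C", repo_root, "worktree", "add", "-b", branch, path]],
        # After the feature is merged: drop the worktree, then the merged branch.
        "cleanup": [
            ["git", "-C", repo_root, "worktree", "remove", path],
            ["git", "-C", repo_root, "branch", "-d", branch],
        ],
    }


if __name__ == "__main__":
    plan = worktree_plan("/work/app", "login")
    for stage, commands in plan.items():
        for argv in commands:
            print(stage, "->", " ".join(argv))
```

Returning the argument lists instead of running them keeps the sketch side-effect free; a caller could pass each list to `subprocess.run` once the paths have been reviewed.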

Key finding: OpenSpec v1.0 is the only tool that migrated from AGENTS.md/CLAUDE.md to the newer SKILL.md standard. All other open-source tools still generate AGENTS.md. The AGENTS.md standard itself (28.64% runtime reduction in evaluations) continues to gain adoption — OpenAI Codex ships 88 AGENTS.md files in its own repo.

Architecture

spec-compare — Architecture

Distribution Type

Methodology documentation (static Markdown files). No deployable artifact.

Repository Structure

spec-compare/
├── README.md                        # Entry point with key findings, quick comparison table, recommendations
├── CHANGELOG.md                     # Version history of the research
├── CONTRIBUTING.md                  # How to contribute findings
├── LICENSE                          # MIT
└── docs/
    ├── comparison.md                # Side-by-side feature comparison matrices
    ├── use-case-scoring.md          # 12-scenario graded scoring (5-star scale)
    ├── iterative-development.md     # Spec modification workflows analysis
    ├── git-worktree-support.md      # Detailed worktree analysis (updated with Beads, Conductor)
    ├── recommendations.md           # Decision frameworks by use case
    ├── critical-analysis.md         # Concerns, critiques, future outlook
    ├── landscape.md                 # 30+ multi-agent tools surveyed
    ├── beads.md                     # Agent memory, messaging, multi-agent villages
    ├── gaps.md                      # Zencoder, Kilo Code, Conductor, PromptX analyses
    ├── cheatsheet-beads-openspec.md # Practical Beads + OpenSpec setup cheatsheet
    ├── sources.md                   # All citations and references
    ├── research-beads-agentmail-for-skill-update.md  # Research notes
    └── tools/
        ├── spec-kit.md              # Individual profile
        ├── spec-kitty.md            # Individual profile
        ├── bmad-method.md           # Individual profile
        ├── openspec.md              # Individual profile
        ├── kiro.md                  # Individual profile
        ├── tessl.md                 # Individual profile
        ├── gsd.md                   # Individual profile
        └── ralph-loop.md            # Individual profile

No Install, No Runtime, No Configuration

spec-compare has no install method, no runtime dependencies, and no configuration files. It is read directly on GitHub or cloned for offline reference.

Components

spec-compare — Components

spec-compare contains no executable components. All content is static Markdown research documentation.

Document Index

| Document | Purpose |
|----------|---------|
| `docs/comparison.md` | Side-by-side feature matrices comparing 6 tools on 20+ dimensions, plus agent configuration support table |
| `docs/use-case-scoring.md` | 12 real-world scenarios graded on 5-star scale for each tool; expanded 11-tool heatmap |
| `docs/iterative-development.md` | Analysis of how each tool handles spec modifications vs. greenfield |
| `docs/git-worktree-support.md` | Detailed git worktree capability analysis including Beads and Conductor |
| `docs/recommendations.md` | Decision frameworks: by project type, team size, complexity, workflow preference |
| `docs/critical-analysis.md` | Honest concerns and critiques about each tool; future outlook |
| `docs/landscape.md` | 30+ multi-agent tools surveyed including Claude Code Agent Teams |
| `docs/beads.md` | Analysis of Beads, Agent Mail, and Gas Town (agent memory and messaging ecosystem) |
| `docs/gaps.md` | Analysis of Zencoder, Kilo Code, Conductor, PromptX |
| `docs/cheatsheet-beads-openspec.md` | Practical daily workflow cheatsheet for Beads + OpenSpec combination |
| `docs/sources.md` | All citations and references |
| `docs/tools/spec-kit.md` | Spec-Kit individual profile |
| `docs/tools/spec-kitty.md` | Spec Kitty individual profile |
| `docs/tools/bmad-method.md` | BMad Method individual profile |
| `docs/tools/openspec.md` | OpenSpec individual profile |
| `docs/tools/kiro.md` | Kiro individual profile |
| `docs/tools/tessl.md` | Tessl individual profile |
| `docs/tools/gsd.md` | GSD individual profile |
| `docs/tools/ralph-loop.md` | Ralph Loop individual profile |

Tools Analyzed (12 total)

Original Six

GitHub Spec-Kit — Open-source CLI toolkit for greenfield projects
Spec Kitty — Community fork with built-in git worktree orchestration
BMad Method — Enterprise framework with 21 specialized AI agents
OpenSpec — Lightweight change-management for brownfield projects
Kiro — AWS-backed agentic IDE with multimodal input
Tessl — Experimental spec-as-source platform

February 2026 Extensions

GSD — Meta-prompting SDD system with wave-based context management (11.9K stars)
Ralph Loop — Stateless iterative execution pattern by Geoffrey Huntley
Zencoder/Zenflow — Commercial SDD-as-a-Service platform
Kilo Code — Open-source agentic platform with Memory Bank ($8M seed, 1.5M users)
Conductor — macOS parallel agent runner using git worktrees
PromptX — AI agent context platform via MCP (gap entry)

Prompts

spec-compare — Prompts

spec-compare contains no prompt files, agent definitions, or skill documents. It is a research comparison repository. The sections below quote the most substantive analytical text from the documentation.

Verbatim: Use-case scoring for trivial modification

From docs/use-case-scoring.md:

## Use Case 1: Change Button Color (Trivial Modification)

**Scenario:** Change primary button background from blue (#0000FF) to green (#00FF00)

| Tool | Score | Reasoning |
|------|-------|-----------|
| **OpenSpec** | ⭐⭐⭐⭐⭐ | Lightweight delta format. Create `changes/button-color/` with MODIFIED section. Minimal overhead. |
| **Tessl** | ⭐⭐⭐⭐ | Edit spec directly, regenerate code. Elegant but closed beta limits access. |
| **Spec-Kit** | ⭐⭐⭐ | Must use `/speckit.clarify` workaround. `/speckit.specify` wants to create new feature. |
| **Spec Kitty** | ⭐⭐ | Same Spec-Kit issues + worktree overhead excessive for one-line change. |
| **Kiro** | ⭐⭐ | Generates 5-page spec for trivial change. "Sledgehammer to crack a nut" problem. |
| **BMad** | ⭐ | 21 agents, 30+ minute workflow for button color. Massive overkill. |

**Winner:** OpenSpec
**Avoid:** BMad, Kiro
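
As a rough illustration of how the star ratings above can be compared programmatically, the sketch below transcribes the Use Case 1 scores as integers and ranks them. The names `USE_CASE_1` and `rank_tools` are hypothetical and do not appear in the repository.

```python
# Star counts transcribed by hand from the Use Case 1 table above.
USE_CASE_1 = {
    "OpenSpec": 5,
    "Tessl": 4,
    "Spec-Kit": 3,
    "Spec Kitty": 2,
    "Kiro": 2,
    "BMad": 1,
}


def rank_tools(scores: dict[str, int]) -> list[tuple[str, int]]:
    """Sort tools by score, highest first; tied tools keep the table's order."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    ranked = rank_tools(USE_CASE_1)
    winner, top_score = ranked[0]
    print(f"Winner: {winner} ({top_score}/5)")
    for tool, score in ranked:
        print(f"{tool:<12}{'*' * score}")
```

The ranking reproduces the table's winner, but the "Avoid" line is not derivable from the scores alone: Spec Kitty ties with Kiro at two stars yet is not listed, so that line reflects the author's judgment and is not a score threshold.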

Verbatim: Feature matrix excerpt (comparison.md)

From docs/comparison.md:

| Feature | Spec-Kit | Spec Kitty | BMad | OpenSpec | Kiro | Tessl |
|---------|----------|------------|------|----------|------|-------|
| **Spec Maturity Level** | Spec-First | Spec-Anchored | Spec-First | Spec-Anchored | Spec-First | Spec-as-Source |
| **Git Worktrees** | No | **Yes** | No | No | No | No |
| **AGENTS.md** | ✅ Generated | ✅ Generated | ✅ Generated | ⚠️ Pre-1.0 only | ❌ | ❌ |
| **SKILL.md** | ❌ | ❌ | ❌ | ✅ v1.0 | ❌ | ❌ |
| **Slash Commands** | ✅ 8 | ✅ 13 | ✅ 50+ workflows | ✅ 10 (`/opsx:`) | ❌ | ❌ |
| **MCP Support** | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ |
| **Multi-Agent** | ❌ | ✅ | ✅ | ❌ | ⚠️ | ❌ |
| **Dashboard** | ❌ | ✅ | ❌ | ⚠️ | ✅ (IDE) | ❌ |

Verbatim: AGENTS.md finding (comparison.md)

From docs/comparison.md:

**Key finding:** OpenSpec v1.0 is the only tool that migrated from AGENTS.md/CLAUDE.md to the 
newer SKILL.md standard. All other open-source tools still generate AGENTS.md. The AGENTS.md 
standard itself (28.64% runtime reduction in evaluations) continues to gain adoption — OpenAI 
Codex ships 88 AGENTS.md files in its own repo.

Verbatim: Modification problem (README)

From README.md:

### The Modification Problem

**Critical Gap:** Most SDD tools excel when requirements are clear upfront but struggle with 
iterative changes like "change button from blue to green."

- **OpenSpec** - Purpose-built for modifications with delta format (ADDED, MODIFIED, REMOVED)
- **Tessl** - Spec-as-source enables edit-and-regenerate (but closed beta)
- **Spec-Kit** - Requires `/speckit.clarify` workaround, not optimized for small changes
- **Kiro/BMad** - "Sledgehammer to crack a nut" problem for trivial changes
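
To make the delta idea concrete, here is a minimal sketch of applying a change with ADDED, MODIFIED, and REMOVED sections to a spec. It is not OpenSpec's file format or API: the `Delta` class, the `apply_delta` function, and the flat key/value spec are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class Delta:
    """Hypothetical in-memory change set with the three section types named above."""
    added: dict[str, str] = field(default_factory=dict)
    modified: dict[str, str] = field(default_factory=dict)
    removed: set[str] = field(default_factory=set)


def apply_delta(spec: dict[str, str], delta: Delta) -> dict[str, str]:
    """Return a new spec with the delta applied; the input spec is left untouched."""
    result = dict(spec)
    for key in delta.removed:
        if key not in result:
            raise KeyError(f"cannot remove unknown requirement: {key}")
        del result[key]
    for key, value in delta.modified.items():
        if key not in result:
            raise KeyError(f"cannot modify unknown requirement: {key}")
        result[key] = value
    for key, value in delta.added.items():
        if key in result:
            raise ValueError(f"requirement already exists: {key}")
        result[key] = value
    return result


if __name__ == "__main__":
    spec = {"button.primary.background": "#0000FF", "button.primary.label": "Submit"}
    change = Delta(modified={"button.primary.background": "#00FF00"})
    print(apply_delta(spec, change))
```

The point of the sketch is that a one-line change is expressed as a one-entry delta, with no need to restate the rest of the spec, which is the property the README credits for OpenSpec's fit with small modifications.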

Uniqueness

spec-compare — Uniqueness and Positioning

differs_from_seeds

spec-compare has no relationship to any of the five seed archetypes — it does not extend, fork, or build on any seed framework. It is a pure methodology documentation artifact: a curated research collection designed to help practitioners choose between seed-class frameworks. No other repository in this batch serves this function. It is the only Tier C item in batch-01: included for completeness (and because its 39 stars reflect genuine practitioner adoption as a reference resource) but not analyzable as a deployable framework.

Positioning

Category: Research/comparison documentation — NOT a deployable AI coding framework
Primary value: Curated cross-tool analysis with graded use-case scoring and the "SDD Maturity Levels" taxonomy
Unique contribution: The 3-level SDD Maturity taxonomy (Spec-First / Spec-Anchored / Spec-as-Source) provides conceptual vocabulary for categorizing every other framework in this research project
Scope: Covers the widest span of any single entry in the corpus: the original 6 tools, 6 extended tools, and 30+ landscape entries

Observable Limitations

Not a framework: Cannot be installed, configured, or run. Zero operational value for practitioners who already know which tool they want.
Point-in-time accuracy: Some entries (Kiro, Tessl) are based on preview/beta state as of early 2026 and may be stale.
Possible author bias: Tool scoring reflects the author's priorities (brownfield and modification workflows are weighted heavily, which favors OpenSpec). Readers with greenfield priorities may weight the scores differently.
No code evidence: The comparison relies on documentation claims rather than reproducible benchmarks. The "30+ minute workflow for button color" for BMad is an estimate, not a measurement.
Missing frameworks: Doesn't cover ralphy-openspec, goopspec, or the wave of OpenCode-targeted tools that emerged after February 2026.

Workflow

spec-compare — Workflow

spec-compare has no executable workflow. It is a reference artifact used to inform tool selection decisions.

How Practitioners Use It

Read README for key findings and quick comparison table
Consult docs/use-case-scoring.md to find tools that score highest for their specific scenario
Read docs/recommendations.md for decision frameworks by project type and team profile
Read individual tool profiles in docs/tools/ for deeper investigation
Consult docs/comparison.md for detailed feature matrix
Check docs/git-worktree-support.md if parallel development is a requirement
Check docs/iterative-development.md if brownfield/modification workflows are primary

No Phases, No Artifacts, No Gates

spec-compare produces no code, no specs, no artifacts. The "workflow" is reading documentation.

Decision Framework Summary (from docs/recommendations.md)

| Scenario | Recommended Tool |
|----------|------------------|
| Trivial modifications (button color, text change) | OpenSpec |
| Greenfield project with full planning | Spec-Kit or BMad |
| Parallel feature development | Spec Kitty |
| Enterprise multi-agent workflow | BMad |
| Spec-as-source (regenerate from spec) | Tessl |
| IDE-integrated experience | Kiro |
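
The decision table above can be treated as a simple lookup in both directions. The sketch below is one way to encode it; the shortened scenario keys and the function names `recommend` and `scenarios_for` are hypothetical, and the mapping is transcribed from this summary, not taken from `docs/recommendations.md` itself.

```python
# Scenario -> recommended tools, transcribed from the summary table above.
# The lowercase keys are abbreviations chosen for this sketch.
RECOMMENDATIONS: dict[str, list[str]] = {
    "trivial modifications": ["OpenSpec"],
    "greenfield project with full planning": ["Spec-Kit", "BMad"],
    "parallel feature development": ["Spec Kitty"],
    "enterprise multi-agent workflow": ["BMad"],
    "spec-as-source": ["Tessl"],
    "ide-integrated experience": ["Kiro"],
}


def recommend(scenario: str) -> list[str]:
    """Return the recommended tools for a scenario, or an empty list if unknown."""
    return RECOMMENDATIONS.get(scenario.strip().lower(), [])


def scenarios_for(tool: str) -> list[str]:
    """Return every scenario in which the given tool is recommended."""
    return [scenario for scenario, tools in RECOMMENDATIONS.items() if tool in tools]


if __name__ == "__main__":
    print(recommend("Parallel feature development"))
    print(scenarios_for("BMad"))
```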

SDD Maturity Taxonomy (Key Analytical Output)

The primary analytical contribution of spec-compare:

| Level | Name | Definition | Examples |
|-------|------|------------|----------|
| 1 | Spec-First | Specs precede coding but are discarded after use | Spec-Kit, Kiro, BMad |
| 2 | Spec-Anchored | Specs persist and evolve alongside code | OpenSpec, Spec Kitty |
| 3 | Spec-as-Source | Only specs are edited; code auto-generates | Tessl |
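
Because the taxonomy serves as shared vocabulary for classifying other tools, it can help to see it as a small data structure. The sketch below encodes the three levels as an ordered enum and groups the example tools by level. The names `SpecMaturity`, `TOOL_MATURITY`, `tools_at`, and `specs_persist` are hypothetical, and treating the levels as strictly ordered is an assumption based only on the 1 to 3 numbering in the table.

```python
from enum import IntEnum


class SpecMaturity(IntEnum):
    """The three SDD maturity levels, numbered as in the table above."""
    SPEC_FIRST = 1      # specs precede coding but are discarded after use
    SPEC_ANCHORED = 2   # specs persist and evolve alongside code
    SPEC_AS_SOURCE = 3  # only specs are edited; code auto-generates


# Example classifications taken from the table's "Examples" column.
TOOL_MATURITY: dict[str, SpecMaturity] = {
    "Spec-Kit": SpecMaturity.SPEC_FIRST,
    "Kiro": SpecMaturity.SPEC_FIRST,
    "BMad": SpecMaturity.SPEC_FIRST,
    "OpenSpec": SpecMaturity.SPEC_ANCHORED,
    "Spec Kitty": SpecMaturity.SPEC_ANCHORED,
    "Tessl": SpecMaturity.SPEC_AS_SOURCE,
}


def tools_at(level: SpecMaturity) -> list[str]:
    """List the example tools classified at exactly this level."""
    return [tool for tool, maturity in TOOL_MATURITY.items() if maturity == level]


def specs_persist(tool: str) -> bool:
    """True when the tool keeps specs around after the initial implementation."""
    return TOOL_MATURITY[tool] >= SpecMaturity.SPEC_ANCHORED


if __name__ == "__main__":
    for level in SpecMaturity:
        print(level.name, tools_at(level))
```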

Memory Context

spec-compare — Memory and Context

State Storage

None. spec-compare is static documentation. No state files, no session management, no persistence.

Context Approach

spec-compare is read as reference material prior to making tool selection decisions. It has no runtime context mechanism.

Cross-Session Handoff

Not applicable.

Memory Type

Not applicable — there is no executable component.

Orchestration

spec-compare — Orchestration

Multi-Agent

No. spec-compare has no agents, no orchestration, and no execution layer.

Orchestration Pattern

None.

Execution Mode

Not applicable. spec-compare is read as static documentation.

Note

spec-compare analyzes orchestration patterns in other tools — notably git worktree orchestration in Spec Kitty, 21-agent hierarchical orchestration in BMad, and parallel agent runners like Conductor. But it does not implement any orchestration itself.

UI CLI Surface

spec-compare — UI, CLI, and Observability

Dedicated CLI Binary

None.

Local Web Dashboard

None.

IDE Integration

None.

Observability

Not applicable — spec-compare is static documentation with no runtime behavior to observe.

Access Method

Read directly on GitHub at https://github.com/cameronsjo/spec-compare, or clone locally with git clone.

Related frameworks

same archetype · same primary tool · same memory type

Context-Engineering Handbook ★ 9.0k

A13 Methodology

Provides a first-principles, research-grounded vocabulary and learning path for context engineering — the discipline of designing…

walkinglabs/learn-harness-engineering ★ 6.6k

A13 Methodology

Teach harness engineering from first principles (12 lectures + 6 projects) and provide a scaffolding skill (harness-creator) that…

Awesome Harness Engineering (walkinglabs) ★ 2.7k

A13 Methodology

Curate the authoritative reference list of articles, benchmarks, and tools for harness engineering — the practice of shaping the…

cline-memory-bank (nickbaumann98) ★ 581

A13 Methodology

Custom instructions + 6-file hierarchical Markdown memory bank so Cline maintains full project context across sessions, with a…

FPF (First Principles Framework) ★ 372

A13 Methodology

Provides a formal pattern language for making reasoning explicit, traceable, and publishable in mixed human/AI engineering work —…

nexu-io/harness-engineering-guide ★ 134

A13 Methodology

Provide a practical, code-first reference guide to harness engineering — from first principles to production patterns —…

Distribution

Type: methodology-doc
License: MIT
Install: none
Version: no-version

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: No

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 0

Workflow

Phases: 0
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: No
Pattern: none
Max concurrent: 0
Isolation: none
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: No
BYOK: No
Modal: text

Execution

Mode: none
Crash recovery: No
Compaction: No
Session handoff: No
Streaming: No

Memory

Type: none
Persistence: none
Search: none

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: No

Tools

Primary: none
Portability: high

Signals

Stars: 39
Last commit: 2026-02-25
Contributors: 2
Maintainer: dormant
Quality score: 0/10