sd0x-dev-flow

sd0x-dev-flow · sd0xdev/sd0x-dev-flow · ★ 157 · last commit 2026-05-14

Reference implementation of harness engineering for Claude Code — hook-enforced dual review, sentinel-driven state machine, and fail-closed safety where the AI cannot skip quality gates.

Best when: Quality gates that AI can't skip require hooks + sentinels + state machines, not just Iron Law prompts; behavioral enforcement must be mechanical, not rheto…

Skip if: single reviewer (dual review is required); stopping without completing the review gate

vs seeds

superpowers (skills-only, Claude Code primary, proactive activation), but sd0x-dev-flow uses 8 hooks across 5 event types to mechanic…

Primitive shape: 215 total

Commands 96 · Skills 96 · Subagents 15 · Hooks 8

Summary

sd0x-dev-flow — Summary

sd0x-dev-flow is a Claude Code plugin that implements "harness engineering" for AI coding agents: it wraps the LLM with a formal state-machine loop, 8 lifecycle hooks, 96 skills, 15 agents, and 14 rules to enforce quality gates that the AI cannot skip. The core innovation is a sentinel-driven auto-loop. After any code edit, the review command fires in the same reply (hook-enforced); it dispatches a dual review (Codex MCP primary plus a secondary agent, in parallel), deduplicates the findings, and emits a ✅ Ready / ⛔ Blocked gate marker that hooks parse to prevent a premature Stop. Context-compaction survival is handled by a SessionStart(compact) hook that re-injects [AUTO_LOOP_RESUME] to restart the review loop. The 96 skills cover workflows from planning (/codex-brainstorm, /tech-spec) through gate-enforced review (/codex-review-fast) to shipping (/push-ci, /create-pr), and 15 persona-based agents (architect, simplifier, reviewer, etc.) are spawned as subagents. Version 3.0.12, MIT license.

Differs from seeds: Most similar to superpowers (skills-only, Claude Code primary, behavioral enforcement), but sd0x-dev-flow is architecturally more aggressive: it uses 8 hooks across 5 event types to enforce behavior that superpowers achieves through Iron Law prompts alone. The dual-reviewer pattern (Codex + secondary in parallel) is not present in any seed. The sentinel-driven state machine (✅ Ready/⛔ Blocked stdout markers parsed by hooks) is a novel harness engineering primitive not present in any seed.

Overview

sd0x-dev-flow — Origin, Philosophy, and Manifesto

Origin

Built by sd0xdev, the sole contributor. Written in JavaScript and shell; MIT license; version 3.0.12; 157 stars. Available on npm and as a Claude Code plugin (sd0xdev/sd0x-dev-flow). Multilingual README (EN/zh-TW/zh-CN/ja/ko/es).

Philosophy: Harness Engineering

The README opens with a definition from Martin Fowler and Mitchell Hashimoto (Feb 2026):

"Harness engineering is the discipline of engineering everything around the LLM — tool loops, context management, hooks, state machines, safety layers — as opposed to training the model itself."

sd0x-dev-flow positions itself as a reference implementation of harness engineering for Claude Code, mapping 10 canonical harness sub-problems to concrete code:

Sub-problem	Implementation
Tool loop control	Auto-loop with sentinel-driven transitions
Sentinel state machine	`✅ Ready`/`⛔ Blocked`/`✅ All Pass` gate markers
Context recovery	`[AUTO_LOOP_RESUME]` stdout injection after compact
Lifecycle interceptors	5 hook event types → 8 scripts
Capability-based tool gating	`allowed-tools` frontmatter in 86/95 skills
Defense-in-depth safety	5 layers: pre-edit-guard → commit-msg-guard → pre-push-gate → stop-guard → sidecar
Generator-evaluator split	Dual review: Codex (primary) + Claude (secondary) parallel dispatch
Incremental progress tracking	`iteration_history.current_round` + `max_rounds` + convergence plateau detection
Human-in-loop safety gates	`/dev/tty` confirmation for destructive ops
Self-improvement loop	Correction → record lesson → promote to rule after 3+ recurrences

Key Manifesto Quotes

From CLAUDE.md:

"After editing code or docs, you MUST run the review command in the same reply — do not stop, do not ask, do not just summarize."

"Declaring ≠ Executing: Saying 'should run review' without invoking the Skill tool is a violation."

"Summary ≠ Completion: Outputting a table then stopping is a violation."

Self-improvement Philosophy

Every correction becomes a lesson. After 3+ recurrences of the same issue, the lesson is promoted to a project rule. This creates an evolving ruleset grounded in actual agent failures.
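A minimal sketch of the promotion rule, assuming each correction can be identified by a normalized lesson string. `LessonLog`, `record`, and the in-memory storage are hypothetical; according to the text, the plugin itself persists promoted lessons as rule files under `rules/`.

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # "3+ recurrences" per the text


class LessonLog:
    """Hypothetical in-memory lesson log; the plugin writes promoted rules under rules/."""

    def __init__(self, threshold: int = PROMOTION_THRESHOLD) -> None:
        self.threshold = threshold
        self.counts = Counter()  # lesson text -> number of corrections seen
        self.rules = []  # lessons promoted to project rules, in promotion order

    def record(self, lesson: str) -> bool:
        """Record one correction; return True only on the recurrence that promotes it."""
        self.counts[lesson] += 1
        if self.counts[lesson] >= self.threshold and lesson not in self.rules:
            self.rules.append(lesson)
            return True
        return False
```

Recording the same lesson three times promotes it on the third call; later recurrences return False because the rule already exists.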

Architecture

sd0x-dev-flow — Architecture, Distribution, and Installation

Distribution

Claude Code plugin: /plugin marketplace add sd0xdev/sd0x-dev-flow + /plugin install sd0x-dev-flow@sd0xdev-marketplace
npm/npx: npx skills add sd0xdev/sd0x-dev-flow (skills-only subset for Codex/Cursor/Windsurf/Aider)

Version analyzed: 3.0.12

Installation

# Full (Claude Code with hooks, rules, auto-loop)
/plugin marketplace add sd0xdev/sd0x-dev-flow
/plugin install sd0x-dev-flow@sd0xdev-marketplace

# Skills only (Codex CLI, Cursor, Windsurf, Aider)
npx skills add sd0xdev/sd0x-dev-flow

# Project setup (auto-detect framework, install rules/hooks subset)
/project-setup

Directory Tree

sd0x-dev-flow/
├── .claude-plugin/
│   ├── plugin.json             # plugin manifest (v3.0.12)
│   └── marketplace.json
├── .claude/                    # project-level hooks/settings
├── skills/                     # 96 skill directories, each with SKILL.md
│   ├── codex-review-fast/
│   ├── verify/
│   ├── feature-dev/
│   └── ... (96 total)
├── agents/                     # 15 agent persona markdown files
│   ├── architecture-designer.md
│   ├── code-simplifier.md
│   ├── codex-architect.md
│   └── ... (15 total)
├── hooks/
│   ├── hooks.json              # hook event routing
│   ├── post-compact-auto-loop.sh
│   ├── post-edit-format.sh
│   ├── post-skill-auto-loop.sh
│   ├── post-tool-review-state.sh
│   ├── pre-edit-guard.sh
│   ├── session-init.sh
│   ├── stop-check.md
│   ├── stop-guard.sh
│   └── user-prompt-review-guard.sh
├── scripts/                    # 18 utility scripts
│   ├── emit-review-gate.sh
│   ├── pre-push-gate.sh
│   ├── commit-msg-guard.sh
│   ├── verify-runner.js
│   ├── precommit-runner.js
│   └── ...
├── rules/                      # 14 rule markdown files
│   ├── auto-loop.md
│   ├── self-improvement.md
│   ├── codex-invocation.md
│   └── ...
├── CLAUDE.md                   # harness behavioral requirements
├── CLAUDE.template.md
└── package.json

Required Runtime

Claude Code 2.1+
Codex MCP (optional — /codex-* skills require it; falls back to single-reviewer mode)

Target AI Tools

Claude Code (primary — full plugin with hooks, rules, auto-loop)
Codex CLI, Cursor, Windsurf, Aider (skills-only via npx skills add)

Components

sd0x-dev-flow — Components

Skills (96)

Categories and examples:

Planning: /codex-brainstorm, /feasibility-study, /tech-spec, /review-spec, /architecture, /deep-analyze, /deep-research, /fp-brief, /project-brief

Development: /feature-dev, /bug-fix, /codex-implement, /codex-architect, /codex-explain, /refactor, /simplify, /de-ai-flavor

Review/Gate: /codex-review-fast (diff-only, parallel Codex + secondary), /codex-review (full), /codex-code-review, /codex-test-review, /codex-review-branch, /codex-review-doc, /codex-security, /precommit, /precommit-fast, /verify (lint→typecheck→unit→integration→e2e)

Shipping: /smart-commit, /push-ci, /create-pr, /pr-review, /load-pr-review

Operations: /project-setup, /repo-intake, /project-audit, /next-step, /best-practices, /risk-assess, /dep-audit

Analysis: /issue-analyze, /git-investigate, /code-explore, /code-investigate, /deep-explore

Agents (15 persona files)

architecture-designer.md — system architecture design
brief-writer.md — executive summaries
code-simplifier.md — cleanup refactoring (model: opus)
codex-architect.md — Codex-assisted architecture
codex-implementer.md — Codex-assisted implementation
coverage-analyst.md — test coverage analysis
doc-refactor.md — documentation refactoring
feasibility-analyst.md — feasibility analysis
git-investigator.md — git history investigation
performance-optimizer.md — performance optimization
refactor-reviewer.md — refactoring review
solution-architect.md — solution design
strict-reviewer.md — strict code review
tech-spec-reviewer.md — tech spec review
verify-app.md — application verification

Hooks (8 hook scripts across 5 event types; the table also routes scripts/namespace-hint.sh)

Event	Matcher	Script	Purpose
SessionStart	startup	`session-init.sh`	Session initialization
SessionStart	startup|compact	`namespace-hint.sh`	Plugin namespace hint
SessionStart	compact	`post-compact-auto-loop.sh`	Resume auto-loop after compact
PreToolUse	Edit|Write	`pre-edit-guard.sh`	Guard before file edits
PostToolUse	Edit|Write	`post-edit-format.sh`	Format after edits
PostToolUse	Bash|mcp__codex__*|Skill	`post-tool-review-state.sh`	Parse sentinel gate markers
PostToolUse	Skill	`post-skill-auto-loop.sh`	Trigger auto-loop after skill
Stop	(any)	`stop-guard.sh`	Block stop if gates incomplete
UserPromptSubmit	(any)	`user-prompt-review-guard.sh`	Guard on new user prompts
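The hook table reads as a routing function from (event, matcher target) to scripts. The sketch below assumes matchers behave as full-match regular expressions (so the table's `mcp__codex__*` is written `mcp__codex__.*`) and that "(any)" means no matcher. `Hook`, `HOOKS`, and `scripts_for` are illustrative names, not the plugin's `hooks.json` schema.

```python
import re
from typing import NamedTuple, Optional


class Hook(NamedTuple):
    event: str
    matcher: Optional[str]  # None stands for "(any)"
    script: str


# Registrations transcribed from the table above (illustrative, not the hooks.json format).
HOOKS = [
    Hook("SessionStart", "startup", "session-init.sh"),
    Hook("SessionStart", "startup|compact", "namespace-hint.sh"),
    Hook("SessionStart", "compact", "post-compact-auto-loop.sh"),
    Hook("PreToolUse", "Edit|Write", "pre-edit-guard.sh"),
    Hook("PostToolUse", "Edit|Write", "post-edit-format.sh"),
    Hook("PostToolUse", "Bash|mcp__codex__.*|Skill", "post-tool-review-state.sh"),
    Hook("PostToolUse", "Skill", "post-skill-auto-loop.sh"),
    Hook("Stop", None, "stop-guard.sh"),
    Hook("UserPromptSubmit", None, "user-prompt-review-guard.sh"),
]


def scripts_for(event: str, target: str = "") -> list[str]:
    """Scripts that fire for an event; target is the tool name or SessionStart source."""
    return [
        h.script
        for h in HOOKS
        if h.event == event
        and (h.matcher is None or re.fullmatch(h.matcher, target) is not None)
    ]
```

Under this reading a single `Skill` call fires both `post-tool-review-state.sh` and `post-skill-auto-loop.sh`, which lets sentinel parsing and auto-loop triggering live in separate scripts.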

Scripts (18)

emit-review-gate.sh, pre-push-gate.sh, commit-msg-guard.sh, verify-runner.js, precommit-runner.js, detect-scope.js, resolve-feature.sh, security-redact.js, namespace-hint.sh, dep-audit.sh, migration-audit.sh, build-codex-artifacts.js, classify-docs-cli.js, generate-readme-catalog.js, resolve-feature-cli.js, run-skill.sh, plus the nested skills/ and lib/ directories

Rules (14)

auto-loop.md, self-improvement.md, codex-invocation.md, and 11 others covering coding standards, commit conventions, test requirements, etc.

Prompts

sd0x-dev-flow — Prompt Files and Techniques

Prompt 1: CLAUDE.md — Auto-Loop Iron Law (Mandatory execution pattern)

### Auto-Loop Rule ⚠️

After editing code or docs, you **MUST** run the review command **in the same reply** — do not stop, do not ask, do not just summarize.

| After editing... | Immediately run | Then on pass |
|------------------|----------------|--------------|
| code files | `/codex-review-fast` | `/precommit` |
| `.md` docs | `/codex-review-doc` | (done) |
| Review found issues | Fix all → re-run same review | — |

**Declaring ≠ Executing**: Saying "should run review" without invoking the Skill tool is a violation.
**Summary ≠ Completion**: Outputting a table then stopping is a violation.

Full spec: @rules/auto-loop.md

Technique: Iron Law with an explicit violation taxonomy. Two named violation types ("Declaring ≠ Executing", "Summary ≠ Completion") target the specific failure modes where an agent acknowledges a requirement verbally but doesn't act on it. The ⚠️ emoji serves as a visual attention anchor, and a decision table maps each edit type to its required command.
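The CLAUDE.md decision table can be read as a small state function. This is a sketch, assuming a file counts as a doc only when it ends in `.md`; `AUTO_LOOP_TABLE`, `edit_kind`, and `next_command` are hypothetical names, and in the plugin the "next command" is a Skill tool invocation rather than a return value.

```python
from typing import Optional

# Decision table: edit kind -> (review to run immediately, command once it passes)
AUTO_LOOP_TABLE = {
    "code": ("/codex-review-fast", "/precommit"),
    "doc": ("/codex-review-doc", None),  # docs are done once the review passes
}


def edit_kind(path: str) -> str:
    # Assumption: every non-.md file is treated as a code file.
    return "doc" if path.lower().endswith(".md") else "code"


def next_command(path: str, review_passed: Optional[bool]) -> Optional[str]:
    """Next command the agent must invoke in the same reply.

    review_passed is None before any review, False when the review found issues
    (fix all, then re-run the same review), and True once it passes.
    """
    review, on_pass = AUTO_LOOP_TABLE[edit_kind(path)]
    if not review_passed:
        return review
    return on_pass
```

A None result means the loop is finished for that edit; any other value is a command that must be executed, not merely mentioned, which is exactly the "Declaring ≠ Executing" rule.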

Prompt 2: verify SKILL.md — Ecosystem-Adaptive Verification (Context detection pattern)

**Ecosystem detection** (check project root for manifest files):

| Manifest | Ecosystem | Lint | Typecheck | Test |
|----------|-----------|------|-----------|------|
| `package.json` | Node.js | `{pm} lint` | `{pm} typecheck` | `{pm} test:unit` |
| `pyproject.toml` | Python | `ruff check .` | `mypy .` | `pytest` |
| `Cargo.toml` | Rust | `cargo clippy` | _(implicit)_ | `cargo test` |
| `go.mod` | Go | `golangci-lint run` | `go vet ./...` | `go test ./...` |
| `build.gradle` | Java | `./gradlew spotlessCheck` | _(implicit)_ | `./gradlew test` |

Technique: Detection-then-dispatch pattern. The skill inspects the repo for manifest files before running any commands, then adapts the verification stack to the detected ecosystem. This avoids hardcoding tool names and enables cross-language projects to use the same /verify skill.
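A sketch of detection-then-dispatch using the manifest table above. It assumes `{pm}` is replaced by a package manager that runs package scripts directly (the default here is `pnpm`; plain npm would need `npm run lint`) and that a step the table marks as implicit is simply skipped. `ECOSYSTEMS` and `verification_plan` are illustrative names.

```python
from pathlib import Path

# Manifest -> ecosystem and (lint, typecheck, test) commands, from the table above.
# None marks a typecheck step the table lists as implicit; "{pm}" is a placeholder.
ECOSYSTEMS = [
    ("package.json", "Node.js", "{pm} lint", "{pm} typecheck", "{pm} test:unit"),
    ("pyproject.toml", "Python", "ruff check .", "mypy .", "pytest"),
    ("Cargo.toml", "Rust", "cargo clippy", None, "cargo test"),
    ("go.mod", "Go", "golangci-lint run", "go vet ./...", "go test ./..."),
    ("build.gradle", "Java", "./gradlew spotlessCheck", None, "./gradlew test"),
]


def verification_plan(root: Path, pm: str = "pnpm") -> list[str]:
    """Detect ecosystems from manifests in root, then list their commands in order."""
    plan: list[str] = []
    for manifest, _ecosystem, *steps in ECOSYSTEMS:
        if (root / manifest).is_file():
            plan.extend(step.replace("{pm}", pm) for step in steps if step is not None)
    return plan
```

A polyglot repo with both `pyproject.toml` and `go.mod` gets both stacks, in table order, without any per-project configuration.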

Prompt 3: codex-review-fast SKILL.md — Variant Delegation (Thin router pattern)

# Codex Review Fast

Thin entry-point skill — routes to the parent skill for full workflow.

## Parent Skill

This is the **Fast** variant of `codex-code-review`. Full workflow, prompt templates, and review logic are defined in the parent skill.

See `@skills/codex-code-review/SKILL.md`

## Variant

| Property | Value |
|----------|-------|
| Scope | Diff only |
| Pre-checks | None (no lint/build) |
| Prompt template | `@skills/codex-code-review/references/codex-prompt-fast.md` |

Technique: Thin variant pattern. The fast skill delegates entirely to its parent and specifies only the variant parameters (scope: diff-only, no pre-checks). This avoids duplicating the workflow while keeping a separate skill entry point with its own trigger description for the agent harness.
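One way to picture the thin-variant pattern is a single shared workflow parameterized by a config object, where the variant overrides only the properties in its table. The parent defaults below (full scope, lint/build pre-checks, a `codex-prompt.md` template) are hypothetical values inferred from the Fast variant's description, not settings read from `codex-code-review`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReviewConfig:
    # Hypothetical parent defaults; the real codex-code-review settings are not shown in the source.
    scope: str = "full"
    pre_checks: tuple[str, ...] = ("lint", "build")
    prompt_template: str = "@skills/codex-code-review/references/codex-prompt.md"


PARENT = ReviewConfig()

# The Fast variant overrides only the three properties in its variant table.
FAST = replace(
    PARENT,
    scope="diff",
    pre_checks=(),
    prompt_template="@skills/codex-code-review/references/codex-prompt-fast.md",
)


def run_review(config: ReviewConfig) -> str:
    """One shared workflow; variants differ only in the config they pass in."""
    checks = ", ".join(config.pre_checks) or "none"
    return f"scope={config.scope}; pre-checks={checks}; template={config.prompt_template}"
```

Any change to the shared workflow then reaches every variant, while each variant keeps its own entry point.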

Uniqueness

sd0x-dev-flow — Uniqueness and Positioning

Differs from Seeds

Most similar to superpowers (skills-only behavioral framework, Claude Code primary, proactive activation via skill descriptions), but sd0x-dev-flow is architecturally more aggressive in 3 ways: (1) 8 hooks across 5 event types enforce behavior that superpowers achieves through Iron Law prompts alone; (2) the dual-reviewer parallel dispatch (Codex MCP + secondary) is not present in any seed; (3) the sentinel-driven state machine (✅ Ready/⛔ Blocked stdout markers parsed by hooks) is a novel harness engineering primitive. Compared to spec-kit (Archetype 2: 18 hooks, mirror commands), sd0x-dev-flow has more skills (96 vs 9) and a more complex hook topology. Compared to claude-flow (MCP toolserver, hive-mind), sd0x-dev-flow orchestrates through hooks rather than an MCP server. The explicit "harness engineering" framing (citing Martin Fowler, Mitchell Hashimoto, and arXiv papers) positions it as a research artifact, not just a workflow tool.

Positioning

sd0x-dev-flow is the most comprehensively engineered harness implementation in this batch. It covers all 10 canonical harness sub-problems identified in the README, making it a useful reference implementation for teams building custom harnesses. The self-declared positioning: "Most harness projects cover 2–4 of these. sd0x-dev-flow covers all 10 — which makes the code useful as a study target, not just a tool."

The skills-only fallback for other tools (Codex/Cursor/Windsurf via npx skills add) extends its reach beyond Claude Code while reserving the full harness features for Claude Code users.

Observable Failure Modes

Single contributor: the entire implementation comes from one person (1 contributor despite 157 stars). High bus-factor risk.
Codex MCP dependency: Dual-review falls back to single-reviewer without Codex MCP. Some skills (/codex-*) are non-functional without it.
Strict mode lock: STOP_GUARD_MODE=strict + incomplete review state can block the agent indefinitely. HOOK_BYPASS=1 is the emergency escape.
Double-fire risk: Plugin hooks and local hooks can both fire for the same event. The deference logic (checks for local hook in settings.json before executing) reduces this but adds complexity.
96-skill cognitive load: The large skill surface (~4% of Claude's context window per README) means each session has significant context overhead.

Explicit Antipatterns

Single reviewer (superpowers-style) for code review — dual review is required
Stopping without completing the review gate (violation of Auto-Loop Rule)
Declaring intent without executing ("Declaring ≠ Executing")
Outputting a summary table without running the next command ("Summary ≠ Completion")

Workflow

sd0x-dev-flow — Workflow

Auto-Loop (Core Mechanism)

Phase	What happens	Artifact
1. Code edit	Agent edits file via Edit/Write	changed files
2. PostToolUse hook fires	`post-edit-format.sh` formats; `post-skill-auto-loop.sh` triggers	—
3. Review dispatch	`/codex-review-fast` invoked in same reply	—
4. Dual review	Codex MCP (primary, sandbox, full diff) + secondary agent (parallel)	two review outputs
5. Deduplication	Findings merged: file+issue key, ±5 line tolerance; source-attributed (codex/toolkit/both)	deduplicated findings
6. Gate emission	`emit-review-gate.sh` writes `✅ Ready` or `⛔ Blocked` to stdout	sentinel marker
7. Hook parses gate	`post-tool-review-state.sh` reads stdout sentinel	gate state
8. Stop guard	`stop-guard.sh` blocks Stop event if gate is `⛔ Blocked`	blocked or allowed
9. If blocked: fix	Agent addresses findings, re-runs review	—
10. If ready: `/precommit`	Auto-runs precommit gate	commit or block

Context Compaction Recovery

When Claude Code compacts context:

SessionStart(compact) event fires
post-compact-auto-loop.sh injects [AUTO_LOOP_RESUME] to stdout
Agent sees the resume signal and continues the review loop
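A sketch of the SessionStart(compact) step, assuming the hook receives the event as JSON on stdin with a `source` field (Claude Code documents values such as `startup` and `compact` for SessionStart) and that text the hook prints is added to the agent's context. How the plugin knows a review is still pending is not described here, so `review_pending` is a hypothetical input, and the wording after the marker is invented.

```python
import json
from typing import TextIO

RESUME_MARKER = "[AUTO_LOOP_RESUME]"


def resume_text(event: dict, review_pending: bool) -> str:
    """Context text to inject for a SessionStart event ("" means inject nothing)."""
    if event.get("source") == "compact" and review_pending:
        return f"{RESUME_MARKER} Review gate incomplete: re-run the pending review before stopping."
    return ""


def run_hook(stdin: TextIO, stdout: TextIO, review_pending: bool) -> None:
    # Hook entry point: parse the event JSON from stdin, print anything to inject on stdout.
    text = resume_text(json.load(stdin), review_pending)
    if text:
        stdout.write(text + "\n")
```

Taking the streams as parameters keeps the logic testable; a real hook would pass `sys.stdin` and `sys.stdout`.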

Workflow Tracks

Track	Entry → Gate → Exit
Feature	`/feature-dev` → `/verify` → `/codex-review-fast` → `/precommit`
Bug Fix	`/issue-analyze` → `/bug-fix` → `/verify` → `/codex-review-fast` → `/precommit`
Planning	`/codex-brainstorm` → `/feasibility-study` → `/tech-spec`
Docs	`.md` edit → `/codex-review-doc`
Shipping	`/precommit` → `/push-ci` → `/create-pr`

Approval Gates

✅ Ready / ⛔ Blocked: sentinel-driven gate on each review cycle
Stop guard: hooks block the Stop event in strict mode if gate incomplete
Pre-push gate: /dev/tty confirmation for destructive operations
Commit-msg guard: validates conventional commit format before allowing commit
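The commit-msg guard's check can be sketched as a Conventional Commits header match. The allowed type list and scope character set below are assumptions, since the source only says the guard validates conventional commit format; lines starting with `#` are ignored defensively, as git's message template uses them for comments.

```python
import re

# Assumed type list; the source only says "conventional commit format".
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf", "test", "build", "ci", "chore", "revert")
# type, optional (scope), optional "!" for breaking changes, then ": " and a non-empty subject
HEADER = re.compile(r"^(?:" + "|".join(TYPES) + r")(?:\([\w./-]+\))?!?: \S.*$")


def check_commit_msg(message: str) -> bool:
    """True if the first non-comment, non-blank line is a Conventional Commits header."""
    lines = [ln for ln in message.splitlines() if ln.strip() and not ln.startswith("#")]
    return bool(lines) and HEADER.match(lines[0]) is not None
```

A commit-msg hook would read the message file git passes as its first argument and exit non-zero when this returns False.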

Phase-to-Artifact Map

Phase	Artifact
`/tech-spec`	`1-requirements.md`, `3-architecture.md`
`/codex-review-fast`	deduplicated findings list + gate marker
`/precommit`	passed or blocked commit attempt
`/push-ci`	CI run result
`/create-pr`	PR URL

Memory Context

sd0x-dev-flow — Memory and Context

Sentinel State Machine

The primary state mechanism is stdout-based: scripts write ✅ Ready, ⛔ Blocked, or ✅ All Pass sentinel markers to stdout, which hooks read to determine gate state. This is in-session ephemeral state — not persisted to disk.
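Because the gate lives in stdout, reading it is a text scan. A minimal sketch, assuming the most recent marker wins and that each line carries at most one marker; the returned state names (`ready`, `blocked`, `all_pass`) are hypothetical labels, and `parse_gate` is an illustrative name.

```python
from typing import Optional

# Sentinel strings from the README; the state names on the right are assumed labels.
SENTINELS = {
    "⛔ Blocked": "blocked",
    "✅ All Pass": "all_pass",
    "✅ Ready": "ready",
}


def parse_gate(stdout: str) -> Optional[str]:
    """Gate state named by the last sentinel marker in a tool's stdout, or None."""
    state = None
    for line in stdout.splitlines():
        for marker, name in SENTINELS.items():
            if marker in line:
                state = name  # later markers overwrite earlier ones
                break
    return state
```

Letting the last marker win means a transcript that shows ⛔ Blocked followed by a clean re-run ending in ✅ Ready resolves to ready.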

Context Compaction Handling

Explicitly addressed: post-compact-auto-loop.sh fires on SessionStart(compact) event and injects [AUTO_LOOP_RESUME] to restart the review loop. This is the key mechanism for surviving Claude Code's context compaction without losing the review gate state.

Iteration History

iteration_history.current_round and max_rounds are tracked within the session to detect convergence plateaus and trigger strategic reset if the loop runs too many rounds without improvement.
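A sketch of the round counter and plateau check, assuming each round reports how many findings remain open. `max_rounds=5` and `patience=2` are hypothetical defaults; the source names `current_round` and `max_rounds` but not their values or the exact plateau test, and `should_reset` is an illustrative name.

```python
def should_reset(findings_per_round: list[int], max_rounds: int = 5, patience: int = 2) -> bool:
    """Decide whether the review loop should stop iterating and reset strategy.

    findings_per_round[i] is the open-finding count after round i + 1,
    so current_round == len(findings_per_round).
    """
    current_round = len(findings_per_round)
    if current_round >= max_rounds:
        return True  # hard cap reached
    if current_round <= patience:
        return False  # too early to judge convergence
    # Plateau: the last `patience` rounds did not beat the best count seen before them.
    best_before = min(findings_per_round[:-patience])
    return min(findings_per_round[-patience:]) >= best_before
```

For example, counts of 5, 3, 1 keep the loop going, while 4, 4, 4 trigger a reset because two rounds brought no improvement.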

CLAUDE.md as Persistent Context

The CLAUDE.md file and referenced @rules/auto-loop.md are loaded at session start and serve as the persistent behavioral specification. They are the "memory" that survives context compaction because they are re-read on every session initialization.

Self-improvement Loop

Corrections are recorded as lessons. After 3+ recurrences, a lesson is promoted to a project rule file in rules/. This is the only cross-session learning mechanism — rules persist in the filesystem.

State Files

.claude/settings.json — hook configuration
rules/ directory — accumulated project rules
hooks/hooks.json — hook routing configuration
Implicit: git branch state tracks the current work

Orchestration

sd0x-dev-flow — Orchestration

Multi-agent

Yes. The dual-review pattern dispatches Codex MCP (primary) and a secondary agent (pr-review-toolkit / strict-reviewer) in parallel on every review cycle.

The 15 agent persona files are spawned as subagents via Claude Code's Task tool for specialized tasks (architecture design, code simplification, feasibility analysis, etc.).

Orchestration Pattern

Hierarchical: primary Claude Code instance coordinates, dispatches subagents for review and specialized tasks.

Dual-Review Sequence

Agent edits → emit-review-gate PENDING → 
  Codex review (sandbox, full diff) [parallel]
  + Task(code-reviewer) secondary [parallel]
→ Aggregate + dedup + gate →
  emit-review-gate READY/BLOCKED

Findings are severity-normalized (P0–Nit), deduplicated (file + issue key, ±5 line tolerance), and source-attributed (codex | toolkit | both).
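A sketch of the merge step, assuming findings arrive with a normalized issue key and that a duplicate keeps the more severe of the two ratings (the source does not say which severity wins). The `Finding` shape and the exact levels between P0 and Nit are assumptions; the file + issue key, the ±5 line tolerance, and the codex | toolkit | both attribution come from the text.

```python
from dataclasses import dataclass, replace


@dataclass
class Finding:
    file: str
    line: int
    issue: str  # normalized issue key (assumed to be comparable across reviewers)
    severity: str  # one of SEVERITY_ORDER (assumed scale)
    source: str  # "codex", "toolkit", or "both" after merging


SEVERITY_ORDER = ["P0", "P1", "P2", "P3", "Nit"]  # hypothetical P0-to-Nit scale
LINE_TOLERANCE = 5


def dedupe(codex: list[Finding], toolkit: list[Finding]) -> list[Finding]:
    """Merge two reviewers' findings on file + issue key within +/-5 lines."""
    merged: list[Finding] = []
    for f in codex + toolkit:
        match = next(
            (m for m in merged
             if m.file == f.file and m.issue == f.issue
             and abs(m.line - f.line) <= LINE_TOLERANCE),
            None,
        )
        if match is None:
            merged.append(replace(f))  # copy so the inputs are never mutated
            continue
        if match.source != f.source:
            match.source = "both"
        if SEVERITY_ORDER.index(f.severity) < SEVERITY_ORDER.index(match.severity):
            match.severity = f.severity  # keep the more severe rating
    return merged
```

Note that the tolerance is measured against the first kept finding, so a chain of findings 4 lines apart is not merged transitively.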

Fallback Modes

Codex MCP	Secondary	Mode
Available	Available	Full dual-review
Unavailable	Available	Single-reviewer (secondary)
Available	Unavailable	Strict-reviewer fallback
Unavailable	Unavailable	Single-reviewer minimum
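The parallel dispatch behind this fallback table can be sketched with a thread pool: every available reviewer gets the same diff, and a reviewer that is missing or crashes is dropped, so the returned keys tell the caller which row of the table applies. Reviewers are modeled as plain callables (hypothetical), and the table's substitution of a strict-reviewer fallback is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# A reviewer takes a diff and returns raw finding strings (hypothetical shape).
Reviewer = Callable[[str], list[str]]


def dispatch_reviews(diff: str, reviewers: dict[str, Optional[Reviewer]]) -> dict[str, list[str]]:
    """Run every available reviewer on the same diff in parallel.

    A reviewer that is None (not installed) or raises is dropped, so the caller
    can tell from the returned keys whether it got a dual or single review.
    """
    available = {name: fn for name, fn in reviewers.items() if fn is not None}
    results: dict[str, list[str]] = {}
    if not available:
        return results
    with ThreadPoolExecutor(max_workers=len(available)) as pool:
        futures = {name: pool.submit(fn, diff) for name, fn in available.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                continue  # treat a crashed reviewer like an unavailable one
    return results
```

Two keys in the result correspond to the full dual-review row; one key means the loop is running in a single-reviewer mode.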

Execution Mode

Continuous (ralph-style loop): each edit triggers the review loop automatically, and the agent does not stop between iterations until the gate clears.

Isolation Mechanism

None in the standard configuration. The Codex MCP runs in a sandbox environment (Codex's own execution sandbox, not git worktree isolation).

Context Recovery

[AUTO_LOOP_RESUME] sentinel injected by post-compact-auto-loop.sh on context compaction — the loop survives compaction events without losing gate state.

Multi-model

Yes. Agent frontmatter specifies model preferences (e.g., code-simplifier.md specifies model: opus). Codex MCP uses its own model; secondary reviewer uses Claude Code's configured model.

Ui Cli Surface

sd0x-dev-flow — UI and CLI Surface

Dedicated CLI Binary

No standalone binary. Skills are invoked as Claude Code slash commands; npx skills add only installs the skills-only subset for other agents.

Local UI

None. The tool operates entirely within Claude Code's terminal interface.

IDE Integration

Claude Code plugin (primary). Skills-only via npx skills add for Cursor, Windsurf, Codex, Aider.

Observability

Sentinel stdout markers (✅ Ready, ⛔ Blocked) visible in terminal output
stop-guard.sh logs blocking events
post-tool-review-state.sh tracks gate state transitions
iteration_history.current_round counter visible in skill output
Hook debug output via HOOK_DEBUG=1 environment variable

Modes

Mode	Behavior
Default (warn)	Log missing steps but allow Stop
Strict (block)	Block Stop until all gates complete; set `STOP_GUARD_MODE=strict`

Bypass Mechanisms

HOOK_BYPASS=1 — emergency escape hatch (skip all checks)
Plugin-defers-to-local: when running as plugin, checks for identical local hook and defers to it to avoid double-firing
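The Stop-guard logic across both modes and the bypass can be sketched as a pure decision function. The returned dict follows the `decision`/`reason` JSON that Claude Code documents for blocking a Stop hook; the gate-state names (`pending`, `blocked`) are assumptions, and the real `stop-guard.sh` is a shell script, not this function.

```python
import os
from typing import Mapping, Optional

BLOCKING_STATES = {"pending", "blocked"}  # assumed names for an incomplete review gate


def stop_decision(gate_state: Optional[str], env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return None to allow the Stop event, or a block decision dict.

    gate_state is None when nothing was edited this session.
    """
    if env.get("HOOK_BYPASS") == "1":
        return None  # emergency escape hatch: skip all checks
    if gate_state not in BLOCKING_STATES:
        return None  # gate cleared (ready/all_pass) or nothing to review
    if env.get("STOP_GUARD_MODE", "warn") != "strict":
        return None  # default warn mode: the real hook logs missing steps, then allows Stop
    return {
        "decision": "block",
        "reason": "Review gate incomplete: finish the review loop before stopping.",
    }
```

Checking `HOOK_BYPASS` first is what makes it a true escape hatch: it wins even in strict mode with a blocked gate, which is the lock-up scenario described under Observable Failure Modes.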

npm Distribution

package.json in repo root enables npx skills add sd0xdev/sd0x-dev-flow to install skills-only subset for non-Claude-Code agents.

Related frameworks

same archetype · same primary tool · same memory type

claude-mem (thedotmack) ★ 78k

A8 Cross-runtime harness

Background worker service captures every tool call as an observation, AI-compresses sessions, and auto-injects relevant past…

pi (badlogic/earendil) ★ 55k

A8 Cross-runtime harness

A minimal, hackable, multi-provider terminal coding agent that adapts to your workflows via npm-installable TypeScript Extensions…

Agent Skills (Addy Osmani) ★ 46k

A8 Cross-runtime harness

Encodes senior-engineer software development lifecycle as 23 auto-routed skills and 7 slash commands for any AI coding agent.

wshobson/agents Plugin Marketplace ★ 36k

A8 Cross-runtime harness

Single Markdown source for 83 domain-specialized plugins that auto-generates idiomatic artifacts for five AI coding harnesses.

TabbyML/Tabby ★ 34k

A8 Cross-runtime harness

Self-hosted AI coding assistant server (alternative to GitHub Copilot) with admin dashboard, RAG-based completions, and multi-IDE…

Compound Engineering ★ 17k

A8 Cross-runtime harness

Make each unit of engineering work compound into easier future work via brainstorm→plan→execute→review→learn cycles.

Distribution

Type: claude-plugin
License: MIT
Install: npm-install
Version: 3.0.12

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: No
Tech stack: none

Components

Commands: 96
Skills: 96
Subagents: 15
Hooks: 8
MCP servers: 0
MCP tools: 0
Scripts: 18
Templates: 1

Workflow

Phases: 4
Approval gates: 4
Spec format: markdown
Spec storage: per-feature-folder
Delta or full: whole-file

Orchestration

Multi-agent: Yes
Pattern: hierarchical
Max concurrent: 2
Isolation: none
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: No
Modal: text

Execution

Mode: continuous-ralph
Crash recovery: Yes
Compaction: Yes
Session handoff: Yes
Streaming: No

Memory

Type: file-based
Persistence: project
Search: none
State files: 5 files

Quality

TDD: Yes
TDD mechanism: prompt-iron-law
Validators: 5
Self-review: adversarial-subagent

Git / Observability

Auto commit: Yes
Auto PR: Yes
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: No

Tools

Primary: claude-code
Targets: 5
Portability: medium

Signals

Stars: 157
Last commit: 2026-05-14
Contributors: 1
Maintainer: active
Quality score: 6.4/10