sd0x-dev-flow

sd0x-dev-flow · sd0xdev/sd0x-dev-flow · ★ 157 · last commit 2026-05-14

Reference implementation of harness engineering for Claude Code — hook-enforced dual review, sentinel-driven state machine, and fail-closed safety where the AI cannot skip quality gates.

Best when: Quality gates that AI can't skip require hooks + sentinels + state machines, not just Iron Law prompts; behavioral enforcement must be mechanical, not rheto…
Skip if: Single reviewer (dual review is required), Stopping without completing the review gate
vs seeds:
superpowers (skills-only, Claude Code primary, proactive activation) but sd0x-dev-flow uses 8 hooks across 5 event types to mechanic…
Primitive shape 215 total
Commands 96 Skills 96 Subagents 15 Hooks 8
00

Summary

sd0x-dev-flow — Summary

sd0x-dev-flow is a Claude Code plugin that implements "harness engineering" for AI coding agents — it wraps the LLM with a formal state-machine loop, 8 lifecycle hooks, 96 skills, 15 agents, and 14 rules to enforce quality gates that the AI cannot skip. The core innovation is a sentinel-driven auto-loop: after any code edit, the review command fires in the same reply (hook-enforced), dispatching dual-review (Codex MCP primary + secondary agent in parallel), deduplicating findings, and emitting a ✅ Ready / ⛔ Blocked gate marker that hooks parse to prevent premature Stop events. Context-compaction survival is handled by a SessionStart(compact) hook that re-injects [AUTO_LOOP_RESUME] to restart the review loop. Skills cover 96 workflows from planning (/codex-brainstorm, /tech-spec) through gate-enforced review (/codex-review-fast) to shipping (/push-ci, /create-pr). 15 persona-based agents (architect, simplifier, reviewer, etc.) are spawned as subagents. Version 3.0.12, MIT license.

Differs from seeds: Most similar to superpowers (skills-only, Claude Code primary, behavioral enforcement) but sd0x-dev-flow is architecturally more aggressive — it uses 8 hooks across 5 event types to enforce behavior that superpowers achieves through Iron Law prompts alone. The dual-reviewer pattern (Codex + secondary in parallel) is not present in any seed. The sentinel-driven state machine (✅ Ready/⛔ Blocked stdout markers parsed by hooks) is a novel harness engineering primitive not present in any seed.

01

Overview

sd0x-dev-flow — Origin, Philosophy, and Manifesto

Origin

Built by sd0xdev (single contributor). JavaScript/shell, MIT license. Version 3.0.12. 157 stars, 1 contributor. Available on npm and as a Claude Code plugin (sd0xdev/sd0x-dev-flow). Multilingual README (EN/zh-TW/zh-CN/ja/ko/es).

Philosophy: Harness Engineering

The README opens with a definition from Martin Fowler and Mitchell Hashimoto (Feb 2026):

"Harness engineering is the discipline of engineering everything around the LLM — tool loops, context management, hooks, state machines, safety layers — as opposed to training the model itself."

sd0x-dev-flow positions itself as a reference implementation of harness engineering for Claude Code, mapping 10 canonical harness sub-problems to concrete code:

| Sub-problem | Implementation |
|-------------|----------------|
| Tool loop control | Auto-loop with sentinel-driven transitions |
| Sentinel state machine | ✅ Ready/⛔ Blocked/✅ All Pass gate markers |
| Context recovery | [AUTO_LOOP_RESUME] stdout injection after compact |
| Lifecycle interceptors | 5 hook event types → 8 scripts |
| Capability-based tool gating | allowed-tools frontmatter in 86/95 skills |
| Defense-in-depth safety | 5 layers: pre-edit-guard → commit-msg-guard → pre-push-gate → stop-guard → sidecar |
| Generator-evaluator split | Dual review: Codex (primary) + Claude (secondary) parallel dispatch |
| Incremental progress tracking | iteration_history.current_round + max_rounds + convergence plateau detection |
| Human-in-loop safety gates | /dev/tty confirmation for destructive ops |
| Self-improvement loop | Correction → record lesson → promote to rule after 3+ recurrences |

Key Manifesto Quotes

From CLAUDE.md:

"After editing code or docs, you MUST run the review command in the same reply — do not stop, do not ask, do not just summarize."

"Declaring ≠ Executing: Saying 'should run review' without invoking the Skill tool is a violation."

"Summary ≠ Completion: Outputting a table then stopping is a violation."

Self-improvement Philosophy

Every correction becomes a lesson. After 3+ recurrences of the same issue, the lesson is promoted to a project rule. This creates an evolving ruleset grounded in actual agent failures.

02

Architecture

sd0x-dev-flow — Architecture, Distribution, and Installation

Distribution

  • Claude Code plugin: /plugin marketplace add sd0xdev/sd0x-dev-flow + /plugin install sd0x-dev-flow@sd0xdev-marketplace
  • npm/npx: npx skills add sd0xdev/sd0x-dev-flow (skills-only subset for Codex/Cursor/Windsurf/Aider)

Version analyzed: 3.0.12

Installation

# Full (Claude Code with hooks, rules, auto-loop)
/plugin marketplace add sd0xdev/sd0x-dev-flow
/plugin install sd0x-dev-flow@sd0xdev-marketplace

# Skills only (Codex CLI, Cursor, Windsurf, Aider)
npx skills add sd0xdev/sd0x-dev-flow

# Project setup (auto-detect framework, install rules/hooks subset)
/project-setup

Directory Tree

sd0x-dev-flow/
├── .claude-plugin/
│   ├── plugin.json             # plugin manifest (v3.0.12)
│   └── marketplace.json
├── .claude/                    # project-level hooks/settings
├── skills/                     # 96 skill directories, each with SKILL.md
│   ├── codex-review-fast/
│   ├── verify/
│   ├── feature-dev/
│   └── ... (96 total)
├── agents/                     # 15 agent persona markdown files
│   ├── architecture-designer.md
│   ├── code-simplifier.md
│   ├── codex-architect.md
│   └── ... (15 total)
├── hooks/
│   ├── hooks.json              # hook event routing
│   ├── post-compact-auto-loop.sh
│   ├── post-edit-format.sh
│   ├── post-skill-auto-loop.sh
│   ├── post-tool-review-state.sh
│   ├── pre-edit-guard.sh
│   ├── session-init.sh
│   ├── stop-check.md
│   ├── stop-guard.sh
│   └── user-prompt-review-guard.sh
├── scripts/                    # 18 utility scripts
│   ├── emit-review-gate.sh
│   ├── pre-push-gate.sh
│   ├── commit-msg-guard.sh
│   ├── verify-runner.js
│   ├── precommit-runner.js
│   └── ...
├── rules/                      # 14 rule markdown files
│   ├── auto-loop.md
│   ├── self-improvement.md
│   ├── codex-invocation.md
│   └── ...
├── CLAUDE.md                   # harness behavioral requirements
├── CLAUDE.template.md
└── package.json

Required Runtime

  • Claude Code 2.1+
  • Codex MCP (optional — /codex-* skills require it; falls back to single-reviewer mode)

Target AI Tools

  • Claude Code (primary — full plugin with hooks, rules, auto-loop)
  • Codex CLI, Cursor, Windsurf, Aider (skills-only via npx skills add)
03

Components

sd0x-dev-flow — Components

Skills (96)

Categories and examples:

Planning: /codex-brainstorm, /feasibility-study, /tech-spec, /review-spec, /architecture, /deep-analyze, /deep-research, /fp-brief, /project-brief

Development: /feature-dev, /bug-fix, /codex-implement, /codex-architect, /codex-explain, /refactor, /simplify, /de-ai-flavor

Review/Gate: /codex-review-fast (diff-only, parallel Codex + secondary), /codex-review (full), /codex-code-review, /codex-test-review, /codex-review-branch, /codex-review-doc, /codex-security, /precommit, /precommit-fast, /verify (lint→typecheck→unit→integration→e2e)

Shipping: /smart-commit, /push-ci, /create-pr, /pr-review, /load-pr-review

Operations: /project-setup, /repo-intake, /project-audit, /next-step, /best-practices, /risk-assess, /dep-audit

Analysis: /issue-analyze, /git-investigate, /code-explore, /code-investigate, /deep-explore

Agents (15 persona files)

  • architecture-designer.md — system architecture design
  • brief-writer.md — executive summaries
  • code-simplifier.md — cleanup refactoring (model: opus)
  • codex-architect.md — Codex-assisted architecture
  • codex-implementer.md — Codex-assisted implementation
  • coverage-analyst.md — test coverage analysis
  • doc-refactor.md — documentation refactoring
  • feasibility-analyst.md — feasibility analysis
  • git-investigator.md — git history investigation
  • performance-optimizer.md — performance optimization
  • refactor-reviewer.md — refactoring review
  • solution-architect.md — solution design
  • strict-reviewer.md — strict code review
  • tech-spec-reviewer.md — tech spec review
  • verify-app.md — application verification

Hooks (8 scripts, 5 event types)

| Event | Matcher | Script | Purpose |
|-------|---------|--------|---------|
| SessionStart | startup | session-init.sh | Session initialization |
| SessionStart | startup\|compact | namespace-hint.sh | Plugin namespace hint |
| SessionStart | compact | post-compact-auto-loop.sh | Resume auto-loop after compact |
| PreToolUse | Edit\|Write | pre-edit-guard.sh | Guard before file edits |
| PostToolUse | Edit\|Write | post-edit-format.sh | Format after edits |
| PostToolUse | Bash\|mcp__codex__*\|Skill | post-tool-review-state.sh | Parse sentinel gate markers |
| PostToolUse | Skill | post-skill-auto-loop.sh | Trigger auto-loop after skill |
| Stop | (any) | stop-guard.sh | Block stop if gates incomplete |
| UserPromptSubmit | (any) | user-prompt-review-guard.sh | Guard on new user prompts |
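
The routing in the hook table can be illustrated with a small lookup. This is a minimal sketch, not the plugin's hooks.json: it assumes that `|` separates alternatives and that each alternative is matched glob-style, which may differ from Claude Code's real matcher semantics. The tool name `mcp__codex__review` used in the usage note is hypothetical.

```python
from fnmatch import fnmatchcase

# Routing table transcribed from the hook table above: (event, matcher, script).
# "*" stands in for the "(any)" rows.
HOOK_ROUTES = [
    ("SessionStart", "startup", "session-init.sh"),
    ("SessionStart", "startup|compact", "namespace-hint.sh"),
    ("SessionStart", "compact", "post-compact-auto-loop.sh"),
    ("PreToolUse", "Edit|Write", "pre-edit-guard.sh"),
    ("PostToolUse", "Edit|Write", "post-edit-format.sh"),
    ("PostToolUse", "Bash|mcp__codex__*|Skill", "post-tool-review-state.sh"),
    ("PostToolUse", "Skill", "post-skill-auto-loop.sh"),
    ("Stop", "*", "stop-guard.sh"),
    ("UserPromptSubmit", "*", "user-prompt-review-guard.sh"),
]


def matches(matcher: str, subject: str) -> bool:
    """Assumed semantics: '|' separates alternatives, each one a glob pattern."""
    return any(fnmatchcase(subject, alt) for alt in matcher.split("|"))


def scripts_for(event: str, subject: str = "") -> list[str]:
    """Scripts that would run for an event plus a tool name or session source."""
    return [s for e, m, s in HOOK_ROUTES if e == event and matches(m, subject)]


print(scripts_for("PostToolUse", "Edit"))
print(scripts_for("SessionStart", "compact"))
```

Under these assumptions a `Skill` call reaches both `post-tool-review-state.sh` and `post-skill-auto-loop.sh`, while a call such as `mcp__codex__review` reaches only the state parser.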

Scripts (18)

emit-review-gate.sh, pre-push-gate.sh, commit-msg-guard.sh, verify-runner.js, precommit-runner.js, detect-scope.js, resolve-feature.sh, security-redact.js, namespace-hint.sh, dep-audit.sh, migration-audit.sh, build-codex-artifacts.js, classify-docs-cli.js, generate-readme-catalog.js, resolve-feature-cli.js, run-skill.sh, skills/ (nested), lib/

Rules (14)

auto-loop.md, self-improvement.md, codex-invocation.md, and 10+ others covering coding standards, commit conventions, test requirements, etc.

05

Prompts

sd0x-dev-flow — Prompt Files and Techniques

Prompt 1: CLAUDE.md — Auto-Loop Iron Law (Mandatory execution pattern)

### Auto-Loop Rule ⚠️

After editing code or docs, you **MUST** run the review command **in the same reply** — do not stop, do not ask, do not just summarize.

| After editing... | Immediately run | Then on pass |
|------------------|----------------|--------------|
| code files | `/codex-review-fast` | `/precommit` |
| `.md` docs | `/codex-review-doc` | (done) |
| Review found issues | Fix all → re-run same review | — |

**Declaring ≠ Executing**: Saying "should run review" without invoking the Skill tool is a violation.
**Summary ≠ Completion**: Outputting a table then stopping is a violation.

Full spec: @rules/auto-loop.md

Technique: Iron Law with explicit violation taxonomy. Two named violation types ("Declaring ≠ Executing", "Summary ≠ Completion") address the specific failure modes where agents acknowledge requirements verbally but don't act on them. The ⚠️ emoji serves as a visual attention anchor, and the decision table maps each edit type to its required command.

Prompt 2: verify SKILL.md — Ecosystem-Adaptive Verification (Context detection pattern)

**Ecosystem detection** (check project root for manifest files):

| Manifest | Ecosystem | Lint | Typecheck | Test |
|----------|-----------|------|-----------|------|
| `package.json` | Node.js | `{pm} lint` | `{pm} typecheck` | `{pm} test:unit` |
| `pyproject.toml` | Python | `ruff check .` | `mypy .` | `pytest` |
| `Cargo.toml` | Rust | `cargo clippy` | _(implicit)_ | `cargo test` |
| `go.mod` | Go | `golangci-lint run` | `go vet ./...` | `go test ./...` |
| `build.gradle` | Java | `./gradlew spotlessCheck` | _(implicit)_ | `./gradlew test` |

Technique: Detection-then-dispatch pattern. The skill inspects the repo for manifest files before running any commands, then adapts the verification stack to the detected ecosystem. This avoids hardcoding tool names and enables cross-language projects to use the same /verify skill.
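
The detection-then-dispatch idea can be sketched in a few lines. This is an illustration only, not the skill's code (the skill is a prompt, not a program): the `detect` function is hypothetical, the command strings are copied from the table, and `{pm}` is left as the table's unresolved package-manager placeholder.

```python
from pathlib import Path
import tempfile

# Manifest -> (ecosystem, lint, typecheck, test), copied from the table above.
# None marks the "(implicit)" typecheck cells.
ECOSYSTEMS = {
    "package.json": ("Node.js", "{pm} lint", "{pm} typecheck", "{pm} test:unit"),
    "pyproject.toml": ("Python", "ruff check .", "mypy .", "pytest"),
    "Cargo.toml": ("Rust", "cargo clippy", None, "cargo test"),
    "go.mod": ("Go", "golangci-lint run", "go vet ./...", "go test ./..."),
    "build.gradle": ("Java", "./gradlew spotlessCheck", None, "./gradlew test"),
}


def detect(root: Path) -> list[tuple[str, list[str]]]:
    """Return (ecosystem, commands) for every manifest found in the project root."""
    plans = []
    for manifest, (name, lint, typecheck, test) in ECOSYSTEMS.items():
        if (root / manifest).is_file():
            plans.append((name, [c for c in (lint, typecheck, test) if c]))
    return plans


# Demo on a throwaway directory that looks like a mixed Rust + Go project.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Cargo.toml").touch()
    (root / "go.mod").touch()
    plans = detect(root)
    for name, commands in plans:
        print(name, commands)
```

Returning one plan per detected manifest is what would let a cross-language repository run a single verification entry point.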

Prompt 3: codex-review-fast SKILL.md — Variant Delegation (Thin router pattern)

# Codex Review Fast

Thin entry-point skill — routes to the parent skill for full workflow.

## Parent Skill

This is the **Fast** variant of `codex-code-review`. Full workflow, prompt templates, and review logic are defined in the parent skill.

See `@skills/codex-code-review/SKILL.md`

## Variant

| Property | Value |
|----------|-------|
| Scope | Diff only |
| Pre-checks | None (no lint/build) |
| Prompt template | `@skills/codex-code-review/references/codex-prompt-fast.md` |

Technique: Thin variant pattern. The fast skill delegates entirely to its parent, only specifying the variant parameters (scope: diff-only, no pre-checks). This reduces duplication while maintaining a separate skill entry-point with its own trigger description for the agent harness.

09

Uniqueness

sd0x-dev-flow — Uniqueness and Positioning

Differs from Seeds

Most similar to superpowers (skills-only behavioral framework, Claude Code primary, proactive activation via skill descriptions), but sd0x-dev-flow is architecturally more aggressive in three ways: (1) 8 hooks across 5 event types enforce behavior that superpowers achieves through Iron Law prompts alone; (2) the dual-reviewer parallel dispatch (Codex MCP + secondary) is not present in any seed; (3) the sentinel-driven state machine (✅ Ready/⛔ Blocked stdout markers parsed by hooks) is a novel harness engineering primitive. Compared to spec-kit (Archetype 2: 18 hooks, mirror commands), sd0x-dev-flow has more skills (96 vs 9) and a more complex hook topology. Compared to claude-flow (MCP toolserver, hive-mind), sd0x-dev-flow uses hooks rather than an MCP server for orchestration. The explicit "harness engineering" framing (citing Martin Fowler, Mitchell Hashimoto, arXiv papers) positions this as a research artifact, not just a workflow tool.

Positioning

sd0x-dev-flow is the most comprehensively engineered harness implementation in this batch. It covers all 10 canonical harness sub-problems identified in the README, making it a useful reference implementation for teams building custom harnesses. The self-declared positioning: "Most harness projects cover 2–4 of these. sd0x-dev-flow covers all 10 — which makes the code useful as a study target, not just a tool."

The cross-tool fallback (Codex/Cursor/Windsurf via npx skills add) for skills-only subset expands reach beyond Claude Code while keeping the full harness features for Claude Code users.

Observable Failure Modes

  1. Single contributor: All 157 stars and the entire implementation from one person. High bus-factor risk.
  2. Codex MCP dependency: Dual-review falls back to single-reviewer without Codex MCP. Some skills (/codex-*) are non-functional without it.
  3. Strict mode lock: STOP_GUARD_MODE=strict + incomplete review state can block the agent indefinitely. HOOK_BYPASS=1 is the emergency escape.
  4. Double-fire risk: Plugin hooks and local hooks can both fire for the same event. The deference logic (checks for local hook in settings.json before executing) reduces this but adds complexity.
  5. 96-skill cognitive load: The large skill surface (~4% of Claude's context window per README) means each session has significant context overhead.

Explicit Antipatterns

  • Single reviewer (superpowers-style) for code review — dual review is required
  • Stopping without completing the review gate (violation of Auto-Loop Rule)
  • Declaring intent without executing ("Declaring ≠ Executing")
  • Outputting a summary table without running the next command ("Summary ≠ Completion")
04

Workflow

sd0x-dev-flow — Workflow

Auto-Loop (Core Mechanism)

| Phase | What happens | Artifact |
|-------|--------------|----------|
| 1. Code edit | Agent edits file via Edit/Write | changed files |
| 2. PostToolUse hook fires | post-edit-format.sh formats; post-skill-auto-loop.sh triggers | |
| 3. Review dispatch | /codex-review-fast invoked in same reply | |
| 4. Dual review | Codex MCP (primary, sandbox, full diff) + secondary agent (parallel) | two review outputs |
| 5. Deduplication | Findings merged: file+issue key, ±5 line tolerance; source-attributed (codex/toolkit/both) | deduplicated findings |
| 6. Gate emission | emit-review-gate.sh writes ✅ Ready or ⛔ Blocked to stdout | sentinel marker |
| 7. Hook parses gate | post-tool-review-state.sh reads stdout sentinel | gate state |
| 8. Stop guard | stop-guard.sh blocks Stop event if gate is ⛔ Blocked | blocked or allowed |
| 9. If blocked: fix | Agent addresses findings, re-runs review | |
| 10. If ready: /precommit | Auto-runs precommit gate | commit or block |
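
Steps 6 to 10 amount to reading a marker out of tool output and choosing the next move. The sketch below shows that shape under stated assumptions: the state names, the "last marker wins" rule, and the treatment of ✅ All Pass as terminal are all illustrative choices, not the behavior of post-tool-review-state.sh.

```python
import re
from typing import Optional

# Sentinel markers named in this document. The regex and the "last marker wins"
# rule are assumptions of this sketch.
SENTINEL = re.compile(r"✅ Ready|⛔ Blocked|✅ All Pass")
STATE_BY_MARKER = {"✅ Ready": "ready", "⛔ Blocked": "blocked", "✅ All Pass": "all_pass"}


def parse_gate(stdout: str) -> Optional[str]:
    """Return the gate state for the last sentinel in the output, or None if absent."""
    found = SENTINEL.findall(stdout)
    return STATE_BY_MARKER[found[-1]] if found else None


def next_action(state: Optional[str]) -> str:
    """Map a gate state to the next step of the loop (hypothetical labels)."""
    if state is None:
        return "run review"
    if state == "blocked":
        return "fix findings, re-run review"
    if state == "ready":
        return "/precommit"
    return "done"  # all_pass: assumed to be terminal


output = "2 findings\n⛔ Blocked\n...fixes applied...\n✅ Ready"
print(parse_gate(output), "->", next_action(parse_gate(output)))
```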

Context Compaction Recovery

When Claude Code compacts context:

  • SessionStart(compact) event fires
  • post-compact-auto-loop.sh injects [AUTO_LOOP_RESUME] to stdout
  • Agent sees the resume signal and continues the review loop
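
A minimal sketch of the resume step follows. It assumes the hook receives a JSON payload whose `source` field distinguishes a compaction restart from a normal startup, and that whatever the hook prints to stdout is shown to the agent. Both are assumptions about the runtime, not details confirmed by this page.

```python
import json

RESUME_SIGNAL = "[AUTO_LOOP_RESUME]"


def on_session_start(payload_json: str) -> str:
    """Return the text to print for a SessionStart event (empty string: print nothing)."""
    payload = json.loads(payload_json)
    # Assumed payload shape: {"source": "startup" | "compact" | ...}
    if payload.get("source") == "compact":
        return RESUME_SIGNAL
    return ""


print(on_session_start('{"source": "compact"}'))
```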

Workflow Tracks

| Track | Entry → Gate → Exit |
|-------|---------------------|
| Feature | /feature-dev → /verify → /codex-review-fast → /precommit |
| Bug Fix | /issue-analyze → /bug-fix → /verify → /codex-review-fast → /precommit |
| Planning | /codex-brainstorm → /feasibility-study → /tech-spec |
| Docs | .md edit → /codex-review-doc |
| Shipping | /precommit → /push-ci → /create-pr |

Approval Gates

  1. ✅ Ready / ⛔ Blocked: sentinel-driven gate on each review cycle
  2. Stop guard: hooks block the Stop event in strict mode if gate incomplete
  3. Pre-push gate: /dev/tty confirmation for destructive operations
  4. Commit-msg guard: validates conventional commit format before allowing commit
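
As an illustration of the fourth gate, a conventional-commit header check can be written as a single regular expression. The accepted type list here is the commonly used set and is an assumption; the repository's commit-msg-guard.sh may accept a different set or enforce extra rules.

```python
import re

# Assumed type list (the widely used Conventional Commits set); the
# repository's own guard may differ.
TYPES = ("feat", "fix", "docs", "style", "refactor", "perf",
         "test", "build", "ci", "chore", "revert")

# <type>, optional (scope), optional "!", then ": " and a non-empty description.
HEADER = re.compile(r"^(?:" + "|".join(TYPES) + r")(?:\([^()\s]+\))?!?: \S.*$")


def is_conventional(message: str) -> bool:
    """Check only the first line of the commit message against the header pattern."""
    lines = message.splitlines()
    return bool(lines) and HEADER.match(lines[0]) is not None


for msg in ("feat(hooks): add stop guard", "updated stuff"):
    print(repr(msg), is_conventional(msg))
```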

Phase-to-Artifact Map

| Phase | Artifact |
|-------|----------|
| /tech-spec | 1-requirements.md, 3-architecture.md |
| /codex-review-fast | deduplicated findings list + gate marker |
| /precommit | passed or blocked commit attempt |
| /push-ci | CI run result |
| /create-pr | PR URL |
06

Memory Context

sd0x-dev-flow — Memory and Context

Sentinel State Machine

The primary state mechanism is stdout-based: scripts write ✅ Ready, ⛔ Blocked, or ✅ All Pass sentinel markers to stdout, which hooks read to determine gate state. This is in-session ephemeral state — not persisted to disk.

Context Compaction Handling

Explicitly addressed: post-compact-auto-loop.sh fires on SessionStart(compact) event and injects [AUTO_LOOP_RESUME] to restart the review loop. This is the key mechanism for surviving Claude Code's context compaction without losing the review gate state.

Iteration History

iteration_history.current_round and max_rounds are tracked within the session to detect convergence plateaus and trigger strategic reset if the loop runs too many rounds without improvement.
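
One way to picture plateau detection is to record the number of open findings per round and compare recent rounds against the best earlier one. Everything beyond the names `current_round` and `max_rounds` is hypothetical here, including the default limits and the plateau rule itself.

```python
from dataclasses import dataclass, field


@dataclass
class IterationHistory:
    max_rounds: int = 5        # assumed default, not taken from the project
    plateau_window: int = 2    # hypothetical: recent rounds allowed without improvement
    finding_counts: list[int] = field(default_factory=list)

    @property
    def current_round(self) -> int:
        return len(self.finding_counts)

    def record(self, open_findings: int) -> None:
        """Store the number of open findings after one review round."""
        self.finding_counts.append(open_findings)

    def plateaued(self) -> bool:
        """True when the last `plateau_window` rounds did not beat the best earlier count."""
        if self.current_round <= self.plateau_window:
            return False
        best_before = min(self.finding_counts[:-self.plateau_window])
        return min(self.finding_counts[-self.plateau_window:]) >= best_before

    def should_reset(self) -> bool:
        return self.current_round >= self.max_rounds or self.plateaued()


history = IterationHistory()
for count in (6, 3, 3, 3):
    history.record(count)
print(history.current_round, history.should_reset())
```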

CLAUDE.md as Persistent Context

The CLAUDE.md file and referenced @rules/auto-loop.md are loaded at session start and serve as the persistent behavioral specification. They are the "memory" that survives context compaction because they are re-read on every session initialization.

Self-improvement Loop

Corrections are recorded as lessons. After 3+ recurrences, a lesson is promoted to a project rule file in rules/. This is the only cross-session learning mechanism — rules persist in the filesystem.
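
The promotion rule can be sketched as a counter with a threshold. This assumes "3+ recurrences" means the third recorded occurrence triggers promotion. The class name and lesson key are hypothetical, and writing the rule file into rules/ is left out.

```python
from collections import Counter

PROMOTION_THRESHOLD = 3  # "3+ recurrences" from the text, read as the third occurrence


class LessonLog:
    def __init__(self) -> None:
        self.counts = Counter()  # lesson key -> number of recorded corrections
        self.rules = set()       # lesson keys already promoted to rules

    def record(self, lesson_key: str) -> bool:
        """Record one correction; return True when this call promotes the lesson."""
        self.counts[lesson_key] += 1
        if self.counts[lesson_key] >= PROMOTION_THRESHOLD and lesson_key not in self.rules:
            self.rules.add(lesson_key)
            return True
        return False


log = LessonLog()
results = [log.record("missing-await") for _ in range(4)]
print(results, log.rules)
```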

State Files

  • .claude/settings.json — hook configuration
  • rules/ directory — accumulated project rules
  • hooks/hooks.json — hook routing configuration
  • Implicit: git branch state tracks the current work
07

Orchestration

sd0x-dev-flow — Orchestration

Multi-agent

Yes. The dual-review pattern dispatches Codex MCP (primary) and a secondary agent (pr-review-toolkit / strict-reviewer) in parallel on every review cycle.

The 15 agent persona files are spawned as subagents via Claude Code's Task tool for specialized tasks (architecture design, code simplification, feasibility analysis, etc.).

Orchestration Pattern

Hierarchical: primary Claude Code instance coordinates, dispatches subagents for review and specialized tasks.

Dual-Review Sequence

Agent edits → emit-review-gate PENDING → 
  Codex review (sandbox, full diff) [parallel]
  + Task(code-reviewer) secondary [parallel]
→ Aggregate + dedup + gate →
  emit-review-gate READY/BLOCKED

Findings are severity-normalized (P0–Nit), deduplicated (file + issue key, ±5 line tolerance), and source-attributed (codex | toolkit | both).
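
The merge rule described above (same file and issue key, lines within ±5, source attribution) can be sketched as follows. The intermediate severity levels, the "keep the more severe label" rule, and the sample file names are assumptions made for illustration.

```python
from dataclasses import dataclass

LINE_TOLERANCE = 5                          # "±5 line tolerance" from the text
SEVERITY_ORDER = ["P0", "P1", "P2", "Nit"]  # levels between P0 and Nit are assumed


@dataclass
class Finding:
    file: str
    issue_key: str
    line: int
    severity: str
    source: str  # "codex", "toolkit", or "both"


def dedupe(findings: list[Finding]) -> list[Finding]:
    """Merge findings that share file and issue key and sit within the line tolerance."""
    merged: list[Finding] = []
    for f in findings:
        for m in merged:
            same_issue = (m.file, m.issue_key) == (f.file, f.issue_key)
            if same_issue and abs(m.line - f.line) <= LINE_TOLERANCE:
                if m.source != f.source:
                    m.source = "both"
                # Keep the more severe of the two labels (assumed merge rule).
                m.severity = min(m.severity, f.severity, key=SEVERITY_ORDER.index)
                break
        else:
            # No match: keep a copy so the caller's objects are not mutated.
            merged.append(Finding(f.file, f.issue_key, f.line, f.severity, f.source))
    return merged


reports = [
    Finding("src/auth.js", "null-check", 42, "P1", "codex"),
    Finding("src/auth.js", "null-check", 45, "P0", "toolkit"),
    Finding("src/auth.js", "null-check", 90, "P2", "toolkit"),
]
result = dedupe(reports)
for item in result:
    print(item)
```

In this example the first two reports collapse into one finding attributed to both reviewers, while the third stays separate because it is more than five lines away.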

Fallback Modes

| Codex MCP | Secondary | Mode |
|-----------|-----------|------|
| Available | Available | Full dual-review |
| Unavailable | Available | Single-reviewer (secondary) |
| Available | Unavailable | Strict-reviewer fallback |
| Unavailable | Unavailable | Single-reviewer minimum |

Execution Mode

Continuous (ralph-style loop): each edit triggers the review loop automatically. The agent does not "stop" between iterations unless the gate clears.

Isolation Mechanism

None in the standard configuration. The Codex MCP runs in a sandbox environment (Codex's own execution sandbox, not a crit/worktree isolation).

Context Recovery

[AUTO_LOOP_RESUME] sentinel injected by post-compact-auto-loop.sh on context compaction — the loop survives compaction events without losing gate state.

Multi-model

Yes. Agent frontmatter specifies model preferences (e.g., code-simplifier.md specifies model: opus). Codex MCP uses its own model; secondary reviewer uses Claude Code's configured model.

08

UI CLI Surface

sd0x-dev-flow — UI and CLI Surface

Dedicated CLI Binary

No standalone binary. Skills are invoked as Claude Code slash commands or via npx skills add.

Local UI

None. The tool operates entirely within Claude Code's terminal interface.

IDE Integration

Claude Code plugin (primary). Skills-only via npx skills add for Cursor, Windsurf, Codex, Aider.

Observability

  • Sentinel stdout markers (✅ Ready, ⛔ Blocked) visible in terminal output
  • stop-guard.sh logs blocking events
  • post-tool-review-state.sh tracks gate state transitions
  • iteration_history.current_round counter visible in skill output
  • Hook debug output via HOOK_DEBUG=1 environment variable

Modes

| Mode | Behavior |
|------|----------|
| Default (warn) | Log missing steps but allow Stop |
| Strict (block) | Block Stop until all gates complete; set STOP_GUARD_MODE=strict |

Bypass Mechanisms

  • HOOK_BYPASS=1 — emergency escape hatch (skip all checks)
  • Plugin-defers-to-local: when running as plugin, checks for identical local hook and defers to it to avoid double-firing
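
The interaction between the two modes and the bypass flag can be summarized as a small decision function. This is a sketch of the documented behavior, not stop-guard.sh itself: the gate-state names are hypothetical, and treating a "ready" gate as complete is an assumption.

```python
import os
from typing import Mapping, Optional


def stop_decision(gate_state: Optional[str],
                  env: Mapping[str, str] = os.environ) -> tuple[bool, str]:
    """Return (allow_stop, message) for a Stop event."""
    if env.get("HOOK_BYPASS") == "1":
        return True, "bypass: all checks skipped"
    if gate_state in ("ready", "all_pass"):  # assumed to mean the gate is complete
        return True, "gate complete"
    if env.get("STOP_GUARD_MODE") == "strict":
        return False, "blocked: review gate incomplete"
    return True, "warning: review gate incomplete"  # default (warn) mode


print(stop_decision("blocked", {"STOP_GUARD_MODE": "strict"}))
print(stop_decision("blocked", {}))
```

Checking the bypass first is what makes it an escape hatch: it wins even when strict mode would otherwise block.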

npm Distribution

package.json in repo root enables npx skills add sd0xdev/sd0x-dev-flow to install skills-only subset for non-Claude-Code agents.
