Chachamaru Claude Code Harness

chachamaru-cc-harness · Chachamaru127/claude-code-harness · ★ 1.6k · last commit 2026-05-26

Turns raw Claude Code sessions into a disciplined Plan→Work→Review→Release delivery loop with spec contracts and worktree-isolated agent teams.

Best when: The Go binary enforcing constraints at the OS level, outside the LLM loop, is the only reliable way to prevent agent drift.

Skip if: Promoting unobserved data into claims (not_observed != absent), Starting work before plan contract is approved

vs seeds

superpowers(skills-as-behavioral-framework, Claude Code plugin) but adds a compiled Go enforcement binary, named worktree-isolated …

Primitive shape: 53 total

Commands 7 · Skills 34 · Subagents 4 · Hooks 8

Summary

Claude Code Harness by Chachamaru is the most architecturally complete single-author harness in this batch. It combines a Go-native binary (bin/harness), a TOML-driven config generator (harness.toml), a Claude Code plugin, Codex CLI and OpenCode compatibility paths, 34+ named skills, 4 named subagents, and a runtime hook system covering the PreToolUse, PermissionRequest, and Setup lifecycle events. The central thesis is "Plan→Work→Review→Release as a disciplined delivery loop": the 5-verb surface (harness-plan, harness-work, harness-review, harness-sync, harness-release) is an enforced workflow gate, not a collection of optional helpers. The bin/harness binary handles permission enforcement, inbox checking, and migration reporting outside the Claude Code runtime, making this the only harness in the batch that ships language-native compiled tooling. Worktree-per-agent isolation is declared in the agent definitions (worker.md sets isolation: worktree). TDD is enforced through skill instructions, which places the harness in the "persona-instruction" enforcement camp; the escape valve is a --tdd-bypass flag that requires a reason recorded in the audit trail. Differs from seeds: closest to superpowers (a skills-only behavioral framework), but adds a CLI binary, a TOML config layer, named subagents, and multi-tool compatibility that superpowers lacks. It also resembles kiro as a holistic delivery harness, but is distributed as a Claude Code plugin rather than a closed IDE.

Overview

Origin

GitHub: https://github.com/Chachamaru127/claude-code-harness
Version analyzed: v4.12.3 (from harness.toml)
Language: Shell (primary), Go (binary), with Japanese-first documentation.
Stars: 1,611 as of 2026-05-26.

Philosophy

From the README:

"Claude Code is powerful, but raw agent work drifts: plans live in chat, tests become optional, review happens too late, and release evidence gets rebuilt by memory. Harness turns that into one repeatable operating path."

The stated goal is to shift the user's job from "hand-write the plan" to "approve or correct the generated contract before execution continues." Harness occupies the meta-layer: it frames every agent session as a contract lifecycle (spec.md → Plans.md → implementation → verification → release evidence).

Manifesto-Style Quotes

From README:

"After install, the default changes from 'ask the agent to code' to: 1. write the spec and plan, 2. implement only the approved slice, 3. verify the result, 4. review independently, 5. package evidence for PR or release."

From harness-plan SKILL.md (translated from Japanese):

"harness-plan is the planning surface that produces the co-required planning output: the spec.md product contract and the Plans.md task contract."

From agent worker.md (translated from Japanese):

"Do not add requirements by guesswork. Flag anything unconfirmed explicitly as 'missing-input'."

From harness-work SKILL.md (translated from Japanese):

"Explicit mode flags always override auto mode."

Key Design Decisions

TOML as source of truth: harness.toml is the project config; bin/harness sync regenerates all plugin files from it.
Go binary for guardrails: compiled harness binary handles hook dispatch, permission enforcement, and migration reporting without requiring Node.js.
5-verb discipline: the only valid entry points are plan/work/review/sync/release — no catch-all "chat with agent" path.
not_observed != absent: missing local proof means "not proven here", not "impossible". A documented epistemic stance.

Architecture

Distribution

Primary: Claude Code plugin marketplace (/plugin marketplace add Chachamaru127/claude-code-harness)
Codex CLI: scripts/setup-codex.sh --user
OpenCode: scripts/setup-opencode.sh
Cursor / GitHub Copilot CLI: candidate paths (no install evidence yet)

Install Complexity

Multi-step: marketplace add + plugin install + /harness-setup initialization.

Directory Tree

claude-code-harness/
├── .claude-code-harness.config.yaml  # runtime config
├── .claude-plugin/                   # Claude Code plugin manifest
├── .claude/                          # memory/, output-styles/, rules/
├── .codex-plugin/                    # Codex plugin support
├── .cursor/                          # Cursor profile
├── agents/                           # 4 subagent definitions
│   ├── advisor.md
│   ├── reviewer.md
│   ├── scaffolder.md
│   └── worker.md
├── bin/                              # Go binaries
│   ├── harness                       # Universal launcher
│   ├── harness-darwin-amd64
│   ├── harness-darwin-arm64
│   ├── harness-linux-amd64
│   └── harness-windows-amd64.exe
├── codex/                            # Codex-specific files
├── go/                               # Go source
├── hooks/
│   └── hooks.json                    # PreToolUse/PermissionRequest/Setup hooks
├── opencode/                         # OpenCode-specific files
├── output-styles/                    # Output format templates
├── scripts/                          # Setup and migration scripts
├── skills/                           # 34+ skill directories
├── templates/                        # Project templates
├── tests/                            # Test suite
├── workflows/                        # Workflow definitions
├── harness.toml                      # Master config (v4.12.3)
└── spec.md                           # Self-describing spec
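
The bin/ layout above lists one "universal launcher" next to four per-platform binaries. A minimal sketch of how such a launcher could pick the right binary is shown below. The binary names come from the tree; the lookup logic and the architecture aliases are assumptions for illustration, not the harness's actual launcher code.

```python
# Binary names as listed under bin/ in the directory tree.
BINARIES = {
    ("darwin", "amd64"): "harness-darwin-amd64",
    ("darwin", "arm64"): "harness-darwin-arm64",
    ("linux", "amd64"): "harness-linux-amd64",
    ("windows", "amd64"): "harness-windows-amd64.exe",
}

# Assumption: map common machine names to Go-style architecture names.
ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def pick_binary(system: str, machine: str) -> str:
    """Return the bundled binary name for a platform, or raise if none ships."""
    os_name = system.lower()
    arch = ARCH_ALIASES.get(machine.lower())
    try:
        return BINARIES[(os_name, arch)]
    except KeyError:
        raise RuntimeError(
            f"no bundled harness binary for {system}/{machine}"
        ) from None
```

In a real launcher the arguments would typically come from platform.system() and platform.machine(). Note that the tree lists no Linux arm64 build, so that combination raises in this sketch.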

Required Runtime

Claude Code v2.1+ (supported path)
Bundled prebuilt Go binary (no Go toolchain install needed)
No Node.js required

Target AI Tools

Tool	Tier
Claude Code	supported
Codex CLI	internal-compatible
Codex app	candidate
OpenCode	internal-compatible
Cursor	candidate
GitHub Copilot CLI	candidate

Components

Skills (34+)

Name	Purpose
harness-plan	Core planning: spec.md + Plans.md generation with quality scoring
harness-work	Task execution: solo/parallel/breezing/codex modes
harness-review	Independent review phase, major findings block completion
harness-release	Release preflight and evidence packaging
harness-setup	One-time project initialization
harness-sync	Synchronize Plans.md with implementation state
harness-accept	Acceptance criteria verification
harness-loop	Continuous loop management
harness-plan-brief	Lightweight plan variant
harness-progress	Progress tracking
session-control	Session lifecycle management
session-init	Session initialization
session-memory	Cross-session memory management
session-state	State persistence
session	General session handling
memory	Memory read/write primitives
principles	Behavioral principles injection
routing-rules	Tool routing configuration
agent-browser	Browser automation skill
auth	Authentication patterns
breezing	Full-team parallel execution
cc-cursor-cc	Cross-harness bridging
cc-update-review	Update and review cycle
ci	CI failure recovery
crud	CRUD pattern enforcement
deploy	Deployment patterns
generate-slide	Presentation generation
generate-video	Video generation
gogcli-ops	Go CLI operations
maintenance	Maintenance protocols
notebookLM	NotebookLM integration
ui	UI implementation patterns
vibecoder-guide	Vibe coding guidance
workflow-guide	Workflow reference

Agents (4 subagents)

Name	Role
worker.md	Implements tasks in TDD→impl→preflight→verify→commit-prep cycle; `isolation: worktree`
reviewer.md	Independent review of worker output
advisor.md	Strategic advice before implementation
scaffolder.md	Project scaffolding

Hooks

File: hooks/hooks.json

Event	Matcher	Action
PreToolUse	Write|Edit|MultiEdit|Bash|Read	`bin/harness hook pre-tool` (Go binary)
PreToolUse	AskUserQuestion	`bin/harness hook ask-user-question-normalize`
PreToolUse	Write|Edit	`bin/harness hook inbox-check` + agent review for secrets/TODO stubs
PreToolUse	mcp__chrome-devtools__.|mcp__playwright__.	`bin/harness hook browser-guide`
PermissionRequest	Edit|Write|MultiEdit	`bin/harness hook permission`
PermissionRequest	Bash (git/npm/test)	`bin/harness hook permission` (auto-allow pattern)
Setup	init	`bin/harness hook setup-init`
Setup	maintenance	maintenance hook
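
The table above pairs each event with a matcher and a command. The sketch below shows how such a table can be resolved for a single tool call. It assumes the matcher is a regular expression that must match the whole tool name; the HOOKS list copies a subset of the rows, and commands_for is a hypothetical helper, not part of the harness.

```python
import re

# Subset of the hooks table: (event, matcher, command).
HOOKS = [
    ("PreToolUse", r"Write|Edit|MultiEdit|Bash|Read", "bin/harness hook pre-tool"),
    ("PreToolUse", r"AskUserQuestion", "bin/harness hook ask-user-question-normalize"),
    ("PreToolUse", r"Write|Edit", "bin/harness hook inbox-check"),
    ("PermissionRequest", r"Edit|Write|MultiEdit", "bin/harness hook permission"),
]


def commands_for(event: str, tool_name: str) -> list[str]:
    """Return every command whose event and matcher apply to this tool call."""
    return [
        command
        for hook_event, matcher, command in HOOKS
        if hook_event == event and re.fullmatch(matcher, tool_name)
    ]


print(commands_for("PreToolUse", "Write"))
```

Under this reading, a Write call triggers both the pre-tool handler and the inbox check, while a MultiEdit call triggers only the pre-tool handler.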

Scripts

Path	Purpose	Trigger
scripts/setup-codex.sh	Codex CLI installation	manual
scripts/setup-opencode.sh	OpenCode installation	manual
bin/harness	Go binary for hooks/permissions/migration	hook/manual

Config Files

harness.toml — master configuration, generates plugin files on harness sync
.claude-code-harness.config.yaml — runtime override config
claude-code-harness.config.example.json — example config
claude-code-harness.config.schema.json — JSON schema for config validation

Templates

Located in templates/ — project templates for common patterns.

Prompts

Excerpt 1: harness-plan SKILL.md (quality flow)

Technique: Iron-law prompt with rationalization table + precedence hierarchy

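harness-plan is the planning surface that produces the co-required planning output: the spec.md product contract and the Plans.md task contract.
Precedence stays `spec.md > sub-spec > Plans.md`.
Plans.md is the task ledger and the root `spec.md` is the product contract; do not invert that hierarchy.
Do not copy the supplied information into Plans.md as-is.
When creating a plan or adding a large task, check the latest information, existing specs, memory, and multi-perspective discussion,
and convert only the elements this product should adopt into the task contract.
`/harness-plan create` returns a `Spec delta` or a `Spec skip reason` together with the generated `Plans.md` tasks.
The output must always include a `Spec delta` or a `Spec skip reason`.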

This uses a "dual-output requirement" pattern: every plan creation must produce either a spec delta OR an explicit spec-skip reason — never just code tasks.
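
A minimal sketch of the dual-output check follows. It assumes the two markers appear as literal strings in the plan output; check_plan_output is a hypothetical helper used only to illustrate the rule, not a function from the harness.

```python
def check_plan_output(plan_text: str) -> str:
    """Return which spec marker the plan output carries.

    Raises ValueError when the output contains neither marker, which is
    the case the dual-output rule is meant to rule out.
    """
    if "Spec delta" in plan_text:
        return "spec-delta"
    if "Spec skip reason" in plan_text:
        return "spec-skip-reason"
    raise ValueError(
        "plan output has neither 'Spec delta' nor 'Spec skip reason'"
    )
```

A plan that lists only tasks fails this check, so the planner has to state explicitly why the spec did not change.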

Excerpt 2: harness-work SKILL.md (auto-selection)

Technique: Decision table encoding in prompt + explicit override protocol

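## Execution Mode Auto Selection (automatic choice when no flag is given)

When no explicit mode flag (`--parallel`, `--breezing`, `--codex`) is present,
the best mode is selected automatically from the number of target tasks:

| Target tasks | Auto-selected mode | Reason |
|-------------|---------------|------|
| **1** | Solo | Minimal overhead; direct implementation is fastest |
| **2-3** | Parallel (Task tool) | Threshold where Worker separation starts to pay off |
| **4 or more** | Breezing | Three-way separation of Lead coordination, parallel Workers, and an independent Reviewer is effective |

### Rules
1. **Explicit flags always override auto mode**
   - `--parallel N` → Parallel mode (regardless of task count)
   - `--breezing` → Breezing mode (regardless of task count)
   - `--codex` → Codex mode (regardless of task count)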

Excerpt 3: worker agent (isolation declaration)

Technique: Structured JSON input contract + ordered startup checklist

model: claude-sonnet-4-6
isolation: worktree

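initialPrompt: |
  After the session starts, first confirm the following 4 points in this order.
  1. task and task_id
  2. Files that may be modified
  3. DoD and the sprint-contract path
  4. Path to the canonical spec, or spec_skip_reason
  5. Verification commands to run
  After that, proceed in this order: TDD decision -> implementation -> preflight -> verification -> commit preparation.
  Do not add requirements by guesswork. Flag anything unconfirmed explicitly as "missing-input".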
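
The startup checklist above can be read as an input contract. The sketch below checks a task payload against it and reports gaps as "missing-input" instead of guessing. Only task, task_id, and spec_skip_reason are names taken from the excerpt; every other field name is a hypothetical stand-in.

```python
# Fields the worker must see before starting. Names other than task and
# task_id are hypothetical stand-ins for the checklist items.
REQUIRED_FIELDS = [
    "task",
    "task_id",
    "allowed_files",
    "dod",
    "sprint_contract_path",
    "verify_commands",
]


def find_missing_inputs(payload: dict) -> list[str]:
    """Return one 'missing-input' entry per absent or empty checklist item."""
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    # The spec item is satisfied by either a spec path or a skip reason.
    if not (payload.get("spec_path") or payload.get("spec_skip_reason")):
        missing.append("spec_path|spec_skip_reason")
    return [f"missing-input: {name}" for name in missing]
```

An empty result means the worker may proceed; anything else is reported back rather than filled in by inference.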

Excerpt 4: hooks.json — inline agent review hook

Technique: In-hook LLM review with structured deny/allow output

{
  "type": "agent",
  "prompt": "Review the following code change for quality issues. Check if the change: (1) introduces hardcoded secrets or credentials, (2) leaves TODO/FIXME stubs without implementation, (3) has obvious security vulnerabilities (SQL injection, XSS, command injection). If any issue is found, return JSON with permissionDecision: 'deny' and permissionDecisionReason explaining the issue. If the change looks acceptable, return nothing (exit 0). Input: $ARGUMENTS",
  "model": "haiku",
  "timeout": 30
}
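
The prompt above defines a small output protocol: JSON with permissionDecision: 'deny' blocks the change, and empty output allows it. A sketch of how a caller could interpret that output is shown below. interpret_review is a hypothetical helper, and treating unparseable output as "allow" is an assumption; a stricter setup might fail closed instead.

```python
import json


def interpret_review(output: str) -> tuple[str, str]:
    """Map the reviewer's raw output to a (decision, reason) pair."""
    text = output.strip()
    if not text:
        # Empty output is the documented "change looks acceptable" signal.
        return ("allow", "")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Assumption: malformed output does not block the change.
        return ("allow", "")
    if isinstance(data, dict) and data.get("permissionDecision") == "deny":
        return ("deny", str(data.get("permissionDecisionReason", "")))
    return ("allow", "")
```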

Uniqueness

differs_from_seeds

Closest to superpowers (skills-as-behavioral-framework, Claude Code plugin) but differs in three significant ways: (1) a compiled Go binary that runs outside the LLM loop as an enforcement engine — superpowers has no binary; (2) named subagent definitions with worktree isolation per worker — superpowers has no named agents; (3) a TOML-driven config system (harness.toml) that generates all plugin files, making the harness itself a meta-level generator. Also resembles kiro in the sense of treating spec.md + Plans.md as a binding contract, but kiro is a closed IDE while Chachamaru distributes as a Claude Code plugin with optional Codex/OpenCode paths. Unlike BMAD-METHOD, the subagents here are not personas but functional roles with strict input/output contracts.

Positioning

This is the most technically complete Claude Code harness in the batch. It occupies a unique position as the only framework in the batch that ships a compiled binary, which means it can enforce constraints at the OS level (process exit codes, file permission checks) rather than relying entirely on the LLM to follow instructions.
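
To illustrate what enforcement through process exit codes looks like, here is a sketch of a PreToolUse decision function. It relies on the Claude Code hook convention that the event arrives as JSON (with tool_name and tool_input) and that exit code 2 blocks the tool call. The allow-list policy itself is an assumption chosen for illustration; the actual rules live in the Go binary and are not reproduced here.

```python
import json

ALLOW = 0  # exit code for "let the tool call proceed"
BLOCK = 2  # exit code Claude Code treats as "block this tool call"


def decide(event_json: str, allowed_files: set[str]) -> tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse event.

    Hypothetical policy: file-writing tools may only touch paths that
    were approved for the current task.
    """
    event = json.loads(event_json)
    tool = event.get("tool_name", "")
    path = event.get("tool_input", {}).get("file_path", "")
    if tool in {"Write", "Edit", "MultiEdit"} and path not in allowed_files:
        return BLOCK, f"{tool} to {path!r} is outside the approved file list"
    return ALLOW, ""
```

A real hook would read the event from standard input, print the message to standard error, and exit with the returned code. The point is that the decision is made by a separate process, so the model cannot talk its way past it.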

Observable Failure Modes

Token-intensive: The 5-verb workflow with quality checks at every stage will consume significant context budget. Large Plans.md files may exhaust context before reaching implementation.
Japanese-first documentation: Most skill files are bilingual (JP+EN) but detailed reasoning is in Japanese — non-Japanese users may miss nuances.
Codex/OpenCode parity gap: "internal-compatible" is weaker than "supported" — the README explicitly warns not to inherit support claims from other projects.
TOML indirection: harness.toml as config requires harness sync to apply changes — users who edit plugin files directly will see their changes overwritten.
Go binary trust: The compiled binary is distributed as a pre-built artifact; users cannot audit its behavior without reading the Go source.

Workflow

Phases

Stage	Skill	Output	Gate
Investigate	(ad-hoc)	Evidence + unknowns	Do not promote unobserved data into claims
Plan	`/harness-plan`	`spec.md` + `Plans.md`	User approves or corrects the generated contract
Work	`/harness-work`	Code + tests	TDD required when task says so; `--tdd-bypass` requires explicit audit reason
Review	`/harness-review`	Independent verdict	Major findings block completion
PR	(manual)	Evidence pack	PR-ready is not release-ready
Release	`/harness-release`	Tag + release artifacts	Release preflight must pass

Execution Mode Auto-Selection (harness-work)

Task count	Auto mode
1	Solo (direct implementation)
2-3	Parallel (Task tool worker separation)
4+	Breezing (Lead + Worker + Reviewer triad)

Explicit flags override: --parallel N, --breezing, --codex.
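
The selection rule can be sketched as a small function. The thresholds and flag names follow the table and the override rule above; select_mode and its return values are hypothetical names for illustration.

```python
from typing import Optional

EXPLICIT_MODES = ("parallel", "breezing", "codex")


def select_mode(task_count: int, flag: Optional[str] = None) -> str:
    """Pick the execution mode for harness-work.

    An explicit flag always wins; otherwise the mode follows task count.
    """
    if flag is not None:
        if flag not in EXPLICIT_MODES:
            raise ValueError(f"unknown mode flag: {flag}")
        return flag
    if task_count < 1:
        raise ValueError("task_count must be at least 1")
    if task_count == 1:
        return "solo"
    if task_count <= 3:
        return "parallel"
    return "breezing"
```

For example, select_mode(5) yields "breezing", while select_mode(5, flag="codex") yields "codex" regardless of the task count.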

Approval Gates

Plan approval: user must approve or correct spec.md + Plans.md before work begins.
TDD gate: tests written before implementation (bypassable with explicit reason written to audit log).
Review blocker: major findings from /harness-review halt progress.
Release preflight: /harness-release runs readiness checklist before tagging.
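
Three of the gates listed above can be sketched as a single check that either names the blocking reason or lets the stage proceed. TaskState, its fields, and the stage names are assumptions made for this sketch; the TDD gate is left out because it applies inside the work stage.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskState:
    """Hypothetical snapshot of where a task stands in the loop."""
    plan_approved: bool = False
    major_findings: list = field(default_factory=list)
    preflight_passed: bool = False


def gate(stage: str, state: TaskState) -> Optional[str]:
    """Return the reason a stage is blocked, or None if it may proceed."""
    if stage in ("work", "review", "release") and not state.plan_approved:
        return "plan contract not approved"
    if stage == "release" and state.major_findings:
        return "major review findings outstanding"
    if stage == "release" and not state.preflight_passed:
        return "release preflight has not passed"
    return None
```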

Artifact Map

Phase	Artifact
Plan	`spec.md` (product contract), `Plans.md` (task contract)
Work	implementation code, test files
Review	review artifact (verdict + findings)
Release	CHANGELOG boundary, tag, evidence package
Audit	`.claude/state/contracts/<task>.sprint-contract.json`

Memory Context

State Storage

spec.md: product contract (root truth)
Plans.md: task ledger with completion markers (cc:完了, i.e. "cc:done")
.claude/state/contracts/<task>.sprint-contract.json: per-task execution contracts
.claude/state/active-plan.json: tracks the currently active named plan
plans/manifest.json: registry of named plans (supports multiple parallel plan sets)
output/: harness-managed output directory (structured evidence)

Memory Type

File-based, project-scoped. No external database required.

Session Persistence

harness-sync reconciles Plans.md against implementation state across sessions
--resume <id|latest> flag on harness-work restarts interrupted tasks
/recap command re-summarizes context after long absence before resuming
Optional: harness-mem for extended memory (referenced in README, separate repo)

Context Compaction

The skill system uses Claude Code's native compaction. harness-plan tracks spec.md > sub-spec > Plans.md precedence to prevent plan drift across compact events.

Handoffs

Sprint contracts (.claude/state/contracts/) provide structured handoff context for parallel workers
Named plan switching (.claude/state/active-plan.json) enables multi-feature parallel development
bin/harness doctor --migration-report inventories old state without deletion for migration handoffs

Audit Trail

The Go binary writes structured logs for hook invocations. TDD bypass events require HARNESS_TDD_BYPASS_REASON environment variable or explicit inline reason, written to audit log.
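
A minimal sketch of the bypass rule: a bypass is only accepted when a reason is available, either inline or through HARNESS_TDD_BYPASS_REASON. The record fields and the tdd_bypass_record helper are hypothetical, and how the record is serialized into the audit log is deliberately left open.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional


def tdd_bypass_record(
    task_id: str,
    inline_reason: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Build an audit record for a TDD bypass, refusing reasonless requests."""
    env = os.environ if env is None else env
    reason = (inline_reason or env.get("HARNESS_TDD_BYPASS_REASON", "")).strip()
    if not reason:
        raise PermissionError("TDD bypass requested without a reason")
    return {
        "event": "tdd-bypass",
        "task_id": task_id,
        "reason": reason,
        "at": datetime.now(timezone.utc).isoformat(),
    }
```

Giving the inline reason priority over the environment variable is a design choice of this sketch; the source does not say which one wins.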

Orchestration

Multi-Agent Architecture

Yes. The breezing mode instantiates a team with three roles:

Lead: coordinates task execution and progress tracking
Worker: implements assigned tasks (4 workers max by default, configurable via --parallel N)
Reviewer: independent review at task completion

All agents are defined as .md files in agents/ with YAML frontmatter specifying model, isolation, skills, and initialPrompt.
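
As an illustration of the agent file format, the sketch below reads flat key: value pairs from a frontmatter block delimited by --- lines. It uses only the standard library and skips indented lines, so block scalars such as initialPrompt are not reconstructed; a full YAML parser would be needed for that. The delimiter convention and read_frontmatter are assumptions of this sketch.

```python
def read_frontmatter(markdown: str) -> dict:
    """Return the top-level key: value pairs from a frontmatter block."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        if line.startswith((" ", "\t")) or ":" not in line:
            continue  # skip nested or continuation lines
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


SAMPLE = """---
model: claude-sonnet-4-6
isolation: worktree
initialPrompt: |
  confirm the task inputs first
---
# Worker
"""

print(read_frontmatter(SAMPLE)["isolation"])
```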

Orchestration Pattern

task-decomposition-tree: Plans.md provides the task tree; harness-work decomposes it into solo/parallel/breezing execution based on task count.

The spawn mechanism is Claude Code's Task tool — workers are spawned via the Task tool from within harness-work.

Isolation Mechanism

git-worktree: the worker agent declares isolation: worktree. Each worker operates in its own worktree, preventing concurrent edits to shared files.
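
For readers unfamiliar with worktree isolation, the sketch below builds the git command that gives one worker its own checkout and branch. The .worktrees/ directory, the harness/ branch prefix, and worktree_command are hypothetical; Claude Code manages the actual worktree when an agent declares isolation: worktree.

```python
from pathlib import Path


def worktree_command(repo_root: str, task_id: str) -> list:
    """Build the git invocation that creates a per-task worktree.

    Hypothetical layout: <repo_root>/.worktrees/<task_id> on a new
    branch named harness/<task_id>.
    """
    path = Path(repo_root) / ".worktrees" / task_id
    return [
        "git", "-C", repo_root,
        "worktree", "add",
        "-b", f"harness/{task_id}",
        str(path),
    ]
```

Passing the result to subprocess.run(cmd, check=True) would create the worktree. Because each worker edits files under its own path, two workers never write to the same working copy at once.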

Multi-Model

Yes (limited): the inline review hook sets model: haiku for fast code review at the PreToolUse level. The main agent runs on the session model, while worker.md pins claude-sonnet-4-6 in its frontmatter.

Execution Mode

Interactive-loop: driven by slash commands (/harness-plan, /harness-work, etc.). Not a background daemon; each command invokes a discrete agent session.

Crash Recovery

--resume <id|latest> flag on harness-work restarts interrupted sessions. The sprint contract file preserves task state across interruptions.

Cross-Tool Portability

Medium: Claude Code is the fully supported path. Codex CLI and OpenCode are "internal-compatible" (install scripts exist, runtime parity not claimed). Cursor is a candidate.

Consensus

None. The reviewer agent produces a verdict that blocks or allows completion; no voting or quorum mechanism.

Ui Cli Surface

Dedicated CLI Binary

Yes. bin/harness is a compiled Go binary (Darwin arm64/amd64, Linux amd64, Windows amd64).

It is NOT a thin wrapper over claude/codex CLI — it is an independent runtime for:

Hook dispatch (hook pre-tool, hook permission, hook setup-init)
Inbox checking (hook inbox-check)
Migration reporting (doctor --migration-report)
Browser guidance (hook browser-guide)
Harness sync (harness sync — regenerates plugin files from harness.toml)

CLI Subcommands

Subcommand	Action
`hook pre-tool`	PreToolUse lifecycle handler
`hook permission`	PermissionRequest handler
`hook setup-init`	Initialization hook
`hook inbox-check`	Check for pending inbox items
`hook browser-guide`	Browser automation guidance
`hook ask-user-question-normalize`	Normalize question format
`doctor --migration-report`	Inventory stale caches and old state
`sync`	Regenerate plugin files from harness.toml

Local UI

None (no web dashboard or TUI).

Slash Commands (Claude Code)

/harness-setup — initialization
/harness-plan — planning
/harness-work — implementation
/harness-review — review
/harness-release — release preflight
/recap — context re-summary
/undo (/rewind alias) — rollback last plan update

Observability

The Go binary writes structured hook execution logs
TDD bypass events are written to the audit log with mandatory reason
bin/harness doctor --migration-report provides inventory output

IDE Integration

Claude Code: native plugin (primary)
Cursor: .cursor/ profile (candidate)
No IDE fork or extension

Related frameworks

same archetype · same primary tool · same memory type

Liza ★ 227

A17 Compiled enforcement

Hardened multi-agent coding system with code-enforced role boundaries, adversarial doer/reviewer pairs, and 55+ failure mode…

DotForge ★ 6

A17 Compiled enforcement

Declare behavioral policies for Claude Code and compile them into enforcing PreToolUse hooks, with cross-project audit and sync…

Superpowers ★ 207k

A1 Skills-only

Enforces spec-first, TDD, and subagent-reviewed development as mandatory automatic workflows rather than optional practices.

Claude-Flow / Ruflo ★ 55k

A6 Multi-agent orchestrator

Eliminates single-agent context limits and sequential bottlenecks by orchestrating fault-tolerant swarms of specialized AI agents…

BMAD-METHOD ★ 48k

A4 Markdown scaffold

Provides a full agile delivery lifecycle with named expert-persona AI collaborators that elicit the human's best thinking rather…

Agent OS ★ 4.6k

A4 Markdown scaffold

Extracts implicit codebase conventions into token-efficient markdown standards files and injects them selectively into AI agent…

Distribution

Type: claude-plugin
License: MIT
Install: multi-step
Version: 4.12.3

Surfaces

CLI binary: harness
CLI subcmds: 8
Local UI: No
Tech stack: none

Components

Commands: 7
Skills: 34
Subagents: 4
Hooks: 8
MCP servers: 0
MCP tools: 0
Scripts: 3
Templates: 0

Workflow

Phases: 6
Approval gates: 4
Spec format: markdown
Spec storage: per-feature-folder
Delta or full: delta-diff

Orchestration

Multi-agent: Yes
Pattern: hierarchical
Isolation: git-worktree
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: interactive-loop
Crash recovery: Yes
Compaction: Yes
Session handoff: Yes

Memory

Type: file-based
Persistence: project
Search: none
State files: 5 files

Quality

TDD: Yes
TDD mechanism: persona-instruction
Validators: 2
Self-review: adversarial-subagent

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: Yes
Audit log: Yes
Audit format: structured-md
Replay: Yes

Tools

Primary: claude-code
Targets: 4
Portability: medium

Signals

Stars: 1.6k
Last commit: 2026-05-26
Contributors: 4
Maintainer: active
Quality score: 9.5/10