Codex Harness MCP

chapzin-codex-harness-mcp · chapzin/codex-harness-mcp · ★ 7 · last commit 2026-05-12

Gives MCP-capable coding agents a local contract-lifecycle harness with governance audits and explicit completion gates.

Best whenWork is not done until governance passes and a completion gate verifies it — no implicit 'I think it's done'.

Skip ifClaiming done without verification evidence, Promoting harness changes without holdout trial

vs seeds

ccmemory(MCP-anchored state) but stores contract lifecycle and governance evidence rather than conversation recall. Resembles ki…

Primitive shape 21 total

Skills 1 MCP tools 20

Summary

chapzin Codex Harness MCP — Summary

Codex Harness MCP by chapzin is an MCP server (Archetype 3) that provides a local control-plane for Codex CLI and any MCP-compatible coding client: execution contracts, persistent RAG memory (project-local), raw trace recording, structured verification evidence, governance policy with PASS/FLAG/BLOCK audit, observability reports, harness profiles, eval runs, Meta-Harness-lite promotion records, natural-language harness spec export, and explicit completion gates. All state lives under .codex-harness/ in the project directory. The server is a dependency-free Node.js stdio MCP server (no shell execution, no remote calls), making it safe to install in CI/CD and regulated environments. It ships a multi-client installer that generates configs for 10+ coding tools (Codex CLI, Claude Code, OpenCode, Kilo, Gemini CLI, Cursor, VS Code/Copilot, Cline, Windsurf, Roo Code) without running their CLIs. The distinctive opinion is that "done" should require evidence — not just implementation but verification records, governance audit, and an explicit completion gate. Differs from seeds: closest to ccmemory (MCP-anchored, local state) but ccmemory is focused on cross-session recall while this is focused on contract lifecycle and governance; closest in philosophy to kiro (spec+tasks+verification as first-class artifacts) but distributed as a cross-client MCP server rather than a closed IDE.

Overview

chapzin Codex Harness MCP — Overview

Origin

GitHub: https://github.com/chapzin/codex-harness-mcp
Stars: 7
License: MIT
Language: JavaScript
Last commit: 2026-05-12
Published on skills.sh: https://skills.sh/chapzin/codex-harness-mcp/codex-harness-mcp

Philosophy

From README:

"Long-running agent work often fails in quiet ways: context gets compacted, research is repeated, failures are summarized too early, verification evidence disappears, harness changes are promoted without holdout evidence, and 'done' gets claimed before the work is actually checked."

"This project gives coding agents a durable system of record for that work. It does not replace the agent or run tasks for it."

From SKILL.md description:

"Use this skill when a user wants Codex CLI or another MCP-compatible coding client to work through a harness-engineering loop with explicit execution contracts, persistent local knowledge/RAG, durable traces, structured verification evidence, project-local governance policy, PASS/FLAG/BLOCK governance audits..."

Key Design Decisions

Evidence-first completion: the completion gate is the primary differentiator — work is not done until governance audit passes and verification records exist.
No command execution: the MCP server records verification evidence but does not run commands; agents must run commands and report results.
Prompt-injection bounded: user/source text is returned inside <untrusted-data> blocks.
Client-portable: value is in the state model, not in any specific client; the installer supports 10+ clients.
Meta-Harness: the harness can propose, promote, and record changes to itself with holdout evidence — the framework is self-describing.

Architecture

chapzin Codex Harness MCP — Architecture

Distribution

Available via:

skills.sh: npx skills add chapzin/codex-harness-mcp -g -a codex -y --copy
Multi-client installer: node scripts/install-codex-harness-mcp.mjs
Direct Node.js server

State Directory

All project state lives under .codex-harness/ within the project directory.

Directory Tree

codex-harness-mcp/
├── .github/
├── agents/
│   └── openai.yaml           # OpenAI Agents SDK integration config
├── docs/
│   └── multi-client-setup.md
├── scripts/
│   ├── install-codex-harness-mcp.mjs   # Multi-client installer
│   └── lib/                            # Installer utilities
├── tests/                              # Test suite
├── SKILL.md                            # Claude Code / skills.sh skill definition
├── README.md
└── src/
    └── server.mjs              # Dependency-free Node.js stdio MCP server

Required Runtime

Node.js 20+
Any MCP-capable client

Supported Clients (installer)

Codex CLI
Claude Code
OpenCode
Kilo CLI / Kilo Code
Gemini CLI
Cursor
VS Code / GitHub Copilot MCP
Cline
Windsurf Cascade
Roo Code (best-effort, announced shutdown 2026-05-15)

State Files

All under .codex-harness/:

contracts
traces
verification records
governance policy
eval profiles and runs
memory/RAG store
observability report
Meta-Harness proposals

Components

chapzin Codex Harness MCP — Components

MCP Server Tools (from README description)

Tool Category	Tools
Contracts	`harness_create_contract`, `harness_update_contract`, `harness_read_contract`
Memory/RAG	`harness_store_research`, `harness_store_lesson`, `harness_query_knowledge`
Traces	`harness_record_trace`, `harness_read_traces`, `harness_next_step_recovery`
Verification	`harness_record_verification`, `harness_read_verification`
Governance	`harness_write_governance_policy`, `harness_audit_governance`, `harness_export_governance_report`
Observability	`harness_export_observability_report`
Harness Evals	`harness_create_profile`, `harness_run_eval`, `harness_compare_runs`
Meta-Harness	`harness_record_proposal`, `harness_record_promotion`
Harness Spec	`harness_export_harness_spec`
Completion	`harness_completion_gate`
Setup	`harness_bootstrap`, `harness_migrate`
Handoff	`harness_compact_context`

MCP Resources

harness://observability/report — exposes current observability report
harness://harness/spec — exports natural-language harness spec
harness://governance/policy — governance policy
harness://governance/audit — governance audit results

Scripts (1)

Path	Purpose	Trigger
scripts/install-codex-harness-mcp.mjs	Multi-client MCP config installer	manual

Skill (1)

Name	Purpose
SKILL.md	Skill definition for Claude Code / skills.sh integration

Agent (1)

Name	Purpose
agents/openai.yaml	OpenAI Agents SDK integration configuration

Prompts

chapzin Codex Harness MCP — Prompts

Excerpt 1: SKILL.md description field

Technique: Exhaustive trigger list + context boundary

name: codex-harness-mcp
description: Use this skill when a user wants Codex CLI or another MCP-compatible coding client to work through a harness-engineering loop with explicit execution contracts, persistent local knowledge/RAG, durable traces, structured verification evidence, project-local governance policy, PASS/FLAG/BLOCK governance audits, trace-level observability reports, harness profiles, eval cases/runs, Meta-Harness-lite proposals and promotion decisions, natural-language harness spec export, MCP resources/prompts, multi-client installer, and completion gates before claiming work is done.

Do not trigger it for a one-line question that does not need durable state, research memory, verification evidence, or multi-step work.

Excerpt 2: README start prompt

Technique: Explicit protocol-first activation prompt

Use codex-harness. Bootstrap the project, migrate old harness state if needed, query local knowledge, create a small contract, record traces and lessons, record verification evidence, export the observability report when the run gets complex, record eval/profile/proposal evidence if changing the harness, and run the eval gate before saying the task is done.

Excerpt 3: Prompt-injection safety from README

Technique: Structured untrusted-data wrapping

"Stored user/source text is returned inside <untrusted-data> blocks."

This is a defense against prompt injection via stored memory: the MCP server wraps all stored user text in XML tags that signal untrusted content to the consuming agent.

Uniqueness

chapzin Codex Harness MCP — Uniqueness

differs_from_seeds

Closest to ccmemory (Archetype 3: MCP-anchored state) but ccmemory stores conversation memory in Neo4j for cross-session recall, while this stores contract lifecycle, governance audits, trace records, and eval data for current-task completion evidence. Also resembles kiro in philosophy (spec+tasks+verification as binding artifacts, explicit gates before "done") but kiro is a closed IDE while this is a portable MCP server any client can use. Unlike taskmaster-ai (also Archetype 3, file-based task.json), this does not manage task decomposition — it manages execution evidence and governance for tasks the agent drives itself.

Most Distinctive Feature

The three-level governance system (PASS/FLAG/BLOCK) and the Meta-Harness-lite capability (proposing and recording changes to the harness itself with holdout evidence) are unique in the catalog. No other framework treats harness self-improvement as a governed, evidence-tracked process.

Positioning

This is the most safety-conscious framework in the batch: no command execution in the MCP server, prompt-injection protection via <untrusted-data> wrapping, dependency-free Node.js, and explicit governance audit as a mandatory completion gate. Suitable for regulated environments where agents need an audit trail.

Observable Failure Modes

Low stars / low visibility: 7 stars — adoption is minimal; no community validation.
State proliferation: .codex-harness/ can grow large in long projects — no described compaction for the state itself.
No command execution: verification is recorded-but-not-run — requires the agent to correctly report execution results; the MCP server cannot verify them independently.
10-client installer complexity: generating configs for 10+ clients requires the user to know which clients to target.

Workflow

chapzin Codex Harness MCP — Workflow

One-Minute Workflow (from README)

Step	Action	Artifact
1	Ask agent to use the harness	—
2	Bootstrap or migrate `.codex-harness/`	state directory
3	Query existing knowledge before repeating research	knowledge query
4	Create a small contract	contract record
5	Work inside the contract boundaries	implementation
6	Record attempts, failures, decisions, research, lessons, and verification	trace records + verification records
7	Audit governance with `harness_audit_governance`; stop on BLOCK, call out FLAG	governance report
8	Export observability report when run gets long/risky/unclear	observability report
9	If changing the harness: record profiles, evals, proposals, promotion decisions	Meta-Harness records
10	Run completion gate before saying work is done	gate pass/fail

Approval Gates

The completion gate (harness_completion_gate) is the primary explicit gate. Governance audit BLOCK status also stops work.

Governance Levels

PASS: contract quality, outputs, traces, verification, gates all acceptable
FLAG: issues noted but not blocking; agent must call out explicitly
BLOCK: work cannot proceed; requires resolution

Harness Optimization Workflow

For changing the harness itself:

Record current profile baseline
Create eval cases for the change
Run eval against current profile
Record Meta-Harness-lite proposal
Run holdout trial
Record promotion or rejection decision with evidence

Memory Context

chapzin Codex Harness MCP — Memory & Context

State Storage

Local file-based under .codex-harness/ in the project directory:

Contracts (execution scope, budgets, expected outputs, completion conditions)
Research notes (persistent, queryable)
Implementation lessons (persistent, queryable)
Raw traces (attempts, failures, decisions)
Verification records (commands run, outcomes)
Governance policy (project-local rules)
Eval profiles and run results
Meta-Harness proposal and promotion records

Memory Type

File-based with RAG (local full-text search over research/lesson notes). Not a vector DB — the README describes "persistent local RAG" using Node.js built-in modules only.

Handoff

harness_compact_context produces a compact handoff context for long sessions — preserves key state in a summary format for context window management.

Cross-Session

Yes. All state persists in .codex-harness/ and survives session restarts. harness_query_knowledge retrieves relevant notes from previous sessions.

Prompt-Injection Protection

Stored user text is returned inside <untrusted-data> XML blocks, making it clear to the consuming LLM that the content came from an external (potentially adversarial) source.

Orchestration

chapzin Codex Harness MCP — Orchestration

Multi-Agent

No multi-agent coordination built into the MCP server itself. It is a state-tracking service that single agents consume. The governance system can bound subagent scope (listed in governance policy), but the server does not spawn agents.

Orchestration Pattern

None (the MCP server is infrastructure, not an orchestrator).

Isolation Mechanism

None — the MCP server records state for whatever agent uses it.

Multi-Model

No.

Execution Mode

Event-driven: the MCP server responds to tool calls from whatever client is running. The agent drives the workflow; the MCP server maintains state.

Cross-Tool Portability

High: the installer generates configs for 10+ MCP-capable coding clients. The MCP server itself is client-agnostic (stdio).

Ui Cli Surface

chapzin Codex Harness MCP — UI & CLI Surface

Dedicated CLI Binary

No standalone binary. The installer script is Node.js.

Local UI

None.

MCP Resources (read-only endpoints)

harness://observability/report
harness://harness/spec
harness://governance/policy
harness://governance/audit

Installation CLI

node scripts/install-codex-harness-mcp.mjs with --clients and --scope flags generates MCP config for target clients.

Example:

node scripts/install-codex-harness-mcp.mjs --clients codex,claude-code,opencode --scope auto --project .

Observability

harness_export_observability_report — local "flight recorder": contract state, traces, eval posture, memory, governance, and blind spots
All state in .codex-harness/ is human-readable

Protocol

stdio MCP transport (dependency-free Node.js).

Related frameworks

same archetype · same primary tool · same memory type

alirezarezvani/claude-skills ★ 16k

A18 Self-evolving

313+ skills for 12 AI tools covering engineering, marketing, C-level advisory, compliance, research, and finance — all from one…

MoAI-ADK ★ 1.0k

A18 Self-evolving

Implements Harness Engineering as a Go-binary-installed Claude Code environment with auto-TDD/DDD methodology selection, 20-event…

REAP (c-d-cc/reap) ★ 41

A18 Self-evolving

Prevent context loss, scattered development, and forgotten lessons through a generation-based lifecycle where AI and human…

meta-agent-teams (jbrahy) ★ 2

A18 Self-evolving

Build self-improving AI agent teams via a supervised training loop: specialist agents advise, a meta-agent evolves prompts based…

Browser Harness ★ 14k

A18 Self-evolving

Thin, self-healing CDP harness connecting an LLM to the user's real Chrome browser with coordinate-first clicking and…

SwarmVault ★ 492

A18 Self-evolving

Production-grade CLI for Karpathy's LLM Wiki pattern: ingests any content into a local-first durable markdown wiki + knowledge…

Distribution

Type: mcp-server
License: MIT
Install: multi-step

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: No
Tech stack: none

Components

Commands: 0
Skills: 1
Subagents: 0
Hooks: 0
MCP servers: 1
MCP tools: 20
Scripts: 1
Templates: 0

Workflow

Phases: 6
Approval gates: 2
Spec format: markdown
Spec storage: per-feature-folder
Delta or full: delta-diff

Orchestration

Multi-agent: No
Pattern: none
Max concurrent: 1
Isolation: none
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: No
BYOK: Yes
Modal: text

Execution

Mode: event-driven
Crash recovery: Yes
Compaction: Yes
Session handoff: Yes
Streaming: No

Memory

Type: file-based
Persistence: project
Search: full-text
State files: 6 files

Quality

TDD: Optional
TDD mechanism: none
Validators: 2
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: structured-md
Replay: Yes

Tools

Primary: codex-cli
Targets: 9
Portability: high

Signals

Stars: 7
Last commit: 2026-05-12
Contributors: 1
Maintainer: active
Quality score: 5.4/10