Skip to content
/

Codex Harness MCP

chapzin-codex-harness-mcp · chapzin/codex-harness-mcp · ★ 7 · last commit 2026-05-12

Gives MCP-capable coding agents a local contract-lifecycle harness with governance audits and explicit completion gates.

Best whenWork is not done until governance passes and a completion gate verifies it — no implicit 'I think it's done'.
Skip ifClaiming done without verification evidence, Promoting harness changes without holdout trial
vs seeds
ccmemory(MCP-anchored state) but stores contract lifecycle and governance evidence rather than conversation recall. Resembles ki…
Primitive shape 21 total
Skills 1 MCP tools 20
00

Summary

chapzin Codex Harness MCP — Summary

Codex Harness MCP by chapzin is an MCP server (Archetype 3) that provides a local control-plane for Codex CLI and any MCP-compatible coding client: execution contracts, persistent RAG memory (project-local), raw trace recording, structured verification evidence, governance policy with PASS/FLAG/BLOCK audit, observability reports, harness profiles, eval runs, Meta-Harness-lite promotion records, natural-language harness spec export, and explicit completion gates. All state lives under .codex-harness/ in the project directory. The server is a dependency-free Node.js stdio MCP server (no shell execution, no remote calls), making it safe to install in CI/CD and regulated environments. It ships a multi-client installer that generates configs for 10+ coding tools (Codex CLI, Claude Code, OpenCode, Kilo, Gemini CLI, Cursor, VS Code/Copilot, Cline, Windsurf, Roo Code) without running their CLIs. The distinctive opinion is that "done" should require evidence — not just implementation but verification records, governance audit, and an explicit completion gate. Differs from seeds: closest to ccmemory (MCP-anchored, local state) but ccmemory is focused on cross-session recall while this is focused on contract lifecycle and governance; closest in philosophy to kiro (spec+tasks+verification as first-class artifacts) but distributed as a cross-client MCP server rather than a closed IDE.

01

Overview

chapzin Codex Harness MCP — Overview

Origin

GitHub: https://github.com/chapzin/codex-harness-mcp
Stars: 7
License: MIT
Language: JavaScript
Last commit: 2026-05-12
Published on skills.sh: https://skills.sh/chapzin/codex-harness-mcp/codex-harness-mcp

Philosophy

From README:

"Long-running agent work often fails in quiet ways: context gets compacted, research is repeated, failures are summarized too early, verification evidence disappears, harness changes are promoted without holdout evidence, and 'done' gets claimed before the work is actually checked."

"This project gives coding agents a durable system of record for that work. It does not replace the agent or run tasks for it."

From SKILL.md description:

"Use this skill when a user wants Codex CLI or another MCP-compatible coding client to work through a harness-engineering loop with explicit execution contracts, persistent local knowledge/RAG, durable traces, structured verification evidence, project-local governance policy, PASS/FLAG/BLOCK governance audits..."

Key Design Decisions

  1. Evidence-first completion: the completion gate is the primary differentiator — work is not done until governance audit passes and verification records exist.
  2. No command execution: the MCP server records verification evidence but does not run commands; agents must run commands and report results.
  3. Prompt-injection bounded: user/source text is returned inside <untrusted-data> blocks.
  4. Client-portable: value is in the state model, not in any specific client; the installer supports 10+ clients.
  5. Meta-Harness: the harness can propose, promote, and record changes to itself with holdout evidence — the framework is self-describing.
02

Architecture

chapzin Codex Harness MCP — Architecture

Distribution

Available via:

  1. skills.sh: npx skills add chapzin/codex-harness-mcp -g -a codex -y --copy
  2. Multi-client installer: node scripts/install-codex-harness-mcp.mjs
  3. Direct Node.js server

State Directory

All project state lives under .codex-harness/ within the project directory.

Directory Tree

codex-harness-mcp/
├── .github/
├── agents/
│   └── openai.yaml           # OpenAI Agents SDK integration config
├── docs/
│   └── multi-client-setup.md
├── scripts/
│   ├── install-codex-harness-mcp.mjs   # Multi-client installer
│   └── lib/                            # Installer utilities
├── tests/                              # Test suite
├── SKILL.md                            # Claude Code / skills.sh skill definition
├── README.md
└── src/
    └── server.mjs              # Dependency-free Node.js stdio MCP server

Required Runtime

  • Node.js 20+
  • Any MCP-capable client

Supported Clients (installer)

  • Codex CLI
  • Claude Code
  • OpenCode
  • Kilo CLI / Kilo Code
  • Gemini CLI
  • Cursor
  • VS Code / GitHub Copilot MCP
  • Cline
  • Windsurf Cascade
  • Roo Code (best-effort, announced shutdown 2026-05-15)

State Files

All under .codex-harness/:

  • contracts
  • traces
  • verification records
  • governance policy
  • eval profiles and runs
  • memory/RAG store
  • observability report
  • Meta-Harness proposals
03

Components

chapzin Codex Harness MCP — Components

MCP Server Tools (from README description)

Tool Category Tools
Contracts harness_create_contract, harness_update_contract, harness_read_contract
Memory/RAG harness_store_research, harness_store_lesson, harness_query_knowledge
Traces harness_record_trace, harness_read_traces, harness_next_step_recovery
Verification harness_record_verification, harness_read_verification
Governance harness_write_governance_policy, harness_audit_governance, harness_export_governance_report
Observability harness_export_observability_report
Harness Evals harness_create_profile, harness_run_eval, harness_compare_runs
Meta-Harness harness_record_proposal, harness_record_promotion
Harness Spec harness_export_harness_spec
Completion harness_completion_gate
Setup harness_bootstrap, harness_migrate
Handoff harness_compact_context

MCP Resources

  • harness://observability/report — exposes current observability report
  • harness://harness/spec — exports natural-language harness spec
  • harness://governance/policy — governance policy
  • harness://governance/audit — governance audit results

Scripts (1)

Path Purpose Trigger
scripts/install-codex-harness-mcp.mjs Multi-client MCP config installer manual

Skill (1)

Name Purpose
SKILL.md Skill definition for Claude Code / skills.sh integration

Agent (1)

Name Purpose
agents/openai.yaml OpenAI Agents SDK integration configuration
05

Prompts

chapzin Codex Harness MCP — Prompts

Excerpt 1: SKILL.md description field

Technique: Exhaustive trigger list + context boundary

name: codex-harness-mcp
description: Use this skill when a user wants Codex CLI or another MCP-compatible coding client to work through a harness-engineering loop with explicit execution contracts, persistent local knowledge/RAG, durable traces, structured verification evidence, project-local governance policy, PASS/FLAG/BLOCK governance audits, trace-level observability reports, harness profiles, eval cases/runs, Meta-Harness-lite proposals and promotion decisions, natural-language harness spec export, MCP resources/prompts, multi-client installer, and completion gates before claiming work is done.

Do not trigger it for a one-line question that does not need durable state, research memory, verification evidence, or multi-step work.

Excerpt 2: README start prompt

Technique: Explicit protocol-first activation prompt

Use codex-harness. Bootstrap the project, migrate old harness state if needed, query local knowledge, create a small contract, record traces and lessons, record verification evidence, export the observability report when the run gets complex, record eval/profile/proposal evidence if changing the harness, and run the eval gate before saying the task is done.

Excerpt 3: Prompt-injection safety from README

Technique: Structured untrusted-data wrapping

"Stored user/source text is returned inside <untrusted-data> blocks."

This is a defense against prompt injection via stored memory: the MCP server wraps all stored user text in XML tags that signal untrusted content to the consuming agent.

09

Uniqueness

chapzin Codex Harness MCP — Uniqueness

differs_from_seeds

Closest to ccmemory (Archetype 3: MCP-anchored state) but ccmemory stores conversation memory in Neo4j for cross-session recall, while this stores contract lifecycle, governance audits, trace records, and eval data for current-task completion evidence. Also resembles kiro in philosophy (spec+tasks+verification as binding artifacts, explicit gates before "done") but kiro is a closed IDE while this is a portable MCP server any client can use. Unlike taskmaster-ai (also Archetype 3, file-based task.json), this does not manage task decomposition — it manages execution evidence and governance for tasks the agent drives itself.

Most Distinctive Feature

The three-level governance system (PASS/FLAG/BLOCK) and the Meta-Harness-lite capability (proposing and recording changes to the harness itself with holdout evidence) are unique in the catalog. No other framework treats harness self-improvement as a governed, evidence-tracked process.

Positioning

This is the most safety-conscious framework in the batch: no command execution in the MCP server, prompt-injection protection via <untrusted-data> wrapping, dependency-free Node.js, and explicit governance audit as a mandatory completion gate. Suitable for regulated environments where agents need an audit trail.

Observable Failure Modes

  1. Low stars / low visibility: 7 stars — adoption is minimal; no community validation.
  2. State proliferation: .codex-harness/ can grow large in long projects — no described compaction for the state itself.
  3. No command execution: verification is recorded-but-not-run — requires the agent to correctly report execution results; the MCP server cannot verify them independently.
  4. 10-client installer complexity: generating configs for 10+ clients requires the user to know which clients to target.
04

Workflow

chapzin Codex Harness MCP — Workflow

One-Minute Workflow (from README)

Step Action Artifact
1 Ask agent to use the harness
2 Bootstrap or migrate .codex-harness/ state directory
3 Query existing knowledge before repeating research knowledge query
4 Create a small contract contract record
5 Work inside the contract boundaries implementation
6 Record attempts, failures, decisions, research, lessons, and verification trace records + verification records
7 Audit governance with harness_audit_governance; stop on BLOCK, call out FLAG governance report
8 Export observability report when run gets long/risky/unclear observability report
9 If changing the harness: record profiles, evals, proposals, promotion decisions Meta-Harness records
10 Run completion gate before saying work is done gate pass/fail

Approval Gates

The completion gate (harness_completion_gate) is the primary explicit gate. Governance audit BLOCK status also stops work.

Governance Levels

  • PASS: contract quality, outputs, traces, verification, gates all acceptable
  • FLAG: issues noted but not blocking; agent must call out explicitly
  • BLOCK: work cannot proceed; requires resolution

Harness Optimization Workflow

For changing the harness itself:

  1. Record current profile baseline
  2. Create eval cases for the change
  3. Run eval against current profile
  4. Record Meta-Harness-lite proposal
  5. Run holdout trial
  6. Record promotion or rejection decision with evidence
06

Memory Context

chapzin Codex Harness MCP — Memory & Context

State Storage

Local file-based under .codex-harness/ in the project directory:

  • Contracts (execution scope, budgets, expected outputs, completion conditions)
  • Research notes (persistent, queryable)
  • Implementation lessons (persistent, queryable)
  • Raw traces (attempts, failures, decisions)
  • Verification records (commands run, outcomes)
  • Governance policy (project-local rules)
  • Eval profiles and run results
  • Meta-Harness proposal and promotion records

Memory Type

File-based with RAG (local full-text search over research/lesson notes). Not a vector DB — the README describes "persistent local RAG" using Node.js built-in modules only.

Handoff

harness_compact_context produces a compact handoff context for long sessions — preserves key state in a summary format for context window management.

Cross-Session

Yes. All state persists in .codex-harness/ and survives session restarts. harness_query_knowledge retrieves relevant notes from previous sessions.

Prompt-Injection Protection

Stored user text is returned inside <untrusted-data> XML blocks, making it clear to the consuming LLM that the content came from an external (potentially adversarial) source.

07

Orchestration

chapzin Codex Harness MCP — Orchestration

Multi-Agent

No multi-agent coordination built into the MCP server itself. It is a state-tracking service that single agents consume. The governance system can bound subagent scope (listed in governance policy), but the server does not spawn agents.

Orchestration Pattern

None (the MCP server is infrastructure, not an orchestrator).

Isolation Mechanism

None — the MCP server records state for whatever agent uses it.

Multi-Model

No.

Execution Mode

Event-driven: the MCP server responds to tool calls from whatever client is running. The agent drives the workflow; the MCP server maintains state.

Cross-Tool Portability

High: the installer generates configs for 10+ MCP-capable coding clients. The MCP server itself is client-agnostic (stdio).

08

Ui Cli Surface

chapzin Codex Harness MCP — UI & CLI Surface

Dedicated CLI Binary

No standalone binary. The installer script is Node.js.

Local UI

None.

MCP Resources (read-only endpoints)

  • harness://observability/report
  • harness://harness/spec
  • harness://governance/policy
  • harness://governance/audit

Installation CLI

node scripts/install-codex-harness-mcp.mjs with --clients and --scope flags generates MCP config for target clients.

Example:

node scripts/install-codex-harness-mcp.mjs --clients codex,claude-code,opencode --scope auto --project .

Observability

  • harness_export_observability_report — local "flight recorder": contract state, traces, eval posture, memory, governance, and blind spots
  • All state in .codex-harness/ is human-readable

Protocol

stdio MCP transport (dependency-free Node.js).

Related frameworks

same archetype · same primary tool · same memory type

alirezarezvani/claude-skills ★ 16k

313+ skills for 12 AI tools covering engineering, marketing, C-level advisory, compliance, research, and finance — all from one…

MoAI-ADK ★ 1.0k

Implements Harness Engineering as a Go-binary-installed Claude Code environment with auto-TDD/DDD methodology selection, 20-event…

REAP (c-d-cc/reap) ★ 41

Prevent context loss, scattered development, and forgotten lessons through a generation-based lifecycle where AI and human…

meta-agent-teams (jbrahy) ★ 2

Build self-improving AI agent teams via a supervised training loop: specialist agents advise, a meta-agent evolves prompts based…

Browser Harness ★ 14k

Thin, self-healing CDP harness connecting an LLM to the user's real Chrome browser with coordinate-first clicking and…

SwarmVault ★ 492

Production-grade CLI for Karpathy's LLM Wiki pattern: ingests any content into a local-first durable markdown wiki + knowledge…