ching-kuo/claude-codex

ching-kuo-claude-codex · ching-kuo/claude-codex · ★ 16 · last commit 2026-04-01

Bidirectional Claude↔Codex critique loop via MCP where Claude plans/implements and Codex audits/reviews with APPROVED/WARNING/BLOCKED verdicts.

Best when: Two AI models critiquing each other (with dispute resolution via codex-reply) produces better output than either model working alone or one delegating to the…
Skip if: Calling Codex via Bash instead of MCP; passing .env secrets to Codex MCP
vs seeds: superpowers (Claude Code skills, TDD) but it has no external agent. Compared to codex-plugin-cc (one-directional delegation), ching-…
Primitive shape: 13 total (Commands 5, Skills 6, MCP tools 2)
00

Summary

ching-kuo/claude-codex — Summary

Elevator pitch: A Claude Code skill and command set that implements a structured multi-model loop in which Claude and OpenAI Codex play distinct, non-interchangeable roles via Codex's MCP server. Claude (Claude Code) is the planner and primary implementer; Codex is the reviewer and auditor. Five skills implement four patterns: plan-codex (Claude plans with Opus, Codex audits the plan for up to 3 rounds), claude-codex (Claude implements, Codex reviews the diff with APPROVED/WARNING/BLOCKED verdicts), execute-codex (size-based routing: Claude for small changes, Codex for large ones), and two TDD variants. The Codex MCP integration (claude mcp add codex) is a central dependency: Codex is always invoked via the mcp__codex__codex tool, never via Bash. This is the only framework in this batch that explicitly encodes the APPROVED/WARNING/BLOCKED structured verdict pattern and uses Codex as a peer-review agent with veto power over Claude's work. Compared to seeds: it differs from superpowers (skills-only, no external agent) and taskmaster-ai (single-model MCP) by implementing a two-model dialectic in which Codex is both auditor and fallback implementer for large tasks.

01

Overview

ching-kuo/claude-codex — Overview

Origin

By GitHub user ching-kuo. 16 stars. MIT license. Last commit: 2026-04-01. Single-contributor project.

Philosophy

A structured multi-model approach where Claude and Codex have distinct, non-interchangeable roles determined by task type and change size. The README's workflow table makes the role assignments explicit:

/plan-codex: Claude plans with Opus → Codex audits plan (max 3 rounds) → Plan saved to .claude/plan/
/execute-codex: Small change → Claude implements; Large change → Codex implements; Reviewer always reviews
/claude-codex: Claude implements → Codex reviews (APPROVED/WARNING/BLOCKED) → Claude fixes issues

Code Sovereignty Principle (verbatim)

Code Sovereignty: Codex has zero filesystem write access during planning — all file operations by Claude only.

This principle from plan-codex limits Codex to read-only roles during planning, ensuring Claude owns all file mutations in the planning phase. The execute-codex and claude-codex skills relax this constraint for implementation.

Feature-dev Plugin Dependency

The framework depends on the feature-dev plugin from anthropics/claude-plugins-official for the code-reviewer agent used in execute-codex and tdd-execute-codex. This makes it a dependency-layered system: Claude Code + Codex MCP + feature-dev plugin.

Context Sanitization (verbatim)

Context Sanitization: Never pass .env, secrets, tokens, API keys, or credentials to any external agent or MCP. Exclude files matching .env*, *secret*, *credential*, *.pem, *.key. Redact inline secrets before sending.

The only framework in this batch with explicit secret-scrubbing rules for MCP calls.
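The file-exclusion half of that rule can be sketched in a few lines. This is a minimal illustration, not the framework's code: the glob patterns come from the quoted rule, while the function names and the choice to match case-insensitively against the file name only are assumptions.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import List

# Glob patterns quoted in the Context Sanitization rule.
EXCLUDE_PATTERNS = (".env*", "*secret*", "*credential*", "*.pem", "*.key")


def is_sensitive(path: str) -> bool:
    """Return True if the file name matches any excluded pattern.

    Assumption: matching is case-insensitive and looks at the file name only,
    not at the directories above it.
    """
    name = PurePosixPath(path).name.lower()
    return any(fnmatch(name, pattern) for pattern in EXCLUDE_PATTERNS)


def safe_files(paths: List[str]) -> List[str]:
    """Keep only the paths that may be passed to an external agent or MCP."""
    return [p for p in paths if not is_sensitive(p)]
```

For example, `safe_files([".env.local", "src/app.py", "certs/server.pem"])` keeps only `src/app.py`.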

02

Architecture

ching-kuo/claude-codex — Architecture

Distribution

  • Type: Claude Code skills + commands (clone-and-configure)
  • License: MIT
  • Install: Clone and copy to ~/.claude/

Install

git clone https://github.com/ching-kuo/claude-codex
cd claude-codex

cp -r skills/*   ~/.claude/skills/
cp -r prompts/*  ~/.claude/prompts/
cp -r commands/* ~/.claude/commands/   # optional: for model pinning / tool restrictions

# Configure Codex as MCP server:
claude mcp add codex -s user -- codex -c model=gpt-5.3-codex -c model_reasoning_effort=high mcp-server

# Install feature-dev plugin:
claude plugin add claude-plugins-official/feature-dev

# Optional: suppress Codex reasoning tokens:
# Add to ~/.codex/config.toml: hide_agent_reasoning = true

Required Runtime

  • Claude Code
  • OpenAI Codex CLI with MCP server mode (codex mcp-server)
  • claude plugin add claude-plugins-official/feature-dev (for code-reviewer)
  • OPENAI_API_KEY in environment

Directory Tree

~/.claude/
├── skills/
│   ├── codex-mcp/
│   │   ├── SKILL.md                  # Auto-loaded Codex MCP usage knowledge
│   │   └── reference.md              # Detailed patterns and examples
│   ├── plan-codex/
│   │   ├── SKILL.md
│   │   ├── codex-analyzer-role.md    # Codex role: plan auditor
│   │   └── evals/evals.json
│   ├── claude-codex/
│   │   ├── SKILL.md
│   │   └── evals/evals.json
│   ├── execute-codex/
│   │   ├── SKILL.md
│   │   ├── codex-architect-role.md   # Codex role: implementer
│   │   └── evals/evals.json
│   ├── tdd-claude-codex/
│   │   ├── SKILL.md
│   │   ├── codex-analyzer-role.md    # Codex role: test auditor
│   │   ├── tdd-specialist-role.md    # TDD specialist role for tests
│   │   └── evals/evals.json
│   └── tdd-execute-codex/
│       ├── SKILL.md
│       ├── codex-architect-role.md
│       ├── tdd-specialist-role.md
│       └── evals/evals.json
├── commands/                         # Backward-compatible slash commands
│   ├── plan-codex.md
│   ├── execute-codex.md
│   ├── claude-codex.md
│   ├── tdd-claude-codex.md
│   └── tdd-execute-codex.md
└── prompts/
    └── codex/
        ├── analyzer.md               # Codex role: plan auditor
        ├── architect.md              # Codex role: implementer
        └── reviewer.md               # Codex role: code reviewer

Target AI Tools

  • Primary: Claude Code (orchestrator/implementer)
  • Peer: OpenAI Codex via MCP server (reviewer/auditor/implementer for large tasks)
  • Dependency: feature-dev plugin (for code-reviewer agent in execute flows)
03

Components

ching-kuo/claude-codex — Components

Skills (5 user-facing + 1 internal)

| Skill | Purpose | Models Used |
| --- | --- | --- |
| plan-codex | Claude (Opus) creates plan; Codex audits plan up to 3 rounds | Claude Opus + Codex |
| claude-codex | Claude implements; Codex reviews with APPROVED/WARNING/BLOCKED verdicts | Claude (active model) + Codex |
| execute-codex | Smart routing: Claude (small: ≤2 files, ≤30 lines) or Codex (large); code-reviewer reviews | Claude Sonnet + Codex |
| tdd-claude-codex | TDD: Claude writes tests (RED), Codex audits tests, Claude implements (GREEN), Codex reviews | Claude + Codex |
| tdd-execute-codex | TDD: Claude writes tests, Codex audits tests, smart routing for implementation, code-reviewer reviews | Claude + Codex |
| codex-mcp | Internal: Codex MCP usage knowledge (auto-loaded) | (reference only) |

Commands (5, backward-compatible)

The same five workflows as the user-facing skills, exposed as slash commands. Commands support model pinning and explicit allowed-tools restrictions, which skills cannot enforce. Command files: plan-codex.md, execute-codex.md, claude-codex.md, tdd-claude-codex.md, tdd-execute-codex.md.

Prompts (3, in ~/.claude/prompts/codex/)

| Name | Purpose |
| --- | --- |
| analyzer.md | Codex role: plan auditor (source for codex-analyzer-role.md in skills) |
| architect.md | Codex role: implementer (source for codex-architect-role.md) |
| reviewer.md | Codex role: code reviewer |

Bundled Role Files (per skill)

Each skill bundles its Codex role file directly:

  • codex-analyzer-role.md — injected as developer-instructions when calling Codex MCP for plan/test auditing
  • codex-architect-role.md — injected when calling Codex MCP for implementation
  • tdd-specialist-role.md — injected into Task subagent for writing failing tests

Evals (per skill)

Each user-facing skill has an evals/evals.json file containing evaluation scenarios for the skill-creator tool. The README notes: "Skills are the primary interface. They support eval-based testing via skill-creator and are easier to iterate on than commands."

External Dependency

  • feature-dev plugin: provides code-reviewer agent (subagent_type: "feature-dev:code-reviewer") used in execute-codex and tdd-execute-codex for final implementation review

Codex Invocation Pattern

All Codex calls go through mcp__codex__codex (main call) and mcp__codex__codex-reply (iteration call), never through Bash codex .... This is the framework's key architectural constraint.

05

Prompts

ching-kuo/claude-codex — Prompts

Prompt 1: plan-codex — Codex Audit Loop (verbatim)

Source: skills/plan-codex/SKILL.md

**MANDATORY Codex availability check**: `mcp__codex__codex` MUST be listed in the available tools (either in the tool list or in `<available-deferred-tools>`). Do NOT skip or bypass this phase. If the tool is genuinely absent from both locations, **stop and tell the user**: "Codex MCP is not available. This skill requires Codex for plan audit. Please add the Codex MCP server." Do not proceed without Codex — the audit loop is this skill's core value.

**Critical Evaluation of Audit Findings**

Before addressing any issues, critically evaluate each finding:
1. **Assess correctness**: Is the finding technically accurate?
2. **Check context**: Does the reviewer have full context about architectural decisions?
3. **Verify applicability**: Does the suggestion improve the plan, or is it a preference / false positive?

**If a finding seems incorrect or questionable:**
- Do NOT address it or count it as an iteration. Instead, reply to the reviewer with your technical reasoning.
- Call `mcp__codex__codex-reply` (reuse threadId) explaining why the finding appears incorrect.
- Discussion replies do NOT increment the iteration counter — only revise-and-re-audit cycles count.

Prompting technique: Structured adversarial review with explicit false-positive protection. The codex-reply mechanism allows Claude to push back on Codex findings — this is a rare bidirectional critique pattern where neither model is definitively authoritative.


Prompt 2: claude-codex — APPROVED/WARNING/BLOCKED Verdict Pattern (verbatim)

Source: skills/claude-codex/SKILL.md

Phase 3: Codex Review Loop (max 3 rounds)

Codex reviews the uncommitted diff via MCP, returning a structured verdict:
- **BLOCKED** (CRITICAL issues) — Claude fixes, Codex re-reviews
- **WARNING** (HIGH issues) — Claude fixes, Codex re-reviews
- **MEDIUM/LOW issues** — surfaced to user; user decides whether to fix before delivery

Prompting technique: Structured verdict taxonomy with severity-based routing. The verdicts (APPROVED/WARNING/BLOCKED) and the issue severities behind them (CRITICAL/HIGH/MEDIUM/LOW) create a shared protocol between Claude (implementer) and Codex (reviewer). Each level maps to a specific action, so the next step of the review loop is predictable from the verdict alone.
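The severity-to-action mapping can be written as a small routing function. This is a sketch under stated assumptions: the verdict and severity names come from the quoted prompt, but the function name, the action labels, and the handling of a review with only MEDIUM/LOW issues (no verdict label is given for that case in the excerpt) are hypothetical.

```python
from typing import List, Optional, Tuple


def route_review(severities: List[str]) -> Tuple[Optional[str], str]:
    """Map the issue severities in a review to (verdict, next action).

    The action labels are illustrative names, not strings used by the skill.
    """
    found = {s.upper() for s in severities}
    if "CRITICAL" in found:
        return ("BLOCKED", "fix_and_re_review")
    if "HIGH" in found:
        return ("WARNING", "fix_and_re_review")
    if found & {"MEDIUM", "LOW"}:
        # The excerpt names no verdict for this case; the user decides.
        return (None, "ask_user")
    return ("APPROVED", "deliver")
```

The most severe issue wins, so a review containing both a LOW and a CRITICAL finding routes as BLOCKED.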


Prompt 3: execute-codex — Context Sanitization (verbatim)

Source: skills/execute-codex/SKILL.md

**Context Sanitization**: Never pass `.env`, secrets, tokens, API keys, or credentials to any external agent or MCP. Exclude files matching `.env*`, `*secret*`, `*credential*`, `*.pem`, `*.key`. Redact inline secrets before sending.

Prompting technique: Mandatory security constraint with explicit pattern list. This is the only framework in the batch with an explicit secret-scrubbing rule applied uniformly to all MCP calls.
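The "redact inline secrets" clause is not specified further in the excerpt. One plausible reading is a key/value heuristic like the sketch below; the regular expression, the keyword list, and the `[REDACTED]` placeholder are all assumptions, and a real secret scanner would use a much broader rule set.

```python
import re

# Assumption: secrets appear as NAME=value or NAME: value on a single line,
# where NAME contains one of a few well-known keywords.
SECRET_ASSIGNMENT = re.compile(
    r"\b([A-Z0-9_]*(?:API_?KEY|TOKEN|SECRET|PASSWORD)[A-Z0-9_]*)([ \t]*[=:][ \t]*)(\S+)",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace the value of anything that looks like a secret assignment."""
    return SECRET_ASSIGNMENT.sub(r"\1\2[REDACTED]", text)
```

The key name and separator are preserved so the receiving model still sees that a setting exists, only its value is hidden.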

09

Uniqueness

ching-kuo/claude-codex — Uniqueness

differs_from_seeds

No seed framework implements a bidirectional two-model critique loop. The closest seed is superpowers (skills with TDD enforcement, Claude Code target), but superpowers has no external agent. The closest pattern is the adversarial subagent in claude-flow, but claude-flow's adversarial agents are Claude-only, not a cross-model Claude↔Codex dispute. Compared to codex-plugin-cc (which has one-directional delegation: Claude forwards to Codex), ching-kuo/claude-codex implements true bidirectional interaction: Claude can dispute Codex's findings via mcp__codex__codex-reply (dispute replies don't count toward the iteration limit), and Codex can block Claude's implementation via the BLOCKED verdict. The APPROVED/WARNING/BLOCKED verdict taxonomy with severity-based routing is unique in the corpus.

Most Unusual Feature

The bidirectional critique loop in plan-codex: when Codex returns an audit finding, Claude is required to critically evaluate it before acting. If the finding seems incorrect, Claude calls mcp__codex__codex-reply to dispute it with technical reasoning — and that dispute call doesn't count toward the 3-iteration limit. This creates a peer-negotiation dynamic between two AI models.

Positioning

The most carefully engineered multi-model integration in this batch. The role assignments (Claude plans/implements, Codex audits/reviews/implements-large) are explicit, the context sanitization rules are strict, and the verdict vocabulary (APPROVED/WARNING/BLOCKED/MEDIUM/LOW) creates a shared protocol that could be adopted by other frameworks.

Observable Failure Modes

  1. MCP dependency fragility: mcp__codex__codex must be in the tool list or the skill halts immediately. Any MCP connection failure breaks all five skills.
  2. feature-dev plugin dependency: execute-codex and tdd-execute-codex depend on claude-plugins-official/feature-dev for the code-reviewer agent. If that plugin is unavailable or changes its API, these skills break.
  3. Iteration inflation: The dispute mechanism (replies don't count toward iterations) could extend audit loops indefinitely if Claude and Codex cannot agree. No hard cap on dispute rounds.
  4. Model cost: Each plan-codex run can invoke Codex 3+ times + Claude Opus for planning. The total token cost per feature is significantly higher than single-model frameworks.
  5. Dirty worktree blocking: TDD skills halt if the worktree has uncommitted changes, requiring the user to commit/stash before proceeding.
04

Workflow

ching-kuo/claude-codex — Workflow

Pattern Overview

| Skill | Claude Role | Codex Role | Max Iterations |
| --- | --- | --- | --- |
| plan-codex | Planner (Opus) | Plan Auditor (analyzer-role) | 3 audit rounds |
| claude-codex | Implementer | Code Reviewer (reviewer-role) | 3 review rounds |
| execute-codex | Implementer (small) or router | Implementer (large) + review | 3 review rounds |
| tdd-claude-codex | Test writer + Implementer | Test Auditor + Code Reviewer | 2+3 rounds |
| tdd-execute-codex | Test writer | Test Auditor + routing + review | 2+3 rounds |

plan-codex Workflow

Phase 0: Read plan file or use task description
Phase 1: Launch Task agent (subagent_type: "Plan") → structured implementation plan
         Save to .claude/plan/<feature-name>.md
Phase 2: Codex Audit Loop (max 3 iterations)
         Call mcp__codex__codex with codex-analyzer-role.md injected
         Parse APPROVED → done; issues → critically evaluate → address if correct → re-audit
         Discussion replies (calling codex-reply to dispute incorrect findings) do NOT count toward 3 iterations
Phase 3: Deliver approved plan to user
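The counting rule in Phase 2 (disputes are free, revise-and-re-audit cycles are capped at 3) can be simulated without any MCP connection. In this sketch the reviewer is replaced by a scripted sequence of responses; the `Finding` class, its `accepted` flag (standing in for Claude's critical evaluation), the status labels, and the behaviour once the cap is reached are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_AUDIT_ITERATIONS = 3  # "max 3 iterations" in the workflow


@dataclass
class Finding:
    summary: str
    accepted: bool  # hypothetical: True if Claude judges the finding correct


def run_audit_loop(audit_responses: Iterable[List[Finding]]) -> dict:
    """Walk scripted audit responses; an empty list means APPROVED."""
    iterations = 0  # revise-and-re-audit cycles (counted)
    disputes = 0    # codex-reply discussion turns (never counted)
    for findings in audit_responses:
        if not findings:
            return {"status": "APPROVED", "iterations": iterations, "disputes": disputes}
        disputes += sum(1 for f in findings if not f.accepted)
        if any(f.accepted for f in findings):
            if iterations == MAX_AUDIT_ITERATIONS:
                # Assumption: stop and hand back to the user at the cap.
                return {"status": "LIMIT_REACHED", "iterations": iterations, "disputes": disputes}
            iterations += 1  # plan revised, next response is the re-audit
    return {"status": "PENDING", "iterations": iterations, "disputes": disputes}
```

A run with one accepted finding, two disputed findings, and then approval finishes with `iterations == 1` and `disputes == 2`, which shows why disputes can lengthen a session without consuming the audit budget.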

claude-codex Workflow

Phase 0: Read plan or use task description
Phase 1: Context retrieval
Phase 2: Claude implements (Edit/Write + self-verify)
Phase 3: Codex Review Loop (max 3 rounds)
         Call mcp__codex__codex with reviewer-role.md
         Verdict: APPROVED → deliver
                  BLOCKED (CRITICAL) → Claude fixes → re-review
                  WARNING (HIGH) → Claude fixes → re-review
                  MEDIUM/LOW → surfaced to user, user decides
Phase 4: Deliver

execute-codex Size Routing

Small change (ALL true: ≤2 files, ≤30 lines, no new logic):
  → Claude implements directly → Task(feature-dev:code-reviewer) reviews

Large change (ANY true: 3+ files, more than 30 lines, new logic):
  → Codex implements via mcp__codex__codex with architect-role.md
  → Task(feature-dev:code-reviewer) reviews
  → Claude fixes issues → re-review (max 3 rounds)
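The routing rule reduces to a single predicate. A minimal sketch, assuming the three inputs are already known; the `Change` class and its field names are hypothetical, and a change of exactly 30 lines is treated as small, following the ≤30 condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    files_touched: int
    lines_changed: int
    adds_new_logic: bool


def route(change: Change) -> str:
    """Return which model implements the change: 'claude' or 'codex'."""
    is_small = (
        change.files_touched <= 2
        and change.lines_changed <= 30
        and not change.adds_new_logic
    )
    # Small requires ALL conditions; failing any one makes the change large.
    return "claude" if is_small else "codex"
```

Because "small" needs all three conditions, a one-line change that introduces new logic still routes to Codex.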

tdd-claude-codex TDD Workflow

Phase 1: Context + test baseline
Phase 2: RED — Task agent writes failing tests only (tdd-specialist-role.md)
Phase 3: Codex Test Audit (max 2 iterations)
         Review tests for quality/coverage before implementation begins
Phase 4: GREEN — Claude implements to make tests pass
Phase 5: IMPROVE — Refactor while tests stay green
Phase 6: Codex Implementation Review (max 3 iterations)

Approval Gates

| Gate | Type |
| --- | --- |
| Missing context confirmation | freetext-clarify |
| Codex audit finding dispute | freetext-clarify (via codex-reply) |
| MEDIUM/LOW review issues (fix or accept) | yes-no |
| Dirty worktree (TDD skills) | typed-confirm |
06

Memory Context

ching-kuo/claude-codex — Memory and Context

State Storage

  • Plan files: .claude/plan/<feature-name>.md — implementation plans saved by plan-codex, consumed by execute-codex and claude-codex
  • Codex thread IDs: threadId returned by mcp__codex__codex is retained within the skill execution to enable follow-up calls via mcp__codex__codex-reply
  • Test baseline: $START_SHA (git rev-parse HEAD) recorded at TDD skill start for diff scoping
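The test-baseline item can be illustrated with two small helpers. This is a sketch, not the skill's implementation: `git rev-parse HEAD` is named in the source, while the helper names and the use of `git diff <sha>` to scope the review are assumptions.

```python
import subprocess
from typing import List


def current_head(repo_dir: str = ".") -> str:
    """Record the baseline commit, equivalent to START_SHA=$(git rev-parse HEAD)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def diff_since_command(start_sha: str) -> List[str]:
    """Build the git command that shows everything changed since the baseline.

    `git diff <sha>` compares the working tree against that commit, so it
    covers both new commits and uncommitted edits.
    """
    return ["git", "diff", start_sha]
```

Recording the SHA before any tests are written means the later review sees only the changes made during the skill run.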

Persistence

  • Project-scoped: .claude/plan/ for approved implementation plans — shared across Claude Code sessions
  • Session-scoped: Codex threadIds (only valid during one Claude Code session's MCP connection)

Cross-Session Handoff

Partial — approved plans in .claude/plan/ survive Claude Code session restarts and can be passed to execute-codex or claude-codex in a new session. Codex threadIds are session-scoped and cannot be resumed.

Compaction

Not handled. Skills are designed for per-feature execution, not long-running sessions.

MCP Architecture and Context

Unlike codex-plugin-cc (which uses the Codex CLI directly), ching-kuo/claude-codex runs Codex via MCP server. The MCP connection (claude mcp add codex ... mcp-server) is established once at Claude Code startup and persists for the session. Each mcp__codex__codex call creates a new Codex thread; mcp__codex__codex-reply continues an existing thread.
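The new-thread versus continue-thread distinction can be shown as two argument builders. The tool names, `developer-instructions`, and `threadId` appear in the source; the `prompt` key and the surrounding `{"tool": ..., "arguments": ...}` envelope are assumptions made for illustration and may not match the actual Codex MCP schema.

```python
from typing import Optional


def build_codex_call(prompt: str, developer_instructions: Optional[str] = None) -> dict:
    """Arguments for mcp__codex__codex, which starts a new Codex thread."""
    arguments = {"prompt": prompt}  # assumed key name
    if developer_instructions is not None:
        # Role file content, e.g. the text of codex-analyzer-role.md.
        arguments["developer-instructions"] = developer_instructions
    return {"tool": "mcp__codex__codex", "arguments": arguments}


def build_codex_reply(thread_id: str, prompt: str) -> dict:
    """Arguments for mcp__codex__codex-reply, which continues an existing thread."""
    return {
        "tool": "mcp__codex__codex-reply",
        "arguments": {"threadId": thread_id, "prompt": prompt},
    }
```

A dispute therefore reuses the `threadId` from the first call, so Codex sees its own earlier finding alongside Claude's counter-argument.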

Memory Type

File-based (.claude/plan/), project-scoped.

07

Orchestration

ching-kuo/claude-codex — Orchestration

Multi-Agent

Yes — bidirectional: Claude and Codex interact in critique loops.

Orchestration Pattern

Sequential with feedback loops. Claude and Codex alternate in bounded review cycles (max 2-3 iterations). Not hierarchical (neither model is definitively the orchestrator for all tasks) — execute-codex routes to either Claude or Codex depending on task size.

Isolation Mechanism

Process — Codex runs as MCP server (codex mcp-server), called via mcp__codex__codex tool. Claude's filesystem context is not shared with Codex directly; context must be explicitly passed in MCP call parameters.

Multi-Model

Yes — the most explicit multi-model assignment in this batch:

| Skill | Claude Role | Claude Model | Codex Role | Codex Model |
| --- | --- | --- | --- | --- |
| plan-codex | Planner | Opus (recommended) | Plan Auditor | GPT-5.3-codex or user-configured |
| claude-codex | Implementer | Sonnet (default) | Code Reviewer | Codex MCP |
| execute-codex | Small-task implementer | Sonnet | Large-task implementer | Codex MCP |
| tdd-claude-codex | Test writer + Implementer | Sonnet | Test Auditor + Code Reviewer | Codex MCP |

Codex Roles

Codex's roles, listed explicitly by skill:

  • Auditor/Reviewer: plan-codex (plan audit), claude-codex (code review), tdd-* (test audit + code review)
  • Worker: execute-codex (large task implementation)

Execution Mode

Interactive-loop (user invokes each skill; Claude handles the iteration loop internally).

Consensus Mechanism

Bidirectional critique — Claude can push back on Codex findings via codex-reply without counting it as an iteration. This is not a consensus algorithm but a dispute resolution mechanism.

Prompt Chaining

Yes — plan-codex output (.claude/plan/<feature>.md) is input to execute-codex or claude-codex.

08

UI CLI Surface

ching-kuo/claude-codex — UI and CLI Surface

Dedicated CLI Binary

No. Skills and commands are installed to ~/.claude/ and invoked via Claude Code's skill activation or slash-command interface.

Skill Invocation

Skills activate via natural-language trigger phrases; each workflow is also available as a slash command:

  • /plan-codex <task or file> — via command or skill trigger
  • /execute-codex <task or file>
  • /claude-codex <task or file>
  • /tdd-claude-codex <task or file>
  • /tdd-execute-codex <task or file>

Commands in ~/.claude/commands/ support explicit model pinning (/model opus before invoking) and allowed-tools restrictions.

Local UI

None.

IDE Integration

Claude Code only (uses Claude Code's skill system, Task tool, and mcp__codex__* MCP tools). No Cursor, Codex CLI native, or Gemini CLI support.

MCP Server Setup

claude mcp add codex -s user -- codex -c model=gpt-5.3-codex -c model_reasoning_effort=high mcp-server

The -s user flag makes this a user-scoped MCP server available across all projects. Model and reasoning effort are configured at server startup, not per call (though the skills recommend $cc:review --model opus style overrides via the /model command).

Observability

  • Codex MCP thread IDs surfaced in output
  • APPROVED/WARNING/BLOCKED verdict visible in Claude session
  • Eval scenarios in evals/evals.json per skill enable systematic testing

Cross-Tool Portability

Low — Claude Code only. Requires Claude Code + Codex MCP server + feature-dev plugin.
