OpenAI Codex CLI

codex-cli · openai/codex · ★ 86k · last commit 2026-05-26

Primitive shape 19 total

Commands 7 Skills 12

Summary

Codex CLI — Summary

OpenAI's official CLI coding agent, distributed as a Rust binary with a full terminal UI. The agent runs on your local machine and calls LLMs (primarily OpenAI o-series and GPT-4 family models) through the API or a ChatGPT plan. It ships its own sandboxing (macOS Seatbelt, Linux bwrap) and an app-server daemon that powers both the TUI and the optional desktop app. Unlike pure prompt frameworks, Codex CLI is a complete agent runtime: it manages conversation history, tool calling, MCP client connections, plugin/skill/environment definitions, and secure subprocess isolation in one artifact. The repo contains two major artifacts: codex-rs (the production Rust runtime with TUI, sandboxing, and daemon) and codex-cli (an older Node.js/TypeScript wrapper retained for compatibility). Skills in .codex/skills/ are lightweight Markdown files with YAML front-matter that the agent reads at runtime; they are distinct from commands but can orchestrate subagents that run in parallel. An environment definition (environment.toml) specifies the setup script and run commands for sandbox execution.

Compared to the 11 seed frameworks: unlike superpowers (a skills-only Claude Code plugin), Codex CLI is a fully self-contained binary agent with its own sandboxed runtime, model selection across OpenAI models, and an app-server daemon. It is closer in architecture to claude-flow (runtime + MCP), but it is built in Rust, owned by an LLM vendor, and usable with any OpenAI-compatible endpoint via BYOK. It is the primary comparison baseline for this batch.

Overview

Codex CLI — Overview

Origin

Launched by OpenAI in 2025 as the open-source version of the cloud Codex agent. The GitHub repo (openai/codex) is the canonical home; it supersedes the earlier TypeScript CLI. The project is dual-artifact: codex-rs contains the production Rust implementation; codex-cli contains the older TypeScript layer kept for npm compatibility.

Philosophy

Build the simplest possible agentic loop that is maximally safe. The AGENTS.md and codebase conventions emphasize:

Sandbox by default — every shell invocation runs inside macOS Seatbelt or Linux bubblewrap; network can be disabled.
Resist adding to codex-core — keep the shared crate lean; push new functionality into feature-specific crates.
Module size discipline — target under 500 LoC per Rust module; split before growing.
Exhaustive match patterns — avoid wildcard arms to prevent unhandled cases.

Manifesto-style quotes (from AGENTS.md)

"Never add or modify any code related to CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR or CODEX_SANDBOX_ENV_VAR."

"Resist adding code to codex-core! Particularly when introducing a new concept/feature/API, before adding to codex-core, consider whether there is an existing crate other than codex-core that is an appropriate place for your new code to live."

"Target Rust modules under 500 LoC, excluding tests. If a file exceeds roughly 800 LoC, add new functionality in a new module instead of extending the existing file."

Distribution modes

curl -fsSL https://chatgpt.com/codex/install.sh | sh (first-party installer)
npm install -g @openai/codex (npm package wrapping the Rust binary)
brew install --cask codex (Homebrew Cask)
Pre-built binaries from GitHub Releases for all major platforms

Target users

Developers who want a locally run, vendor-provided, sandboxed coding agent that integrates with ChatGPT plans or a BYOK endpoint.

Architecture

Codex CLI — Architecture

Distribution & Install

Primary: Pre-built Rust binary distributed via first-party install.sh, Homebrew Cask, GitHub Releases
npm wrapper: npm install -g @openai/codex (wraps the Rust binary)
Build from source: Bazel monorepo (just task runner; cargo for Rust; pnpm for TypeScript parts)

Required Runtime

The Rust binary is self-contained (no Node runtime needed for the Rust version)
macOS Seatbelt (/usr/bin/sandbox-exec) or Linux bubblewrap (bwrap) for sandboxing
git optional (for worktree / PR creation)
MCP servers: any MCP-compatible binary

Directory Tree (repo root)

openai/codex/
├── codex-rs/                  # Production Rust runtime
│   ├── cli/                   # Binary entry point; subcommands: app, doctor, login, mcp, plugin, remote-control
│   ├── core/                  # Agent loop, tool calling, model client (capped at ~500 LOC/module)
│   ├── tui/                   # Ratatui-based terminal UI
│   ├── app-server/            # HTTP daemon (desktop app + remote use)
│   ├── app-server-daemon/     # Daemon launcher
│   ├── apply-patch/           # Patch application crate
│   ├── bwrap/                 # Linux bubblewrap sandbox crate
│   ├── codex-mcp/             # MCP connection manager
│   └── ...                    # 20+ specialized crates
├── codex-cli/                 # Legacy TypeScript CLI (npm compat)
├── .codex/
│   ├── skills/                # Per-skill directories, each with SKILL.md
│   │   ├── code-review/
│   │   ├── codex-bug/
│   │   ├── code-review-breaking-changes/
│   │   ├── code-review-change-size/
│   │   ├── code-review-context/
│   │   ├── code-review-testing/
│   │   ├── codex-issue-digest/
│   │   ├── codex-pr-body/
│   │   ├── remote-tests/
│   │   ├── test-tui/
│   │   ├── update-v8-version/
│   │   └── babysit-pr/
│   └── environments/
│       └── environment.toml   # Setup script + run actions
├── docs/                      # Contributing, install guides
├── scripts/                   # Repo maintenance scripts
└── sdk/                       # Programmatic SDK

Target AI Tools

OpenAI models (o-series reasoning, GPT-4 family) via API or ChatGPT plan
Any OpenAI-compatible endpoint (BYOK)
Model selection via --model flag or config

Sandboxing Architecture

Platform	Mechanism	Network disable
macOS	`/usr/bin/sandbox-exec` (Seatbelt)	`CODEX_SANDBOX_NETWORK_DISABLED=1`
Linux	bubblewrap (`bwrap`)	same env var
Other	process isolation only	—

Components

Codex CLI — Components

Skills (`.codex/skills/` — 12 directories, each with `SKILL.md`)

Skill	Purpose
`code-review`	Orchestrator: spawns parallel subagents (one per code-review-* skill), collects all findings
`code-review-breaking-changes`	Reviews PR for breaking API/behavior changes
`code-review-change-size`	Reviews PR change volume and scope
`code-review-context`	Reviews PR for contextual correctness
`code-review-testing`	Reviews PR for test coverage
`codex-bug`	Diagnoses GitHub bug reports in openai/codex; decides next action
`codex-issue-digest`	Summarizes open GitHub issues
`codex-pr-body`	Generates PR body from diff
`remote-tests`	Triggers and monitors remote CI test runs
`test-tui`	Tests TUI behavior
`update-v8-version`	Automates V8 version bump process
`babysit-pr`	Monitors a PR and responds to review comments

Skill format: Markdown with YAML front-matter (name, description), freeform instructions body. The code-review skill explicitly uses subagents.

CLI Subcommands (via `codex-rs/cli/src/`)

Subcommand	Purpose
`codex` (default/interactive)	Start interactive TUI session
`codex app`	Launch/attach to desktop app experience
`codex doctor`	Diagnose configuration and environment
`codex login`	Authenticate with ChatGPT or API key
`codex mcp`	MCP server management commands
`codex plugin`	Plugin management
`codex remote-control`	Remote control/RPC mode

Environments (`.codex/environments/`)

File	Purpose
`environment.toml`	Defines `name`, `setup.script`, and `[[actions]]` (name + command + icon) for sandbox runs

Core Rust Crates (in `codex-rs/`)

Crate	Purpose
`codex-core`	Agent loop, model client, context management
`codex-tui`	Ratatui terminal UI
`app-server`	HTTP daemon for desktop app and remote use
`codex-mcp`	MCP connection manager (handles multiple MCP servers)
`apply-patch`	Applies unified diffs to the filesystem
`bwrap`	Linux bubblewrap sandbox integration
`cli`	Binary entry point and subcommand routing
`agent-identity`	Agent identity/credential management
`analytics`	Usage analytics (opt-in)
`backend-client`	LLM backend HTTP client

Prompts

Codex CLI — Prompts

Prompt File Format

Skills use Markdown files with YAML front-matter (name, description) followed by freeform instruction text. The YAML front-matter is machine-readable; the body is human/agent-readable.
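
The split between machine-readable front-matter and a freeform body can be illustrated with a small parser. This is a minimal sketch, not the loader Codex CLI actually uses: the function name `parse_skill` is hypothetical, and it assumes the front-matter contains only flat `key: value` pairs, as in the excerpts on this page.

```python
# Minimal front-matter splitter for a SKILL.md-style file.
# Assumption: the front-matter holds only flat "key: value" pairs;
# this is NOT a general YAML parser and NOT Codex CLI's own loader.


def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Return (front-matter dict, instruction body) for a skill file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' delimiter")

    # Find the line that closes the front-matter block.
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        raise ValueError("missing closing '---' delimiter")

    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a key/value line: {line!r}")
        meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


EXAMPLE = """\
---
name: code-review
description: Run a final code review on a pull request
---

Use subagents to review code using all code-review-* skills in this repository other than this orchestrator.
"""

if __name__ == "__main__":
    meta, body = parse_skill(EXAMPLE)
    print(meta["name"])
    print(body)
```

A real loader would more likely use a full YAML library; the hand-rolled split here only keeps the sketch dependency-free.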

Verbatim Excerpt 1: `code-review` skill (orchestrator pattern)

File: .codex/skills/code-review/SKILL.md

---
name: code-review
description: Run a final code review on a pull request
---

Use subagents to review code using all code-review-* skills in this repository other than this orchestrator. One subagent per skill. Pass full skill path to subagents. Use xhigh reasoning.

You must return every single issue from every subagent. You can return an unlimited number of findings.
Use raw Markdown to report findings.
Number findings for ease of reference.
Each finding must include a specific file path and line number.

If the GitHub user running the review is the owner of the pull request add a `code-reviewed` label.
Do not leave GitHub comments unless explicitly asked.

Prompting technique: Hierarchical task decomposition — the orchestrator delegates to specialized sub-skills, enforces output format (numbered findings with file+line), controls side-effects (label vs. no comments), and sets reasoning intensity (xhigh reasoning).

Verbatim Excerpt 2: `codex-bug` skill (decision-tree diagnostic)

File: .codex/skills/codex-bug/SKILL.md (partial)

---
name: codex-bug
description: Diagnose GitHub bug reports in openai/codex. Use when given a GitHub issue URL from openai/codex and asked to decide next steps such as verifying against the repo, requesting more info, or explaining why it is not a bug; follow any additional user-provided instructions.
---

# Codex Bug

## Overview
Diagnose a Codex GitHub bug report and decide the next action: verify against sources, request more info, or explain why it is not a bug.

## Workflow

1. Confirm the input
- Require a GitHub issue URL that points to `github.com/openai/codex/issues/…`.
- If the URL is missing or not in the right repo, ask the user for the correct link.

2. Network access
- Always access the issue over the network immediately, even if you think access is blocked or unavailable.
- Prefer the GitHub API over HTML pages because the HTML is noisy.

3. Read the issue
...

5. Decide the course of action
- **Verify with sources** when the report is specific and likely reproducible.
- **Request more information** when the report is vague, missing repro steps, or lacks logs/environment.
- **Explain not a bug** when the report contradicts current behavior or documented constraints.

Prompting technique: Explicit decision-tree with enumerated branches and evidence requirements. Forces the agent to classify and cite before acting. Includes guard rails (URL validation, network-first stance).
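
The decision tree in this skill can be sketched as ordinary control flow. The names below (`Action`, `is_valid_issue_url`, `decide`) are hypothetical, the boolean inputs stand in for judgments the agent makes while reading the issue, and the order in which the branches are checked is an assumption, since the excerpt does not state a precedence. The URL check only requires that something follows the `issues/` prefix, because the excerpt elides the rest of the pattern.

```python
import re
from enum import Enum

# Assumption: an optional http(s) scheme, then the repo's issues prefix,
# then at least one more character. The elided tail is left unconstrained.
ISSUE_URL = re.compile(r"^(?:https?://)?github\.com/openai/codex/issues/.+")


class Action(Enum):
    VERIFY = "verify with sources"
    REQUEST_INFO = "request more information"
    NOT_A_BUG = "explain not a bug"


def is_valid_issue_url(url: str) -> bool:
    """Step 1 of the workflow: confirm the input points at openai/codex."""
    return ISSUE_URL.match(url.strip()) is not None


def decide(specific: bool, reproducible: bool,
           contradicts_documented_behavior: bool) -> Action:
    """Step 5 of the workflow: pick one of the three enumerated branches."""
    if contradicts_documented_behavior:
        return Action.NOT_A_BUG
    if specific and reproducible:
        return Action.VERIFY
    # Vague, missing repro steps, or lacking logs/environment.
    return Action.REQUEST_INFO


if __name__ == "__main__":
    print(is_valid_issue_url("https://github.com/openai/codex/issues/123"))
    print(decide(specific=True, reproducible=True,
                 contradicts_documented_behavior=False).value)
```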

Environment Definition

File: .codex/environments/environment.toml

# THIS IS AUTOGENERATED. DO NOT EDIT MANUALLY
version = 1
name = "codex"

[setup]
script = ""

[[actions]]
name = "Run"
icon = "run"
command = "cargo +1.93.0 run --manifest-path=codex-rs/Cargo.toml --bin codex -- -c mcp_oauth_credentials_store=file"

This environment definition tells the sandbox how to run the project itself in dev workflows: the setup script is empty, and the single Run action launches codex through cargo.

Uniqueness

Codex CLI — Uniqueness & Positioning

Differs from Seeds

Codex CLI is most architecturally similar to claude-flow (both are large runtime systems with MCP client capabilities and multi-agent support), but differs in three fundamental ways: (1) it is a self-contained compiled binary (Rust, not npm) with no Node.js runtime dependency; (2) it ships first-party sandboxing (macOS Seatbelt + Linux bubblewrap) baked into the binary itself, making it the only tool in the batch with kernel-level process isolation; and (3) it is vendor-owned (OpenAI) and designed to work with ChatGPT plan credentials, not just BYOK. Unlike superpowers (Claude Code plugin, skills-only), Codex CLI is a fully independent agent harness that does not depend on Claude Code or any other IDE. Unlike taskmaster-ai (MCP-server architecture), Codex CLI is an MCP client that can connect to external servers rather than itself being a server.

Positioning

The reference CLI agent from OpenAI — the "official answer" to the question of what a locally-run OpenAI coding agent looks like. Compared to the web-based Codex (chatgpt.com/codex), this is the terminal-first, hackable, BYOK version.

Key Differentiators

First-party vendor product: officially maintained by OpenAI; aligned with ChatGPT plan credits
Compiled Rust binary with sandboxing: no runtime dependency; kernel-level process isolation
Parallel subagent orchestration in skills: the code-review skill demonstrates real fan-out, with one subagent spawned per sub-skill
App-server daemon: background HTTP daemon bridges CLI ↔ desktop app ↔ remote control
Environment.toml: declarative sandbox setup (script + actions) for reproducible execution contexts
Skill + environment separation: skills define what to do; environments define how to run

Observable Failure Modes

Sandboxing friction: Seatbelt/bwrap can block legitimate file/network operations; CODEX_SANDBOX=seatbelt env var affects test behavior
codex-core bloat risk: The AGENTS.md explicitly warns contributors about this; the crate has grown large
OpenAI-only model focus: BYOK works but the UX is optimized for OpenAI models; other providers are second-class
Skill activation is manual: unlike superpowers/spec-driver, skills require explicit invocation (no autonomous trigger)
Context compaction undocumented: how the agent handles very long sessions is not described in public docs

Workflow

Codex CLI — Workflow

Execution Modes

Mode	Trigger	Description
Interactive TUI	`codex`	Full Ratatui-based terminal UI; user types prompts, sees diffs, approves tool calls
Non-interactive / scripted	`codex exec`	Single-shot output, suitable for CI piping
App mode	`codex app`	Connects to the local app-server daemon; used by the desktop app
Remote control	`codex remote-control`	RPC interface for external automation
Skill invocation	`/skill-name` in chat or via CLI	Agent loads and executes the SKILL.md instructions

Typical Interactive Workflow

Phase	User action	Artifact
Auth	`codex login` or set `OPENAI_API_KEY`	Saved credential
Task input	Type natural-language prompt in TUI	Conversation message
Planning	Agent reasons, may ask clarifying questions	Chain-of-thought in TUI
Tool use	Agent proposes shell commands / file edits	Diff preview in TUI
Approval gate	User confirms, edits, or rejects each tool call	Approved action
Execution	Agent runs approved commands in sandbox	Output in TUI
Iteration	Agent re-prompts based on output; loop continues	Updated context
Session save	None; trajectory JSON is saved automatically when configured	`.codex/trajectory-*.json` (if configured)

Approval Gates

Tool call confirmation — every shell command or file write displayed with [Y/n/edit] prompt in TUI
Login confirmation — OAuth flow or API key entry
Skill invocation — user explicitly names skill; no implicit auto-invocation
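
The tool-call confirmation gate can be sketched as a small pure function. This is an illustration under stated assumptions, not the TUI's implementation: `ToolCall` and `gate` are hypothetical names, and treating an empty answer as approval is inferred only from the capital Y in the `[Y/n/edit]` prompt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCall:
    kind: str      # e.g. "shell" or "file_write" (illustrative labels)
    payload: str   # command line or patch text


def gate(call: ToolCall, answer: str,
         edited: Optional[str] = None) -> Optional[ToolCall]:
    """Return the call to execute, or None if the user rejected it."""
    choice = answer.strip().lower()
    if choice in ("", "y", "yes"):      # assumed default: approve
        return call
    if choice in ("n", "no"):
        return None
    if choice in ("e", "edit"):
        if edited is None:
            raise ValueError("edit chosen but no edited payload supplied")
        # The edited payload replaces the proposal; the kind is unchanged.
        return ToolCall(call.kind, edited)
    raise ValueError(f"unrecognized answer: {answer!r}")


if __name__ == "__main__":
    proposal = ToolCall("shell", "cargo test")
    print(gate(proposal, "y"))
    print(gate(proposal, "n"))
    print(gate(proposal, "edit", edited="cargo test -p codex-core"))
```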

Sandbox Lifecycle

Agent constructs tool call (shell command, file edit, patch)
codex-core routes to sandbox runner (Seatbelt on macOS, bwrap on Linux)
Command executes with optional network isolation (CODEX_SANDBOX_NETWORK_DISABLED=1)
Output returned to agent for next reasoning step
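
The routing step in this lifecycle can be sketched as a function that wraps a command in a platform-specific sandbox launcher. Everything here is illustrative: the function names are hypothetical, the Seatbelt profile string and the bwrap flags are a generic minimal choice rather than the policy Codex CLI applies, and the sketch only builds the argument list without executing anything.

```python
import sys
from typing import Optional


def sandbox_argv(command: list[str], network_disabled: bool,
                 platform: Optional[str] = None) -> list[str]:
    """Wrap `command` in a sandbox launcher chosen by platform."""
    platform = platform or sys.platform
    if platform == "darwin":
        # Illustrative Seatbelt profile, not Codex CLI's real policy.
        profile = "(version 1)(allow default)"
        if network_disabled:
            profile += "(deny network*)"
        return ["/usr/bin/sandbox-exec", "-p", profile, *command]
    if platform.startswith("linux"):
        # Read-only view of the root filesystem plus fresh /dev and /proc.
        argv = ["bwrap", "--ro-bind", "/", "/",
                "--dev", "/dev", "--proc", "/proc"]
        if network_disabled:
            argv.append("--unshare-net")  # new, empty network namespace
        return [*argv, *command]
    # Other platforms: no sandbox wrapper in this sketch.
    return list(command)


def sandbox_env(base: dict[str, str], network_disabled: bool) -> dict[str, str]:
    """Copy the environment and flag network isolation for the child."""
    env = dict(base)
    if network_disabled:
        env["CODEX_SANDBOX_NETWORK_DISABLED"] = "1"
    return env


if __name__ == "__main__":
    print(sandbox_argv(["cargo", "test"], network_disabled=True, platform="linux"))
    print(sandbox_env({}, network_disabled=True))
```

Passing the resulting list and environment to `subprocess.run` would complete the execute-and-return-output steps; that call is left out so the sketch runs on any platform.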

Skill Orchestration Flow (example: `code-review`)

User invokes code-review skill
Orchestrator skill spawns N parallel subagents (one per code-review-* skill)
Each subagent returns findings
Orchestrator collates all findings, returns numbered list with file paths + line numbers
Optionally adds code-reviewed GitHub label
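
The flow above is a parallel fan-out followed by aggregation. The sketch below shows that shape with a thread pool; `run_subagent` is a stand-in that returns a canned placeholder finding, since how Codex CLI actually spawns subagents is not described here, and the file path in the placeholder is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# The four code-review-* sub-skills listed in the Components section.
SUB_SKILLS = [
    "code-review-breaking-changes",
    "code-review-change-size",
    "code-review-context",
    "code-review-testing",
]


def run_subagent(skill: str) -> list[str]:
    """Stand-in for a real subagent; returns one placeholder finding."""
    return [f"[{skill}] path/to/file.rs:1 placeholder finding"]


def code_review(skills: list[str],
                runner: Callable[[str], list[str]] = run_subagent) -> str:
    """Fan out one worker per skill, then collate and number every finding."""
    with ThreadPoolExecutor(max_workers=len(skills) or 1) as pool:
        # pool.map preserves input order, so numbering is deterministic.
        per_skill = list(pool.map(runner, skills))
    findings = [f for group in per_skill for f in group]
    return "\n".join(f"{i}. {f}" for i, f in enumerate(findings, start=1))


if __name__ == "__main__":
    print(code_review(SUB_SKILLS))
```

No finding is dropped or deduplicated, which mirrors the skill's instruction to return every issue from every subagent.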

Memory Context

Codex CLI — Memory & Context

Session State

Conversation history: held in-process in codex-core; linear message list passed to model on each step
Trajectory files: serialized JSON trajectory saved to disk when output_path is configured; captures all messages, tool calls, and results
Config file: ~/.codex/config.toml (or project-level .codex/config.toml) stores model selection, sandbox settings, MCP server definitions
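
The relationship between the in-process history and a trajectory file can be sketched as follows. The class names and the JSON layout are hypothetical; the real trajectory schema is not described on this page.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Message:
    role: str      # e.g. "user", "assistant", "tool" (illustrative roles)
    content: str


@dataclass
class Session:
    messages: list[Message] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        """Append one message to the linear history."""
        self.messages.append(Message(role, content))

    def model_input(self) -> list[dict]:
        """The full linear history, as passed to the model on each step."""
        return [asdict(m) for m in self.messages]

    def dump_trajectory(self) -> str:
        """Serialize the history; the layout here is an assumption."""
        return json.dumps({"messages": self.model_input()}, indent=2)


if __name__ == "__main__":
    session = Session()
    session.add("user", "Rename the helper in utils.rs")
    session.add("assistant", "Proposing a patch.")
    print(session.dump_trajectory())
```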

Context Files

AGENTS.md: project-level instructions read automatically by the agent at session start (similar to CLAUDE.md); provides coding conventions, crate rules, testing procedures
Skills: loaded on invocation; injected into the system prompt or handed as a user message when the skill is called

MCP-Bridged Memory

Codex CLI is an MCP client — it can connect to any MCP server including memory/knowledge-graph servers
The codex mcp subcommand manages configured server connections
The codex-mcp crate implements the MCP connection manager and handles tool mutation

Cross-Session Handoff

~/.codex/ directory persists auth credentials and config
No built-in conversation replay/resume by default; trajectory files provide audit capability
Crash recovery: not documented as automatic; per-step trajectory saves provide partial recovery when configured

Context Compaction

Not explicitly documented in the repo. The agent loop in codex-core presumably handles context window limits by truncation or summarization depending on model config, but public docs do not confirm this.
The CODEX_SANDBOX_NETWORK_DISABLED env var signals that network-dependent tests should be skipped. It is not a compaction mechanism; it only keeps the agent from attempting work the environment cannot support.

Orchestration

Codex CLI — Orchestration

Multi-Agent Support

Yes — explicitly supported via the code-review skill pattern: the orchestrator skill spawns one subagent per code-review-* skill, running them in parallel. The SKILL.md instructs: "Use subagents to review code … One subagent per skill."

Orchestration Pattern

Parallel fan-out (for skill orchestration) + sequential interactive loop (for normal agentic use).

The code-review orchestrator is the clearest example: parallel subagent spawns, then result aggregation. Normal task execution is sequential (step → approve → execute → next step).

Isolation Mechanism

Platform	Mechanism
macOS	Apple Seatbelt (`/usr/bin/sandbox-exec`); every shell command runs in a sandboxed subprocess
Linux	bubblewrap (`bwrap`); filesystem namespacing + optional network disable
All	`CODEX_SANDBOX_NETWORK_DISABLED=1` disables network in sandboxed processes

This is the most sophisticated isolation mechanism in the batch: OS-level sandboxing (Seatbelt, bubblewrap) built into the binary rather than applied by an external wrapper.

Execution Mode

Interactive loop (primary) + one-shot (via codex exec) + background daemon (app-server for desktop integration)

Multi-Model

Single-model by default (OpenAI models only in the primary implementation)
BYOK supports any OpenAI-compatible endpoint
No documented per-role model routing (planning vs. execution)
Reasoning intensity can be set per-skill (xhigh reasoning in code-review)

Consensus Mechanism

None. Subagent results are aggregated by the orchestrator, not voted on.

Prompt Chaining

Yes. The code-review orchestrator takes each sub-skill's findings as input to its final aggregation step, so its output is the product of chaining subagent outputs.

Max Concurrent Agents

Unknown (not documented); determined by how many subagents the skill asks the agent to spawn in parallel.

UI CLI Surface

Codex CLI — UI & CLI Surface

Dedicated CLI Binary

Binary name: codex
Not a thin wrapper: self-contained Rust binary with full agent runtime, sandboxing, and TUI
Subcommands: codex (interactive), codex app, codex doctor, codex login, codex mcp, codex plugin, codex remote-control
Install: curl -fsSL https://chatgpt.com/codex/install.sh | sh or npm install -g @openai/codex or Homebrew

Terminal UI (TUI)

Type: terminal-tui
Stack: Ratatui (Rust TUI library) in codex-rs/tui/
Features:
- Interactive conversation pane with diff previews
- Tool call approval flow ([Y/n/edit] per action)
- Chat composer with multi-line input
- Status footer showing model, cost, sandbox state
- Syntax-highlighted code and patch display

Desktop App

codex app connects to the local app-server HTTP daemon
The daemon (app-server-daemon) runs as a background process
The desktop experience (ChatGPT app or standalone) communicates with this daemon
Port: unknown (configured in app-server)

IDE Integration

VS Code, Cursor, and Windsurf via a separate Codex IDE extension (not in this repo)
Referenced from README: "install in your IDE" at developers.openai.com/codex/ide

Observability

AGENTS.md documents debug procedures and test commands
codex doctor subcommand for diagnosing configuration
Trajectory JSON files capture full session history when configured
analytics crate provides optional usage analytics (opt-in)

Remote Control

codex remote-control exposes an RPC interface
app-server-transport crate handles HTTP + SSE communication with the daemon

Related frameworks

same archetype · same primary tool · same memory type

Spec Kit ★ 106k

A2 Mirror cmd+skill

Turns a natural-language feature description into a complete, versioned, AI-executable specification pipeline installable for 30+…

OpenSpec ★ 51k

A2 Mirror cmd+skill

Adds a lightweight spec layer so AI coding assistants and humans agree on what to build before any code is written.

ECC (Everything Claude Code) ★ 193k

A2 Mirror cmd+skill

Comprehensive harness-native operator system: 246 skills + 61 agents + continuous learning hooks + multi-model routing across 8…

Gemini CLI (Google) ★ 105k

A2 Mirror cmd+skill

Bring the full power of Gemini into the terminal with a free tier, Google Search grounding, and extensible MCP support.

cursorrules v5 (kinopeee) ★ 1.1k

A2 Mirror cmd+skill

Bilingual (ja/en) Cursor rule set with tricolor task classification, security-first prompt injection defense, and structured git…

windsurfrules v5 (kinopeee) ★ 364

A2 Mirror cmd+skill

Windsurf/Antigravity port of cursorrules v5 — same tricolor task classification and injection defense, translated to Windsurf's…

Distribution

Type: cli-tool
License: Apache-2.0
Install: one-liner
Version: main (2026-05-26)

Surfaces

CLI binary: codex
CLI subcmds: 7
Local UI: terminal-tui
Tech stack: Ratatui (Rust TUI library)

Components

Commands: 7
Skills: 12
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 3
Templates: 1

Workflow

Phases: 7
Approval gates: 1
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: Yes
Pattern: parallel-fan-out
Isolation: sandbox-api
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: No
BYOK: Yes
Locked to: null (OpenAI models; BYOK for compatible endpoints)
Modal: text+vision

Execution

Mode: interactive-loop
Crash recovery: No
Session handoff: No
Streaming: Yes

Memory

Type: file-based
Persistence: session
Search: none
State files: 2 files

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: jsonl
Replay: Yes

Tools

Primary: codex-cli
Targets: 3
Portability: low

Signals

Stars: 86k
Last commit: 2026-05-26
Maintainer: active
Quality score: 3.7/10
