
OpenAI Codex CLI

codex-cli · openai/codex · ★ 86k · last commit 2026-05-26

Primitive shape: 19 total (7 commands, 12 skills)
00

Summary

Codex CLI — Summary

OpenAI's official CLI coding agent, distributed as a Rust binary with a full terminal UI. It runs locally and drives LLMs (primarily the OpenAI o-series and GPT-4 family) through the OpenAI API, a ChatGPT plan, or any OpenAI-compatible endpoint. It ships its own sandboxing (macOS Seatbelt, Linux bwrap) and an app-server daemon that powers both the TUI and the optional desktop app. Unlike pure prompt frameworks, Codex CLI is a complete agent runtime: it manages conversation history, tool calling, MCP client connections, plugin/skill/environment definitions, and secure subprocess isolation in one artifact.

The repo contains two major artifacts: codex-rs (the production Rust runtime with TUI, sandboxing, and daemon) and codex-cli (an older Node.js/TypeScript wrapper retained for compatibility). Skills live in per-skill directories under .codex/skills/, each holding a lightweight SKILL.md Markdown file with YAML front-matter that the agent reads at runtime; they are distinct from commands but can orchestrate subagents via parallel Task spawns. An environment definition (environment.toml) specifies the setup script and run commands for sandbox execution.

Compared to the 11 seed frameworks: unlike superpowers (a skills-only Claude Code plugin), Codex CLI is a fully self-contained binary agent with its own sandboxed runtime, multi-model support, and app-server daemon. It is closer in architecture to claude-flow (runtime + MCP) but is built in Rust, owned by an LLM vendor, and usable with either ChatGPT plan credentials or your own API key (BYOK). It is the primary comparison baseline for this batch.

01

Overview

Codex CLI — Overview

Origin

Launched by OpenAI in 2025 as the open-source version of the cloud Codex agent. The GitHub repo (openai/codex) is the canonical home, and its Rust implementation supersedes the earlier TypeScript CLI. The project is dual-artifact: codex-rs contains the production Rust implementation; codex-cli contains the older TypeScript layer kept for npm compatibility.

Philosophy

Build the simplest possible agentic loop that is maximally safe. The AGENTS.md and codebase conventions emphasize:

  • Sandbox by default — every shell invocation runs inside macOS Seatbelt or Linux bubblewrap; network can be disabled.
  • Resist adding to codex-core — keep the shared crate lean; push new functionality into feature-specific crates.
  • Module size discipline — target under 500 LoC per Rust module; split before growing.
  • Exhaustive match patterns — avoid wildcard arms to prevent unhandled cases.

Manifesto-style quotes (from AGENTS.md)

"Never add or modify any code related to CODEX_SANDBOX_NETWORK_DISABLED_ENV_VAR or CODEX_SANDBOX_ENV_VAR."

"Resist adding code to codex-core! Particularly when introducing a new concept/feature/API, before adding to codex-core, consider whether there is an existing crate other than codex-core that is an appropriate place for your new code to live."

"Target Rust modules under 500 LoC, excluding tests. If a file exceeds roughly 800 LoC, add new functionality in a new module instead of extending the existing file."

Distribution modes

  1. curl -fsSL https://chatgpt.com/codex/install.sh | sh (first-party installer)
  2. npm install -g @openai/codex (npm package wrapping the Rust binary)
  3. brew install --cask codex (Homebrew Cask)
  4. Pre-built binaries from GitHub Releases for all major platforms

Target users

Developers who want a locally run, vendor-provided, sandboxed coding agent that integrates with ChatGPT plans.

02

Architecture

Codex CLI — Architecture

Distribution & Install

  • Primary: Pre-built Rust binary distributed via first-party install.sh, Homebrew Cask, GitHub Releases
  • npm wrapper: npm install -g @openai/codex (wraps the Rust binary)
  • Build from source: Bazel monorepo (the just command runner for tasks; cargo for Rust; pnpm for the TypeScript parts)

Required Runtime

  • The Rust binary is self-contained (no Node runtime needed for the Rust version)
  • macOS Seatbelt (/usr/bin/sandbox-exec) or Linux bubblewrap (bwrap) for sandboxing
  • git optional (for worktree / PR creation)
  • MCP servers: any MCP-compatible binary

Directory Tree (repo root)

openai/codex/
├── codex-rs/                  # Production Rust runtime
│   ├── cli/                   # Binary entry point; subcommands: app, doctor, login, mcp, plugin, remote-control
│   ├── core/                  # Agent loop, tool calling, model client (target: ~500 LoC per module)
│   ├── tui/                   # Ratatui-based terminal UI
│   ├── app-server/            # HTTP daemon (desktop app + remote use)
│   ├── app-server-daemon/     # Daemon launcher
│   ├── apply-patch/           # Patch application crate
│   ├── bwrap/                 # Linux bubblewrap sandbox crate
│   ├── codex-mcp/             # MCP connection manager
│   └── ...                    # 20+ specialized crates
├── codex-cli/                 # Legacy TypeScript CLI (npm compat)
├── .codex/
│   ├── skills/                # Per-skill directories, each with SKILL.md
│   │   ├── code-review/
│   │   ├── codex-bug/
│   │   ├── code-review-breaking-changes/
│   │   ├── code-review-change-size/
│   │   ├── code-review-context/
│   │   ├── code-review-testing/
│   │   ├── codex-issue-digest/
│   │   ├── codex-pr-body/
│   │   ├── remote-tests/
│   │   ├── test-tui/
│   │   ├── update-v8-version/
│   │   └── babysit-pr/
│   └── environments/
│       └── environment.toml   # Setup script + run actions
├── docs/                      # Contributing, install guides
├── scripts/                   # Repo maintenance scripts
└── sdk/                       # Programmatic SDK

Target AI Tools

  • OpenAI models (o-series reasoning, GPT-4 family) via API or ChatGPT plan
  • Any OpenAI-compatible endpoint (BYOK)
  • Model selection via --model flag or config

Sandboxing Architecture

Platform | Mechanism | Network-disabled signal
--- | --- | ---
macOS | Seatbelt (/usr/bin/sandbox-exec) | CODEX_SANDBOX_NETWORK_DISABLED=1 set in sandboxed processes
Linux | bubblewrap (bwrap) | same env var
Other | process isolation only | not documented
03

Components

Codex CLI — Components

Skills (.codex/skills/ — 12 directories, each with SKILL.md)

Skill | Purpose
--- | ---
code-review | Orchestrator: spawns parallel subagents (one per code-review-* skill) and collects all findings
code-review-breaking-changes | Reviews a PR for breaking API/behavior changes
code-review-change-size | Reviews PR change volume and scope
code-review-context | Reviews a PR for contextual correctness
code-review-testing | Reviews a PR for test coverage
codex-bug | Diagnoses GitHub bug reports in openai/codex and decides the next action
codex-issue-digest | Summarizes open GitHub issues
codex-pr-body | Generates a PR body from a diff
remote-tests | Triggers and monitors remote CI test runs
test-tui | Tests TUI behavior
update-v8-version | Automates the V8 version bump process
babysit-pr | Monitors a PR and responds to review comments

Skill format: Markdown with YAML front-matter (name, description), freeform instructions body. The code-review skill explicitly uses subagents.
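A minimal sketch of how a loader might split a SKILL.md into its front-matter fields and instruction body. It is not Codex's Rust loader: the Skill type and parse_skill_md function are hypothetical, and only flat "key: value" front-matter lines are handled, where a real implementation would use a YAML parser.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str  # freeform instructions that follow the front-matter


def parse_skill_md(text: str) -> Skill:
    """Split a SKILL.md file into front-matter fields and the instruction body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' front-matter fence")
    # Find the closing fence of the front-matter block.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("unterminated front-matter block")

    fields = {}
    for line in lines[1:end]:
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:  # ignore lines without a "key: value" shape
            fields[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return Skill(fields.get("name", ""), fields.get("description", ""), body)


skill = parse_skill_md(
    "---\n"
    "name: code-review\n"
    "description: Run a final code review on a pull request\n"
    "---\n"
    "\n"
    "Use subagents to review code using all code-review-* skills.\n"
)
print(skill.name)  # code-review
```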

CLI Subcommands (via codex-rs/cli/src/)

Subcommand | Purpose
--- | ---
codex (default/interactive) | Start an interactive TUI session
codex app | Launch or attach to the desktop app experience
codex doctor | Diagnose configuration and environment
codex login | Authenticate with ChatGPT or an API key
codex mcp | MCP server management commands
codex plugin | Plugin management
codex remote-control | Remote control/RPC mode

Environments (.codex/environments/)

File | Purpose
--- | ---
environment.toml | Defines name, setup.script, and [[actions]] (name + command + icon) for sandbox runs

Core Rust Crates (in codex-rs/)

Crate | Purpose
--- | ---
codex-core | Agent loop, model client, context management
codex-tui | Ratatui terminal UI
app-server | HTTP daemon for desktop app and remote use
codex-mcp | MCP connection manager (handles multiple MCP servers)
apply-patch | Applies patches to the filesystem (Codex's own file-oriented apply_patch format rather than plain unified diffs)
bwrap | Linux bubblewrap sandbox integration
cli | Binary entry point and subcommand routing
agent-identity | Agent identity/credential management
analytics | Usage analytics (opt-in)
backend-client | LLM backend HTTP client
05

Prompts

Codex CLI — Prompts

Prompt File Format

Skills use Markdown files with YAML front-matter (name, description) followed by freeform instruction text. The YAML front-matter is machine-readable; the body is human/agent-readable.

Verbatim Excerpt 1: code-review skill (orchestrator pattern)

File: .codex/skills/code-review/SKILL.md

---
name: code-review
description: Run a final code review on a pull request
---

Use subagents to review code using all code-review-* skills in this repository other than this orchestrator. One subagent per skill. Pass full skill path to subagents. Use xhigh reasoning.

You must return every single issue from every subagent. You can return an unlimited number of findings.
Use raw Markdown to report findings.
Number findings for ease of reference.
Each finding must include a specific file path and line number.

If the GitHub user running the review is the owner of the pull request add a `code-reviewed` label.
Do not leave GitHub comments unless explicitly asked.

Prompting technique: Hierarchical task decomposition — the orchestrator delegates to specialized sub-skills, enforces output format (numbered findings with file+line), controls side-effects (label vs. no comments), and sets reasoning intensity (xhigh reasoning).

Verbatim Excerpt 2: codex-bug skill (decision-tree diagnostic)

File: .codex/skills/codex-bug/SKILL.md (partial)

---
name: codex-bug
description: Diagnose GitHub bug reports in openai/codex. Use when given a GitHub issue URL from openai/codex and asked to decide next steps such as verifying against the repo, requesting more info, or explaining why it is not a bug; follow any additional user-provided instructions.
---

# Codex Bug

## Overview
Diagnose a Codex GitHub bug report and decide the next action: verify against sources, request more info, or explain why it is not a bug.

## Workflow

1. Confirm the input
- Require a GitHub issue URL that points to `github.com/openai/codex/issues/…`.
- If the URL is missing or not in the right repo, ask the user for the correct link.

2. Network access
- Always access the issue over the network immediately, even if you think access is blocked or unavailable.
- Prefer the GitHub API over HTML pages because the HTML is noisy.

3. Read the issue
...

5. Decide the course of action
- **Verify with sources** when the report is specific and likely reproducible.
- **Request more information** when the report is vague, missing repro steps, or lacks logs/environment.
- **Explain not a bug** when the report contradicts current behavior or documented constraints.

Prompting technique: Explicit decision-tree with enumerated branches and evidence requirements. Forces the agent to classify and cite before acting. Includes guard rails (URL validation, network-first stance).
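The branching in the codex-bug skill can be made concrete as a small classifier. In this hypothetical sketch the boolean inputs stand in for judgments the agent forms after reading the issue, and the precedence (wrong URL first, then not-a-bug, then verify versus request info) is an assumption: the skill text lists the branches but does not rank them.

```python
import re
from enum import Enum
from typing import Optional

# Accept issue URLs in openai/codex only, per the skill's "Confirm the input" step.
ISSUE_URL = re.compile(r"^https?://github\.com/openai/codex/issues/(\d+)/?$")


class Action(Enum):
    ASK_FOR_CORRECT_LINK = "ask the user for the correct link"
    VERIFY_WITH_SOURCES = "verify with sources"
    REQUEST_MORE_INFO = "request more information"
    EXPLAIN_NOT_A_BUG = "explain not a bug"


def issue_number(url: str) -> Optional[int]:
    """Return the issue number, or None if the URL is not an openai/codex issue."""
    m = ISSUE_URL.match(url.strip())
    return int(m.group(1)) if m else None


def decide(url: str, *, has_repro: bool, has_logs_or_env: bool,
           contradicts_documented_behavior: bool) -> Action:
    """Hypothetical encoding of the skill's decision tree."""
    if issue_number(url) is None:
        return Action.ASK_FOR_CORRECT_LINK
    if contradicts_documented_behavior:
        return Action.EXPLAIN_NOT_A_BUG
    # "Specific and likely reproducible": repro steps plus logs/environment.
    if has_repro and has_logs_or_env:
        return Action.VERIFY_WITH_SOURCES
    # Vague, missing repro steps, or lacking logs/environment.
    return Action.REQUEST_MORE_INFO
```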

Environment Definition

File: .codex/environments/environment.toml

# THIS IS AUTOGENERATED. DO NOT EDIT MANUALLY
version = 1
name = "codex"

[setup]
script = ""

[[actions]]
name = "Run"
icon = "run"
command = "cargo +1.93.0 run --manifest-path=codex-rs/Cargo.toml --bin codex -- -c mcp_oauth_credentials_store=file"

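This file tells the sandbox how to set up and run the Codex project itself and is used in development workflows: the setup script is empty, and the single Run action builds and launches the codex binary from codex-rs/ with a pinned Rust toolchain (cargo +1.93.0).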

09

Uniqueness

Codex CLI — Uniqueness & Positioning

Differs from Seeds

Codex CLI is most architecturally similar to claude-flow (both are large runtime systems with MCP client capabilities and multi-agent support), but differs in three fundamental ways: (1) it is a self-contained compiled Rust binary with no Node.js runtime dependency (the npm package only wraps the binary); (2) it ships first-party sandboxing (macOS Seatbelt + Linux bubblewrap) baked into the binary itself, making it the only tool in the batch with kernel-level process isolation; and (3) it is vendor-owned (OpenAI) and designed to work with ChatGPT plan credentials, not just BYOK. Unlike superpowers (Claude Code plugin, skills-only), Codex CLI is a fully independent agent harness that does not depend on Claude Code or any other IDE. Unlike taskmaster-ai (MCP-server architecture), Codex CLI is an MCP client that connects to external servers rather than being a server itself.

Positioning

The reference CLI agent from OpenAI — the "official answer" to the question of what a locally-run OpenAI coding agent looks like. Compared to the web-based Codex (chatgpt.com/codex), this is the terminal-first, hackable, BYOK version.

Key Differentiators

  1. First-party vendor product — officially maintained by OpenAI; aligned with ChatGPT plan credits
  2. Compiled Rust binary with sandboxing — no runtime dependency; kernel-level process isolation
  3. Parallel subagent orchestration in skills: the code-review skill demonstrates real fan-out with parallel Task spawns
  4. App-server daemon — background HTTP daemon bridges CLI ↔ desktop app ↔ remote control
  5. Environment.toml — declarative sandbox setup (script + actions) for reproducible execution contexts
  6. Skill + environment separation — skills define what to do; environments define how to run

Observable Failure Modes

  • Sandboxing friction: Seatbelt/bwrap can block legitimate file/network operations; CODEX_SANDBOX=seatbelt env var affects test behavior
  • codex-core bloat risk: The AGENTS.md explicitly warns contributors about this; the crate has grown large
  • OpenAI-only model focus: BYOK works but the UX is optimized for OpenAI models; other providers are second-class
  • Skill activation is manual: unlike superpowers/spec-driver, skills require explicit invocation (no autonomous trigger)
  • Context compaction undocumented: how the agent handles very long sessions is not described in public docs
04

Workflow

Codex CLI — Workflow

Execution Modes

Mode | Trigger | Description
--- | --- | ---
Interactive TUI | codex | Full Ratatui-based terminal UI; user types prompts, sees diffs, approves tool calls
Non-interactive / scripted | codex exec | Single-shot output, suitable for CI piping
App mode | codex app | Connects to the local app-server daemon; used by the desktop app
Remote control | codex remote-control | RPC interface for external automation
Skill invocation | /skill-name in chat or via CLI | Agent loads and executes the SKILL.md instructions

Typical Interactive Workflow

Phase | Action | Artifact
--- | --- | ---
Auth | User runs codex login or sets OPENAI_API_KEY | Saved credential
Task input | User types a natural-language prompt in the TUI | Conversation message
Planning | Agent reasons, may ask clarifying questions | Chain-of-thought in TUI
Tool use | Agent proposes shell commands / file edits | Diff preview in TUI
Approval gate | User confirms, edits, or rejects each tool call | Approved action
Execution | Agent runs approved commands in the sandbox | Output in TUI
Iteration | Agent re-prompts based on output; loop continues | Updated context
Session save | Trajectory JSON saved automatically | .codex/trajectory-*.json (if configured)

Approval Gates

  1. Tool call confirmation — every shell command or file write displayed with [Y/n/edit] prompt in TUI
  2. Login confirmation — OAuth flow or API key entry
  3. Skill invocation — user explicitly names skill; no implicit auto-invocation
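The tool-call approval gate can be sketched as a loop that refuses to execute any proposed call until the user answers. Everything here is hypothetical: ToolCall, Decision, and run_turn are illustration names, the proposals stand in for model tool calls, and execute is a stub standing in for the sandboxed runner. The three verdicts mirror the [Y/n/edit] prompt.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToolCall:
    command: str


@dataclass
class Decision:
    verdict: str  # "y", "n", or "edit", mirroring the [Y/n/edit] prompt
    edited_command: Optional[str] = None


@dataclass
class Session:
    history: list = field(default_factory=list)  # linear record fed back to the model


def run_turn(proposals: list[ToolCall],
             approve: Callable[[ToolCall], Decision],
             execute: Callable[[str], str],
             session: Session) -> None:
    """Gate every proposed tool call on a user decision before executing it."""
    for call in proposals:
        decision = approve(call)
        if decision.verdict == "n":
            session.history.append(("rejected", call.command))
            continue
        command = call.command
        if decision.verdict == "edit":
            if not decision.edited_command:
                raise ValueError("edit verdict requires an edited command")
            command = decision.edited_command
        elif decision.verdict != "y":
            raise ValueError(f"unknown verdict: {decision.verdict!r}")
        # Output goes back into the context for the next reasoning step.
        session.history.append(("ran", command, execute(command)))


session = Session()
answers = {"ls": Decision("edit", "ls -la"), "rm -rf build": Decision("n")}
run_turn(
    [ToolCall("ls"), ToolCall("rm -rf build"), ToolCall("cargo test")],
    approve=lambda call: answers.get(call.command, Decision("y")),
    execute=lambda cmd: f"<output of {cmd}>",  # stub: nothing is actually run
    session=session,
)
for entry in session.history:
    print(entry)
```

Rejected calls still land in the history, so the next model step can see what the user refused.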

Sandbox Lifecycle

  1. Agent constructs tool call (shell command, file edit, patch)
  2. codex-core routes to sandbox runner (Seatbelt on macOS, bwrap on Linux)
  3. Command executes with optional network isolation; when network access is off, the sandboxed process sees CODEX_SANDBOX_NETWORK_DISABLED=1
  4. Output returned to agent for next reasoning step
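Steps 2 and 3 amount to wrapping the command in the platform's sandbox launcher. The sketch below only builds the argv and environment and runs nothing. The Seatbelt profile is a deliberately permissive stand-in (allow everything, optionally deny network), and the bwrap flags give a read-only view of the host plus --unshare-net when network is off; Codex's actual policies are defined in its Rust sources and are not reproduced here. The function names are hypothetical.

```python
import platform
from typing import Optional

# Illustrative Seatbelt (SBPL) profiles, not Codex's real ones.
SEATBELT_NO_NETWORK = "(version 1)(allow default)(deny network*)"
SEATBELT_WITH_NETWORK = "(version 1)(allow default)"


def sandbox_argv(command: list[str], *, network: bool,
                 system: Optional[str] = None) -> list[str]:
    """Wrap `command` in the platform's sandbox launcher (step 2)."""
    system = system or platform.system()
    if system == "Darwin":
        profile = SEATBELT_WITH_NETWORK if network else SEATBELT_NO_NETWORK
        return ["/usr/bin/sandbox-exec", "-p", profile, *command]
    if system == "Linux":
        # Read-only bind of the host filesystem plus fresh /dev and /proc.
        argv = ["bwrap", "--ro-bind", "/", "/", "--dev", "/dev", "--proc", "/proc"]
        if not network:
            argv.append("--unshare-net")  # empty network namespace: no connectivity
        return argv + ["--", *command]
    # Other platforms: no OS-level sandbox in this sketch.
    return list(command)


def sandbox_env(base: dict[str, str], *, network: bool) -> dict[str, str]:
    """Copy the environment and add the network-disabled signal (step 3)."""
    env = dict(base)
    if not network:
        env["CODEX_SANDBOX_NETWORK_DISABLED"] = "1"
    return env
```

A caller would then pass both results to something like subprocess.run(argv, env=env, capture_output=True) and hand the output back to the agent (step 4).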

Skill Orchestration Flow (example: code-review)

  1. User invokes code-review skill
  2. Orchestrator skill spawns N parallel subagents (one per code-review-* skill)
  3. Each subagent returns findings
  4. Orchestrator collates all findings, returns numbered list with file paths + line numbers
  5. Optionally adds code-reviewed GitHub label
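A sketch of the fan-out and collation steps, using a thread pool in place of real subagents. run_subagent is a hypothetical callable that would spawn an agent for one skill path; the path layout follows the .codex/skills/<name>/SKILL.md tree, and the output format follows the SKILL.md rule of numbered findings with file path and line number.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    path: str
    line: int
    message: str


def review_with_subagents(skills: list[str],
                          run_subagent: Callable[[str], list[Finding]]) -> str:
    """Spawn one subagent per code-review-* skill, then return every finding."""
    # Exclude the orchestrator itself; pass the full skill path to each subagent.
    paths = [f".codex/skills/{s}/SKILL.md" for s in skills if s.startswith("code-review-")]
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        # map() runs subagents concurrently but yields results in input order.
        results = list(pool.map(run_subagent, paths))

    findings = [f for per_skill in results for f in per_skill]
    # Every finding is kept, numbered, and tied to a file path and line number.
    return "\n".join(
        f"{i}. {f.path}:{f.line}: {f.message}" for i, f in enumerate(findings, start=1)
    )
```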
06

Memory Context

Codex CLI — Memory & Context

Session State

  • Conversation history: held in-process in codex-core; linear message list passed to model on each step
  • Trajectory files: serialized JSON trajectory saved to disk when output_path is configured; captures all messages, tool calls, and results
  • Config file: ~/.codex/config.toml (or project-level .codex/config.toml) stores model selection, sandbox settings, MCP server definitions

Context Files

  • AGENTS.md: project-level instructions read automatically by the agent at session start (similar to CLAUDE.md); provides coding conventions, crate rules, testing procedures
  • Skills: loaded on invocation; injected into the system prompt or handed as a user message when the skill is called

MCP-Bridged Memory

  • Codex CLI is an MCP client — it can connect to any MCP server including memory/knowledge-graph servers
  • The codex mcp subcommand manages configured server connections
  • The codex-mcp crate implements the MCP connection manager and handles tool mutation

Cross-Session Handoff

  • ~/.codex/ directory persists auth credentials and config
  • No built-in conversation replay/resume by default; trajectory files provide audit capability
  • Crash recovery: not documented as automatic; trajectory saves on each step provide partial recovery

Context Compaction

  • Not explicitly documented in the repo; the agent loop in codex-core presumably handles context-window limits by truncation or summarization, depending on model configuration
  • The CODEX_SANDBOX_NETWORK_DISABLED env var tells code running in the sandbox that network access is unavailable, so network-dependent tests can be skipped. It is not a compaction mechanism, though it acts as a form of environment-aware trimming
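Since the repo does not document its compaction strategy, the following is only a generic illustration of the truncation technique, not Codex's implementation: keep the leading system message, then keep the newest messages that fit a token budget. The Message type and truncate_history function are hypothetical, and the four-characters-per-token estimate is a crude assumption; a real client would count tokens with the model's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def approx_tokens(msg: Message) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return max(1, len(msg.content) // 4)


def truncate_history(history: list[Message], budget: int) -> list[Message]:
    """Keep the system message plus the newest contiguous messages within `budget`.

    The newest message is always kept, even if it alone exceeds the budget.
    """
    system = [m for m in history[:1] if m.role == "system"]
    rest = history[len(system):]
    used = sum(approx_tokens(m) for m in system)
    kept: list[Message] = []
    # Walk from newest to oldest and stop at the first message that does not fit.
    for msg in reversed(rest):
        cost = approx_tokens(msg)
        if kept and used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))
```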
07

Orchestration

Codex CLI — Orchestration

Multi-Agent Support

Yes — explicitly supported via the code-review skill pattern: the orchestrator skill spawns one subagent per code-review-* skill, running them in parallel. The SKILL.md instructs: "Use subagents to review code … One subagent per skill."

Orchestration Pattern

Parallel fan-out (for skill orchestration) + sequential interactive loop (for normal agentic use).

The code-review orchestrator is the clearest example: parallel subagent spawns, then result aggregation. Normal task execution is sequential (step → approve → execute → next step).

Isolation Mechanism

Platform | Mechanism
--- | ---
macOS | Apple Seatbelt (/usr/bin/sandbox-exec); every shell command runs in a sandboxed subprocess
Linux | bubblewrap (bwrap); filesystem namespacing + optional network disable
All | CODEX_SANDBOX_NETWORK_DISABLED=1 is set in sandboxed processes when network access is disabled

This is the most sophisticated isolation mechanism in the batch — process-level sandboxing baked into the binary, not a wrapper.

Execution Mode

Interactive loop (primary) + one-shot (via codex exec) + background daemon (app-server for desktop integration)

Multi-Model

  • Single-model by default (OpenAI models only in the primary implementation)
  • BYOK supports any OpenAI-compatible endpoint
  • No documented per-role model routing (planning vs. execution)
  • Reasoning intensity can be set per-skill (xhigh reasoning in code-review)

Consensus Mechanism

None. Subagent results are aggregated by the orchestrator, not voted on.

Prompt Chaining

Yes — the code-review orchestrator passes each sub-skill's findings as inputs to its final aggregation step. The orchestrator prompt's output IS the product of chaining subagent outputs.

Max Concurrent Agents

Unknown (not documented); determined by the agent's parallel Task spawn count in the skill.

08

Ui Cli Surface

Codex CLI — UI & CLI Surface

Dedicated CLI Binary

  • Binary name: codex
  • Not a thin wrapper: self-contained Rust binary with full agent runtime, sandboxing, and TUI
  • Subcommands: codex (interactive), codex app, codex doctor, codex login, codex mcp, codex plugin, codex remote-control
  • Install: curl -fsSL https://chatgpt.com/codex/install.sh | sh or npm install -g @openai/codex or Homebrew

Terminal UI (TUI)

  • Type: terminal-tui
  • Stack: Ratatui (Rust TUI library) in codex-rs/tui/
  • Features:
    • Interactive conversation pane with diff previews
    • Tool call approval flow ([Y/n/edit] per action)
    • Chat composer with multi-line input
    • Status footer showing model, cost, sandbox state
    • Syntax-highlighted code and patch display

Desktop App

  • codex app connects to the local app-server HTTP daemon
  • The daemon (app-server-daemon) runs as a background process
  • The desktop experience (ChatGPT app or standalone) communicates with this daemon
  • Port: unknown (configured in app-server)

IDE Integration

  • VS Code, Cursor, and Windsurf via a separate Codex IDE extension (not in this repo)
  • Referenced from README: "install in your IDE" at developers.openai.com/codex/ide

Observability

  • AGENTS.md documents debug procedures and test commands
  • codex doctor subcommand for diagnosing configuration
  • Trajectory JSON files capture full session history when configured
  • analytics crate provides optional usage analytics (opt-in)

Remote Control

  • codex remote-control exposes an RPC interface
  • app-server-transport crate handles HTTP + SSE communication with the daemon

Related frameworks

same archetype · same primary tool · same memory type

Spec Kit ★ 106k

Turns a natural-language feature description into a complete, versioned, AI-executable specification pipeline installable for 30+…

OpenSpec ★ 51k

Adds a lightweight spec layer so AI coding assistants and humans agree on what to build before any code is written.

ECC (Everything Claude Code) ★ 193k

Comprehensive harness-native operator system: 246 skills + 61 agents + continuous learning hooks + multi-model routing across 8…

Gemini CLI (Google) ★ 105k

Bring the full power of Gemini into the terminal with a free tier, Google Search grounding, and extensible MCP support.

cursorrules v5 (kinopeee) ★ 1.1k

Bilingual (ja/en) Cursor rule set with tricolor task classification, security-first prompt injection defense, and structured git…

windsurfrules v5 (kinopeee) ★ 364

Windsurf/Antigravity port of cursorrules v5 — same tricolor task classification and injection defense, translated to Windsurf's…