Batch 21 Notes — Minimal & Educational Coding Agents (Codex/Gemini CLI + tiny SWE-agent family)
Roster
| # | Slug | Display Name | Stars | Language | Tier | Files |
|---|---|---|---|---|---|---|
| 1 | codex-cli |
OpenAI Codex CLI | 85,783 | Rust | A | 11 |
| 2 | gemini-cli |
Gemini CLI | 104,604 | TypeScript | A | 11 |
| 3 | ra-aid |
RA.Aid | 2,224 | Python | A | 11 |
| 4 | ra-aid-che-incubator |
che-incubator demo | 2 | YAML/Markdown | A | 11 |
| 5 | nanocoder |
Nanocoder | N/A | N/A | C | 2 |
| 6 | mini-swe-agent |
mini-swe-agent | 4,526 | Python | A | 11 |
| 7 | micro-agent |
Micro Agent (Builder.io) | 4,307 | TypeScript | A | 11 |
| 8 | pi-coding-agent |
pi (badlogic/pi-mono) | 55,166* | TypeScript | A | 11 |
| 9 | tmuxai |
TmuxAI | 1,833 | Go | A | 11 |
| 10 | hermes-ide |
Hermes IDE | 257 | TypeScript/Rust | A | 11 |
*pi-mono is a monorepo; the coding-agent package stars are part of the larger monorepo count.
Total: 9 full 11-file reports, 1 Tier C stub (nanocoder).
Tier C Items
- nanocoder (
Nanocoder-ai/nanocoder): HTTP 404 from GitHub API. No public repository found. Written as Tier C stub:00-summary.md+METRICS.yamlonly, withstatus: insufficient-public-material.
Intra-Batch Patterns
1. The Massive Baseline Pair
codex-cli (85k stars, Rust) and gemini-cli (104k stars, TypeScript) dominate by star count. Both are full-featured terminal binaries with skill/command systems, but their design philosophies diverge sharply:
- codex-cli: Rust binary, sandboxed execution (Apple Seatbelt + Linux bwrap), parallel fan-out orchestration, 12 skills
- gemini-cli: TypeScript, Google Search grounding as built-in tool, A2A (Agent-to-Agent) server, TOML command files with shell interpolation
!{cmd}
2. Intentional Minimalism as Pedagogy
mini-swe-agent (~130-line DefaultAgent class) and micro-agent (~8-line system prompt, ~200 LOC) are deliberate reference implementations. Both achieve competitive benchmark results with almost no code:
- mini-swe-agent: 74% SWE-bench Verified with bash-only tool and stateless
subprocess.run - micro-agent: TDD loop (generate test → iterate until pass) in TypeScript with explicit non-goals (no multi-file, no installs) These are the clearest demonstrations in the entire catalog that agent capability is mostly model capability — the harness is nearly irrelevant.
3. Three-Stage Pipeline as Common Pattern
RA.Aid and (less explicitly) codex-cli both use Research → Planning → Implementation pipelines. RA.Aid formalizes this as three LangGraph stages with separate prompt modules. This pattern recurs across the catalog (BMAD-METHOD, Kiro) but RA.Aid is the most explicit Python implementation.
4. GUI vs CLI Split
This batch spans both ends of the interface spectrum: codex-cli and gemini-cli are pure terminal binaries; hermes-ide is a pure desktop GUI; pi and tmuxai sit in between (terminal binary with optional rich output modes). The GUI approach (hermes-ide) is unique in the batch — it treats Claude's stream-json events as UI data, not as terminal text.
5. Provider Lock-in Patterns
- Claude-only: hermes-ide (Agent mode), pi (any provider via
packages/aiabstraction — but pi's identity is Claude-native) - Gemini-only: gemini-cli
- OpenAI-only: codex-cli, micro-agent (though micro-agent supports any OpenAI-compatible endpoint)
- Model-agnostic: ra-aid, mini-swe-agent, tmuxai
6. Memory Architecture Diversity
- SQLite-backed typed repositories: ra-aid
- Linear message history = trajectory = training data: mini-swe-agent (stateless, no persistence)
- Knowledge Base + context pins: tmuxai, hermes-ide
- No memory: micro-agent (single TDD session per invocation)
- Context files (AGENTS.md, GEMINI.md, CLAUDE.md): codex-cli, gemini-cli, pi
Most Interesting Find
RA.Aid's embedded self-critique in the planning prompt: The planning system prompt contains the line "You have often been criticized for creating overly complex solutions when simpler, more targeted fixes would suffice." This is a direct injection of known model failure modes into the system prompt as a corrective. It is a rare example of a framework explicitly encoding model weaknesses as first-class prompt content, rather than relying on instruction-following to avoid them.
Runner-up: mini-swe-agent treating the linear message history as both trajectory and training data. The framework generates SWE-bench trajectories as a side effect of normal operation, making every production run a potential training sample without any extra tooling.
Cross-References
- codex-cli is the upstream reference for the
.codex/skills/skill format — several other batch agents (codex-native bridges, batch 22) derive their skill system from this. - gemini-cli introduces the TOML command file format (
.gemini/commands/) and!{cmd}shell interpolation — compare with.claude/commands/in Anthropic's Claude CLI. - ra-aid is the most complete Python/LangGraph coding agent in the catalog. Cross-reference with batch 28 (LangChain/Pydantic harnesses).
- mini-swe-agent is a reference implementation for the SWE-agent family. Cross-reference with the original SWE-agent (batch 5) and SWE-bench evaluation papers.
- hermes-ide is the only desktop GUI in the batch. Cross-reference with Kiro (VS Code extension, batch 11) and pi (Tauri desktop, also batch 21 — but pi ships a CLI binary, hermes does not).
- tmuxai's tmux screen context pattern is unique in the catalog. No other framework reads raw terminal screen content as context.
Duplicates Encountered
None. All 10 assigned slugs were distinct frameworks. nanocoder was unavailable (404) but is not a duplicate of any other framework.
Architectural Observations for Master Catalog
- The "passthrough renderer" archetype (hermes-ide) is underrepresented in seed frameworks — none of the 11 seeds are GUI desktop apps. Hermes is the primary example.
- The "TDD loop" pattern (micro-agent) is the cleanest single-purpose agent pattern in the catalog: one input (failing test or task), one output (code that passes the test), one loop.
- The "bash-only tool" pattern (mini-swe-agent) is worth tracking — it achieves near-SOTA on SWE-bench without file read/write/edit primitives, relying entirely on shell commands.
- The A2A (Agent-to-Agent) protocol in gemini-cli is the only inter-agent communication protocol in this batch. It is experimental but signals a direction for multi-agent coordination in the Gemini ecosystem.