
AgentOps (boshu2)

agentops-boshu · boshu2/agentops · ★ 369 · last commit 2026-05-26

Primitive shape 84 total
Skills 77 Subagents 6 Hooks 1
00

Summary

AgentOps (boshu2) — Summary

AgentOps is a "Context Development Life Cycle" (CDLC) control plane that sits on top of existing coding agent runtimes (Claude Code, Codex, Cursor, OpenCode) and adds persistent .agents/ state: decisions, learnings, planning rules, and citations that compound across sessions. The project ships 77 skills, an ao CLI binary (Go), optional hooks, multi-model council consensus (/council --runtime=mixed), and the "rpi" (research-plan-implement) loop as its inner execution unit. Context is the engineering artifact: the .agents/ corpus is the "moat", a compounding knowledge base that feeds prior decisions and learnings into each new session before any code is written. AgentOps 3.0 is explicitly "hookless-first": CI is the authoritative gate, not runtime hooks, and the workflow is driven by skills and the ao CLI. Multi-model councils run up to 6 parallel judges across Claude Code and Codex CLI and produce a consensus verdict with a PASS/WARN/FAIL rollup recorded in .agents/council/.

Differs from seeds: closest to taskmaster-ai in providing a multi-phase SDLC structure with the rpi/evolve loop, and to ccmemory in persisting cross-session state in .agents/. The architectural delta: AgentOps treats context as a compiled artifact (akin to source code → binary) with promotion ratchet rules (no self-grade, fresh agent on failure, knowledge compiles into constraints) that prevent corpus rot — a concern absent from all 11 seeds.

01

Overview

AgentOps (boshu2) — Overview

Origin

AgentOps (boshu2/agentops) is developed by the AgentOps community under a non-standard license. As of 2026-05-26 it had 369 stars, 37 forks, and 8 contributors, and was actively maintained. Versions are tracked via commits. The repo was itself built with AgentOps (dogfooding): as of 2026-05-04, its .agents/ corpus held ~1,842 learnings, ~186 patterns, ~80 planning rules, and ~3,867 cited decisions.

Philosophy

AgentOps introduces the CDLC (Context Development Life Cycle) — a parallel to the SDLC where "source code" is replaced by "context" as the primary engineering artifact. The fundamental analogy:

| Software Engineering | Coding-Agent World |
|----------------------|--------------------|
| Source code | Context (corpus, planning rules, learnings) |
| SDLC | CDLC |
| Libraries (Maven, npm) | Context libraries (.agents/ corpus) |
| Compilers | Context compilers (ao compile → wiki) |
| Code review | Multi-model councils |
| CI/CD | Validation gates |
| Postmortems | Automated postmortems (/post-mortem → learnings) |

AgentOps 3.0 Thesis

"AgentOps 3.0 is the in-session agent operating loop and the context compiler that feeds it. It compiles the best of software engineering into compact, verifiable, reusable agent context, then compounds that context across every session, model, and harness. The product is the in-session loop plus the corpus it compounds."

Three Ratchet Rules (Anti-Decay)

  1. No self-grade: The agent that did the work never validates its own work
  2. Fresh agent on failure: On failure, spawn a fresh-context agent — a failed attempt's context is exhaust, not seed
  3. Knowledge becomes constraints: A learning is not durable until it compiles into a gate, a test, or a rule
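
The three rules can be read as checks on a run record. The following is a minimal Python sketch under stated assumptions: the `Attempt` and `Learning` records, their fields, and the rule labels are hypothetical names chosen for illustration, not AgentOps data structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attempt:
    author: str                            # agent that did the work
    validator: str                         # agent that graded the work
    context_id: str                        # identifier of the context used
    failed: bool = False
    retry_of: Optional["Attempt"] = None   # previous attempt, if this is a retry


@dataclass
class Learning:
    text: str
    # Gates, tests, or rules that this learning has been compiled into.
    compiled_into: List[str] = field(default_factory=list)


def ratchet_violations(attempt: Attempt) -> List[str]:
    """Return labels for rules 1 and 2 that this attempt breaks."""
    problems = []
    if attempt.author == attempt.validator:
        problems.append("no-self-grade")
    prev = attempt.retry_of
    if prev is not None and prev.failed and prev.context_id == attempt.context_id:
        problems.append("fresh-agent-on-failure")
    return problems


def is_durable(learning: Learning) -> bool:
    """Rule 3: a learning counts only once it backs a gate, test, or rule."""
    return len(learning.compiled_into) > 0
```

A retry that reuses the failed attempt's context and grades itself would report both labels; a retry in a new context with a separate validator would report none.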

Manifesto-Style Quotes

"A loop that runs without rules repeats flat. Three invariants turn repetition into compounding."

"The .agents/ corpus is the moat. Running that loop out of session is a separate concern that AgentOps delegates to an orchestration substrate."

"AgentOps 3.0 introduces no new methodology. It composes four proven practices into a small, executable waist: BDD/Gherkin, DDD, Hexagonal (ports & adapters), TDD."

02

Architecture

AgentOps (boshu2) — Architecture

Distribution

AgentOps is a multi-platform plugin + CLI system:

Claude Code:

claude plugin marketplace add boshu2/agentops
claude plugin install agentops@agentops-marketplace

Codex CLI (macOS/Linux/WSL):

curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-codex.sh | bash

OpenCode:

curl -fsSL https://raw.githubusercontent.com/boshu2/agentops/main/scripts/install-opencode.sh | bash

Any skills-compatible agent:

npx skills@latest add boshu2/agentops --cursor -g

ao CLI (macOS):

brew tap boshu2/agentops && brew install agentops && ao version

Directory Tree

skills/                     # 77 skills (see components)
  research/SKILL.md
  implement/SKILL.md
  council/SKILL.md
  rpi/SKILL.md
  ...
.agents/                    # Compounding corpus (state, not code)
  decisions/                # Cited decisions
  learnings/                # Learnings from sessions
  runs/                     # Session run artifacts
  council/                  # Council verdicts
cli/
  cmd/ao/                   # ao CLI source (Go 1.26)
  internal/                 # CLI internals
  docs/COMMANDS.md          # CLI reference
.claude/
  CLAUDE-base.md            # Vibe-coding methodology
  CLAUDE-extension.md       # System instructions
  hooks/                    # prepare-commit-msg git hook
.claude-plugin/             # Claude Code plugin config
.codex-plugin/              # Codex CLI plugin config
.opencode/                  # OpenCode plugin config
AGENTS.md, AGENTS-CI.md, AGENTS-WORKFLOW.md, AGENTS-CODEX.md, AGENTS-RUNTIME.md

Required Runtime

  • Required: one of claude / codex / opencode + git
  • Recommended: ao (Go CLI binary)
  • Optional: bd (beads issue tracking), gc (Gas City out-of-session substrate), go (build from source), jq, rg, curl, tmux
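
A preflight check for these prerequisites could look like the sketch below. It only tests whether the binaries named above are on PATH. The function name and the shape of the returned report are assumptions made for illustration, and this is not how the ao CLI itself detects its environment.

```python
import shutil
from typing import Callable, Dict, Optional

AGENT_RUNTIMES = ["claude", "codex", "opencode"]   # at least one is required
RECOMMENDED = ["ao"]
OPTIONAL = ["bd", "gc", "go", "jq", "rg", "curl", "tmux"]


def check_runtime(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, object]:
    """Report which prerequisites are on PATH; `which` is injectable for tests."""
    found_agents = [name for name in AGENT_RUNTIMES if which(name)]
    has_git = which("git") is not None
    return {
        "ok": bool(found_agents) and has_git,
        "agents": found_agents,
        "missing_recommended": [n for n in RECOMMENDED if not which(n)],
        "missing_optional": [n for n in OPTIONAL if not which(n)],
    }
```

Calling `check_runtime()` with no arguments inspects the real PATH; passing a stub for `which` keeps the check testable without installing anything.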

Target AI Tools

Claude Code, Codex CLI, Cursor, OpenCode — explicitly cross-tool by design.

03

Components

AgentOps (boshu2) — Components

Skills (77 total)

Key skills (from skills/ directory listing):

| Skill | Purpose |
|-------|---------|
| research | Explore codebase and write findings |
| implement | Execute single issue end-to-end (TDD-first) |
| council | Multi-model consensus (brainstorm/debate/verdict modes) |
| rpi | Research-plan-implement loop (inner loop) |
| evolve | N rpi ticks toward goal (outer loop) |
| crank | In-session fan-out across worktrees |
| swarm | Parallel in-session agent team |
| vibe | Post-implementation validation gate |
| validate | Pre-flight validation |
| pre-mortem | Entry-side behavior validation |
| post-mortem | Exit-side feedback → learnings |
| review | Code review |
| release | Release workflow |
| bootstrap | Bootstrap new repo/context |
| brainstorm | Divergent ideation |
| domain | DDD domain vocabulary |
| standards | Project standards |
| handoff | Cross-session context handoff |
| harvest | Extract learnings from session |
| compile | Compile corpus to wiki |
| forge | AI-assisted scaffolding |
| status | Show ratchet and corpus state |
| trace | Trace decision/learning lineage |
| scenario | BDD scenario writing |
| security / security-suite | Security analysis |
| flywheel / ship-loop | CI/CD automation |
| using-agentops | Meta-skill for skill invocation |
| using-gc | Gas City orchestration |
| hooks-authoring | Author Claude Code lifecycle hooks |
| bug-hunt | Systematic bug investigation |

...and 47 more

ao CLI Binary

Installed via brew/binary/go-build. Key commands (from cli/README.md):

| Command | Purpose |
|---------|---------|
| ao quick-start | Initialize .agents/, starter surfaces, hooks |
| ao factory start --goal "..." | Compile briefing-first startup context + Codex start |
| ao rpi phased "..." | CLI-first Discovery → Implementation → Validation |
| ao rpi status | Monitor long-running phased work |
| ao overnight start --goal "..." | Local overnight compounding |
| ao overnight report | Render Dream summary |
| ao codex stop | Close bookkeeping loop at session end |
| ao knowledge brief --goal "..." | Build task briefing |
| ao context assemble | 5-section task briefing |
| ao lookup / ao search | Corpus retrieval |
| ao compile | Compile corpus to wiki |
| ao session bootstrap | Universal init prompt for agents |
| ao forge transcript | Manual lifecycle work |
| ao flywheel close-loop | Close feedback loop |

Multi-Agent: council

/council spawns N parallel judges (3-6 by default) across available runtimes (Claude/Codex). Three modes: brainstorm, debate, verdict. Roster presets: security-audit, leadership-quartet, etc.

Hooks

.claude/hooks/prepare-commit-msg — git hook for commit message standards.

AgentOps 3.0 is hookless-first by design: CI is the authoritative gate, not Claude Code lifecycle hooks. The hooks-authoring skill teaches users to write their own hooks if needed.

.agents/ Corpus

State files written/read by AgentOps:

  • .agents/decisions/ — cited architectural decisions
  • .agents/learnings/ — accumulated learnings
  • .agents/runs/ — session run artifacts
  • .agents/council/ — council verdicts with PASS/WARN/FAIL
  • .agents/research/ — research outputs
05

Prompts

AgentOps (boshu2) — Prompts

Prompt 1: Research Skill (research/SKILL.md)

Technique: Metadata-annotated skill contract with hexagonal architecture tagging and schema-validated output

Verbatim frontmatter:

---
name: research
description: Explore and write findings.
practices:
- wiki-knowledge-surface
- pragmatic-programmer
- ddd-bounded-context
hexagonal_role: driving-adapter
consumes:
- inject
- repo-context
produces:
- .agents/research/*.md
- result.json
context_rel: []
skill_api_version: 1
allowed-tools: Read, Grep, Glob, Bash, Write
metadata:
  tier: execution
  dependencies:
  - inject
intel_scope: topic
output_contract: skills/research/schemas/findings.json
---

Analysis: The hexagonal_role, consumes, produces, and output_contract fields define a formal hexagonal architecture contract for each skill. Skills are first-class software components with defined inputs/outputs, not just prose instructions. The output_contract points to a JSON schema that validates skill output.
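
To make the contract idea concrete, the frontmatter fields can be modelled as a typed record and linted. This is a sketch only: the `SkillContract` class and the three checks in `contract_problems` are hypothetical, chosen to illustrate what a machine-checkable contract allows, and are not rules taken from AgentOps.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SkillContract:
    name: str
    hexagonal_role: str
    consumes: List[str]
    produces: List[str]
    dependencies: List[str] = field(default_factory=list)
    output_contract: str = ""


def contract_problems(skill: SkillContract) -> List[str]:
    """Hypothetical lint: illustrative checks, not AgentOps rules."""
    problems = []
    for dep in skill.dependencies:
        if dep not in skill.consumes:
            problems.append(f"dependency '{dep}' is not listed under consumes")
    if not skill.produces:
        problems.append("skill declares no outputs")
    if skill.output_contract and not skill.output_contract.endswith(".json"):
        problems.append("output_contract should point at a JSON schema file")
    return problems


# Values copied from the research skill frontmatter quoted above.
research = SkillContract(
    name="research",
    hexagonal_role="driving-adapter",
    consumes=["inject", "repo-context"],
    produces=[".agents/research/*.md", "result.json"],
    dependencies=["inject"],
    output_contract="skills/research/schemas/findings.json",
)
```

Under these illustrative checks the research skill's frontmatter passes cleanly, since its only dependency (inject) also appears under consumes.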

Prompt 2: council Skill (council/SKILL.md)

Technique: Mode taxonomy with formal deliberation lifecycle + backend-agnostic spawn contract

Verbatim excerpt:

## Modes — the deliberation taxonomy
`--mode` selects one of exactly three deliberation patterns; `verdict` is the default.

| `--mode` | Pattern | Synthesis |
|----------|---------|-----------|
| `brainstorm` | **diverge** — agents generate options independently before any cross-talk | ranked set of ideas |
| `debate` | **contend** — independent positions → adversarial 0–1000 cross-scoring → reveal round | ranked decision with recorded dissent |
| `verdict` | **converge** — agents judge the artifact against the bar independently | PASS / WARN / FAIL with consolidated findings |

### Spawn backend (MANDATORY)
Council requires a runtime that can spawn parallel subagents.
If no multi-agent capability is detected, fall back to --depth=quick (inline single-agent).

Analysis: The mode taxonomy is "frozen as an executable spec" (a .feature file). The adversarial 0-1000 cross-scoring in debate mode is a formal evaluation protocol. Backend-agnostic spawn contract means council works on Claude Native Teams, Codex Sub-Agents, Background Tasks, or Inline.
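
The excerpt does not say how the 0-1000 cross-scores in debate mode are aggregated. One plausible reading, sketched below, ranks each position by the mean of the scores it received from other judges and drops self-scores. Both the mean and the self-score exclusion are assumptions, and `rank_positions` is a hypothetical helper rather than part of the council skill.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_positions(scores: Dict[str, Dict[str, int]]) -> List[Tuple[str, float]]:
    """scores[judge][author] is the 0-1000 score a judge gave an author's position."""
    received: Dict[str, List[int]] = {}
    for judge, given in scores.items():
        for author, score in given.items():
            if judge == author:
                continue  # assumption: a judge's score of its own position is ignored
            if not 0 <= score <= 1000:
                raise ValueError(f"score out of range: {score}")
            received.setdefault(author, []).append(score)
    ranked = [(author, mean(vals)) for author, vals in received.items()]
    # Highest mean peer score first.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Ignoring self-scores keeps the ranking in line with the "no self-grade" ratchet rule: a judge cannot lift its own position.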

Prompt 3: Vibe-Coding Methodology (CLAUDE-base.md)

Technique: Applied rationality protocol with explicit before/after reasoning templates

Verbatim excerpt:

## Explicit Reasoning Protocol
BEFORE actions that could fail:

DOING: [action]
EXPECT: [specific predicted outcome]
IF WRONG: [what I'll conclude, what I'll do next]

AFTER the action:
RESULT: [what actually happened]
MATCHES: [yes/no]
THEREFORE: [conclusion and next action, or STOP if unexpected]

Analysis: Borrows from "making beliefs pay rent in anticipated experiences" (Yudkowsky/LessWrong rationality). Forces the agent to state predictions before acting and update beliefs after — a structured anti-hallucination protocol.
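
The before/after templates map naturally onto a small record plus two renderers. The sketch below is illustrative only: the `Prediction` class and both function names are hypothetical, and treating a mismatch as "STOP" followed by the pre-declared fallback is one interpretation of the template's "or STOP if unexpected" clause.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    doing: str      # the action about to be taken
    expect: str     # the specific predicted outcome
    if_wrong: str   # what to conclude or do if the prediction fails


def render_before(p: Prediction) -> str:
    """Emit the block that is written before a risky action."""
    return f"DOING: {p.doing}\nEXPECT: {p.expect}\nIF WRONG: {p.if_wrong}"


def render_after(p: Prediction, result: str, matches: bool, next_action: str) -> str:
    """Emit the block that is written after the action has run."""
    # Assumption: on a mismatch, stop and fall back to the pre-declared plan.
    therefore = next_action if matches else f"STOP: {p.if_wrong}"
    return (
        f"RESULT: {result}\n"
        f"MATCHES: {'yes' if matches else 'no'}\n"
        f"THEREFORE: {therefore}"
    )
```

Because `if_wrong` is committed before the action runs, the agent cannot rationalize a surprising result after the fact.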

09

Uniqueness

AgentOps (boshu2) — Uniqueness & Positioning

Differs From Seeds

Closest to taskmaster-ai in providing multi-phase SDLC structure (rpi/evolve loops mirror Taskmaster's task decomposition), and to ccmemory in persisting cross-session state. The architectural delta: AgentOps treats context as a compiled artifact subject to promotion ratchet rules (no self-grade, fresh agent on failure, knowledge compiles into constraints) that prevent corpus rot — a concern absent from all 11 seeds. The council skill's multi-model adversarial consensus (PASS/WARN/FAIL verdicts across Claude+Codex judges) is unique in the surveyed space. No seed ships a hexagonal architecture skill taxonomy with formal consumes/produces/output_contract metadata per skill.

Distinctive Opinion

"The .agents/ corpus is the moat. A loop that runs without rules repeats flat. Three invariants turn repetition into compounding."

The three ratchet rules (no self-grade, fresh agent on failure, knowledge becomes constraints) are the core engineering insight. Most tools accumulate context naively; AgentOps has explicit rules for when knowledge "promotes" to a durable constraint.

The "Vibe-coding methodology" (CLAUDE-base.md) imports rationalist epistemics ("making beliefs pay rent") into agent behavior — a rare philosophical stance in a tooling repo.

Positioning

AgentOps targets teams doing sustained multi-session development who want a compounding knowledge base, not just a better single-session workflow. The "dogfooding" stat (~3,867 cited decisions from building AgentOps itself) is offered as evidence for the compounding claim.

Observable Failure Modes

  1. Corpus bloat: Without active curation, .agents/ accumulates stale learnings. The ratchet rules are process constraints, not technical enforcements — teams can ignore them.
  2. ao CLI dependency: Core corpus operations (ao knowledge brief, ao compile, ao lookup) require the ao binary. Sessions without ao lose the retrieval capabilities that make the corpus valuable.
  3. Multi-runtime complexity: Supporting Claude Code, Codex CLI, Cursor, and OpenCode simultaneously creates maintenance overhead. Skill parity across runtimes is a stated concern (AGENTS-CODEX.md).
  4. Council latency: Spawning 6 parallel judges adds wall-clock latency to every consensus gate. The --depth=quick fallback exists but reduces quality.
  5. Hookless-first gap: If CI gates are not actually configured, the "CI is the authoritative gate" principle is only as strong as team discipline.

Cross-References

  • Uses bd (beads) for issue tracking — another project in the ecosystem
  • Gas City (gc) for out-of-session orchestration — a separate substrate
  • skills.sh (the skills npm package, run as npx skills@latest) as the cross-runtime install mechanism
04

Workflow

AgentOps (boshu2) — Workflow

The rpi Loop (Inner Loop)

Research → Plan → Implement → Validate. One "tick" operates on one vertical slice (one bead/issue).

  1. Research: /research <topic> — loads prior art from .agents/, explores codebase, writes .agents/research/*.md
  2. Plan: Plan phase defines behavior + acceptance criteria (BDD/Gherkin style)
  3. Implement: /implement <id> — TDD-first, commit per slice
  4. Validate: /vibe or /validate — validates behavior and contract

The evolve Loop (Outer Loop)

N rpi ticks toward a goal with post-mortem and knowledge extraction between ticks:

evolve start → select next bead → rpi tick → post-mortem → harvest learnings → loop
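
The outer loop above can be sketched as a plain function that drives one rpi tick per bead and harvests a learning after each tick. This is a simplified model, not the evolve skill itself: `rpi_tick` and `post_mortem` are hypothetical callables standing in for the real phases, and bead selection is reduced to first-in, first-out.

```python
from typing import Callable, List, Optional


def evolve(
    beads: List[str],
    rpi_tick: Callable[[str], bool],
    post_mortem: Callable[[str, bool], str],
    max_ticks: Optional[int] = None,
) -> List[str]:
    """Run up to max_ticks rpi ticks (one bead each); return harvested learnings."""
    learnings: List[str] = []
    queue = list(beads)          # copy so the caller's list is untouched
    ticks = 0
    while queue and (max_ticks is None or ticks < max_ticks):
        bead = queue.pop(0)                           # select next bead
        passed = rpi_tick(bead)                       # research, plan, implement, validate
        learnings.append(post_mortem(bead, passed))   # harvest between ticks
        ticks += 1
    return learnings
```

Note that a failed tick still yields a learning, which matches the idea that post-mortems feed the corpus regardless of outcome.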

council Gate

At any loop boundary where quality judgment is required:

/council validate this PR
/council --mode=brainstorm caching approaches
/council --mode=debate should we adopt event sourcing?
/council --depth=deep --runtime=mixed validate auth system

Spawns N parallel agents; returns PASS/WARN/FAIL with consolidated findings recorded in .agents/council/<run-id>/verdict.md.

Phase-to-Artifact Map

| Phase | Artifact |
|-------|----------|
| Research | .agents/research/<topic>.md |
| Planning | Slice validation plan + BDD behaviors |
| Implement | Git commit per slice + closed bead |
| Validate | Test results + gate verdict |
| Post-mortem | .agents/learnings/<date>-<id>.md |
| Council | .agents/council/<run-id>/verdict.md |
| Compile | .agents/wiki/*.md |

Approval Gates

AgentOps is human-in-the-loop for high-rigor work, on-the-loop for scheduled compounding:

  • Research phase: optional --auto flag skips human approval
  • Post-rpi: human reviews verdict before continuing evolve
  • Council: WARN results require human decision before proceeding
  • CI gates: canonical authority for go/no-go (hookless-first)
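
The gates above combine into a small decision function. The sketch below is one way to encode them under stated assumptions: the `next_step` function and its return labels are hypothetical, and treating a council FAIL as blocking is an assumption, since the text only states the rule for WARN.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    WARN = "WARN"
    FAIL = "FAIL"


def next_step(council: Verdict, ci_green: bool, human_approved: bool = False) -> str:
    """Combine a council verdict with CI status; CI is the final authority."""
    if not ci_green:
        return "blocked"          # the CI gate overrides everything else
    if council is Verdict.FAIL:
        return "blocked"          # assumption: FAIL cannot be waved through
    if council is Verdict.WARN and not human_approved:
        return "needs-human"      # WARN waits for a human decision
    return "proceed"
```

Checking CI first reflects the hookless-first stance: no council verdict or human approval can override a red pipeline.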

From README (verbatim trace of /research):

> /research add rate limiting to /login

[corpus]   3 prior auth decisions cited
[corpus]   2 planning rules apply: rate-limit-jitter, redis-fallback-paths
[corpus]   1 learning: 2026-03-08 — token bucket without jitter caused thundering-herd at 5/min
[findings] middleware/auth.go owns /login; no rate limiting present
[plan]     token bucket, 5/min per IP, Redis-backed, jittered per 2026-03-08 learning
[recorded] .agents/runs/2026-05-08-rate-limit/research.md
06

Memory Context

AgentOps (boshu2) — Memory & Context

.agents/ Corpus (The Moat)

AgentOps' primary memory is the .agents/ corpus — a Git-tracked, human-readable set of structured markdown files that compound across sessions.

| Path | Content |
|------|---------|
| .agents/decisions/ | Cited architectural decisions (with date, session, evidence) |
| .agents/learnings/ | Post-mortem learnings (with reference implementations) |
| .agents/runs/ | Session run artifacts per feature |
| .agents/council/<run-id>/ | Multi-model council verdicts |
| .agents/research/ | Research outputs per topic |
| .agents/rpi/ | rpi loop state |
| .agents/evolve/ | evolve loop state |
| .agents/nightly/ | Overnight Dream run artifacts |
| .agents/reconcile/ | Reconciliation artifacts |

Corpus stats as of 2026-05-04: ~1,842 learnings, ~186 patterns, ~80 planning rules, ~3,867 cited decisions.

Memory Type: File-based + Context Compilation

  1. Raw storage: Markdown files in .agents/
  2. Compiled retrieval: ao compile converts corpus to a wiki (ao knowledge brief summarizes)
  3. Search: ao lookup and ao search for direct retrieval
  4. Injection: Prior art loaded before each session via ao knowledge brief --goal
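
Because the raw store is plain markdown, even a naive retrieval step is easy to picture. The sketch below ranks files by how many query terms they contain. It is a stand-in to illustrate file-based retrieval, not the algorithm behind ao lookup or ao search, and the `lookup` function and its scoring are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def lookup(corpus_root: Path, query: str, limit: int = 5) -> List[Tuple[int, Path]]:
    """Rank markdown files under corpus_root by how many query terms they contain."""
    terms = [t.lower() for t in query.split()]
    hits = []
    for path in sorted(corpus_root.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        score = sum(1 for t in terms if t in text)   # count distinct matching terms
        if score:
            hits.append((score, path))
    # Best score first; ties broken by path for a stable order.
    hits.sort(key=lambda hit: (-hit[0], str(hit[1])))
    return hits[:limit]
```

A call such as `lookup(Path(".agents"), "rate limiting jitter")` would return the highest-scoring learnings and decisions, which a briefing step could then inject into the session.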

Context Window Management

  • context: window: fork or isolated per skill — each skill specifies its context window strategy
  • context: sections: exclude: [HISTORY, TASK] — skills can exclude certain context sections
  • ao session bootstrap — universal init prompt that orients every agent identically regardless of model

Cross-Session Handoff

The handoff skill packages current context for the next agent/session. Explicit cross-session handoff is a first-class primitive, not an afterthought.

Compaction

context: intent: mode: questions — skills specify their context mode. "Questions" mode is intent-extraction (most compact). The ao compile command distills the corpus to reduce context overhead.

.agents/ is Canonical, Not Symlinked

From AGENTS-RUNTIME.md: "no-tracked-.agents, no-symlinks, embedded-sync" — .agents/ must be a real directory, not a symlink.
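
The "real directory, not a symlink" requirement is simple to verify mechanically. A minimal sketch follows; the `agents_dir_problem` helper is hypothetical and covers only the no-symlinks part of the quoted rule.

```python
from pathlib import Path


def agents_dir_problem(repo_root: Path) -> str:
    """Return an empty string if .agents/ is a real directory, else a reason."""
    agents = repo_root / ".agents"
    # Check the symlink case first: is_dir() follows symlinks and would pass.
    if agents.is_symlink():
        return ".agents is a symlink"
    if not agents.is_dir():
        return ".agents is missing or not a directory"
    return ""
```

The order of the checks matters, because a symlink that points at a directory would otherwise be accepted.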

07

Orchestration

AgentOps (boshu2) — Orchestration

Multi-Agent Pattern

Hierarchical + Parallel fan-out

  • rpi/evolve: Sequential human-guided loop
  • crank/swarm: In-session parallel fan-out across worktrees
  • council: Parallel N-judge consensus (3-6 agents)

Council (Multi-Model Consensus)

The /council skill is the most sophisticated orchestration primitive. It:

  1. Seals an evidence packet
  2. Spawns N parallel judges (configurable roster: claude/codex mix)
  3. Runs one of three deliberation modes: brainstorm (diverge), debate (adversarial), verdict (converge)
  4. Synthesizes to one verdict: PASS / WARN / FAIL
  5. Records to .agents/council/<run-id>/verdict.md

Example: --mode=debate --adversarial runs 2 rounds of adversarial cross-scoring before synthesis.
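
The synthesis step (step 4) is not specified in detail here. One conservative rollup, sketched below, lets the most severe individual vote decide the overall verdict. The "worst vote wins" rule is an assumption, and `rollup` is a hypothetical helper, not the council skill's actual synthesis logic.

```python
from collections import Counter
from typing import Dict

SEVERITY = {"PASS": 0, "WARN": 1, "FAIL": 2}


def rollup(votes: Dict[str, str]) -> Dict[str, object]:
    """Collapse per-judge votes into one verdict; the worst vote wins (an assumption)."""
    if not votes:
        raise ValueError("no judge votes to roll up")
    unknown = [v for v in votes.values() if v not in SEVERITY]
    if unknown:
        raise ValueError(f"unknown verdicts: {unknown}")
    verdict = max(votes.values(), key=lambda v: SEVERITY[v])
    # Keep the per-verdict tally so dissent stays visible in the record.
    return {"verdict": verdict, "tally": dict(Counter(votes.values()))}
```

With two PASS votes and one WARN, this rollup reports WARN and keeps the tally, so a single dissenting judge is not averaged away.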

crank and swarm

  • crank: In-session team fan-out — multiple worktrees worked in parallel within one session
  • swarm: Peer-to-peer agent communication for distributed work

Execution Mode

  • Interactive-loop (primary): human-in-the-loop within sessions.
  • Background-daemon: ao overnight start runs the Dream flow headlessly out-of-session (optional Gas City substrate).
  • Scheduled: ao overnight setup + cron for periodic compounding.

Isolation Mechanism

worktree_per_feature — crank/swarm use git worktrees for parallel isolation. Individual rpi runs operate on the working tree directly.
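
Worktree-per-feature isolation rests on the standard `git worktree add` command. The sketch below only builds the argument list for one feature. The branch prefix and the sibling-directory layout are hypothetical naming choices, not conventions confirmed for crank or swarm.

```python
from pathlib import Path
from typing import List


def worktree_add_argv(repo: Path, feature: str, base: str = "HEAD") -> List[str]:
    """Build the `git worktree add` command for one feature's isolated checkout."""
    branch = f"feature/{feature}"                    # hypothetical branch naming
    path = repo.parent / f"{repo.name}-{feature}"    # hypothetical sibling directory
    # git -C <repo> worktree add -b <branch> <path> <commit-ish>
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]
```

The resulting list can be passed to `subprocess.run(argv, check=True)`; each parallel agent then works in its own directory and branch, so concurrent edits do not collide in one working tree.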

Multi-Model Routing

Yes — council explicitly routes judges across models:

/council --runtime=mixed validate this PR
# Spawns: claude/judge-1, claude/judge-2, codex/judge-1, codex/judge-2

The --roster flag selects preset judge panels (e.g., security-audit, leadership-quartet).

Orchestration Flow

ao factory start --goal "fix auth"
  ↓ Corpus loaded + briefing compiled
/rpi phased "fix auth"
  ↓ Research → Plan → Implement → Validate
/council validate PR (if needed)
  ↓ N judges → verdict
ao codex stop (session end)
  ↓ Learnings extracted → .agents/ corpus updated

Gas City (Out-of-Session)

Optional gc substrate runs whole ao rpi/ao evolve loops unattended. AgentOps does NOT wrap gc — it is a guided dependency. The using-gc skill documents the workflow.

08

UI / CLI Surface

AgentOps (boshu2) — UI / CLI Surface

CLI Binary: ao

  • Name: ao
  • Is thin wrapper: No — standalone Go CLI with its own runtime
  • Install: brew install agentops / release binaries / go install
  • Subcommands (from cli/README.md):

| Subcommand | Purpose |
|------------|---------|
| ao quick-start | Initialize repo with .agents/, starter knowledge, hooks |
| ao factory start --goal "..." | Compile briefing-first runtime context |
| ao rpi phased "..." | Discovery → Implementation → Validation lane |
| ao rpi status | Monitor long-running phased work |
| ao overnight setup | Detect host constraints, persist Dream config |
| ao overnight start --goal "..." | Local overnight compounding |
| ao overnight report | Render Dream summary + council state |
| ao codex stop | Close loop at session end |
| ao knowledge brief --goal "..." | Build task briefing |
| ao context assemble | 5-section task briefing |
| ao lookup | Direct corpus retrieval |
| ao search | Search corpus metadata |
| ao compile | Compile corpus → wiki |
| ao forge transcript | Manual lifecycle |
| ao flywheel close-loop | Close feedback loop |
| ao session bootstrap | Universal agent init |
| ao version | Version check |

Local UI

None. AgentOps is entirely CLI + chat interface.

IDE Integration

Works with four AI coding tools: Claude Code, Codex CLI, Cursor, and OpenCode.

Plugin files per tool:

  • .claude-plugin/ (Claude Code)
  • .codex-plugin/ (Codex CLI)
  • .opencode/ (OpenCode)

Observability (Agent)

.agents/ corpus acts as an audit log:

  • Decision provenance with session timestamps
  • Learning citations with reference implementations
  • Council verdicts with model breakdowns (per-judge votes)
  • Run artifacts with full research/plan/implement trace

CLAUDE.md

Agents entering the repo find AGENTS.md, AGENTS-CI.md, AGENTS-WORKFLOW.md, AGENTS-CODEX.md, AGENTS-RUNTIME.md — split by domain so agents can load only relevant sections.

Gas City Integration

Optional: if gc is installed, ao rpi/ao evolve can run out-of-session as a scheduled daemon.
