serpro69/claude-toolbox

serpro69-claude-toolbox · serpro69/claude-toolbox · ★ 142 · last commit 2026-05-26

Structured 11-skill development pipeline (design→review→implement→review→test→doc) with persistent Capy BM25 knowledge base, language-specific profiles for 8 languages, and Go-based cross-provider codegen targeting both Claude Code and Codex from a single canonical source.

Best when: AI skills should be minimal, curated, and battle-tested, not AI-generated slop. Cross-provider support requires code generation, not copy-paste. Isolated su…

Skip if: AI-generated skill content without human review; spawning exploration agents before exploring yourself (doubles token consumption)

vs seeds

superpowers' none…

Primitive shape 27 total

Commands 5 Skills 11 Subagents 5 Hooks 1 MCP tools 5

Summary

serpro69/claude-toolbox — Summary

One-Line Description

ELv2-licensed, battle-tested development workflow toolkit for Claude Code and OpenAI Codex: 11 pipeline skills, Capy MCP context router, multi-model review via sub-agents, and language-specific profiles — distributed as an installable plugin with Go-based cross-provider code generation.

What This Is

claude-toolbox is a structured development workflow framework built around an 11-skill pipeline covering the full cycle from idea to documentation. The canonical content lives in klaude-plugin/ (for Claude Code) and is auto-generated into kodex-plugin/ (for Codex) by a Go tool (cmd/generate-kodex/). It is not a static fork-and-use scaffold: it distributes through Claude Code's plugin marketplace system, pointing to the author's own GitHub repository as a custom marketplace, with a GitHub repository template offered as an alternative setup path.

The core loop is: /kk:design → /kk:review-design → /kk:implement → /kk:review-code → /kk:test → /kk:document

Scale

Stars: 142 (2026-05-26)
License: ELv2 (Elastic License v2)
Skills: 11 (design, review-design, implement, review-code, review-spec, test, document, diff-skill, merge-docs, chain-of-verification, dependency-handling)
Agents: 5 (code-reviewer, design-reviewer, eval-grader, profile-resolver, spec-reviewer)
Commands: 5 directories (chain-of-verification, migrate-from-taskmaster, review-code, review-spec, template)
Hooks: 1 PreToolUse Bash validator
MCP Servers: 1 (Capy — context-window routing with BM25 + SQLite knowledge.db)
Language profiles: 8 (go, java, js_ts, kotlin, k8s, k8s-operator, python, skill-md)

Target Audience

Experienced developers who want structured, repeatable agentic development workflows across multiple languages and multiple projects — with persistent context via a knowledge base and multi-model code review. Not a beginner-friendly quickstart; more like professional tooling for power users.

Installation

Add to an existing project (adopting) or use as a repository template
Enable via settings.json: "enabledPlugins": {"kk@claude-toolbox": true}
Point at custom marketplace: "extraKnownMarketplaces": {"claude-toolbox": {"source": {"source": "github", "repo": "serpro69/claude-toolbox"}}}

Skills are then available as /kk:<name>. Three setup paths: template, adopt-existing, plugin-only.

Key Differentiators

Cross-provider code generation: klaude-plugin/ is the canonical source; make generate-kodex produces Codex-compatible output (resolves ${CLAUDE_PLUGIN_ROOT}, converts agents to TOML). CI checks freshness.
Capy MCP router: Local SQLite knowledge base (knowledge.db) with BM25 search for context-window-aware retrieval. Blocks curl/wget to prevent exfiltration. Sandboxed subprocess.
Language-specific profiles: 8 distinct profile directories providing per-language implementation, testing, and review guidance — invoked dynamically via profile-resolver agent.
Multi-model code review: /kk:review-code:isolated spawns independent sub-agent reviewers with no authorship bias; supports routing to external Gemini models.
Non-permissive license: ELv2 restricts commercial SaaS use — unique among batch peers.
Opinionated behavioral CLAUDE.extra.md: Direct/challenging interaction style, explicit assumption-stating, document-deferred-work protocol — not soft guardrails but hard behavioral contracts.

Distribution

serpro69/claude-toolbox — Distribution & Installation

Distribution Type

Plugin-based (Claude Code plugin marketplace + GitHub repository template). Not a static fork-and-use scaffold; distributes as a live plugin that can be updated independently of the user's project.

Install Methods

Path 1: Template (new projects)

Use the GitHub template to create a new repository — gets the full configuration including .claude/, klaude-plugin/, template sync infrastructure, and Makefile targets.

Path 2: Adopt into existing project

Follow the adopting guide — copies .claude/ settings, hooks, and scripts; wires up the plugin; sets up template sync to pull future updates.

Path 3: Plugin-only

Add to settings.json:

{
  "enabledPlugins": {
    "kk@claude-toolbox": true
  },
  "extraKnownMarketplaces": {
    "claude-toolbox": {
      "source": {
        "source": "github",
        "repo": "serpro69/claude-toolbox"
      }
    }
  }
}
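For a project that already has a settings.json, the two keys above need to be merged in rather than pasted over the file. The following is a minimal sketch of such a merge, not part of the toolbox itself: the helper names `deep_merge` and `enable_plugin` are hypothetical, and only the two keys shown above are taken from the source.

```python
import json
from pathlib import Path

# The two keys from the plugin-only install path above.
PLUGIN_SETTINGS = {
    "enabledPlugins": {"kk@claude-toolbox": True},
    "extraKnownMarketplaces": {
        "claude-toolbox": {
            "source": {"source": "github", "repo": "serpro69/claude-toolbox"}
        }
    },
}


def deep_merge(base: dict, extra: dict) -> dict:
    """Return a new dict: nested dicts merge key by key, other values from `extra` win."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def enable_plugin(settings_path: Path) -> dict:
    """Merge the plugin keys into settings_path, creating the file if it is missing."""
    current = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    updated = deep_merge(current, PLUGIN_SETTINGS)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(updated, indent=2) + "\n")
    return updated
```

Calling `enable_plugin(Path(".claude/settings.json"))` keeps existing entries such as other enabled plugins and adds the claude-toolbox marketplace next to them.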

Config Files

File	Purpose
`.claude/settings.json`	Core settings: effortLevel, model, permissions, statusline, plugin enables, marketplace
`.claude/settings.local.json`	Machine-local overrides (not committed)
`.capy.toml`	Capy MCP config — `store.path = ".capy/knowledge.db"`
`.mcp.json`	MCP server wiring — points to `capy.sh serve`
`klaude-plugin/hooks/hooks.json`	Hook definitions shipped with the plugin

Runtime Requirements

Claude Code (primary target)
OpenAI Codex (secondary; auto-generated kodex-plugin/)
Go 1.x (for make generate-kodex — developer workflow only)
Bash (hook scripts, statusline)
Python (requirements.txt — capy dependencies)

Template Sync Infrastructure

The repo ships template-sync.sh — a mechanism to pull future toolbox updates into downstream projects. The sync script preserves project-specific customizations while applying upstream changes. Running make sync-template or equivalent keeps the .claude/ config fresh.

Makefile Targets

Target	Purpose
`make generate-kodex`	Rebuild `kodex-plugin/` and `.codex/agents/` from `klaude-plugin/` source
`make sync-template`	Pull toolbox updates into downstream project
`for test in test/test-*.sh; do $test; done`	Run test suites

License Implications

ELv2 (Elastic License v2): free to use for internal and personal projects, but the software cannot be offered to others as a hosted or managed service. This is the most restrictive license in the batch; all other frameworks use MIT/Apache/CC0/unlicensed.

Primitives

serpro69/claude-toolbox — Primitives

Claude Code Primitives Used

Skills (11 — via klaude-plugin/)

All prefixed /kk: — invoked as slash skills:

Skill	Purpose
`/kk:design`	Idea → refinement questions → design docs + task list in `docs/wip/`
`/kk:review-design`	Design gap analysis before implementation
`/kk:implement`	Execute task list with code review checkpoints between batches
`/kk:review-code`	SOLID violations, security, quality; `:isolated` variant uses sub-agent reviewers
`/kk:review-spec`	Spec quality review
`/kk:test`	Test coverage verification
`/kk:document`	Documentation generation
`/kk:diff-skill`	Diff-based skill comparison
`/kk:merge-docs`	Document merge/consolidation
`/kk:chain-of-verification`	Multi-step verification pipeline
`/kk:dependency-handling`	Dependency update and vetting workflow

Skills are organized in klaude-plugin/skills/<name>/SKILL.md with supporting files per skill.

Commands (5 directories in klaude-plugin/commands/)

Command	Variants
`chain-of-verification`	default + isolated
`migrate-from-taskmaster`	(migration utility)
`review-code`	default + isolated
`review-spec`	default + isolated
`template`	(skill authoring template)

Agents (5)

Agent	Role
`code-reviewer.md`	Independent code review sub-agent
`design-reviewer.md`	Design document review sub-agent
`eval-grader.md`	Evaluation scoring sub-agent
`profile-resolver.md`	Language/framework profile detection and injection
`spec-reviewer.md`	Specification review sub-agent

Hooks (1 event, 1 rule)

Event	Matcher	Type	Script
PreToolUse	Bash	command	`validate-bash.sh`

Pre-validates all Bash tool calls before execution. Analogous to zbruhnke-cc-starter's validate-bash.sh but shipped via plugin hooks.json rather than project-level hooks.
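The shape of such a validator can be sketched as follows. This is an illustration only, not the contents of validate-bash.sh (which is a shell script whose rules are not quoted here). It assumes the hook receives the tool call as JSON on stdin with `tool_name` and `tool_input.command` fields and that exit code 2 blocks the call; the two deny patterns are hypothetical.

```python
import json
import re
import sys

# Hypothetical deny patterns, chosen only to illustrate the mechanism.
BLOCKED = [
    (re.compile(r"\brm\s+-\w*r"), "recursive rm"),
    (re.compile(r"\bgit\s+push\b.*--force"), "force push"),
]


def validate(payload: dict) -> tuple:
    """Return (exit_code, message); exit code 2 is assumed to block the tool call."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for pattern, reason in BLOCKED:
        if pattern.search(command):
            return 2, f"Blocked Bash command ({reason}): {command}"
    return 0, ""


def main() -> int:
    """Read one hook payload from stdin and report the decision on stderr."""
    code, message = validate(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code
```

A script wired into hooks.json would end with `sys.exit(main())`, so the exit code carries the allow or block decision back to the caller.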

MCP Servers (1)

Capy — local context-window routing server:

Launched as bash .claude/scripts/capy.sh serve
SQLite knowledge store at .capy/knowledge.db
BM25 full-text search over indexed knowledge
Tools: capy_execute, capy_batch_execute, capy_fetch_and_index, capy_index, capy_search
Blocks outbound curl/wget (sandboxed subprocess)
Config: .capy.toml with store.path = ".capy/knowledge.db"

Language Profiles (8)

Stored in klaude-plugin/profiles/:

go/ — Go-specific implementation + test + review guidance
java/ — Java profile
js_ts/ — JavaScript/TypeScript profile
kotlin/ — Kotlin profile
k8s/ — Kubernetes profile
k8s-operator/ — Kubernetes operator pattern profile
python/ — Python profile
skill-md/ — Skill file authoring template

Profiles are selected dynamically via the profile-resolver agent which detects the project language and injects the appropriate profile.

Shared Instruction Pattern

klaude-plugin/skills/_shared/ contains shared instruction files referenced via per-skill symlinks (shared-<name>.md → ../_shared/<name>.md). This prevents path traversal in bundled skills while keeping instructions DRY.
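The symlink layout described above can be reproduced with a few lines. This is a sketch under stated assumptions: `link_shared` is a hypothetical helper, not a script shipped by the repo, and it only demonstrates the relative-target convention `shared-<name>.md → ../_shared/<name>.md`.

```python
import os
from pathlib import Path


def link_shared(skills_dir: Path, skill: str, name: str) -> Path:
    """Create skills/<skill>/shared-<name>.md pointing at ../_shared/<name>.md."""
    link = skills_dir / skill / f"shared-{name}.md"
    # The target is relative to the link's own directory, so the link
    # keeps working when the whole skills/ tree is copied elsewhere.
    target = Path("..") / "_shared" / f"{name}.md"
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        link.unlink()
    os.symlink(target, link)
    return link
```

Inside the skill, Markdown can then reference `shared-capy-knowledge-protocol.md` as a sibling file, with no `../` in the skill's own text.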

Codex Primitives (Generated)

kodex-plugin/ — auto-generated from klaude-plugin/ via cmd/generate-kodex/:

Resolves ${CLAUDE_PLUGIN_ROOT} variable references
Injects required headers
Parallel structure: same skills, same agents (converted to TOML at .codex/agents/)
Hand-authored Codex files (not generated): .codex/config.toml, .codex/hooks.json, .codex/rules/default.rules

Settings.json Configuration

{
  "effortLevel": "high",
  "model": "claude-opus-4-6[1m]",
  "env": {
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "64000",
    "CLAUDE_STATUSLINE_MODE": "dark",
    "CLAUDE_STATUSLINE_THEME": "catppuccin",
    "MAX_MCP_OUTPUT_TOKENS": "64000"
  },
  "permissions": {
    "allow": ["Bash(cat:*)", "Bash(grep:*)", "Bash(ls:*)", "Bash(mkdir:*)", "Bash(sort:*)", "Skill(kk:*)", "Read(~/.claude/plugins/cache/claude-toolbox/*)", "WebSearch"],
    "deny": ["Bash(rm:*)", "Read(./**/.env)", "Read(./**/*.lock*)", "Read(.git/)", ...]
  },
  "statusLine": {
    "command": "bash $CLAUDE_PROJECT_DIR/.claude/toolbox/scripts/statusline_enhanced.sh",
    "type": "command"
  }
}

Notable: effortLevel: "high" (not "max" — careful budget control), claude-opus-4-6[1m] 1M context window, explicit Bash(rm:*) deny rule.

Components

serpro69/claude-toolbox — Components

Directory Tree

.
├── .claude/                          # Claude Code config (project-level)
│   ├── settings.json                 # Core settings (model, effort, permissions, statusline, plugin)
│   ├── settings.local.json           # Machine-local overrides (not committed)
│   ├── CLAUDE.extra.md              # Behavioral instructions (independent thinking, fail-loud, deferred work protocol)
│   ├── scripts/
│   │   └── capy.sh                  # Capy MCP launch script
│   └── toolbox/
│       └── scripts/
│           ├── statusline_enhanced.sh  # Rich statusline for Catppuccin theme
│           └── template-sync.sh        # Upstream template sync
├── klaude-plugin/                    # Canonical plugin source (Claude Code)
│   ├── agents/
│   │   ├── code-reviewer.md
│   │   ├── design-reviewer.md
│   │   ├── eval-grader.md
│   │   ├── profile-resolver.md
│   │   └── spec-reviewer.md
│   ├── commands/
│   │   ├── chain-of-verification/   # default + isolated
│   │   ├── migrate-from-taskmaster/
│   │   ├── review-code/             # default + isolated
│   │   ├── review-spec/             # default + isolated
│   │   └── template/
│   ├── hooks/
│   │   └── hooks.json               # PreToolUse Bash validator
│   ├── profiles/
│   │   ├── go/, java/, js_ts/, kotlin/, k8s/, k8s-operator/, python/, skill-md/
│   ├── scripts/
│   │   └── validate-bash.sh         # Bash pre-validation hook script
│   └── skills/
│       ├── _shared/                 # DRY shared instructions (symlinked into consuming skills)
│       ├── chain-of-verification/
│       ├── dependency-handling/
│       ├── design/                  # SKILL.md, idea-process.md, refinement-criteria.md, frameworks.md, evals/, shared-capy-knowledge-protocol.md, shared-profile-detection.md
│       ├── diff-skill/
│       ├── document/
│       ├── implement/
│       ├── merge-docs/
│       ├── review-code/
│       ├── review-design/
│       ├── review-spec/
│       └── test/
├── kodex-plugin/                    # Auto-generated Codex output (from klaude-plugin/)
├── .codex/                          # Codex config
│   ├── config.toml
│   ├── hooks.json
│   ├── rules/default.rules
│   ├── scripts/
│   └── agents/*.toml                # Auto-generated from klaude-plugin/agents/*.md
├── .agents/                         # Project-level agent definitions
├── .capy.toml                       # Capy config: store.path = ".capy/knowledge.db"
├── .capy/                           # Knowledge store
│   └── knowledge.db                 # SQLite (BM25 indexed)
├── .mcp.json                        # MCP server wiring for Capy
├── cmd/
│   └── generate-kodex/              # Go tool: klaude-plugin → kodex-plugin transformer
├── test/
│   ├── helpers.sh
│   ├── test-plugin-structure.sh
│   ├── test-codex-structure.sh
│   └── test-template-sync.sh
├── docs/wip/                        # Feature design docs + task files (generated by /kk:design)
├── examples/                        # Real workflow execution examples
├── go.mod, go.sum                   # Go module (generate-kodex tool)
├── requirements.txt                 # Python dependencies (Capy)
├── Makefile                         # Targets: generate-kodex, sync-template, test
├── CLAUDE.md                        # Repo guidance for Claude
├── AGENTS.md                        # Repo guidance for Codex
└── mkdocs.yml                       # Documentation site (serpro69.github.io/claude-toolbox/)

Component Deep-Dives

klaude-plugin/skills/design/ — Most Elaborate Skill

Contains:

SKILL.md — main skill definition
idea-process.md — step-by-step idea intake workflow
existing-task-process.md — variant for pre-existing tasks
refinement-criteria.md — how to evaluate and refine ideas
frameworks.md — design framework references
evals/ — evaluation rubrics
shared-capy-knowledge-protocol.md — symlink to shared Capy integration protocol
shared-profile-detection.md — symlink to shared profile detection instructions
example-tasks.md — example task format

Capy MCP Server

Capy provides context-window-aware knowledge retrieval:

BM25 search across the knowledge database
SQLite persistence at .capy/knowledge.db — survives across sessions
Sandboxed subprocess — blocks curl/wget (prevents exfiltration)
capy_fetch_and_index — fetch a URL and index its content into knowledge.db
capy_search — BM25 search over indexed knowledge
capy_execute / capy_batch_execute — execute queries against the store

The knowledge base accumulates findings, decisions, and conventions that skills write during execution. Later invocations can query prior context via capy_search.

validate-bash.sh

PreToolUse hook that validates Bash commands before execution. Similar in intent to zbruhnke-cc-starter's validate-bash.sh but delivered via plugin hooks rather than project-level hooks.json.

cmd/generate-kodex/

Go binary that transforms klaude-plugin/ content into Codex-compatible form:

Resolves ${CLAUDE_PLUGIN_ROOT} variable (Claude Code-specific)
Rewrites /kk: prefix → $kk: (Codex convention)
Converts .md agent files to .toml format
Injects Codex-required headers

CI check: make generate-kodex && git diff --exit-code kodex-plugin/ .codex/agents/ fails if generated output is out of sync.
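The transformations listed above are simple text rewrites, which the following sketch illustrates. It is written in Python for brevity, whereas the real tool is a Go binary, and it is not a port of that tool. The function names, the value substituted for `${CLAUDE_PLUGIN_ROOT}`, and the TOML key `instructions` are all assumptions made for illustration.

```python
import json
import re


def resolve_plugin_root(text: str, root: str) -> str:
    """Replace the Claude Code-specific variable with a concrete path."""
    return text.replace("${CLAUDE_PLUGIN_ROOT}", root)


def rewrite_skill_prefix(text: str) -> str:
    """Rewrite /kk: skill references to $kk:.

    Bare labels such as kk:review-findings have no leading slash and are
    left alone; the lookbehind also skips /kk: in the middle of a path.
    """
    return re.sub(r"(?<![\w/])/kk:", "$kk:", text)


def generate(text: str, root: str) -> str:
    """Apply both rewrites to one skill or command file."""
    return rewrite_skill_prefix(resolve_plugin_root(text, root))


def agent_md_to_toml(markdown: str) -> str:
    """Convert a front-matter Markdown agent file into a small TOML document."""
    match = re.match(r"\A---\n(.*?)\n---\n?(.*)\Z", markdown, flags=re.DOTALL)
    if match is None:
        raise ValueError("agent file has no front matter block")
    header, body = match.groups()
    lines = []
    for raw in header.splitlines():
        if not raw.strip():
            continue
        key, _, value = raw.partition(":")
        # json.dumps yields a quoted, escaped string that also reads as a
        # TOML basic string for ordinary text.
        lines.append(f"{key.strip()} = {json.dumps(value.strip(), ensure_ascii=False)}")
    # `instructions` is a hypothetical key name for the agent body.
    lines.append(f"instructions = {json.dumps(body.strip(), ensure_ascii=False)}")
    return "\n".join(lines) + "\n"
```

Because the output is a pure function of the input files, a CI job can regenerate and diff, which is what the freshness check above relies on.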

Catppuccin Statusline

statusline_enhanced.sh renders a rich terminal statusline with Catppuccin theme. Configured via:

CLAUDE_STATUSLINE_THEME: "catppuccin"
CLAUDE_STATUSLINE_MODE: "dark"

The statusline is invoked on every Claude Code prompt via settings.json > statusLine.command.

Prompts

serpro69/claude-toolbox — Prompts

CLAUDE.extra.md — Behavioral Instruction Excerpt

## Behavioral Instructions

### Independent Thinking

When discussing decisions, designs, trade-offs, or approaches:

- **Be direct.** If the user is wrong, say "no, that's wrong" and explain why. Don't soften with "have you
  considered" when you mean "that won't work."
- **Push back with reasoning.** Challenge assumptions, play devil's advocate, name blind spots. Give genuine
  opinions — don't default to agreement.
- **Call out patterns.** If the user is spiraling, overthinking, making excuses, or avoiding discomfort, name
  it directly and explain the cost.
- **Authenticity over contrarianism.** When you genuinely agree, just agree. The goal is honest signal, not
  reflexive disagreement.

### Fail Loud

- **State assumptions explicitly.** If uncertain, ask. Don't guess silently.
- **Fail loud.** Flag errors explicitly. No softening, no silent corrections, no swallowed exceptions, no
  assertions you quietly relax to make a test pass.
- **Pre-existing dead code is not yours to delete.** If you notice unrelated dead code, mention it — don't
  remove it. Only remove orphans that _your_ changes made unused.

Technique: Hard behavioral contracts in CLAUDE.extra.md — not soft suggestions. The ## Exploration Phase section adds an explicit directive: "Always explore on your own to gain complete understanding. Only delegate to exploration agents if the user explicitly requests it." The comment explains why: Claude tends to spawn exploration agents and then re-read the same files, doubling token consumption.

Skill Description Budget Protocol (CLAUDE.md excerpt)

### Skill description budget

Claude Code loads skill descriptions into context so the model can pick the right skill. Two caps apply:

- **Per-entry cap: 1,536 characters.** Each skill's `description` + `when_to_use` combined text is truncated
  at 1,536 characters regardless of the global budget.
- **Global context budget.** Scales dynamically at 1% of the context window, with a fallback of 8,000
  characters.

OpenCode's documented limit for the same field is 1024 characters. For portability across both harnesses,
treat 1,024 as a soft budget for skills that must work on both.

Architecture insight: The CLAUDE.md documents the platform constraint (1,536 char per-skill cap) and explicitly notes the cross-platform constraint (1,024 for OpenCode portability). This level of platform awareness in the skill-authoring guide is unique in the batch — most frameworks don't document skill description budgets at all.
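The budget rules quoted above reduce to a small amount of arithmetic, sketched below. Two assumptions are made that the excerpt does not settle: the combined length is taken as the plain sum of the two fields, and the context window size is passed in already expressed in characters. The function names are hypothetical.

```python
from typing import Optional

PER_ENTRY_CAP = 1536          # per-skill cap from the excerpt
PORTABLE_SOFT_BUDGET = 1024   # soft budget for cross-harness portability
GLOBAL_FALLBACK_CHARS = 8000  # fallback for the global budget


def check_skill_entry(description: str, when_to_use: str = "") -> dict:
    """Report how a skill's description + when_to_use text fits the two limits."""
    combined = len(description) + len(when_to_use)  # assumption: plain sum
    return {
        "chars": combined,
        "truncated": combined > PER_ENTRY_CAP,
        "portable": combined <= PORTABLE_SOFT_BUDGET,
    }


def global_budget(context_window_chars: Optional[int] = None) -> int:
    """1% of the context window, or the 8,000-character fallback if unknown."""
    if context_window_chars is None:
        return GLOBAL_FALLBACK_CHARS
    return context_window_chars // 100
```

A skill with a 1,000-character description and a 100-character `when_to_use` would pass the per-entry cap but miss the portable budget, which is the case the soft limit is meant to flag.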

Skill Naming Conventions (CLAUDE.md excerpt)

### Skills

- **Imperative verbs over noun phrases.** `/kk:design` not `/kk:analysis-process`, `/kk:implement` not
  `/kk:implementation-process`. Drop filler suffixes like `-process`. Skills are invoked as `/skill-name`
  — shorter names are faster to type.
- **Family prefixes for grouped skills.** When multiple skills do the same action on different targets, share
  a prefix: `/kk:review-design`, `/kk:review-spec`, `/kk:review-code`.
- **Always use `/kk:` prefix.** The codex generation tool rewrites `/kk:` → `$kk:` for Codex output.
  Exception: `kk:review-findings`, `kk:lang-idioms`, etc. are capy knowledge-store labels, not skill
  references.

Architecture technique: The /kk: prefix is not just style — it's a machine-readable marker that the Go codegen tool uses to transform Claude Code skills into Codex-compatible skills.

Shared Instruction Pattern — Capy Knowledge Protocol (design skill excerpt)

From klaude-plugin/skills/design/shared-capy-knowledge-protocol.md (symlink to _shared/):

The design skill integrates Capy search at the beginning of the idea workflow — querying knowledge.db for prior design decisions, established conventions, and lessons from previous features before starting the current design. This prevents re-solving problems the project has already solved.

Architecture technique: Per-skill symlinks (shared-<name>.md → ../_shared/<name>.md) allow shared instructions to be referenced via local-relative paths within skills, which keeps links working when skills are bundled or copied — no ../ path traversal required in the skill's own Markdown.

Uniqueness

serpro69/claude-toolbox — Uniqueness Assessment

Tier Assessment: A (Genuinely Novel)

Top 5 Unique Traits

1. Cross-Provider Code Generation via Go Tooling

klaude-plugin/ is the canonical source for both Claude Code and Codex. A Go binary (cmd/generate-kodex/) transforms the Claude Code plugin format into Codex-compatible output:

${CLAUDE_PLUGIN_ROOT} variable resolved
/kk: skill prefix rewritten to $kk:
Agent .md files converted to .toml
CI enforces freshness: make generate-kodex && git diff --exit-code

No other framework in the batch or seeds maintains a single-source plugin that generates two different provider formats. This is a genuine DRY solution to the multi-provider problem — not "here are two parallel directories" but "one source, one generator, two outputs."

2. Language-Specific Profile System with Dynamic Detection

8 language profiles (go, java, js_ts, kotlin, k8s, k8s-operator, python, skill-md) with distinct instructions for implementation, testing, and code review. A dedicated profile-resolver agent detects the project's language at skill invocation time and injects the appropriate profile.

This is the most explicit language-specificity in the batch. Most frameworks provide generic instructions with brief language notes. serpro69 treats each language as a first-class citizen with its own review criteria.

3. Isolated Code Review via Command Variants (`default.md` / `isolated.md`)

The klaude-plugin/commands/<skill>/isolated.md pattern encodes bias isolation as an architectural choice: an independent code-reviewer sub-agent reviews code with zero authorship context. The /kk:review-code:isolated vs /kk:review-code distinction is the cleanest formalization of the "no-authorship-bias" review principle seen in the entire batch.

The isolated mode also supports routing to external Gemini models — making this a true multi-model review pipeline.

4. ELv2 License

The only ELv2-licensed framework in the batch (and in all 11 seeds). ELv2 prohibits using the toolkit as the basis for a hosted commercial SaaS product. This is an intentional statement: the author built this for development workflows, not to be resold as an AI developer platform.

5. Opinionated Anti-Sycophancy Behavioral Contract

CLAUDE.extra.md explicitly contracts: "If the user is wrong, say 'no, that's wrong' and explain why. Don't soften with 'have you considered' when you mean 'that won't work.'" Most frameworks use soft behavioral suggestions. serpro69's CLAUDE.extra.md reads like a professional engagement contract — no apologies, no softening, challenge wrong assumptions directly.

The Exploration Phase directive ("Always explore on your own to gain complete understanding. Only delegate to exploration agents if the user explicitly requests it") is motivated by token economics: it prevents Claude from spawning exploration agents and then re-reading the same files, which doubles token consumption.

Secondary Differentiators

Catppuccin statusline with dark mode — the most aesthetically deliberate configuration in the batch
Skill description budget documented in CLAUDE.md: explicit per-skill 1,536-char cap plus 1,024-char OpenCode portability guidance (probably the most technically detailed skill-authoring guide in the batch)
Shared instruction symlinks (shared-<name>.md → ../_shared/<name>.md) — clever bundling-safe indirection for DRY skill instructions
Template sync infrastructure (template-sync.sh) — keeps downstream projects updated without a full fork-rebase cycle
Bash(rm:*) in deny permissions — the explicit block on rm as a default deny rule is unusually cautious
Full MkDocs documentation site at serpro69.github.io/claude-toolbox/ (professional-grade external docs for a 142-star project)

What This Framework Gets Right That Others Don't

Skills compose instead of standing alone. /kk:implement runs /kk:review-code checkpoints between task batches, then hands off to /kk:test and /kk:document, so the skills reference each other explicitly. Other frameworks list skills as independent primitives.
Knowledge accumulates to the project, not the session. Capy's SQLite is committed to the repo. The knowledge base grows with the project. This means a new developer onboarding to a project immediately has access to prior design decisions — not just code history.
Multi-model review is architecturally embedded (not a one-off trick). The code-reviewer agent's ability to route to Gemini is part of the review design, not a workaround.

Limitations

ELv2 is a real barrier for teams wanting to offer hosted commercial products built on top
Capy knowledge base management: the store has no decay or pruning strategy, so knowledge.db grows unbounded and must be managed manually
11 skills is a curated minimum — teams with different workflows (e.g., no formal design phase, or async code review) may find the pipeline over-prescribed
Go toolchain dependency for cross-provider generation — adds friction for non-Go teams contributing to the project itself

Workflow

serpro69/claude-toolbox — Workflow

Core Development Pipeline

/kk:design → /kk:review-design → /kk:implement → /kk:review-code → /kk:test → /kk:document

This is the primary workflow. Each step has checkpoints and handoffs to the next.

Step-by-Step Walkthrough

Step 1: Design (`/kk:design`)

User invokes /kk:design and describes a feature idea
Claude asks refinement questions one at a time (not a batch questionnaire)
Once sufficiently refined, produces:
- docs/wip/<feature>/design.md — design document
- docs/wip/<feature>/implementation.md — implementation plan
- docs/wip/<feature>/tasks.md — task list with checkboxes, H2 headings per task, bold status/dependencies
Profile detection: profile-resolver agent detects the project language and injects the appropriate profile (go, java, js_ts, kotlin, k8s, python, etc.)
Capy knowledge query: design skill queries knowledge.db for relevant prior decisions

Step 2: Review Design (`/kk:review-design <feature>`)

Reads docs/wip/<feature>/design.md
design-reviewer agent checks for:
- Gaps in assumptions
- Missing edge cases
- Interface inconsistencies
- Security considerations
Findings go to docs/wip/<feature>/design-review.md
Any issues not fixed immediately are documented as TODO/FIXME per the deferred-work protocol

Step 3: Implement (`/kk:implement`)

Reads docs/wip/<feature>/tasks.md
Executes tasks in order, updating checkbox state
After each batch of tasks, automatically runs:
- /kk:review-code checkpoint
- Test verification checkpoint
At end of all tasks: /kk:test, then /kk:document

Step 4: Code Review (`/kk:review-code` or `/kk:review-code:isolated`)

Standard mode: Claude reviews its own code for SOLID violations, security risks, quality issues.

Isolated mode (/kk:review-code:isolated):

Spawns independent code-reviewer sub-agent with no access to authoring context
Sub-agent reviews cold — eliminates authorship bias
Optionally routes to external Gemini model for third-party perspective
Findings recorded; issues without immediate fixes get explicit TODO entries

Step 5: Test (`/kk:test`)

Runs test suites, verifies coverage. Language profile determines test framework and conventions.

Step 6: Document (`/kk:document`)

Generates or updates documentation. Writes to docs/ per language profile conventions.

Auxiliary Skills

Skill	Typical Use
`/kk:review-spec`	Pre-implementation spec review (analogous to review-design but for specs)
`/kk:chain-of-verification`	Multi-step verification for complex claims or changes
`/kk:dependency-handling`	Evaluating dependency upgrades, security vetting
`/kk:diff-skill`	Comparing two skill versions
`/kk:merge-docs`	Combining documentation fragments

Task File Format

## Task 1: <name>

**Status:** todo | in-progress | done
**Dependencies:** Task N

- [ ] Subtask A
- [ ] Subtask B

The /kk:implement skill reads and updates these files live — checkboxes get checked as tasks complete.

Knowledge Persistence (Capy)

Throughout the pipeline, skills write findings, decisions, and conventions to knowledge.db via Capy:

Design decisions recorded during /kk:design
Review findings recorded during /kk:review-code
Conventions from language profiles stored and searchable
Future sessions can query prior context via capy_search

This creates a project-scoped memory that persists beyond the session window.

Deferred Work Protocol

Per CLAUDE.extra.md:

When a fix is deferred, write it down explicitly (inline TODO: / FIXME: in code, entry in docs/wip/<feature>/tasks.md, or inline note in design doc)
"Postponed — trivial" is forbidden; must state: what was deferred, why, and concrete next step
Review outputs that identify unfixed issues must record them durably, not as chat asides

Behavioral Contract (CLAUDE.extra.md)

Be direct: Challenge wrong assumptions; say "no, that's wrong" not "have you considered"
Fail loud: Flag errors explicitly; no silent corrections
Pre-existing dead code: Mention but do not delete; only remove orphans your changes created
Explore first: Claude explores on its own and delegates to exploration agents only when the user explicitly asks (prevents double token consumption)

Memory Context

serpro69/claude-toolbox — Memory & Context

Memory Architecture

Two-tier hybrid memory:

Capy SQLite knowledge base (knowledge.db): persistent across sessions, project-scoped, BM25-searchable
docs/wip/<feature>/: file-based memory for design docs, task state, and review findings, committed to git and therefore durable across sessions

Tier 1: Capy Knowledge Base

Technology

SQLite at .capy/knowledge.db
BM25 full-text search via Capy MCP server
Not vector embeddings — keyword-match search (simpler, deterministic, no embedding API costs)
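For readers unfamiliar with BM25, the ranking idea can be shown in a few lines of pure Python. This is a generic Okapi BM25 sketch, not Capy's implementation: Capy stores its index in SQLite, while this example ranks an in-memory dict. The parameters `k1` and `b` use common defaults, and the note texts are invented.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query: str, docs: dict, k1: float = 1.5, b: float = 0.75) -> list:
    """Rank docs by Okapi BM25 score, best first; docs scoring zero are dropped."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / max(n_docs, 1)
    # Number of documents each term appears in.
    doc_freq = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The ranking is deterministic and needs no embedding model, which matches the trade-off noted above: exact keyword overlap is rewarded, while paraphrases that share no terms with the query are not found.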

What Gets Stored

Design decisions (from /kk:design executions)
Code review findings (from /kk:review-code outputs)
Project conventions and patterns
Language-profile-specific notes
Prior decisions under labels like kk:review-findings, kk:lang-idioms

How Skills Access It

Skills use the capy_search MCP tool to query prior context at invocation time. The design skill queries before starting a new design; review-code queries for established code conventions; etc.

Skills write new findings via capy_index or capy_fetch_and_index (for external URLs).

Persistence

The .capy/knowledge.db file is a regular SQLite database that persists indefinitely. It is:

Committed to the project repository (shared across contributors)
Available to all sessions working in the project directory
Not cleared between sessions

Tier 2: docs/wip/ — Feature-Level Memory

Structure

docs/wip/<feature>/
├── design.md           # Design document (written by /kk:design)
├── implementation.md   # Implementation plan
├── tasks.md           # Task list with checkboxes (updated live by /kk:implement)
└── design-review.md   # Review findings (written by /kk:review-design)

Lifecycle

Created by /kk:design at the start of a feature
Updated by /kk:implement as tasks complete (checkboxes)
Updated by review skills with findings
Committed to git — persistent and auditable
Deferred work explicitly tracked here (per CLAUDE.extra.md protocol)

CLAUDE.extra.md Behavioral Instructions

Stored in .claude/CLAUDE.extra.md — behavioral contracts that persist across sessions:

Independent thinking rules
Fail-loud protocol
Deferred work documentation requirements
Exploration-before-delegation directive

These are project-scoped and don't change between sessions.

Context Window Management

model: "claude-opus-4-6[1m]" — 1M context window reduces pressure significantly
CLAUDE_CODE_MAX_OUTPUT_TOKENS: "64000" — explicit max output token budget
MAX_MCP_OUTPUT_TOKENS: "64000" — limits Capy/MCP output token usage
Capy's role is context-window routing: rather than loading all project knowledge into the context window, skills query only what's relevant via BM25 search

Cross-Session Continuity

Mechanism	What Persists
`knowledge.db`	Decisions, findings, conventions — indefinitely
`docs/wip/<feature>/tasks.md`	Task progress, checkbox state
`docs/wip/<feature>/design.md`	Design rationale, assumptions
Git history	All of the above via commits

When a session resumes mid-feature, /kk:implement reads tasks.md and continues from where it left off. The checkpoint mechanism (code review after each batch) means partial state is always valid.

Contrast with Other Batch Members

Framework	Memory Mechanism
serpro69-claude-toolbox	SQLite BM25 (Capy) + file-based docs/wip/
notque-cc-toolkit	SQLite with confidence decay + pruning (0.05/30 days)
centminmod-cc-setup	Dual-memory: git-shared CLAUDE-*.md + machine-local memory/*.md
alexeykrol-cc-starter	SNAPSHOT.md session state (ephemeral, per-session)
others	No persistent memory

Capy's approach differs from notque's: notque implements a learning database with confidence scoring that ages and prunes entries. Capy is a pure retrieval store without decay — knowledge accumulates and must be manually managed.

Orchestration

serpro69/claude-toolbox — Orchestration

Orchestration Pattern

Sequential pipeline with isolated sub-agent option. The 11 skills form an ordered pipeline; individual review steps can spawn independent sub-agents for bias-free review.

Primary Orchestration: Skill Pipeline

/kk:design
    → profile-resolver agent (language detection)
    → Capy knowledge query (prior decisions)
    → refinement loop (sequential questions)
    → outputs: docs/wip/<feature>/{design.md, implementation.md, tasks.md}

/kk:review-design
    → design-reviewer agent
    → outputs: docs/wip/<feature>/design-review.md

/kk:implement
    → reads tasks.md
    → execute tasks in batches
    → after each batch: /kk:review-code checkpoint
    → at end: /kk:test + /kk:document

/kk:review-code:isolated
    → spawns code-reviewer sub-agent (no authoring context)
    → optional: route to external Gemini model
    → outputs: review findings

/kk:test → test execution + coverage
/kk:document → documentation generation

Sub-Agent Architecture

Agents as Role-Specific Specialists

Agent	Invocation Pattern	Why Independent
`code-reviewer`	`/kk:review-code:isolated`	No authorship bias — cold review
`design-reviewer`	`/kk:review-design`	Dedicated design analysis context
`spec-reviewer`	`/kk:review-spec`	Spec-specific evaluation frame
`eval-grader`	Evaluation tasks	Scoring with defined rubric
`profile-resolver`	At skill start	Language/framework detection

Isolation via Command Variants

The commands/ structure encodes two modes:

klaude-plugin/commands/<name>/
├── default.md     # Standard — invoked as /kk:<name>:default or /kk:<name>
└── isolated.md    # Sub-agent variant — invoked as /kk:<name>:isolated

Default mode: Claude reviews within the current context (full project awareness, potential authorship bias).

Isolated mode: A fresh code-reviewer sub-agent is spawned with only the code under review — no prior conversation, no authorship context. This is the key anti-bias mechanism.
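
The naming convention can be sketched as a small resolver. This only mirrors the file layout described above; how Claude Code itself maps plugin commands to files is not shown here, and `resolve_command` and `reviewer_input` are hypothetical helpers written for this example.

```python
from pathlib import PurePosixPath

COMMANDS_ROOT = PurePosixPath("klaude-plugin/commands")
VARIANTS = ("default", "isolated")


def resolve_command(invocation: str) -> PurePosixPath:
    """Map '/kk:<name>[:<variant>]' to the variant file that defines it."""
    prefix = "/kk:"
    if not invocation.startswith(prefix):
        raise ValueError(f"not a /kk: command: {invocation!r}")
    name, _, variant = invocation[len(prefix):].partition(":")
    variant = variant or "default"  # a bare /kk:<name> means the default variant
    if not name or variant not in VARIANTS:
        raise ValueError(f"unknown command or variant: {invocation!r}")
    return COMMANDS_ROOT / name / f"{variant}.md"


def reviewer_input(variant: str, diff: str, conversation: list[str]) -> dict:
    """Illustrates the isolation idea: the isolated reviewer sees only the code."""
    return {
        "diff": diff,
        "conversation": list(conversation) if variant == "default" else [],
    }


default_path = resolve_command("/kk:review-code")
isolated_path = resolve_command("/kk:review-code:isolated")
```

The second helper is the part that matters for bias: in the isolated case nothing from the authoring session is passed along.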

Multi-Model Support

The code-reviewer agent definition references external Gemini models as an optional third-party reviewer. This adds a different model's perspective alongside Claude's own review.

From the settings.json perspective, claude-opus-4-6[1m] is the primary model. Gemini appears only as an external model referenced by agent definitions during sub-agent invocations, not as a Claude Code model setting.

Concurrency

Skills within a pipeline step run sequentially. The pipeline itself is sequential (design before implement, etc.). The isolated sub-agent invocations are single agents per invocation — no parallel fan-out within the current implementation.

Capy as Orchestration Memory

The Capy MCP server acts as shared memory across pipeline steps:

/kk:design writes design decisions to knowledge.db
/kk:review-code queries for established conventions before reviewing
/kk:implement can query prior implementation patterns

This creates implicit orchestration continuity without requiring a session-spanning orchestrator.

Codex Orchestration (Generated)

The kodex-plugin/ directory holds equivalent Codex sub-agent definitions, generated from klaude-plugin/agents/*.md by cmd/generate-kodex/:

Agent files are converted to .toml format
Same skill pipeline available as $kk:<name> in Codex
.codex/hooks.json provides Codex-side hook equivalent

The CI check (make generate-kodex && git diff --exit-code) ensures the two providers are always in sync.
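
The idea behind that check, regenerate and fail if the committed output differs, can be sketched as follows. The real generator is a Go tool and does more than this (for example the agent-to-TOML conversion); this Python sketch covers only a /kk: to $kk: prefix rewrite, and the regex, function names, and file paths are assumptions for illustration.

```python
import re

# Rewrite "/kk:" only where it starts a command, not inside a path like "docs/kk:x".
KK_COMMAND = re.compile(r"(?<![\w/$])/kk:(?=[a-z])")


def to_codex(text: str) -> str:
    """Rewrite Claude Code command references to the Codex form."""
    return KK_COMMAND.sub("$kk:", text)


def generate(canonical: dict[str, str]) -> dict[str, str]:
    """Produce the generated tree (path -> content) from the canonical tree."""
    return {path: to_codex(body) for path, body in canonical.items()}


def drift(canonical: dict[str, str], committed: dict[str, str]) -> list[str]:
    """Paths whose committed content differs from freshly generated content."""
    expected = generate(canonical)
    paths = sorted(set(expected) | set(committed))
    return [p for p in paths if expected.get(p) != committed.get(p)]


# Hypothetical canonical source and two committed states.
CANONICAL = {"skills/design.md": "Run /kk:design, then /kk:review-design."}
IN_SYNC = {"skills/design.md": "Run $kk:design, then $kk:review-design."}
STALE = {"skills/design.md": "Run $kk:design."}
```

An empty result from `drift` corresponds to `git diff --exit-code` succeeding; any listed path means someone edited the canonical source without regenerating.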

Approval Gates

The validate-bash.sh PreToolUse hook is the primary gate: it validates Bash commands before they execute. There are no interactive approval prompts between pipeline steps; the pipeline runs to completion once started.
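
As a rough sketch of what such a gate does, the following Python stand-in inspects a PreToolUse payload and decides whether to block. It is not the contents of validate-bash.sh, which is a shell script whose rules are not shown here. The deny patterns are hypothetical, and the payload shape (`tool_name`, `tool_input.command`) and the use of exit code 2 to block follow the Claude Code hooks convention as understood at the time of writing.

```python
import json
import re
import sys

# Hypothetical deny rules; the real hook's rules are not documented here.
DENY_PATTERNS = [
    (re.compile(r"\brm\s+-rf\s+/(\s|$)"), "recursive delete of /"),
    (re.compile(r"\bgit\s+push\b.*--force\b"), "force push"),
]


def check(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message): 0 allows the call, 2 blocks it."""
    if payload.get("tool_name") != "Bash":
        return 0, ""  # only Bash commands are validated
    command = payload.get("tool_input", {}).get("command", "")
    for pattern, reason in DENY_PATTERNS:
        if pattern.search(command):
            return 2, f"blocked: {reason}"
    return 0, ""


def main() -> int:
    """Hook entry point: read the payload from stdin, report on stderr."""
    code, message = check(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code
```

Run as an actual hook script, the file would end with `sys.exit(main())`; it is left out here so the functions can be imported and exercised directly.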

Comparison

serpro69/claude-toolbox — Comparison with Seeds

Closest Seed Analogues

superpowers (closest — skill-based behavior)

Dimension	superpowers	serpro69-claude-toolbox
Distribution	Skills-only framework	Plugin (skills + agents + commands + hooks + MCP)
Skill count	~30+	11 (pipeline-focused)
Naming convention	`/skill-name`	`/kk:<name>` (namespaced)
Memory	None built-in	Capy SQLite + docs/wip/
Multi-provider	Claude Code only	Claude Code + Codex (generated)
License	MIT	ELv2 (restrictive)

The /kk: prefix pattern is analogous to superpowers' skill namespace, but serpro69 adds explicit pipeline ordering that superpowers lacks.

ccmemory (memory angle)

Dimension	ccmemory	serpro69-claude-toolbox
Persistence	File-based append	SQLite BM25 + file-based
Search	Sequential scan / grep	BM25 full-text index
Decay	None	None
Cross-session	Yes	Yes
Scope	Global	Project-scoped

ccmemory focuses purely on memory management. Capy is MCP-mediated and BM25-indexed, which makes it a fundamentally different retrieval mechanism.

spec-driver (structured development flow)

Dimension	spec-driver	serpro69-claude-toolbox
Pipeline structure	Spec → Tasks → Impl	Design → Review → Impl → Review → Test → Doc
Spec format	Structured Markdown spec	design.md + implementation.md + tasks.md
Task tracking	Task list	Checkbox tasks.md
Multi-language	No	Yes (8 profiles)
Memory	None	Capy knowledge.db

Both implement a structured code-development pipeline. serpro69 adds language profiles, persistent knowledge, and multi-model review that spec-driver lacks.

Unique Capabilities vs All Seeds

1. Cross-Provider Code Generation (No seed has this)

The Go tool cmd/generate-kodex/ transforms Claude Code plugin content → Codex format. Seeds either target Claude Code or Codex, never both with a synchronized generated output. The /kk: → $kk: rewrite plus TOML agent conversion is a unique mechanical bridge.

2. Language-Specific Profiles (Not seen in seeds)

8 distinct language profiles (including go, java, js_ts, kotlin, k8s, k8s-operator, and python) with dedicated review and implementation instructions. Seeds either ignore language differences or include only generic guidance. The profile-resolver agent injects the matching profile dynamically.
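
One way to picture profile resolution is a lookup from marker files to profile names. This is a guess at the general shape only: profile-resolver is an agent, its detection rules are not shown here, and the marker-file table below is entirely hypothetical. The k8s and k8s-operator profiles are left out because no obvious single marker file identifies them.

```python
from pathlib import PurePosixPath

# Hypothetical marker files; profile names are taken from the list above.
MARKERS = {
    "go.mod": "go",
    "pom.xml": "java",
    "package.json": "js_ts",
    "build.gradle.kts": "kotlin",
    "pyproject.toml": "python",
}


def resolve_profiles(paths: list[str]) -> list[str]:
    """Return the sorted set of profiles whose marker file appears in paths."""
    found = set()
    for path in paths:
        profile = MARKERS.get(PurePosixPath(path).name)
        if profile:
            found.add(profile)
    return sorted(found)


# A mixed repository would activate more than one profile.
profiles = resolve_profiles(["go.mod", "web/package.json", "README.md"])
```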

3. ELv2 License (Unique across all 10 batch peers and all seeds)

No seed or batch peer uses ELv2. This creates a real restriction: the toolkit cannot be offered as a commercial SaaS product by third parties.

4. `/kk:review-code:isolated` (New isolation pattern)

Command variants encoding standard vs. isolated sub-agent review in the default.md / isolated.md file naming scheme is a novel convention. Seeds have sub-agents but not this explicit bias-isolation variant pattern.

Batch Peers Comparison

vs. notque-cc-toolkit (most complex batch member)

Dimension	notque	serpro69
Skills	50+	11
Agents	44	5
Hooks	77 Python	1 (validate-bash.sh)
Memory	SQLite with confidence decay + pruning	SQLite BM25 (no decay)
Pipeline	Hierarchical orchestrator	Sequential pipeline
MCP	None	Capy (local)
License	MIT	ELv2

notque is larger and more automated; serpro69 is more opinionated and curated.

vs. centminmod-cc-setup (memory angle)

centminmod's dual-memory uses CLAUDE-*.md files (git-shared) + machine-local memory/*.md. serpro69 uses SQLite (BM25-searchable, shared via repo commit). Different tradeoffs: CLAUDE.md memory is human-readable; SQLite is machine-queryable.

vs. alex-feel-cc-toolbox (installation paradigm)

alex-feel is a meta-installer (brings nothing, installs anything). serpro69 ships concrete content (11 skills, 5 agents) but also has a sync infrastructure. They represent opposite ends: total declarative flexibility vs. curated opinionated content.

Philosophy

serpro69 is explicitly "minimal by design" (README description prefix) and "battle-tested daily." The anti-AI-slop stance ("Purely AI-made skills are hot garbage") reflects a curation philosophy that none of the seeds articulate this bluntly. The toolkit is the author's personal production toolchain made distributable — not a demonstration framework.

Related frameworks

same archetype · same primary tool · same memory type

CodeMachine CLI ★ 2.5k

A16 Cross-vendor router

JavaScript-DSL workflow orchestration engine that captures repeatable AI coding agent workflows with tracks, condition groups,…

Codexia ★ 690

A16 Cross-vendor router

Tauri desktop app providing visual control plane, task scheduler, git worktree manager, and headless REST API for Codex CLI +…

Kagan ★ 88

A16 Cross-vendor router

Kanban TUI for AI coding agents with a structurally enforced human review gate (REVIEW → DONE cannot be automated) — one git…

oh-my-claudecode (Yeachan-Heo) ★ 35k

A16 Cross-vendor router

Zero-learning-curve teams-first multi-agent orchestration for Claude Code with autopilot (6-phase lifecycle), ralph (PRD-driven…

Paseo ★ 6.8k

A16 Cross-vendor router

Multi-provider AI coding agent orchestration daemon with cross-device access (phone/desktop/CLI) and git worktree isolation.

CCG Workflow ★ 5.4k

A16 Cross-vendor router

Routes Claude + Codex + Gemini to task-appropriate collaboration strategies (direct-fix through full-collaborate) with hook-based…

Distribution

Type: plugin-marketplace
License: ELv2
Install: moderate
Version: unknown (last pushed 2026-05-26)

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: No

Components

Commands: 5
Skills: 11
Subagents: 5
Hooks: 1
MCP servers: 1
MCP tools: 5
Scripts: 4
Templates: 0

Workflow

Phases: 6
Approval gates: 1
Spec format: markdown
Spec storage: docs/wip/<feature>/{design.md, implementation.md, tasks.md}
Delta or full: whole-file

Orchestration

Multi-agent: Yes
Pattern: sequential-pipeline-with-isolated-subagents
Max concurrent: 1
Isolation: isolated-command-variant (default.md vs isolated.md)
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: interactive-pipeline
Crash recovery: partial (tasks.md checkpoint state)
Compaction: No
Session handoff: yes (knowledge.db + tasks.md)
Streaming: No

Memory

Type: sqlite-bm25 + file-based
Persistence: project-scoped
Search: bm25
State files: 3 files

Quality

TDD: No
TDD mechanism: none
Validators: 2
Self-review: isolated sub-agent variant for review-code

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: partial (tasks.md checkboxes represent progress)

Tools

Primary: claude-code
Targets: 2
Portability: medium (klaude-plugin → kodex-plugin generation)

Signals

Stars: 142
Last commit: 2026-05-26
Contributors: 4
Maintainer: active
Quality score: 2.9/10
