

serpro69-claude-toolbox · serpro69/claude-toolbox · ★ 142 · last commit 2026-05-26

Structured 11-skill development pipeline (design→review→implement→review→test→doc) with persistent Capy BM25 knowledge base, language-specific profiles for 8 languages, and Go-based cross-provider codegen targeting both Claude Code and Codex from a single canonical source.

Best when: AI skills should be minimal, curated, and battle-tested, not AI-generated slop. Cross-provider support requires code generation, not copy-paste. Isolated su…
Skip if: AI-generated skill content without human review; spawning exploration agents before exploring yourself (doubles token consumption)
Primitive shape (27 total): Commands 5 · Skills 11 · Subagents 5 · Hooks 1 · MCP tools 5

Summary

serpro69/claude-toolbox — Summary

One-Line Description

ELv2-licensed, battle-tested development workflow toolkit for Claude Code and OpenAI Codex: 11 pipeline skills, Capy MCP context router, multi-model review via sub-agents, and language-specific profiles — distributed as an installable plugin with Go-based cross-provider code generation.

What This Is

claude-toolbox is a structured development workflow framework built around an 11-skill pipeline covering the full cycle from idea to documentation. The canonical content lives in klaude-plugin/ (for Claude Code) and is auto-generated into kodex-plugin/ (for Codex) by a Go tool (cmd/generate-kodex/). It is not primarily a template to fork: the plugin is distributed through Claude Code's plugin marketplace system, with the author's own GitHub repository registered as a custom marketplace. A repository template and an adoption guide exist as alternative setup paths.

The core loop is: /kk:design → /kk:review-design → /kk:implement → /kk:review-code → /kk:test → /kk:document

Scale

  • Stars: 142 (2026-05-26)
  • License: ELv2 (Elastic License v2)
  • Skills: 11 (design, review-design, implement, review-code, review-spec, test, document, diff-skill, merge-docs, chain-of-verification, dependency-handling)
  • Agents: 5 (code-reviewer, design-reviewer, eval-grader, profile-resolver, spec-reviewer)
  • Commands: 5 directories (chain-of-verification, migrate-from-taskmaster, review-code, review-spec, template)
  • Hooks: 1 PreToolUse Bash validator
  • MCP Servers: 1 (Capy — context-window routing with BM25 + SQLite knowledge.db)
  • Language profiles: 8 (go, java, js_ts, kotlin, k8s, k8s-operator, python, skill-md)

Target Audience

Experienced developers who want structured, repeatable agentic development workflows across multiple languages and multiple projects — with persistent context via a knowledge base and multi-model code review. Not a beginner-friendly quickstart; more like professional tooling for power users.

Installation

  1. Add to an existing project (adopting) or use as a repository template
  2. Enable via settings.json: "enabledPlugins": {"kk@claude-toolbox": true}
  3. Point at custom marketplace: "extraKnownMarketplaces": {"claude-toolbox": {"source": {"source": "github", "repo": "serpro69/claude-toolbox"}}}

Skills are then available as /kk:<name>. Three setup paths: template, adopt-existing, plugin-only.

Key Differentiators

  1. Cross-provider code generation: klaude-plugin/ is the canonical source; make generate-kodex produces Codex-compatible output (resolves ${CLAUDE_PLUGIN_ROOT}, converts agents to TOML). CI checks freshness.
  2. Capy MCP router: Local SQLite knowledge base (knowledge.db) with BM25 search for context-window-aware retrieval. Blocks curl/wget to prevent exfiltration. Sandboxed subprocess.
  3. Language-specific profiles: 8 distinct profile directories providing per-language implementation, testing, and review guidance — invoked dynamically via profile-resolver agent.
  4. Multi-model code review: /kk:review-code:isolated spawns independent sub-agent reviewers with no authorship bias; supports routing to external Gemini models.
  5. Non-permissive license: ELv2 restricts commercial SaaS use — unique among batch peers.
  6. Opinionated behavioral CLAUDE.extra.md: Direct/challenging interaction style, explicit assumption-stating, document-deferred-work protocol — not soft guardrails but hard behavioral contracts.

Distribution

serpro69/claude-toolbox — Distribution & Installation

Distribution Type

Plugin-based (Claude Code plugin marketplace + GitHub repository template). Not a static fork-and-use scaffold; distributes as a live plugin that can be updated independently of the user's project.

Install Methods

Path 1: Template (new projects)

Use the GitHub template to create a new repository — gets the full configuration including .claude/, klaude-plugin/, template sync infrastructure, and Makefile targets.

Path 2: Adopt into existing project

Follow the adopting guide — copies .claude/ settings, hooks, and scripts; wires up the plugin; sets up template sync to pull future updates.

Path 3: Plugin-only

Add to settings.json:

{
  "enabledPlugins": {
    "kk@claude-toolbox": true
  },
  "extraKnownMarketplaces": {
    "claude-toolbox": {
      "source": {
        "source": "github",
        "repo": "serpro69/claude-toolbox"
      }
    }
  }
}

Config Files

| File | Purpose |
|---|---|
| .claude/settings.json | Core settings: effortLevel, model, permissions, statusline, plugin enables, marketplace |
| .claude/settings.local.json | Machine-local overrides (not committed) |
| .capy.toml | Capy MCP config; sets store.path = ".capy/knowledge.db" |
| .mcp.json | MCP server wiring; points to capy.sh serve |
| klaude-plugin/hooks/hooks.json | Hook definitions shipped with the plugin |

Runtime Requirements

  • Claude Code (primary target)
  • OpenAI Codex (secondary; auto-generated kodex-plugin/)
  • Go 1.x (for make generate-kodex — developer workflow only)
  • Bash (hook scripts, statusline)
  • Python (requirements.txt — capy dependencies)

Template Sync Infrastructure

The repo ships template-sync.sh — a mechanism to pull future toolbox updates into downstream projects. The sync script preserves project-specific customizations while applying upstream changes. Running make sync-template or equivalent keeps the .claude/ config fresh.

Makefile Targets

| Target | Purpose |
|---|---|
| make generate-kodex | Rebuild kodex-plugin/ and .codex/agents/ from klaude-plugin/ source |
| make sync-template | Pull toolbox updates into downstream project |
| for test in test/test-*.sh; do $test; done | Run test suites (shell loop, not a make target) |

License Implications

ELv2 (Elastic License v2): free to use for internal/personal projects, cannot be offered as a hosted SaaS service to others. This is the most restrictive license in the batch — all other frameworks use MIT/Apache/CC0/unlicensed.


Primitives

serpro69/claude-toolbox — Primitives

Claude Code Primitives Used

Skills (11 — via klaude-plugin/)

All prefixed /kk: — invoked as slash skills:

| Skill | Purpose |
|---|---|
| /kk:design | Idea → refinement questions → design docs + task list in docs/wip/ |
| /kk:review-design | Design gap analysis before implementation |
| /kk:implement | Execute task list with code review checkpoints between batches |
| /kk:review-code | SOLID violations, security, quality; :isolated variant uses sub-agent reviewers |
| /kk:review-spec | Spec quality review |
| /kk:test | Test coverage verification |
| /kk:document | Documentation generation |
| /kk:diff-skill | Diff-based skill comparison |
| /kk:merge-docs | Document merge/consolidation |
| /kk:chain-of-verification | Multi-step verification pipeline |
| /kk:dependency-handling | Dependency update and vetting workflow |

Skills are organized in klaude-plugin/skills/<name>/SKILL.md with supporting files per skill.

Commands (5 directories in klaude-plugin/commands/)

| Command | Variants |
|---|---|
| chain-of-verification | default + isolated |
| migrate-from-taskmaster | (migration utility) |
| review-code | default + isolated |
| review-spec | default + isolated |
| template | (skill authoring template) |

Agents (5)

| Agent | Role |
|---|---|
| code-reviewer.md | Independent code review sub-agent |
| design-reviewer.md | Design document review sub-agent |
| eval-grader.md | Evaluation scoring sub-agent |
| profile-resolver.md | Language/framework profile detection and injection |
| spec-reviewer.md | Specification review sub-agent |

Hooks (1 event, 1 rule)

| Event | Matcher | Type | Script |
|---|---|---|---|
| PreToolUse | Bash | command | validate-bash.sh |

Pre-validates all Bash tool calls before execution. Analogous to zbruhnke-cc-starter's validate-bash.sh but shipped via plugin hooks.json rather than project-level hooks.

MCP Servers (1)

Capy — local context-window routing server:

  • Launched as bash .claude/scripts/capy.sh serve
  • SQLite knowledge store at .capy/knowledge.db
  • BM25 full-text search over indexed knowledge
  • Tools: capy_execute, capy_batch_execute, capy_fetch_and_index, capy_index, capy_search
  • Blocks outbound curl/wget (sandboxed subprocess)
  • Config: .capy.toml with store.path = ".capy/knowledge.db"

Language Profiles (8)

Stored in klaude-plugin/profiles/:

  • go/ — Go-specific implementation + test + review guidance
  • java/ — Java profile
  • js_ts/ — JavaScript/TypeScript profile
  • kotlin/ — Kotlin profile
  • k8s/ — Kubernetes profile
  • k8s-operator/ — Kubernetes operator pattern profile
  • python/ — Python profile
  • skill-md/ — Skill file authoring template

Profiles are selected dynamically via the profile-resolver agent which detects the project language and injects the appropriate profile.
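
As a rough illustration of what language detection could look like, here is a minimal sketch that maps marker files to profile names. It is not the resolver itself: profile-resolver is an agent, and its actual detection rules are not shown in this report. The marker-file table and the function name `detect_profiles` are assumptions made for the example; only the profile names come from the list above.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical marker-file table. The profile names match klaude-plugin/profiles/,
# but which files trigger which profile is an assumption for illustration.
PROFILE_MARKERS = {
    "go": ["go.mod"],
    "java": ["pom.xml"],
    "kotlin": ["build.gradle.kts"],
    "js_ts": ["package.json", "tsconfig.json"],
    "python": ["pyproject.toml", "requirements.txt", "setup.py"],
}


def detect_profiles(project_root: Path) -> list[str]:
    """Return the profiles whose marker files exist directly under project_root."""
    found = []
    for profile, markers in PROFILE_MARKERS.items():
        if any((project_root / marker).is_file() for marker in markers):
            found.append(profile)
    return found
```

A project can match more than one profile (the toolbox repository itself carries both go.mod and requirements.txt), which is why the sketch returns a list rather than a single name.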

Shared Instruction Pattern

klaude-plugin/skills/_shared/ contains shared instruction files referenced via per-skill symlinks (shared-<name>.md → ../_shared/<name>.md). This prevents path traversal in bundled skills while keeping instructions DRY.

Codex Primitives (Generated)

kodex-plugin/ — auto-generated from klaude-plugin/ via cmd/generate-kodex/:

  • Resolves ${CLAUDE_PLUGIN_ROOT} variable references
  • Injects required headers
  • Parallel structure: same skills, same agents (converted to TOML at .codex/agents/)
  • Hand-authored Codex files (not generated): .codex/config.toml, .codex/hooks.json, .codex/rules/default.rules

Settings.json Configuration

{
  "effortLevel": "high",
  "model": "claude-opus-4-6[1m]",
  "env": {
    "CLAUDE_CODE_MAX_OUTPUT_TOKENS": "64000",
    "CLAUDE_STATUSLINE_MODE": "dark",
    "CLAUDE_STATUSLINE_THEME": "catppuccin",
    "MAX_MCP_OUTPUT_TOKENS": "64000"
  },
  "permissions": {
    "allow": ["Bash(cat:*)", "Bash(grep:*)", "Bash(ls:*)", "Bash(mkdir:*)", "Bash(sort:*)", "Skill(kk:*)", "Read(~/.claude/plugins/cache/claude-toolbox/*)", "WebSearch"],
    "deny": ["Bash(rm:*)", "Read(./**/.env)", "Read(./**/*.lock*)", "Read(.git/)", ...]
  },
  "statusLine": {
    "command": "bash $CLAUDE_PROJECT_DIR/.claude/toolbox/scripts/statusline_enhanced.sh",
    "type": "command"
  }
}

Notable: effortLevel: "high" (not "max" — careful budget control), claude-opus-4-6[1m] 1M context window, explicit Bash(rm:*) deny rule.


Components

serpro69/claude-toolbox — Components

Directory Tree

.
├── .claude/                          # Claude Code config (project-level)
│   ├── settings.json                 # Core settings (model, effort, permissions, statusline, plugin)
│   ├── settings.local.json           # Machine-local overrides (not committed)
│   ├── CLAUDE.extra.md              # Behavioral instructions (independent thinking, fail-loud, deferred work protocol)
│   ├── scripts/
│   │   └── capy.sh                  # Capy MCP launch script
│   └── toolbox/
│       └── scripts/
│           ├── statusline_enhanced.sh  # Rich statusline for Catppuccin theme
│           └── template-sync.sh        # Upstream template sync
├── klaude-plugin/                    # Canonical plugin source (Claude Code)
│   ├── agents/
│   │   ├── code-reviewer.md
│   │   ├── design-reviewer.md
│   │   ├── eval-grader.md
│   │   ├── profile-resolver.md
│   │   └── spec-reviewer.md
│   ├── commands/
│   │   ├── chain-of-verification/   # default + isolated
│   │   ├── migrate-from-taskmaster/
│   │   ├── review-code/             # default + isolated
│   │   ├── review-spec/             # default + isolated
│   │   └── template/
│   ├── hooks/
│   │   └── hooks.json               # PreToolUse Bash validator
│   ├── profiles/
│   │   ├── go/, java/, js_ts/, kotlin/, k8s/, k8s-operator/, python/, skill-md/
│   ├── scripts/
│   │   └── validate-bash.sh         # Bash pre-validation hook script
│   └── skills/
│       ├── _shared/                 # DRY shared instructions (symlinked into consuming skills)
│       ├── chain-of-verification/
│       ├── dependency-handling/
│       ├── design/                  # SKILL.md, idea-process.md, refinement-criteria.md, frameworks.md, evals/, shared-capy-knowledge-protocol.md, shared-profile-detection.md
│       ├── diff-skill/
│       ├── document/
│       ├── implement/
│       ├── merge-docs/
│       ├── review-code/
│       ├── review-design/
│       ├── review-spec/
│       └── test/
├── kodex-plugin/                    # Auto-generated Codex output (from klaude-plugin/)
├── .codex/                          # Codex config
│   ├── config.toml
│   ├── hooks.json
│   ├── rules/default.rules
│   ├── scripts/
│   └── agents/*.toml                # Auto-generated from klaude-plugin/agents/*.md
├── .agents/                         # Project-level agent definitions
├── .capy.toml                       # Capy config: store.path = ".capy/knowledge.db"
├── .capy/                           # Knowledge store
│   └── knowledge.db                 # SQLite (BM25 indexed)
├── .mcp.json                        # MCP server wiring for Capy
├── cmd/
│   └── generate-kodex/              # Go tool: klaude-plugin → kodex-plugin transformer
├── test/
│   ├── helpers.sh
│   ├── test-plugin-structure.sh
│   ├── test-codex-structure.sh
│   └── test-template-sync.sh
├── docs/wip/                        # Feature design docs + task files (generated by /kk:design)
├── examples/                        # Real workflow execution examples
├── go.mod, go.sum                   # Go module (generate-kodex tool)
├── requirements.txt                 # Python dependencies (Capy)
├── Makefile                         # Targets: generate-kodex, sync-template, test
├── CLAUDE.md                        # Repo guidance for Claude
├── AGENTS.md                        # Repo guidance for Codex
└── mkdocs.yml                       # Documentation site (serpro69.github.io/claude-toolbox/)

Component Deep-Dives

klaude-plugin/skills/design/ — Most Elaborate Skill

Contains:

  • SKILL.md — main skill definition
  • idea-process.md — step-by-step idea intake workflow
  • existing-task-process.md — variant for pre-existing tasks
  • refinement-criteria.md — how to evaluate and refine ideas
  • frameworks.md — design framework references
  • evals/ — evaluation rubrics
  • shared-capy-knowledge-protocol.md — symlink to shared Capy integration protocol
  • shared-profile-detection.md — symlink to shared profile detection instructions
  • example-tasks.md — example task format

Capy MCP Server

Capy provides context-window-aware knowledge retrieval:

  • BM25 search across the knowledge database
  • SQLite persistence at .capy/knowledge.db — survives across sessions
  • Sandboxed subprocess — blocks curl/wget (prevents exfiltration)
  • capy_fetch_and_index — fetch a URL and index its content into knowledge.db
  • capy_search — BM25 search over indexed knowledge
  • capy_execute / capy_batch_execute — execute queries against the store

The knowledge base accumulates findings, decisions, and conventions that skills write during execution. Later invocations can query prior context via capy_search.
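
To make the retrieval idea concrete, the following is a minimal sketch of Okapi BM25 ranking over a handful of text snippets. It is not Capy's implementation, which is not shown in this report; the whitespace tokenizer, the parameter defaults (k1 = 1.5, b = 0.75), and the function names are assumptions for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real indexers do more)."""
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Returns a list of (doc_index, score) pairs sorted by descending score.
    """
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for index, tokens in enumerate(tokenized):
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        results.append((index, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

Because scoring is purely term-based, a query only surfaces entries that share its vocabulary, which is the trade-off against embedding search noted elsewhere in this report.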

validate-bash.sh

PreToolUse hook that validates Bash commands before execution. Similar in intent to zbruhnke-cc-starter's validate-bash.sh but delivered via plugin hooks rather than project-level hooks.json.
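
The shape of such a validator can be sketched as follows. This is a Python stand-in, not the shipped Bash script, whose rules are not shown here. It assumes the hook receives a JSON event on stdin with the command under tool_input.command and that exit code 2 signals a block, which matches my understanding of Claude Code's hook contract but should be checked against the hooks documentation. The deny rules and function names are hypothetical.

```python
import json
import re

# Hypothetical deny rules; the real validate-bash.sh rules are not shown in this report.
DENY_RULES = [
    # "rm" with a flag cluster containing both r and f, e.g. -rf or -fr
    (re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+"), "recursive force delete"),
    (re.compile(r"\b(curl|wget)\b"), "outbound network fetch"),
]


def validate(command):
    """Return the reason a command is rejected, or None if it is allowed."""
    for pattern, reason in DENY_RULES:
        if pattern.search(command):
            return reason
    return None


def run_hook(stdin_text):
    """Return the exit code a hook script would use for one event payload.

    Assumed contract: 0 lets the tool call proceed, 2 blocks it.
    """
    event = json.loads(stdin_text)
    command = event.get("tool_input", {}).get("command", "")
    return 2 if validate(command) else 0


# A real hook script would end with something like:
#   sys.exit(run_hook(sys.stdin.read()))
```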

cmd/generate-kodex/

Go binary that transforms klaude-plugin/ content into Codex-compatible form:

  • Resolves ${CLAUDE_PLUGIN_ROOT} variable (Claude Code-specific)
  • Rewrites /kk: prefix → $kk: (Codex convention)
  • Converts .md agent files to .toml format
  • Injects Codex-required headers

CI check: make generate-kodex && git diff --exit-code kodex-plugin/ .codex/agents/ fails if generated output is out of sync.
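
Two of the transformations above can be sketched in a few lines. The real tool is written in Go and its source is not reproduced in this report, so this Python version is only an illustration: the regular expression, the function name `to_codex`, and the value substituted for ${CLAUDE_PLUGIN_ROOT} are all assumptions.

```python
import re

# Match "/kk:<name>" only when the slash starts a token, so bare labels such as
# "kk:review-findings" (knowledge-store labels, not skill references) are untouched.
SKILL_REF = re.compile(r"(?<![\w/])/kk:([a-z][a-z0-9-]*)")


def to_codex(text, plugin_root):
    """Resolve the plugin-root variable and rewrite /kk: skill references to $kk:."""
    text = text.replace("${CLAUDE_PLUGIN_ROOT}", plugin_root)
    return SKILL_REF.sub(r"$kk:\1", text)
```

Anchoring the match on the leading slash is what lets a single pass distinguish skill references from label strings that happen to share the kk: prefix.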

Catppuccin Statusline

statusline_enhanced.sh renders a rich terminal statusline with Catppuccin theme. Configured via:

  • CLAUDE_STATUSLINE_THEME: "catppuccin"
  • CLAUDE_STATUSLINE_MODE: "dark"

The statusline is invoked on every Claude Code prompt via settings.json > statusLine.command.


Prompts

serpro69/claude-toolbox — Prompts

CLAUDE.extra.md — Behavioral Instruction Excerpt

## Behavioral Instructions

### Independent Thinking

When discussing decisions, designs, trade-offs, or approaches:

- **Be direct.** If the user is wrong, say "no, that's wrong" and explain why. Don't soften with "have you
  considered" when you mean "that won't work."
- **Push back with reasoning.** Challenge assumptions, play devil's advocate, name blind spots. Give genuine
  opinions — don't default to agreement.
- **Call out patterns.** If the user is spiraling, overthinking, making excuses, or avoiding discomfort, name
  it directly and explain the cost.
- **Authenticity over contrarianism.** When you genuinely agree, just agree. The goal is honest signal, not
  reflexive disagreement.

### Fail Loud

- **State assumptions explicitly.** If uncertain, ask. Don't guess silently.
- **Fail loud.** Flag errors explicitly. No softening, no silent corrections, no swallowed exceptions, no
  assertions you quietly relax to make a test pass.
- **Pre-existing dead code is not yours to delete.** If you notice unrelated dead code, mention it — don't
  remove it. Only remove orphans that _your_ changes made unused.

Technique: Hard behavioral contracts in CLAUDE.extra.md — not soft suggestions. The ## Exploration Phase section adds an explicit directive: "Always explore on your own to gain complete understanding. Only delegate to exploration agents if the user explicitly requests it." The comment explains why: Claude tends to spawn exploration agents and then re-read the same files, doubling token consumption.


Skill Description Budget Protocol (CLAUDE.md excerpt)

### Skill description budget

Claude Code loads skill descriptions into context so the model can pick the right skill. Two caps apply:

- **Per-entry cap: 1,536 characters.** Each skill's `description` + `when_to_use` combined text is truncated
  at 1,536 characters regardless of the global budget.
- **Global context budget.** Scales dynamically at 1% of the context window, with a fallback of 8,000
  characters.

OpenCode's documented limit for the same field is 1024 characters. For portability across both harnesses,
treat 1,024 as a soft budget for skills that must work on both.

Architecture insight: The CLAUDE.md documents the platform constraint (1,536 char per-skill cap) and explicitly notes the cross-platform constraint (1,024 for OpenCode portability). This level of platform awareness in the skill-authoring guide is unique in the batch — most frameworks don't document skill description budgets at all.
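
A skill author could check these limits mechanically. The sketch below is a hypothetical helper, not part of the toolbox; it assumes the combined length is the plain sum of the two fields' character counts, which the excerpt does not spell out.

```python
PER_ENTRY_CAP = 1536      # Claude Code per-skill cap quoted in the excerpt
PORTABLE_SOFT_CAP = 1024  # OpenCode limit quoted in the excerpt


def check_skill_budget(description, when_to_use=""):
    """Report how a skill's description text compares with both caps.

    Assumption: the combined length is len(description) + len(when_to_use).
    """
    chars = len(description) + len(when_to_use)
    return {
        "chars": chars,
        "truncated_by_claude_code": chars > PER_ENTRY_CAP,
        "portable": chars <= PORTABLE_SOFT_CAP,
    }
```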


Skill Naming Conventions (CLAUDE.md excerpt)

### Skills

- **Imperative verbs over noun phrases.** `/kk:design` not `/kk:analysis-process`, `/kk:implement` not
  `/kk:implementation-process`. Drop filler suffixes like `-process`. Skills are invoked as `/skill-name`
  — shorter names are faster to type.
- **Family prefixes for grouped skills.** When multiple skills do the same action on different targets, share
  a prefix: `/kk:review-design`, `/kk:review-spec`, `/kk:review-code`.
- **Always use `/kk:` prefix.** The codex generation tool rewrites `/kk:` → `$kk:` for Codex output.
  Exception: `kk:review-findings`, `kk:lang-idioms`, etc. are capy knowledge-store labels, not skill
  references.

Architecture technique: The /kk: prefix is not just style — it's a machine-readable marker that the Go codegen tool uses to transform Claude Code skills into Codex-compatible skills.


Shared Instruction Pattern — Capy Knowledge Protocol (design skill excerpt)

From klaude-plugin/skills/design/shared-capy-knowledge-protocol.md (symlink to _shared/):

The design skill integrates Capy search at the beginning of the idea workflow — querying knowledge.db for prior design decisions, established conventions, and lessons from previous features before starting the current design. This prevents re-solving problems the project has already solved.

Architecture technique: Per-skill symlinks (shared-<name>.md → ../_shared/<name>.md) allow shared instructions to be referenced via local-relative paths within skills, which keeps links working when skills are bundled or copied — no ../ path traversal required in the skill's own Markdown.
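
The symlink layout can be reproduced with the standard library. This sketch assumes a POSIX filesystem and a directory tree shaped like klaude-plugin/skills/; the helper name `link_shared` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def link_shared(skills_dir: Path, skill: str, name: str) -> Path:
    """Create skills/<skill>/shared-<name>.md as a relative symlink to ../_shared/<name>.md."""
    link = skills_dir / skill / f"shared-{name}.md"
    # A relative target keeps the link valid when the whole skills/ tree is moved.
    link.symlink_to(Path("..") / "_shared" / f"{name}.md")
    return link
```

The skill's Markdown can then refer to shared-<name>.md as a sibling file, and the filesystem resolves it to the single shared copy.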


Uniqueness

serpro69/claude-toolbox — Uniqueness Assessment

Tier Assessment: A (Genuinely Novel)

Top 5 Unique Traits

1. Cross-Provider Code Generation via Go Tooling

klaude-plugin/ is the canonical source for both Claude Code and Codex. A Go binary (cmd/generate-kodex/) transforms the Claude Code plugin format into Codex-compatible output:

  • ${CLAUDE_PLUGIN_ROOT} variable resolved
  • /kk: skill prefix rewritten to $kk:
  • Agent .md files converted to .toml
  • CI enforces freshness: make generate-kodex && git diff --exit-code

No other framework in the batch or seeds maintains a single-source plugin that generates two different provider formats. This is a genuine DRY solution to the multi-provider problem — not "here are two parallel directories" but "one source, one generator, two outputs."

2. Language-Specific Profile System with Dynamic Detection

8 language profiles (go, java, js_ts, kotlin, k8s, k8s-operator, python, skill-md) with distinct instructions for implementation, testing, and code review. A dedicated profile-resolver agent detects the project's language at skill invocation time and injects the appropriate profile.

This is the most explicit language-specificity in the batch. Most frameworks provide generic instructions with brief language notes. serpro69 treats each language as a first-class citizen with its own review criteria.

3. Isolated Code Review via Command Variants (default.md / isolated.md)

The klaude-plugin/commands/<skill>/isolated.md pattern encodes bias isolation as an architectural choice: an independent code-reviewer sub-agent reviews code with zero authorship context. The /kk:review-code:isolated vs /kk:review-code distinction is the cleanest formalization of the "no-authorship-bias" review principle seen in the entire batch.

The isolated mode also supports routing to external Gemini models — making this a true multi-model review pipeline.

4. ELv2 License

The only ELv2-licensed framework in the batch (and in all 11 seeds). ELv2 prohibits using the toolkit as the basis for a hosted commercial SaaS product. The choice reads as intentional: the toolkit is meant for development workflows, not for resale as an AI developer platform.

5. Opinionated Anti-Sycophancy Behavioral Contract

CLAUDE.extra.md explicitly contracts: "If the user is wrong, say 'no, that's wrong' and explain why. Don't soften with 'have you considered' when you mean 'that won't work.'" Most frameworks use soft behavioral suggestions. serpro69's CLAUDE.extra.md reads like a professional engagement contract — no apologies, no softening, challenge wrong assumptions directly.

The Exploration Phase directive ("Always explore on your own to gain complete understanding. Only delegate to exploration agents if the user explicitly requests it") is motivated by token economics: it prevents Claude from spawning exploration agents and then re-reading the same files, which doubles token consumption.

Secondary Differentiators

  • Catppuccin statusline with dark mode — the most aesthetically deliberate configuration in the batch
  • skill description budget documented in CLAUDE.md — explicit per-skill 1,536-char cap + 1,024-char OpenCode portability guidance (probably the most technically detailed skill-authoring guide in the batch)
  • Shared instruction symlinks (shared-<name>.md → ../_shared/<name>.md) — clever bundling-safe indirection for DRY skill instructions
  • Template sync infrastructure (template-sync.sh) — keeps downstream projects updated without a full fork-rebase cycle
  • Bash(rm:*) in deny permissions — the explicit block on rm as a default deny rule is unusually cautious
  • mkdocs full documentation site at serpro69.github.io/claude-toolbox/ — professional-grade external docs for a 142-star project

What This Framework Gets Right That Others Don't

  1. The pipeline composes skills rather than just listing them. /kk:implement runs /kk:review-code and test checkpoints after each batch of tasks, then hands off to /kk:test and /kk:document at the end. Other frameworks list skills as independent primitives; serpro69's skills reference each other explicitly.

  2. Knowledge accumulates to the project, not the session. Capy's SQLite database (knowledge.db) is committed to the repo, so the knowledge base grows with the project. A new developer onboarding to a project immediately has access to prior design decisions, not just code history.

  3. Multi-model review is architecturally embedded (not a one-off trick). The code-reviewer agent's ability to route to Gemini is part of the review design, not a workaround.

Limitations

  • ELv2 is a real barrier for teams wanting to build commercial products on top
  • Capy knowledge base management — BM25 without decay means knowledge.db grows unbounded; no pruning strategy
  • 11 skills is a curated minimum — teams with different workflows (e.g., no formal design phase, or async code review) may find the pipeline over-prescribed
  • Go toolchain dependency for cross-provider generation — adds friction for non-Go teams contributing to the project itself

Workflow

serpro69/claude-toolbox — Workflow

Core Development Pipeline

/kk:design → /kk:review-design → /kk:implement → /kk:review-code → /kk:test → /kk:document

This is the primary workflow. Each step has checkpoints and handoffs to the next.

Step-by-Step Walkthrough

Step 1: Design (/kk:design)

  1. User invokes /kk:design and describes a feature idea
  2. Claude asks refinement questions one at a time (not a batch questionnaire)
  3. Once sufficiently refined, produces:
    • docs/wip/<feature>/design.md — design document
    • docs/wip/<feature>/implementation.md — implementation plan
    • docs/wip/<feature>/tasks.md — task list with checkboxes, H2 headings per task, bold status/dependencies
  4. Profile detection: profile-resolver agent detects the project language and injects the appropriate profile (go, java, js_ts, kotlin, k8s, python, etc.)
  5. Capy knowledge query: design skill queries knowledge.db for relevant prior decisions

Step 2: Review Design (/kk:review-design <feature>)

  1. Reads docs/wip/<feature>/design.md
  2. design-reviewer agent checks for:
    • Gaps in assumptions
    • Missing edge cases
    • Interface inconsistencies
    • Security considerations
  3. Findings go to docs/wip/<feature>/design-review.md
  4. Any issues not fixed immediately are documented as TODO/FIXME per the deferred-work protocol

Step 3: Implement (/kk:implement)

  1. Reads docs/wip/<feature>/tasks.md
  2. Executes tasks in order, updating checkbox state
  3. After each batch of tasks, automatically runs:
    • /kk:review-code checkpoint
    • Test verification checkpoint
  4. At end of all tasks: /kk:test, then /kk:document

Step 4: Code Review (/kk:review-code or /kk:review-code:isolated)

Standard mode: Claude reviews its own code for SOLID violations, security risks, quality issues.

Isolated mode (/kk:review-code:isolated):

  1. Spawns independent code-reviewer sub-agent with no access to authoring context
  2. Sub-agent reviews cold — eliminates authorship bias
  3. Optionally routes to external Gemini model for third-party perspective
  4. Findings recorded; issues without immediate fixes get explicit TODO entries

Step 5: Test (/kk:test)

Runs test suites, verifies coverage. Language profile determines test framework and conventions.

Step 6: Document (/kk:document)

Generates or updates documentation. Writes to docs/ per language profile conventions.

Auxiliary Skills

| Skill | Typical Use |
|---|---|
| /kk:review-spec | Pre-implementation spec review (analogous to review-design but for specs) |
| /kk:chain-of-verification | Multi-step verification for complex claims or changes |
| /kk:dependency-handling | Evaluating dependency upgrades, security vetting |
| /kk:diff-skill | Comparing two skill versions |
| /kk:merge-docs | Combining documentation fragments |

Task File Format

## Task 1: <name>

**Status:** todo | in-progress | done
**Dependencies:** Task N

- [ ] Subtask A
- [ ] Subtask B

The /kk:implement skill reads and updates these files live — checkboxes get checked as tasks complete.
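
For readers who want to inspect task state outside the skill, here is a minimal parser sketch for the format above. It is not how /kk:implement works internally (the skill is a prompt, not a parser); the regular expressions, the `Task` dataclass, and the function name are assumptions based only on the sample layout shown.

```python
import re
from dataclasses import dataclass, field

HEADING = re.compile(r"^## Task (\d+): (.+)$")
STATUS = re.compile(r"^\*\*Status:\*\*\s*(\S+)")
CHECKBOX = re.compile(r"^- \[([ xX])\] (.+)$")


@dataclass
class Task:
    number: int
    name: str
    status: str = "todo"
    subtasks: list = field(default_factory=list)  # (done, text) pairs


def parse_tasks(markdown):
    """Parse H2 task headings, bold status lines, and checkbox subtasks."""
    tasks = []
    for raw_line in markdown.splitlines():
        line = raw_line.rstrip()
        if m := HEADING.match(line):
            tasks.append(Task(int(m.group(1)), m.group(2)))
        elif tasks and (m := STATUS.match(line)):
            tasks[-1].status = m.group(1)
        elif tasks and (m := CHECKBOX.match(line)):
            tasks[-1].subtasks.append((m.group(1) != " ", m.group(2)))
    return tasks
```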

Knowledge Persistence (Capy)

Throughout the pipeline, skills write findings, decisions, and conventions to knowledge.db via Capy:

  • Design decisions recorded during /kk:design
  • Review findings recorded during /kk:review-code
  • Conventions from language profiles stored and searchable
  • Future sessions can query prior context via capy_search

This creates a project-scoped memory that persists beyond the session window.

Deferred Work Protocol

Per CLAUDE.extra.md:

  • When a fix is deferred, write it down explicitly (inline TODO: / FIXME: in code, entry in docs/wip/<feature>/tasks.md, or inline note in design doc)
  • "Postponed — trivial" is forbidden; must state: what was deferred, why, and concrete next step
  • Review outputs that identify unfixed issues must record them durably, not as chat asides

Behavioral Contract (CLAUDE.extra.md)

  • Be direct: Challenge wrong assumptions; say "no, that's wrong" not "have you considered"
  • Fail loud: Flag errors explicitly; no silent corrections
  • Pre-existing dead code: Mention but do not delete; only remove orphans your changes created
  • Explore first: Claude explores on its own before spawning exploration agents (prevents double token consumption)

Memory Context

serpro69/claude-toolbox — Memory & Context

Memory Architecture

Two-tier hybrid memory:

  1. Capy SQLite knowledge base (knowledge.db) — persistent across sessions, project-scoped, BM25-searchable
  2. docs/wip/<feature>/: file-based memory for design docs, task state, and review findings; committed to git, so it outlives any single session

Tier 1: Capy Knowledge Base

Technology

  • SQLite at .capy/knowledge.db
  • BM25 full-text search via Capy MCP server
  • Not vector embeddings — keyword-match search (simpler, deterministic, no embedding API costs)

What Gets Stored

  • Design decisions (from /kk:design executions)
  • Code review findings (from /kk:review-code outputs)
  • Project conventions and patterns
  • Language-profile-specific notes
  • Prior decisions under labels like kk:review-findings, kk:lang-idioms

How Skills Access It

Skills use the capy_search MCP tool to query prior context at invocation time. The design skill queries before starting a new design; review-code queries for established code conventions; etc.

Skills write new findings via capy_index or capy_fetch_and_index (for external URLs).

Persistence

The .capy/knowledge.db file is a regular SQLite database that persists indefinitely. It is:

  • Committed to the project repository (shared across contributors)
  • Available to all sessions working in the project directory
  • Not cleared between sessions

Tier 2: docs/wip/ — Feature-Level Memory

Structure

docs/wip/<feature>/
├── design.md           # Design document (written by /kk:design)
├── implementation.md   # Implementation plan
├── tasks.md           # Task list with checkboxes (updated live by /kk:implement)
└── design-review.md   # Review findings (written by /kk:review-design)

Lifecycle

  • Created by /kk:design at the start of a feature
  • Updated by /kk:implement as tasks complete (checkboxes)
  • Updated by review skills with findings
  • Committed to git — persistent and auditable
  • Deferred work explicitly tracked here (per CLAUDE.extra.md protocol)

CLAUDE.extra.md Behavioral Instructions

Stored in .claude/CLAUDE.extra.md — behavioral contracts that persist across sessions:

  • Independent thinking rules
  • Fail-loud protocol
  • Deferred work documentation requirements
  • Exploration-before-delegation directive

These are project-scoped and don't change between sessions.

Context Window Management

  • model: "claude-opus-4-6[1m]" — 1M context window reduces pressure significantly
  • CLAUDE_CODE_MAX_OUTPUT_TOKENS: "64000" — explicit max output token budget
  • MAX_MCP_OUTPUT_TOKENS: "64000" — limits Capy/MCP output token usage
  • Capy's role is context-window routing: rather than loading all project knowledge into the context window, skills query only what's relevant via BM25 search
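The routing idea in the last bullet can be sketched as a budgeted selection over ranked search results. This is an illustration only: the word-count token estimate and the `fit_to_budget` helper are assumptions, not how Capy or Claude Code measure tokens.

```python
def estimate_tokens(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_budget(ranked_snippets: list[str], budget: int) -> list[str]:
    """Keep the highest-ranked snippets until the next one would exceed the budget."""
    chosen: list[str] = []
    used = 0
    for snippet in ranked_snippets:
        cost = estimate_tokens(snippet)
        if used + cost > budget:
            break  # stop at the first snippet that does not fit, preserving rank order
        chosen.append(snippet)
        used += cost
    return chosen


ranked = [
    "wrap errors with context",               # 4 words
    "prefer table driven tests for parsers",  # 6 words
    "avoid global state",                     # 3 words
]
selected = fit_to_budget(ranked, budget=10)
print(selected)
```

Only the snippets that fit are placed in the context window; everything else stays in the store until a later query asks for it.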

Cross-Session Continuity

| Mechanism | What Persists |
| --- | --- |
| knowledge.db | Decisions, findings, conventions (indefinitely) |
| docs/wip/<feature>/tasks.md | Task progress, checkbox state |
| docs/wip/<feature>/design.md | Design rationale, assumptions |
| Git history | All of the above via commits |

When a session resumes mid-feature, /kk:implement reads tasks.md and continues from where it left off. The checkpoint mechanism (code review after each batch) means partial state is always valid.
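Resuming from tasks.md amounts to finding the first unchecked item. The sketch below assumes GitHub-style `- [ ]` / `- [x]` checkboxes; the exact format that /kk:implement writes is not shown in the source, and the task titles are invented.

```python
import re
from typing import Optional

# Matches "- [ ] title" or "- [x] title" (also "*" bullets and an uppercase X).
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def parse_tasks(markdown: str) -> list[tuple[bool, str]]:
    """Return (done, title) pairs for every checkbox line, in document order."""
    tasks = []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match:
            tasks.append((match.group(1) != " ", match.group(2).strip()))
    return tasks


def next_pending(markdown: str) -> Optional[str]:
    """Return the first unchecked task title, or None when everything is done."""
    for done, title in parse_tasks(markdown):
        if not done:
            return title
    return None


tasks_md = """# Tasks
- [x] Add config loader
- [x] Wire loader into CLI
- [ ] Add integration test
- [ ] Update README
"""
print(next_pending(tasks_md))  # prints "Add integration test"
```

Because the file is committed alongside the code, a fresh session needs nothing except the repository to find its place.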

Contrast with Other Batch Members

| Framework | Memory Mechanism |
| --- | --- |
| serpro69-claude-toolbox | SQLite BM25 (Capy) + file-based docs/wip/ |
| notque-cc-toolkit | SQLite with confidence decay + pruning (0.05/30 days) |
| centminmod-cc-setup | Dual-memory: git-shared CLAUDE-*.md + machine-local memory/*.md |
| alexeykrol-cc-starter | SNAPSHOT.md session state (ephemeral, per-session) |
| others | No persistent memory |

Capy's approach differs from notque's: notque implements a learning database with confidence scoring that ages and prunes entries. Capy is a pure retrieval store without decay — knowledge accumulates and must be manually managed.

07

Orchestration

serpro69/claude-toolbox — Orchestration

Orchestration Pattern

Sequential pipeline with isolated sub-agent option. The 11 skills form an ordered pipeline; individual review steps can spawn independent sub-agents for bias-free review.

Primary Orchestration: Skill Pipeline

/kk:design
    → profile-resolver agent (language detection)
    → Capy knowledge query (prior decisions)
    → refinement loop (sequential questions)
    → outputs: docs/wip/<feature>/{design.md, implementation.md, tasks.md}

/kk:review-design
    → design-reviewer agent
    → outputs: docs/wip/<feature>/design-review.md

/kk:implement
    → reads tasks.md
    → execute tasks in batches
    → after each batch: /kk:review-code checkpoint
    → at end: /kk:test + /kk:document

/kk:review-code:isolated
    → spawns code-reviewer sub-agent (no authoring context)
    → optional: route to external Gemini model
    → outputs: review findings

/kk:test → test execution + coverage
/kk:document → documentation generation
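The /kk:implement step above (batches with a review checkpoint after each) can be sketched as a small loop. The callables, the batch size, and the decision to stop on a failed review are assumptions for illustration; the skill itself is driven by prompts, not by this code.

```python
from typing import Callable, Iterator


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_implement(
    tasks: list[str],
    batch_size: int,
    execute: Callable[[str], None],
    review: Callable[[list[str]], bool],
) -> list[str]:
    """Execute tasks batch by batch, running a review checkpoint after each batch."""
    log = []
    for batch in batches(tasks, batch_size):
        for task in batch:
            execute(task)
            log.append(f"done: {task}")
        if not review(batch):
            # Assumption: a failed checkpoint halts the run so findings can be addressed.
            log.append("review failed: stopping")
            break
        log.append(f"review ok: {len(batch)} task(s)")
    return log


executed: list[str] = []
log = run_implement(
    ["add loader", "wire CLI", "add test"],
    batch_size=2,
    execute=executed.append,
    review=lambda batch: True,
)
print("\n".join(log))
```

Since a checkpoint closes every batch, stopping between batches leaves only reviewed work behind.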

Sub-Agent Architecture

Agents as Role-Specific Specialists

| Agent | Invocation Pattern | Why Independent |
| --- | --- | --- |
| code-reviewer | /kk:review-code:isolated | No authorship bias (cold review) |
| design-reviewer | /kk:review-design | Dedicated design analysis context |
| spec-reviewer | /kk:review-spec | Spec-specific evaluation frame |
| eval-grader | Evaluation tasks | Scoring with defined rubric |
| profile-resolver | At skill start | Language/framework detection |

Isolation via Command Variants

The commands/ structure encodes two modes:

klaude-plugin/commands/<name>/
├── default.md     # Standard — invoked as /kk:<name>:default or /kk:<name>
└── isolated.md    # Sub-agent variant — invoked as /kk:<name>:isolated

Default mode: Claude reviews within the current context (full project awareness, potential authorship bias).

Isolated mode: A fresh code-reviewer sub-agent is spawned with only the code under review — no prior conversation, no authorship context. This is the key anti-bias mechanism.
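The difference between the two modes comes down to what the reviewer is allowed to see. A minimal sketch, with a hypothetical `Session` type standing in for the conversation state:

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical stand-in for what the authoring session has accumulated."""
    history: list[str] = field(default_factory=list)
    project_notes: list[str] = field(default_factory=list)


def build_review_context(session: Session, diff: str, isolated: bool) -> list[str]:
    """Assemble the material handed to the reviewer."""
    if isolated:
        # Cold review: only the code under review, nothing from the authoring session.
        return [diff]
    # Default review: the full session plus the code under review.
    return [*session.history, *session.project_notes, diff]


session = Session(
    history=["user: please add retry", "assistant: wrote retry.go"],
    project_notes=["convention: wrap errors"],
)
diff = "+ func retry() {}"

print(len(build_review_context(session, diff, isolated=False)))  # prints 4
print(build_review_context(session, diff, isolated=True))        # prints ['+ func retry() {}']
```

The isolated reviewer cannot be swayed by the author's reasoning because that reasoning never enters its context.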

Multi-Model Support

The code-reviewer agent definition references external Gemini models as an optional third-party reviewer. This adds a different model's perspective alongside Claude's own review.

From the settings.json perspective, claude-opus-4-6[1m] is the primary model. Gemini appears as an external model in sub-agent invocations via agent definitions, not as a Claude Code model setting.

Concurrency

Skills within a pipeline step run sequentially, and the pipeline itself is sequential (design before implement, and so on). Each isolated invocation spawns a single sub-agent; the current implementation has no parallel fan-out.

Capy as Orchestration Memory

The Capy MCP server acts as shared memory across pipeline steps:

  • /kk:design writes design decisions to knowledge.db
  • /kk:review-code queries for established conventions before reviewing
  • /kk:implement can query prior implementation patterns

This creates implicit orchestration continuity without requiring a session-spanning orchestrator.

Codex Orchestration (Generated)

The Go generator (cmd/generate-kodex/) produces equivalent Codex sub-agent definitions in kodex-plugin/ from klaude-plugin/agents/*.md:

  • Agent files are converted to .toml format
  • Same skill pipeline available as $kk:<name> in Codex
  • .codex/hooks.json provides the Codex-side equivalent of the hook

The CI check (make generate-kodex && git diff --exit-code) regenerates the Codex output and fails if it differs from what is committed, which keeps the two providers in sync.
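The actual generator is written in Go and its internals are not shown here. As a rough illustration of the two transformations named above (command-prefix rewrite and Markdown-to-TOML agent conversion), here is a Python sketch. The front-matter layout, the TOML key names (`name`, `description`, `instructions`), and the sample agent text are all hypothetical.

```python
import json
import re

# "/kk:" at the start of a command reference, not inside a path such as "docs/kk:".
KK_COMMAND = re.compile(r"(?<![\w/])/kk:(?=[a-z])")


def rewrite_commands(text: str) -> str:
    """Rewrite Claude Code style /kk:<name> references to Codex style $kk:<name>."""
    return KK_COMMAND.sub("$kk:", text)


def agent_to_toml(markdown: str) -> str:
    """Convert '---' front matter plus body into flat TOML (hypothetical layout)."""
    _, front, body = markdown.split("---", 2)
    fields = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    fields["instructions"] = rewrite_commands(body.strip())
    # json.dumps emits double-quoted, escaped strings that are also valid TOML basic strings.
    return "\n".join(f"{k} = {json.dumps(v)}" for k, v in fields.items()) + "\n"


agent_md = (
    "---\n"
    "name: code-reviewer\n"
    "description: Cold review without authoring context\n"
    "---\n"
    "Hand findings back via /kk:review-code.\n"
)
toml_text = agent_to_toml(agent_md)
print(toml_text)
```

Because the output is a pure function of the canonical files, regenerating and diffing is enough to detect drift.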

Approval Gates

The validate-bash.sh PreToolUse hook is the primary gate — it validates Bash commands before execution. No interactive approval prompts between pipeline steps; the pipeline runs to completion once started.
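validate-bash.sh is a shell script whose rules are not reproduced here. The sketch below shows the general shape of such a gate in Python, assuming the Claude Code hook contract in which the event arrives as JSON with `tool_name` and `tool_input` fields and exit code 2 blocks the call. The deny patterns are hypothetical examples, not the toolkit's actual rules.

```python
import re

# Hypothetical deny list; the real hook's rules are not shown in the source.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/"),
    re.compile(r"\bgit\s+push\b.*--force\b"),
]


def validate(event: dict) -> tuple[int, str]:
    """Return (exit_code, message): 0 allows the call, 2 blocks it."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    for pattern in DENY_PATTERNS:
        if pattern.search(command):
            return 2, f"blocked by pattern: {pattern.pattern}"
    return 0, ""


allowed = validate({"tool_name": "Bash", "tool_input": {"command": "ls -la"}})
blocked = validate(
    {"tool_name": "Bash", "tool_input": {"command": "git push origin main --force"}}
)
print(allowed, blocked)
```

A real hook would read the event from stdin, write the message to stderr, and exit with the returned code.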

08

Comparison

serpro69/claude-toolbox — Comparison with Seeds

Closest Seed Analogues

superpowers (closest — skill-based behavior)

| Dimension | superpowers | serpro69-claude-toolbox |
| --- | --- | --- |
| Distribution | Skills-only framework | Plugin (skills + agents + commands + hooks + MCP) |
| Skill count | ~30+ | 11 (pipeline-focused) |
| Naming convention | /skill-name | /kk:<name> (namespaced) |
| Memory | None built-in | Capy SQLite + docs/wip/ |
| Multi-provider | Claude Code only | Claude Code + Codex (generated) |
| License | MIT | ELv2 (restrictive) |

The /kk: prefix pattern is analogous to superpowers' skill namespace, but serpro69 adds explicit pipeline ordering that superpowers lacks.

ccmemory (memory angle)

| Dimension | ccmemory | serpro69-claude-toolbox |
| --- | --- | --- |
| Persistence | File-based append | SQLite BM25 + file-based |
| Search | Sequential scan / grep | BM25 full-text index |
| Decay | None | None |
| Cross-session | Yes | Yes |
| Scope | Global | Project-scoped |

ccmemory focuses purely on memory management. Capy is MCP-mediated and BM25-indexed, which is a fundamentally different retrieval mechanism.

spec-driver (structured development flow)

| Dimension | spec-driver | serpro69-claude-toolbox |
| --- | --- | --- |
| Pipeline structure | Spec → Tasks → Impl | Design → Review → Impl → Review → Test → Doc |
| Spec format | Structured Markdown spec | design.md + implementation.md + tasks.md |
| Task tracking | Task list | Checkbox tasks.md |
| Multi-language | No | Yes (8 profiles) |
| Memory | None | Capy knowledge.db |

Both implement a structured code-development pipeline. serpro69 adds language profiles, persistent knowledge, and multi-model review that spec-driver lacks.

Unique Capabilities vs All Seeds

1. Cross-Provider Code Generation (No seed has this)

The Go tool cmd/generate-kodex/ transforms Claude Code plugin content → Codex format. Seeds target either Claude Code or Codex, never both with synchronized generated output. The /kk: → $kk: rewrite plus TOML agent conversion is a unique mechanical bridge.

2. Language-Specific Profiles (Not seen in seeds)

8 distinct language profiles (including go, java, js_ts, kotlin, k8s, k8s-operator, and python), each with dedicated review and implementation instructions. Seeds either ignore language differences or include only generic guidance. The profile-resolver agent enables dynamic injection.
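The profile-resolver is an agent, so its real detection logic lives in a prompt and is not shown here. As a simplified stand-in, the idea of mapping a project to profiles can be sketched with file extensions. The mapping table and the ranking by file count are assumptions; only the profile names come from the list above.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to profile name.
EXTENSION_TO_PROFILE = {
    ".go": "go",
    ".java": "java",
    ".kt": "kotlin",
    ".py": "python",
    ".js": "js_ts",
    ".ts": "js_ts",
    ".tsx": "js_ts",
}


def resolve_profiles(paths: list[str]) -> list[str]:
    """Return matching profiles, ordered by how many files point to each."""
    counts = Counter(
        EXTENSION_TO_PROFILE[ext]
        for ext in (PurePosixPath(p).suffix for p in paths)
        if ext in EXTENSION_TO_PROFILE
    )
    return [profile for profile, _ in counts.most_common()]


profiles = resolve_profiles(
    ["cmd/main.go", "internal/gen.go", "scripts/release.py", "README.md"]
)
print(profiles)  # prints ['go', 'python']
```

Whatever the detection method, the result is a list of profile names whose instructions get injected into the running skill.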

3. ELv2 License (Unique across all 10 batch + all seeds)

No seed or batch peer uses ELv2. This creates a real restriction: the toolkit cannot be offered as a commercial SaaS product by third parties.

4. /kk:review-code:isolated (New isolation pattern)

Command variants encoding standard vs. isolated sub-agent review in the default.md / isolated.md file naming scheme is a novel convention. Seeds have sub-agents but not this explicit bias-isolation variant pattern.

Batch Peers Comparison

vs. notque-cc-toolkit (most complex batch member)

| Dimension | notque | serpro69 |
| --- | --- | --- |
| Skills | 50+ | 11 |
| Agents | 44 | 5 |
| Hooks | 77 Python | 1 (validate-bash.sh) |
| Memory | SQLite with confidence decay + pruning | SQLite BM25 (no decay) |
| Pipeline | Hierarchical orchestrator | Sequential pipeline |
| MCP | None | Capy (local) |
| License | MIT | ELv2 |

notque is larger and more automated; serpro69 is more opinionated and curated.

vs. centminmod-cc-setup (memory angle)

centminmod's dual-memory uses CLAUDE-*.md files (git-shared) + machine-local memory/*.md. serpro69 uses SQLite (BM25-searchable, shared via repo commit). Different tradeoffs: CLAUDE.md memory is human-readable; SQLite is machine-queryable.

vs. alex-feel-cc-toolbox (installation paradigm)

alex-feel is a meta-installer (brings nothing, installs anything). serpro69 ships concrete content (11 skills, 5 agents) but also has a sync infrastructure. They represent opposite ends: total declarative flexibility vs. curated opinionated content.

Philosophy

serpro69 is explicitly "minimal by design" (README description prefix) and "battle-tested daily." The anti-AI-slop stance ("Purely AI-made skills are hot garbage") reflects a curation philosophy that none of the seeds articulate this bluntly. The toolkit is the author's personal production toolchain made distributable — not a demonstration framework.
