Skip to content
/

claude-self-reflect

claude-self-reflect · ramakay/claude-self-reflect · ★ 214 · last commit 2026-05-25

Primitive shape 18 total
Hooks 6 MCP tools 12
00

Summary

Claude Self-Reflect — Summary

Claude Self-Reflect (CSR) is a single 44MB Rust binary that gives Claude Code persistent conversation memory via a 3-layer enrichment pipeline: raw conversation chunks → structured metadata → AI-generated narratives, achieving a claimed 9.3x search quality improvement. It ships 6 lifecycle hooks (SessionStart, UserPromptSubmit, PostToolUse, Stop, PreCompact, SessionEnd), 12 MCP tools with safety annotations, local FastEmbed embeddings (384-dim), HNSW sub-millisecond vector search, and AST-aware code analysis across 6 languages — all running from a single binary with no Docker, no database server, and no API keys required.

Version 8.0 represents a complete rewrite of the previous Python/Docker/Qdrant stack into pure Rust. The install path is a one-line curl script or npm install -g claude-self-reflect; the csr-engine setup command imports past conversations, registers the MCP server, and installs hooks. Key metric claims: context retention below 20% after 10 sessions without CSR, sub-1ms p95 search, 93ms startup, ~20 conversations/sec import speed.

Compared to seeds, CSR is the technical inverse of ccmemory: both use hooks to capture and re-inject conversation memory in Claude Code, but ccmemory relies on an external Neo4j + Docker stack, while CSR is a zero-dependency single binary. CSR differs from basic-memory (explicit agent writes) by being fully automatic (hooks capture without agent action). The 3-layer enrichment pipeline that progressively transforms chunks into AI narratives is unique in this batch.

01

Overview

Claude Self-Reflect — Overview

Origin

Rama Annaswamy (procsolve.io). MIT license. v8.0.5 ("Complete Rust Rewrite"). Previous stack was Python + Docker + Qdrant; entire stack replaced with a single Rust binary in v8.0. Available on npm as claude-self-reflect and as a Claude Code plugin.

Philosophy

From the README:

"Claude forgets everything. This fixes that. Single 44MB binary. No databases. No containers. No API keys required."

"Context retention drops below 20% after 10 sessions. CSR fixes this with a single binary that gives Claude perfect memory."

The design philosophy is:

  1. Zero friction install — one curl command, runs everywhere
  2. Fully automatic — no commands, no syntax; install once and past context appears automatically
  3. Progressive enrichment — three layers improve quality from raw chunks (baseline) to AI narratives (9.3x improvement), with each layer optional
  4. Local-first privacy — all embeddings and storage run locally; get_session_learnings specifically targets Ralph loop iteration memory

Architecture Statement (from CLAUDE.md)

csr-engine (44MB)
  ├── MCP server (rmcp, 12 tools)
  ├── Embeddings (FastEmbed, 384-dim, local)
  ├── Search (HNSW, <1ms p95)
  ├── Storage (SQLite)
  ├── AST analysis (ast-grep, 6 languages)
  ├── 6 Claude Code hooks
  └── 3-layer enrichment pipeline

The 3-Layer Pipeline

  1. Raw chunks — conversation text, split and stored as vectors
  2. Structured metadata — session grouping, file edit tracking, deduplication
  3. AI narratives (optional daemon) — csr-engine daemon generates human-readable narratives that 9.3x improve search recall
02

Architecture

Claude Self-Reflect — Architecture

Distribution

  • npm package: claude-self-reflect (installs csr-engine binary)
  • Direct install: curl -fsSL https://raw.githubusercontent.com/ramakay/claude-self-reflect/main/scripts/install.sh | sh
  • Cargo build from source (macOS Intel only — no prebuilt binary)
  • Claude Code plugin: .claude-plugin/plugin.json with "install": {"command": "npx claude-self-reflect setup"}

Install Methods

curl -fsSL .../install.sh | sh      # downloads prebuilt binary for platform
npm install -g claude-self-reflect  # via npm
cargo build --release               # build from source (csr-engine/)
csr-engine setup                    # post-install: import, register MCP, install hooks

Platform Support

Platform Support
macOS Apple Silicon Prebuilt binary
Linux x86_64 / WSL Prebuilt binary
Linux ARM64 Prebuilt binary
macOS Intel Build from source

Binary Size and Performance

  • Binary: 44MB
  • Startup: 93ms (cached)
  • Search p95: <1ms
  • Import speed: ~20 conversations/sec
  • Embedding: 0.73ms/text (batch)

Directory Structure

csr-engine/          # Rust source
├── src/
│   ├── main.rs      # CLI + MCP server entry
│   ├── hooks/       # 6 hook implementations
│   ├── search/      # HNSW index + FastEmbed
│   ├── storage/     # SQLite access
│   └── pipeline/    # 3-layer enrichment
scripts/
└── install.sh       # Platform detection + download
.claude-plugin/
└── plugin.json      # Claude plugin manifest

State Files

  • ~/.local/share/csr/ — SQLite database, HNSW index, narrative cache
  • Hook scripts registered in ~/.claude/settings.json
  • MCP server registered in Claude Code's MCP config

Required Runtime

None (single statically-linked binary). Optional: Anthropic API key for AI narrative generation via csr-engine daemon.

Target AI Tools

Primary: Claude Code. The hooks are Claude Code-specific lifecycle events. The MCP tools could theoretically work with any MCP client.

03

Components

Claude Self-Reflect — Components

Hooks (6)

Hook Event What It Does
SessionStart Surfaces relevant past context at conversation start
UserPromptSubmit Predicts and injects context before Claude responds
PostToolUse Tracks file edits with session-scoped deduplication
Stop Stores iteration learnings, detects stuck patterns
PreCompact Backs up state before context compaction
SessionEnd Stores session narrative for future retrieval

All hooks use catch-all error handling — they never block Claude Code.

MCP Tools (12)

Tool Safety Description
csr_reflect_on_past read-only Semantic search across past conversations
store_reflection writes Store insights for future retrieval
csr_quick_check read-only Fast existence check (count + top match)
search_by_recency read-only Time-constrained search ("last week")
get_recent_work read-only "What did we work on?" with session grouping
get_timeline read-only Activity timeline with statistics
csr_search_by_file read-only Find conversations that touched a file
csr_search_by_concept read-only Theme-based search ("security", "testing")
csr_search_insights read-only Aggregated patterns from search results
csr_get_more read-only Paginate through additional results
get_full_conversation read-only Retrieve complete JSONL conversation
get_session_learnings read-only Iteration-level memory for Ralph loops

All tools include MCP tool annotations (safety characteristics per spec 2025-11-05).

CLI Commands (7, from CLAUDE.md)

Command Purpose
csr-engine Start MCP server (default)
csr-engine setup Import + register MCP + install hooks
csr-engine status System status (JSON)
csr-engine status --compact Statusline output
csr-engine daemon Background enrichment (AI narratives)
csr-engine hook install --apply Install/update hooks
csr-engine eval Quick eval (5 tests, ~7ms)
csr-engine eval --full Full eval (20 tests, ~200ms)
csr-engine quality <file> AST code quality analysis

Scripts (1)

  • scripts/install.sh — platform detection, binary download, MCP registration, hook installation

Subagents, Slash Commands, Templates

None.

05

Prompts

Claude Self-Reflect — Prompts

Prompt File 1: CLAUDE.md — Action Guide

Technique: Prescriptive action guide injected at session start (through the plugin / CLAUDE.md convention). Provides Claude with a tool reference card.

Key excerpt:

# Claude Self-Reflect v8.0 — Action Guide

## Critical Rules

1. **PATH RULE**: Always use `/Users/username/...` never `~/...` in MCP commands
2. **TEST RULE**: Never claim success without running `cargo test`
3. **RESTART RULE**: After modifying MCP server code, restart Claude Code
4. **QUALITY GATE**: When pre-commit hook blocks, fix the issue — never use `--no-verify`

## MCP Tools (12 total)

csr_reflect_on_past   — Semantic search across past conversations
store_reflection      — Store insights for future retrieval
csr_quick_check       — Fast existence check
search_by_recency     — Time-constrained search
get_recent_work       — Session-grouped recent activity

Technique used: Rule-list prompt with numbered critical constraints; tool reference card as compact lookup table.

Prompt File 2: Plugin Description (plugin.json)

Technique: One-sentence capability declaration used during plugin installation/discovery.

{
  "description": "Persistent conversation memory with semantic search, session continuity detection, cross-project intelligence, and predictive context injection. Zero-dependency 45MB Rust binary with local embeddings — no API keys, no Docker, sub-millisecond search."
}

Technique used: Feature enumeration packed into a single description string; technical credibility markers (44MB, no Docker, sub-millisecond) set adoption expectations.

SessionStart Hook Output (runtime prompt injection)

Technique: Dynamic context injection — the hook fetches relevant past chunks and prepends them to the session as a synthetic system-level context block. No static prompt file; the content is dynamically assembled from the HNSW search results.

The hook runs csr-engine status and injects top-k relevant conversations as context fragments before Claude's first response. The agent sees this as part of the conversation context, not a separate message.

09

Uniqueness

Claude Self-Reflect — Uniqueness

Differs From Seeds

Closest seed: ccmemory — both are hook-driven memory layers for Claude Code that capture conversation history and re-inject it on session start. Key deltas: (1) ccmemory uses Neo4j graph database + Docker + Ollama for a rich typed-relationship graph; CSR uses a single Rust binary with SQLite + HNSW, eliminating all external dependencies. (2) ccmemory's LLM-based detection stores typed nodes (Decision, Correction, FailedApproach); CSR stores raw conversation chunks and enriches them offline via a 3-layer pipeline. (3) ccmemory has 4 hooks; CSR has 6 (adds UserPromptSubmit for predictive injection and PreCompact for backup). (4) CSR adds AST-aware search across 6 languages — not in ccmemory. Against basic-memory: basic-memory requires explicit agent writes; CSR is fully automatic. Against taskmaster-ai: CSR is a memory layer, not a task decomposition system.

Positioning

"The zero-dependency memory layer for Claude Code developers who want zero setup friction." The v8.0 Rust rewrite is a direct response to the ccmemory/qdrant pattern: users complained about Docker overhead, so CSR collapsed the entire stack into one binary.

Observable Failure Modes

  1. No structured typing — raw conversation chunks may retrieve irrelevant past conversations for unrelated topics; structured ccmemory nodes (Decision vs Correction) have better retrieval precision for specific knowledge types.
  2. AI narratives require API key — the claimed 9.3x improvement requires running csr-engine daemon with an Anthropic API key; without it users get Layer 1 only.
  3. Global index — no project isolation; a 10-project developer will have noisy cross-project retrieval without careful file-path filtering.
  4. 9.3x claim unverified — the improvement number has no external benchmark citation; it appears to be an internal measurement.
  5. macOS Intel unsupported via prebuilt — requires Cargo build from source.
04

Workflow

Claude Self-Reflect — Workflow

Phases

Phase What Happens Artifact
Install curl install.sh → downloads binary + csr-engine setup Binary at ~/.local/bin/csr-engine; hooks in Claude Code; MCP registered
Import csr-engine setup imports past conversations from Claude Code's JSONL conversation history SQLite + HNSW index populated
Active Session Hooks fire automatically on every prompt, tool use, stop, compaction Context injected silently; learnings stored
Enrichment (optional) csr-engine daemon runs in background generating AI narratives Layer-3 enriched narrative index
Future Session SessionStart hook surfaces relevant past context Relevant chunks injected before first response

Hooks as Workflow Gates

The 6 hooks cover the entire session lifecycle:

  • Before work: SessionStart (context injection)
  • During work: UserPromptSubmit (predictive injection), PostToolUse (file tracking)
  • After work: Stop (iteration learning), PreCompact (state backup), SessionEnd (narrative storage)

Approval Gates

None — fully automatic, zero user interaction required.

Spec Format

None — CSR stores conversation history, not structured specs.

Spec Storage

none — state storage is conversation chunks + embeddings + narratives in SQLite/HNSW.

Delta vs Whole File

Not applicable.

06

Memory Context

Claude Self-Reflect — Memory & Context

Memory Model

sqlite + vector-db (hybrid) — SQLite stores conversation chunks, metadata, and enrichment state; an in-process HNSW index provides sub-millisecond vector search. All components are embedded in the single Rust binary.

Persistence Scope

global — all projects share the same index, searched by file path and project context. Cross-project intelligence is explicit: csr_search_by_file finds conversations that touched a specific file across all sessions.

Storage Details

Component Technology Purpose
Chunk store SQLite Conversation JSONL storage, metadata
Vector index HNSW (in-process) <1ms semantic search
Embeddings FastEmbed 384-dim Local, no API key
AST index ast-grep (6 languages) Code-aware search

The 3-Layer Enrichment Pipeline

  1. Layer 1 — Raw chunks: Conversation text split at message boundaries and stored as 384-dim embeddings. This is the baseline.
  2. Layer 2 — Structured metadata: Session grouping (which files were edited per session), deduplication, file-level tracking via PostToolUse hook.
  3. Layer 3 — AI narratives (optional): csr-engine daemon runs Anthropic API calls to generate human-readable session narratives. README claims 9.3x improvement in search quality over Layer 1 alone.

Context Compaction Handling

yes — explicit PreCompact hook backs up all state before Claude Code triggers a context compaction event, preventing data loss during long sessions.

Cross-Session Handoffs

yes — the primary value proposition:

  • SessionStart hook fires, searches HNSW for recent/relevant conversations, injects top results as context
  • UserPromptSubmit hook predicts relevant context before each response (predictive injection)
  • get_session_learnings tool provides iteration memory for Ralph-style loops
  • get_recent_work returns session-grouped activity across all past sessions

Context Injection Strategy

Two injection points per session:

  1. Pull at start (SessionStart): historical context for the entire session
  2. Predictive per-prompt (UserPromptSubmit): just-in-time injection tuned to the current query

State Files

  • ~/.local/share/csr/csr.db — SQLite database (chunks, metadata, enrichment state)
  • ~/.local/share/csr/hnsw/ — HNSW index files
  • ~/.local/share/csr/narratives/ — Layer-3 AI narrative cache

Search Mechanism

vector — HNSW index with FastEmbed 384-dim local embeddings; secondary: AST-aware search for code queries; csr_search_by_file uses filename metadata; search_by_recency uses timestamp filters.

Claim: 9.3x Improvement

From README: "Three layers progressively improve search quality from raw chunks to AI-enriched narratives — 9.3x improvement." Benchmark method not described; claimed as internal measurement.

07

Orchestration

Claude Self-Reflect — Orchestration

Multi-Agent

No native multi-agent orchestration. All 6 hooks target a single Claude Code session.

Orchestration Pattern

none — single-agent memory layer; no spawn, no delegation.

Isolation Mechanism

none — data is global (all projects in one SQLite + HNSW). Project isolation is implicit via file-path filtering in search queries.

Execution Mode

event-driven — all activity is hook-driven (6 Claude Code lifecycle events trigger the csr-engine binary synchronously). csr-engine daemon adds a background enrichment process (optional).

Multi-Model

No — single-model by design. Optional AI narrative generation uses whatever Anthropic model is configured via API key.

Context Compaction Handling

yes — explicit PreCompact hook.

Crash Recovery

Partial — PreCompact saves state before compaction; crash mid-session loses the current session's learnings until SessionEnd fires.

Auto Validators

None — CSR is a memory/context layer, not a code quality gate.

Prompt Chaining

No formal chaining. The enrichment pipeline (chunks → metadata → narratives) is an offline post-processing pipeline, not a real-time prompt chain.

Streaming

no — MCP tools return synchronous responses.

08

Ui Cli Surface

Claude Self-Reflect — UI & CLI Surface

CLI Binary

  • Name: csr-engine
  • Type: Own runtime (Rust binary), not a thin wrapper
  • Subcommands: setup, status, status --compact, daemon, hook install --apply, eval, eval --full, quality <file>

Local UI

None. No web dashboard, TUI, or desktop app.

Observability

  • csr-engine status — JSON output with system health, index stats, hook status
  • csr-engine status --compact — single-line statusline (for shell prompts)
  • csr-engine eval — 5-test quick eval (~7ms)
  • csr-engine eval --full — 20-test full eval (~200ms)
  • get_timeline MCP tool — activity timeline with statistics

IDE Integration

Via Claude Code plugin mechanism (.claude-plugin/plugin.json). No VS Code extension, no Cursor integration beyond MCP.

Transport

stdio (MCP protocol to Claude Code). No HTTP mode.

Performance Numbers (from README)

Metric Value
Cached startup 93ms
Search latency (p95) <1ms
Binary size 44MB
Import speed ~20 conversations/sec
Embedding 0.73ms/text (batch)

Related frameworks

same archetype · same primary tool · same memory type

MemPalace ★ 53k

Verbatim local-first AI memory with 96.6% R@5 retrieval on LongMemEval using zero API calls — structured into a palace hierarchy…

Beads (Yegge) ★ 24k

Dolt-powered distributed graph issue tracker where AI agents track tasks with hierarchical IDs and dependency edges, claim work…

deepagents (LangChain) ★ 23k

Opinionated Python agent harness on top of LangGraph with sub-agents, filesystem, memory, and context compaction bundled in

agentmemory ★ 18k

Persistent, searchable memory for AI coding agents that captures every tool interaction, compresses it via LLM, and injects…

Open Multi-Agent ★ 6.3k

Give a natural-language goal to a coordinator agent and get a dynamically decomposed, parallelized task DAG executed by…

Basic Memory ★ 3.1k

Gives AI agents a persistent, human-readable knowledge graph of project decisions, observations, and relations stored as plain…