claude-self-reflect

claude-self-reflect · ramakay/claude-self-reflect · ★ 214 · last commit 2026-05-25

Primitive shape 18 total

Hooks 6 MCP tools 12

Summary

Claude Self-Reflect — Summary

Claude Self-Reflect (CSR) is a single 44MB Rust binary that gives Claude Code persistent conversation memory via a 3-layer enrichment pipeline: raw conversation chunks → structured metadata → AI-generated narratives, achieving a claimed 9.3x search quality improvement. It ships 6 lifecycle hooks (SessionStart, UserPromptSubmit, PostToolUse, Stop, PreCompact, SessionEnd), 12 MCP tools with safety annotations, local FastEmbed embeddings (384-dim), HNSW sub-millisecond vector search, and AST-aware code analysis across 6 languages — all running from a single binary with no Docker, no database server, and no API keys required.

Version 8.0 represents a complete rewrite of the previous Python/Docker/Qdrant stack into pure Rust. The install path is a one-line curl script or npm install -g claude-self-reflect; the csr-engine setup command imports past conversations, registers the MCP server, and installs hooks. Key metric claims: context retention below 20% after 10 sessions without CSR, sub-1ms p95 search, 93ms startup, ~20 conversations/sec import speed.

Compared to seeds, CSR is the technical inverse of ccmemory: both use hooks to capture and re-inject conversation memory in Claude Code, but ccmemory relies on an external Neo4j + Docker stack, while CSR is a zero-dependency single binary. CSR differs from basic-memory (explicit agent writes) by being fully automatic (hooks capture without agent action). The 3-layer enrichment pipeline that progressively transforms chunks into AI narratives is unique in this batch.

Overview

Claude Self-Reflect — Overview

Origin

Rama Annaswamy (procsolve.io). MIT license. v8.0.5 ("Complete Rust Rewrite"). Previous stack was Python + Docker + Qdrant; entire stack replaced with a single Rust binary in v8.0. Available on npm as claude-self-reflect and as a Claude Code plugin.

Philosophy

From the README:

"Claude forgets everything. This fixes that. Single 44MB binary. No databases. No containers. No API keys required."

"Context retention drops below 20% after 10 sessions. CSR fixes this with a single binary that gives Claude perfect memory."

The design philosophy is:

Zero friction install — one curl command, runs everywhere
Fully automatic — no commands, no syntax; install once and past context appears automatically
Progressive enrichment — three layers improve quality from raw chunks (baseline) to AI narratives (9.3x improvement), with each layer optional
Local-first privacy — all embeddings and storage run locally; get_session_learnings specifically targets Ralph loop iteration memory

Architecture Statement (from CLAUDE.md)

csr-engine (44MB)
  ├── MCP server (rmcp, 12 tools)
  ├── Embeddings (FastEmbed, 384-dim, local)
  ├── Search (HNSW, <1ms p95)
  ├── Storage (SQLite)
  ├── AST analysis (ast-grep, 6 languages)
  ├── 6 Claude Code hooks
  └── 3-layer enrichment pipeline

The 3-Layer Pipeline

Raw chunks — conversation text, split and stored as vectors
Structured metadata — session grouping, file edit tracking, deduplication
AI narratives (optional daemon) — csr-engine daemon generates human-readable narratives that 9.3x improve search recall

Architecture

Claude Self-Reflect — Architecture

Distribution

npm package: claude-self-reflect (installs csr-engine binary)
Direct install: curl -fsSL https://raw.githubusercontent.com/ramakay/claude-self-reflect/main/scripts/install.sh | sh
Cargo build from source (macOS Intel only — no prebuilt binary)
Claude Code plugin: .claude-plugin/plugin.json with "install": {"command": "npx claude-self-reflect setup"}

Install Methods

curl -fsSL .../install.sh | sh      # downloads prebuilt binary for platform
npm install -g claude-self-reflect  # via npm
cargo build --release               # build from source (csr-engine/)
csr-engine setup                    # post-install: import, register MCP, install hooks

Platform Support

Platform	Support
macOS Apple Silicon	Prebuilt binary
Linux x86_64 / WSL	Prebuilt binary
Linux ARM64	Prebuilt binary
macOS Intel	Build from source

Binary Size and Performance

Binary: 44MB
Startup: 93ms (cached)
Search p95: <1ms
Import speed: ~20 conversations/sec
Embedding: 0.73ms/text (batch)

Directory Structure

csr-engine/          # Rust source
├── src/
│   ├── main.rs      # CLI + MCP server entry
│   ├── hooks/       # 6 hook implementations
│   ├── search/      # HNSW index + FastEmbed
│   ├── storage/     # SQLite access
│   └── pipeline/    # 3-layer enrichment
scripts/
└── install.sh       # Platform detection + download
.claude-plugin/
└── plugin.json      # Claude plugin manifest

State Files

~/.local/share/csr/ — SQLite database, HNSW index, narrative cache
Hook scripts registered in ~/.claude/settings.json
MCP server registered in Claude Code's MCP config

Required Runtime

None (single statically-linked binary). Optional: Anthropic API key for AI narrative generation via csr-engine daemon.

Target AI Tools

Primary: Claude Code. The hooks are Claude Code-specific lifecycle events. The MCP tools could theoretically work with any MCP client.

Components

Claude Self-Reflect — Components

Hooks (6)

Hook Event	What It Does
`SessionStart`	Surfaces relevant past context at conversation start
`UserPromptSubmit`	Predicts and injects context before Claude responds
`PostToolUse`	Tracks file edits with session-scoped deduplication
`Stop`	Stores iteration learnings, detects stuck patterns
`PreCompact`	Backs up state before context compaction
`SessionEnd`	Stores session narrative for future retrieval

All hooks use catch-all error handling — they never block Claude Code.

MCP Tools (12)

Tool	Safety	Description
`csr_reflect_on_past`	read-only	Semantic search across past conversations
`store_reflection`	writes	Store insights for future retrieval
`csr_quick_check`	read-only	Fast existence check (count + top match)
`search_by_recency`	read-only	Time-constrained search ("last week")
`get_recent_work`	read-only	"What did we work on?" with session grouping
`get_timeline`	read-only	Activity timeline with statistics
`csr_search_by_file`	read-only	Find conversations that touched a file
`csr_search_by_concept`	read-only	Theme-based search ("security", "testing")
`csr_search_insights`	read-only	Aggregated patterns from search results
`csr_get_more`	read-only	Paginate through additional results
`get_full_conversation`	read-only	Retrieve complete JSONL conversation
`get_session_learnings`	read-only	Iteration-level memory for Ralph loops

All tools include MCP tool annotations (safety characteristics per spec 2025-11-05).

CLI Commands (7, from CLAUDE.md)

Command	Purpose
`csr-engine`	Start MCP server (default)
`csr-engine setup`	Import + register MCP + install hooks
`csr-engine status`	System status (JSON)
`csr-engine status --compact`	Statusline output
`csr-engine daemon`	Background enrichment (AI narratives)
`csr-engine hook install --apply`	Install/update hooks
`csr-engine eval`	Quick eval (5 tests, ~7ms)
`csr-engine eval --full`	Full eval (20 tests, ~200ms)
`csr-engine quality <file>`	AST code quality analysis

Scripts (1)

scripts/install.sh — platform detection, binary download, MCP registration, hook installation

Subagents, Slash Commands, Templates

None.

Prompts

Claude Self-Reflect — Prompts

Prompt File 1: CLAUDE.md — Action Guide

Technique: Prescriptive action guide injected at session start (through the plugin / CLAUDE.md convention). Provides Claude with a tool reference card.

Key excerpt:

# Claude Self-Reflect v8.0 — Action Guide

## Critical Rules

1. **PATH RULE**: Always use `/Users/username/...` never `~/...` in MCP commands
2. **TEST RULE**: Never claim success without running `cargo test`
3. **RESTART RULE**: After modifying MCP server code, restart Claude Code
4. **QUALITY GATE**: When pre-commit hook blocks, fix the issue — never use `--no-verify`

## MCP Tools (12 total)

csr_reflect_on_past   — Semantic search across past conversations
store_reflection      — Store insights for future retrieval
csr_quick_check       — Fast existence check
search_by_recency     — Time-constrained search
get_recent_work       — Session-grouped recent activity

Technique used: Rule-list prompt with numbered critical constraints; tool reference card as compact lookup table.

Prompt File 2: Plugin Description (plugin.json)

Technique: One-sentence capability declaration used during plugin installation/discovery.

{
  "description": "Persistent conversation memory with semantic search, session continuity detection, cross-project intelligence, and predictive context injection. Zero-dependency 45MB Rust binary with local embeddings — no API keys, no Docker, sub-millisecond search."
}

Technique used: Feature enumeration packed into a single description string; technical credibility markers (44MB, no Docker, sub-millisecond) set adoption expectations.

SessionStart Hook Output (runtime prompt injection)

Technique: Dynamic context injection — the hook fetches relevant past chunks and prepends them to the session as a synthetic system-level context block. No static prompt file; the content is dynamically assembled from the HNSW search results.

The hook runs csr-engine status and injects top-k relevant conversations as context fragments before Claude's first response. The agent sees this as part of the conversation context, not a separate message.

Uniqueness

Claude Self-Reflect — Uniqueness

Differs From Seeds

Closest seed: ccmemory — both are hook-driven memory layers for Claude Code that capture conversation history and re-inject it on session start. Key deltas: (1) ccmemory uses Neo4j graph database + Docker + Ollama for a rich typed-relationship graph; CSR uses a single Rust binary with SQLite + HNSW, eliminating all external dependencies. (2) ccmemory's LLM-based detection stores typed nodes (Decision, Correction, FailedApproach); CSR stores raw conversation chunks and enriches them offline via a 3-layer pipeline. (3) ccmemory has 4 hooks; CSR has 6 (adds UserPromptSubmit for predictive injection and PreCompact for backup). (4) CSR adds AST-aware search across 6 languages — not in ccmemory. Against basic-memory: basic-memory requires explicit agent writes; CSR is fully automatic. Against taskmaster-ai: CSR is a memory layer, not a task decomposition system.

Positioning

"The zero-dependency memory layer for Claude Code developers who want zero setup friction." The v8.0 Rust rewrite is a direct response to the ccmemory/qdrant pattern: users complained about Docker overhead, so CSR collapsed the entire stack into one binary.

Observable Failure Modes

No structured typing — raw conversation chunks may retrieve irrelevant past conversations for unrelated topics; structured ccmemory nodes (Decision vs Correction) have better retrieval precision for specific knowledge types.
AI narratives require API key — the claimed 9.3x improvement requires running csr-engine daemon with an Anthropic API key; without it users get Layer 1 only.
Global index — no project isolation; a 10-project developer will have noisy cross-project retrieval without careful file-path filtering.
9.3x claim unverified — the improvement number has no external benchmark citation; it appears to be an internal measurement.
macOS Intel unsupported via prebuilt — requires Cargo build from source.

Workflow

Claude Self-Reflect — Workflow

Phases

Phase	What Happens	Artifact
Install	`curl install.sh` → downloads binary + `csr-engine setup`	Binary at `~/.local/bin/csr-engine`; hooks in Claude Code; MCP registered
Import	`csr-engine setup` imports past conversations from Claude Code's JSONL conversation history	SQLite + HNSW index populated
Active Session	Hooks fire automatically on every prompt, tool use, stop, compaction	Context injected silently; learnings stored
Enrichment (optional)	`csr-engine daemon` runs in background generating AI narratives	Layer-3 enriched narrative index
Future Session	SessionStart hook surfaces relevant past context	Relevant chunks injected before first response

Hooks as Workflow Gates

The 6 hooks cover the entire session lifecycle:

Before work: SessionStart (context injection)
During work: UserPromptSubmit (predictive injection), PostToolUse (file tracking)
After work: Stop (iteration learning), PreCompact (state backup), SessionEnd (narrative storage)

Approval Gates

None — fully automatic, zero user interaction required.

Spec Format

None — CSR stores conversation history, not structured specs.

Spec Storage

none — state storage is conversation chunks + embeddings + narratives in SQLite/HNSW.

Delta vs Whole File

Not applicable.

Memory Context

Claude Self-Reflect — Memory & Context

Memory Model

sqlite + vector-db (hybrid) — SQLite stores conversation chunks, metadata, and enrichment state; an in-process HNSW index provides sub-millisecond vector search. All components are embedded in the single Rust binary.

Persistence Scope

global — all projects share the same index, searched by file path and project context. Cross-project intelligence is explicit: csr_search_by_file finds conversations that touched a specific file across all sessions.

Storage Details

Component	Technology	Purpose
Chunk store	SQLite	Conversation JSONL storage, metadata
Vector index	HNSW (in-process)	<1ms semantic search
Embeddings	FastEmbed 384-dim	Local, no API key
AST index	ast-grep (6 languages)	Code-aware search

The 3-Layer Enrichment Pipeline

Layer 1 — Raw chunks: Conversation text split at message boundaries and stored as 384-dim embeddings. This is the baseline.
Layer 2 — Structured metadata: Session grouping (which files were edited per session), deduplication, file-level tracking via PostToolUse hook.
Layer 3 — AI narratives (optional): csr-engine daemon runs Anthropic API calls to generate human-readable session narratives. README claims 9.3x improvement in search quality over Layer 1 alone.

Context Compaction Handling

yes — explicit PreCompact hook backs up all state before Claude Code triggers a context compaction event, preventing data loss during long sessions.

Cross-Session Handoffs

yes — the primary value proposition:

SessionStart hook fires, searches HNSW for recent/relevant conversations, injects top results as context
UserPromptSubmit hook predicts relevant context before each response (predictive injection)
get_session_learnings tool provides iteration memory for Ralph-style loops
get_recent_work returns session-grouped activity across all past sessions

Context Injection Strategy

Two injection points per session:

Pull at start (SessionStart): historical context for the entire session
Predictive per-prompt (UserPromptSubmit): just-in-time injection tuned to the current query

State Files

~/.local/share/csr/csr.db — SQLite database (chunks, metadata, enrichment state)
~/.local/share/csr/hnsw/ — HNSW index files
~/.local/share/csr/narratives/ — Layer-3 AI narrative cache

Search Mechanism

vector — HNSW index with FastEmbed 384-dim local embeddings; secondary: AST-aware search for code queries; csr_search_by_file uses filename metadata; search_by_recency uses timestamp filters.

Claim: 9.3x Improvement

From README: "Three layers progressively improve search quality from raw chunks to AI-enriched narratives — 9.3x improvement." Benchmark method not described; claimed as internal measurement.

Orchestration

Claude Self-Reflect — Orchestration

Multi-Agent

No native multi-agent orchestration. All 6 hooks target a single Claude Code session.

Orchestration Pattern

none — single-agent memory layer; no spawn, no delegation.

Isolation Mechanism

none — data is global (all projects in one SQLite + HNSW). Project isolation is implicit via file-path filtering in search queries.

Execution Mode

event-driven — all activity is hook-driven (6 Claude Code lifecycle events trigger the csr-engine binary synchronously). csr-engine daemon adds a background enrichment process (optional).

Multi-Model

No — single-model by design. Optional AI narrative generation uses whatever Anthropic model is configured via API key.

Context Compaction Handling

yes — explicit PreCompact hook.

Crash Recovery

Partial — PreCompact saves state before compaction; crash mid-session loses the current session's learnings until SessionEnd fires.

Auto Validators

None — CSR is a memory/context layer, not a code quality gate.

Prompt Chaining

No formal chaining. The enrichment pipeline (chunks → metadata → narratives) is an offline post-processing pipeline, not a real-time prompt chain.

Streaming

no — MCP tools return synchronous responses.

Ui Cli Surface

Claude Self-Reflect — UI & CLI Surface

CLI Binary

Name: csr-engine
Type: Own runtime (Rust binary), not a thin wrapper
Subcommands: setup, status, status --compact, daemon, hook install --apply, eval, eval --full, quality <file>

Local UI

None. No web dashboard, TUI, or desktop app.

Observability

csr-engine status — JSON output with system health, index stats, hook status
csr-engine status --compact — single-line statusline (for shell prompts)
csr-engine eval — 5-test quick eval (~7ms)
csr-engine eval --full — 20-test full eval (~200ms)
get_timeline MCP tool — activity timeline with statistics

IDE Integration

Via Claude Code plugin mechanism (.claude-plugin/plugin.json). No VS Code extension, no Cursor integration beyond MCP.

Transport

stdio (MCP protocol to Claude Code). No HTTP mode.

Performance Numbers (from README)

Metric	Value
Cached startup	93ms
Search latency (p95)	<1ms
Binary size	44MB
Import speed	~20 conversations/sec
Embedding	0.73ms/text (batch)

Related frameworks

same archetype · same primary tool · same memory type

MemPalace ★ 53k

A10 Memory engine

Verbatim local-first AI memory with 96.6% R@5 retrieval on LongMemEval using zero API calls — structured into a palace hierarchy…

Beads (Yegge) ★ 24k

A10 Memory engine

Dolt-powered distributed graph issue tracker where AI agents track tasks with hierarchical IDs and dependency edges, claim work…

deepagents (LangChain) ★ 23k

A10 Memory engine

Opinionated Python agent harness on top of LangGraph with sub-agents, filesystem, memory, and context compaction bundled in

agentmemory ★ 18k

A10 Memory engine

Persistent, searchable memory for AI coding agents that captures every tool interaction, compresses it via LLM, and injects…

Open Multi-Agent ★ 6.3k

A10 Memory engine

Give a natural-language goal to a coordinator agent and get a dynamically decomposed, parallelized task DAG executed by…

Basic Memory ★ 3.1k

A10 Memory engine

Gives AI agents a persistent, human-readable knowledge graph of project decisions, observations, and relations stored as plain…

Distribution

Type: claude-plugin
License: MIT
Install: one-liner
Version: 8.1.0

Surfaces

CLI binary: csr-engine
CLI subcmds: 9
Local UI: No
Tech stack: null

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 6
MCP servers: 1
MCP tools: 12
Scripts: 1
Templates: 0

Workflow

Phases: 7
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: No
Pattern: none
Max concurrent: 1
Isolation: none
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: No
BYOK: Yes
Modal: text

Execution

Mode: event-driven
Crash recovery: No
Compaction: Yes
Session handoff: Yes
Streaming: No

Memory

Type: hybrid
Persistence: global
Search: vector
State files: 3 files

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: No

Tools

Primary: claude-code
Targets: 1
Portability: low

Signals

Stars: 214
Last commit: 2026-05-25
Contributors: 6
Maintainer: active
Quality score: 1/10