claude-mem (thedotmack)

claude-mem · thedotmack/claude-mem · ★ 78k · last commit 2026-05-26

Primitive shape: 25 total (15 skills, 6 hooks, 4 MCP tools)
00

Summary

claude-mem — Summary

claude-mem is a persistent memory compression system for Claude Code, with adapters for Gemini CLI, OpenCode, Codex, Cursor, and Warp, plus the OpenClaw gateway for teams. It has 78,000+ GitHub stars and is distributed through npm and the Claude Code plugin marketplace. It captures every tool use (Read, Edit, Bash) as an "observation," compresses observations into semantic summaries via AI, and injects relevant context from past sessions into new ones, so Claude can recall codebase decisions, past errors, and prior work without re-explanation. It runs a background worker service on port 37777 with a local web viewer UI for browsing memory, and ships 15 named skills covering search, codebase learning, project planning, weekly digests, and more. The framework targets "progressive disclosure": layered memory retrieval that controls token cost. Compared to seeds, it is closest to claude-flow in being an MCP-anchored infrastructure package with a background service, but it focuses exclusively on memory persistence rather than full orchestration.

01

Overview

claude-mem — Overview

Origin

Created by thedotmack. First published in 2025. Grew rapidly to 78K stars, becoming one of the most starred memory plugins in the Claude Code ecosystem. Currently at v6.5.0 with active development including beta "Endless Mode" for very long-running sessions.

Philosophy

"Progressive Disclosure" — memory retrieval is layered to control token costs. The system uses a 3-layer workflow:

  1. search — compact index with IDs (~50-100 tokens/result)
  2. timeline — chronological context around interesting results
  3. get_observations — full details ONLY for filtered IDs (~500-1,000 tokens/result)

This prevents dumping all memory into every session, which would be expensive and noisy.
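The token arithmetic behind the three layers can be sketched as follows. The per-result costs are the midpoints of the ranges quoted above; the hit count and the number of IDs kept are illustrative assumptions, and the timeline layer is left out because the source only describes its cost as moderate.

```python
# Midpoints of the ranges quoted above; hit counts below are assumptions.
INDEX_TOKENS_PER_RESULT = 75      # search: ~50-100 tokens/result
DETAIL_TOKENS_PER_RESULT = 750    # get_observations: ~500-1,000 tokens/result


def naive_cost(hits: int) -> int:
    """Fetch full details for every hit, with no filtering step."""
    return hits * DETAIL_TOKENS_PER_RESULT


def layered_cost(hits: int, kept: int) -> int:
    """Read the compact index for all hits, then full details for a few."""
    return hits * INDEX_TOKENS_PER_RESULT + kept * DETAIL_TOKENS_PER_RESULT


hits, kept = 20, 3
print(naive_cost(hits))            # 15000
print(layered_cost(hits, kept))    # 3750
```

Under these assumptions the layered path costs a quarter of the naive one. The saving approaches the 10x ratio between the two per-result costs only when very few IDs survive filtering.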

Core Insight (from how-it-works skill)

Every Read, Edit, and Bash that Claude makes turns into a compressed observation.
Observations get summarized at session end. Relevant ones get auto-injected into
future prompts so the next session starts with context from the last one — no
re-explaining the codebase, no re-discovering decisions.

Privacy Model

"Nothing leaves your machine except calls to whichever AI provider you configured for compression." All state lives in ~/.claude-mem/.

Multi-Tool Strategy

Explicit adapters for:

  • Claude Code (primary)
  • Gemini CLI (npx claude-mem install --ide gemini-cli)
  • OpenCode (npx claude-mem install --ide opencode)
  • Codex (separate codex-plugin directory)
  • Warp terminal (WARP.md)
  • OpenClaw gateway (team use)
  • Cursor (cursor-hooks/)

Beta Features

  • Endless Mode — experimental feature for sessions that exceed normal context limits
  • Version switching via npx claude-mem switch <version>

02

Architecture

claude-mem — Architecture

Distribution

npm package (claude-mem). Install via npx claude-mem install (not npm install -g).

Install

npx claude-mem install                        # Claude Code
npx claude-mem install --ide gemini-cli       # Gemini CLI
npx claude-mem install --ide opencode         # OpenCode
# Or via Claude Code plugin marketplace:
/plugin marketplace add thedotmack/claude-mem
/plugin install claude-mem
# Or for OpenClaw:
curl -fsSL https://install.cmem.ai/openclaw.sh | bash

Important: npm install -g claude-mem installs SDK/library only — does NOT register hooks or start worker service.

Directory Structure

plugin/
├── .claude-plugin/         # Claude Code plugin manifest
├── .codex-plugin/          # Codex plugin manifest
├── hooks/
│   ├── hooks.json          # 6 lifecycle hook entries
│   └── codex-hooks.json    # Codex-specific hooks
├── modes/                  # Agent modes (e.g., learn mode)
├── scripts/                # Hook scripts (bun-runner.js, worker-service.cjs)
├── skills/                 # 15 named skills
│   ├── babysit/
│   ├── design-is/
│   ├── do/
│   ├── how-it-works/
│   ├── knowledge-agent/
│   ├── learn-codebase/
│   ├── make-plan/
│   ├── mem-search/
│   ├── oh-my-issues/
│   ├── pathfinder/
│   ├── smart-explore/
│   ├── timeline-report/
│   ├── version-bump/
│   ├── weekly-digests/
│   └── wowerpoint/
└── ui/
    ├── viewer.html         # Local web viewer
    └── viewer-bundle.js    # Bundled React/UI

State Storage

~/.claude-mem/
├── (SQLite DB — observations, sessions, summaries)
├── (Vector index — Chroma)
└── (logs, settings)

Required Runtime

  • Node.js ≥18
  • Bun (for worker service runtime)

Worker Service

Background HTTP service on port 37777. Manages:

  • Observation storage
  • AI compression
  • Memory search API
  • Web viewer endpoint
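Because the worker is bound to a fixed port, a quick check of whether something is already listening there can help when diagnosing startup problems. The sketch below uses only the Python standard library and is not part of claude-mem; the function name is hypothetical, and it cannot tell whether the listener is the claude-mem worker or an unrelated service.

```python
import socket

WORKER_PORT = 37777  # port named in the text above


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    state = "in use" if port_in_use(WORKER_PORT) else "free"
    print(f"port {WORKER_PORT} is {state}")
```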

Target AI Tools

Claude Code, Gemini CLI, OpenCode, Codex, Cursor, Warp, OpenClaw

03

Components

claude-mem — Components

Lifecycle Hooks (6 hook entries)

| Hook Event | Action |
|------------|--------|
| Setup | Version check + dependency installer |
| SessionStart (matcher: startup/clear/compact) | Start worker service + inject session context |
| UserPromptSubmit | Session initialization / first-message setup |
| PostToolUse (matcher: *) | Record observation per tool call |
| PreToolUse (matcher: Read) | File-context lookup before reading |
| Stop | Summarize session + compress observations |

MCP Tools (4)

Exposed via .mcp.json:

| Tool | Purpose |
|------|---------|
| search | Get compact index with IDs (~50-100 tokens/result) |
| timeline | Get chronological context around an anchor observation |
| get_observations | Fetch full details for specific IDs |
| (4th tool: unknown) | unknown |

Skills (15)

| Skill | Purpose |
|-------|---------|
| babysit | Monitor ongoing work, surface issues |
| design-is | Capture design decisions |
| do | Task execution helper |
| how-it-works | Explain claude-mem to the user |
| knowledge-agent | Knowledge base query agent |
| learn-codebase | Front-load entire repo into memory (~5 min) |
| make-plan | Create structured project plans |
| mem-search | Natural language search of past sessions |
| oh-my-issues | Issue tracking helper |
| pathfinder | Codebase navigation |
| smart-explore | Intelligent codebase exploration |
| timeline-report | Generate timeline of work done |
| version-bump | Version management helper |
| weekly-digests | Weekly work summary generation |
| wowerpoint | unknown (presentation?) |

Web Viewer

Local UI served by worker at http://localhost:37777. Features:

  • Real-time memory stream
  • Observation browsing
  • Observation detail via http://localhost:37777/api/observation/{id}
  • Citation references for past observations
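A minimal sketch of reading one observation from the detail endpoint listed above, using only the Python standard library. The URL pattern comes from the text; the function names are hypothetical, and the response format is not documented here, so the body is returned as raw bytes rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:37777"  # worker address from the text above


def observation_url(observation_id: int) -> str:
    """Build the detail URL for one observation ID."""
    return f"{BASE_URL}/api/observation/{quote(str(observation_id))}"


def fetch_observation(observation_id: int, timeout: float = 2.0) -> bytes:
    """Return the raw response body; the payload format is not assumed."""
    with urlopen(observation_url(observation_id), timeout=timeout) as resp:
        return resp.read()
```

Calling fetch_observation(11131) only succeeds while the worker is running; otherwise urlopen raises URLError.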

Worker Service Architecture

The Node.js/Bun worker service (plugin/scripts/worker-service.cjs) does the heavy lifting. Claude hooks call bun-runner.js, which delegates to the worker. The worker exposes an HTTP API on port 37777.

05

Prompts

claude-mem — Prompts

mem-search skill (verbatim, key sections)

---
name: mem-search
description: Search claude-mem's persistent cross-session memory database. Use when
user asks "did we already solve this?", "how did we do X last time?", or needs work
from previous sessions.
---

# Memory Search

## 3-Layer Workflow (ALWAYS Follow)

**NEVER fetch full details without filtering first. 10x token savings.**

### Step 1: Search - Get Index with IDs

search(query="authentication", limit=20, project="my-project")

Returns: Table with IDs, timestamps, types, titles (~50-100 tokens/result)

| ID | Time | T | Title | Read |
|----|------|---|-------|------|
| #11131 | 3:48 PM | 🟣 | Added JWT authentication | ~75 |
| #10942 | 2:15 PM | 🔴 | Fixed auth token expiration | ~50 |

### Step 2: Timeline - Get Context Around Interesting Results

timeline(anchor=11131, depth_before=3, depth_after=3, project="my-project")

### Step 3: Fetch - Get Full Details ONLY for Filtered IDs

Review titles from Step 1 and context from Step 2. Pick relevant IDs.

Prompting technique: Token-budget-aware search protocol. Forces progressive disclosure through explicit 3-step constraint ("NEVER fetch full details without filtering first"). The "10x token savings" framing creates economic motivation for the constraint.

how-it-works skill (verbatim excerpt)

## What it does

Every Read, Edit, and Bash that Claude makes turns into a compressed observation.
Observations get summarized at session end. Relevant ones get auto-injected into
future prompts so the next session starts with context from the last one — no
re-explaining the codebase, no re-discovering decisions.

## When it kicks in

Memory injection starts on your second session in a project.

Prompting technique: Transparent mechanism explanation. The skill explains what the system does rather than instructing behavior, supporting user trust and adoption.

09

Uniqueness

claude-mem — Uniqueness

Differs From Seeds

Most similar to claude-flow (seed) in architecture — both use MCP tools plus a background service plus lifecycle hooks. The delta: claude-mem is exclusively a memory/context system, while claude-flow is a full orchestration framework. claude-mem's worker service architecture (background daemon on port 37777 + HTTP API) is unique in the memory-system category; all other memory systems in this batch use synchronous hook scripts. Also similar to ccmemory (seed) in purpose but uses SQLite+Chroma vs Neo4j+Ollama, and adds the progressive disclosure protocol to manage token costs.

Unique Aspects

  1. Background daemon architecture: The only memory system in this batch that runs a persistent background service (port 37777). This enables async AI compression without blocking agent turns.
  2. Progressive disclosure protocol: The explicit 3-layer search (search → timeline → get_observations) with "10x token savings" framing is a novel UX pattern for memory retrieval.
  3. Cross-platform first: Explicit adapters for 7+ AI tools (Claude Code, Gemini CLI, OpenCode, Codex, Cursor, Warp, OpenClaw) — widest portability in this batch.
  4. Private content tagging: <private> tags for selective exclusion is unique.
  5. Scale indicator: 78K GitHub stars — 3x the next largest in this batch. Community adoption signal.

Observable Failure Modes

  1. Port conflicts: Port 37777 must be free; conflicts with other services will prevent the worker from starting.
  2. Bun dependency: Worker service requires Bun runtime, adding a dependency most memory systems avoid.
  3. AI compression costs: Every session end triggers AI API calls for compression; heavy use accumulates cost.
  4. Service startup latency: Worker must start before hooks can function; slow machines may see hook timeouts.
  5. Global memory pollution: Cross-project global memory can inject irrelevant context from unrelated projects.

04

Workflow

claude-mem — Workflow

Session Lifecycle

SessionStart → worker service starts → inject relevant past context
                                        (from second session onward)
                ↓
UserPromptSubmit → session init / first message processing
                ↓
PostToolUse (every tool) → record observation
PreToolUse (Read) → file context lookup
                ↓
Stop → compress observations → AI-generated session summary → store

Memory Injection Logic

Context injection starts on the second session in a project. First session seeds memory. Subsequent sessions receive auto-injected context for relevant past work.
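The lifecycle above can be modeled with a toy in-memory simulation. Everything here is a hypothetical stand-in: the class and method names are invented for illustration, and the "summary" is a plain string join rather than an AI compression call.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy in-memory stand-in for the worker's storage (hypothetical)."""
    summaries: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class Session:
    project: str
    store: MemoryStore
    observations: list[str] = field(default_factory=list)

    def start(self) -> list[str]:
        """SessionStart: return past summaries to inject (empty on first run)."""
        return list(self.store.summaries.get(self.project, []))

    def post_tool_use(self, tool: str, detail: str) -> None:
        """PostToolUse: record one observation per tool call."""
        self.observations.append(f"{tool}: {detail}")

    def stop(self) -> None:
        """Stop: collapse observations into one summary (a join, not real AI)."""
        if self.observations:
            summary = f"{len(self.observations)} observations; " + "; ".join(self.observations)
            self.store.summaries.setdefault(self.project, []).append(summary)


store = MemoryStore()
first = Session("my-project", store)
print(first.start())          # []
first.post_tool_use("Read", "auth.py")
first.post_tool_use("Edit", "auth.py")
first.stop()

second = Session("my-project", store)
print(second.start())         # ['2 observations; Read: auth.py; Edit: auth.py']
```

The first session has nothing to inject and only seeds the store; the second session in the same project starts with the first session's summary.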

3-Layer Search Workflow (from mem-search skill)

| Layer | Tool | Tokens | Purpose |
|-------|------|--------|---------|
| 1 | search | ~50-100/result | Get compact index with IDs |
| 2 | timeline | moderate | Get chronological context around anchor |
| 3 | get_observations | ~500-1,000/result | Full details for filtered IDs only |

Rule: Never fetch full details without filtering first. 10x token savings.
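The filter-before-fetch rule can be sketched with three local stub functions. These stubs are not the real MCP tools: the data is invented, the matching is a plain substring test, and only the tool names and the depth_before/depth_after parameters are borrowed from the mem-search skill.

```python
from typing import Callable

# Hypothetical in-memory data standing in for the memory database.
OBSERVATIONS = {
    11131: {"title": "Added JWT authentication", "body": "full details ..."},
    10942: {"title": "Fixed auth token expiration", "body": "full details ..."},
    10500: {"title": "Renamed CSS classes", "body": "full details ..."},
}


def search(query: str, limit: int = 20) -> list[tuple[int, str]]:
    """Layer 1 stub: return (id, title) pairs whose title mentions the query."""
    hits = [(i, o["title"]) for i, o in OBSERVATIONS.items()
            if query.lower() in o["title"].lower()]
    return hits[:limit]


def timeline(anchor: int, depth_before: int = 1, depth_after: int = 1) -> list[int]:
    """Layer 2 stub: return IDs neighbouring the anchor in ID order."""
    ids = sorted(OBSERVATIONS)
    pos = ids.index(anchor)
    return ids[max(0, pos - depth_before): pos + depth_after + 1]


def get_observations(ids: list[int]) -> list[dict]:
    """Layer 3 stub: return full records for the chosen IDs only."""
    return [OBSERVATIONS[i] for i in ids]


def layered_lookup(query: str, keep: Callable[[str], bool]) -> list[dict]:
    """Run search first, filter on titles, and fetch details last."""
    index = search(query)
    chosen = [obs_id for obs_id, title in index if keep(title)]
    return get_observations(chosen)


print(timeline(11131))                 # [10942, 11131]
details = layered_lookup("auth", keep=lambda title: "JWT" in title)
print([d["title"] for d in details])   # ['Added JWT authentication']
```

The search returns two index rows, but only the one that passes the title filter is fetched in full.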

Phases + Artifacts

| Phase | Artifact |
|-------|----------|
| Tool use | Observation row in SQLite |
| Session end | AI-compressed summary in SQLite |
| Memory search | Timeline + observation details |
| Codebase learn | Full repo observations (via /learn-codebase skill) |

Approval Gates

None automated. The babysit skill provides human-in-the-loop monitoring.

Private Content

Add <private> tags to exclude sensitive content from storage.

Opt-Out

  • Per-session: include SKIP_MEMORY in prompt
  • Global: configure exclusion patterns

06

Memory Context

claude-mem — Memory & Context

Memory Type

hybrid — SQLite (primary) + Chroma vector database (semantic search).

Persistence Scope

global — All data stored in ~/.claude-mem/ globally. Cross-project by default.

State Files

| Path | Content |
|------|---------|
| ~/.claude-mem/*.db | SQLite: observations, sessions, summaries |
| ~/.claude-mem/chroma/ | Chroma vector index for semantic search |

Context Compaction Strategy

Explicit and designed in:

  • PostToolUse → each tool call = one observation row
  • Stop hook → AI compresses all observations from the session into a summary
  • SessionStart → injects compressed summaries from past sessions, not raw observations

This is fundamentally different from file-based approaches: compaction is the normal operation, not an edge case.
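The three compaction steps can be illustrated with an in-memory SQLite database. The schema is an assumption made for this sketch: the source names observations, sessions, and summaries as stored data but gives no table or column definitions, and the AI compression step is replaced here by a trivial count.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (id INTEGER PRIMARY KEY, project TEXT NOT NULL);
CREATE TABLE observations (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id),
    tool TEXT NOT NULL,
    content TEXT NOT NULL
);
CREATE TABLE summaries (
    session_id INTEGER PRIMARY KEY REFERENCES sessions(id),
    text TEXT NOT NULL
);
""")

# PostToolUse: one row per tool call.
conn.execute("INSERT INTO sessions (id, project) VALUES (1, 'my-project')")
conn.executemany(
    "INSERT INTO observations (session_id, tool, content) VALUES (?, ?, ?)",
    [(1, "Read", "auth.py"), (1, "Edit", "auth.py"), (1, "Bash", "pytest")],
)

# Stop: a count-based summary stands in for the AI compression call.
(count,) = conn.execute(
    "SELECT COUNT(*) FROM observations WHERE session_id = 1"
).fetchone()
conn.execute(
    "INSERT INTO summaries (session_id, text) VALUES (1, ?)",
    (f"{count} tool calls in session 1",),
)

# SessionStart: read summaries, not raw observation rows.
injected = [row[0] for row in conn.execute(
    "SELECT s.text FROM summaries s JOIN sessions x ON x.id = s.session_id "
    "WHERE x.project = ?", ("my-project",)
)]
print(injected)   # ['3 tool calls in session 1']
```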

Progressive Disclosure Protocol

Memory retrieval uses 3 layers to control token costs:

  1. Compact index (IDs + titles): ~50-100 tokens per result
  2. Timeline view (context around anchor): moderate tokens
  3. Full details (for specific IDs): ~500-1,000 tokens per result

Cross-Session Handoffs

Yes — automatically at every session boundary via SessionStart/Stop hooks.

Private Content

Use <private> tags in conversation to exclude sensitive content from storage.
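One way such tag-based exclusion could work is a filter applied before text is stored. This is a sketch of the idea, not claude-mem's implementation: the closing-tag form, the case-insensitive matching, and the function name are all assumptions.

```python
import re

# Assumed tag syntax: <private> ... </private>, possibly spanning lines.
PRIVATE_BLOCK = re.compile(r"<private>.*?</private>", re.DOTALL | re.IGNORECASE)


def strip_private(text: str) -> str:
    """Remove every <private>...</private> span before text is stored."""
    return PRIVATE_BLOCK.sub("", text)


message = "Deploy with key <private>sk-12345</private> to staging."
print(strip_private(message))   # Deploy with key  to staging.
```

The non-greedy match keeps two separate private spans in one message from swallowing the text between them.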

Context Injection Filtering

The SessionStart hook injects relevant context — not all memory, just what the worker's semantic search deems relevant to the current session/project. This prevents context pollution.
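Relevance filtering of this kind is commonly done by ranking stored items against a query embedding. The sketch below shows that general technique with hand-written two-dimensional vectors; the project filter, the top_k cutoff, and the data layout are assumptions, and the real system delegates vector search to Chroma rather than computing similarity by hand.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_context(query_vec: list[float], summaries: list[dict],
                   project: str, top_k: int = 2) -> list[str]:
    """Keep same-project summaries, ranked by similarity to the query."""
    candidates = [s for s in summaries if s["project"] == project]
    ranked = sorted(candidates, key=lambda s: cosine(query_vec, s["vec"]),
                    reverse=True)
    return [s["text"] for s in ranked[:top_k]]


# Invented summaries with toy embeddings.
summaries = [
    {"project": "api", "text": "JWT auth added", "vec": [1.0, 0.0]},
    {"project": "api", "text": "CSS cleanup", "vec": [0.0, 1.0]},
    {"project": "blog", "text": "Theme tweaks", "vec": [1.0, 0.0]},
]
print(select_context([0.9, 0.1], summaries, project="api", top_k=1))
# ['JWT auth added']
```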

Worker Service Role

The background worker on port 37777 handles the heavy lifting: AI compression calls, vector indexing, semantic search. Claude hooks only make HTTP calls to the worker, keeping hook latency minimal.

07

Orchestration

claude-mem — Orchestration

Multi-Agent

No explicit multi-agent orchestration. OpenClaw integration enables team-shared memory but is not agent-to-agent coordination.

Orchestration Pattern

none — Pure memory infrastructure. Skills provide task assistance but don't spawn subagents.

Isolation

none — In-place global memory. No per-feature isolation.

Multi-Model

Yes (implicit). The worker service calls a configurable AI provider for compression (Claude by default, but also OpenRouter, Gemini). This is not role-based multi-model routing but provider flexibility.

Execution Mode

background-daemon — The worker service runs persistently on port 37777. Claude hooks communicate with it via HTTP. This is the distinctive architecture: it's the only memory system in this batch that runs a persistent background service.

Crash Recovery

The worker service can be restarted via SessionStart hook. SQLite provides transactional safety.

Notes

The background daemon architecture is the key differentiator: unlike hook-only systems (ccmemory, ccmemory-plain) that run scripts synchronously per event, claude-mem's worker service can perform long-running tasks (AI compression, vector indexing) asynchronously without blocking the agent.

08

UI & CLI Surface

claude-mem — UI & CLI Surface

Dedicated CLI Binary

npx claude-mem — the installer/manager CLI. Subcommands:

  • install — sets up plugin, hooks, worker service
  • install --ide gemini-cli — Gemini CLI mode
  • install --ide opencode — OpenCode mode
  • uninstall — clean removal
  • switch <version> — version switching (beta channel)

Note: npm install -g claude-mem gives the SDK only, NOT the installer.

Local Web Dashboard

Yes — served by worker on http://localhost:37777

| Feature | Details |
|---------|---------|
| Type | Web dashboard (browser-based) |
| Port | 37777 |
| Tech | HTML/JS (viewer.html + viewer-bundle.js bundled in plugin/ui/) |
| Features | Real-time memory stream, observation browsing, observation detail view |
| Citation API | http://localhost:37777/api/observation/{id} |

IDE Integration

Plugins for:

  • Claude Code (.claude-plugin/)
  • Codex (.codex-plugin/)
  • Cursor (cursor-hooks/)
  • Warp (WARP.md)
  • Gemini CLI (auto-detected via ~/.gemini)
  • OpenClaw gateway (dedicated installer script)

Observability

  • Web viewer at port 37777 for real-time memory
  • All observations browsable with citation IDs
  • Session summaries accessible
  • how-it-works skill provides user-facing explanation

Notes on Installation

Installation is multi-step by design: the worker service requires the Bun runtime and a free port. Installing through npx claude-mem install is intentional, because npm install -g claude-mem installs only the SDK and skips hook registration and worker service setup.
