RLM (Infinite Memory)

rlm-claude · EncrEor/rlm-claude · ★ 35 · last commit 2026-03-19

Primitive shape: 27 total
Commands 2 · Skills 2 · Subagents 4 · Hooks 5 · MCP tools 14
00

Summary

RLM (Infinite Memory) — Summary

RLM is a Python MCP server that gives Claude Code persistent memory across context compaction events, implementing the architecture described in the MIT CSAIL "Recursive Language Models" paper (arXiv:2512.24601). It ships 14 MCP tools organized into three memory subsystems: Insights (key–value facts), Chunks (full conversation segments), and Retention (lifecycle management with 3-zone archiving). A PreCompact hook intercepts every /compact event and injects a blocking message requiring Claude to snapshot the session before context is wiped. Optional hybrid BM25 + cosine semantic search (Model2Vec or FastEmbed) enables relevance-ranked retrieval across all stored data. Two Claude Code skills (/rlm-analyze, /rlm-parallel) implement the MIT paper's Map-Reduce pattern: /rlm-parallel fans out three parallel Task-tool subagents over separate chunks then merges results with a fourth "Merger" subagent.

Compared to seeds: closest to ccmemory (MCP-anchored memory store), but differs architecturally — RLM stores data in flat JSON + gzip files under ~/.claude/rlm/ rather than a Neo4j graph, uses BM25+vector hybrid search rather than graph traversal, and treats session continuity (PreCompact hook) as the primary design axis rather than knowledge-graph querying. Unlike ccmemory's global graph, RLM supports multi-project domain organization with cross-project search filtering.

01

Overview

RLM — Origin & Philosophy

Origin

Built by Ahmed MAKNI (@EncrEor) with Claude Opus as joint R&D partner. Explicitly inspired by:

  • MIT CSAIL RLM paper (arXiv:2512.24601, Dec 2025) — "Recursive Language Models" — source of the chunk/peek/grep primitives and the Map-Reduce sub-agent pattern
  • MAGMA paper (arXiv:2601.03236, Jan 2026) — temporal filtering and entity extraction
  • Letta/MemGPT — early inspiration for persistent AI agent memory

Released on PyPI as mcp-rlm-server. Active development through 9 phases, with Phase 10 planned.

Philosophy

"Your Claude Code sessions forget everything after /compact. RLM fixes that."

The core insight is that Claude Code's context compaction is a data loss event, not just a UX inconvenience. RLM treats the PreCompact hook as a mandatory save gate: before any compaction, the session must be chunked. This is user-driven (you decide what to chunk) but infrastructure-enforced (you can't compact without being prompted).

A second philosophical axis: user sovereignty over memory. The README explicitly positions RLM against cloud-dependent memory systems. All data lives under ~/.claude/rlm/ in human-readable JSON + gzip. No API keys, no cloud accounts.

Phased Development Roadmap

The project documents 9 completed phases and 1 planned:

  1. Memory tools (remember/recall/forget/status)
  2. Navigation tools (chunk/peek/grep/list)
  3. Auto-chunking + sub-agent skills
  4. Production (auto-summary, dedup, access tracking)
  5. Advanced (BM25 search, fuzzy grep, multi-sessions, retention)
  6. Production-ready (tests, CI/CD, PyPI)
  7. MAGMA-inspired (temporal filtering, entity extraction)
  8. Hybrid semantic search (BM25 + cosine, Model2Vec)
  9. Typed chunking — chunk_type parameter
  10. (Planned) Auto-memory/RLM cohabitation — Write/Edit hook redirects

Quotes

"3 lines to install. 14 tools. Zero configuration."

"User-driven philosophy: you decide when to chunk, the system saves before loss."

02

Architecture

RLM — Architecture

Distribution

  • Type: MCP server (Python package) + Claude Code skills + hooks
  • PyPI package: mcp-rlm-server
  • Version analyzed: latest (PyPI) / main branch

Install Methods

# Recommended (quotes keep shells such as zsh from globbing the brackets)
pip install "mcp-rlm-server[all]"

# Fast (no global pollution)
uv tool install "mcp-rlm-server[all]" --python 3.12

# Full install with hooks + skills
git clone https://github.com/EncrEor/rlm-claude.git && cd rlm-claude && ./install.sh

# Docker
docker build -t rlm-server . && claude mcp add rlm-server -- docker run -i --rm -v ~/.claude/rlm/context:/data rlm-server

Required Runtime

  • Python 3.10+
  • Claude Code CLI
  • Optional: mcp-rlm-server[semantic] for Model2Vec embeddings (~35 MB)
  • Optional: mcp-rlm-server[semantic-fastembed] for FastEmbed (~230 MB)

Directory Tree (post-install)

~/.claude/rlm/
├── context/
│   ├── chunks/           # Active conversation chunks (JSON)
│   ├── archive/          # Archived chunks (gzip)
│   ├── index.json        # Chunk metadata index
│   ├── session_memory.json   # Current session insights
│   └── domains.json      # Project/domain configuration
└── hooks/
    ├── pre_compact_chunk.py
    ├── reset_chunk_counter.py
    ├── memory_write_redirect.py
    └── i18n.py

Repository Structure

rlm-claude/
├── src/mcp_server/       # MCP server source (Python)
│   └── tools/
│       └── fileutil.py   # Atomic I/O primitives
├── hooks/                # Claude Code hook scripts
├── templates/
│   ├── skills/rlm-analyze/skill.md
│   ├── skills/rlm-parallel/skill.md
│   ├── hooks_settings.json
│   └── CLAUDE_RLM_SNIPPET.md
├── context/              # Default data directory
│   ├── chunks/
│   └── domains.json.example
├── scripts/
│   ├── benchmark_providers.py
│   └── backfill_embeddings.py
├── pyproject.toml
├── Dockerfile
└── install.sh / uninstall.sh

Target AI Tools

  • Claude Code (primary)
  • Any MCP-compatible client (Docker deployment)

Data Storage

  • ~/.claude/rlm/context/ — all persistent data (JSON + gzip)
  • Configurable via RLM_CONTEXT_DIR environment variable
  • SHA-256 content deduplication
  • File locking via fcntl.flock for concurrent access safety
  • Atomic writes (temp-then-rename)
  • Chunk size limit: 2 MB; archive decompression cap: 10 MB
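
The storage guarantees listed above (SHA-256 deduplication, fcntl.flock locking, temp-then-rename writes, 2 MB chunk limit) can be sketched roughly as follows. This is a minimal illustration, not the code in fileutil.py: the function names, the `.lock` file name, and the shape of index.json (chunk id mapped to digest) are assumptions made for the example.

```python
import fcntl
import hashlib
import json
import os
import tempfile
from pathlib import Path

MAX_CHUNK_BYTES = 2 * 1024 * 1024  # 2 MB chunk size limit


def content_hash(text: str) -> str:
    """SHA-256 of the chunk text, used here as the deduplication key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def atomic_write_json(path: Path, data: dict) -> None:
    """Write to a temp file in the target directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, ensure_ascii=False)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic on POSIX within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def save_chunk(context_dir: Path, chunk_id: str, text: str) -> bool:
    """Store a chunk unless identical content exists. Returns True if written."""
    if len(text.encode("utf-8")) > MAX_CHUNK_BYTES:
        raise ValueError("chunk exceeds the 2 MB limit")
    context_dir.mkdir(parents=True, exist_ok=True)
    index_path = context_dir / "index.json"
    lock_path = context_dir / ".lock"  # hypothetical lock file name
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # exclusive lock across processes
        try:
            index = json.loads(index_path.read_text("utf-8")) if index_path.exists() else {}
            digest = content_hash(text)
            if digest in index.values():
                return False  # duplicate content, nothing written
            chunk_path = context_dir / "chunks" / f"{chunk_id}.json"
            atomic_write_json(chunk_path, {"id": chunk_id, "content": text, "sha256": digest})
            index[chunk_id] = digest
            atomic_write_json(index_path, index)
            return True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

Writing the temp file into the same directory as the target matters: os.replace is only atomic when source and destination are on the same filesystem.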
03

Components

RLM — Components

MCP Tools (14 total)

Memory & Insights

| Tool | Purpose |
| --- | --- |
| rlm_remember | Save decisions, facts, preferences with categories and importance levels |
| rlm_recall | Search insights by keyword (multi-word tokenized), category, or importance |
| rlm_forget | Remove an insight by ID |
| rlm_status | System overview: insight count, chunk stats, access metrics |

Conversation History (Chunks)

| Tool | Purpose |
| --- | --- |
| rlm_chunk | Save conversation segments with typed categorization (snapshot, session, debug; insight redirects to rlm_remember) |
| rlm_peek | Read a chunk (full or partial by line range) |
| rlm_grep | Regex search across all chunks (+ fuzzy matching for typo tolerance) |
| rlm_search | Hybrid search: BM25 + semantic cosine similarity (FR/EN, accent-normalized, chunks + insights) |
| rlm_list_chunks | List all chunks with metadata |
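
Two of the behaviours named above, accent normalization in rlm_search and typo-tolerant matching in rlm_grep, can be approximated with the standard library alone. The sketch below is an assumption about how such matching could work, not RLM's actual implementation; the function names and the 0.8 similarity cutoff are hypothetical.

```python
import difflib
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase and strip diacritics so that 'Décision' matches 'decision'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text: str) -> list[str]:
    """Split normalized text into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", normalize(text))


def fuzzy_grep(pattern: str, lines: list[str], cutoff: float = 0.8) -> list[int]:
    """Return indices of lines that contain a token close to `pattern`."""
    target = normalize(pattern)
    hits = []
    for i, line in enumerate(lines):
        # get_close_matches keeps only tokens whose similarity ratio >= cutoff
        if difflib.get_close_matches(target, tokenize(line), n=1, cutoff=cutoff):
            hits.append(i)
    return hits
```

For example, `fuzzy_grep("retension", ["retention policy", "unrelated"])` matches the first line despite the typo, and `normalize("Décision Système")` yields `"decision systeme"`.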

Multi-Project Organization

| Tool | Purpose |
| --- | --- |
| rlm_sessions | Browse sessions by project or domain |
| rlm_domains | List available domains for categorization |

Smart Retention (3-zone lifecycle)

| Tool | Purpose |
| --- | --- |
| rlm_retention_preview | Preview what would be archived (dry-run) |
| rlm_retention_run | Archive old unused chunks, purge ancient ones |
| rlm_restore | Bring back archived chunks |

Retention zones: Active → Archive (.gz) → Purge
Immunity system: Critical tags, frequent access, and keywords protect chunks from archiving.
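
The zone transitions and the immunity rule can be expressed as a small decision function. The sketch below is illustrative only: the 30-day and 180-day thresholds, the access-count threshold, the keyword list, and the `ChunkMeta` fields are all assumptions, since the source does not state the real values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Every threshold here is hypothetical; the source does not give RLM's values.
ARCHIVE_AFTER = timedelta(days=30)
PURGE_AFTER = timedelta(days=180)
IMMUNE_TAGS = {"critical"}
IMMUNE_MIN_ACCESSES = 3
IMMUNE_KEYWORDS = ("decision", "architecture")


@dataclass
class ChunkMeta:
    chunk_id: str
    zone: str  # "active" or "archive"
    last_access: datetime
    access_count: int = 0
    tags: set = field(default_factory=set)
    summary: str = ""


def is_immune(chunk: ChunkMeta) -> bool:
    """Critical tags, frequent access, or protected keywords block archiving."""
    text = chunk.summary.lower()
    return (
        bool(chunk.tags & IMMUNE_TAGS)
        or chunk.access_count >= IMMUNE_MIN_ACCESSES
        or any(keyword in text for keyword in IMMUNE_KEYWORDS)
    )


def plan_action(chunk: ChunkMeta, now: datetime) -> str:
    """Return 'keep', 'archive', or 'purge' for a dry-run style preview."""
    idle = now - chunk.last_access
    if chunk.zone == "active":
        if is_immune(chunk) or idle < ARCHIVE_AFTER:
            return "keep"
        return "archive"
    if chunk.zone == "archive" and idle >= PURGE_AFTER:
        return "purge"
    return "keep"
```

Running `plan_action` over every entry without applying the result corresponds to what rlm_retention_preview is described as doing; applying it corresponds to rlm_retention_run.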

Claude Code Skills (2)

| Skill | Purpose |
| --- | --- |
| /rlm-analyze | Analyze a single chunk with an isolated subagent (Explore type, read-only) |
| /rlm-parallel | Analyze multiple chunks in parallel: 3 subagents fan out, 1 Merger subagent synthesizes (Map-Reduce from MIT RLM paper) |

Claude Code Hooks (5 hook entries using 3 scripts, across 2 events)

| Event | Matcher | Script | Purpose |
| --- | --- | --- | --- |
| PreCompact | manual | pre_compact_chunk.py | Block compaction, inject chunk-required message |
| PreCompact | auto | pre_compact_chunk.py | Same for auto-compact |
| PostToolUse | mcp__rlm-server__rlm_chunk | reset_chunk_counter.py | Track stats after chunk operations |
| PostToolUse | Write | memory_write_redirect.py | Detect writes to Claude auto-memory, nudge to use RLM |
| PostToolUse | Edit | memory_write_redirect.py | Same for Edit operations |
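
To show how the five hook entries above could map onto the `hooks` section of ~/.claude/settings.json, here is a short generator. It follows the general Claude Code hook schema (event, matcher, list of command hooks); the exact contents of templates/hooks_settings.json are not reproduced in this document, so the command strings and the `HOOK_DIR` path are assumptions.

```python
import json

# Hypothetical install location for the hook scripts.
HOOK_DIR = "~/.claude/rlm/hooks"

# (event, matcher, script) rows taken from the hooks table.
HOOK_ROWS = [
    ("PreCompact", "manual", "pre_compact_chunk.py"),
    ("PreCompact", "auto", "pre_compact_chunk.py"),
    ("PostToolUse", "mcp__rlm-server__rlm_chunk", "reset_chunk_counter.py"),
    ("PostToolUse", "Write", "memory_write_redirect.py"),
    ("PostToolUse", "Edit", "memory_write_redirect.py"),
]


def build_hook_settings(rows=HOOK_ROWS, hook_dir: str = HOOK_DIR) -> dict:
    """Group rows by event into the nested structure used by settings.json."""
    hooks: dict[str, list] = {}
    for event, matcher, script in rows:
        hooks.setdefault(event, []).append(
            {
                "matcher": matcher,
                "hooks": [{"type": "command", "command": f"python3 {hook_dir}/{script}"}],
            }
        )
    return {"hooks": hooks}


settings = build_hook_settings()
print(json.dumps(settings, indent=2))
```

The output makes the heading's arithmetic visible: two PreCompact entries and three PostToolUse entries, backed by three distinct scripts.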

Scripts (3)

| Script | Purpose |
| --- | --- |
| scripts/benchmark_providers.py | Compare Model2Vec vs FastEmbed embedding quality |
| scripts/backfill_embeddings.py | Embed existing chunks after adding semantic search |
| scripts/backfill_entities.py | Extract entities from existing chunks (MAGMA Phase 7) |

Templates

| Template | Purpose |
| --- | --- |
| templates/CLAUDE_RLM_SNIPPET.md | CLAUDE.md insert fragment for Claude Code instructions |
| templates/hooks_settings.json | Ready-to-use hooks configuration |
| templates/skills/rlm-analyze/skill.md | Skill definition for single-chunk analysis |
| templates/skills/rlm-parallel/skill.md | Skill definition for parallel Map-Reduce analysis |
05

Prompts

RLM — Prompt Files (Verbatim Excerpts)

Excerpt 1: /rlm-parallel Skill — Map Phase Prompt

Source: templates/skills/rlm-parallel/skill.md

Prompting technique: Few-shot parallel dispatch with explicit step decomposition + Map-Reduce pattern from MIT RLM paper arXiv:2512.24601. The skill specifies that exactly 3 Task tools must be launched in a single message to achieve true parallelism.

### Step 3: Parallel analysis (CRITICAL)

Launch **exactly 3 Task tools in a single message** for parallel execution:

Task #1: subagent_type="Explore", model="sonnet" → Analyze chunk 1 with the question

Task #2: subagent_type="Explore", model="sonnet" → Analyze chunk 2 with the question

Task #3: subagent_type="Explore", model="sonnet" → Analyze chunk 3 with the question


**Prompt for each sub-agent**:

---

You are an analysis assistant. Answer the question based ONLY on this chunk.

### Question
{question}

### Chunk {chunk_id}
{chunk content}

### Instructions
- Extract the information relevant to the question
- Quote key passages in quotation marks if useful
- If nothing is relevant, answer "No relevant information in this chunk"
- Be concise (max 200 words)

---

Excerpt 2: /rlm-parallel Skill — Merge Phase Prompt

Source: templates/skills/rlm-parallel/skill.md

Prompting technique: Hierarchical reduction (Merger subagent synthesizes 3 partial answers) with explicit contradiction detection and source attribution. Implements the "Reduce" step of Map-Reduce.

### Step 5: Merge (Merger)

Launch a final Task to synthesize:

Task Merger: subagent_type="Explore", model="sonnet"


**Merger prompt**:

---

You are a synthesizer. Combine these partial analyses into a coherent answer.

### Original question
{question}

### Partial analyses
**Chunk {chunk_id_1}**:
{reponse_1}

**Chunk {chunk_id_2}**:
{reponse_2}

**Chunk {chunk_id_3}**:
{reponse_3}

### Instructions
- Synthesize the insights without repetition
- If analyses contradict each other, flag it: "Contradiction: ..."
- Cite sources: [chunk_id] for each fact
- Structure the answer with bullet points if useful
- Max 400 words

---

Excerpt 3: PreCompact Hook — Blocking System Message

Source: hooks/pre_compact_chunk.py

Prompting technique: Hook-injected systemMessage that creates a mandatory gate. The message is i18n-aware (French by default, English optional) and includes live context window percentage.

message = (
    f"[🔄 {t('compact_title')}]{ctx_info}\n"
    f"━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n\n"
    f"{t('compact_body')}"
)
result = {"systemMessage": message}
print(json.dumps(result))

The compact_body translation key expands to instructions requiring the user to run rlm_chunk before compaction proceeds, along with the current context percentage displayed as (ctx: N%).
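
For orientation, a self-contained version of this hook pattern might look like the sketch below. It is not the content of pre_compact_chunk.py: the translation strings are invented, the `context_percent` stdin key is a hypothetical field name (the source only says the percentage comes from stdin JSON), and `run_hook` is an illustrative wrapper.

```python
import json

# Hypothetical strings; the real hook loads its translations through i18n.py.
MESSAGES = {
    "fr": {
        "compact_title": "RLM : sauvegarde requise",
        "compact_body": "Appelle rlm_chunk avant la compaction.",
    },
    "en": {
        "compact_title": "RLM: save required",
        "compact_body": "Call rlm_chunk before compaction.",
    },
}


def t(key: str, lang: str = "fr") -> str:
    """Look up a message, falling back to French (the documented default)."""
    return MESSAGES.get(lang, MESSAGES["fr"])[key]


def run_hook(raw_stdin: str, lang: str = "fr") -> str:
    """Build the JSON that the hook would print to stdout."""
    payload = json.loads(raw_stdin) if raw_stdin.strip() else {}
    pct = payload.get("context_percent")  # hypothetical field name
    ctx_info = f" (ctx: {pct}%)" if pct is not None else ""
    message = f"[{t('compact_title', lang)}]{ctx_info}\n\n{t('compact_body', lang)}"
    return json.dumps({"systemMessage": message}, ensure_ascii=False)
```

A real hook script would end with something like `print(run_hook(sys.stdin.read(), os.environ.get("RLM_LANG", "fr")))`, so that Claude Code receives the systemMessage on stdout.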

Excerpt 4: /rlm-analyze Skill — Single-Chunk Analysis Prompt

Source: templates/skills/rlm-analyze/skill.md

Prompting technique: Constrained-context subagent pattern — subagent is given only the chunk content and forbidden from making assumptions beyond it.

**Step 3**: Launch a Task subagent with the following context:

---

You are an analysis assistant. Answer the question based ONLY on the provided context.

### Question
{The user's question}

### Context (Chunk {chunk_id})
{Chunk content loaded via rlm_peek}

### Instructions
- Answer concisely and precisely
- Quote relevant passages if useful
- If the information is not in the context, say so clearly
- Do not make assumptions beyond the context

---
09

Uniqueness

RLM — Uniqueness & Positioning

Differs From Seeds

RLM is closest to ccmemory in the seed set — both are MCP-anchored memory frameworks for Claude Code that persist facts across sessions. The key architectural delta: RLM uses flat JSON files + gzip archives under ~/.claude/rlm/ rather than ccmemory's Neo4j graph, enabling zero-infra deployment (no Docker, no database server). RLM adds a research-backed hybrid BM25+cosine search layer, a 3-zone retention lifecycle with immunity scoring, and the MIT RLM paper's Map-Reduce pattern as first-class skills. Unlike ccmemory's graph-traversal model, RLM thinks in "sessions and chunks" rather than "entities and relationships." RLM also differs from superpowers (skills-only behavioral framework) by being infrastructure — it doesn't change how Claude codes, it prevents context loss.

Positioning

RLM occupies a niche between ccmemory (graph memory) and simple CLAUDE.md file injection (agent-os). It is the only framework in this batch that:

  1. Cites and implements a specific CS research paper (MIT arXiv:2512.24601) as its architecture
  2. Ships both keyword and vector search with two provider choices and performance benchmarks
  3. Has a 3-zone lifecycle (active/archive/purge) with an immunity system

Observable Failure Modes

  • Gate evasion: The PreCompact hook injects a message but cannot force Claude to call rlm_chunk — a determined agent or user could ignore it. The philosophy is user-driven; the system nudges but does not enforce.
  • Language mismatch: Skills are written in French (the project's default language). English-speaking users must either read the French instructions or set the RLM_LANG environment variable (fr or en).
  • Semantic quality gap: The default Model2Vec provider shares only about 1.6 of its top 5 results with FastEmbed on average. Users relying on semantic search may get worse results than expected unless they switch providers.
  • No graph relationships: Chunks are discrete text blobs. For reasoning over connected facts (e.g., "how does decision A relate to person B?"), ccmemory's Neo4j graph is superior.
  • Manual chunking burden: The framework doesn't auto-chunk sessions; it only reminds users to do it. High-frequency users may find the prompting repetitive.

Cross-References

  • Implements: MIT CSAIL RLM paper (arXiv:2512.24601)
  • Inspired by: MAGMA (arXiv:2601.03236), Letta/MemGPT
  • Competes with: ccmemory (same use-case, different storage backend)
04

Workflow

RLM — Workflow

Session Lifecycle

Phase 1: Installation

  • pip install mcp-rlm-server[all] or ./install.sh
  • Register MCP server: claude mcp add rlm-server -- python3 -m mcp_server
  • Copy hooks and skills to ~/.claude/
  • Configure ~/.claude/settings.json with hook entries from templates/hooks_settings.json

Artifact: Running MCP server, configured hooks

Phase 2: During Session — Active Memory

  • User calls rlm_remember to persist facts (decisions, preferences, architecture notes)
  • User calls rlm_chunk to save conversation segments with type labels
  • rlm_search or rlm_recall to retrieve prior context
  • rlm_status to see memory health metrics

Artifact: ~/.claude/rlm/context/session_memory.json, chunk files in chunks/

Phase 3: Pre-Compaction Gate (automatic)

  • The user runs /compact, or auto-compact fires
  • PreCompact hook intercepts, reads context window percentage
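  • Injects a systemMessage instructing Claude to chunk before proceeding
  • Claude is expected to call rlm_chunk (type=snapshot) to save current state
  • Compaction then proceeds; the hook cannot verify that the snapshot was taken (see "Gate evasion" under Observable Failure Modes)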

Artifact: Snapshot chunk file, reset chunk counter

Phase 4: Post-Session Analysis (on demand)

  • /rlm-analyze <chunk_id> "<question>" — isolated Explore subagent answers question from chunk
  • /rlm-parallel "<question>" — 3 parallel Explore subagents, 1 Merger — Map-Reduce over top-3 relevant chunks

Artifact: Synthesized answer with source citations

Phase 5: Retention Management (periodic)

  • rlm_retention_preview — dry-run to see what would be archived
  • rlm_retention_run — move old chunks to .gz archive, purge ancient ones
  • rlm_restore <chunk_id> — bring back archived chunks

Artifact: Compressed archive in ~/.claude/rlm/context/archive/
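
The archive and restore steps amount to a gzip round trip between chunks/ and archive/. The sketch below assumes the file naming shown under State Files (`<chunk_id>.json` and `<chunk_id>.json.gz`) and applies the 10 MB decompression cap listed under Data Storage; the function names are hypothetical and the real tools also update index.json, which is omitted here.

```python
import gzip
import shutil
from pathlib import Path

MAX_DECOMPRESSED_BYTES = 10 * 1024 * 1024  # archive decompression cap


def archive_chunk(context_dir: Path, chunk_id: str) -> Path:
    """Compress an active chunk into archive/ and remove the original."""
    src = context_dir / "chunks" / f"{chunk_id}.json"
    dst = context_dir / "archive" / f"{chunk_id}.json.gz"
    dst.parent.mkdir(parents=True, exist_ok=True)
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    src.unlink()
    return dst


def restore_chunk(context_dir: Path, chunk_id: str) -> Path:
    """Decompress an archived chunk back into chunks/, enforcing the size cap."""
    src = context_dir / "archive" / f"{chunk_id}.json.gz"
    dst = context_dir / "chunks" / f"{chunk_id}.json"
    with gzip.open(src, "rb") as fin:
        # Read one byte past the cap so oversized payloads are detected
        # without decompressing the whole stream into memory.
        data = fin.read(MAX_DECOMPRESSED_BYTES + 1)
    if len(data) > MAX_DECOMPRESSED_BYTES:
        raise ValueError("archived chunk exceeds the 10 MB decompression cap")
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_bytes(data)
    src.unlink()
    return dst
```

Capping the read guards against a small .gz file that expands to an unreasonable size.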

Approval Gates

| Gate | Type | Description |
| --- | --- | --- |
| PreCompact injection | automatic | Blocking message forces chunk before context wipe |
| Manual chunk decision | user | User decides whether/what to chunk; no forced automatic chunking |

Artifacts Per Phase

| Phase | Artifact |
| --- | --- |
| Install | ~/.claude/settings.json hook entries |
| Active memory | ~/.claude/rlm/context/session_memory.json + chunk JSON files |
| Pre-compact | Snapshot chunk + stats update |
| Analysis | Synthesized Markdown response |
| Retention | .gz archives in archive/ directory |
06

Memory Context

RLM — Memory & Context

Memory Architecture

RLM stores data in two distinct memory subsystems; the retention lifecycle described further down then manages chunks over time:

Subsystem 1: Insights (Key-value facts)

  • Storage: ~/.claude/rlm/context/session_memory.json
  • Format: JSON with categories, importance levels, timestamps
  • Search: Multi-word tokenized keyword search via rlm_recall
  • Persistence: Global (cross-session, cross-project with domain filtering)
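
A minimal model of the insight store and its keyword recall could look like this. The record fields (`id`, `content`, `category`, `importance`) and the ranking rule (count of matching query tokens) are assumptions for illustration; the source only states that rlm_recall supports multi-word tokenized search plus category and importance filters.

```python
import re


def recall(insights, query=None, category=None, importance=None):
    """Filter insights by category/importance and rank by matching query tokens."""
    q_tokens = set(re.findall(r"\w+", query.lower())) if query else set()
    scored = []
    for item in insights:
        if category and item["category"] != category:
            continue
        if importance and item["importance"] != importance:
            continue
        score = 0
        if q_tokens:
            words = set(re.findall(r"\w+", item["content"].lower()))
            score = len(q_tokens & words)
            if score == 0:
                continue  # no query token appears in this insight
        scored.append((score, item))
    # Stable sort: ties keep their original (insertion) order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]


# Hypothetical records, shaped like entries in session_memory.json might be.
INSIGHTS = [
    {"id": "i1", "content": "Use flat JSON storage", "category": "decision", "importance": "high"},
    {"id": "i2", "content": "Prefer tabs over spaces", "category": "preference", "importance": "low"},
    {"id": "i3", "content": "JSON index lives in index.json", "category": "fact", "importance": "high"},
]
```

With these records, `recall(INSIGHTS, query="json storage")` returns i1 before i3, because i1 matches both query tokens.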

Subsystem 2: Chunks (Conversation history)

  • Storage: ~/.claude/rlm/context/chunks/ (JSON files, one per chunk)
  • Format: Plain text with typed metadata (chunk_type: snapshot/session/debug)
  • Search: Regex + fuzzy via rlm_grep; hybrid BM25+cosine via rlm_search
  • Persistence: Global with per-project domain organization

Retention / Lifecycle

Three-zone lifecycle:

  1. Active: chunks/ directory, full JSON
  2. Archive: archive/ directory, .gz compressed
  3. Purge: permanent deletion

Immunity system: Chunks tagged as critical, frequently accessed, or matching protected keywords are immune from archiving.

Semantic Search (Optional)

Two embedding providers:

| Provider | Model | Dimensions | Embed time (108 chunks) | Memory |
| --- | --- | --- | --- | --- |
| Model2Vec (default) | potion-multilingual-128M | 256 | 0.06s | 0.1 MB |
| FastEmbed | paraphrase-multilingual-MiniLM-L12-v2 | 384 | 1.30s | 0.3 MB |

Hybrid BM25+cosine fusion: keyword matching + vector similarity. Graceful degradation to pure BM25 when semantic deps absent. Auto-embedding at chunk creation time.
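
One common way to fuse the two signals is a weighted sum of min-max normalized scores, falling back to BM25 alone when no embedder is available. The sketch below uses that approach as an assumption; the source does not specify RLM's fusion formula, and `alpha`, `k1`, `b`, and the `embed` callable are illustrative parameters.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n if n else 0.0
    df = Counter()
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def minmax(values):
    lo, hi = min(values), max(values)
    return [0.0] * len(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]


def hybrid_search(query, docs, embed=None, alpha=0.5):
    """Return document indices, best first. `embed` maps text to a vector."""
    if not docs:
        return []
    lexical = minmax(bm25_scores(query.lower().split(), [d.lower().split() for d in docs]))
    if embed is None:
        combined = lexical  # graceful degradation to pure BM25
    else:
        qv = embed(query)
        semantic = minmax([cosine(qv, embed(d)) for d in docs])
        combined = [alpha * l + (1 - alpha) * s for l, s in zip(lexical, semantic)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)
```

Normalizing both score lists to the same range before mixing keeps either signal from dominating simply because of its scale.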

Cross-Session Handoff

Primary mechanism: PreCompact hook forces a snapshot chunk before any compaction event.

After compaction, Claude can:

  1. Call rlm_list_chunks to see recent history
  2. Call rlm_peek to reload specific context
  3. Call rlm_search to find relevant prior decisions

Multi-project support: Auto-detection of project from git or working directory. Domain tags enable cross-project filtering on all search tools.
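
Project auto-detection "from git or working directory" could be implemented along these lines. This is a guess at the behaviour, not RLM's code: the function name is hypothetical, and it assumes the project name is the directory name of the nearest enclosing git repository, with the starting directory's name as the fallback.

```python
from pathlib import Path


def detect_project(start: Path) -> str:
    """Name of the nearest ancestor containing .git, else the start directory's name."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists():
            return candidate.name
    return current.name
```

Calling `detect_project(Path.cwd())` from anywhere inside a checkout of rlm-claude would return `"rlm-claude"` under these assumptions.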

State Files

| File | Contents |
| --- | --- |
| ~/.claude/rlm/context/index.json | Chunk metadata index |
| ~/.claude/rlm/context/session_memory.json | Active insights |
| ~/.claude/rlm/context/chunks/*.json | Individual chunks |
| ~/.claude/rlm/context/archive/*.json.gz | Archived chunks |
| ~/.claude/rlm/context/domains.json | Project/domain config |

Context Compaction Handling

Yes — PreCompact hooks fire for both manual and auto compaction events. The hook reads context window percentage from stdin JSON and includes it in the blocking message (ctx: N%).

07

Orchestration

RLM — Orchestration

Multi-Agent

Yes — via the /rlm-parallel skill which spawns 4 subagents per invocation:

  • 3 parallel Task-tool subagents (subagent_type="Explore") for chunk analysis (Map phase)
  • 1 Task-tool Merger subagent for synthesis (Reduce phase)

Pattern: parallel-fan-out followed by hierarchical reduction. Directly implements the MIT RLM paper's Map-Reduce design.
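
The fan-out and reduction shape can be mimicked outside Claude Code with a thread pool. In the sketch below, `analyze_chunk` is a keyword-matching stand-in for an Explore subagent and `merge` is a stand-in for the Merger subagent; both are hypothetical and only illustrate the control flow, including the cap of 3 mappers.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 3  # the skill fans out over at most 3 chunks


def analyze_chunk(chunk_id: str, content: str, question: str) -> str:
    """Stand-in mapper: return sentences sharing a keyword with the question."""
    keywords = {w for w in question.lower().split() if len(w) > 3}
    hits = [s.strip() for s in content.split(".") if keywords & set(s.lower().split())]
    return "; ".join(hits) or "No relevant information in this chunk"


def merge(partials: dict) -> str:
    """Stand-in reducer: keep useful partial answers and cite their chunk ids."""
    lines = [
        f"- [{chunk_id}] {answer}"
        for chunk_id, answer in partials.items()
        if not answer.startswith("No relevant")
    ]
    return "\n".join(lines) or "No relevant information found"


def parallel_analyze(question: str, chunks: dict) -> str:
    """Map phase over up to 3 chunks in parallel, then a single reduce step."""
    selected = list(chunks.items())[:MAX_PARALLEL]
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        futures = {
            chunk_id: pool.submit(analyze_chunk, chunk_id, text, question)
            for chunk_id, text in selected
        }
        partials = {chunk_id: future.result() for chunk_id, future in futures.items()}
    return merge(partials)
```

Each mapper sees only its own chunk and the question, which mirrors the isolation the skill prompts ask of the subagents; the reducer sees only the partial answers.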

Subagent Definition Format

task-tool-spawn — subagents are not defined as files; they are spawned inline by the /rlm-analyze and /rlm-parallel skill prompts using Claude Code's Task tool with subagent_type="Explore".

Orchestration Pattern

parallel-fan-out (for /rlm-parallel)
none (for all other operations — direct MCP tool calls)

Isolation Mechanism

Subagents are Explore-type (read-only, isolated context window of 200k tokens). No git worktree or container isolation.

Multi-Model

No — all subagents use Sonnet. No model routing or role-based model assignment.

Execution Mode

event-driven — the PreCompact hooks fire on Claude Code lifecycle events. MCP tools are invoked on-demand within sessions.

Crash Recovery

No — no documented crash recovery mechanism.

Streaming Output

No — MCP tools return complete responses.

Max Concurrent Agents

3 (the /rlm-parallel fan-out is hard-capped at 3 chunks to avoid token explosion).

Consensus Mechanism

None.

08

UI CLI Surface

RLM — UI & CLI Surface

Dedicated CLI Binary

No dedicated CLI binary. Install and management are via pip install + standard claude mcp add commands. The install.sh / uninstall.sh scripts handle setup automation.

Local Web Dashboard

None. All interaction is through Claude Code's chat interface via MCP tool calls and slash commands.

Slash Commands (Claude Code)

| Command | Description |
| --- | --- |
| /rlm-analyze <chunk_id> "<question>" | Analyze a single chunk with isolated Explore subagent |
| /rlm-parallel "<question>" [chunk_ids...] | Parallel Map-Reduce analysis over 3 chunks |

IDE Integration

Claude Code only (via MCP + hooks). No Cursor, Copilot, or other editor support documented.

Observability

  • rlm_status tool returns system overview (insight count, chunk stats, access metrics)
  • Access tracking on chunks (immunity system uses access frequency)
  • PostToolUse hook on rlm_chunk maintains chunk counter stats
  • No log file or audit trail beyond the chunks themselves

Uninstall Surface

Interactive uninstall script with options:

  • ./uninstall.sh — interactive (choose keep/delete data)
  • ./uninstall.sh --keep-data — remove config, keep chunks
  • ./uninstall.sh --all — remove everything
  • ./uninstall.sh --dry-run — preview what would be removed

Docker Support

Docker image available for isolation or remote deployment:

docker build -t rlm-server .
claude mcp add rlm-server -- docker run -i --rm -v ~/.claude/rlm/context:/data rlm-server
