RLM (Infinite Memory)

rlm-claude · EncrEor/rlm-claude · ★ 35 · last commit 2026-03-19

Primitive shape: 27 total

Commands: 2 · Skills: 2 · Subagents: 4 · Hooks: 5 · MCP tools: 14

Summary

RLM (Infinite Memory) — Summary

RLM is a Python MCP server that gives Claude Code persistent memory across context compaction events, implementing the architecture described in the MIT CSAIL "Recursive Language Models" paper (arXiv:2512.24601). It ships 14 MCP tools organized into three memory subsystems: Insights (key–value facts), Chunks (full conversation segments), and Retention (lifecycle management with 3-zone archiving). A PreCompact hook intercepts every /compact event and injects a blocking message requiring Claude to snapshot the session before context is wiped. Optional hybrid BM25 + cosine semantic search (Model2Vec or FastEmbed) enables relevance-ranked retrieval across all stored data. Two Claude Code skills (/rlm-analyze, /rlm-parallel) implement the MIT paper's Map-Reduce pattern: /rlm-parallel fans out three parallel Task-tool subagents over separate chunks then merges results with a fourth "Merger" subagent.

Compared to seeds: closest to ccmemory (MCP-anchored memory store), but differs architecturally — RLM stores data in flat JSON + gzip files under ~/.claude/rlm/ rather than a Neo4j graph, uses BM25+vector hybrid search rather than graph traversal, and treats session continuity (PreCompact hook) as the primary design axis rather than knowledge-graph querying. Unlike ccmemory's global graph, RLM supports multi-project domain organization with cross-project search filtering.

Overview

RLM — Origin & Philosophy

Origin

Built by Ahmed MAKNI (@EncrEor) with Claude Opus as joint R&D partner. Explicitly inspired by:

MIT CSAIL RLM paper (arXiv:2512.24601, Dec 2025) — "Recursive Language Models" — source of the chunk/peek/grep primitives and the Map-Reduce sub-agent pattern
MAGMA paper (arXiv:2601.03236, Jan 2026) — temporal filtering and entity extraction
Letta/MemGPT — early inspiration for persistent AI agent memory

Released on PyPI as mcp-rlm-server. Active development through 9 phases, with Phase 10 planned.

Philosophy

"Your Claude Code sessions forget everything after /compact. RLM fixes that."

The core insight is that Claude Code's context compaction is a data loss event, not just a UX inconvenience. RLM treats the PreCompact hook as a mandatory save gate: before any compaction, the session must be chunked. This is user-driven (you decide what to chunk) but infrastructure-enforced (you can't compact without being prompted).

A second philosophical axis: user sovereignty over memory. The README explicitly positions RLM against cloud-dependent memory systems. All data lives under ~/.claude/rlm/ in human-readable JSON + gzip. No API keys, no cloud accounts.

Phased Development Roadmap

The project documents 9 completed phases and 1 planned:

Phase 1: Memory tools (remember/recall/forget/status)
Phase 2: Navigation tools (chunk/peek/grep/list)
Phase 3: Auto-chunking + sub-agent skills
Phase 4: Production (auto-summary, dedup, access tracking)
Phase 5: Advanced (BM25 search, fuzzy grep, multi-sessions, retention)
Phase 6: Production-ready (tests, CI/CD, PyPI)
Phase 7: MAGMA-inspired (temporal filtering, entity extraction)
Phase 8: Hybrid semantic search (BM25 + cosine, Model2Vec)
Phase 9: Typed chunking (chunk_type parameter)
Phase 10 (planned): Auto-memory/RLM cohabitation (Write/Edit hook redirects)

Quotes

"3 lines to install. 14 tools. Zero configuration."

"User-driven philosophy: you decide when to chunk, the system saves before loss."

Architecture

RLM — Architecture

Distribution

Type: MCP server (Python package) + Claude Code skills + hooks
PyPI package: mcp-rlm-server
Version analyzed: latest (PyPI) / main branch

Install Methods

# Recommended (quotes stop shells such as zsh from globbing the brackets)
pip install "mcp-rlm-server[all]"

# Fast (no global pollution)
uv tool install "mcp-rlm-server[all]" --python 3.12

# Full install with hooks + skills (install.sh lives in the repository root)
git clone https://github.com/EncrEor/rlm-claude.git && cd rlm-claude && ./install.sh

# Docker (run from the cloned repository root)
docker build -t rlm-server . && claude mcp add rlm-server -- docker run -i --rm -v ~/.claude/rlm/context:/data rlm-server

Required Runtime

Python 3.10+
Claude Code CLI
Optional: mcp-rlm-server[semantic] for Model2Vec embeddings (~35 MB)
Optional: mcp-rlm-server[semantic-fastembed] for FastEmbed (~230 MB)

Directory Tree (post-install)

~/.claude/rlm/
├── context/
│   ├── chunks/           # Active conversation chunks (JSON)
│   ├── archive/          # Archived chunks (gzip)
│   ├── index.json        # Chunk metadata index
│   ├── session_memory.json   # Current session insights
│   └── domains.json      # Project/domain configuration
└── hooks/
    ├── pre_compact_chunk.py
    ├── reset_chunk_counter.py
    ├── memory_write_redirect.py
    └── i18n.py

Repository Structure

rlm-claude/
├── src/mcp_server/       # MCP server source (Python)
│   └── tools/
│       └── fileutil.py   # Atomic I/O primitives
├── hooks/                # Claude Code hook scripts
├── templates/
│   ├── skills/rlm-analyze/skill.md
│   ├── skills/rlm-parallel/skill.md
│   ├── hooks_settings.json
│   └── CLAUDE_RLM_SNIPPET.md
├── context/              # Default data directory
│   ├── chunks/
│   └── domains.json.example
├── scripts/
│   ├── benchmark_providers.py
│   ├── backfill_embeddings.py
│   └── backfill_entities.py
├── pyproject.toml
├── Dockerfile
└── install.sh / uninstall.sh

Target AI Tools

Claude Code (primary)
Any MCP-compatible client (Docker deployment)

Data Storage

~/.claude/rlm/context/ — all persistent data (JSON + gzip)
Configurable via RLM_CONTEXT_DIR environment variable
SHA-256 content deduplication
File locking via fcntl.flock for concurrent access safety
Atomic writes (temp-then-rename)
Chunk size limit: 2 MB; archive decompression cap: 10 MB
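
The storage guarantees listed above (SHA-256 deduplication, fcntl.flock locking, temp-then-rename writes) can be sketched as follows. This is a minimal illustration, not RLM's actual fileutil.py: the function names, the `.lock` file name, and the index layout (chunk id mapped to content hash) are all assumptions made for the example.

```python
import fcntl
import hashlib
import json
import os
import tempfile
from pathlib import Path


def content_hash(text: str) -> str:
    """SHA-256 of the chunk text, used here as the deduplication key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp, ensure_ascii=False)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_name)  # never leave a half-written temp file behind
        raise


def save_chunk(context_dir: Path, chunk_id: str, text: str) -> bool:
    """Store a chunk unless identical content already exists. Returns True if written."""
    context_dir.mkdir(parents=True, exist_ok=True)
    index_path = context_dir / "index.json"
    lock_path = context_dir / ".lock"  # hypothetical lock file name
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # exclusive lock, released when the file closes
        index = json.loads(index_path.read_text("utf-8")) if index_path.exists() else {}
        digest = content_hash(text)
        if digest in index.values():
            return False  # duplicate content, nothing written
        chunk_path = context_dir / "chunks" / f"{chunk_id}.json"
        atomic_write_json(chunk_path, {"id": chunk_id, "content": text})
        index[chunk_id] = digest
        atomic_write_json(index_path, index)
        return True
```

Writing the temp file into the target's own directory keeps the final rename on one filesystem, which is what makes the replace atomic. fcntl is POSIX-only, so a sketch like this would not run on Windows.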

Components

RLM — Components

MCP Tools (14 total)

Memory & Insights

Tool	Purpose
`rlm_remember`	Save decisions, facts, preferences with categories and importance levels
`rlm_recall`	Search insights by keyword (multi-word tokenized), category, or importance
`rlm_forget`	Remove an insight by ID
`rlm_status`	System overview: insight count, chunk stats, access metrics

Conversation History (Chunks)

Tool	Purpose
`rlm_chunk`	Save conversation segments with typed categorization (`snapshot`, `session`, `debug`; `insight` redirects to `rlm_remember`)
`rlm_peek`	Read a chunk (full or partial by line range)
`rlm_grep`	Regex search across all chunks (+ fuzzy matching for typo tolerance)
`rlm_search`	Hybrid search: BM25 + semantic cosine similarity (FR/EN, accent-normalized, chunks + insights)
`rlm_list_chunks`	List all chunks with metadata
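
To make the peek and grep behaviour in the table concrete, here is a small sketch of line-range reads and regex search with a fuzzy fallback. It is an assumption-laden illustration: the function names, the in-memory `chunks` mapping, and the use of difflib with a 0.8 cutoff are choices made for this example, not RLM's documented implementation.

```python
import difflib
import re
from typing import Dict, Iterator, Optional, Tuple


def peek(content: str, start: int = 1, end: Optional[int] = None) -> str:
    """Return lines start..end (1-indexed, inclusive); the whole chunk by default."""
    lines = content.splitlines()
    return "\n".join(lines[start - 1 : end])


def grep(
    chunks: Dict[str, str], pattern: str, fuzzy: bool = False, cutoff: float = 0.8
) -> Iterator[Tuple[str, int, str]]:
    """Yield (chunk_id, line_number, line) for regex hits, plus near-miss words if fuzzy."""
    regex = re.compile(pattern, re.IGNORECASE)
    for chunk_id, content in chunks.items():
        for number, line in enumerate(content.splitlines(), start=1):
            if regex.search(line):
                yield chunk_id, number, line
            elif fuzzy:
                # Typo tolerance: compare the pattern against each word on the line.
                words = re.findall(r"\w+", line.lower())
                if difflib.get_close_matches(pattern.lower(), words, n=1, cutoff=cutoff):
                    yield chunk_id, number, line


chunks = {"c1": "We chose PostgreSQL\nfor storage", "c2": "nothing here"}
print(list(grep(chunks, "postgres")))
print(list(grep(chunks, "postgersql", fuzzy=True)))  # misspelled on purpose
print(peek(chunks["c1"], 2, 2))
```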

Multi-Project Organization

Tool	Purpose
`rlm_sessions`	Browse sessions by project or domain
`rlm_domains`	List available domains for categorization

Smart Retention (3-zone lifecycle)

Tool	Purpose
`rlm_retention_preview`	Preview what would be archived (dry-run)
`rlm_retention_run`	Archive old unused chunks, purge ancient ones
`rlm_restore`	Bring back archived chunks

Retention zones: Active → Archive (.gz) → Purge
Immunity system: Critical tags, frequent access, and keywords protect chunks from archiving.
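
The zone transitions and immunity rules can be pictured with a short decision function. Every threshold, tag, and keyword below is an illustrative assumption (the source does not state RLM's actual values), as is the choice to purge only chunks that are already archived.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Illustrative assumptions, not RLM's real configuration.
ARCHIVE_AFTER = timedelta(days=30)
PURGE_AFTER = timedelta(days=180)
MIN_ACCESSES_FOR_IMMUNITY = 3
PROTECTED_TAGS = {"critical"}
PROTECTED_KEYWORDS = {"decision", "architecture"}


@dataclass
class Chunk:
    chunk_id: str
    created: datetime
    content: str = ""
    tags: set = field(default_factory=set)
    access_count: int = 0
    archived: bool = False


def is_immune(chunk: Chunk) -> bool:
    """A chunk is protected by a critical tag, frequent access, or a protected keyword."""
    text = chunk.content.lower()
    return (
        bool(chunk.tags & PROTECTED_TAGS)
        or chunk.access_count >= MIN_ACCESSES_FOR_IMMUNITY
        or any(word in text for word in PROTECTED_KEYWORDS)
    )


def next_zone(chunk: Chunk, now: datetime) -> str:
    """Return 'active', 'archive', or 'purge' for a chunk at time `now`."""
    age = now - chunk.created
    if chunk.archived:
        return "purge" if age >= PURGE_AFTER else "archive"
    if is_immune(chunk) or age < ARCHIVE_AFTER:
        return "active"
    return "archive"


def preview(chunks, now: datetime) -> dict:
    """Dry run: report the zone each chunk would land in without touching any file."""
    return {c.chunk_id: next_zone(c, now) for c in chunks}
```

Keeping `preview` side-effect free mirrors the split between rlm_retention_preview and rlm_retention_run: the same decision logic can back both, with only the second acting on it.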

Claude Code Skills (2)

Skill	Purpose
`/rlm-analyze`	Analyze a single chunk with an isolated subagent (Explore type, read-only)
`/rlm-parallel`	Analyze multiple chunks in parallel — 3 subagents fan out, 1 Merger subagent synthesizes (Map-Reduce from MIT RLM paper)

Claude Code Hooks (5 entries: 3 scripts across 2 events)

Event	Matcher	Script	Purpose
`PreCompact`	`manual`	`pre_compact_chunk.py`	Block compaction, inject chunk-required message
`PreCompact`	`auto`	`pre_compact_chunk.py`	Same for auto-compact
`PostToolUse`	`mcp__rlm-server__rlm_chunk`	`reset_chunk_counter.py`	Track stats after chunk operations
`PostToolUse`	`Write`	`memory_write_redirect.py`	Detect writes to Claude auto-memory, nudge to use RLM
`PostToolUse`	`Edit`	`memory_write_redirect.py`	Same for Edit operations
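
The table maps onto Claude Code's hooks settings structure, in which each event holds a list of matcher entries that each run a command. The sketch below generates such a structure from the table rows. The command strings (python3 plus a path under ~/.claude/rlm/hooks) are assumptions, and the real templates/hooks_settings.json may differ.

```python
import json

HOOKS_DIR = "~/.claude/rlm/hooks"  # taken from the post-install directory tree

# (event, matcher, script) rows copied from the table above.
ROWS = [
    ("PreCompact", "manual", "pre_compact_chunk.py"),
    ("PreCompact", "auto", "pre_compact_chunk.py"),
    ("PostToolUse", "mcp__rlm-server__rlm_chunk", "reset_chunk_counter.py"),
    ("PostToolUse", "Write", "memory_write_redirect.py"),
    ("PostToolUse", "Edit", "memory_write_redirect.py"),
]


def build_hooks_settings(rows, hooks_dir=HOOKS_DIR):
    """Group the rows by event into the nested layout used in settings.json."""
    hooks = {}
    for event, matcher, script in rows:
        entry = {
            "matcher": matcher,
            # Assumed invocation; the shipped template may call the script differently.
            "hooks": [{"type": "command", "command": f"python3 {hooks_dir}/{script}"}],
        }
        hooks.setdefault(event, []).append(entry)
    return {"hooks": hooks}


settings = build_hooks_settings(ROWS)
print(json.dumps(settings, indent=2))
```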

Scripts (3)

Script	Purpose
`scripts/benchmark_providers.py`	Compare Model2Vec vs FastEmbed embedding quality
`scripts/backfill_embeddings.py`	Embed existing chunks after adding semantic search
`scripts/backfill_entities.py`	Extract entities from existing chunks (MAGMA Phase 7)

Templates

Template	Purpose
`templates/CLAUDE_RLM_SNIPPET.md`	CLAUDE.md insert fragment for Claude Code instructions
`templates/hooks_settings.json`	Ready-to-use hooks configuration
`templates/skills/rlm-analyze/skill.md`	Skill definition for single-chunk analysis
`templates/skills/rlm-parallel/skill.md`	Skill definition for parallel Map-Reduce analysis

Prompts

RLM — Prompt Files (Verbatim Excerpts)

Excerpt 1: `/rlm-parallel` Skill — Map Phase Prompt

Source: templates/skills/rlm-parallel/skill.md

Prompting technique: Few-shot parallel dispatch with explicit step decomposition + Map-Reduce pattern from MIT RLM paper arXiv:2512.24601. The skill specifies that exactly 3 Task tools must be launched in a single message to achieve true parallelism.

(Translated from the French original.)

### Step 3: Parallel analysis (CRITICAL)

Launch **exactly 3 Task tools in a single message** for parallel execution:

Task #1 : subagent_type="Explore", model="sonnet" → Analyze chunk 1 with the question

Task #2 : subagent_type="Explore", model="sonnet" → Analyze chunk 2 with the question

Task #3 : subagent_type="Explore", model="sonnet" → Analyze chunk 3 with the question


**Prompt for each sub-agent**:

---

You are an analysis assistant. Answer the question based ONLY on this chunk.

### Question
{question}

### Chunk {chunk_id}
{chunk content}

### Instructions
- Extract the information relevant to the question
- Quote key passages in quotation marks if useful
- If nothing is relevant, answer "No relevant information in this chunk"
- Be concise (max 200 words)

---

Excerpt 2: `/rlm-parallel` Skill — Merge Phase Prompt

Source: templates/skills/rlm-parallel/skill.md

Prompting technique: Hierarchical reduction (Merger subagent synthesizes 3 partial answers) with explicit contradiction detection and source attribution. Implements the "Reduce" step of Map-Reduce.

(Translated from the French original.)

### Step 5: Merge (Merger)

Launch a final Task to synthesize:

Task Merger : subagent_type="Explore", model="sonnet"


**Merger prompt**:

---

You are a synthesizer. Combine these partial analyses into one coherent answer.

### Original question
{question}

### Partial analyses
**Chunk {chunk_id_1}**:
{reponse_1}

**Chunk {chunk_id_2}**:
{reponse_2}

**Chunk {chunk_id_3}**:
{reponse_3}

### Instructions
- Synthesize the insights without repetition
- If analyses contradict each other, flag it: "Contradiction: ..."
- Cite sources: [chunk_id] for each fact
- Structure the answer with bullet points if useful
- Max 400 words

---

Excerpt 3: PreCompact Hook — Blocking System Message

Source: hooks/pre_compact_chunk.py

Prompting technique: Hook-injected systemMessage that creates a mandatory gate. The message is i18n-aware (French by default, English optional) and includes live context window percentage.

message = (
    f"[🔄 {t('compact_title')}]{ctx_info}\n"
    f"━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\n\n"
    f"{t('compact_body')}"
)
result = {"systemMessage": message}
print(json.dumps(result))

The compact_body translation key expands to instructions requiring the user to run rlm_chunk before compaction proceeds, along with the current context percentage displayed as (ctx: N%).
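
A self-contained sketch of how such a hook could assemble its output is shown below. It is not the project's pre_compact_chunk.py: the message catalog, the default language argument, and especially the `context_percent` input field are hypothetical stand-ins, since the source names neither the stdin field nor the translated strings.

```python
import io
import json
import sys

# Hypothetical message catalog; the real hook loads its strings through i18n.py.
MESSAGES = {
    "en": {
        "compact_title": "RLM: save before compaction",
        "compact_body": "Call rlm_chunk to snapshot this session before context is compacted.",
    },
    "fr": {
        "compact_title": "RLM : sauvegarde avant compaction",
        "compact_body": "Appelle rlm_chunk pour sauvegarder la session avant la compaction.",
    },
}


def build_hook_output(hook_input: dict, lang: str = "fr") -> dict:
    """Build the JSON payload the hook prints; unknown languages fall back to English."""
    strings = MESSAGES.get(lang, MESSAGES["en"])
    percent = hook_input.get("context_percent")  # hypothetical field name
    ctx_info = f" (ctx: {percent}%)" if percent is not None else ""
    message = f"[{strings['compact_title']}]{ctx_info}\n\n{strings['compact_body']}"
    return {"systemMessage": message}


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read the hook's JSON input from stdin and write the response to stdout."""
    raw = stdin.read()
    hook_input = json.loads(raw) if raw.strip() else {}
    json.dump(build_hook_output(hook_input, lang="en"), stdout)


# Simulate one invocation with in-memory streams instead of real stdin/stdout.
out = io.StringIO()
main(io.StringIO('{"trigger": "auto", "context_percent": 87}'), out)
print(out.getvalue())
```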

Excerpt 4: `/rlm-analyze` Skill — Single-Chunk Analysis Prompt

Source: templates/skills/rlm-analyze/skill.md

Prompting technique: Constrained-context subagent pattern — subagent is given only the chunk content and forbidden from making assumptions beyond it.

(Translated from the French original.)

**Step 3**: Launch a Task subagent with the following context:

---

You are an analysis assistant. Answer the question based ONLY on the provided context.

### Question
{The user's question}

### Context (Chunk {chunk_id})
{Chunk content loaded via rlm_peek}

### Instructions
- Answer concisely and precisely
- Quote relevant passages if useful
- If the information is not in the context, say so clearly
- Do not make assumptions beyond the context

---

Uniqueness

RLM — Uniqueness & Positioning

Differs from Seeds

RLM is closest to ccmemory in the seed set — both are MCP-anchored memory frameworks for Claude Code that persist facts across sessions. The key architectural delta: RLM uses flat JSON files + gzip archives under ~/.claude/rlm/ rather than ccmemory's Neo4j graph, enabling zero-infra deployment (no Docker, no database server). RLM adds a research-backed hybrid BM25+cosine search layer, a 3-zone retention lifecycle with immunity scoring, and the MIT RLM paper's Map-Reduce pattern as first-class skills. Unlike ccmemory's graph-traversal model, RLM thinks in "sessions and chunks" rather than "entities and relationships." RLM also differs from superpowers (skills-only behavioral framework) by being infrastructure — it doesn't change how Claude codes, it prevents context loss.

Positioning

RLM occupies a niche between ccmemory (graph memory) and simple CLAUDE.md file injection (agent-os). It is the only framework in this batch that:

Cites and implements a specific CS research paper (MIT arXiv:2512.24601) as its architecture
Ships both keyword and vector search with two provider choices and performance benchmarks
Has a 3-zone lifecycle (active/archive/purge) with an immunity system

Observable Failure Modes

Gate evasion: The PreCompact hook injects a message but cannot force Claude to call rlm_chunk — a determined agent or user could ignore it. The philosophy is user-driven; the system nudges but does not enforce.
Language mismatch: Skills are written in French (the project's default language). English users must either read the French instructions or set the RLM_LANG environment variable (fr or en).
Semantic quality gap: On average, only about 1.6 of the top 5 results from the default Model2Vec provider also appear in FastEmbed's top 5. Users relying on semantic search may get worse results than expected unless they switch providers.
No graph relationships: Chunks are discrete text blobs. For reasoning over connected facts (e.g., "how does decision A relate to person B?"), ccmemory's Neo4j graph is superior.
Manual chunking burden: The framework doesn't auto-chunk sessions; it only reminds users to do it. High-frequency users may find the prompting repetitive.

Cross-References

Implements: MIT CSAIL RLM paper (arXiv:2512.24601)
Inspired by: MAGMA (arXiv:2601.03236), Letta/MemGPT
Competes with: ccmemory (same use-case, different storage backend)

Workflow

RLM — Workflow

Session Lifecycle

Phase 1: Installation

pip install mcp-rlm-server[all] or ./install.sh
Register MCP server: claude mcp add rlm-server -- python3 -m mcp_server
Copy hooks and skills to ~/.claude/
Configure ~/.claude/settings.json with hook entries from templates/hooks_settings.json

Artifact: Running MCP server, configured hooks

Phase 2: During Session — Active Memory

User calls rlm_remember to persist facts (decisions, preferences, architecture notes)
User calls rlm_chunk to save conversation segments with type labels
rlm_search or rlm_recall to retrieve prior context
rlm_status to see memory health metrics

Artifact: ~/.claude/rlm/context/session_memory.json, chunk files in chunks/

Phase 3: Pre-Compaction Gate (automatic)

The user runs /compact, or auto-compact fires
PreCompact hook intercepts, reads context window percentage
Injects a systemMessage instructing Claude to chunk before proceeding
Claude is expected to call rlm_chunk (type=snapshot) to save current state
Compaction then proceeds; the hook cannot verify that the chunk was saved (see Observable Failure Modes)

Artifact: Snapshot chunk file, reset chunk counter

Phase 4: Post-Session Analysis (on demand)

/rlm-analyze <chunk_id> "<question>" — isolated Explore subagent answers question from chunk
/rlm-parallel "<question>" — 3 parallel Explore subagents, 1 Merger — Map-Reduce over top-3 relevant chunks

Artifact: Synthesized answer with source citations

Phase 5: Retention Management (periodic)

rlm_retention_preview — dry-run to see what would be archived
rlm_retention_run — move old chunks to .gz archive, purge ancient ones
rlm_restore <chunk_id> — bring back archived chunks

Artifact: Compressed archive in ~/.claude/rlm/context/archive/
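
The archive and restore steps amount to a gzip round trip between chunks/ and archive/. The sketch below shows one way to do it while honouring the 10 MB decompression cap listed under Data Storage; the function names and the `.json.gz` naming are assumptions based on the State Files table, not the project's code.

```python
import gzip
import tempfile
from pathlib import Path

MAX_DECOMPRESSED = 10 * 1024 * 1024  # 10 MB cap on restored data


def archive_chunk(chunk_path: Path, archive_dir: Path) -> Path:
    """Compress a chunk into archive_dir and remove the active copy."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / (chunk_path.name + ".gz")
    with open(chunk_path, "rb") as src, gzip.open(target, "wb") as dst:
        dst.write(src.read())
    chunk_path.unlink()
    return target


def restore_chunk(archive_path: Path, chunks_dir: Path, limit: int = MAX_DECOMPRESSED) -> Path:
    """Decompress an archived chunk back into chunks_dir, refusing oversized payloads."""
    with gzip.open(archive_path, "rb") as src:
        data = src.read(limit + 1)  # one byte past the cap is enough to detect overflow
    if len(data) > limit:
        raise ValueError(f"{archive_path.name} exceeds the {limit}-byte decompression cap")
    chunks_dir.mkdir(parents=True, exist_ok=True)
    target = chunks_dir / archive_path.name[: -len(".gz")]
    target.write_bytes(data)
    archive_path.unlink()
    return target


# Round trip in a throwaway directory.
base = Path(tempfile.mkdtemp())
chunk = base / "chunks" / "c1.json"
chunk.parent.mkdir()
chunk.write_text('{"id": "c1"}', encoding="utf-8")
archived = archive_chunk(chunk, base / "archive")
restored = restore_chunk(archived, base / "chunks")
print(archived.name, restored.name)
```

Reading at most `limit + 1` bytes bounds memory use even if a small archive expands to something very large.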

Approval Gates

Gate	Type	Description
PreCompact injection	automatic	Injected message prompts Claude to chunk before the context wipe; it cannot enforce the chunk
Manual chunk decision	user	User decides whether/what to chunk; no forced automatic chunking

Artifacts Per Phase

Phase	Artifact
Install	`~/.claude/settings.json` hook entries
Active memory	`~/.claude/rlm/context/session_memory.json` + chunk JSON files
Pre-compact	Snapshot chunk + stats update
Analysis	Synthesized Markdown response
Retention	`.gz` archives in `archive/` directory

Memory Context

RLM — Memory & Context

Memory Architecture

RLM stores memory in two distinct subsystems; a retention lifecycle, described below, manages the chunk store:

Subsystem 1: Insights (Key-value facts)

Storage: ~/.claude/rlm/context/session_memory.json
Format: JSON with categories, importance levels, timestamps
Search: Multi-word tokenized keyword search via rlm_recall
Persistence: Global (cross-session, cross-project with domain filtering)

Subsystem 2: Chunks (Conversation history)

Storage: ~/.claude/rlm/context/chunks/ (JSON files, one per chunk)
Format: Plain text with typed metadata (chunk_type: snapshot/session/debug)
Search: Regex + fuzzy via rlm_grep; hybrid BM25+cosine via rlm_search
Persistence: Global with per-project domain organization

Retention / Lifecycle

Three-zone lifecycle:

Active — chunks/ directory, full JSON
Archive — archive/ directory, .gz compressed
Purge — permanent deletion

Immunity system: Chunks tagged as critical, frequently accessed, or matching protected keywords are immune from archiving.

Semantic Search (Optional)

Two embedding providers:

Provider	Model	Dimensions	Time to embed 108 chunks	Memory
Model2Vec (default)	`potion-multilingual-128M`	256	0.06s	0.1 MB
FastEmbed	`paraphrase-multilingual-MiniLM-L12-v2`	384	1.30s	0.3 MB

Hybrid BM25+cosine fusion: keyword matching + vector similarity. Graceful degradation to pure BM25 when semantic deps absent. Auto-embedding at chunk creation time.
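
A compact sketch of hybrid scoring with graceful degradation follows. It assumes a weighted sum of max-normalized BM25 and cosine similarity with an `alpha` of 0.5; the source does not say how RLM fuses the two signals, so treat the fusion rule, the BM25 constants, and the `embed` callback as assumptions.

```python
import math
import re
import unicodedata
from collections import Counter


def tokenize(text):
    """Lowercase, strip accents, and split into word tokens (FR/EN friendly)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"\w+", plain)


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n, 1)
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hybrid_scores(query, docs, embed=None, alpha=0.5):
    """Blend normalized BM25 with cosine similarity; pure BM25 when embed is None."""
    bm25 = bm25_scores(query, docs)
    top = max(bm25, default=0.0)
    keyword = [s / top if top else 0.0 for s in bm25]
    if embed is None:  # semantic dependencies absent: degrade to keyword ranking
        return keyword
    q_vec = embed(query)
    semantic = [cosine(q_vec, embed(d)) for d in docs]
    return [alpha * k + (1 - alpha) * s for k, s in zip(keyword, semantic)]


docs = ["Décision: utiliser PostgreSQL", "We talked about lunch", "postgresql index tuning notes"]
print(hybrid_scores("decision postgresql", docs))
```

In a real deployment `embed` would wrap whichever provider is installed; passing `None` reproduces the documented fallback to pure BM25.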

Cross-Session Handoff

Primary mechanism: PreCompact hook forces a snapshot chunk before any compaction event.

After compaction, Claude can:

Call rlm_list_chunks to see recent history
Call rlm_peek to reload specific context
Call rlm_search to find relevant prior decisions

Multi-project support: Auto-detection of project from git or working directory. Domain tags enable cross-project filtering on all search tools.
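
Project auto-detection from git with a working-directory fallback might look like the sketch below. The function names and the `project` key on chunk metadata are hypothetical; only the general approach (git first, then the current directory) comes from the description above.

```python
import subprocess
from pathlib import Path


def detect_project(cwd=None):
    """Name the project after the git repository root, else after the working directory."""
    base = Path(cwd or Path.cwd()).resolve()
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=base, capture_output=True, text=True, check=True, timeout=5,
        )
        return Path(result.stdout.strip()).name
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        # Not a repository, git is slow or missing, or the directory is unusable.
        return base.name


def filter_by_project(chunks, project=None):
    """Keep chunks tagged with `project`; no filter means search across all projects."""
    return [c for c in chunks if project is None or c.get("project") == project]


print(detect_project())
```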

State Files

File	Contents
`~/.claude/rlm/context/index.json`	Chunk metadata index
`~/.claude/rlm/context/session_memory.json`	Active insights
`~/.claude/rlm/context/chunks/*.json`	Individual chunks
`~/.claude/rlm/context/archive/*.json.gz`	Archived chunks
`~/.claude/rlm/context/domains.json`	Project/domain config

Context Compaction Handling

Yes — PreCompact hooks fire for both manual and auto compaction events. The hook reads context window percentage from stdin JSON and includes it in the blocking message (ctx: N%).

Orchestration

RLM — Orchestration

Multi-Agent

Yes — via the /rlm-parallel skill which spawns 4 subagents per invocation:

3 parallel Task-tool subagents (subagent_type="Explore") for chunk analysis (Map phase)
1 Task-tool Merger subagent for synthesis (Reduce phase)

Pattern: parallel-fan-out followed by hierarchical reduction. Directly implements the MIT RLM paper's Map-Reduce design.
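
The shape of this fan-out and reduction can be shown with ordinary threads. In the real skill the workers are Task-tool subagents inside Claude Code; here `analyze_chunk` and `merge` are plain-Python stand-ins (simple keyword matching and deduplication) invented purely to make the control flow runnable.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 3  # the skill caps the fan-out at three chunks


def analyze_chunk(chunk_id, content, question):
    """Map step stand-in: return sentences that share a long word with the question."""
    words = {w.lower() for w in question.split() if len(w) > 3}
    hits = [
        sentence.strip()
        for sentence in content.split(".")
        if words & {w.lower() for w in sentence.split()}
    ]
    return chunk_id, hits


def merge(partials):
    """Reduce step stand-in: drop repeated findings and cite the source chunk."""
    seen, lines = set(), []
    for chunk_id, hits in partials:
        for hit in hits:
            if hit not in seen:
                seen.add(hit)
                lines.append(f"- {hit} [{chunk_id}]")
    return "\n".join(lines)


def parallel_analyze(chunks, question):
    """Fan out over at most MAX_PARALLEL chunks, then merge the partial answers."""
    selected = list(chunks.items())[:MAX_PARALLEL]
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        futures = [pool.submit(analyze_chunk, cid, text, question) for cid, text in selected]
        partials = [f.result() for f in futures]  # results kept in submission order
    return merge(partials)


chunks = {
    "c1": "We chose PostgreSQL for storage. Lunch was good",
    "c2": "Nothing notable",
    "c3": "PostgreSQL needs an index",
    "c4": "PostgreSQL ignored",
}
print(parallel_analyze(chunks, "Which database storage PostgreSQL"))
```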

Subagent Definition Format

task-tool-spawn — subagents are not defined as files; they are spawned inline by the /rlm-analyze and /rlm-parallel skill prompts using Claude Code's Task tool with subagent_type="Explore".

Orchestration Pattern

parallel-fan-out (for /rlm-parallel)
none (for all other operations — direct MCP tool calls)

Isolation Mechanism

Subagents are Explore-type (read-only, isolated context window of 200k tokens). No git worktree or container isolation.

Multi-Model

No — all subagents use Sonnet. No model routing or role-based model assignment.

Execution Mode

event-driven — the PreCompact hooks fire on Claude Code lifecycle events. MCP tools are invoked on-demand within sessions.

Crash Recovery

No — no documented crash recovery mechanism.

Streaming Output

No — MCP tools return complete responses.

Max Concurrent Agents

3 (the /rlm-parallel fan-out is hard-capped at 3 chunks to avoid token explosion).

Consensus Mechanism

None.

UI CLI Surface

RLM — UI & CLI Surface

Dedicated CLI Binary

No dedicated CLI binary. Install and management are via pip install + standard claude mcp add commands. The install.sh / uninstall.sh scripts handle setup automation.

Local Web Dashboard

None. All interaction is through Claude Code's chat interface via MCP tool calls and slash commands.

Slash Commands (Claude Code)

Command	Description
`/rlm-analyze <chunk_id> "<question>"`	Analyze a single chunk with isolated Explore subagent
`/rlm-parallel "<question>" [chunk_ids...]`	Parallel Map-Reduce analysis over 3 chunks

IDE Integration

Claude Code only (via MCP + hooks). No Cursor, Copilot, or other editor support documented.

Observability

rlm_status tool returns system overview (insight count, chunk stats, access metrics)
Access tracking on chunks (immunity system uses access frequency)
PostToolUse hook on rlm_chunk maintains chunk counter stats
No log file or audit trail beyond the chunks themselves

Uninstall Surface

Interactive uninstall script with options:

./uninstall.sh — interactive (choose keep/delete data)
./uninstall.sh --keep-data — remove config, keep chunks
./uninstall.sh --all — remove everything
./uninstall.sh --dry-run — preview what would be removed

Docker Support

Docker image available for isolation or remote deployment:

docker build -t rlm-server .
claude mcp add rlm-server -- docker run -i --rm -v ~/.claude/rlm/context:/data rlm-server

Related frameworks

same archetype · same primary tool · same memory type

Taskmaster AI ★ 27k

A3 MCP-anchored

Converts a PRD into a dependency-ordered JSON task graph that AI coding agents execute one task at a time, eliminating context…

ccmemory ★ 1

A3 MCP-anchored

Accumulates decisions, corrections, and failed approaches from Claude Code sessions into a queryable Neo4j graph so each new…

Pimzino spec-workflow-mcp ★ 4.2k

A3 MCP-anchored

MCP server providing spec-driven development workflow with dashboard-backed approval gates, implementation logging, and VSCode…

MCP Shrimp Task Manager ★ 2.1k

A3 MCP-anchored

Convert natural language requests into structured AI development tasks with chain-of-thought enforcement, reflection gates, and…

Bernstein ★ 460

A3 MCP-anchored

Govern parallel CLI coding agents with a deterministic Python scheduler, HMAC-chained audit trail, and compliance-ready signed…

LeanSpec ★ 252

A3 MCP-anchored

Provides a unified spec CLI and MCP server over any existing spec backend (markdown, GitHub Issues, ADO), making spec-driven…

Distribution

Type: mcp-server
License: MIT
Install: multi-step
Version: latest (PyPI mcp-rlm-server, main branch 2026-03-19)

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: No

Components

Commands: 2
Skills: 2
Subagents: 4
Hooks: 5
MCP servers: 1
MCP tools: 14
Scripts: 3
Templates: 4

Workflow

Phases: 5
Approval gates: 1
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: Yes
Pattern: parallel-fan-out
Max concurrent: 3
Isolation: none
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: No
BYOK: No
Locked to: claude-sonnet
Modal: text

Execution

Mode: event-driven
Crash recovery: No
Compaction: Yes
Session handoff: Yes
Streaming: No

Memory

Type: hybrid
Persistence: global
Search: hybrid
State files: 5 files

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: Yes

Tools

Primary: claude-code
Targets: 1
Portability: low

Signals

Stars: 35
Last commit: 2026-03-19
Contributors: 2
Maintainer: active
Quality score: 2.5/10
