Batch 12 — Memory + Context Engineering (Compaction, Knowledge Graphs, Pruning)

Roster (10)

| slug | stars | distribution | cli_binary | local_ui | memory_type | compression | tier |
|---|---|---|---|---|---|---|---|
| basic-memory | 3087 | mcp-server | bm/basic-memory (15 cmds) | yes (React, cloud) | sqlite+hybrid | none | A |
| claude-self-reflect | 214 | claude-plugin | csr-engine | no | sqlite+vector | none (quality focus) | A |
| claude-supermemory | 2584 | claude-plugin | none | no | cloud-API (proprietary) | none | A |
| cognilayer | 28 | standalone-repo | cognilayer (TUI) | no | sqlite | session compaction | A |
| symdex | 190 | mcp-server | symdex (10 cmds) | no | sqlite | none (retrieval only) | A |
| kratos-mcp | 34 | npm-package | none | no | sqlite (FTS only) | none | C (LEGACY) |
| iwe | 1086 | cli-tool | iwe+iwes+iwec (18 cmds) | no | file-based (Markdown) | none | A |
| lean-ctx | 2186 | mcp-server | lean-ctx (20 cmds) | yes (browser, :9377) | hybrid+file-based | 60-99% (heuristic) | A |
| entroly | 398 | claude-plugin | entroly (40+ cmds) | yes (browser, :9377) | hybrid vault/ | 70-95% (knapsack DP+BM25) | A |
| swe-pruner | 282 | standalone-repo | swe-pruner | no | none | 23-54% (neural 0.6B model) | B |

Intra-batch Patterns

1. Divergent Architectures for the Same Problem

All 10 frameworks aim to improve LLM effectiveness via context engineering, but they operate at completely different layers:

  • Storage layer (basic-memory, iwe, kratos-mcp): Build a knowledge graph or document store; agent retrieves what it needs.
  • Inference preprocessing (swe-pruner, lean-ctx, entroly): Filter or compress context before the LLM call.
  • Proxy layer (entroly): Intercept the actual HTTP API call; invisible to the agent.
  • Session quality (claude-self-reflect, cognilayer): Improve quality over time via reflection/RL, not raw compression.
  • Cloud delegation (claude-supermemory): Offload memory entirely to a hosted API.
  • Structural indexing (symdex): Index code structure (AST, imports) for precision retrieval.
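
To make the structural-indexing layer in the last bullet concrete, here is a minimal sketch using Python's standard `ast` module. It is not symdex's implementation; the function name `index_source` and the `SAMPLE` snippet are hypothetical, chosen only to show what "index code structure (AST, imports)" can mean in practice.

```python
import ast


def index_source(source: str) -> dict:
    """Collect imports, functions, and classes from Python source text."""
    tree = ast.parse(source)
    index = {"imports": [], "functions": [], "classes": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            index["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x"
            index["imports"].append(node.module or ".")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            index["functions"].append((node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            index["classes"].append((node.name, node.lineno))
    return index


# Hypothetical input used only for illustration.
SAMPLE = '''\
import os
from pathlib import Path

class Store:
    def save(self, key):
        return Path(os.getcwd()) / key

def load(key):
    return key
'''

if __name__ == "__main__":
    print(index_source(SAMPLE))
```

An agent can query an index like this by symbol name and line number instead of reading whole files, which is the precision-retrieval idea the bullet describes.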

2. Three Distinct Compression Philosophies

| Philosophy | Representatives | Mechanism |
|---|---|---|
| Heuristic selection | lean-ctx | BM25 + entropy scoring + token budget |
| Mathematical optimization | entroly | 0/1 knapsack DP on entropy scores |
| Neural classification | swe-pruner | Fine-tuned 0.6B model per-chunk relevance |
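
The difference between the first two philosophies can be sketched in a few lines. The code below is an illustration under stated assumptions, not code from lean-ctx or entroly: each chunk is a hypothetical `(name, token_cost, score)` tuple with positive integer costs, the scores are made up, and the greedy pass stands in for heuristic selection in general.

```python
from typing import List, Tuple

Chunk = Tuple[str, int, float]  # (name, token_cost, relevance_score)


def greedy_select(chunks: List[Chunk], budget: int) -> List[str]:
    """Heuristic: take chunks by score per token until the budget runs out."""
    chosen, used = [], 0
    ranked = sorted(chunks, key=lambda c: c[2] / c[1], reverse=True)
    for name, cost, _score in ranked:
        if used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen


def knapsack_select(chunks: List[Chunk], budget: int) -> List[str]:
    """0/1 knapsack DP: maximize total score subject to the token budget."""
    n = len(chunks)
    # best[i][b] = best total score using the first i chunks within b tokens
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (_name, cost, score) in enumerate(chunks, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]
            if cost <= b:
                best[i][b] = max(best[i][b], best[i - 1][b - cost] + score)
    # Walk back through the table to recover which chunks were taken.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            name, cost, _score = chunks[i - 1]
            chosen.append(name)
            b -= cost
    return chosen[::-1]


# Made-up chunks: a.py has the best score per token but blocks the better pair.
CHUNKS = [("a.py", 60, 6.6), ("b.py", 50, 5.0), ("c.py", 50, 5.0)]

if __name__ == "__main__":
    print(greedy_select(CHUNKS, 100))    # ['a.py']
    print(knapsack_select(CHUNKS, 100))  # ['b.py', 'c.py']
```

With a 100-token budget the greedy pass keeps only `a.py` (total score 6.6), while the DP finds the pair `b.py` and `c.py` (total score 10.0). The DP costs O(n × budget) time and memory, which is one plausible reason to place such a hot path in native code.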

3. Token Reduction Claims Spread

  • swe-pruner: 23-54% (SWE-Bench Verified, peer-reviewed paper)
  • entroly: 70-95% (self-measured, verify-claims)
  • lean-ctx: 60-99% (benchmark suite, reproducible via benchmarks.md)
  • The highest claims (up to 99%) come from the frameworks with the least external validation; swe-pruner has the most rigorous evaluation methodology (arXiv paper, SWE-Bench).
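
Because this report mixes percentage reductions (23-54%, 70-95%, 60-99%) with ratio-style figures (such as the 14.84x quoted for swe-pruner on LongCodeQA), a small conversion helper makes them comparable. This is plain arithmetic, not a measurement; the function names are hypothetical.

```python
def reduction_to_ratio(reduction_pct: float) -> float:
    """A reduction of p% leaves (100 - p)% of the tokens: ratio = 100 / (100 - p)."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 100.0 / (100.0 - reduction_pct)


def ratio_to_reduction(ratio: float) -> float:
    """Inverse mapping: an N-x compression removes 100 * (1 - 1/N) percent."""
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    return 100.0 * (1.0 - 1.0 / ratio)


if __name__ == "__main__":
    for pct in (23, 54, 70, 95, 99):
        print(f"{pct}% reduction = {reduction_to_ratio(pct):.2f}x")
    print(f"14.84x = {ratio_to_reduction(14.84):.2f}% reduction")
```

The mapping is steeply nonlinear near the top of the range: 95% is 20x but 99% is 100x, so a few percentage points at the high end represent a much larger claim than the same gap at the low end.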

4. Cross-Tool Portability Gradient

lean-ctx and swe-pruner have the highest cross_tool_portability (HTTP interfaces, no Claude lock-in). entroly is nominally multi-model via proxy but ships as a Claude plugin first. basic-memory is explicitly positioned as IDE-agnostic but has a freemium wall for team features. iwe is the only framework built as a writing/notes tool that gained agent capabilities rather than the reverse.

Most Interesting Finds

  1. entroly — HTTP proxy-level interception: The only framework in the batch (and likely in the entire corpus) that operates at the API proxy layer. By routing calls through :9377, entroly compresses context and attaches WITNESS hallucination-detection certificates to every response, creating an audit trail at $0/2ms without the agent issuing any special calls. The combination of knapsack DP (Rust), PRISM RL loop, and WITNESS NLI checking (AUROC 0.80 on HaluEval-QA) in a single package is architecturally novel. The Python+Rust hybrid (PyO3) with hot paths in native Rust is the most sophisticated runtime in the batch.

  2. lean-ctx's LEAN-CTX.md constraint: The most aggressive tool-override rule found in the batch — "CRITICAL: NEVER use native Claude Code Read/Grep/Shell tools" — forces the agent to route all file access through lean-ctx's 62 MCP tools, ensuring every context fragment passes through the compression layer. This is the clearest example in the batch of a framework using its prompt file as an architectural enforcement mechanism.

  3. swe-pruner — Neural vs. heuristic gap: The only framework with a peer-reviewed paper and SWE-Bench numbers. The 14.84x compression claim on LongCodeQA is the highest single-benchmark ratio in the batch, but the CUDA requirement and absent license make it research-prototype-only in practice.

Items Written as Tier B or C

  • kratos-mcp: The README explicitly states "This repo is now legacy. Please use ceorkm/kratos-cli." The last meaningful commit was in 2024, and there is no MCP registration and no active maintainer. Written up as a full 11-file report (the code still works) but marked tier: C (LEGACY).
  • swe-pruner: Tier B, demoted from A. It has a paper and reproducible benchmarks, but no license file, no version tags, a CUDA-only requirement, and no native Claude integration.

Cross-References Discovered

  • lean-ctx and entroly both serve browser dashboards on port :9377, so the two will collide on that port if both run simultaneously on the same machine.
  • symdex could feed swe-pruner: symdex retrieves structurally-relevant code chunks; swe-pruner prunes those chunks by neural relevance. They are naturally complementary preprocessing stages.
  • kratos-mcp successor is ceorkm/kratos-cli (not in this batch — flagged for a future batch if evaluated).
  • claude-supermemory is built on the Supermemory.ai API (mem0.ai-adjacent space); if Supermemory.ai discontinues the service, the plugin stops working. It is a pure cloud dependency, unlike every other framework in the batch.
  • basic-memory's AGPL-3.0 license means any system that bundles it must also be AGPL, a licensing risk unique in the batch; all others are MIT, Apache-2.0, or unlicensed.
  • cognilayer v4.3.0 claims 21 multi-agent orchestration tools (more than any other framework here) and 5 lifecycle hooks including PreCompact — the most complete lifecycle coverage in the batch alongside ccmemory (seed).
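
For the shared port :9377 noted in the first cross-reference, a quick pre-flight check can tell whether something is already listening before a second dashboard is started. This is a generic sketch using the standard `socket` module; neither lean-ctx nor entroly is known to ship such a check, and `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success and an errno value otherwise.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(9377):
        print("Port 9377 is taken; another dashboard may already be running.")
    else:
        print("Port 9377 is free.")
```

The check only reports that the port is occupied, not which tool holds it.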