Entroly

Primitive shape 1 total

Hooks 1

Summary

Entroly — Summary

Entroly is a Python + Rust context compression engine that operates as an HTTP reverse proxy in front of LLM APIs (Anthropic, OpenAI, Gemini) and as an MCP server. It intercepts API calls and applies context selection (0/1 knapsack DP, BM25, Shannon entropy, SimHash dedup) to reduce input tokens. It can optionally run WITNESS, a hallucination detection system that costs $0 and about 2 ms per decision and scored AUROC 0.80 on HaluEval-QA, statistically tying GPT-4o-mini. Token savings: the README badge claims "tested 70-95% on large-repo release checks"; a self-test on the Entroly codebase measures 87% average savings (96.7% at 32K budget, 99.1% at 8K budget).

The architecture has two layers: Python orchestration (MCP protocol, HTTP proxy, CLI, flow orchestration) and Rust computation (entroly-core, bound via PyO3/maturin: knapsack, entropy, BM25, SimHash, dependency graph, PRISM RL loop, static security scan). PRISM is a reinforcement loop that learns fragment→outcome mappings and shifts compression weights across sessions. A single PostToolUse Claude Code hook (ravs capture) feeds every tool outcome into the RAVS event log for Bayesian routing.

Compared to seeds: Entroly operates at the LLM API proxy level (intercepts HTTP calls before they reach Anthropic/OpenAI), unlike lean-ctx (wraps tool invocations), CogniLayer (compresses subagent context via MCP), or CSR (captures past conversation history). The WITNESS hallucination detection capability is unique in this batch. Entroly also ships a web dashboard (entroly dashboard), a .kiro integration folder, and 37+ wrap targets across CLI tools.

Overview

Entroly — Overview

Origin

juyterman1000. Apache-2.0 license. Python + Rust (PyO3/maturin). Version 1.0.6. Available on PyPI, npm (WASM), Homebrew. Active (last commit 2026-05-26). 1 contributor.

Philosophy

From README:

"Entroly compresses AI context, detects hallucinations, and saves up to 80% on OpenAI, Anthropic, and Gemini API bills. Works locally with Cursor, Claude Code, Codex, GPT & custom providers."

"Audit AI answers against supplied evidence. Cut large-repo context 70-95% in release checks when retrieval has room to work."

Core differentiators:

API proxy level — not a tool wrapper; intercepts the HTTP stream before the LLM sees it
WITNESS — hallucination detection at $0/2ms (deterministic PAV, no LLM call)
PRISM RL loop — learns which fragments worked well and adjusts compression weights
Reproducible benchmarks — JSON result files committed to repo, reproducible via one command

Key Token Savings Claims (verbatim from README)

"Token Savings: tested 70-95% on large-repo release checks" (badge)
"Save 80% on API bills" (headline)
Self-test on Entroly codebase (entroly verify-claims):
- 96.7% savings at 32K budget
- 99.1% savings at 8K budget
- 87.0% average savings

WITNESS Hallucination Detection

Measured under HaluEval-QA standard protocol (20K decisions):

AUROC: 0.80
Accuracy: 84.9% ± 0.6%
Cost: $0, ~2ms/decision
Statistically ties GPT-4o-mini (86.3% on same sample)
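
For readers unfamiliar with the metric: AUROC is the probability that a randomly chosen hallucinated answer receives a higher risk score than a randomly chosen faithful one, with ties counted as half. The following is a minimal sketch of that definition, not WITNESS code; the function name and the toy scores and labels are made up for illustration and are not HaluEval-QA data.

```python
def auroc(scores, labels):
    """Pairwise AUROC: P(positive score > negative score), ties count 0.5.

    labels: 1 = hallucinated (positive), 0 = faithful (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy illustration only: six invented detector scores.
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_auroc = auroc(toy_scores, toy_labels)  # 8 of 9 pairs ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so 0.80 means roughly four in five hallucinated/faithful pairs are ordered correctly.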

Architecture

Entroly — Architecture

Distribution

PyPI: pip install entroly
npm: npm i -g entroly-wasm (WASM version)
Homebrew: brew tap juyterman1000/entroly && brew install entroly
Docker: docker-compose.yml

Two-Layer Architecture

Python (entroly/)
├── server.py        # MCP server (thin Rust wrapper)
├── proxy.py         # HTTP reverse proxy (port 9377)
├── cli.py           # 40+ CLI commands
├── sdk.py           # compress() / compress_messages()
├── epistemic_router.py  # 5-flow selector
├── flow_orchestrator.py # Pipeline execution
└── vault.py         # Persistent learning store

Rust (entroly-core/src/)
├── knapsack.rs      # 0/1 DP token budget solver
├── entropy.rs       # Shannon entropy scoring
├── semantic_dedup.rs # SimHash O(1) dedup
├── bm25.rs          # TF-IDF + BM25
├── depgraph.rs      # Import/dependency resolution
├── prism.rs         # RL loop (fragment→outcome)
├── cogops.rs        # Unified engine
├── sast.rs          # Static security (55 rules)
└── archetype.rs     # Role-based context presets
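
To make the "SimHash O(1) dedup" entry concrete, here is a minimal Python sketch of the general SimHash technique. It is not a port of semantic_dedup.rs: the whitespace tokenizer, the BLAKE2b token hash, the 64-bit width, and the distance threshold are all assumptions chosen for illustration.

```python
import hashlib

BITS = 64  # fingerprint width (assumed; matches the 8-byte digest below)


def simhash(text):
    """Return a 64-bit SimHash fingerprint over whitespace tokens."""
    counts = [0] * BITS
    for token in text.lower().split():
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, max_distance=3):
    """Treat two texts as duplicates if their fingerprints are close."""
    return hamming(simhash(text_a), simhash(text_b)) <= max_distance
```

Once fingerprints are stored, comparing two fragments is a single XOR and popcount on fixed-width integers, which is where the constant-time claim for the comparison step comes from.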

State Files

vault/beliefs/ — durable code-entity understanding
vault/verification/ — challenge tracking
vault/actions/ — task outputs, PR briefs
vault/evolution/skills/ — skill specs with fitness metrics
.entroly_verification.json — verify-claims output
.claude/settings.json — PostToolUse RAVS hook

Required Runtime

Python 3.10+
Rust (for dev; pre-built via PyPI)
Optional: OPENAI_API_KEY for NLI verification in WITNESS

Target AI Tools

Claude, Cursor, Codex, Aider, Copilot, Gemini, Qwen, OpenCode; any proxy-aware HTTP client.

Components

Entroly — Components

MCP Tools (12, from server.py)

remember_fragment, optimize_context, recall_relevant, record_outcome, record_test_result, record_command_exit, record_ci_result, explain_context, checkpoint_state, resume_state, prefetch_related, get_stats

CLI Commands (40+, from cli.py)

init, serve, dashboard, health, autotune, proxy, daemon, benchmark, status, config, telemetry, clean, export, import, drift, profile, batch, wrap, learn, go, demo, share, doctor, digest, migrate, role, completions, optimize, feedback, compile, verify-code, verify, verify-claims, sync, search, docs, finetune, witness, ravs

Hook (1 in .claude/settings.json)

Event	Matcher	Command	Purpose
`PostToolUse`	`Bash|Read|Grep|Glob|Edit|Write|TodoWrite`	`entroly ravs capture --stdin --quiet`	Feed every tool outcome into RAVS event log
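
A hook row like this is typically expressed in .claude/settings.json as a PostToolUse entry with a matcher and a command. The sketch below builds such an entry in Python. The nesting follows the Claude Code hooks schema as commonly documented; the helper name is hypothetical, and the actual file in the Entroly repository was not inspected here and may contain additional keys (for example a comment field).

```python
import json


def build_hook_settings(matcher, command):
    """Build a settings dict with one PostToolUse command hook (assumed layout)."""
    return {
        "hooks": {
            "PostToolUse": [
                {
                    "matcher": matcher,  # regex-style alternation of tool names
                    "hooks": [{"type": "command", "command": command}],
                }
            ]
        }
    }


settings = build_hook_settings(
    "Bash|Read|Grep|Glob|Edit|Write|TodoWrite",
    "entroly ravs capture --stdin --quiet",
)
settings_json = json.dumps(settings, indent=2)
```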

Plugin

.claude-plugin/manifest.json v1.0.6 — Claude Code plugin with MCP server (entroly serve)

Kiro Integration

.kiro/steering/ — Kiro IDE steering files (exact content not retrieved; shows multi-IDE support)

WITNESS

CLI: entroly witness --context-file evidence.txt --output-file answer.txt --mode strict
Proxy mode: entroly proxy --witness audit / --witness strict

Profiles: rag, qa, benchmark_qa, code, summary, chat, dialogue

PRISM RL Loop

Tracks fragment→outcome mappings across sessions:

record_outcome MCP tool — record success/failure
record_test_result / record_command_exit / record_ci_result — automatic outcome signals
PRISM shifts compression weights based on accumulated feedback
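
The source does not describe PRISM's update rule. As a rough illustration of how outcome feedback could shift per-fragment weights, the sketch below uses a simple exponential moving average toward 1.0 on success and 0.0 on failure. The class, its method, the learning rate, and the initial weight are all hypothetical and should not be read as the prism.rs implementation.

```python
from collections import defaultdict


class FragmentWeights:
    """Hypothetical per-fragment weight table updated from outcomes."""

    def __init__(self, learning_rate=0.2, initial=0.5):
        self.learning_rate = learning_rate
        self.weights = defaultdict(lambda: initial)

    def update(self, fragment_ids, success):
        """Move each used fragment's weight toward 1.0 (success) or 0.0 (failure)."""
        target = 1.0 if success else 0.0
        for fid in fragment_ids:
            w = self.weights[fid]
            self.weights[fid] = w + self.learning_rate * (target - w)


weights = FragmentWeights()
weights.update(["auth.py::login"], success=True)    # e.g. tests passed
weights.update(["utils.py::retry"], success=False)  # e.g. non-zero exit code
```

Under these assumptions a fragment that keeps appearing in successful runs drifts toward 1.0 and would be favored by a value-based selector, while one tied to failures decays toward 0.0.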

Epistemic Router (5 Flows)

Fast Answer — fresh beliefs, act immediately
Verify Before Answer — stale beliefs, recompile + verify
Compile On Demand — no beliefs, index + extract + verify
Change-Driven — PR/commit trigger, blast radius analysis
Self-Improvement — repeated failures → skill synthesis
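
The five flows above can be read as a small decision function. The sketch below is one possible reading, not the logic in epistemic_router.py: the state fields, the failure threshold, and the order in which conditions are checked are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Flow(Enum):
    FAST_ANSWER = 1
    VERIFY_BEFORE_ANSWER = 2
    COMPILE_ON_DEMAND = 3
    CHANGE_DRIVEN = 4
    SELF_IMPROVEMENT = 5


@dataclass
class QueryState:
    """Hypothetical inputs to the router."""
    has_beliefs: bool
    beliefs_stale: bool
    triggered_by_change: bool = False  # PR or commit trigger
    recent_failures: int = 0


def select_flow(state, failure_threshold=3):
    """Pick a flow; the precedence order here is an assumption."""
    if state.recent_failures >= failure_threshold:
        return Flow.SELF_IMPROVEMENT
    if state.triggered_by_change:
        return Flow.CHANGE_DRIVEN
    if not state.has_beliefs:
        return Flow.COMPILE_ON_DEMAND
    if state.beliefs_stale:
        return Flow.VERIFY_BEFORE_ANSWER
    return Flow.FAST_ANSWER
```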

Prompts

Entroly — Prompts

Prompt File 1: CLAUDE.md

Technique: Architecture reference card for Claude Code contributors working on the Entroly codebase itself.

Key excerpt:

### Epistemic Router (5 Flows)
`epistemic_router.py` selects which pipeline runs for each query:
1. Fast Answer — beliefs are fresh, act immediately
2. Verify Before Answer — beliefs are stale, recompile + verify first
3. Compile On Demand — no beliefs exist, index + extract + verify
4. Change-Driven — triggered by PR/commit, analyzes blast radius
5. Self-Improvement — repeated failures trigger skill synthesis → promote/prune

Technique used: Internal architecture documentation as agent guide; numbered flow options create a decision tree that the agent internalizes.

Prompt File 2: .claude/settings.json PostToolUse Comment

Technique: In-JSON comment as rationale for the hook, readable by agents inspecting the settings:

{
  "_comment": "Feed every tool outcome into the RAVS event log. Powers `entroly ravs report` and the Bayesian routing layer. Quiet + best-effort: a missing entroly binary or a slow capture never blocks Claude Code."
}

Technique used: Rationale comment in config file — self-documenting hook with clear failure mode (quiet + never blocks).

Prompt File 3: WITNESS Proxy Response Header

Technique: Certificate header injection as runtime prompt augmentation.

X-Entroly-Witness-Id: <cert-id>   # In every proxied response
# Full certificate: curl http://localhost:9377/witness/{id}

Technique used: Out-of-band evidence channel — the model's response is extended with a certificate that can be inspected programmatically, creating an audit trail without adding tokens to the main context.

Uniqueness

Entroly — Uniqueness

Differentiator

Entroly operates at the HTTP proxy layer, intercepting actual Anthropic/OpenAI API calls before they reach the model. This is architecturally distinct from every other framework in the memory/context batch, which operate at the tool, hook, or MCP level and only see what the agent explicitly passes them.

The proxy intercepts are invisible to the agent: context is compressed, WITNESS certificates are attached to responses, and RAVS telemetry is captured without the agent issuing any special calls.

vs. ccmemory (Seed)

Dimension	ccmemory	Entroly
Memory store	Neo4j graph (typed nodes)	Flat vault/ (beliefs, skills)
Insertion mechanism	4 lifecycle hooks detect events via LLM	PostToolUse hook → RAVS event log
Retrieval	Cypher graph traversal	BM25 + SimHash dedup (Rust)
Context budget	No token budget solver	0/1 knapsack DP (Rust knapsack.rs)
Hallucination check	None	WITNESS (deterministic PAV, optional NLI; AUROC 0.80, $0, ~2ms)
Compression	Not a compression layer	Core feature (70–95% token reduction)
Interception layer	Hook-level (Claude Code hooks)	HTTP proxy (port 9377)
Self-improvement	None	PRISM RL loop, Epistemic Router flow 5

vs. Batch Peers

lean-ctx: Both do aggressive compression. lean-ctx uses CCP cross-session carry; entroly uses PRISM RL to learn which fragments matter. lean-ctx is MCP-only; entroly adds HTTP proxy interception.
claude-self-reflect (CSR): Both use PostToolUse hooks + SQLite. CSR focuses on reflection/quality (9.3x quality claim); entroly focuses on compression economics (70-95% savings). CSR has no proxy mode.
basic-memory: Both are multi-IDE. basic-memory is a knowledge graph (entities/relations); entroly is a compression engine with no graph model.
cognilayer: Both have CLI + MCP + hooks. CogniLayer targets multi-agent orchestration (21 orchestration tools); entroly targets token cost reduction.
iwe: Both are Rust-based. iwe is a note graph IDE extension with no agent hooks; entroly is an agent middleware with no document editing model.

Unique Capabilities Not Found in Seeds or Batch Peers

Proxy-level interception: Only framework that works by routing HTTP calls through a sidecar, making it the only one that can compress context from any AI tool that can send its API traffic through a proxy, not just Claude Code.
WITNESS hallucination certificates: Attaches verifiable evidence certificates to every proxied response, creating an audit trail at $0 and ~2ms per decision (deterministic PAV; NLI via OpenAI is optional).
PRISM RL loop: Reinforcement learning on fragment→outcome mappings across sessions; weights shift based on test pass/fail, command exit codes, and CI results.
Knapsack DP token budget solver: Hard token budget constraint solved via 0/1 dynamic programming in Rust, not heuristic truncation.
Dual runtime (Python+Rust via PyO3): Hot paths (entropy, BM25, SimHash, dedup) compiled to native Rust; Python manages orchestration and MCP protocol.
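
To illustrate the difference between a knapsack solver and heuristic truncation, here is a minimal textbook 0/1 knapsack DP in Python. It is a sketch of the general technique, not a port of knapsack.rs; the fragment names, token counts, and value scores are invented, and integer token costs are assumed.

```python
def select_fragments(fragments, budget):
    """Exact 0/1 knapsack over (name, tokens, value) tuples.

    Returns (best_total_value, chosen_names) with total tokens <= budget.
    """
    n = len(fragments)
    # best[i][b] = max value using the first i fragments within b tokens
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, (_, tokens, value) in enumerate(fragments, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # skip fragment i
            if tokens <= b:
                candidate = best[i - 1][b - tokens] + value  # take fragment i
                if candidate > best[i][b]:
                    best[i][b] = candidate
    # Backtrack to recover which fragments were taken.
    chosen = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            name, tokens, _ = fragments[i - 1]
            chosen.append(name)
            b -= tokens
    chosen.reverse()
    return best[n][budget], chosen


# Invented example: four fragments competing for a 1000-token budget.
example = [
    ("auth.py", 400, 9.0),
    ("utils.py", 300, 4.0),
    ("README.md", 500, 5.0),
    ("db.py", 300, 6.0),
]
total_value, picked = select_fragments(example, 1000)
```

In this toy case the solver fills the budget exactly with auth.py, utils.py, and db.py (value 19.0), whereas truncating in file order would stop after auth.py and utils.py. The table costs O(n × budget) memory, so a production solver would likely bucket token counts or use a rolling array.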

Workflow

Entroly — Workflow

Phases

Phase	What Happens	Artifact
Install	`pip install entroly`	CLI + Rust engine available
Onboard	`entroly go` — auto-detect IDE, generate config	MCP + proxy configured
Active (MCP)	Agent calls `optimize_context(budget)` → Rust engine selects fragments	Token-budgeted context sent to LLM
Active (Proxy)	Agent HTTP calls routed through `entroly proxy` → compressed before reaching Anthropic/OpenAI	API costs reduced
Feedback	`record_outcome`, `record_test_result` → PRISM updates weights	vault/ updated with fragment success data
WITNESS	`entroly proxy --witness audit` → all responses checked against evidence	Certificate for each response
Self-Improvement	Repeated failures → Epistemic Router flow 5 → skill synthesis	`vault/evolution/skills/` updated
Verify	`entroly verify-claims` → self-test compression on current repo	`.entroly_verification.json`

RAVS Event Log

The PostToolUse hook (entroly ravs capture --stdin --quiet) feeds the outcome of every matched tool call (Bash, Read, Grep, Glob, Edit, Write, TodoWrite) into the RAVS event log, which powers entroly ravs report and Bayesian routing.

Proxy Mode Details

entroly proxy                          # Standard proxy on :9377
entroly proxy --witness audit          # + sidecar WITNESS certificates
entroly proxy --witness strict         # + suppress unsupported claims

WITNESS certificates retrievable at http://localhost:9377/witness/{id}.

Approval Gates

None.

Spec Format

None.

Memory & Context

Entroly — Memory & Context

Memory Model

sqlite + file-based (vault) — dual store:

vault/beliefs/ — code-entity understanding with confidence and staleness tracking (durable)
vault/verification/ — challenge and staleness records
vault/actions/ — task outputs and PR briefs
vault/evolution/skills/ — learned compression skill specs with test fitness metrics

Rust engine maintains in-memory PRISM model (fragment→outcome mappings).

Persistence Scope

project — vault lives in the project directory; PRISM model persists across sessions within a project.

Core Compression Algorithms (Rust)

Algorithm	Purpose
Knapsack 0/1 DP	Token budget solver with (1-1/e) approximation guarantee
Shannon entropy	Information density per token (high entropy = keep)
BM25	TF-IDF + BM25 relevance ranking
SimHash	O(1) duplicate detection
Dependency graph	Cross-file import/dep resolution
PRISM RL	Reinforcement loop, fragment→outcome weights
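
As an illustration of the "information density" idea in the Shannon entropy row, the sketch below computes the entropy of a fragment's empirical token distribution in bits per token. How entropy.rs tokenizes, normalizes, or combines this score with other signals is not stated in the source; the function name and whitespace tokenization are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Entropy (bits per token) of the empirical token distribution."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


boilerplate = "pass pass pass pass".split()
dense = "retry backoff jitter timeout".split()
low = shannon_entropy(boilerplate)   # one repeated token carries no information
high = shannon_entropy(dense)        # four distinct tokens: log2(4) bits
```

Under this measure, repetitive boilerplate scores near zero and is a natural candidate for dropping, while varied content scores higher and is kept.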

Context Compaction Handling

yes — checkpoint_state MCP tool saves state; resume_state restores from checkpoint. Works across both MCP and proxy modes.

Cross-Session Handoffs

yes — via vault beliefs and PRISM weights. Each session builds on learned fragment value from previous sessions.

Token Reduction Claims (verbatim from README)

Badge: "Token Savings: tested 70-95% on large-repo release checks"
Self-test (entroly verify-claims on Entroly's own codebase):
- 96.7% savings at 32K budget
- 99.1% savings at 8K budget
- 87.0% average savings
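
A short arithmetic sketch helps put these percentages in context. It assumes savings are defined as 1 minus the ratio of tokens sent to original tokens, and that the budget is fully used; neither assumption is confirmed by the source, and the helper names are hypothetical.

```python
def savings_pct(original_tokens, sent_tokens):
    """Percentage of input tokens removed (assumed definition)."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 100.0 * (1 - sent_tokens / original_tokens)


def implied_original(budget_tokens, claimed_savings_pct):
    """Original size implied by a savings figure if the budget is fully used."""
    return budget_tokens / (1 - claimed_savings_pct / 100.0)


at_8k = implied_original(8_000, 99.1)     # roughly 889K tokens
at_32k = implied_original(32_000, 96.7)   # roughly 970K tokens
```

Under these assumptions both figures point to an uncompressed corpus on the order of 0.9 to 1.0 million tokens, which suggests the savings percentage mainly reflects the ratio of budget to repository size.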

From benchmark table:

Benchmark	Token Savings
NeedleInAHaystack	99.5%
LongBench (HotpotQA)	85.3%
Berkeley Function Calling	79.1%
SQuAD 2.0	37.7%

Search Mechanism

hybrid — BM25 retrieval + SimHash dedup + dependency graph; no vector embeddings (deterministic algorithms only by default). Optional OpenAI NLI for WITNESS enhanced verification.
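
For reference, a minimal sketch of Okapi BM25 scoring over whitespace-tokenized fragments is shown below. It illustrates the ranking technique named above rather than the bm25.rs implementation; the parameter defaults (k1 = 1.5, b = 0.75), the non-negative IDF variant, and the sample fragments are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))  # count each term once per document
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization: longer-than-average docs are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


fragments = [
    "def login user password",
    "render html template",
    "login form html",
]
ranking = bm25_scores("login password", fragments)
```

The first fragment matches both query terms and ranks highest, the third matches only the more common term, and the second scores zero. No embeddings are involved, which is consistent with the deterministic-by-default description above.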

WITNESS Hallucination Detection

Memory about facts is verified at response time:

Deterministic PAV local verifier: $0, ~2ms/decision
AUROC 0.80 on HaluEval-QA (20K decisions)
Optional NLI via OpenAI API when key present

State Files

vault/beliefs/ — beliefs with confidence + staleness
vault/verification/ — challenge records
vault/actions/ — task outputs
vault/evolution/skills/ — learned skill specs
.entroly_verification.json — verify-claims output

Orchestration

Entroly — Orchestration

Multi-Agent

No native multi-agent protocol. Can compress context for multi-agent flows, but ships no agent coordination tools.

Orchestration Pattern

none — single-agent context compression. The Epistemic Router's 5 flows are internal pipeline routing, not multi-agent orchestration.

Isolation Mechanism

none

Execution Mode

event-driven (proxy intercepts API calls) + interactive-loop (MCP tools) + background-daemon (entroly daemon for background processing).

Multi-Model

yes — the HTTP proxy works with Anthropic, OpenAI, and Gemini. Different models can be configured at the proxy level, though Entroly itself doesn't route different roles to different models.

Context Compaction Handling

yes — checkpoint_state / resume_state MCP tools.

Crash Recovery

yes — checkpoint/resume pattern.

Auto Validators

yes — WITNESS hallucination detection acts as an auto-validator on LLM outputs (when proxy mode is active with --witness flag).

Prompt Chaining

yes — Epistemic Router flow 2 (Verify Before Answer) chains: stale belief detection → recompile → verify → then answer. Each stage's output triggers the next.

Self-Improvement Loop

Epistemic Router flow 5: repeated failures → skill_synthesis → spec with test cases → fitness measurement → promote/prune. This is a form of automated prompt improvement.

UI & CLI Surface

Entroly — UI & CLI Surface

CLI Binary

Name: entroly
Type: Own runtime (Python + Rust), not a thin wrapper
Subcommands (40+): init, serve, dashboard, health, autotune, proxy, daemon, benchmark, status, config, telemetry, clean, export, import, drift, profile, batch, wrap, learn, go, demo, share, doctor, digest, migrate, role, completions, optimize, feedback, compile, verify-code, verify, verify-claims, sync, search, docs, finetune, witness, ravs

Local UI (Browser Dashboard)

Launch: entroly dashboard or entroly go (auto-opens browser)
Live demo: https://juyterman1000.github.io/entroly/docs/dashboard.html
Features: token savings tracking, WITNESS certificate browser, PRISM weight visualization, context quality improvement, compression stats
Port: served via HTTP proxy on :9377

WITNESS Certificate API

curl http://localhost:9377/witness/{id}                    # Full proof path + evidence
curl "http://localhost:9377/witness?limit=10"              # Recent certificates
curl -X POST http://localhost:9377/witness/{id}/feedback   # Label false positives
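
The same three endpoints can be addressed from Python's standard library. The sketch below only constructs the requests; sending them with urllib.request.urlopen requires a running proxy. The helper names are hypothetical, and the feedback payload shape is an assumption because the source does not document the request body.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:9377"  # proxy address from the section above


def certificate_request(cert_id):
    """GET the full certificate for an id taken from X-Entroly-Witness-Id."""
    safe_id = urllib.parse.quote(cert_id, safe="")
    return urllib.request.Request(f"{BASE_URL}/witness/{safe_id}", method="GET")


def recent_certificates_request(limit=10):
    """GET the most recent certificates."""
    query = urllib.parse.urlencode({"limit": limit})
    return urllib.request.Request(f"{BASE_URL}/witness?{query}", method="GET")


def feedback_request(cert_id, payload):
    """POST feedback for a certificate; the payload schema is an assumption."""
    safe_id = urllib.parse.quote(cert_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/witness/{safe_id}/feedback",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical id and payload, for illustration only.
req = feedback_request("abc123", {"label": "false_positive"})
```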

Observability

entroly health — codebase health grade (A-F)
entroly digest — session digest
entroly ravs report — RAVS event log summary (Bayesian routing data)
entroly verify-claims — self-test compression on current repo → .entroly_verification.json
Dashboard: token savings, WITNESS certs, PRISM weights, context quality over time

Transport

MCP (stdio for Claude Code plugin) + HTTP proxy (:9377 for API interception) + HTTP MCP (lean-ctx serve-style endpoint).

Hugging Face Live Demo

https://huggingface.co/spaces/entroly/entroly-context-compression — try compression without installing.
