Entroly — Summary
Entroly is a Python + Rust context compression engine that runs both as an HTTP reverse proxy in front of LLM APIs (Anthropic, OpenAI, Gemini) and as an MCP server. It intercepts API calls and applies context selection (0/1 knapsack dynamic programming, BM25, Shannon entropy, SimHash deduplication) to reduce input tokens. It can optionally run WITNESS, a hallucination detection system advertised at $0 cost and 2 ms latency, which scored AUROC 0.80 on HaluEval-QA, a statistical tie with GPT-4o-mini. On token savings, the README claims "70-95% tested on large-repo release checks"; a self-test on the Entroly codebase measures 87% average savings (96.7% at a 32K budget, 99.1% at an 8K budget).
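The knapsack step mentioned above can be illustrated with a minimal sketch. This is not Entroly's code: the Fragment type, the select_fragments function, and the example scores are all hypothetical, and the sketch assumes each fragment already has a token cost and a relevance score (for instance from BM25 or an entropy measure). It shows only the general 0/1 knapsack idea of choosing the highest-scoring set of fragments that fits a token budget.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    """Hypothetical context fragment; not an Entroly type."""
    name: str
    tokens: int    # token cost of including this fragment
    score: float   # assumed relevance score (e.g. from BM25)


def select_fragments(fragments: List[Fragment], budget: int) -> List[Fragment]:
    """0/1 knapsack DP: maximise total score with total tokens <= budget."""
    n = len(fragments)
    # best[i][b] = best total score using the first i fragments within b tokens
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i, frag in enumerate(fragments, start=1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip this fragment
            if frag.tokens <= b:
                candidate = best[i - 1][b - frag.tokens] + frag.score
                if candidate > best[i][b]:
                    best[i][b] = candidate  # option 2: include it

    # Walk the table backwards to recover which fragments were included.
    chosen: List[Fragment] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            frag = fragments[i - 1]
            chosen.append(frag)
            b -= frag.tokens
    chosen.reverse()
    return chosen


# Illustrative data only; names, costs, and scores are made up.
candidates = [
    Fragment("auth.py", tokens=5, score=6.0),
    Fragment("db.py", tokens=4, score=5.0),
    Fragment("README", tokens=3, score=4.0),
]
selected = select_fragments(candidates, budget=7)
print([f.name for f in selected])
```

With a budget of 7 tokens the sketch prefers the two smaller fragments (combined score 9.0) over the single highest-scoring one (6.0), which is the behaviour that distinguishes knapsack selection from a greedy top-k cut.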
The architecture has two layers: Python orchestration (MCP protocol, HTTP proxy, CLI, flow) and Rust computation (entroly-core, bound via PyO3/maturin), which implements the knapsack solver, entropy, BM25, SimHash, the dependency graph, the PRISM RL loop, and a static security scan. PRISM is a reinforcement learning loop that learns fragment→outcome mappings and shifts compression weights across sessions. A single Claude Code PostToolUse hook (ravs capture) feeds every tool outcome into the RAVS event log, which is used for Bayesian routing.
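The summary does not say how PRISM updates its weights, so the following is only a sketch of one simple way a fragment→outcome mapping could be learned. It assumes a Beta-Bernoulli model with a uniform prior: each fragment keeps success and failure counts, and its weight is the posterior mean success rate. The OutcomeWeights class and its method names are hypothetical and are not part of entroly-core or RAVS.

```python
from collections import defaultdict


class OutcomeWeights:
    """Hypothetical per-fragment weights from observed outcomes.

    Assumption: a Beta(1, 1) prior per fragment, updated with one
    success/failure observation per recorded tool outcome.
    """

    def __init__(self, prior_success: float = 1.0, prior_failure: float = 1.0):
        # Maps fragment id -> [success pseudo-count, failure pseudo-count]
        self._counts = defaultdict(lambda: [prior_success, prior_failure])

    def record(self, fragment_id: str, success: bool) -> None:
        """Record one outcome for a fragment that was in the context."""
        self._counts[fragment_id][0 if success else 1] += 1.0

    def weight(self, fragment_id: str) -> float:
        """Posterior mean success rate; 0.5 for a fragment never seen."""
        successes, failures = self._counts[fragment_id]
        return successes / (successes + failures)


weights = OutcomeWeights()
for outcome in (True, True, False):
    weights.record("src/parser.rs", outcome)  # made-up fragment id

print(weights.weight("src/parser.rs"))
print(weights.weight("never-seen"))
```

Weights like these could feed back into a selection step as fragment scores, so that fragments associated with successful outcomes are favoured in later sessions. Whether Entroly does this is not stated in the source.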
Compared to the seeds, Entroly operates at the LLM API proxy level, intercepting HTTP calls before they reach Anthropic or OpenAI. By contrast, lean-ctx wraps tool invocations, CogniLayer compresses subagent context via MCP, and CSR captures past conversation history. WITNESS hallucination detection is unique in this batch. Entroly also ships a web dashboard (entroly dashboard), a --kiro integration folder, and 37+ wrap targets across CLI tools.