swe-pruner — Summary
One-line: Neural context pruner that removes irrelevant code tokens before they reach the LLM, cutting 23–54% of tokens on SWE-Bench Verified with a 0.6B fine-tuned model.
Identity
| Field | Value |
|---|---|
| GitHub | https://github.com/ByteDance-Seed/SWE-Pruner |
| Stars | 282 |
| License | None declared |
| Language | Python |
| Version | (no tag; branch: public) |
| Package type | Standalone research repo (FastAPI server) |
| Maintainer org | ByteDance Seed (research) |
What It Does
swe-pruner runs a FastAPI server (port 8000) that accepts a context payload and returns a pruned version, with the code chunks the model predicts are irrelevant to the query stripped out. The pruner model (code-pruner, 0.6B parameters) is a fine-tuned model hosted on Hugging Face (ayanami-kitasan/code-pruner). Agents call the /prune endpoint directly over HTTP; there is no MCP server, no Claude hook, and no vault.
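As a rough illustration of what "calling the /prune endpoint directly" might look like from an agent, here is a minimal client sketch using only the Python standard library. Only the port (8000) and the path (/prune) come from the description above. The host, the JSON field names `query` and `context`, and the assumption that the response is JSON are hypothetical placeholders, so check the repository for the actual request and response schema before relying on this.

```python
import json
from urllib import request

# From the summary: port 8000 and the /prune path.
# ASSUMPTION: the server runs on localhost; adjust for a real deployment.
PRUNE_URL = "http://localhost:8000/prune"


def build_prune_request(query: str, context: str, url: str = PRUNE_URL) -> request.Request:
    """Build (but do not send) a POST request carrying the query and context.

    ASSUMPTION: the field names "query" and "context" are illustrative,
    not taken from the swe-pruner API.
    """
    payload = json.dumps({"query": query, "context": context}).encode("utf-8")
    return request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def prune(query: str, context: str, url: str = PRUNE_URL, timeout: float = 30.0):
    """Send the request and return the decoded response body.

    ASSUMPTION: the server replies with JSON; the shape of that JSON
    is not specified in this summary, so it is returned unmodified.
    """
    req = build_prune_request(query, context, url)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

An agent framework could call `prune(task_description, file_contents)` before assembling its prompt and pass the returned context to the LLM in place of the original.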
Claimed Results
- "Make Claude Tokens 40% Saving!" (badge, verbatim)
- "23–54% token reduction on SWE-Bench Verified"
- "up to 14.84x compression on LongCodeQA"
- Paper: arXiv:2601.16746 (ByteDance Seed)
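The claims above mix two units: percentage token reduction and an "x" compression factor. The short sketch below converts between them so the figures can be compared side by side. It assumes the usual definitions (reduction = fraction of tokens removed, compression = original tokens divided by kept tokens); the paper may define either metric differently, so treat the converted numbers as illustrative rather than as reported results.

```python
def reduction_to_ratio(reduction: float) -> float:
    """Fraction of tokens removed -> compression ratio (original / kept)."""
    return 1.0 / (1.0 - reduction)


def ratio_to_reduction(ratio: float) -> float:
    """Compression ratio (original / kept) -> fraction of tokens removed."""
    return 1.0 - 1.0 / ratio


# SWE-Bench Verified claim, expressed as a compression ratio.
print(f"{reduction_to_ratio(0.23):.2f}x")  # 1.30x
print(f"{reduction_to_ratio(0.54):.2f}x")  # 2.17x

# LongCodeQA claim, expressed as a token reduction.
print(f"{ratio_to_reduction(14.84):.1%}")  # 93.3%
```

Under these definitions, the 23–54% range corresponds to roughly 1.3x to 2.2x compression, while "up to 14.84x" corresponds to removing about 93% of tokens, so the two benchmarks describe very different pruning regimes.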
Archetype
Research paper implementation: a FastAPI inference server, not a Claude plugin or MCP server. It is closest to a preprocessing layer that any agent framework can call over HTTP.