SymDex

symdex · husnainpk/SymDex · ★ 190 · last commit 2026-04-22

Primitive shape 22 total
Skills 1 MCP tools 21
00

Summary

SymDex — Summary

SymDex is a repo-local symbolic indexing engine for AI coding agents. It maps a project into exact symbols, file outlines, HTTP routes, caller/callee graphs, and token-budgeted context packs, stored in a local SQLite database (~/.symdex/ or ./.symdex/). It ships 21 MCP tools and a CLI with matching subcommands, covering 21 language surfaces. The core value proposition is "exact retrieval instead of file scanning": get_symbol returns one function's source instead of a whole file; build_context_pack assembles a token-budgeted evidence bundle for a feature question.

SymDex includes a dedicated symdex-code-search agent skill (installable via npx skills add) that instructs supported AI agents to search with SymDex before broad Read/Grep/Glob exploration. Every CLI search command and MCP tool output includes ROI metadata (roi, roi_summary, roi_agent_hint) so agents can report token savings in their responses. Semantic search backends include local sentence-transformers, Voyage, OpenAI-compatible, and Gemini Embedding — all optional; the base install is lean with no embedding dependency.

Compared to seeds, SymDex is closest to the ccmemory concept of providing a structured query layer over code, but it is a code structure index (not a conversation memory system): it indexes symbols, routes, and call graphs rather than storing decisions or conversation history. Unlike CogniLayer (which also does tree-sitter code intelligence), SymDex is a pure indexing/retrieval tool with no memory of facts, no hooks, and no session management.

01

Overview

SymDex — Overview

Origin

Husnain Khalid (husnainpk). MIT license. Version 0.1.26. Python 3.x, PyPI package symdex. Active (last commit 2026-04-22).

Philosophy

From README:

"SymDex turns a checked-out repository into a local symbolic index of code structure, routes, relations, docs, tests, and context so AI coding agents can reason, retrieve, and act without reading whole files first."

"AI coding agents are useful until they have to rediscover your repo from scratch. They open whole files, grep broad patterns, miss the route handler, read the same utility twice, and spend thousands of tokens just getting oriented."

Core tenets:

  • Exact symbols, not file contents: get_symbol returns the exact function with byte offset, not the whole file
  • Token-budgeted context packs: build_context_pack assembles just enough evidence for a query within a token budget
  • ROI-first design: every search returns roi, roi_summary, roi_agent_hint so agents can quantify and communicate token savings
  • Local-first, lean by default: base install has zero embedding dependency; semantic search is opt-in
  • Multi-repo awareness: a registry supports multiple repos and worktrees; workspace-local .symdex for Docker/teams
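To make the "exact symbols" tenet concrete, here is a minimal sketch of storing byte spans in SQLite and slicing out one function. It is not SymDex's implementation: it uses Python's built-in ast module in place of tree-sitter, and the symbols table, index_symbols, and get_symbol_source are hypothetical names chosen for illustration.

```python
import ast
import sqlite3

SOURCE = b'''def helper(x):
    return x + 1


def validate_email(address):
    """Return True when the address contains exactly one '@'."""
    return address.count("@") == 1
'''


def line_starts(data: bytes) -> list[int]:
    """Byte offset at which each line of `data` begins."""
    starts = [0]
    for i, byte in enumerate(data):
        if byte == 0x0A:  # newline
            starts.append(i + 1)
    return starts


def index_symbols(db: sqlite3.Connection, path: str, data: bytes) -> None:
    """Record the byte span of every function definition (hypothetical schema)."""
    starts = line_starts(data)
    for node in ast.walk(ast.parse(data)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast reports column offsets as UTF-8 byte offsets within the line.
            begin = starts[node.lineno - 1] + node.col_offset
            end = starts[node.end_lineno - 1] + node.end_col_offset
            db.execute(
                "INSERT INTO symbols VALUES (?, ?, ?, ?)",
                (node.name, path, begin, end),
            )


def get_symbol_source(db: sqlite3.Connection, name: str, files: dict[str, bytes]):
    """Return only the bytes of one symbol instead of the whole file."""
    row = db.execute(
        "SELECT path, start_byte, end_byte FROM symbols WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    path, begin, end = row
    return files[path][begin:end].decode("utf-8")


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE symbols (name TEXT, path TEXT, start_byte INTEGER, end_byte INTEGER)"
)
files = {"app/validators.py": SOURCE}
index_symbols(db, "app/validators.py", SOURCE)
snippet = get_symbol_source(db, "validate_email", files)
print(snippet)
```

The lookup touches only the stored span, so the caller never pays for the rest of the file.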

Design Choices

  • Lean core vs full: pip install symdex installs the lean core (no embeddings); pip install "symdex[local]" adds sentence-transformers.
  • Lazy indexing: symdex index --lazy does foreground structural indexing with background embedding fill, so agents can start immediately.
  • Quality metadata: every result includes quality fields (confidence, freshness, parser mode, generated-file hints, embedding availability) so agents can judge reliability.
  • Agent skill: the symdex-code-search SKILL.md is the behavioral layer that trains agents to use SymDex first.
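The lazy-indexing choice above can be pictured as a foreground structural pass followed by a background thread that fills embeddings. The sketch below is an assumption about the general pattern, not SymDex's code; LazyIndex, its naive "def " scanner, and the embed callback are all hypothetical.

```python
import threading


class LazyIndex:
    """Structural index is ready immediately; embeddings arrive later."""

    def __init__(self):
        self.symbols = {}      # path -> list of function names
        self.embeddings = {}   # path -> vector, filled in the background
        self._lock = threading.Lock()
        self._worker = None

    def index(self, files, embed):
        # Foreground: cheap structural pass (a naive stand-in for a real parser).
        for path, text in files.items():
            self.symbols[path] = [
                line.split("(")[0][4:]
                for line in text.splitlines()
                if line.startswith("def ")
            ]
        # Background: slower embedding pass that does not block the caller.
        self._worker = threading.Thread(
            target=self._fill, args=(dict(files), embed), daemon=True
        )
        self._worker.start()

    def _fill(self, files, embed):
        for path, text in files.items():
            vector = embed(text)
            with self._lock:
                self.embeddings[path] = vector

    def wait(self):
        """Block until the background embedding pass has finished."""
        if self._worker is not None:
            self._worker.join()


files = {"a.py": "def foo(x):\n    return x\n", "b.py": "def bar():\n    pass\n"}
idx = LazyIndex()
idx.index(files, embed=lambda text: [float(len(text))])  # toy embedding
print(idx.symbols)  # structural results are usable right away
idx.wait()
print(sorted(idx.embeddings))
```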
02

Architecture

SymDex — Architecture

Distribution

PyPI package: pip install symdex or uv tool install symdex

Install Methods

pip install symdex                     # lean core
pip install "symdex[local]"            # + local semantic search
uv tool install symdex                 # isolated CLI tool
uvx symdex --help                      # run without installing
npx skills add https://github.com/husnainpk/SymDex --skill symdex-code-search --yes --global

CLI Binary

symdex — Python CLI entrypoint (via pyproject.toml)

Directory Structure

symdex/
├── symdex/         # Python package
│   ├── cli/        # CLI commands
│   ├── indexer/    # Symbol + route extractor (tree-sitter per language)
│   ├── search/     # FTS + semantic search
│   ├── mcp/        # MCP server (21 tools)
│   └── registry/   # Multi-repo registry management
skills/
└── symdex-code-search/
    └── SKILL.md    # Agent skill file

State Files

  • ~/.symdex/registry.db + registry.json — global multi-repo registry (default)
  • ~/.symdex/<repo-id>/ — per-repo index database with symbols, embeddings, route index
  • ./.symdex/ — workspace-local state when SYMDEX_STATE_DIR=.symdex (optional)

Required Runtime

  • Python 3.x
  • Optional: sentence-transformers (local embeddings), VOYAGE_API_KEY / OpenAI-compat / Gemini API key (hosted embeddings)
  • Optional: SYMDEX_EMBED_RPM env var for request pacing
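Request pacing of the kind SYMDEX_EMBED_RPM controls can be sketched as a fixed-interval limiter. Only the environment variable name comes from the source; the Pacer class, the default of 60, and the fake clock used for the demo are assumptions for illustration.

```python
import os


def rpm_from_env(default: int = 60) -> int:
    """Read the requests-per-minute cap; the default value is an assumption."""
    return int(os.environ.get("SYMDEX_EMBED_RPM", default))


class Pacer:
    """Spaces calls at least 60/rpm seconds apart."""

    def __init__(self, rpm, clock, sleep):
        self.interval = 60.0 / rpm if rpm > 0 else 0.0
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self):
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


class FakeClock:
    """Deterministic clock so the demo does not actually sleep."""

    def __init__(self):
        self.now = 0.0
        self.slept = []

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.slept.append(seconds)
        self.now += seconds


clock = FakeClock()
pacer = Pacer(rpm_from_env(), clock.time, clock.sleep)
for _ in range(3):
    pacer.wait()  # an embedding request would be sent here
print(clock.slept)
```

In real use the clock and sleep arguments would be time.monotonic and time.sleep.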

Target AI Tools

Any MCP client: Claude Code, Cursor, GitHub Copilot, Codex, Gemini CLI, Windsurf, Roo, Kilo Code, VS Code. Also usable via CLI in shell-based agents.

Transport

stdio (default MCP) or HTTP (symdex serve --http, port 8080).

03

Components

SymDex — Components

MCP Tools (21)

| Tool | Purpose |
|------|---------|
| index_folder | Index current worktree structurally |
| index_repo | Register + index a repo explicitly |
| list_repos | Show all registered repos |
| search_symbols | Find function/class/method by name |
| semantic_search | Find code by intent/behavior (requires embeddings) |
| search_text | Literal text or regex matches |
| build_context_pack | Token-budgeted evidence bundle for a feature question |
| get_symbol | Read exact source for one symbol |
| get_file_outline | File structure before reading |
| get_repo_outline | Repo map / summary |
| get_file_tree | Directory tree |
| get_callers | Who calls a symbol |
| get_callees | What a symbol calls |
| search_routes | Find HTTP routes |
| get_index_status | Check repo freshness |
| get_repo_stats | Code metrics, language mix |
| roi | Return token savings estimate for last operation |
| roi_summary | Cumulative ROI for session |
| roi_agent_hint | Agent-facing savings hint for responses |
| invalidate_cache | Force re-index after changes |
| cleanup_stale | Remove old indexes |

CLI Subcommands

index, search, text, semantic, routes, pack, serve, repos, watch, stats

Plus shell aliases: index-folder, index-repo, list-repos (MCP-shaped names).

Skills (1)

  • symdex-code-search — SKILL.md that instructs agents to use SymDex before broad file browsing; includes tool selection table, core rules, and current version snapshot.

Hooks

None — SymDex has no lifecycle hooks.

Scripts

None — all operations via CLI or MCP.

Semantic Backends (4)

  • local — sentence-transformers (no API key)
  • voyage — Voyage AI embedding API
  • openai-compat — Any OpenAI-compatible /embeddings endpoint
  • gemini — Google Gemini Embedding API
05

Prompts

SymDex — Prompts

Prompt File 1: skills/symdex-code-search/SKILL.md

Technique: Behavioral instruction skill with a comprehensive tool selection table and explicit "search first" mandate.

Key excerpt:

---
name: symdex-code-search
description: |
  This skill should be used when finding, tracing, or understanding code in a repository
  with SymDex available. Trigger it for requests like "where is this defined?", "who
  calls this?", "what route handles this path?", "show me the file outline", "search
  this codebase by intent", or any task that would otherwise rely on broad Read/Grep/Glob
  exploration.
---

## Core Rules

- Search first.
- Pass `repo` whenever you know it.
- Use `build_context_pack` for broad "how does this feature work?" questions before issuing many separate narrow searches.
- Prefer `get_symbol` or `get_file_outline` over full-file reads.
- Use call graph and route tools before manual tracing.
- If a search tool returns `roi`, `roi_summary`, or `roi_agent_hint`, mention the approximate token savings briefly in your response.
- Inspect `quality` before reasoning from a result.

## Tool Selection

| Need | Tool |
|------|------|
| Index the current worktree | `index_folder` |
| Find a function by name | `search_symbols` |
| Find code by intent | `semantic_search` |
| Find literal text | `search_text` |
| Build a query-shaped evidence bundle | `build_context_pack` |
| Read exact source for one symbol | `get_symbol` |
| Get a file outline | `get_file_outline` |
| Trace who calls a symbol | `get_callers` |
| Find HTTP routes | `search_routes` |

Technique used: Tool selection table as decision tree; "search first" rule as iron law; ROI reporting mandate creates feedback loop.

Technique: Embedded token savings claim in every CLI output to reinforce agent behavior.

symdex search "validate_email" --repo myproject
# Output: <results>
# ─── SymDex saved ~1,847 tokens vs reading files directly ───

Technique used: Positive reinforcement in tool output — agents see savings immediately after each use, encouraging continued use.

09

Uniqueness

SymDex — Uniqueness

Differs From Seeds

SymDex occupies a unique niche in this batch: it is a code structure index, not a conversation memory system. No seed is a direct match. The nearest analogue is the code intelligence layer inside CogniLayer (code_index, code_context, code_impact), but SymDex is standalone with 21 tools dedicated to this function, while CogniLayer's code intelligence is one component of a larger memory system. Against basic-memory (knowledge graph) and ccmemory (decision graph): those store AI conversation history and semantic facts; SymDex stores the current codebase structure (symbols, routes, call graphs), which does not depend on AI conversations at all. Against taskmaster-ai: taskmaster decomposes tasks from conversation; SymDex maps code structure from AST parsing.

Unique Features in This Batch

  1. ROI metadata in every response — per-call token savings (roi, roi_summary, roi_agent_hint) so agents can quantify and communicate savings. No other framework in this batch does this.
  2. 21 language surfaces — broadest language coverage in this batch (Python, Go, Kotlin, Dart, Swift, HTML, CSS, Shell, Vue/Svelte, Markdown, etc.).
  3. HTTP route extraction — extracts API routes from 9 framework patterns (Flask, Express, Spring, Laravel, Gin, ASP.NET, Rails, Phoenix, Actix). Unique to SymDex in this batch.
  4. Quality metadata — every result includes confidence, freshness, parser mode, generated-file hints so agents can judge reliability before reasoning.
  5. Multi-repo registry — one SymDex install indexes many repos with a global registry, enabling cross-repo symbol search.
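As an illustration of the route extraction in item 3, the sketch below pulls Flask-style routes out of source text without importing Flask. It swaps in Python's ast module for tree-sitter and covers only the @x.route(...) decorator form; extract_routes and the sample app are hypothetical, and the other eight framework patterns are not modeled.

```python
import ast

FLASK_APP = '''
from flask import Flask
app = Flask(__name__)

@app.route("/users/<int:user_id>", methods=["GET", "DELETE"])
def user_detail(user_id):
    return {}

@app.route("/health")
def health():
    return "ok"
'''


def extract_routes(source: str) -> list[dict]:
    """Find functions decorated with <something>.route("<path>", ...)."""
    routes = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for dec in node.decorator_list:
            is_route = (
                isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Attribute)
                and dec.func.attr == "route"
            )
            if not is_route or not dec.args:
                continue
            if not isinstance(dec.args[0], ast.Constant):
                continue  # skip paths that are not string literals
            methods = ["GET"]  # Flask answers GET when methods= is omitted
            for kw in dec.keywords:
                if kw.arg == "methods":
                    methods = list(ast.literal_eval(kw.value))
            routes.append(
                {
                    "path": dec.args[0].value,
                    "methods": methods,
                    "handler": node.name,
                    "line": node.lineno,
                }
            )
    return sorted(routes, key=lambda r: r["line"])


routes = extract_routes(FLASK_APP)
for route in routes:
    print(route["methods"], route["path"], "->", route["handler"])
```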

Observable Failure Modes

  1. No hooks: code changes outside a watched session don't auto-update the index; the agent must call invalidate_cache or index_folder manually.
  2. Semantic search is opt-in: without an embedding backend installed (the base install ships none), semantic search calls degrade gracefully and return only FTS results.
  3. Embedding rate limiting: SYMDEX_EMBED_RPM request pacing can slow indexing for large repos with hosted embedding providers.
  4. Index staleness: get_index_status must be called to detect stale indexes; agents that skip this step may reason from outdated symbols.
  5. No conversation awareness: SymDex knows nothing about what the agent has discussed; if an agent needs "what did we decide about this function", SymDex cannot help.
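The staleness failure mode comes down to comparing what was indexed against what is on disk now. A minimal sketch of that comparison, assuming content hashes are stored per file (the source does not say how SymDex decides freshness; fingerprint and stale_files are hypothetical):

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content hash recorded at index time."""
    return hashlib.sha256(data).hexdigest()


def stale_files(indexed: dict[str, str], current: dict[str, bytes]) -> list[str]:
    """Paths whose content changed, appeared, or disappeared since indexing."""
    changed_or_new = [
        path for path, data in current.items() if indexed.get(path) != fingerprint(data)
    ]
    deleted = [path for path in indexed if path not in current]
    return sorted(changed_or_new + deleted)


# State captured when the repo was indexed.
indexed = {
    "app/models.py": fingerprint(b"class User: ...\n"),
    "app/old.py": fingerprint(b"# legacy\n"),
}

# State of the working tree now: one edit, one new file, one deletion.
current = {
    "app/models.py": b"class User:\n    name: str\n",
    "app/routes.py": b"def health(): ...\n",
}

print(stale_files(indexed, current))
```

An agent that runs a check like this before searching avoids reasoning from symbols that no longer exist.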
04

Workflow

SymDex — Workflow

Phases

| Phase | What Happens | Artifact |
|-------|--------------|----------|
| Install | pip install symdex | CLI available |
| Index | symdex index ./myproject --repo myproject | ~/.symdex/<repo-id>/ populated |
| Configure MCP | Add to agent's MCP config | Agent sees 21 tools |
| Install Skill | npx skills add ... --skill symdex-code-search --global | Agent instructed to use SymDex first |
| Active Use | Agent calls get_symbol, search_symbols, build_context_pack | Exact code retrieved |
| Watch (optional) | symdex watch | Auto-refresh on file changes |
| Lazy Re-index | symdex index --lazy | Structural re-index, embeddings in background |

Agent Skill Activation

The symdex-code-search skill activates when the agent is asked to find, trace, or understand code:

  1. Check index freshness: get_index_status(repo)
  2. If stale: index_folder(path=".")
  3. Search before reading: search_symbols("function_name")
  4. Use build_context_pack for broad questions
  5. Fall back to native Read only when SymDex unavailable
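The activation steps above can be sketched as a small control flow. The tool names come from the list; everything else is assumed for illustration, including the FakeSymDexClient stand-in, the argument names, and the "stale" key in the status response.

```python
class FakeSymDexClient:
    """Stand-in for an MCP client; records which tools were called."""

    def __init__(self, stale: bool, available: bool = True):
        self.stale = stale
        self.available = available
        self.calls = []

    def call(self, tool: str, **args):
        if not self.available:
            raise ConnectionError("SymDex MCP server is not reachable")
        self.calls.append(tool)
        if tool == "get_index_status":
            return {"stale": self.stale}  # hypothetical response shape
        if tool == "index_folder":
            self.stale = False
            return {"indexed": True}
        if tool == "search_symbols":
            return {"symbols": [{"name": args["query"], "file": "app/validators.py"}]}
        return {}


def find_symbol(client, repo: str, name: str) -> dict:
    """Freshness check, optional re-index, then search; Read is the last resort."""
    try:
        status = client.call("get_index_status", repo=repo)
        if status["stale"]:
            client.call("index_folder", path=".")
        return client.call("search_symbols", query=name, repo=repo)
    except ConnectionError:
        # Step 5: SymDex is unavailable, so the agent uses its native tools.
        return {"fallback": "native Read/Grep"}


client = FakeSymDexClient(stale=True)
result = find_symbol(client, "myproject", "validate_email")
print(client.calls)
print(result)
```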

ROI Reporting

After every search, SymDex prints (CLI) or returns (MCP) token savings:

symdex search "validate_email" → "Saved ~1,847 tokens vs reading 3 files"

MCP: roi_agent_hint field included in response → agent includes "~1,847 tokens saved" in its reply.
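One plausible way to arrive at such a figure is to compare the tokens an agent would have spent reading whole files with the tokens actually returned. The sketch below assumes a rough four-characters-per-token heuristic and a hypothetical layout for the three fields; only the field names roi, roi_summary, and roi_agent_hint come from the source.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def roi_fields(returned_text: str, file_texts: list[str], session_total: int = 0) -> dict:
    """Build illustrative ROI metadata for one retrieval call."""
    baseline = sum(estimate_tokens(text) for text in file_texts)
    spent = estimate_tokens(returned_text)
    saved = max(0, baseline - spent)
    return {
        "roi": {
            "baseline_tokens": baseline,
            "returned_tokens": spent,
            "saved_tokens": saved,
        },
        "roi_summary": {"session_saved_tokens": session_total + saved},
        "roi_agent_hint": f"Saved ~{saved:,} tokens vs reading {len(file_texts)} files",
    }


# Three files the agent would otherwise have opened, and the snippet it got instead.
whole_files = ["x" * 4000, "y" * 2000, "z" * 2000]
symbol_source = "r" * 600

fields = roi_fields(symbol_source, whole_files)
print(fields["roi_agent_hint"])
```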

Approval Gates

None — pure query tool, no code modification.

Spec Format

None — SymDex is an index, not a spec system.

06

Memory & Context

SymDex — Memory & Context

Memory Model

sqlite — each indexed repo gets a local SQLite database with symbols, byte offsets, embeddings, routes, call graphs, and retrieval quality metadata.

Persistence Scope

project — each repo gets its own database keyed by a stable repo ID (from git branch + worktree path hash). Cross-repo queries require calling tools with different repo IDs.
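A stable ID of this kind can be derived deterministically from its inputs. The exact format SymDex uses is not given in the source, so the repo_id function and its name-branch-hash layout below are purely hypothetical.

```python
import hashlib


def repo_id(repo_name: str, branch: str, worktree_path: str) -> str:
    """Same inputs always give the same ID; a different worktree gives a new one."""
    digest = hashlib.sha256(worktree_path.encode("utf-8")).hexdigest()[:8]
    safe_branch = branch.replace("/", "-")  # keep the ID usable as a directory name
    return f"{repo_name}-{safe_branch}-{digest}"


main_id = repo_id("myproject", "main", "/home/dev/myproject")
feature_id = repo_id("myproject", "feature/login", "/home/dev/myproject-login")
print(main_id)
print(feature_id)
```

Hashing the worktree path keeps two checkouts of the same branch from sharing one index directory.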

What SymDex Stores

This is NOT conversation memory — SymDex stores code structure:

| Stored | Description |
|--------|-------------|
| Symbols | Functions, classes, methods with byte offsets |
| File outlines | Structural summary of each file |
| Routes | Extracted HTTP routes (9 frameworks) |
| Caller graph | Which symbols call a given symbol |
| Callee graph | Which symbols a given symbol calls |
| Embeddings | Optional vector embeddings for semantic search |
| Quality metadata | Confidence, freshness, parser mode, generated-file hints |
| ROI metadata | Token savings estimates per operation |
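The caller and callee graphs are two views of the same edges. A minimal sketch, assuming a single Python file and direct calls by name only (it uses the ast module rather than tree-sitter, and build_call_graph is a hypothetical helper):

```python
import ast
from collections import defaultdict

SOURCE = '''
def normalize(s):
    return s.strip().lower()

def validate_email(s):
    return "@" in normalize(s)

def register(s):
    if validate_email(s):
        return normalize(s)
'''


def build_call_graph(source: str):
    """Return (callers, callees) maps for top-level functions in one module."""
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}
    callers = defaultdict(set)  # symbol -> functions that call it
    callees = defaultdict(set)  # symbol -> functions it calls
    for func in functions:
        for node in ast.walk(func):
            # Only direct calls to known names; method calls are ignored.
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in defined
            ):
                callees[func.name].add(node.func.id)
                callers[node.func.id].add(func.name)
    return callers, callees


callers, callees = build_call_graph(SOURCE)
print("callers of normalize:", sorted(callers["normalize"]))
print("callees of register:", sorted(callees["register"]))
```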

Context Compaction Handling

no — SymDex is a query tool, not a session memory layer. It doesn't track Claude Code sessions.

Cross-Session Persistence

yes — the SQLite index persists across sessions. A repo indexed on Monday is still available on Friday. But there is no session narrative or conversation capture.

Token Reduction Claims

SymDex is the only framework in this batch that provides per-operation token savings data rather than session-level claims:

  • roi field in every MCP response: estimated token savings for that call
  • roi_summary field: cumulative session savings
  • roi_agent_hint: natural language savings for agent to include in response
  • CLI footer: one-line savings after each search command

Concrete examples from README/SKILL.md:

  • get_symbol vs reading whole file: saves ~1,847 tokens (example)
  • build_context_pack (6K budget): assembles targeted evidence instead of reading 10+ files
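A token-budgeted pack can be approximated with a greedy selection: rank candidate snippets by relevance and keep each one that still fits. The sketch below is an assumption about the general technique, not SymDex's algorithm; Snippet, build_pack, the scores, and the four-characters-per-token heuristic are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    symbol: str
    text: str
    score: float  # relevance to the question; higher is better


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def build_pack(snippets: list[Snippet], budget: int):
    """Greedily keep the highest-scoring snippets that still fit the budget."""
    pack, used = [], 0
    for snippet in sorted(snippets, key=lambda s: s.score, reverse=True):
        cost = estimate_tokens(snippet.text)
        if used + cost <= budget:
            pack.append(snippet)
            used += cost
    return pack, used


candidates = [
    Snippet("login_handler", "a" * 400, 0.9),   # about 100 tokens
    Snippet("session_store", "b" * 800, 0.8),   # about 200 tokens
    Snippet("hash_password", "c" * 200, 0.5),   # about 50 tokens
]

pack, used = build_pack(candidates, budget=160)
print([s.symbol for s in pack], used)
```

A lower-scoring snippet can still make the cut when a larger, higher-scoring one does not fit, which is the usual trade-off of a greedy fill.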

Search Mechanism

hybrid — full-text search (base) + vector semantic search (optional, requires embedding backend). Quality metadata distinguishes parser-backed results (high confidence) from fallback text results (lower confidence).
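The hybrid behaviour can be sketched as a keyword score that is blended with vector similarity only when embeddings exist. This is an illustrative model under stated assumptions: the scoring functions, the alpha weight, and the "fts"/"hybrid" mode labels are hypothetical, and a simple term-overlap score stands in for a real full-text index.

```python
import math


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query, docs, query_vec=None, vectors=None, alpha=0.5):
    """Text-only ranking by default; blend in vectors when they are available."""
    results = []
    for name, text in docs.items():
        score = keyword_score(query, text)
        mode = "fts"
        if query_vec is not None and vectors and name in vectors:
            score = alpha * score + (1 - alpha) * cosine(query_vec, vectors[name])
            mode = "hybrid"
        results.append((name, round(score, 3), mode))
    return sorted(results, key=lambda r: r[1], reverse=True)


docs = {
    "validate_email": "check email address format",
    "send_mail": "deliver message to smtp server",
}
toy_vectors = {"validate_email": [1.0, 0.0], "send_mail": [0.0, 1.0]}

text_only = hybrid_search("email format", docs)
blended = hybrid_search("email format", docs, query_vec=[1.0, 0.0], vectors=toy_vectors)
print(text_only)
print(blended)
```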

State Files

  • ~/.symdex/registry.db + registry.json — global repo registry
  • ~/.symdex/<repo-id>/symbols.db — per-repo symbol index
  • ./.symdex/ — workspace-local state (opt-in)

Key Difference from Other Frameworks

SymDex indexes the current state of the codebase (code structure), not past AI conversations (history). It replaces "read the whole file to find a function" with "ask SymDex for that function." This is orthogonal to conversation memory systems.

07

Orchestration

SymDex — Orchestration

Multi-Agent

No — SymDex is a shared indexing service that multiple agents could query, but it ships no orchestration protocol.

Orchestration Pattern

none — single-agent query tool.

Isolation Mechanism

none — indexes are project-scoped by repo ID; no process isolation.

Execution Mode

interactive-loop — agent calls MCP tools on demand; symdex watch adds a structural background watcher (no embedding refresh unless --embed flag).

Multi-Model

No — model-agnostic MCP server.

Context Compaction Handling

no

Auto Validators

None — pure retrieval, no code modification.

Prompt Chaining

No — each MCP tool call is independent.

08

UI & CLI Surface

SymDex — UI & CLI Surface

CLI Binary

  • Name: symdex
  • Type: Own runtime (Python), not a thin wrapper
  • Subcommands: index, search, text, semantic, routes, pack, serve, repos, watch, stats
  • Also: shell aliases index-folder, index-repo, list-repos

Local UI

None — CLI only. No web dashboard or TUI.

HTTP Mode

symdex serve --http     # Streamable HTTP MCP on port 8080

Config for HTTP mode:

{
  "mcpServers": {
    "symdex": {
      "url": "http://localhost:8080/mcp"
    }
  }
}

Observability

  • ROI footer: every CLI search prints token savings estimate
  • symdex stats — repo metrics (file count, language mix, symbol count)
  • get_index_status MCP tool — freshness, last indexed timestamp
  • Upgrade notices: when a newer PyPI release exists, normal CLI commands print exact upgrade commands

IDE Integration

Any MCP client via stdio or HTTP. No dedicated VS Code extension. The symdex-code-search SKILL.md is installable in Claude Code, Cursor, etc. via npx skills add.

Transport

stdio (default) or Streamable HTTP (via symdex serve --http).
