claude-context-local · FarhanAliRaza/claude-context-local · ★ 231 · last commit 2025-11-13

Primitive shape 8 total
MCP tools 8
00

Summary

claude-context-local — Summary

A local semantic code search MCP server that indexes codebases with Google's EmbeddingGemma model (1.2–1.3 GB, runs 100% locally) and serves search results to Claude Code via 8 MCP tools. The core promise: semantic search without API costs or cloud data transmission — "your code never leaves your machine."

The system uses AST-based chunking for Python (rich metadata) and tree-sitter for the other supported languages (JS/TS/JSX/TSX/Svelte/Go/Java/Rust/C/C++/C#). Embeddings are stored in a FAISS index (CPU by default, with optional GPU FAISS on NVIDIA CUDA; on Apple Silicon, MPS accelerates embedding generation while FAISS stays on CPU). A Merkle DAG tracks changes so that re-indexing only processes modified files.

Built by FarhanAliRaza (GitHub) and described as a "Beta Release", the project has 231 stars and installs via a one-liner curl command. In the intended workflow, the user tells Claude Code to "index this codebase" and can then ask questions like "find authentication code" to get semantically relevant results without burning tokens on full file context.

Compared to seeds: closest to ccmemory (Archetype 3 — MCP-anchored toolserver for memory/context) but ccmemory stores knowledge graph memories while claude-context-local stores code embeddings. Both use a local MCP server for persistence, but the underlying store is completely different (FAISS vector index vs. Neo4j graph).

01

Overview

claude-context-local — Overview

Origin

Created by FarhanAliRaza (GitHub), published late 2025. Beta release. The repo description: "Code search MCP for Claude Code. Make entire codebase the context for any coding agent. Embeddings are created and stored locally. No API cost."

Philosophy

"Claude's code context is powerful, but sending your code to the cloud costs tokens and raises privacy concerns. This project keeps semantic code search entirely on your machine."

Three core claims:

  1. Privacy: Code never leaves the machine
  2. Cost: No API keys, no token costs for embedding
  3. Speed: Fewer tokens needed because Claude gets precise semantic matches instead of loading entire files

Supported Languages (15 file extensions)

Python, JavaScript, TypeScript, JSX, TSX, Svelte, Go, Java, Rust, C, C++, C#

Chunking Strategy

Two-tier:

  • Python: AST-based chunking with rich metadata (function names, class hierarchies, docstrings)
  • Other languages (JS/TS/JSX/TSX, Svelte, Go, Java, Rust, C, C++, C#): tree-sitter for syntax-aware chunking
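
To make the Python tier concrete, here is a minimal sketch of AST-based chunking using only the standard library `ast` module. It is not the project's python_ast_chunker.py; the function name `chunk_python_source` and the metadata keys are assumptions chosen for illustration, and it only handles top-level functions and classes.

```python
import ast


def chunk_python_source(source: str, path: str = "<memory>") -> list:
    """Return one metadata dict per top-level function or class (hypothetical schema)."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue  # skip imports, assignments and other module-level statements
        is_class = isinstance(node, ast.ClassDef)
        chunks.append({
            "path": path,
            "name": node.name,
            "kind": "class" if is_class else "function",
            "start_line": node.lineno,
            "end_line": node.end_lineno,
            "docstring": ast.get_docstring(node),
            # Base classes give a rough view of the class hierarchy.
            "bases": [ast.unparse(b) for b in node.bases] if is_class else [],
        })
    return chunks


SAMPLE = '''\
class TokenAuth(BaseAuth):
    """Validate bearer tokens."""

    def check(self, token):
        return bool(token)


def login(user, password):
    """Authenticate a user."""
    return user == "admin"
'''

chunks = chunk_python_source(SAMPLE, "auth.py")
for chunk in chunks:
    print(chunk["kind"], chunk["name"], chunk["start_line"], chunk["end_line"])
```

Each chunk carries a name, line range, and docstring, which is the kind of metadata that lets search results point back to an exact location in a file.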

Embedding Model

Google's EmbeddingGemma (~1.2–1.3 GB download). Runs locally via SentenceTransformer. Device selection: CUDA → MPS (Apple Silicon) → CPU.
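
The CUDA → MPS → CPU preference order can be sketched as follows. This is an illustration under stated assumptions, not the project's embedder code: `pick_device` and `detect_device` are hypothetical names, and `detect_device` falls back to CPU when PyTorch is not importable.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Apply the documented preference order: CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for available backends (assumes torch is the runtime)."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no torch installed, so nothing to accelerate
    mps = getattr(torch.backends, "mps", None)  # absent on older torch builds
    return pick_device(torch.cuda.is_available(), bool(mps and mps.is_available()))


print(pick_device(cuda_available=False, mps_available=True))
```

Keeping the decision in a pure function makes the fallback order easy to test without any GPU present.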

Key Design Decision

Choosing EmbeddingGemma over a simpler embedding model means the first run requires a ~1.2–1.3 GB model download; subsequent runs use the cached model. For a "local-first" tool, this is a significant installation friction point.

02

Architecture

claude-context-local — Architecture

Distribution

mcp-server — Python package installed via curl one-liner.

Install

curl -fsSL https://raw.githubusercontent.com/FarhanAliRaza/claude-context-local/main/scripts/install.sh | bash

# Or wget:
wget -qO- .../install.sh | bash

Register MCP Server

claude mcp add code-search --scope user -- \
  uv run --directory ~/.local/share/claude-context-local python mcp_server/server.py

What the Installer Does

  1. Installs uv (Python package manager) if missing
  2. Clones/updates repo to ~/.local/share/claude-context-local
  3. Installs Python deps with uv sync
  4. Downloads EmbeddingGemma model (~1.2–1.3 GB) if not cached
  5. Optionally installs faiss-gpu if NVIDIA GPU detected

Source Directory Tree

claude-context-local/
├── chunking/
│   ├── multi_language_chunker.py   # Orchestrator (Python AST + tree-sitter)
│   ├── python_ast_chunker.py       # Python-specific (rich metadata)
│   └── tree_sitter.py              # JS/TS/Go/Java/Rust/C/C++/C#
├── embeddings/
│   └── embedder.py                 # EmbeddingGemma; device=auto; offline cache
├── search/
│   ├── indexer.py                  # FAISS index (CPU default; GPU when available)
│   ├── searcher.py                 # Ranking + filters
│   └── incremental_indexer.py      # Merkle-driven incremental indexing
├── merkle/
│   ├── merkle_dag.py               # Content-hash DAG of workspace
│   ├── change_detector.py          # Diffs snapshots to find changed files
│   └── snapshot_manager.py         # Snapshot persistence + stats
├── mcp_server/
│   ├── server.py                   # MCP entry point (stdio/HTTP)
│   ├── code_search_server.py       # Server logic
│   ├── code_search_mcp.py          # FastMCP tool registration
│   └── strings.yaml                # Tool descriptions (externalised)
└── scripts/
    ├── install.sh                  # One-liner installer
    ├── download_model_standalone.py
    └── index_codebase.py

Data Flow

User codebase (15 extensions)
  → Chunking (AST/tree-sitter)
  → EmbeddingGemma (local) → float vectors
  → FAISS index (~/.claude_code_search/)
  
Query ("find authentication code")
  → EmbeddingGemma → query vector
  → FAISS similarity search
  → Ranked code chunks
  → MCP response to Claude Code

Storage

Index stored at ~/.claude_code_search/ — preserved across updates.

Transport

stdio (default) or HTTP/SSE (via --transport flag).

03

Components

claude-context-local — Components

MCP Tools (8, from strings.yaml)

Tool                 Description
search_code          Search code by natural language query using semantic similarity. Returns ranked results with file paths, line numbers, similarity scores, and code snippets.
index_directory      Index a codebase using semantic embeddings.
find_similar_code    Find code chunks functionally similar to a reference chunk (by chunk_id).
get_index_status     Get current index statistics and model status.
list_projects        List all indexed projects with their metadata.
switch_project       Switch to a different indexed project for searching.
index_test_project   Index a test project (development utility).
clear_index          Clear the search index and metadata for the current project.

Key Tool Signatures

# search_code
search_code("authentication")  # minimal
search_code("authentication", k=10, file_pattern="*.py")  # full

# index_directory
index_directory("/path/to/project")  # minimal
index_directory("/path/to/project", project_name="project", file_pattern="*.py", incremental=False)  # full

# find_similar_code
find_similar_code(chunk_id, k=5)  # chunk_id from search_code results
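
The signatures above are what Claude sees in the tool descriptions; on the wire, an MCP client invokes a tool with a JSON-RPC `tools/call` request. The sketch below builds such a request with the standard library. The helper `tool_call_request` is hypothetical, and the argument name `query` is an assumption, since the source shows the search string only as a positional argument.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def tool_call_request(name: str, **arguments) -> str:
    """Serialize an MCP tools/call request for the named tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# "query" is an assumed parameter name; k and file_pattern come from the tool description.
request = tool_call_request("search_code", query="authentication", k=10, file_pattern="*.py")
print(request)
```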

Skills (0)

None.

Hooks (0)

None.

Commands (0)

No slash commands.

Scripts (3)

Name                                   Purpose
scripts/install.sh                     One-liner installer
scripts/download_model_standalone.py   Pre-fetch EmbeddingGemma model
scripts/index_codebase.py              Standalone indexing utility

MCP Resources (1)

The search://stats resource provides detailed index statistics.

MCP Prompts (1)

The search_help prompt provides usage guidance for code search tools.

05

Prompts

claude-context-local — Prompts

Tool Descriptions (from strings.yaml — verbatim)

The tool descriptions are loaded from an external YAML file (mcp_server/strings.yaml), making them easy to maintain separately from code.

search_code (verbatim):

search_code: |
  Search code by natural language query using semantic similarity. Returns ranked
  results with file paths, line numbers, similarity scores, and code snippets.
  Use for understanding functionality, finding patterns, or discovering related code.
  Minimal usage: search_code("authentication")
  Full usage: search_code("authentication", k=10, file_pattern="*.py")

index_directory (verbatim):

index_directory: |
  Index a codebase using semantic embeddings.
  Minimal usage: index_directory("/path/to/project")
  Full usage: index_directory("/path/to/project", project_name="project",
    file_pattern="*.py", incremental=False)

find_similar_code (verbatim):

find_similar_code: |
  Find code chunks functionally similar to a reference chunk. Useful for finding
  alternative implementations, code duplication, or refactoring related code.
  Usage: find_similar_code(chunk_id from search_code, k=5)

MCP Prompt (search_help)

The server exposes a search_help prompt:

help: |
  # Find similar code

  1. Index a codebase: `index_directory("/path/to/project")`
  2. Search: `search_code("description of code you want")`
  3. Find similar: `find_similar_code(chunk_id from search_code)`

  See tool descriptions for details and available options.

Prompting Technique Classification

  • Externalized descriptions: Tool descriptions in strings.yaml (not hardcoded) — clean separation
  • Minimal + full usage examples: Each description shows both minimal and full form
  • No skill layer: No CLAUDE.md with agent workflow instructions
  • Conversational activation: User says "index this codebase" in natural language; Claude infers to call index_directory
09

Uniqueness

claude-context-local — Uniqueness

differs_from_seeds

Closest seed is ccmemory (Archetype 3 — MCP-anchored toolserver). Both use a local MCP server for persistence, but ccmemory stores knowledge graph memories (Neo4j + vector) about conversation and code decisions, while claude-context-local stores code embeddings for semantic search. The key difference: ccmemory's memory is about what the agent decided; claude-context-local's memory is about what the code contains. Also closest to taskmaster-ai in the sense of being a local-first MCP server with file-based persistence, but taskmaster-ai manages tasks, not code embeddings.

Positioning

This is the only framework in this batch (and likely the corpus) that uses a local neural embedding model (EmbeddingGemma) for code search. All other memory/context frameworks use file-based (keyword) storage, SQLite, or cloud-based vector services, so it is also the only one that runs embedding-based semantic search entirely locally.

Distinctive Opinion

"Claude's code context is powerful, but sending your code to the cloud costs tokens and raises privacy concerns."

The framework makes a strong privacy argument for local embeddings over cloud APIs. This resonates particularly for enterprise or security-conscious users.

Observable Failure Modes

  1. 1.2 GB model download: First-run friction that blocks "quick start" users
  2. FAISS index staleness: If a full rebuild (incremental=False) is needed but the user only ever calls with incremental=True, results can be stale
  3. No real-time updates: Index must be explicitly re-run; auto-indexing on file save not implemented
  4. GPU detection ambiguity: faiss-gpu install is attempted interactively — breaks in CI/headless environments
  5. Beta quality: Self-described as "Core functionality working / benchmarks coming soon / please report issues"
  6. No chunk deduplication: Duplicate code (copy-pasted functions) creates redundant search results

What Makes It Novel vs. the 11 Seeds

  • Only framework using a local neural embedding model (EmbeddingGemma) for code search
  • Only MCP server with AST-based Python chunking and tree-sitter for multi-language support
  • Only framework with Merkle DAG-based incremental indexing
  • Only framework explicitly positioning privacy (no cloud) as a primary value proposition
  • Only framework supporting GPU acceleration (CUDA + MPS) for embedding generation
04

Workflow

claude-context-local — Workflow

Initial Setup

# 1. Install
curl -fsSL .../install.sh | bash
# (downloads EmbeddingGemma model ~1.2GB)

# 2. Register MCP server
claude mcp add code-search --scope user -- \
  uv run --directory ~/.local/share/claude-context-local python mcp_server/server.py

# 3. Open Claude Code — server starts automatically

First-Use Indexing (via Claude Code chat)

User: "index this codebase"
Claude: → calls index_directory("/current/project/path")
         → chunks files (AST/tree-sitter)
         → generates embeddings (EmbeddingGemma)
         → builds FAISS index
         → stores in ~/.claude_code_search/

Ongoing Usage

User: "find authentication code"
Claude: → calls search_code("authentication", k=10)
         → query embedded, similarity search
         → returns ranked chunks with file paths + line numbers
         → Claude loads only the relevant chunks (not full files)

User: "find code similar to this function [chunk_id]"
Claude: → calls find_similar_code(chunk_id, k=5)
         → returns functionally similar code across codebase

Incremental Re-indexing

# After code changes:
Claude: → calls index_directory("/path", incremental=True)
         → Merkle DAG computes diff
         → only changed files re-embedded
         → FAISS index updated incrementally

Phase/Artifact Table

Phase                Artifact
install              ~/.local/share/claude-context-local/, EmbeddingGemma model
index                ~/.claude_code_search/<project-hash>/ FAISS index
search               ranked code chunks (JSON in MCP response)
incremental-update   updated FAISS index (delta only)

Approval Gates

None. Fully automated once installed.

06

Memory Context

claude-context-local — Memory & Context

Memory Architecture

This is the only framework in this batch that implements a true semantic memory store for code. The memory system has four layers:

Layer 1: Chunking

Source files are split into semantic chunks via AST (Python) or tree-sitter (other languages). Each chunk carries metadata: file path, line range, language, semantic tags.

Layer 2: Embedding

Each chunk is embedded using EmbeddingGemma. Vectors are dense float arrays (embedding dimension determined by the model).

Layer 3: FAISS Index

Vectors are stored in a FAISS index for efficient similarity search. Default: CPU FAISS. Optional: GPU FAISS on NVIDIA CUDA. On Apple Silicon, MPS accelerates embedding generation only; FAISS still runs on CPU.

Layer 4: Merkle DAG

Content-hash DAG of the workspace enables incremental indexing — only changed files need re-embedding.
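
As a rough illustration of why a content-hash tree helps, the sketch below hashes a workspace modelled as nested dictionaries. It is a simplified stand-in, not the project's merkle_dag.py: the function `merkle_hash`, the in-memory tree layout, and the use of SHA-256 are all assumptions.

```python
import hashlib


def merkle_hash(node) -> str:
    """Hash a file (bytes) or a directory (dict of name -> node) bottom-up."""
    if isinstance(node, bytes):
        return hashlib.sha256(b"file:" + node).hexdigest()
    digest = hashlib.sha256(b"dir:")
    for name in sorted(node):  # sorted so the hash does not depend on insertion order
        child = merkle_hash(node[name])
        digest.update(name.encode() + b"\0" + child.encode() + b"\n")
    return digest.hexdigest()


before = {"src": {"a.py": b"x = 1\n", "b.py": b"y = 2\n"}, "docs": {"readme.md": b"hi\n"}}
after = {"src": {"a.py": b"x = 1\n", "b.py": b"y = 3\n"}, "docs": {"readme.md": b"hi\n"}}

print(merkle_hash(before) == merkle_hash(after))                  # root differs after an edit
print(merkle_hash(before["docs"]) == merkle_hash(after["docs"]))  # untouched subtree matches
```

Because a directory's hash depends only on its children, an unchanged subtree can be skipped without reading any of its files' embeddings again.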

State Files

  • ~/.claude_code_search/<project-hash>/ — FAISS index + metadata
  • ~/.local/share/claude-context-local/ — model + code (preserved across updates)

Cross-Session Persistence

Yes. The FAISS index persists on disk. Claude Code can search previously-indexed code without re-indexing on each session.

Context Window Optimization

The key context efficiency claim: instead of loading entire relevant files, Claude receives precisely the semantic matches (chunks with file path + line numbers). This significantly reduces context usage for large codebases.

Example: "find authentication logic" → 5 chunks × ~50 lines each = ~250 lines loaded vs. potentially thousands of lines in naive file-loading.

Memory Compaction

None needed — the FAISS index is the compact form. Full source code is not stored; only embeddings and metadata.

Search Mechanism

Vector similarity search (L2 or cosine distance via FAISS), ranked by similarity score. Optional file_pattern filter for restricting search scope.
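
The ranking and filtering behaviour can be illustrated with a brute-force stand-in for FAISS. This sketch uses cosine similarity in pure Python and `fnmatch` for the `file_pattern` filter; the `search` function, the chunk dictionaries, and the three-dimensional toy vectors are assumptions for illustration, and a real index would use FAISS with model-sized vectors.

```python
import math
from fnmatch import fnmatch


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, chunks, k=5, file_pattern=None):
    """Return the top-k (score, path) pairs, optionally restricted by a glob pattern."""
    candidates = [
        c for c in chunks
        if file_pattern is None or fnmatch(c["path"], file_pattern)
    ]
    scored = [(cosine(query_vec, c["vector"]), c["path"]) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return scored[:k]


CHUNKS = [
    {"path": "auth/login.py", "vector": [0.9, 0.1, 0.0]},
    {"path": "web/login.ts", "vector": [0.8, 0.2, 0.1]},
    {"path": "utils/math.py", "vector": [0.0, 0.1, 0.9]},
]

results = search([1.0, 0.0, 0.0], CHUNKS, k=2, file_pattern="*.py")
for score, path in results:
    print(f"{score:.3f} {path}")
```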

07

Orchestration

claude-context-local — Orchestration

Multi-Agent Pattern

None. Single MCP server serving a single Claude Code session.

Execution Mode

event-driven — server runs as a background daemon, responding to MCP tool calls.

Transport Options

  • --transport stdio (default): standard MCP stdio transport
  • --transport sse: HTTP SSE transport (legacy)
  • --transport http: HTTP transport (port 8000 default)

Multi-Model

Not applicable. The server is model-agnostic but tightly coupled to Claude Code for the practical workflow.

Isolation

None. The MCP server reads from the local filesystem with no sandboxing.

Incremental Indexing Architecture

The Merkle DAG is the most architecturally interesting component:

1. snapshot_manager takes a content-hash snapshot of the workspace
2. change_detector diffs current state against previous snapshot
3. Only changed/new files are re-chunked and re-embedded
4. FAISS index is updated with new/modified vectors
5. Deleted files' vectors are removed from index

This is analogous to git's content-addressing approach, applied to an embedding index.
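
The snapshot-and-diff steps above can be sketched with flat per-file hashes. This is a simplified illustration, not the project's snapshot_manager or change_detector: the function names, the in-memory file mapping, and the choice of SHA-256 are assumptions.

```python
import hashlib


def snapshot(files: dict) -> dict:
    """Map each path to a hash of its content (step 1)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def diff_snapshots(old: dict, new: dict) -> dict:
    """Classify paths as added, removed, or modified (step 2)."""
    return {
        "added": sorted(p for p in new if p not in old),
        "removed": sorted(p for p in old if p not in new),
        "modified": sorted(p for p in new if p in old and old[p] != new[p]),
    }


previous = snapshot({"a.py": b"x = 1\n", "b.py": b"y = 2\n", "c.py": b"z = 3\n"})
current = snapshot({"a.py": b"x = 1\n", "b.py": b"y = 20\n", "d.py": b"w = 4\n"})

changes = diff_snapshots(previous, current)
to_embed = changes["added"] + changes["modified"]     # steps 3 and 4
to_delete = changes["removed"] + changes["modified"]  # step 5, plus stale vectors of edited files
print(changes)
```

Modified files appear in both lists because their old vectors must be dropped before the new ones are added.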

GPU Acceleration

  • NVIDIA CUDA: both FAISS search and embedding generation accelerated
  • Apple Silicon MPS: embedding generation accelerated; FAISS uses CPU
  • CPU fallback: always works, slower for large codebases
08

UI CLI Surface

claude-context-local — UI & CLI Surface

Dedicated CLI Binary

No. The MCP server is invoked via uv run ... python mcp_server/server.py. No packaged binary.

Local Web Dashboard

None.

Installation Script

scripts/install.sh — one-liner installer. Handles uv install, code clone/update, dependency installation, and model download. Designed to be idempotent (re-running updates the code while preserving the index).

Server CLI Arguments

python mcp_server/server.py [--transport stdio|sse|http] [--host localhost] [--port 8000]
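
A parser for the flags shown above might look like the following sketch. It mirrors only the documented options and defaults; the actual server.py may define them differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented --transport, --host, and --port flags."""
    parser = argparse.ArgumentParser(description="Code search MCP server (sketch)")
    parser.add_argument("--transport", choices=["stdio", "sse", "http"], default="stdio")
    parser.add_argument("--host", default="localhost")
    parser.add_argument("--port", type=int, default=8000)
    return parser


defaults = build_parser().parse_args([])
args = build_parser().parse_args(["--transport", "http", "--port", "9000"])
print(defaults.transport, args.transport, args.host, args.port)
```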

Standalone Utilities

# Pre-fetch model without starting server
python scripts/download_model_standalone.py

# Index a codebase without Claude Code
python scripts/index_codebase.py /path/to/project

Observability

Python logging enabled with DEBUG level in the server. Logs to stderr (captured by MCP client or terminal).

IDE Integration

Claude Code only (registered via claude mcp add).

Cross-Tool Portability

Low. Setup is documented only for Claude Code; the MCP server is technically portable, but there are no setup instructions for other clients.
