
codesight

codesight · Houseofmvps/codesight · ★ 1.1k · last commit 2026-05-18

Primitive shape 13 total
MCP tools 13
00

Summary

codesight — Summary

codesight is a universal AI context generator that compiles a codebase into a compact, token-efficient summary file (CODESIGHT.md) using AST-precision analysis and pattern detection across 30+ frameworks and 14 languages. Its primary value proposition is reducing per-conversation token spend by 7–130x by replacing raw file exploration with pre-compiled, structured maps of routes, schemas, components, middleware, and dependency graphs. The tool operates entirely locally with zero dependencies and zero API calls, running in ~200ms via a single npx codesight invocation. Beyond the base context map, codesight v1.6.2+ adds a wiki mode that generates a persistent .codesight/wiki/ knowledge base organized by domain (auth, database, payments, UI), allowing AI sessions to start with a 200-token index rather than the full 5K-token context map. An MCP server mode exposes 13 tools directly to Claude Code, Cursor, and any MCP-compatible client, and a --init flag generates platform-specific config files (CLAUDE.md, .cursorrules, codex.md, AGENTS.md) automatically. Unlike seed frameworks that focus on agent behavior rules or orchestration, codesight is a static analysis tool: it occupies the "input preparation" layer before any AI session begins, making it most similar to agent-os in its "write files that agents read" philosophy but differentiated by AST-precision extraction rather than manual template content.

01

Overview

codesight — Overview

Origin

Created by Kailesk Khumar (founder of HouseofMVPs and Kailxlabs), codesight emerged from a practical observation: AI assistants waste thousands of tokens every conversation figuring out project structure. The description on the npm page is direct: "Universal AI context generator. Saves thousands of tokens per conversation in Claude Code, Cursor, Copilot, Codex, and more."

Philosophy

The core philosophy is context engineering as static analysis. Rather than instructing agents how to behave, codesight pre-processes the codebase into the minimum viable representation needed for AI comprehension. TypeScript projects get full AST precision; other languages use battle-tested regex detection. The wiki mode extends this with Karpathy's LLM wiki pattern — "compiled from AST, not an LLM. Zero API calls. 200ms."
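To make the regex tier concrete, here is a minimal sketch of pattern-based route detection for Express-style code. It is an illustration only: the pattern and the names `detectRoutes` and `DetectedRoute` are hypothetical and are not taken from codesight's source. It also shows why regex detection can miss unconventional structures, since the dynamically registered route in the sample is not found.

```typescript
interface DetectedRoute {
  method: string;
  path: string;
}

// Hypothetical detector: finds calls such as app.get("/x", ...) or router.post('/y', ...).
function detectRoutes(source: string): DetectedRoute[] {
  // A fresh regex per call, so the global flag's lastIndex never leaks between calls.
  const pattern = /\b(?:app|router)\.(get|post|put|patch|delete)\(\s*["'`]([^"'`]+)["'`]/g;
  const routes: DetectedRoute[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    routes.push({ method: match[1].toUpperCase(), path: match[2] });
  }
  return routes;
}

const sample = `
app.get("/api/users", listUsers);
router.post('/api/auth/login', login);
app[verb]("/api/dynamic", handler);
`;

const detected = detectRoutes(sample);
console.log(detected);
```

The third line of the sample registers a route through a computed property, which a literal pattern like this cannot see. An AST-based pass has a better chance of resolving such cases.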

Stated Goals

From the README:

"Your AI assistant wastes thousands of tokens every conversation just figuring out your project. codesight fixes that in one command."

"Instead of loading the full 5K token context map every conversation, your AI reads one targeted article."

"The key difference from general-purpose wiki tools: codesight already knows your routes, schema, blast radius, and middleware from AST — no LLM needed to extract code structure."

Scope

The tool is explicitly a pre-session context packager, not a workflow framework. It has no opinions about how the agent should write code, plan features, or create PRs. Its boundary ends at handing off a .codesight/CODESIGHT.md (or wiki article) to the agent at session start.

Token Savings Claims

Measured on real OSS projects (v1.6.4):

  • SaaS A: 46,020 tokens → 550 tokens with wiki (83.7x reduction)
  • SaaS B: 26,130 tokens → 440 tokens (59.4x)
  • Average combined reduction: 91x
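The per-project reduction factors quoted above are simply the ratio of raw exploration tokens to compiled context tokens. The sketch below reproduces that arithmetic for the two listed projects; the function name `reductionFactor` is illustrative.

```typescript
// Reduction factor = tokens spent reading raw files / tokens in the compiled context.
function reductionFactor(rawTokens: number, compiledTokens: number): number {
  if (compiledTokens <= 0) {
    throw new RangeError("compiledTokens must be positive");
  }
  return rawTokens / compiledTokens;
}

// Figures quoted in the list above.
const measurements = [
  { name: "SaaS A", raw: 46020, compiled: 550 },
  { name: "SaaS B", raw: 26130, compiled: 440 },
];

// Round to one decimal place, matching how the factors are reported.
const factors = measurements.map((m) => ({
  name: m.name,
  factor: Number(reductionFactor(m.raw, m.compiled).toFixed(1)),
}));

for (const f of factors) {
  console.log(`${f.name}: ${f.factor}x`);
}
```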
02

Architecture

codesight — Architecture

Distribution

  • Type: npm package (npx / global install)
  • Binary: codesight (the bin entry in package.json maps it to dist/index.js)
  • Version analyzed: 1.14.0
  • Runtime required: Node.js >= 18
  • Dependencies: Zero runtime dependencies (0 deps in package.json)
  • Language: TypeScript (compiled to dist/)

Install Methods

# One-shot, no install
npx codesight

# Global install
npm install -g codesight

# MCP server mode
npx codesight --mcp

Directory Tree

codesight/
├── src/
│   ├── index.ts              # CLI entry point
│   ├── core.ts               # Main scan orchestration
│   ├── scanner.ts            # File collection + .codesightignore
│   ├── config.ts             # Config loading
│   ├── formatter.ts          # CODESIGHT.md output
│   ├── mcp-server.ts         # MCP JSON-RPC server (13 tools)
│   ├── telemetry.ts          # Anonymous usage metrics
│   ├── types.ts              # ScanResult types
│   ├── ast/                  # TypeScript AST analysis
│   ├── detectors/            # Language/framework detectors
│   │   ├── routes.ts
│   │   ├── schema.ts
│   │   ├── components.ts
│   │   ├── libs.ts
│   │   ├── config.ts
│   │   ├── middleware.ts
│   │   ├── graph.ts          # Dependency graph
│   │   ├── contracts.ts      # Route contract enrichment
│   │   ├── tokens.ts         # Token count estimation
│   │   ├── blast-radius.ts   # Import graph analysis
│   │   ├── graphql.ts
│   │   └── events.ts
│   ├── generators/
│   │   ├── wiki.ts           # Wiki knowledge base generation
│   │   ├── ai-config.ts      # CLAUDE.md / .cursorrules / AGENTS.md
│   │   └── html-report.ts    # Interactive HTML report
│   ├── plugins/              # Plugin system
│   └── monorepo/             # Monorepo support
├── .codesight/               # Output artifacts
│   ├── CODESIGHT.md          # Primary context map
│   ├── wiki/                 # Wiki knowledge base (--wiki)
│   │   ├── index.md
│   │   ├── overview.md
│   │   ├── auth.md
│   │   ├── database.md
│   │   └── ...
│   ├── config.md
│   ├── routes.md
│   └── ...
├── tests/
├── eval/                     # Benchmark data
└── package.json

Target AI Tools

  • Claude Code (primary)
  • Cursor
  • GitHub Copilot
  • OpenAI Codex
  • Windsurf
  • Cline
  • Aider
  • Any MCP-compatible client

Language/Framework Coverage

  • 14 programming languages (TypeScript at AST level; others at regex level)
  • 30+ framework detectors
  • 13 ORM parsers

Output Artifacts

| File | Format | Purpose |
|------|--------|---------|
| .codesight/CODESIGHT.md | Markdown | Full context map |
| .codesight/wiki/index.md | Markdown | Domain article catalog |
| .codesight/wiki/*.md | Markdown | Per-domain articles |
| .codesight/KNOWLEDGE.md | Markdown | Decision/notes/ADR map |
| CLAUDE.md | Markdown | Claude-specific config |
| .cursorrules | Plain text | Cursor-specific config |
| AGENTS.md | Markdown | Multi-tool agent config |
03

Components

codesight — Components

CLI Flags

| Flag | Purpose |
|------|---------|
| npx codesight (default) | Scan project, generate CODESIGHT.md |
| --wiki | Generate wiki knowledge base (.codesight/wiki/) |
| --init | Generate CLAUDE.md, .cursorrules, codex.md, AGENTS.md |
| --open | Open interactive HTML report in browser |
| --mcp | Start as MCP server (13 tools) for Claude Code / Cursor |
| --blast <file> | Show blast radius for a specific file |
| --profile <tool> | Generate optimized config for a specific AI tool |
| --benchmark | Show detailed token savings breakdown |
| --mode knowledge | Map .md notes/ADRs to KNOWLEDGE.md |
| --mode knowledge <dir> | Map a specific vault (Obsidian, etc.) |
| --watch | Watch mode: regenerate on file changes |

MCP Server Tools (13 total)

| Tool | Purpose |
|------|---------|
| codesight_scan | Full codebase scan, returns structured ScanResult |
| codesight_get_routes | Extract HTTP/GraphQL/gRPC/WebSocket routes |
| codesight_get_schemas | Extract database schemas and ORM models |
| codesight_get_components | Extract UI components with props |
| codesight_get_libs | Extract library/dependency usage |
| codesight_get_middleware | Extract middleware/interceptors |
| codesight_get_graph | Dependency graph edges |
| codesight_get_blast_radius | Impact analysis for a file |
| codesight_get_tokens | Token count for context map sections |
| codesight_get_wiki_index | Get wiki article catalog (~200 tokens) |
| codesight_get_wiki_article | Read one wiki article by name |
| codesight_lint_wiki | Check for orphan articles, stale content |
| codesight_get_knowledge | Get KNOWLEDGE.md content |
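For orientation, this is roughly what a client sends when it invokes one of these tools. MCP uses JSON-RPC 2.0 and calls tools through the `tools/call` method. The sketch below only builds the request message. The argument name `article` is an assumption, since the tool's real input schema is not shown here and would normally be discovered through `tools/list`.

```typescript
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Build a JSON-RPC 2.0 request that asks an MCP server to run one tool.
function buildToolCall(id: number, tool: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: args },
  };
}

// `article` is a hypothetical argument name used for illustration only.
const request = buildToolCall(1, "codesight_get_wiki_article", { article: "auth" });

// On the stdio transport each message is a single line of JSON terminated by a newline.
const wireMessage = JSON.stringify(request) + "\n";
console.log(wireMessage);
```

A real client would complete the MCP `initialize` handshake before sending this message. MCP-aware tools such as Claude Code or Cursor handle that step themselves.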

Detectors

| Detector | Purpose |
|----------|---------|
| routes.ts | HTTP routes (Express, Fastify, Next.js, etc.) |
| schema.ts | DB schemas (Prisma, Drizzle, SQLAlchemy, etc.) |
| components.ts | UI components (React, Vue, Svelte) |
| libs.ts | Library detection |
| config.ts | Framework config files |
| middleware.ts | Auth/rate-limit/logging middleware |
| graph.ts | Import dependency graph |
| contracts.ts | Route input/output types from AST |
| blast-radius.ts | Reverse dependency traversal |
| graphql.ts | GraphQL/gRPC/WebSocket routes |
| events.ts | Event emitters/handlers |
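The "reverse dependency traversal" behind blast-radius analysis can be sketched as a breadth-first walk over inverted import edges. This is a generic illustration of the technique under assumed data shapes. `ImportGraph` and `blastRadius` are hypothetical names and not codesight's internal API.

```typescript
// Maps each file to the files it imports.
type ImportGraph = Map<string, string[]>;

function blastRadius(graph: ImportGraph, changedFile: string): string[] {
  // Invert the edges: for each file, record which files import it.
  const importedBy = new Map<string, string[]>();
  graph.forEach((imports, file) => {
    imports.forEach((dep) => {
      const dependents = importedBy.get(dep) ?? [];
      dependents.push(file);
      importedBy.set(dep, dependents);
    });
  });

  // Breadth-first walk from the changed file through its direct and transitive dependents.
  const affected = new Set<string>();
  const queue: string[] = [changedFile];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const dependent of importedBy.get(current) ?? []) {
      if (dependent !== changedFile && !affected.has(dependent)) {
        affected.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return Array.from(affected).sort();
}

const graph: ImportGraph = new Map([
  ["src/lib/db.ts", []],
  ["src/lib/users.ts", ["src/lib/db.ts"]],
  ["src/routes/users.ts", ["src/lib/users.ts"]],
  ["src/routes/health.ts", []],
]);

const impacted = blastRadius(graph, "src/lib/db.ts");
console.log(impacted);
```

In this toy graph, changing src/lib/db.ts affects src/lib/users.ts directly and src/routes/users.ts transitively, while src/routes/health.ts is untouched.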

Generators

| Generator | Output |
|-----------|--------|
| wiki.ts | .codesight/wiki/*.md domain articles |
| ai-config.ts | CLAUDE.md, .cursorrules, AGENTS.md, codex.md |
| html-report.ts | Interactive browser report |

Config

.codesight.json (optional) — project-level overrides for ignore patterns, max depth, custom detectors.

05

Prompts

codesight — Prompts

codesight does not ship AI prompt files — it ships generated context files (CODESIGHT.md, wiki articles, KNOWLEDGE.md) that are read by AI assistants as passive context. The "prompting technique" is therefore structural: pre-compilation of codebase metadata into a token-minimal format.

Excerpt 1 — Sample CODESIGHT.md Output Structure (from eval directory)

The generated CODESIGHT.md follows a deterministic schema:

# Project: my-saas-app

**Stack:** Next.js 14 · TypeScript · PostgreSQL (Prisma) · Tailwind

## Routes (23)

### Auth (4)
| Method | Path | Auth | Input | Output |
|--------|------|------|-------|--------|
| POST | /api/auth/login | none | { email, password } | { token, user } |
| POST | /api/auth/register | none | { email, password, name } | { user } |
| GET  | /api/auth/me | bearer | — | { user } |
| POST | /api/auth/logout | bearer | — | { ok } |

## Schemas (3)

### User
| Field | Type | Nullable | Relations |
|-------|------|----------|-----------|
| id | String (uuid) | false | posts, sessions |
| email | String | false | — |
| createdAt | DateTime | false | — |

Technique: Tabular compression. Each route or schema field is compressed from ~150 tokens (a raw file read) to ~15 tokens (one table row), a 7–10x reduction at the type-information level.
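As a rough illustration of tabular compression, the sketch below turns already-extracted route metadata into the table layout shown in the excerpt. The `RouteInfo` shape and the `renderRouteTable` function are assumptions made for this example and do not describe codesight's formatter.

```typescript
interface RouteInfo {
  method: string;
  path: string;
  auth: string;
  input: string;
  output: string;
}

// Render one markdown table row per route, under a fixed header.
function renderRouteTable(routes: RouteInfo[]): string {
  const header = [
    "| Method | Path | Auth | Input | Output |",
    "|--------|------|------|-------|--------|",
  ];
  const rows = routes.map(
    (r) => `| ${r.method} | ${r.path} | ${r.auth} | ${r.input} | ${r.output} |`,
  );
  return header.concat(rows).join("\n");
}

const routeTable = renderRouteTable([
  { method: "POST", path: "/api/auth/login", auth: "none", input: "{ email, password }", output: "{ token, user }" },
  { method: "POST", path: "/api/auth/register", auth: "none", input: "{ email, password, name }", output: "{ user }" },
]);
console.log(routeTable);
```

Each handler, however long its source file, collapses to a single row that carries only the contract an assistant needs.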

Excerpt 2 — Wiki Article Format (auth.md)

# Auth — my-saas-app

## Routes
- POST /api/auth/login — no auth required — returns token + user
- POST /api/auth/register — no auth required
- GET /api/auth/me — requires bearer token

## Middleware
- `withAuth` (src/middleware/auth.ts) — validates JWT, attaches req.user

## Session
- JWT stored in httpOnly cookie
- 7-day expiry, refresh via /api/auth/refresh

## Key Files
- src/middleware/auth.ts — auth enforcement
- src/lib/jwt.ts — token creation/validation
- prisma/schema.prisma (User model)

Technique: Narrative + selective inclusion. Domain article includes only auth-relevant routes, middleware, and files — not the full codebase map. At ~300 tokens vs. ~5K for CODESIGHT.md, this is targeted retrieval.
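Selective inclusion can be pictured as a filter applied before rendering. The sketch below assigns routes to a domain by path prefix, which is an assumption made for illustration; how codesight actually groups items into domains is not described here. The names `WikiRoute` and `renderDomainArticle` are hypothetical.

```typescript
interface WikiRoute {
  method: string;
  path: string;
  note: string;
}

// Keep only the routes under the given prefix and render them as a short article.
function renderDomainArticle(title: string, pathPrefix: string, routes: WikiRoute[]): string {
  const relevant = routes.filter((r) => r.path.indexOf(pathPrefix) === 0);
  const lines = [`# ${title}`, "", "## Routes"];
  for (const r of relevant) {
    lines.push(`- ${r.method} ${r.path} (${r.note})`);
  }
  return lines.join("\n");
}

const allRoutes: WikiRoute[] = [
  { method: "POST", path: "/api/auth/login", note: "no auth required" },
  { method: "GET", path: "/api/auth/me", note: "requires bearer token" },
  { method: "POST", path: "/api/billing/checkout", note: "requires bearer token" },
];

const authArticle = renderDomainArticle("Auth", "/api/auth", allRoutes);
console.log(authArticle);
```

The billing route never reaches the auth article, which is what keeps each article small compared with the full map.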

Excerpt 3 — AI Config Init Output (CLAUDE.md generated content)

# Project Context

**Stack**: Next.js 14, TypeScript, PostgreSQL (Prisma), Tailwind CSS
**Test runner**: Vitest
**Package manager**: pnpm

## Key paths
- Routes: src/app/api/, src/pages/api/
- Components: src/components/
- DB schema: prisma/schema.prisma
- Auth: src/middleware/auth.ts, src/lib/jwt.ts

## Context Files
- `.codesight/CODESIGHT.md` — full codebase map (3,936 tokens)
- `.codesight/wiki/index.md` — wiki catalog (200 tokens)

## Usage
Start each session by reading `.codesight/wiki/index.md`, then load the
relevant domain article. Only read CODESIGHT.md when you need full context.

Technique: Structured onboarding. Tells the agent where context files live and how to use them efficiently — minimizing "figuring out" cost at session start.

09

Uniqueness

codesight — Uniqueness

Differences from Seed Frameworks

codesight is most similar to agent-os in that both produce files that agents read passively, but the two differ fundamentally in what those files contain. agent-os ships curated markdown templates (standards/, profiles/) that encode developer philosophy and workflow conventions; codesight generates machine-extracted structural maps (routes, schemas, dependency graphs) from AST analysis of live source code. codesight also resembles a pre-session counterpart to taskmaster-ai's analyze-complexity, since both reduce the token cost of "figuring out the codebase", but codesight is entirely static (zero LLM calls) while taskmaster-ai uses Claude to analyze its own tasks. Unlike superpowers, BMAD-METHOD, or spec-driver, codesight ships no behavioral skills, workflow instructions, or enforcement mechanisms; it has no opinion about how the agent should code, only about what it should know about the codebase before coding.

Positioning

  • Primary differentiator: AST-precision extraction. TypeScript projects get full type information, route contracts, and import graphs from actual AST traversal — not grep patterns or regex.
  • Secondary differentiator: Three-tier context budget management (full map → wiki index → targeted article), making codesight useful even in severely token-constrained sessions.
  • Target user: Developers who have already chosen their AI tool and framework, want to reduce per-session token spend without changing their workflow.

Observable Failure Modes

  1. Non-TypeScript projects get degraded accuracy: Only TypeScript benefits from AST precision; Python, Go, Ruby, etc. use regex detection which may miss routes or schemas in unconventional structures.
  2. Wiki staleness: If --wiki or --watch is not run after code changes, the wiki articles become stale. log.md records when the wiki was last regenerated, but there is no auto-invalidation.
  3. MCP tool cold start: Starting as an MCP server re-scans on first call; large monorepos may have noticeable first-call latency.
  4. Framework detector miss: 30+ detectors cover popular frameworks, but bespoke or niche frameworks will produce sparse context maps.
  5. Zero behavioral guardrails: codesight generates config files but has no enforcement mechanism — if the agent ignores CLAUDE.md, there is no fallback.
04

Workflow

codesight — Workflow

Execution Pattern

codesight is a one-shot pre-session tool — not a continuous agent loop. It runs before an AI session begins, producing artifacts the agent reads passively.

Primary Workflow

Phase 1: Project Scan (one-time or on-change)

npx codesight

  • Collects files respecting .codesightignore
  • Detects project language/framework
  • Runs all detectors in parallel (routes, schemas, components, libs, config, middleware, graph, GraphQL, gRPC, WebSocket, events)
  • Enriches route contracts from AST
  • Computes token statistics
  • Writes .codesight/CODESIGHT.md

Artifact produced: .codesight/CODESIGHT.md (~3K–5K tokens)

Phase 2: Wiki Generation (optional, persistent)

npx codesight --wiki

  • Organizes scan results into domain-specific articles
  • Writes .codesight/wiki/index.md + per-domain articles
  • Append-only log.md tracks every wiki operation

Artifact produced: .codesight/wiki/*.md (~200-token index + ~300-token articles)

Phase 3: Config Generation (one-time setup)

npx codesight --init

  • Generates CLAUDE.md, .cursorrules, codex.md, AGENTS.md
  • Tailored output per AI tool

Artifact produced: Multiple AI tool config files

Phase 4: AI Session

Agent reads .codesight/CODESIGHT.md or wiki articles at session start. No ongoing codesight involvement.

Phase-to-Artifact Map

| Phase | Artifact |
|-------|----------|
| Scan | .codesight/CODESIGHT.md |
| Wiki | .codesight/wiki/index.md, .codesight/wiki/*.md |
| Knowledge | .codesight/KNOWLEDGE.md |
| Init | CLAUDE.md, .cursorrules, AGENTS.md, codex.md |
| HTML | Browser report (ephemeral) |

Approval Gates

None — fully automated, no user confirmation steps.

CI Integration

npx codesight                   # regenerate CODESIGHT.md
npx codesight --mode knowledge  # regenerate KNOWLEDGE.md

Run both in CI on every push so that CODESIGHT.md and KNOWLEDGE.md stay current.

Incremental / Watch

npx codesight --watch  # regenerate on file changes (debounced)
npx codesight --hook   # regenerate on every git commit
06

Memory Context

codesight — Memory & Context

State Storage

codesight uses file-based project-scoped storage in the .codesight/ directory.

Persistent Artifacts

| File | Format | Persistence | Updated |
|------|--------|-------------|---------|
| .codesight/CODESIGHT.md | Markdown | Project-level | On every npx codesight run |
| .codesight/wiki/*.md | Markdown | Project-level | On --wiki run |
| .codesight/wiki/log.md | Append-only log | Project-level | On every wiki operation |
| .codesight/KNOWLEDGE.md | Markdown | Project-level | On --mode knowledge run |
| .codesight/config.md | Markdown | Project-level | On scan |

No Session State

codesight maintains no session-level state — it does not track what an AI assistant did or read. Each scan produces a fresh output file.

Context Compaction Handling

The wiki mode was explicitly designed for context compaction scenarios:

  • wiki/index.md (~200 tokens) is the session-start entry point
  • Individual articles (~300 tokens each) are loaded on demand
  • The full CODESIGHT.md (~3K–5K tokens) is a fallback when targeted lookup fails

This three-tier approach means that even when a Claude Code session compacts its context, the agent can re-establish orientation by re-reading wiki/index.md rather than the full context map.
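The three-tier lookup can be expressed as a small planning function on the agent side. This is a sketch under stated assumptions: the token figures are the approximate ones quoted above, and `planContext`, `articleFor`, and the domain list are hypothetical names rather than anything codesight ships.

```typescript
interface ContextFile {
  path: string;
  approxTokens: number;
}

const WIKI_INDEX: ContextFile = { path: ".codesight/wiki/index.md", approxTokens: 200 };
const FULL_MAP: ContextFile = { path: ".codesight/CODESIGHT.md", approxTokens: 5000 };

function articleFor(domain: string): ContextFile {
  return { path: `.codesight/wiki/${domain}.md`, approxTokens: 300 };
}

// Always start from the index; add a targeted article when one exists,
// and fall back to the full map only when the targeted lookup fails.
function planContext(knownDomains: string[], wantedDomain: string | null): ContextFile[] {
  const plan: ContextFile[] = [WIKI_INDEX];
  if (wantedDomain === null) {
    return plan;
  }
  if (knownDomains.indexOf(wantedDomain) !== -1) {
    plan.push(articleFor(wantedDomain));
  } else {
    plan.push(FULL_MAP);
  }
  return plan;
}

function totalTokens(plan: ContextFile[]): number {
  return plan.reduce((sum, file) => sum + file.approxTokens, 0);
}

const domains = ["auth", "database", "payments", "ui"];
console.log(totalTokens(planContext(domains, "auth")));   // index + one article
console.log(totalTokens(planContext(domains, "search"))); // index + full map fallback
```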

Cross-Session Handoff

Wiki files are committed to git, enabling cross-machine and cross-session context sharing. Any new session (on any machine) starts with full codebase knowledge from the wiki files without re-scanning.

Cache Behavior

  • Results are cached in memory during a single scan run
  • No cross-run disk cache — every npx codesight re-scans from scratch
  • --watch mode uses debounced file-change detection to minimize re-scans
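Debouncing means a burst of file-change events triggers a single rescan once the burst goes quiet. The sketch below shows the general technique with an illustrative 50 ms delay; the delay codesight actually uses and its watcher wiring are not stated here, and the names are hypothetical.

```typescript
// Returns a wrapper that runs fn only after delayMs has passed without another call.
function debounce(fn: () => void, delayMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) {
      clearTimeout(timer); // a newer event supersedes the pending run
    }
    timer = setTimeout(fn, delayMs);
  };
}

let rescans = 0;
const scheduleRescan = debounce(() => {
  rescans += 1; // stand-in for re-running the scan
}, 50);

// Three rapid change events collapse into one pending rescan.
scheduleRescan();
scheduleRescan();
scheduleRescan();
```

In a watcher, `scheduleRescan` would be the callback handed to a file-watching API such as Node's `fs.watch`.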

.codesightignore

Users can create .codesightignore (gitignore-style) to exclude paths from scanning.

07

Orchestration

codesight — Orchestration

Multi-Agent Pattern

codesight is a single-shot tool with no orchestration layer. It does not spawn agents, coordinate workers, or sequence AI tasks. It runs once, writes files, and exits.

Orchestration Pattern

none — codesight produces context artifacts consumed by external agents; it does not orchestrate agents itself.

Execution Mode

one-shot — invoked per scan, produces output, terminates. Exception: --watch mode runs as a continuous file watcher (not a continuous agent loop).

Internal Parallelism

The scan itself uses Promise.all across all detectors (routes, schemas, components, libs, config, middleware, graph, graphql, grpc, websocket, events) — parallel static analysis, not multi-agent orchestration.
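The parallel fan-out can be sketched as follows. The detector bodies here are toy stand-ins that filter file paths, and the `Detector` type and `runDetectors` function are assumptions for illustration. Only the use of Promise.all across detectors comes from the description above.

```typescript
type Detector = (files: string[]) => Promise<string[]>;

// Toy detectors: each one picks out the files it cares about.
const detectors: Record<string, Detector> = {
  routes: async (files) => files.filter((f) => f.indexOf("/routes/") !== -1),
  schemas: async (files) => files.filter((f) => /\.prisma$/.test(f)),
  components: async (files) => files.filter((f) => /\.tsx$/.test(f)),
};

// Start every detector at once and wait for all of them to finish.
async function runDetectors(files: string[]): Promise<Record<string, string[]>> {
  const names = Object.keys(detectors);
  const results = await Promise.all(names.map((name) => detectors[name](files)));
  const scan: Record<string, string[]> = {};
  names.forEach((name, i) => {
    scan[name] = results[i];
  });
  return scan;
}

const scanPromise = runDetectors([
  "src/routes/users.ts",
  "prisma/schema.prisma",
  "src/components/Button.tsx",
]);
scanPromise.then((scan) => console.log(scan));
```

Promise.all preserves input order, so each result can be matched back to its detector by index. It also rejects as soon as any detector rejects, so a production scanner might prefer Promise.allSettled to keep partial results.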

Isolation Mechanism

none — codesight reads the filesystem and writes to .codesight/. No sandboxing.

Multi-Model Usage

None — codesight makes zero LLM API calls. All analysis is static (AST + regex).

Subagent Definition Format

None.

Consensus Mechanism

None.

Prompt Chaining

No — codesight output is passive context, not a chained prompt input.

MCP Mode as Integration Point

When running as an MCP server (--mcp), codesight exposes 13 tools that AI agents can call during a session. This is the closest to "orchestration" codesight gets — a tool server that agents query rather than a standalone pipeline.

Cross-Tool Portability

High — codesight generates standard markdown files (CODESIGHT.md, wiki/*.md, KNOWLEDGE.md) readable by any AI tool that accepts file context. The MCP server further extends compatibility to any MCP-protocol client.

08

UI CLI Surface

codesight — UI & CLI Surface

CLI Binary

  • Binary name: codesight
  • Is thin wrapper: No — it is the runtime (TypeScript compiled to Node.js)
  • Subcommands: Flag-based (not subcommand-based). Key flags: --wiki, --init, --open, --mcp, --blast, --profile, --benchmark, --mode knowledge, --watch

npx codesight                          # Default: scan + generate CODESIGHT.md
npx codesight --wiki                   # Wiki mode
npx codesight --init                   # Generate AI tool configs
npx codesight --open                   # Open HTML report in browser
npx codesight --mcp                    # Start MCP server
npx codesight --blast src/lib/db.ts    # Blast radius analysis
npx codesight --profile claude-code    # Profile-specific config
npx codesight --benchmark              # Token savings breakdown
npx codesight --mode knowledge         # Knowledge map from .md files

Local UI

  • Exists: Yes — interactive HTML report
  • Type: Browser-rendered HTML (not a persistent server)
  • Access: npx codesight --open opens the HTML in the default browser
  • Features: Codebase visualization, token savings stats, framework detection results

MCP Server

  • Exists: Yes
  • Type: stdio MCP server
  • Tool count: 13 tools
  • Start command: npx codesight --mcp

IDE Integration

None native — the --init flag generates AI tool config files (CLAUDE.md, .cursorrules, AGENTS.md, codex.md) that configure the tools, but codesight itself has no IDE plugin.

Observability

  • --benchmark flag prints detailed token savings breakdown
  • Token stats included in every scan output (output tokens, estimated exploration tokens saved)
  • No audit log of agent actions (codesight doesn't observe the agent)

GitHub Actions

Not shipped as an action — designed for manual or CI invocation:

npx codesight
npx codesight --mode knowledge
