SwarmVault

swarmvault · swarmclawai/swarmvault · ★ 492 · last commit 2026-05-20

Primitive shape: 37 total (35 commands, 1 skill, 1 MCP tool)
SwarmVault — Summary

SwarmVault is a local-first LLM Wiki CLI and knowledge graph builder that implements Andrej Karpathy's three-layer pattern (raw sources → wiki → schema) as a production-grade tool. It ingests books, articles, notes, code, URLs, transcripts, and datasets into a durable markdown wiki plus a local SQLite + semantic graph. On top of that it adds an MCP server, a desktop app, git-based automation, and agent-facing context-pack and task-ledger workflows. The swarmvault CLI has 30+ subcommands covering init, ingest, compile, query, review, graph operations, chat sessions, context builds, task ledgers, and AI export. An OpenClaw-compatible skill file ships in skills/swarmvault/SKILL.md.

SwarmVault is most similar to ccmemory from the seeds (both are graph-based agent memory stores), but differs fundamentally: ccmemory uses Neo4j + vector embeddings for agent memory of code; SwarmVault is a local SQLite + markdown wiki for personal knowledge management across all content types (books, papers, URLs, code, audio, video), with a full CLI, desktop app, MCP server, and an approval queue for LLM-generated changes.

SwarmVault — Overview

Origin

Built by SwarmClaw AI (swarmvault.ai). Ships under the MIT license. Inspired by Andrej Karpathy's LLM Wiki gist (https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f). The README explicitly positions the project against the original gist:

"If you liked Karpathy's LLM Wiki gist, SwarmVault is the production-grade version."

Philosophy

From the README:

"SwarmVault turns docs, code, transcripts, notes, and URLs into a durable markdown wiki plus a local graph you can inspect, query, and hand to agents."

"In the tradition of Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents — SwarmVault treats the connections between sources as valuable as the sources themselves. The part Bush couldn't solve was who does the maintenance. The LLM handles that."

Three-Layer Architecture (Karpathy's Pattern)

  1. Raw sources (raw/) — immutable, curated source documents. Never modified by SwarmVault.
  2. The wiki (wiki/) — LLM-generated + human-authored markdown. Compounding artifact.
  3. The schema (swarmvault.schema.md) — conventions, grounding, what matters in this domain.
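SwarmVault never modifies raw/, and you can spot-check that guarantee from outside the tool. The sketch below is minimal and assumes only that the vault root contains a raw/ directory. The helpers snapshot_raw and changed_raw_files are hypothetical and not part of SwarmVault.

```python
import hashlib
from pathlib import Path


def snapshot_raw(vault_root: Path) -> dict[str, str]:
    """Map each file under raw/ (path relative to raw/) to its SHA-256 digest."""
    raw = vault_root / "raw"
    return {
        str(p.relative_to(raw)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(raw.rglob("*"))
        if p.is_file()
    }


def changed_raw_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return raw/ paths that were added, removed, or modified between two snapshots."""
    keys = before.keys() | after.keys()
    return sorted(k for k in keys if before.get(k) != after.get(k))
```

To use it, take one snapshot before running swarmvault compile and another one after. If changed_raw_files returns an empty list, the raw layer stayed immutable.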

Addressing Community Concerns (from README)

  • "Won't hallucinations compound?" → Every edge is tagged extracted, inferred, or ambiguous; contradictions are detected; approval queues gate changes (compile --approve).
  • "Does it scale past 100 pages?" → Hybrid search (SQLite FTS + semantic embeddings) and compile --max-tokens.
  • "Do I need API keys?" → No. Built-in heuristic provider is fully offline.
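The entry says search combines SQLite FTS with semantic embeddings, but it does not say how the two result lists are merged. The sketch below assumes reciprocal rank fusion (RRF), a common way to merge ranked lists without calibrating their scores against each other. The document ids and the k=60 constant are illustrative only.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document ids; each id scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["memex", "karpathy", "schema"]   # e.g. from a full-text query
semantic_hits = ["karpathy", "graph", "memex"]   # e.g. from embedding similarity
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# ['karpathy', 'memex', 'graph', 'schema']
```

RRF uses only ranks. That lets it fuse a keyword list with an embedding list even though full-text scores and cosine similarities sit on different scales.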

Release Cadence

SKILL.md is at version 3.15.0; the project is under active development with frequent releases.

SwarmVault — Architecture

Distribution

  • CLI: npm install -g @swarmvaultai/cli
  • Desktop app: Download from swarmvault.ai/download (macOS, Windows, Linux — bundles own runtime)
  • Obsidian plugin: packages/obsidian-plugin/

Directory Structure

packages/
  cli/              # @swarmvaultai/cli Node.js package
  engine/           # Core vault engine
  obsidian-plugin/  # Obsidian plugin
  viewer/           # Graph viewer package
skills/
  swarmvault/
    SKILL.md        # OpenClaw-compatible skill file
    README.md
    TROUBLESHOOTING.md
    examples/
    references/
    validation/
templates/
  llm-wiki-schema.md  # Standalone schema template (zero install)
docs/
scripts/
smoke/              # Smoke tests
validation/
worked/

Required Runtime

  • Node.js >= 24 (CLI)
  • Desktop app bundles own runtime (no Node.js required)

Install Complexity

One-line npm install for the CLI. Desktop app: download a prebuilt binary.

Vault Artifacts (on disk)

raw/               # Immutable source documents
wiki/              # Generated markdown + saved outputs
  outputs/         # Query results, chat sessions, context packs
  dashboards/      # Automated dashboards
  graph/           # Share cards, reports
state/
  graph.json       # Machine-readable knowledge graph
  retrieval/       # Local search index
swarmvault.config.json
swarmvault.schema.md

Output Directory Override

SWARMVAULT_OUT — override root for generated artifact directories.
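The entry describes SWARMVAULT_OUT only as an "override root" and does not spell out how it interacts with the vault root. The sketch below shows one plausible resolution rule. It assumes a non-empty variable wins and the vault root is the fallback; artifact_root is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def artifact_root(vault_root: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory under which generated artifacts would be written.

    Assumed precedence: a non-empty SWARMVAULT_OUT wins; otherwise use the vault root.
    """
    env = os.environ if env is None else env
    override = env.get("SWARMVAULT_OUT", "").strip()
    return Path(override).expanduser() if override else vault_root
```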

MCP Server

swarmvault mcp — starts MCP server for agent consumption of vault tools.

SwarmVault — Components

CLI Subcommands (30+)

Init/Setup

  • swarmvault init — Initialize vault
  • swarmvault quickstart <path> — Init + ingest + compile + graph viewer
  • swarmvault demo [--no-serve] — Zero-config walkthrough
  • swarmvault scan <path> — Fast scratch pass
  • swarmvault clone <github-url> — Clone + ingest GitHub repo

Ingestion

  • swarmvault ingest <path> — Ingest source
  • swarmvault source add <input> — Register recurring source
  • swarmvault source reload — Refresh registered sources
  • swarmvault inbox import — Batch capture import

Compilation + Review

  • swarmvault compile [--approve] [--max-tokens N] — Generate wiki
  • swarmvault review list|show|accept|reject — Manage approval queue
  • swarmvault candidate list|promote|archive — Manage new concept candidates

Query + Exploration

  • swarmvault query "<question>" — Saved query with answer
  • swarmvault chat "<question>" — Persisted multi-turn conversation
  • swarmvault chat --resume <id> — Resume prior conversation
  • swarmvault explore "<question>" [--steps N] — Research loops

Context + Tasks

  • swarmvault context build "<goal>" --target <path> --budget <tokens> — Agent handoff bundle
  • swarmvault context list|show|delete — Manage context packs
  • swarmvault task start|update|finish|resume — Task ledger

Graph Operations

  • swarmvault graph serve — Live workspace + workbench
  • swarmvault graph query "<seed>" — Graph traversal
  • swarmvault graph share --post|--svg|--bundle — Share artifacts
  • swarmvault graph merge — Merge multiple graphs
  • swarmvault graph cluster — Community detection
  • swarmvault graph blast <target> — Reverse-import impact analysis
  • swarmvault graph export --html|--json|--obsidian|--canvas|--neo4j
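Reverse-import impact analysis asks which modules a change to a target would affect: every module that imports it, directly or transitively. The minimal sketch below expresses that idea as a breadth-first search over inverted import edges. The input shape (a module name mapped to the set of modules it imports) and the name blast_radius are assumptions, not SwarmVault's graph schema.

```python
from collections import deque


def blast_radius(imports: dict[str, set[str]], target: str) -> set[str]:
    """Return every module that transitively imports `target`."""
    # Invert the edges: for each module, record which modules import it.
    importers: dict[str, set[str]] = {}
    for module, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(module)
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in importers.get(current, set()):
            if parent not in seen:  # the seen set stops import cycles from looping
                seen.add(parent)
                queue.append(parent)
    seen.discard(target)  # in a cycle the target can reach itself; exclude it
    return seen
```

For example, with imports = {"app": {"api"}, "api": {"db"}, "cli": {"db"}}, blast_radius(imports, "db") returns {"api", "cli", "app"}.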

Maintenance

  • swarmvault doctor [--repair] — Health check + repair
  • swarmvault lint [--conflicts] [--web] — Schema validation
  • swarmvault diff — Graph change summary vs. baseline
  • swarmvault watch — Watch mode with git hooks

Export + Integration

  • swarmvault export ai --out <dir> — llms.txt, JSON-LD, manifest
  • swarmvault mcp — Start MCP server

Skill File

skills/swarmvault/SKILL.md — The full OpenClaw-compatible skill, carrying version 3.15.0 metadata. It declares the swarmvault or vault binary as a prerequisite through OpenClaw's anyBins requirement, so either binary satisfies it.

Standalone Template

templates/llm-wiki-schema.md — Zero-install schema template for any LLM agent.

Desktop App

packages/viewer/ + downloadable desktop binary — includes a graph viewer with an interactive web app.

Obsidian Plugin

packages/obsidian-plugin/ — Obsidian integration.

SwarmVault — Prompts

Verbatim Excerpt 1: SKILL.md — Quick Checks (from skills/swarmvault/SKILL.md)

## Quick checks

- Work from the vault root.
- Use `swarmvault next` when you need a read-only orientation command before deciding whether to initialize, ingest, compile, query, review, or refresh.
- If the vault does not exist yet, run `swarmvault init`.
- Use `swarmvault demo --no-serve` when the user wants the fastest zero-config walkthrough before pointing SwarmVault at their own sources.
- Use `swarmvault quickstart <file-or-directory-or-github-url>` as the beginner-friendly first-run path when the user wants init + ingest + compile + graph viewer in one command.
- Read `swarmvault.schema.md` before compile or query work. It is the vault's operating contract.
- If `wiki/graph/report.md` exists, use it before broad repo search.

Technique: Read state before acting. swarmvault next serves as a read-only orientation gate before any action. The excerpt also fixes an explicit preference order (orientation → schema → report → broad search), which gives a canonical state-reading sequence.
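A small orientation function can illustrate the read-state-before-acting rule: it only inspects the filesystem and never changes it. This is a hypothetical heuristic. The real decision logic of swarmvault next is not documented in this entry, and treating swarmvault.schema.md as the marker of an initialized vault is an assumption.

```python
from pathlib import Path


def suggest_next(vault_root: Path) -> str:
    """Suggest a next command from on-disk state without modifying anything."""
    # Assumption: an initialized vault always has its schema file.
    if not (vault_root / "swarmvault.schema.md").exists():
        return "swarmvault init"
    raw = vault_root / "raw"
    if not raw.is_dir() or not any(p.is_file() for p in raw.rglob("*")):
        return "swarmvault ingest <path>"
    wiki = vault_root / "wiki"
    if not wiki.is_dir() or not any(wiki.rglob("*.md")):
        return "swarmvault compile"
    return 'swarmvault query "<question>"'
```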


Verbatim Excerpt 2: SKILL.md — Working Rules

## Working rules

- Prefer changing the schema before re-running compile when organization or grounding is wrong.
- Treat `wiki/` and `state/` as first-class outputs. Inspect them instead of trusting a single chat answer.
- Use saved chat transcripts and static AI exports as durable handoff artifacts when the user asks for continuity across sessions or tools.
- Keep raw sources immutable. Put corrections in schema, new sources, or saved outputs rather than manually rewriting generated provenance.

Technique: Immutable inputs plus a mutable schema. "Treat wiki/ and state/ as first-class outputs" means the generated artifacts are trusted over live LLM responses. One anti-pattern is prohibited outright: "manually rewriting generated provenance."


Verbatim Excerpt 3: Contradiction Handling (from README)

"Won't hallucinations compound?" — Every edge is tagged `extracted`, `inferred`, or `ambiguous`. Contradiction detection flags conflicting claims. `compile --approve` stages all changes into reviewable approval bundles. New concepts land in `wiki/candidates/` first. `lint --conflicts` audits for contradictions on demand.

Technique: Evidence-class tagging (extracted/inferred/ambiguous) + approval gate before committing LLM output to the wiki. Addresses a known failure mode directly.
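At its simplest, a contradiction is two sources asserting different values for the same fact. The naive sketch below works under that assumption and models claims as hypothetical (subject, attribute, value, source) tuples. SwarmVault's actual detector is not described at this level of detail.

```python
from collections import defaultdict
from typing import NamedTuple


class Claim(NamedTuple):
    subject: str
    attribute: str
    value: str
    source: str


def find_conflicts(claims: list[Claim]) -> dict[tuple[str, str], set[str]]:
    """Group claims by (subject, attribute); report groups with more than one distinct value."""
    values: dict[tuple[str, str], set[str]] = defaultdict(set)
    for claim in claims:
        values[(claim.subject, claim.attribute)].add(claim.value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}
```

Conflicting groups are natural candidates for the ambiguous evidence tag, and for human review rather than automatic resolution.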

SwarmVault — Uniqueness

Differs From Seeds

SwarmVault is most similar to ccmemory from the seeds (both are graph-based agent memory stores with MCP servers). Differences: ccmemory uses Neo4j + vector embeddings for agent memory of code context; SwarmVault uses SQLite FTS + optional embeddings for personal knowledge management across all content types (30+ formats), with a full CLI (30+ subcommands), desktop app, Obsidian plugin, approval queues, task ledgers, and context packs. SwarmVault's offline-first heuristic provider (no API keys needed) contrasts with ccmemory's Neo4j infrastructure requirement. The evidence-class tagging (extracted/inferred/ambiguous) and contradiction detection are not found in any seed. The llms.txt AI export format and swarmvault context build bounded handoff bundles are unique.

Positioning

Production-grade implementation of Karpathy's LLM Wiki pattern — the knowledge vault for individuals and teams who want compounding knowledge rather than ephemeral chat. Positioned explicitly against Karpathy's original gist and Obsidian (as an alternative with AI-maintained connections).

Observable Failure Modes

  1. Node >= 24 requirement: Strict version requirement may create friction on older systems
  2. LLM hallucination compounding: Despite approval queues, continuous compile + accept could let errors accumulate in the wiki
  3. Schema complexity: swarmvault.schema.md must be well-defined to get useful output; poor schema = poor wiki
  4. 30+ subcommands: Steep learning curve; swarmvault next helps but complexity is real
  5. Watch mode + git hooks: Background daemon adds complexity for simpler use cases

What Makes It Extraordinary

The evidence-class tagging system (extracted/inferred/ambiguous) with approval queues — the framework explicitly accounts for LLM hallucination compounding, which is the most common criticism of wiki-style knowledge systems. The context build bounded handoff for agents (token-limited evidence packs) is the most agent-aware handoff mechanism in the corpus.

SwarmVault — Workflow

Core Loop (from SKILL.md)

Step  Command                               Description
1     swarmvault next                       Read-only orientation (what state is the vault in?)
2     swarmvault init                       Create the vault (if it does not exist)
3     Edit swarmvault.schema.md             Set naming rules, categories, grounding
4     swarmvault source add <path>          Register a recurring source
5     swarmvault ingest <path>              One-off ingestion
6     swarmvault compile                    Generate the wiki from sources
7     swarmvault review list|accept|reject  Manage the approval queue
8     swarmvault query "<question>"         Saved query
9     swarmvault graph serve                Interactive graph workspace
10    swarmvault mcp                        Expose the vault as an MCP server

Quick Start Path

npm install -g @swarmvaultai/cli
swarmvault quickstart ./your-repo
# → init + ingest + compile + graph viewer in one command

Approval Queue Workflow

Changes made with compile --approve are staged in a review queue before they are committed to the wiki:

  • swarmvault review list — see pending changes
  • swarmvault review accept <id> — approve and apply
  • swarmvault review reject <id> — discard
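The approval queue is essentially a staging area between generated changes and the wiki. The minimal in-memory model below covers the list/accept/reject cycle. ReviewQueue, its methods, and the integer ids are hypothetical and do not mirror SwarmVault's on-disk review bundles.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ReviewQueue:
    wiki: dict[str, str] = field(default_factory=dict)                 # page path -> applied content
    pending: dict[int, tuple[str, str]] = field(default_factory=dict)  # id -> (page path, content)
    _ids: count = field(default_factory=lambda: count(1))              # sequential change ids

    def stage(self, path: str, content: str) -> int:
        """Queue a generated change without touching the wiki."""
        change_id = next(self._ids)
        self.pending[change_id] = (path, content)
        return change_id

    def list_pending(self) -> list[int]:
        return sorted(self.pending)

    def accept(self, change_id: int) -> None:
        """Apply a staged change to the wiki (raises KeyError for unknown ids)."""
        path, content = self.pending.pop(change_id)
        self.wiki[path] = content

    def reject(self, change_id: int) -> None:
        """Discard a staged change (raises KeyError for unknown ids)."""
        self.pending.pop(change_id)
```

Nothing reaches wiki until accept is called. That is the property compile --approve is meant to guarantee.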

Agent Handoff Workflow

swarmvault context build "implement feature X" --target src/ --budget 8000
# → Creates bounded evidence pack for agent consumption

swarmvault task start "implement feature X" --target src/
# → Creates task ledger for durable decision tracking
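A context pack has to fit a token budget such as --budget 8000. The entry does not document the selection strategy. The sketch below therefore assumes greedy packing by relevance score and a rough four-characters-per-token estimate; both helpers are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption: about 4 characters per token, minimum 1)."""
    return max(1, len(text) // 4)


def build_context_pack(evidence: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily keep the highest-scoring snippets whose estimated tokens fit in `budget`."""
    pack: list[str] = []
    used = 0
    for _score, snippet in sorted(evidence, key=lambda e: e[0], reverse=True):
        cost = estimate_tokens(snippet)
        if used + cost <= budget:  # skip snippets that would overflow, keep trying smaller ones
            pack.append(snippet)
            used += cost
    return pack
```

Greedy packing skips any snippet that would overflow the budget but keeps trying smaller, lower-scored ones. As a result, the pack fills up instead of stopping at the first miss.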

Git Automation (optional)

swarmvault ingest|compile|query --commit — auto-commit wiki + state changes after each run.

Watch Mode

swarmvault watch --lint --repo — auto-refresh on file changes; git hooks via swarmvault hook install.

SwarmVault — Memory & Context

State Storage

  • state/graph.json — machine-readable knowledge graph (nodes + edges with evidence tags)
  • state/retrieval/ — local search index (SQLite FTS + optional semantic embeddings)
  • wiki/ — generated markdown pages (persistent, compounding artifact)
  • swarmvault.config.json — vault configuration
  • swarmvault.schema.md — vault operating contract

Memory Type

Hybrid — SQLite FTS for full-text search + optional vector embeddings for semantic search + JSON graph for typed knowledge graph.

Evidence Classification

Each graph edge is tagged:

  • extracted — directly from source text
  • inferred — LLM-derived from evidence
  • ambiguous — conflicting or uncertain claims
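The three evidence classes make it easy to filter the graph by trust level, for example by keeping only extracted edges when grounding matters. The sketch assumes a hypothetical graph.json shape: an "edges" array whose items carry an "evidence" field. The real file layout may differ.

```python
import json
from enum import Enum


class Evidence(str, Enum):
    EXTRACTED = "extracted"   # directly from source text
    INFERRED = "inferred"     # LLM-derived from evidence
    AMBIGUOUS = "ambiguous"   # conflicting or uncertain


def edges_by_evidence(graph_json: str, allowed: set[Evidence]) -> list[dict]:
    """Return edges whose evidence tag is in `allowed` (raises ValueError on unknown tags)."""
    graph = json.loads(graph_json)
    return [e for e in graph.get("edges", []) if Evidence(e["evidence"]) in allowed]
```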

Cross-Session Handoff

Yes. The wiki, graph, and search index all persist across sessions, and chat sessions are saved to wiki/outputs/chat-sessions/ for cross-session continuity.

Context Packs

swarmvault context build "<goal>" --budget <tokens> — creates a bounded evidence pack for agent handoff; packs are saved to state/ for reuse.

Task Ledger

swarmvault task start|update|finish|resume — durable task record with decisions, linked context packs, changed paths, outcomes, follow-ups.
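The task ledger is the durable record that lets task resume pick up where a previous session stopped. The sketch below models such a record with the fields listed above and round-trips it through JSON. The TaskLedger class, its field names, and the status strings are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TaskLedger:
    goal: str
    status: str = "started"
    decisions: list[str] = field(default_factory=list)
    context_packs: list[str] = field(default_factory=list)  # ids of linked context packs
    changed_paths: list[str] = field(default_factory=list)
    outcome: Optional[str] = None
    follow_ups: list[str] = field(default_factory=list)

    def finish(self, outcome: str, follow_ups: list[str]) -> None:
        self.status = "finished"
        self.outcome = outcome
        self.follow_ups.extend(follow_ups)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "TaskLedger":
        return cls(**json.loads(text))
```

The JSON round trip must rebuild an equal object. That property is what makes a ledger usable across sessions and tools.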

AI Export

swarmvault export ai --out <dir> — generates llms.txt, full text, JSON-LD graph data, manifest for external agents/crawlers.
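llms.txt is a plain markdown index for language models: an H1 title, a blockquote summary, then H2 sections of links. The sketch below renders that layout, following the public llms.txt proposal. What swarmvault export ai actually writes may differ, and the vault paths in any call are illustrative.

```python
def render_llms_txt(title: str, summary: str,
                    sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render an llms.txt-style index; each link is (name, url, optional note)."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}" if note else f"- [{name}]({url})")
        lines.append("")
    return "\n".join(lines)
```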

Graph Snapshots

swarmvault diff — compares the current graph against the last committed baseline in git.
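A graph diff reduces to set differences over node ids and edges. The sketch assumes a hypothetical {"nodes": [{"id": ...}], "edges": [{"source": ..., "target": ...}]} layout for state/graph.json. graph_diff is not SwarmVault's implementation.

```python
def graph_diff(baseline: dict, current: dict) -> dict[str, list]:
    """Summarize added and removed node ids and (source, target) edges."""
    def node_ids(g: dict) -> set[str]:
        return {n["id"] for n in g.get("nodes", [])}

    def edge_keys(g: dict) -> set[tuple[str, str]]:
        return {(e["source"], e["target"]) for e in g.get("edges", [])}

    old_n, new_n = node_ids(baseline), node_ids(current)
    old_e, new_e = edge_keys(baseline), edge_keys(current)
    return {
        "added_nodes": sorted(new_n - old_n),
        "removed_nodes": sorted(old_n - new_n),
        "added_edges": sorted(new_e - old_e),
        "removed_edges": sorted(old_e - new_e),
    }
```

To get the baseline, you can read the copy from the last commit with git show HEAD:state/graph.json and parse it with json.loads before passing it in.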

Compaction

compile --max-tokens <N> — bounds the generated wiki to fit within a token budget.

SwarmVault — Orchestration

Multi-Agent Support

Yes — via MCP server. Multiple agents can consume vault tools via swarmvault mcp. Context packs and task ledgers are explicitly designed for agent handoffs.

Orchestration Pattern

Sequential — the vault's core loop is sequential (init → ingest → compile → query). MCP enables parallel tool consumption by multiple agents.

Isolation Mechanism

None — vault is a local filesystem. No container or sandbox isolation.

Subagent Definition Format

Not applicable as an agent runtime. The swarmvault skill file enables agents to use vault tools.

Multi-Model Usage

Yes — configurable provider system:

  • heuristic — local offline (default, no API keys)
  • Ollama + Gemma (recommended fully-local setup: ollama pull gemma4)
  • Any OpenAI-compatible backend
  • Cloud providers (optional)

Provider tasks: tasks.compileProvider, tasks.queryProvider, tasks.lintProvider, tasks.audioProvider
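Per-task settings such as tasks.compileProvider suggest a simple lookup with an offline fallback. The sketch assumes the keys live under a "tasks" object in the loaded config and that unset tasks fall back to the heuristic provider. This entry does not show the exact schema of swarmvault.config.json.

```python
from typing import Any, Mapping

DEFAULT_PROVIDER = "heuristic"  # the offline default named above


def provider_for(config: Mapping[str, Any], task: str) -> str:
    """Return the provider for a task such as "compile", "query", "lint", or "audio".

    Assumption: overrides live at config["tasks"][f"{task}Provider"]; anything
    missing or empty falls back to the offline heuristic provider.
    """
    tasks = config.get("tasks") or {}
    return tasks.get(f"{task}Provider") or DEFAULT_PROVIDER
```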

Execution Mode

Interactive loop — CLI-driven, each command runs to completion. Watch mode adds background daemon capability.

Context Compaction

Yes — compile --max-tokens <N> explicitly bounds output to token budget. The context build command creates bounded evidence packs.

Crash Recovery

No — vault state is file-based; no rollback. Approval queues provide a safety gate before wiki changes are committed.

Cross-Session Handoff

Yes — graph.json, wiki/, context packs, task ledgers, chat sessions all persist and are designed for handoff.

Streaming Output

graph serve starts a live web workspace with real-time updates.

SwarmVault — UI / CLI Surface

CLI Binary

  • Name: swarmvault (alias: vault)
  • Install: npm install -g @swarmvaultai/cli
  • Node >= 24 required
  • Not a thin wrapper — own Node.js engine

CLI Subcommands (30+)

Core: init, quickstart, demo, scan, clone, ingest, source, compile, review, query, chat, explore, context, task, export, mcp, graph, doctor, lint, diff, watch, next

Graph operations: graph serve, graph query, graph share, graph merge, graph cluster, graph blast, graph export, graph tree, graph stats, graph validate, graph status

Local Web UI

  • Graph viewer: swarmvault graph serve — live workspace (the port is not documented in this entry)
    • Health workbench
    • Memory dashboard
    • Bookmarklet clipper
    • Prioritized next actions
    • Interactive graph navigation
  • Desktop app: Download from swarmvault.ai/download
    • Bundled runtime (no Node.js required)
    • macOS, Windows, Linux

Obsidian Plugin

packages/obsidian-plugin/ — Obsidian integration for wiki browsing.

MCP Server

swarmvault mcp — MCP server for agent consumption of vault tools:

  • Browse, search, query wiki
  • Build context packs
  • Manage tasks
  • Inspect vault health

Cross-Tool Portability

High — llms.txt + JSON-LD AI export format for any agent/crawler. MCP server for any MCP client. Standalone schema template (templates/llm-wiki-schema.md) for zero-install usage.

Observability

  • swarmvault doctor — health check across graph, retrieval, review queues, watch state, migrations, managed sources, task state
  • swarmvault lint — schema validation + contradiction detection
  • swarmvault diff — graph change summary

Related frameworks

same archetype · same primary tool · same memory type

alirezarezvani/claude-skills ★ 16k

313+ skills for 12 AI tools covering engineering, marketing, C-level advisory, compliance, research, and finance — all from one…

MoAI-ADK ★ 1.0k

Implements Harness Engineering as a Go-binary-installed Claude Code environment with auto-TDD/DDD methodology selection, 20-event…

REAP (c-d-cc/reap) ★ 41

Prevent context loss, scattered development, and forgotten lessons through a generation-based lifecycle where AI and human…

Codex Harness MCP ★ 7

Gives MCP-capable coding agents a local contract-lifecycle harness with governance audits and explicit completion gates.

meta-agent-teams (jbrahy) ★ 2

Build self-improving AI agent teams via a supervised training loop: specialist agents advise, a meta-agent evolves prompts based…

Browser Harness ★ 14k

Thin, self-healing CDP harness connecting an LLM to the user's real Chrome browser with coordinate-first clicking and…