SwarmVault

swarmvault · swarmclawai/swarmvault · ★ 492 · last commit 2026-05-20

Primitive shape 37 total

Commands 35 Skills 1 MCP tools 1

Summary

SwarmVault — Summary

SwarmVault is a local-first LLM Wiki CLI and knowledge graph builder that implements Andrej Karpathy's three-layer pattern (raw sources → wiki → schema) as a production-grade tool. It ingests books, articles, notes, code, URLs, transcripts, and datasets into a durable markdown wiki plus a local SQLite + semantic graph, with MCP server, desktop app, git-based automation, and agent-facing context-pack and task-ledger workflows. The swarmvault CLI has 30+ subcommands covering init, ingest, compile, query, review, graph operations, chat sessions, context builds, task ledgers, and AI export. An OpenClaw-compatible skill file ships in skills/swarmvault/SKILL.md.

SwarmVault is most similar to ccmemory from the seeds (both are graph-based agent memory stores), but differs fundamentally: ccmemory uses Neo4j + vector embeddings for agent memory of code; SwarmVault is a local SQLite + markdown wiki for personal knowledge management across all content types (books, papers, URLs, code, audio, video), with a full CLI, desktop app, MCP server, and an approval queue for LLM-generated changes.

Overview

SwarmVault — Overview

Origin

Built by SwarmClaw AI (swarmvault.ai). Ships under MIT license. Inspired by Andrej Karpathy's LLM Wiki gist (https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f). The README explicitly positions against the original gist:

"If you liked Karpathy's LLM Wiki gist, SwarmVault is the production-grade version."

Philosophy

From the README:

"SwarmVault turns docs, code, transcripts, notes, and URLs into a durable markdown wiki plus a local graph you can inspect, query, and hand to agents."

"In the tradition of Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents — SwarmVault treats the connections between sources as valuable as the sources themselves. The part Bush couldn't solve was who does the maintenance. The LLM handles that."

Three-Layer Architecture (Karpathy's Pattern)

Raw sources (raw/) — immutable, curated source documents. Never modified by SwarmVault.
The wiki (wiki/) — LLM-generated + human-authored markdown. Compounding artifact.
The schema (swarmvault.schema.md) — conventions, grounding, what matters in this domain.

Addressing Community Concerns (from README)

"Won't hallucinations compound?" → Every edge tagged extracted, inferred, or ambiguous. Contradiction detection. Approval queues (compile --approve).
"Does it scale past 100 pages?" → Hybrid search (SQLite FTS + semantic embeddings). compile --max-tokens.
"Do I need API keys?" → No. Built-in heuristic provider is fully offline.

Release Cadence

SKILL.md version 3.15.0 — active development with frequent releases.

Architecture

SwarmVault — Architecture

Distribution

CLI: npm install -g @swarmvaultai/cli
Desktop app: Download from swarmvault.ai/download (macOS, Windows, Linux — bundles own runtime)
Obsidian plugin: packages/obsidian-plugin/

Directory Structure

packages/
  cli/              # @swarmvaultai/cli Node.js package
  engine/           # Core vault engine
  obsidian-plugin/  # Obsidian plugin
  viewer/           # Graph viewer package
skills/
  swarmvault/
    SKILL.md        # OpenClaw-compatible skill file
    README.md
    TROUBLESHOOTING.md
    examples/
    references/
    validation/
templates/
  llm-wiki-schema.md  # Standalone schema template (zero install)
docs/
scripts/
smoke/              # Smoke tests
validation/
worked/

Required Runtime

Node.js >= 24 (CLI)
Desktop app bundles own runtime (no Node.js required)

Install Complexity

One-liner npm install. Desktop app: download binary.

Vault Artifacts (on disk)

raw/               # Immutable source documents
wiki/              # Generated markdown + saved outputs
  outputs/         # Query results, chat sessions, context packs
  dashboards/      # Automated dashboards
  graph/           # Share cards, reports
state/
  graph.json       # Machine-readable knowledge graph
  retrieval/       # Local search index
swarmvault.config.json
swarmvault.schema.md

Output Directory Override

SWARMVAULT_OUT — override root for generated artifact directories.

MCP Server

swarmvault mcp — starts MCP server for agent consumption of vault tools.

Components

SwarmVault — Components

CLI Subcommands (30+)

Init/Setup

swarmvault init — Initialize vault
swarmvault quickstart <path> — Init + ingest + compile + graph viewer
swarmvault demo [--no-serve] — Zero-config walkthrough
swarmvault scan <path> — Fast scratch pass
swarmvault clone <github-url> — Clone + ingest GitHub repo

Ingestion

swarmvault ingest <path> — Ingest source
swarmvault source add <input> — Register recurring source
swarmvault source reload — Refresh registered sources
swarmvault inbox import — Batch capture import

Compilation + Review

swarmvault compile [--approve] [--max-tokens N] — Generate wiki
swarmvault review list|show|accept|reject — Manage approval queue
swarmvault candidate list|promote|archive — Manage new concept candidates

Query + Exploration

swarmvault query "<question>" — Saved query with answer
swarmvault chat "<question>" — Persisted multi-turn conversation
swarmvault chat --resume <id> — Resume prior conversation
swarmvault explore "<question>" [--steps N] — Research loops

Context + Tasks

swarmvault context build "<goal>" --target --budget — Agent handoff bundle
swarmvault context list|show|delete — Manage context packs
swarmvault task start|update|finish|resume — Task ledger

Graph Operations

swarmvault graph serve — Live workspace + workbench
swarmvault graph query "<seed>" — Graph traversal
swarmvault graph share --post|--svg|--bundle — Share artifacts
swarmvault graph merge — Merge multiple graphs
swarmvault graph cluster — Community detection
swarmvault graph blast <target> — Reverse-import impact analysis
swarmvault graph export --html|--json|--obsidian|--canvas|--neo4j
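
The "reverse-import impact analysis" behind graph blast can be pictured as a breadth-first walk over inverted import edges. The following is only a minimal sketch of that idea, not SwarmVault's implementation; the function name `blast_radius` and the dictionary input shape are assumptions made for illustration.

```python
from collections import deque


def blast_radius(imports, target):
    """imports: module -> list of modules it imports (hypothetical shape).

    Returns every module that depends on `target`, directly or transitively.
    """
    # Invert the edges: module -> set of modules that import it.
    importers = {}
    for module, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, set()).add(module)

    affected, queue = set(), deque([target])
    while queue:
        current = queue.popleft()
        for parent in importers.get(current, ()):
            if parent != target and parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


imports = {
    "cli": ["engine"],
    "engine": ["graph", "store"],
    "viewer": ["graph"],
    "store": [],
}
impacted = blast_radius(imports, "graph")
```

Here a change to `graph` reaches `engine` and `viewer` directly and `cli` through `engine`, while `store` is untouched.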

Maintenance

swarmvault doctor [--repair] — Health check + repair
swarmvault lint [--conflicts] [--web] — Schema validation
swarmvault diff — Graph change summary vs. baseline
swarmvault watch — Watch mode with git hooks

Export + Integration

swarmvault export ai --out <dir> — llms.txt, JSON-LD, manifest
swarmvault mcp — Start MCP server

Skill File

skills/swarmvault/SKILL.md — Full OpenClaw-compatible skill with 3.15.0 metadata. Registers with swarmvault or vault binary via OpenClaw's anyBins requirement.

Standalone Template

templates/llm-wiki-schema.md — Zero-install schema template for any LLM agent.

Desktop App

packages/viewer/ + downloadable desktop binary — includes graph viewer with interactive web app.

Obsidian Plugin

packages/obsidian-plugin/ — Obsidian integration.

Prompts

SwarmVault — Prompts

Verbatim Excerpt 1: SKILL.md — Quick Checks (from skills/swarmvault/SKILL.md)

## Quick checks

- Work from the vault root.
- Use `swarmvault next` when you need a read-only orientation command before deciding whether to initialize, ingest, compile, query, review, or refresh.
- If the vault does not exist yet, run `swarmvault init`.
- Use `swarmvault demo --no-serve` when the user wants the fastest zero-config walkthrough before pointing SwarmVault at their own sources.
- Use `swarmvault quickstart <file-or-directory-or-github-url>` as the beginner-friendly first-run path when the user wants init + ingest + compile + graph viewer in one command.
- Read `swarmvault.schema.md` before compile or query work. It is the vault's operating contract.
- If `wiki/graph/report.md` exists, use it before broad repo search.

Technique: Read state before acting. swarmvault next serves as a read-only orientation gate before any action, and the checks set an explicit preference order (orientation → schema → report → broad search) that forms a canonical state-reading sequence.

Verbatim Excerpt 2: SKILL.md — Working Rules

## Working rules

- Prefer changing the schema before re-running compile when organization or grounding is wrong.
- Treat `wiki/` and `state/` as first-class outputs. Inspect them instead of trusting a single chat answer.
- Use saved chat transcripts and static AI exports as durable handoff artifacts when the user asks for continuity across sessions or tools.
- Keep raw sources immutable. Put corrections in schema, new sources, or saved outputs rather than manually rewriting generated provenance.

Technique: Immutable inputs with a mutable schema. "Treat wiki/ and state/ as first-class outputs" means the generated artifacts are trusted over live LLM responses. The rules also prohibit one anti-pattern outright: manually rewriting generated provenance.

Verbatim Excerpt 3: Contradiction Handling (from README)

"Won't hallucinations compound?" — Every edge is tagged `extracted`, `inferred`, or `ambiguous`. Contradiction detection flags conflicting claims. `compile --approve` stages all changes into reviewable approval bundles. New concepts land in `wiki/candidates/` first. `lint --conflicts` audits for contradictions on demand.

Technique: Evidence-class tagging (extracted/inferred/ambiguous) + approval gate before committing LLM output to the wiki. Addresses a known failure mode directly.
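
The README does not describe how contradiction detection works internally. As a rough illustration of the general idea only, a naive check could group claims by subject and relation and flag any group that carries more than one value. The claim tuple shape and the name `find_conflicts` are assumptions, not SwarmVault's data model.

```python
from collections import defaultdict


def find_conflicts(claims):
    """claims: iterable of (subject, relation, value, source) tuples (hypothetical).

    Returns {(subject, relation): {value: [sources]}} for every key that has
    more than one distinct value.
    """
    by_key = defaultdict(lambda: defaultdict(list))
    for subject, relation, value, source in claims:
        by_key[(subject, relation)][value].append(source)
    return {key: dict(values) for key, values in by_key.items() if len(values) > 1}


claims = [
    ("memex", "year", "1945", "raw/bush.md"),
    ("memex", "year", "1948", "raw/blog.md"),
    ("memex", "author", "Vannevar Bush", "raw/bush.md"),
]
conflicts = find_conflicts(claims)
```

Only the `("memex", "year")` key is reported, with each competing value traced back to the source that asserted it. That pairing of value and source is what a reviewer would need in order to decide which claim to keep.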

Uniqueness

SwarmVault — Uniqueness

Differs From Seeds

SwarmVault is most similar to ccmemory from the seeds (both are graph-based agent memory stores with MCP servers). Differences: ccmemory uses Neo4j + vector embeddings for agent memory of code context; SwarmVault uses SQLite FTS + optional embeddings for personal knowledge management across all content types (30+ formats), with a full CLI (30+ subcommands), desktop app, Obsidian plugin, approval queues, task ledgers, and context packs. SwarmVault's offline-first heuristic provider (no API keys needed) contrasts with ccmemory's Neo4j infrastructure requirement. The evidence-class tagging (extracted/inferred/ambiguous) and contradiction detection are not found in any seed. The llms.txt AI export format and swarmvault context build bounded handoff bundles are unique.

Positioning

Production-grade implementation of Karpathy's LLM Wiki pattern — the knowledge vault for individuals and teams who want compounding knowledge rather than ephemeral chat. Positioned explicitly against Karpathy's original gist and Obsidian (as an alternative with AI-maintained connections).

Observable Failure Modes

Node >= 24 requirement: Strict version requirement may create friction on older systems
LLM hallucination compounding: Despite approval queues, continuous compile + accept could let errors accumulate in wiki
Schema complexity: swarmvault.schema.md must be well-defined to get useful output; poor schema = poor wiki
30+ subcommands: Steep learning curve; swarmvault next helps but complexity is real
Watch mode + git hooks: Background daemon adds complexity for simpler use cases

What Makes It Extraordinary

The evidence-class tagging system (extracted/inferred/ambiguous), combined with approval queues, explicitly accounts for LLM hallucination compounding, which is the most common criticism of wiki-style knowledge systems. The context build command's bounded handoff for agents (token-limited evidence packs) is the most agent-aware handoff mechanism in the corpus.

Workflow

SwarmVault — Workflow

Core Loop (from SKILL.md)

Step	Command	Description
1	`swarmvault next`	Read-only orientation (what state is vault in?)
2	`swarmvault init`	Create vault (if not exists)
3	Edit `swarmvault.schema.md`	Set naming rules, categories, grounding
4	`swarmvault source add <path>`	Register recurring source
5	`swarmvault ingest <path>`	One-off ingestion
6	`swarmvault compile`	Generate wiki from sources
7	`swarmvault review list|accept|reject`	Manage approval queue
8	`swarmvault query "<question>"`	Saved query
9	`swarmvault graph serve`	Interactive graph workspace
10	`swarmvault mcp`	Expose as MCP server
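
The core loop can also be scripted. The sketch below assembles a subset of the commands from the table in order and defaults to a dry run that only prints them. The helper names `core_loop` and `run` are hypothetical; the subcommands and the --approve flag are the ones listed in this profile.

```python
import shutil
import subprocess


def core_loop(source_path, question):
    """Return core-loop commands in order, as argv lists."""
    return [
        ["swarmvault", "next"],                  # read-only orientation
        ["swarmvault", "init"],                  # only needed if no vault exists yet
        ["swarmvault", "ingest", source_path],   # one-off ingestion
        ["swarmvault", "compile", "--approve"],  # stage changes for review
        ["swarmvault", "review", "list"],        # inspect the approval queue
        ["swarmvault", "query", question],       # saved query
    ]


def run(commands, dry_run=True):
    """Print each command; execute only when dry_run is False and the CLI is installed."""
    for argv in commands:
        print("$", " ".join(argv))
        if not dry_run and shutil.which(argv[0]):
            subprocess.run(argv, check=True)


commands = core_loop("./notes", "what is the memex?")
run(commands)  # dry run: prints the plan without touching any vault
```

Passing argv lists rather than a shell string avoids quoting problems when the question contains spaces or punctuation.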

Quick Start Path

npm install -g @swarmvaultai/cli
swarmvault quickstart ./your-repo
# → init + ingest + compile + graph viewer in one command

Approval Queue Workflow

Changes via compile --approve are staged into review queue before committing to wiki:

swarmvault review list — see pending changes
swarmvault review accept <id> — approve and apply
swarmvault review reject <id> — discard
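
Conceptually, the approval queue separates "proposed" from "committed" state. A minimal in-memory sketch of that separation follows; the `ReviewQueue` class, its method names, and the change-id format are all hypothetical and say nothing about how SwarmVault stores its queue on disk.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewQueue:
    wiki: dict = field(default_factory=dict)     # page path -> committed content
    pending: dict = field(default_factory=dict)  # change id -> (path, proposed content)
    _next_id: int = 1

    def stage(self, path, content):
        """Record a proposed change without touching the wiki."""
        change_id = f"chg-{self._next_id}"
        self._next_id += 1
        self.pending[change_id] = (path, content)
        return change_id

    def accept(self, change_id):
        """Apply a staged change to the wiki and drop it from the queue."""
        path, content = self.pending.pop(change_id)
        self.wiki[path] = content

    def reject(self, change_id):
        """Discard a staged change."""
        self.pending.pop(change_id)


queue = ReviewQueue()
first = queue.stage("wiki/memex.md", "Memex: proposed by Vannevar Bush.")
second = queue.stage("wiki/bush.md", "Unverified biography draft.")
queue.accept(first)
queue.reject(second)
```

Nothing reaches `wiki` until `accept` is called, which is the property the approval gate is meant to provide.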

Agent Handoff Workflow

swarmvault context build "implement feature X" --target src/ --budget 8000
# → Creates bounded evidence pack for agent consumption

swarmvault task start "implement feature X" --target src/
# → Creates task ledger for durable decision tracking

Git Automation (optional)

swarmvault ingest|compile|query --commit — auto-commit wiki + state changes after each run.

Watch Mode

swarmvault watch --lint --repo — auto-refresh on file changes; git hooks via swarmvault hook install.

Memory Context

SwarmVault — Memory & Context

State Storage

state/graph.json — machine-readable knowledge graph (nodes + edges with evidence tags)
state/retrieval/ — local search index (SQLite FTS + optional semantic embeddings)
wiki/ — generated markdown pages (persistent, compounding artifact)
swarmvault.config.json — vault configuration
swarmvault.schema.md — vault operating contract

Memory Type

Hybrid — SQLite FTS for full-text search + optional vector embeddings for semantic search + JSON graph for typed knowledge graph.
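
To make the hybrid idea concrete, here is a small sketch that blends a lexical score with an optional embedding similarity. It is a pure-Python stand-in: the term-overlap score substitutes for SQLite FTS ranking, the two-dimensional vectors are toy embeddings, and the `alpha` weighting is an assumption rather than anything documented for SwarmVault.

```python
import math


def keyword_score(query, text):
    """Fraction of query terms present in the text (stand-in for FTS ranking)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(query, query_vec, docs, alpha=0.5):
    """docs: list of (doc_id, text, embedding_or_None). Returns ids, best first."""
    scored = []
    for doc_id, text, vec in docs:
        lexical = keyword_score(query, text)
        if vec is None:
            score = lexical  # embeddings are optional: lexical-only fallback
        else:
            score = alpha * lexical + (1 - alpha) * cosine(query_vec, vec)
        scored.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(scored, key=lambda s: s[0], reverse=True)]


docs = [
    ("export", "graph export to neo4j", [1.0, 0.0]),
    ("review", "approval queue review", [0.0, 1.0]),
    ("bundles", "approval bundles", None),
]
ranking = hybrid_rank("approval queue", [0.0, 1.0], docs)
```

The page without an embedding still ranks on its lexical match, which mirrors the "optional" role embeddings play in the description above.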

Evidence Classification

Each graph edge is tagged:

extracted — directly from source text
inferred — LLM-derived from evidence
ambiguous — conflicting or uncertain claims
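
One way to picture the tagging is as an enum attached to each edge, so that downstream tooling can treat anything not directly extracted as needing a second look. The `Edge` fields and the `needs_review` helper below are hypothetical; the actual graph.json schema is not documented in this profile.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(str, Enum):
    EXTRACTED = "extracted"  # directly from source text
    INFERRED = "inferred"    # LLM-derived from evidence
    AMBIGUOUS = "ambiguous"  # conflicting or uncertain claims


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str
    evidence: Evidence


def needs_review(edges):
    """Return edges that are not backed directly by source text."""
    return [e for e in edges if e.evidence is not Evidence.EXTRACTED]


edges = [
    Edge("memex", "vannevar-bush", "proposed_by", Evidence.EXTRACTED),
    Edge("memex", "llm-wiki", "precursor_of", Evidence.INFERRED),
    Edge("llm-wiki", "obsidian", "compared_with", Evidence.AMBIGUOUS),
]
review_candidates = needs_review(edges)
```

Using a string-valued enum keeps the tags identical to the three literal values above while rejecting any tag outside that set.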

Cross-Session Handoff

Yes — the wiki, graph, and search index all persist across sessions. Chat sessions saved to wiki/outputs/chat-sessions/ for cross-session continuity.

Context Packs

swarmvault context build "<goal>" --budget <tokens> — creates bounded evidence pack for agent handoff. Saved to state/ for reuse.
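
A bounded evidence pack is, in essence, a selection problem under a token budget. The sketch below fills the budget greedily, most relevant page first. The relevance scores, the four-characters-per-token estimate, and the function names are assumptions for illustration and do not reflect how SwarmVault actually selects or counts tokens.

```python
def estimate_tokens(text):
    """Crude estimate: roughly four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def build_pack(candidates, budget):
    """candidates: list of (relevance, page_id, text). Greedy fill, best first.

    Returns (selected page ids, tokens used). Pages that do not fit are skipped
    so that smaller, less relevant pages can still use the remaining budget.
    """
    pack, used = [], 0
    for relevance, page_id, text in sorted(candidates, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            pack.append(page_id)
            used += cost
    return pack, used


candidates = [
    (0.9, "auth", "x" * 400),  # about 100 tokens
    (0.8, "db", "y" * 800),    # about 200 tokens
    (0.5, "ui", "z" * 200),    # about 50 tokens
]
pack, used = build_pack(candidates, budget=160)
```

With a budget of 160, the pack takes `auth`, skips `db` because it would overflow, and still fits `ui`.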

Task Ledger

swarmvault task start|update|finish|resume — durable task record with decisions, linked context packs, changed paths, outcomes, follow-ups.

AI Export

swarmvault export ai --out <dir> — generates llms.txt, full text, JSON-LD graph data, manifest for external agents/crawlers.
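
For readers unfamiliar with llms.txt, the sketch below renders a minimal file in the general shape the llms.txt proposal describes: an H1 title, a blockquote summary, and H2 sections of markdown links. Whether SwarmVault's export matches this layout exactly is not confirmed here; the function name and sample data are hypothetical.

```python
def render_llms_txt(title, summary, sections):
    """sections: dict of heading -> list of (name, url, note)."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


llms_txt = render_llms_txt(
    "Research Vault",
    "Compiled notes on personal knowledge systems.",
    {"Pages": [("Memex", "wiki/memex.md", "Origins of associative trails")]},
)
```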

Graph Snapshots

swarmvault diff — compare current graph against last committed baseline in git.
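
At its simplest, a graph change summary is a set difference between two edge sets. The sketch below assumes edges are represented as (source, relation, target) triples, which is an illustrative simplification rather than the documented graph.json format.

```python
def diff_graphs(baseline, current):
    """Each graph is a set of (source, relation, target) triples."""
    return {
        "added": sorted(current - baseline),
        "removed": sorted(baseline - current),
    }


baseline = {("memex", "proposed_by", "vannevar-bush")}
current = {
    ("memex", "proposed_by", "vannevar-bush"),
    ("memex", "precursor_of", "llm-wiki"),
}
summary = diff_graphs(baseline, current)
```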

Compaction

compile --max-tokens <N> — bounds generated wiki to fit within token budget.

Orchestration

SwarmVault — Orchestration

Multi-Agent Support

Yes — via MCP server. Multiple agents can consume vault tools via swarmvault mcp. Context packs and task ledgers are explicitly designed for agent handoffs.

Orchestration Pattern

Sequential — the vault's core loop is sequential (init → ingest → compile → query). MCP enables parallel tool consumption by multiple agents.

Isolation Mechanism

None — vault is a local filesystem. No container or sandbox isolation.

Subagent Definition Format

Not applicable: SwarmVault is not an agent runtime. The swarmvault skill file lets agents use vault tools.

Multi-Model Usage

Yes — configurable provider system:

heuristic — local offline (default, no API keys)
Ollama + Gemma (recommended fully-local setup: ollama pull gemma4)
Any OpenAI-compatible backend
Cloud providers (optional)

Provider tasks: tasks.compileProvider, tasks.queryProvider, tasks.lintProvider, tasks.audioProvider
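
Per-task provider keys suggest a lookup with a default. The sketch below resolves a provider for a task and falls back to the offline heuristic provider when no key is set. The nested dictionary shape, the fallback behavior, and the provider name "local-ollama" are assumptions; only the key names and the heuristic default come from the description above.

```python
DEFAULT_PROVIDER = "heuristic"  # the offline default named above


def provider_for(config, task):
    """Look up tasks.<task>Provider, falling back to the offline default."""
    return config.get("tasks", {}).get(f"{task}Provider", DEFAULT_PROVIDER)


# Hypothetical config: the provider name "local-ollama" is made up for illustration.
config = {"tasks": {"compileProvider": "local-ollama"}}
compile_provider = provider_for(config, "compile")
lint_provider = provider_for(config, "lint")
```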

Execution Mode

Interactive loop — CLI-driven, each command runs to completion. Watch mode adds background daemon capability.

Context Compaction

Yes — compile --max-tokens <N> explicitly bounds output to token budget. The context build command creates bounded evidence packs.

Crash Recovery

No — vault state is file-based; no rollback. Approval queues provide a safety gate before wiki changes are committed.

Cross-Session Handoff

Yes — graph.json, wiki/, context packs, task ledgers, chat sessions all persist and are designed for handoff.

Streaming Output

graph serve starts a live web workspace with real-time updates.

UI / CLI Surface

SwarmVault — UI / CLI Surface

CLI Binary

Name: swarmvault (alias: vault)
Install: npm install -g @swarmvaultai/cli
Node >= 24 required
Not a thin wrapper — own Node.js engine

CLI Subcommands (30+)

Core: init, quickstart, demo, scan, clone, ingest, source, compile, review, query, chat, explore, context, task, export, mcp, graph, doctor, lint, diff, watch, next

Graph operations: graph serve, graph query, graph share, graph merge, graph cluster, graph blast, graph export, graph tree, graph stats, graph validate, graph status

Local Web UI

Graph viewer: swarmvault graph serve starts a live workspace (port not stated)
- Health workbench
- Memory dashboard
- Bookmarklet clipper
- Prioritized next actions
- Interactive graph navigation
Desktop app: Download from swarmvault.ai/download
- Bundled runtime (no Node.js required)
- macOS, Windows, Linux

Obsidian Plugin

packages/obsidian-plugin/ — Obsidian integration for wiki browsing.

MCP Server

swarmvault mcp — MCP server for agent consumption of vault tools:

Browse, search, query wiki
Build context packs
Manage tasks
Inspect vault health

Cross-Tool Portability

High — llms.txt + JSON-LD AI export format for any agent/crawler. MCP server for any MCP client. Standalone schema template (templates/llm-wiki-schema.md) for zero-install usage.

Observability

swarmvault doctor — health check across graph, retrieval, review queues, watch state, migrations, managed sources, task state
swarmvault lint — schema validation + contradiction detection
swarmvault diff — graph change summary

Distribution

Type: npm-package
License: MIT
Install: one-liner
Version: 3.15.0

Surfaces

CLI binary: swarmvault
CLI subcmds: 35
Local UI: web-dashboard
Tech stack: React/web (graph viewer) + Electron (desktop app)

Components

Commands: 35
Skills: 1
Subagents: 0
Hooks: 0
MCP servers: 1
Scripts: 1
Templates: 1

Workflow

Phases: 9
Approval gates: 2
Spec format: markdown
Spec storage: flat-files
Delta or full: mixed

Orchestration

Multi-agent: Yes
Pattern: sequential
Isolation: none
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text+vision

Execution

Mode: interactive-loop
Crash recovery: No
Compaction: Yes
Session handoff: Yes
Streaming: Yes

Memory

Type: hybrid
Persistence: global
Search: hybrid
State files: 5 files

Quality

TDD: No
TDD mechanism: none
Validators: 1
Self-review: none

Git / Observability

Auto commit: Yes
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: structured-md
Replay: No

Tools

Primary: Claude Code
Targets: 5
Portability: high

Signals

Stars: 492
Last commit: 2026-05-20
Maintainer: active
Quality score: 4.1/10