Loki Mode

loki-mode · asklokesh/loki-mode · ★ 944 · last commit 2026-05-25

Takes a spec (PRD/issue/OpenAPI) and autonomously produces a production-ready Git repository through RARV cycles, 11 blocking quality gates, and 41 specialized agents.

Best when: Autonomous agents should never stop or ask questions; RARV cycles with graduated retry (3→simplify→5→dead-letter) and 11 quality gates are the only acceptab…

Skip if: Asking clarifying questions during autonomous execution; shipping code that fails any of the 11 quality gates


Primitive shape 116 total

Commands 30 · Skills 20 · Subagents 41 · MCP tools 25

Summary

Loki Mode — Summary

Loki Mode is an autonomous multi-agent system that takes a spec input (Markdown PRD, GitHub issue, OpenAPI/YAML doc, or one-line brief) and produces a Git repository with source code, tests, Docker configs, CI/CD pipelines, and audit logs — with minimal human intervention. Its central execution model is RARV (Reason-Act-Reflect-Verify) cycles driven by 41 specialized agent types organized into 8 domain swarms (Engineering, Operations, Business, Data, Product, Growth, Review, Orchestration), with 11 quality gates that must pass before code is considered done. It ships an npm CLI (loki), a Python MCP server (25 tools for task queue, memory, state management, and code search), a vanilla-JS web dashboard (port 57374), and an episodic/semantic/procedural memory system with ChromaDB vector search. At version 7.7.11 with 944 stars, Loki Mode is the most technically ambitious framework in this batch — it spans the full stack from PRD ingestion through deployment with automated rollback, blind 3-reviewer code review, anti-sycophancy checks, and a legacy code healing mode. Compared to seeds, it most closely resembles claude-flow's MCP-anchored, multi-agent swarm architecture but with a different primary language (Bash→Bun migration) and a stronger focus on PRD-to-deployed-product automation rather than task graph management.

Overview

Loki Mode — Overview

Origin

Created by asklokesh (asklokesh/loki-mode). Licensed under BUSL-1.1 (free for personal, internal, and academic use; commercial use requires a license). Version 7.7.11, last pushed 2026-05-25 (very active). Written in Python and TypeScript/Bun, with a Bash runtime still in use during the migration.

Philosophy

From SKILL.md:

"You are an autonomous agent. You make decisions. You do not ask questions. You do not stop." "Spec in, product out."

Loki Mode is built on the premise that a truly autonomous system should be able to convert any spec format into shipped code without hand-holding. The RARV (Reason-Act-Reflect-Verify) cycle is the fundamental execution primitive: every action is preceded by reasoning about priority, followed by reflection on outcome, and gated by automated verification. A failed verification triggers a retry with a different approach; after 3 failures the agent switches to a simpler approach, and after 5 failures the task is logged to the dead-letter queue and the agent moves on. The design rules out both silent failure and infinite loops.

The 11 quality gates are not optional checkpoints; they are blocking conditions. Critical/High/Medium severity findings block merges. The blind 3-reviewer system with anti-sycophancy check (if all 3 agree, run a Devil's Advocate reviewer) is explicitly designed to prevent the model from rubber-stamping its own work.

Key design philosophies

Autonomy over interactivity: Unlike other frameworks in this batch, Loki Mode is explicitly designed to run unattended. It requires --dangerously-skip-permissions.
Production quality as default: 11 quality gates, test mutation detector, backward compatibility gate, documentation coverage gate.
Memory compounds over time: The compound learning system extracts novel solutions (bug fixes, non-obvious patterns) to ~/.loki/solutions/ for future reuse.
Multi-provider resilience: Claude, Codex, Cline, Aider with automatic failover.
Legacy healing as first-class concern: loki heal works through archaeology, stabilize, isolate, modernize, and validate phases for legacy codebases.

Runtime migration

Loki is undergoing a Bash-to-Bun migration. Most commands still run on the Bash runtime (autonomy/loki). Read-only commands (version, status, doctor, etc.) have been ported to Bun (loki-ts/). Rollback: LOKI_LEGACY_BASH=1.
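
The routing rule described above can be sketched as follows. This is an illustration in Python, not the actual bin/loki shim; the function name `choose_runtime` is hypothetical, and the set of ported commands lists only the three the text names (the real list is longer).

```python
import os
import shutil

# Commands the text names as already ported to Bun; the real list continues ("etc.").
PORTED_TO_BUN = {"version", "status", "doctor"}


def choose_runtime(command, env=None, bun_available=None):
    """Return "bun" or "bash" for a given subcommand (hypothetical helper)."""
    env = os.environ if env is None else env
    if bun_available is None:
        bun_available = shutil.which("bun") is not None

    # LOKI_LEGACY_BASH=1 is the documented rollback switch.
    if env.get("LOKI_LEGACY_BASH") == "1":
        return "bash"
    # Fall back to Bash when Bun is missing or the command is not ported yet.
    if not bun_available or command not in PORTED_TO_BUN:
        return "bash"
    return "bun"
```

Under these assumptions, `choose_runtime("start")` always resolves to the Bash runtime, which matches the statement that most commands have not been ported yet.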

Architecture

Loki Mode — Architecture

Distribution

npm package: loki-mode
Binary: loki
Version: 7.7.11
Install: bun install -g loki-mode (recommended) or npm install -g loki-mode
Also: brew tap asklokesh/tap && brew install loki-mode
Also: docker pull asklokesh/loki-mode:7.5.11

Required runtime

Bun 1.3.0+ (recommended; bash fallback if unavailable)
Python 3.x (for MCP server and memory engine)
Docker (optional, for sandbox mode)

Directory tree (source)

asklokesh/loki-mode/
├── bin/loki                   # Shim: routes to Bun CLI or bash fallback
├── autonomy/loki              # Main Bash runtime (30+ subcommands)
├── loki-ts/                   # TypeScript/Bun runtime (migration target)
│   └── dist/loki.js           # Bundled Bun binary (~152KB)
├── mcp/
│   └── server.py              # Python MCP server (25 tools)
├── memory/
│   └── engine.py              # Episodic/semantic/procedural memory + ChromaDB
├── dashboard-ui/              # Vanilla JS web components (port 57374)
├── dashboard/                 # Dashboard API server
├── skills/                    # 20 skill/reference markdown files
├── references/                # 24 reference documents
├── agents/                    # Agent type references
├── swarm/                     # Swarm coordination
├── vscode-extension/          # VS Code extension (bundled)
├── .loki/                     # Per-project state directory
│   ├── state/orchestrator.json
│   ├── queue/pending.json
│   ├── session.json
│   └── PAUSE / STOP           # Control files
└── SKILL.md                   # Claude Code skill entry point

State files (per project)

.loki/
├── state/orchestrator.json    # currentPhase, tasksCompleted, tasksFailed
├── queue/pending.json         # task queue
├── queue/dead-letter.json     # tasks dead-lettered after 5 failures
├── session.json               # session registration (pid, provider, status)
├── docs/                      # API documentation coverage
├── healing/                   # Healing mode state
│   ├── friction-map.json
│   ├── behavioral-baseline/
│   └── characterization-tests/
├── metrics/                   # Performance benchmarks
└── solutions/                 # Compound learning repository

Target AI tools

Claude Code (primary), Codex, Cline, Aider (multi-provider via abstraction layer)

MCP server port

Default STDIO; HTTP mode available (--transport http)

Dashboard port

57374 (hardcoded default)

Components

Loki Mode — Components

CLI Subcommands (30+)

Key subcommands identified from autonomy/loki help text:

Command	Purpose
`loki start [PRD|ISSUE|…]`	Start an autonomous run from a spec
`loki quick "<brief>"`	One-line task without scaffolding
`loki init <dir>`	Initialize project
`loki issue <url/ref>`	Build from GitHub/GitLab/Jira issue
`loki plan <prd>`	Generate plan without executing
`loki heal`	Legacy codebase healing mode
`loki memory list/index`	Memory management
`loki dashboard`	Start web dashboard
`loki web`	Start web server
`loki doctor`	Health check
`loki provider show/list`	Provider management
`loki self-update`	In-place upgrade
`loki checkpoint`	Checkpoint/restore
`loki audit`	Audit report
`loki metrics`	Metrics report
`loki status`	Current project status
`loki config`	Configuration management
`loki stop`	Stop running session
`loki cluster`	Multi-machine cluster management
`loki magic`	Magic module commands

MCP Server Tools (25)

Exposed via mcp/server.py:

Tool	Purpose
`loki_memory_retrieve`	Retrieve from memory
`loki_memory_store_pattern`	Store a solution pattern
`loki_task_queue_list`	List task queue
`loki_task_queue_add`	Add task to queue
`loki_task_queue_update`	Update task status
`loki_state_get`	Get orchestrator state
`loki_metrics_efficiency`	Efficiency metrics
`loki_consolidate_memory`	Consolidate memory
`loki_complete_task`	Mark task complete
`loki_start_project`	Initialize project
`loki_project_status`	Project status
`loki_agent_metrics`	Agent performance metrics
`loki_checkpoint_restore`	Checkpoint management
`loki_quality_report`	Quality gate report
`loki_code_search`	ChromaDB code search
`loki_code_search_stats`	Code search statistics
`mem_search`	Memory search
`mem_timeline`	Memory timeline
`mem_get`	Get memory entry
`loki_get_hotspots`	Code hotspots
`loki_get_co_changes`	Co-change analysis
`loki_get_doc_coverage`	Documentation coverage
`loki_findings`	Review findings
`loki_learnings`	Compound learnings
`loki_counter_evidence_template`	Counter-evidence for review

Agent Types (41 across 8 swarms)

Engineering (8): eng-frontend, eng-backend, eng-database, eng-mobile, eng-api, eng-qa, eng-perf, eng-infra

Operations (8): ops-devops, ops-sre, ops-security, ops-monitor, ops-incident, ops-release, ops-cost, ops-compliance

Business (8): biz-marketing, biz-sales, biz-finance, biz-legal, biz-support, biz-hr, biz-investor, biz-partnerships

Data (3): data-analyst, data-ml, data-infra

Product (3+): product management, UX research roles

Growth, Review, Orchestration swarms

Skills / Reference Files (20)

skills/00-index.md, skills/agents.md, skills/quality-gates.md, skills/memory.md, skills/parallel-workflows.md, skills/healing.md, skills/testing.md, skills/providers.md, skills/documentation.md, skills/github-integration.md, and 10 more.

Dashboard UI Components

Vanilla JS Web Components (Shadow DOM):

loki-task-board — Kanban drag-and-drop
loki-session-control — Start/stop/pause buttons
loki-log-stream — Live log streaming
loki-metrics-panel — Performance metrics

Prompts

Loki Mode — Prompts

Excerpt 1: RARV Cycle — Autonomy Rule

Source: SKILL.md

## PRIORITY 2: Execute (RARV Cycle)

Every action follows this cycle. No exceptions.

REASON: What is the highest priority unblocked task?
   |
   v
ACT: Execute it. Write code. Run commands. Commit atomically.
   |
   v
REFLECT: Did it work? Log outcome.
   |
   v
VERIFY: Run tests. Check build. Validate against spec.
   |
   +--[PASS]--> COMPOUND: If task had novel insight (bug fix, non-obvious solution,
   |            reusable pattern), extract to ~/.loki/solutions/{category}/{slug}.md
   |            with YAML frontmatter (title, tags, symptoms, root_cause, prevention).
   |            See skills/compound-learning.md for format.
   |            Then mark task complete. Return to REASON.
   |
   +--[FAIL]--> Capture error in "Mistakes & Learnings".
                Rollback if needed. Retry with new approach.
                After 3 failures: Try simpler approach.
                After 5 failures: Log to dead-letter queue, move to next task.

Technique: State machine with explicit failure escalation tiers. The RARV cycle is a fixed 4-step loop where VERIFY is non-optional and COMPOUND is an automatic learning step on success. Failure follows a graduated retry policy (3 retries → simplify → dead-letter) rather than infinite looping or silent failure.
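
A minimal sketch of the escalation tiers, assuming a failure counter per task. The `Task` class, the action names, and the `attempt` callable are hypothetical; only the thresholds (3 and 5 failures) come from the SKILL.md excerpt.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    failures: int = 0
    mistakes: list = field(default_factory=list)


def next_step_after_failure(task, error):
    """Record a failed VERIFY and return the next action for the task."""
    task.failures += 1
    task.mistakes.append(error)  # stands in for the "Mistakes & Learnings" log
    if task.failures >= 5:
        return "dead-letter"  # give up and move to the next task
    if task.failures >= 3:
        return "simplify"  # retry with a simpler approach
    return "retry"  # retry with a new approach


def run_task(task, attempt):
    """Drive one task until VERIFY passes or the task is dead-lettered.

    `attempt(task, mode)` is a hypothetical callable that performs
    ACT/REFLECT/VERIFY and returns (passed, error_message).
    """
    mode = "normal"
    while True:
        passed, error = attempt(task, mode)
        if passed:
            return "complete"
        mode = next_step_after_failure(task, error)
        if mode == "dead-letter":
            return "dead-letter"
```

Because the counter only increases and the loop exits at 5 failures, a task can be attempted at most five times, which is what prevents infinite looping.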

Excerpt 2: Context Load — Every Turn

Source: SKILL.md PRIORITY 1

## PRIORITY 1: Load Context (Every Turn)

Execute these steps IN ORDER at the start of EVERY turn:

IF first turn of session:
- Read skills/00-index.md
- Load 1-2 modules matching your current phase
- Register session: Write .loki/session.json with: {"pid": null, "startedAt": "", "provider": "", "invokedVia": "skill", "status": "running", "updatedAt": ""}
Read .loki/state/orchestrator.json
- Extract: currentPhase, tasksCompleted, tasksFailed
Read .loki/queue/pending.json
- IF empty AND phase incomplete: Generate tasks for current phase
- IF empty AND phase complete: Advance to next phase
Check .loki/PAUSE - IF exists: Stop work, wait for removal.
Check .loki/STOP - IF exists: End session, update session.json status to "stopped".
EVERY TURN: Update .loki/session.json "updatedAt" field to current ISO timestamp. This keeps the dashboard aware the skill session is alive.

Technique: Mandatory per-turn context reload. Every turn starts with reading the persistent state files (orchestrator.json, pending.json), checking control files (PAUSE, STOP), and updating the heartbeat timestamp. This enables the dashboard to detect stale sessions and enables external control of the agent via filesystem signals.
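
The per-turn reload could look roughly like this. It is a sketch under assumptions: pending.json is taken to hold a JSON list, the returned action names ("pause", "stop", "work", "plan") are invented for illustration, and the function name `load_turn_context` is hypothetical. The file paths and the `updatedAt`/`status` fields come from the excerpt.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def load_turn_context(project_dir):
    """Read state, honor control files, and refresh the session heartbeat."""
    loki = Path(project_dir) / ".loki"
    state = json.loads((loki / "state" / "orchestrator.json").read_text())
    pending = json.loads((loki / "queue" / "pending.json").read_text())

    session_path = loki / "session.json"
    session = json.loads(session_path.read_text())

    # Control files are checked in the order the excerpt lists them.
    if (loki / "PAUSE").exists():
        action = "pause"
    elif (loki / "STOP").exists():
        action = "stop"
        session["status"] = "stopped"
    elif pending:
        action = "work"
    else:
        action = "plan"  # empty queue: generate tasks or advance the phase

    # Heartbeat: written every turn so a dashboard can spot stale sessions.
    session["updatedAt"] = datetime.now(timezone.utc).isoformat()
    session_path.write_text(json.dumps(session, indent=2))
    return action, state.get("currentPhase"), pending
```

Keeping all inputs on disk means an external process can steer the agent by creating or deleting a file, with no IPC channel required.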

Excerpt 3: Quality Gate 4 — Anti-Sycophancy

Source: skills/quality-gates.md

## Gate 4: Anti-Sycophancy Check

If [blind reviewer] unanimous approval, run Devil's Advocate reviewer

**Checks:**
- If all 3 reviewers approve → spawn additional "Devil's Advocate" reviewer
- Devil's Advocate is explicitly instructed to find problems, not validate
- If Devil's Advocate finds issues → treat as reviewer finding
- If Devil's Advocate finds nothing → unanimous approval confirmed

Technique: Adversarial subagent as sycophancy check. The Devil's Advocate reviewer is only spawned when unanimous approval would otherwise pass — turning reviewer consensus into a trigger for additional scrutiny rather than a fast exit.
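
The trigger logic is small enough to sketch directly. Here each review is modeled as a list of findings (empty means approval) and `devils_advocate` is a hypothetical callable standing in for the extra reviewer; neither shape is taken from the Loki source.

```python
def review_verdict(reviews, devils_advocate):
    """Combine blind reviews, escalating only on unanimous approval.

    `reviews` holds one list of findings per reviewer; an empty list means
    that reviewer approved. `devils_advocate` returns a list of findings.
    """
    findings = [finding for review in reviews for finding in review]
    if findings:
        # At least one reviewer objected, so no Devil's Advocate is needed.
        return {"clean": False, "findings": findings, "escalated": False}

    # Unanimous approval is the trigger for extra scrutiny.
    extra = list(devils_advocate())
    return {"clean": not extra, "findings": extra, "escalated": True}
```

Note that the Devil's Advocate is never called when a reviewer has already raised a finding, so the extra cost is paid only on the path that would otherwise be a fast exit.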

Uniqueness

Loki Mode — Uniqueness

Differs from Seeds

Closest seed is claude-flow (MCP-anchored, 41+ agent types, SQLite+vector memory, swarm patterns). Loki Mode differs from claude-flow in three ways: (1) Loki uses Python + Bash/Bun as its primary runtime vs claude-flow's Node.js, and ships a web dashboard as a first-class UI rather than TUI; (2) Loki's 11 quality gates are more prescriptive and blocking — the anti-sycophancy check (run Devil's Advocate on unanimous reviewer approval) and test mutation detector (flag assertion value changes alongside implementation changes) are novel gate types not seen in claude-flow; (3) Loki's compound learning system (~/.loki/solutions/) builds a global knowledge base of novel solutions across projects, whereas claude-flow's memory is per-project. Loki Mode is also unique in shipping a legacy healing mode (loki heal) with a friction map, characterization tests, and backward compatibility gate — no seed addresses legacy codebase modernization.

Positioning

Loki Mode is the "autonomous startup builder": it targets developers who want to describe what they're building and walk away, trusting the system to produce tested, documented, deployed code. Among the frameworks in this batch it has the largest CLI surface, the most MCP tools, and the richest quality gate system, and it is the only one with both a web dashboard and Docker support.

Observable Failure Modes

BUSL-1.1 license: Not truly open source for commercial use. Creates adoption friction for enterprise teams.
Bash runtime deprecation timeline: Phase 6 (sunset bash) has no firm date. Users must track UPGRADING.md carefully.
Gemini CLI deprecated in v7.5.18: Teams relying on Gemini had no warning period.
Autonomy by default: --dangerously-skip-permissions is required for normal operation. Security-conscious teams may be uncomfortable.
Context window size: 41 agent types, 25 MCP tools, 20 skill files — loading the full system for a simple task is expensive.
Python dependency: The MCP server and memory engine require Python in addition to Bun, making the install more complex than other npm tools.

Workflow

Loki Mode — Workflow

Main Execution Flow

loki start <spec>
    ↓
detect_complexity() → simple (3 phases) | complex (8 phases)
    ↓
assemble agent team (5-10 for simple, more for complex)
    ↓
RARV cycles per task:
    REASON: highest priority unblocked task
    ACT: execute → write code, run commands, commit atomically
    REFLECT: did it work? log outcome
    VERIFY: run tests, check build, validate against spec
        ↓ PASS → COMPOUND: extract novel insight to ~/.loki/solutions/
        ↓ FAIL → retry with a new approach → after 3 failures: simplify → after 5 failures: dead-letter queue
    ↓
11 Quality Gates (blocking)
    ↓
Output: Git repo with source, tests, Docker, CI/CD, audit logs

11 Quality Gates

Gate	Check	Block Level
1	Input Guardrails (scope validation, injection detection)	Critical
2	Static Analysis (CodeQL, ESLint/Pylint, type checking)	High
3	Blind Review (3 parallel reviewers, no cross-visibility)	High
4	Anti-Sycophancy (if all 3 agree → run Devil's Advocate)	High
5	Output Guardrails (code quality, spec compliance, no secrets)	Critical
6	Severity-Based Blocking (Critical/High/Medium = BLOCK)	Variable
7	Test Coverage (Unit: 100% pass, >80% coverage; Integration: 100% pass)	High
8	Mock Detector (flags tautological assertions, high internal mock ratio)	Medium
9	Test Mutation Detector (detects assertion value changes alongside impl changes)	Medium
10	Backward Compatibility (friction map, behavioral preservation, healing mode)	High
11	Documentation Coverage (README, API docs, staleness check within 10 commits)	Medium
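
Gates 6 and 7 state thresholds precise enough to express as predicates. The sketch below assumes a finding is a dict with a `severity` key and that severities below Medium exist and do not block; both are assumptions, as are the function names.

```python
BLOCKING_SEVERITIES = {"critical", "high", "medium"}


def gate_severity(findings):
    """Gate 6: pass only if no finding is Critical, High, or Medium."""
    blockers = [f for f in findings if f["severity"].lower() in BLOCKING_SEVERITIES]
    return len(blockers) == 0, blockers


def gate_test_coverage(unit_passed, unit_total, integ_passed, integ_total, coverage_pct):
    """Gate 7: 100% unit and integration pass rate, unit coverage above 80%."""
    return (
        unit_passed == unit_total
        and integ_passed == integ_total
        and coverage_pct > 80.0
    )
```

For example, `gate_test_coverage(120, 120, 14, 14, 80.0)` fails because the table requires coverage strictly greater than 80%.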

Session Control

.loki/PAUSE file → agent stops work, waits for removal
.loki/STOP file → agent ends session, writes terminal status
Session heartbeat: session.json updated every turn; sessions not updated in 5 minutes marked stale
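
A stale-session check consistent with the 5-minute rule might look like this. It assumes `updatedAt` is an ISO 8601 string with an explicit UTC offset (for example `+00:00`); the function name `is_stale` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(minutes=5)


def is_stale(session, now=None):
    """Return True if the session heartbeat is older than five minutes."""
    now = now or datetime.now(timezone.utc)
    updated = session.get("updatedAt")
    if not updated:
        return True  # no heartbeat has been written yet
    # Assumes an offset-aware timestamp; a naive one would raise on subtraction.
    return now - datetime.fromisoformat(updated) > STALE_AFTER
```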

Phase Detection

Simple complexity: 3 phases
Complex complexity: 8 phases
Override: --simple or --complex flags

Compound Learning

Every RARV pass that yields a novel insight extracts a solution to ~/.loki/solutions/{category}/{slug}.md with YAML frontmatter (title, tags, symptoms, root_cause, prevention). Because ~/.loki/solutions/ is global, these accumulate over time into institutional knowledge shared across projects.
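
As an illustration of the file layout, the sketch below writes one solution file with the five frontmatter fields named above. The helper names, the slug rule, and the choice to emit values as JSON (which is valid YAML flow syntax) are assumptions; the real format is defined in skills/compound-learning.md.

```python
import json
import re
from pathlib import Path


def slugify(title):
    """Lowercase the title and collapse non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def write_solution(root, category, title, tags, symptoms, root_cause, prevention, body):
    """Write {root}/{category}/{slug}.md and return its path."""
    path = Path(root).expanduser() / category / f"{slugify(title)}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    fields = {
        "title": title,
        "tags": tags,
        "symptoms": symptoms,
        "root_cause": root_cause,
        "prevention": prevention,
    }
    # JSON strings and lists parse as YAML, so no YAML library is needed here.
    lines = [f"{key}: {json.dumps(value)}" for key, value in fields.items()]
    path.write_text("---\n" + "\n".join(lines) + "\n---\n\n" + body + "\n")
    return path
```

Calling it with `root="~/.loki/solutions"` would place the file in the global repository described above.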

Memory Context

Loki Mode — Memory & Context

Memory System

Three memory types implemented in memory/engine.py:

Type	Storage	Purpose
Episodic	File-based + vector	Records of specific task outcomes, fix attempts
Semantic	ChromaDB vector DB	Concept-level knowledge, searchable by similarity
Procedural	File-based	Workflows, patterns, compound learning solutions

State Files

File	Purpose
`.loki/state/orchestrator.json`	currentPhase, tasksCompleted, tasksFailed
`.loki/queue/pending.json`	Active task queue
`.loki/queue/dead-letter.json`	Tasks moved here after 5 failures
`.loki/session.json`	Session registration with heartbeat timestamp
`.loki/PAUSE`	Control file: agent pauses when present
`.loki/STOP`	Control file: agent terminates when present
`~/.loki/solutions/`	Global compound learning repository

Memory Persistence

Project-level (.loki/) + global (~/.loki/solutions/). Solutions extracted during compound learning are global, persisting across projects.

Search Mechanism

Vector search via ChromaDB (loki_code_search MCP tool). Also BM25 full-text search for code search (rank-bm25 dependency in pyproject.toml).
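
For readers unfamiliar with BM25, here is a stand-alone sketch of Okapi BM25 scoring over pre-tokenized documents. It does not call rank-bm25 or reproduce Loki's code; the function name, the default `k1` and `b` values, and the non-negative IDF variant are choices made for this illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # log(1 + ...) keeps the IDF non-negative for very common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Full-text scoring of this kind complements vector search: it rewards exact identifier matches that an embedding may blur.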

Context Compaction

Loki reads only 1-2 skill modules matching the current phase at session start (lazy loading from skills/00-index.md). The session heartbeat mechanism also implicitly handles compaction — a new session reads fresh state files rather than replaying conversation history.

Cross-Session Handoff

Yes — state files, task queue, and session registration all persist. loki status reads from .loki/state/ to show progress from any terminal.

Orchestration

Loki Mode — Orchestration

Multi-Agent

Yes. 41 agent types across 8 swarms. The orchestrator typically spawns 5-10 agents for simple projects and more for complex ones. Gate 3 (Blind Review) spawns 3 parallel review agents with no cross-visibility, and Gate 4 adds a fourth Devil's Advocate reviewer on unanimous approval.

Orchestration Pattern

Parallel fan-out with hierarchical coordination. The orchestrator dispatches tasks to specialized agents (fan-out), collects results, and runs aggregate review gates (hierarchical). Gate 3's blind parallel review is explicitly a parallel-fan-out with isolation.
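
The fan-out-with-isolation shape can be sketched with a thread pool. The reviewers here are plain callables standing in for subagents, and `blind_review` is a hypothetical name; the point is only that each reviewer receives the diff and nothing produced by its peers.

```python
from concurrent.futures import ThreadPoolExecutor


def blind_review(diff, reviewers):
    """Run reviewers in parallel; each sees only the diff, never other reviews."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        # Fan-out: one independent submission per reviewer.
        futures = [pool.submit(reviewer, diff) for reviewer in reviewers]
        # Collect in submission order so results line up with reviewers.
        return [future.result() for future in futures]
```

The collected list is what an aggregate gate would then inspect, which is the hierarchical half of the pattern.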

Agent Spawn Mechanism

Claude Code Task tool (general-purpose subagent_type with role-specific prompts). From skills/agents.md:

Task(
    subagent_type="general-purpose",
    model="opus",
    description="Security review: auth module",
    prompt="You are a security reviewer. Focus on: ..."
)

Isolation Mechanism

Git worktrees for parallel mode (loki start --parallel). Also Docker sandbox mode (loki start --sandbox).

Multi-Model

Yes. 4 providers with automatic failover: Claude, Codex, Cline, Aider. Model tiers (abstract) mapped to providers via loki provider configuration. Gemini CLI deprecated as of v7.5.18.
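
Automatic failover across providers can be pictured as an ordered loop. In this sketch the priority order, the `ProviderError` exception, and the adapter callables are all assumptions; the source names the four providers but does not describe how failover is implemented.

```python
class ProviderError(Exception):
    """Raised by a provider adapter when a call fails (hypothetical)."""


# Listed in the order the text names them; the real priority may differ.
PROVIDER_ORDER = ["claude", "codex", "cline", "aider"]


def run_with_failover(prompt, adapters, order=PROVIDER_ORDER):
    """Try each configured provider in turn and return (name, result)."""
    errors = {}
    for name in order:
        adapter = adapters.get(name)
        if adapter is None:
            continue  # provider not configured on this machine
        try:
            return name, adapter(prompt)
        except ProviderError as exc:
            errors[name] = str(exc)  # remember why it failed, then move on
    raise RuntimeError(f"all providers failed: {errors}")
```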

Execution Mode

Continuous/background daemon. loki start runs autonomously until complete. loki start --bg runs in background. --watch mode polls for new tasks continuously.

Consensus Mechanism

Blind review quorum (3 reviewers plus an optional Devil's Advocate) for code review gating. This is a simulated multi-reviewer quorum, not a distributed consensus protocol such as Raft or Byzantine fault tolerance.

Crash Recovery

Yes — session registration with heartbeat, task queue with dead-letter, checkpoint/restore (loki checkpoint). The STOP control file provides graceful termination.

Context Compaction

Yes — lazy skill module loading. Only 1-2 skill modules loaded per session phase. Session.json heartbeat enables fresh state on restart.

Cross-Session Handoff

Yes — full state persistence in .loki/.

Prompt Chaining Pattern

Yes — each RARV cycle's output (code commits, test results) feeds the next cycle's REASON step via the task queue and orchestrator state.

UI CLI Surface

Loki Mode — UI & CLI Surface

CLI Binary

Name: loki
Package: loki-mode (npm)
Entry: bin/loki (shim → Bun or Bash)
Version: 7.7.11

Key CLI Subcommands

start, quick, init, issue, plan, heal, memory, dashboard, doctor, provider, self-update, checkpoint, audit, metrics, status, config, stop, cluster, magic + ~15 more

Local Web Dashboard

Port: 57374
Tech stack: Vanilla JavaScript ES6 Modules, Shadow DOM Web Components, no framework
Components: loki-task-board (Kanban), loki-session-control (start/stop/pause), loki-log-stream (live logs), loki-metrics-panel
Launch: loki dashboard or loki start --api
Features: Kanban task tracking, session control, live log streaming, metrics

VS Code Extension

Ships vscode-extension/ — not published separately, bundled with the npm package.

MCP Server

mcp/server.py — Python FastMCP server
25 tools
Transports: STDIO (default) or HTTP (--transport http)
Config: .mcp.json in project root

Observability

.loki/session.json with heartbeat — dashboard reads this to detect stale sessions
.loki/state/orchestrator.json — phase progress
.loki/metrics/migration_bench_soak.jsonl — performance benchmarks
loki audit — structured audit report
loki metrics — efficiency report

Docker Support

docker pull asklokesh/loki-mode:7.5.11
docker run --rm asklokesh/loki-mode:7.5.11 start prd.md

Also: docker-compose.yml for multi-service (API + dashboard), sandbox containers.

Related frameworks

same archetype · same primary tool · same memory type

Claude-Flow / Ruflo ★ 55k

A6 Multi-agent orchestrator

Eliminates single-agent context limits and sequential bottlenecks by orchestrating fault-tolerant swarms of specialized AI agents…

Hermes Agent (NousResearch) ★ 168k

A6 Multi-agent orchestrator

Self-improving personal AI agent with closed learning loop, 7 terminal backends, and messaging gateway — not tied to any AI…

OpenCode ★ 165k

A6 Multi-agent orchestrator

Terminal-first AI coding agent with multi-model routing, native desktop app, and a typed .opencode/ configuration system for…

OpenHands ★ 75k

A6 Multi-agent orchestrator

Open-source AI software development platform (open-source Devin alternative) with Docker sandbox isolation, 77.6% SWE-bench…

DeerFlow ★ 70k

A6 Multi-agent orchestrator

Long-horizon superagent that researches, codes, and creates by orchestrating parallel sub-agents with isolated contexts in Docker…

oh-my-openagent (omo) ★ 60k

A6 Multi-agent orchestrator

Multi-provider AI agent orchestration for OpenCode: escape vendor lock-in by routing Sisyphus (Claude/Kimi/GLM) and Hephaestus…

Distribution

Type: npm-package
License: BUSL-1.1
Install: multi-step
Version: 7.7.11

Surfaces

CLI binary: loki
CLI subcmds: 30
Local UI: web-dashboard
UI port: 57374
Tech stack: Vanilla JavaScript ES6 Modules, Shadow DOM Web Components

Components

Commands: 30
Skills: 20
Subagents: 41
Hooks: 0
MCP servers: 1
MCP tools: 25
Scripts: 2
Templates: 0

Workflow

Phases: 7
Approval gates: 2
Spec format: markdown
Spec storage: flat-files
Delta or full: whole-file

Orchestration

Multi-agent: Yes
Pattern: parallel-fan-out
Isolation: git-worktree
Consensus: quorum
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text+vision

Execution

Mode: continuous-ralph
Crash recovery: Yes
Compaction: Yes
Session handoff: Yes
Streaming: Yes

Memory

Type: hybrid
Persistence: global
Search: vector
State files: 5 files

Quality

TDD: Yes
TDD mechanism: post-hook-test-runner
Validators: 8
Self-review: adversarial-subagent

Git / Observability

Auto commit: Yes
Auto PR: Yes
Auto merge: Yes
Worktree/feat: Yes
Audit log: Yes
Audit format: jsonl
Replay: Yes

Tools

Primary: claude-code
Targets: 4
Portability: medium

Signals

Stars: 944
Last commit: 2026-05-25
Maintainer: active
Quality score: 10/10