Skip to content
/

Octomind

octomind-muvon · muvon/octomind · ★ 59 · last commit 2026-05-26

Provides a specialist AI agent runtime with tap-registry community agents, per-role multi-model routing, hard cost enforcement, and policy-as-code guardrails — all in a single Rust binary.

Best whenGeneric AI hallucinates in expert domains — the solution is packaged specialist agents (doctor:blood, lawyer:sg) delivered as one-command TOML files from a c…
Skip ifPre-loading every available tool into context (starts focused, grows only when needed), Modal approval clicks for safety (deterministic policy scripts instead)
vs seeds
claude-flow(multi-agent orchestration with MCP server management and dynamic sub-agent spawn), but Octomind is a standalone Rust bi…
Primitive shape 4 total
Subagents 1 MCP tools 3
00

Summary

Octomind — Summary

Octomind is a Rust-compiled single-binary CLI (octomind) that provides a specialist AI agent runtime: instead of prompting a generic AI, users run domain-specific agents (e.g., octomind run doctor:blood, octomind run lawyer:sg, octomind run developer) from a community tap registry — each agent ships with its own model config, system prompt, MCP servers, tool permissions, and specialist knowledge. The framework has five architectural pillars: zero-config tap-based agent discovery, adaptive context compaction (cache-aware, pressure-tiered), per-role and per-step model selection with hard spending thresholds, policy-as-code guardrails (pre-call deny, post-result hooks), and intent-driven capability auto-activation. Multi-model routing is deep: different roles within a single session can run on completely different providers (Anthropic, OpenAI, Google, DeepSeek) with per-role temperature, MCP servers, and tool permissions — all in TOML, no framework code. Agents can spawn sub-agents, enable/disable MCP servers dynamically, and be embedded via ACP (agent-to-agent protocol) or WebSocket. The closest seed analog is claude-flow (multi-agent orchestration with MCP integration), but Octomind's specialist domain focus, TOML-driven role system, and hard cost enforcement are architecturally distinct.

01

Overview

Octomind — Overview

Origin

Created by Muvon Un Limited (Apache-2.0 license, 59 stars, Rust, last commit 2026-05-26 — actively developed day of analysis). Version 0.29.0. The INSTRUCTIONS.md reads like a production architecture document, suggesting a team environment.

Philosophy

"Install agents, not frameworks." "Open-source runtime for specialist AI agents. One command. Any model. Any domain."

The core argument against generic AI:

"Generic AI hallucinates in expert domains. ChatGPT writes wrong drug dosages. Lawyers cite cases that don't exist. Multi-agent specialization is now the default architecture for serious work."

The five pillars:

  1. Zero config, full flexibility: octomind run lawyer:sg works out of the box — full customization in TOML, no framework code
  2. Sessions stay sharp at hour 4: Adaptive compaction preserves decisions, file references, errors; drops noise
  3. Cost as a control plane: Per-step model selection + hard spending thresholds enforce before the bill, not after
  4. Guardrails: policy as code: Deterministic scripts for pre-call guards, post-result hooks, post-turn validators — fits CI
  5. Intent-driven context: Skills and capabilities activate only when what you're asking for semantically matches them

Anti-patterns Targeted

"Cursor users get $7,000 surprise bills. Octomind agents trip a budget and stop, fall back, or warn — before the bill, not after." "Most agent harnesses pre-load every available tool into context. Octomind starts focused for the domain and grows only when needed."

Community Tap Registry

The most distinctive design element: specialist agents are published as TOML files in Git repos (taps). The community publishes domain expertise as doctor:medications, lawyer:us, security:owasp — one file, immediately installable. This is a package manager for specialist agents.

02

Architecture

Octomind — Architecture

Distribution

  • Type: standalone-repo (Rust binary, distributed via install.sh + cargo)
  • Binary: octomind (single static binary)
  • Version: 0.29.0
  • Language: Rust 1.95+, tokio async

Install Methods

# One-line install
curl -fsSL https://raw.githubusercontent.com/muvon/octomind/master/install.sh | bash
# Detects OS + architecture, installs to ~/.local/bin/

# Cargo
cargo install octomind

# Build from source
git clone https://github.com/muvon/octomind.git && cargo build --release

Required Runtime

  • Rust 1.95+ (build only)
  • No runtime dependencies — single statically-linked binary

Configuration Location

~/.local/share/octomind/config/config.toml (user-global) ./INSTRUCTIONS.md — custom instructions loaded as user message (configurable filename) ./CONSTRAINTS.md — appended to each user request (configurable filename) .agents/guardrails.toml — project-level policy rules

Source Structure

src/
├── main.rs              # Entry: CLI parsing → Config::load() → dispatch
├── config/              # Config types, loading, migrations
├── mcp/                 # Tool routing, server lifecycle, builtin tools
│   ├── core/            # plan, tap, capability, skill, dynamic servers
│   └── runtime/         # agent, skill, schedule, capability tools
├── session/
│   ├── chat/session/    # Main loop, command dispatch, API calls
│   ├── layers/          # AI sub-agent execution
│   ├── pipelines/       # Deterministic script pipelines
│   ├── workflows/       # AI-orchestrated multi-step workflows
│   └── learning/        # Cross-session lesson extraction/injection
├── acp/                 # ACP stdio server (agent-to-agent protocol)
├── websocket/           # WebSocket server for remote sessions
├── agent/               # Tap registry, manifest resolution
└── commands/            # CLI subcommand handlers
config-templates/
├── default.toml         # ALL config fields with defaults
└── agents/developer     # Developer agent template

Config Architecture

All behavior derives from a single resolved TOML config:

  • Arrays (roles, workflows) are concatenated + deduped by name
  • Tables are deep-merged
  • mcp-*.toml files always load after mcp.toml (override last)
  • Per-role config: get_merged_config_for_role(role) → collects explicit server_refs UNION auto_bind matches

API Keys (env vars)

OPENROUTER_API_KEY   # OpenRouter (recommended — multi-provider)
OPENAI_API_KEY       # OpenAI direct
ANTHROPIC_API_KEY    # Anthropic direct
DEEPSEEK_API_KEY     # DeepSeek direct
GOOGLE_APPLICATION_CREDENTIALS  # Google Cloud
AWS_ACCESS_KEY_ID    # Amazon Bedrock
CLOUDFLARE_API_TOKEN # Cloudflare Workers AI

Target Tools

Any LLM provider accessible via TOML config. Primary: Anthropic (Claude), OpenAI (GPT), Google (Gemini), DeepSeek, OpenRouter. Not a Claude Code plugin — standalone runtime.

03

Components

Octomind — Components

CLI Subcommands (10)

Command Purpose
octomind run [role] Start interactive or non-interactive session
octomind run --daemon Background daemon mode
octomind acp [role] ACP stdio server for multi-agent orchestration
octomind send <message> Send message to a running daemon
octomind tap <source> Install a tap registry
octomind untap <name> Remove a tap registry
octomind config [--show|--validate] View or validate config
octomind server WebSocket server mode
octomind complete Shell completion
octomind vars Show config variables

Session Commands (interactive, slash-prefixed)

Command Purpose
/help Show all commands
/info Token usage and costs
/model <provider:model> Switch model mid-session
/effort <level> Set reasoning effort (low/medium/high/xhigh/max)
/role <name> Switch role mid-session
/session Manage saved sessions (auto-saved)
/exit Exit session

Roles (TOML-defined)

Default roles from config-templates/default.toml:

Role Model Use
assistant openrouter:claude-sonnet-4 Default interactive
task_refiner openrouter:gpt-4.1-mini Task refinement
task_researcher openrouter:gemini-2.5-flash-preview Research (broad context)
reduce openrouter:o4-mini Summarization/reduction

Tap Agents (muvon/octomind-tap registry)

Community specialists accessed via octomind run <tap>:<role>:

  • octomind:assistant — general assistant
  • doctor:blood — blood-test interpretation
  • doctor:nutrition — nutrition specialist
  • lawyer:sg — Singapore law
  • developer — general development
  • developer:general — general dev with language skill auto-activation

Layers (AI sub-agents)

Defined as [[layers]] in TOML — chained AI sub-agents that run after each response:

  • Each layer is a role (model + tools + prompt)
  • Example: context_researcher (Gemini flash) → senior_reviewer (Claude Opus)

Pipelines (deterministic)

Script-driven pre-processing stages defined as [[pipelines]] in config — run before the AI receives the message.

Workflows (AI-orchestrated multi-step)

Defined as [[workflows]] in TOML — multi-step task runners with validation loops:

[[workflows]]
name = "deep_review"
[[workflows.steps]]
name = "analyze"
layer = "context_researcher"
[[workflows.steps]]
name = "critique"
layer = "senior_reviewer"

Guardrails (.agents/guardrails.toml)

Policy-as-code rules applied deterministically:

[[guard]]
match = "shell(command=^rm\\s+-rf?)"
message = "rm -rf blocked."

[[guard]]
match = "shell(command=git push)"
when = ["+shell(command=git status)"]
message = "Review changes before pushing."

[[hook]]
match = "text_editor(path=src/.*\\.rs)"
on = "success"
script = ".agents/check-clippy.sh"

Learning System

Cross-session lesson extraction/injection (src/learning/) — the agent can extract lessons from completed sessions and inject them into future sessions for continuous improvement.

Built-in MCP Tools

tap — delegate to any specialist from registry
mcp — enable/disable MCP servers dynamically mid-session
agent — spawn a specialist sub-agent for a sub-task

05

Prompts

Octomind — Prompts

INSTRUCTIONS.md (Custom Instructions File)

# Octomind — AI Development Assistant (Rust)

Session-based AI assistant where the model calls MCP tools (read/write files, search, 
shell, delegate) to do real work. Sessions run interactively (CLI), non-interactively 
(`--format`), or as daemons (ACP/WebSocket). Config is the single source of truth — 
all runtime behavior (model, tools, roles, compression, learning) derives from TOML. 
Multi-provider via `octolib`. Rust 1.95+, tokio async, `clap` CLI.

Prompting technique: Declarative capability description with implementation facts. This is the file the AI loads as its first user message — it's a technical briefing document rather than a persona or instruction set.


Guardrails TOML (policy-as-code prompt injection)

# Pre-call deny — block before execution
[[guard]]
match   = "shell(command=^rm\\s+-rf?)"
message = "rm -rf blocked."

# Conditional rule — only fires when condition met this session
[[guard]]
match   = "shell(command=git push)"
when    = ["+shell(command=git status)"]
message = "Review changes before pushing."

# Post-result hook — non-zero exit injects feedback into agent's inbox
[[hook]]
match  = "text_editor(path=src/.*\\.rs)"
on     = "success"
script = ".agents/check-clippy.sh"

Prompting technique: Deterministic policy injection — not a prompt to the AI, but a policy evaluated by the runtime before/after tool calls. The message field is what gets injected into the agent's context when the guard fires. This is "negative prompting at the infrastructure level" — the runtime enforces constraints before the AI even sees the tool result.


Default Role Config (config-templates/default.toml)

[[roles]]
name = "assistant"
temperature = 0.3
top_p = 0.7
top_k = 20
# system prompt loaded from tap agent manifest

[[roles]]
name = "task_refiner"
model = "openrouter:openai/gpt-4.1-mini"
temperature = 0.3

[[roles]]
name = "task_researcher"
model = "openrouter:google/gemini-2.5-flash-preview"
temperature = 0.3

[[roles]]
name = "reduce"
model = "openrouter:openai/o4-mini"
temperature = 0.3

Prompting technique: Model-role separation in TOML — each role gets its own model, temperature, and tools. The system prompt comes from the tap agent manifest, not from this config. This is "configuration as prompt engineering" — model selection and parameters are the primary knobs.

09

Uniqueness

Octomind — Uniqueness

differs_from_seeds

The closest seed is claude-flow in that both implement multi-agent orchestration with dynamic sub-agent spawn and MCP server management. Key differences: Octomind is a standalone Rust binary (not a Node.js npm package); it implements a tap registry (community-published specialist agents as TOML files) that has no parallel in any seed; its cost-control plane (hard per-request and per-session USD thresholds) is unique; and its adaptive cache-aware compaction engine is architecturally distinct from claude-flow's HNSW vector store. Unlike taskmaster-ai (task JSON decomposition for a single model), Octomind routes different roles to different providers within a single session and supports per-step model selection in workflows. The sandbox mode (Landlock/Seatbelt) and guardrails.toml policy-as-code approach have no analog in any of the 11 seeds.

Distinctive Positioning

  1. Tap registry ecosystem: The only framework in the corpus with a community package manager for specialist agents. Publishing expertise as a TOML file and making it immediately available to all users is a novel distribution model.
  2. Hard cost enforcement: Per-request and per-session USD thresholds that actually stop execution — not advisory warnings. The "$7K Cursor bill" comparison is a marketing claim backed by a real enforcement mechanism.
  3. Cache-aware adaptive compaction: Calculates whether compaction is worth the cost before doing it, and never breaks the prompt-cache hit in the process. Structurally preserving (keeps decisions, drops noise). Zero user interaction.
  4. ACP protocol: First framework in this batch that implements a formalized agent-to-agent communication protocol (ACP stdio) — Octomind can be called as a sub-agent by external orchestrators.
  5. OS-native sandbox: Rust + Landlock (Linux) / Seatbelt (macOS) sandbox mode restricts filesystem writes at the OS level — not just advisory.
  6. Rust implementation: Single static binary, no Node.js/Python runtime needed, fast startup, memory-safe.

Observable Failure Modes

  • Low star count (59) for its ambition: The feature set is enterprise-grade but community adoption is nascent; tap ecosystem is thin
  • TOML config complexity: Per-role, per-layer, per-workflow, per-pipeline config can become difficult to manage at scale
  • ACP compatibility: ACP is not an established standard — only Octomind currently implements both sides
  • Learning system opacity: Cross-session lesson extraction/injection is not documented in detail — behavior may be unpredictable
  • Provider-specific compaction: Cache keepalive only works for Anthropic (1h TTL) — other providers silently skip pings
  • Tap ecosystem bootstrapping: The community tap registry is new; specialist agents for obscure domains may not yet exist
04

Workflow

Octomind — Workflow

Session Lifecycle

Phase Command Artifact
1. Install tap octomind tap muvon/octomind-tap Tap TOML manifests
2. Start session octomind run developer Interactive session
3. Specialist dispatch Agent uses tap tool Sub-agent spawned
4. Dynamic MCP Agent uses mcp tool MCP server enabled
5. Cost check Automatic per request Warning/stop if threshold exceeded
6. Compaction Automatic at pressure Context preserved, noise dropped
7. Session save Automatic Session state persisted
8. Learning extraction End of session Lessons injected into future sessions

Session Initialization (5 entry points, CRITICAL INVARIANT)

All 5 entry points must call init_session_services(&role):

  1. Interactive + non-interactive CLI (src/session/chat/session/main_loop.rs)
  2. ACP new_session (src/acp/agent.rs)
  3. ACP initialize (src/acp/agent.rs)
  4. WebSocket (src/websocket/server.rs)
  5. (5th mode undocumented in publicly visible code)

Approval Gates

None by default — Octomind replaces approval clicks with policy-as-code guardrails. The human is not the safety layer; the guardrails.toml rules are. Dangerous operations are blocked deterministically by pre-call guards, not by modal clicks.

Pipeline → Layer → Workflow Chain

User Message
  → Pipeline (deterministic script pre-processing)
  → AI Response
  → Layers (chained sub-agent post-processing)
  → Workflow (multi-step validation loop if configured)
  → Final Response

Non-interactive Mode

octomind run developer "Explain the auth module" --format plain
octomind run developer "List TODO items" --schema todos.json --format jsonl

Single message, exits after response. --format jsonl for CI/CD pipelines.

Daemon Mode

octomind run developer --daemon
octomind send "implement the user auth feature"

Background agent, commands sent via send subcommand.

Cost Enforcement

Per-request and per-session spending thresholds:

max_request_spending_threshold = 0.50    # USD per request
max_session_spending_threshold = 5.00    # USD per session

Agent stops/warns/falls back before exceeding — not advisory.

06

Memory Context

Octomind — Memory & Context

State Storage

Session State (auto-saved)

  • Sessions auto-save after each exchange
  • /session command for browsing, restoring, and managing sessions
  • Located in user data directory (~/.local/share/octomind/sessions/ or platform equivalent)

Configuration (TOML)

  • ~/.local/share/octomind/config/config.toml — user-global config
  • Roles, models, MCP servers, spending limits — all TOML
  • Migrations handled automatically on version upgrade

Learning System

  • Cross-session lesson extraction: at end of session, key decisions/insights extracted
  • Injected into future sessions for the same domain/role
  • src/learning/ module

Context Files (project-local)

  • ./INSTRUCTIONS.md — loaded as first user message per config (custom_instructions_file_name)
  • ./CONSTRAINTS.md — appended to EACH user request (custom_constraints_file_name)
  • These files serve as a project-specific memory layer

Memory Persistence

Global + project hybrid: Session state is user-global; project context is file-based in the working directory; configuration is user-global TOML.

Context Compaction (Adaptive, Pillar 2)

Octomind's most distinctive memory feature — fully automatic, cache-aware compaction:

  • Cache-aware: Calculates if compaction is worth it before paying for it (never breaks prompt-cache hit)
  • Pressure-tiered: Compacts more aggressively as context grows toward max_session_tokens_threshold
  • Structurally preserving: Keeps decisions, file references, errors, dependencies; drops noise
  • Plan-aware: Works whether session uses plan tool or has free-form chat
  • Never think about it: Zero user interaction required
max_session_tokens_threshold = 200000   # Trigger compaction at this threshold
mcp_response_tokens_threshold = 20000   # Truncate MCP responses above this

Cache Keepalive

Optional feature to prevent prompt-cache expiry during idle periods:

cache_keepalive_enabled = false   # Off by default
cache_keepalive_max_idle_seconds = 1800  # 30 min

Sends minimal ping requests (max_tokens=1) before cache TTL expires. Only for Anthropic (1h TTL, always long cache).

Cross-Session Handoff

Enabled via session restore + learning injection. The combination of saved sessions + extracted lessons means an agent can resume work on a domain and retain accumulated knowledge from previous sessions.

07

Orchestration

Octomind — Orchestration

Multi-Agent Pattern

Hierarchical with dynamic spawn — a primary agent can spawn specialist sub-agents mid-session using built-in tools:

  • tap tool: Delegate to any specialist from the tap registry (foreground for inline reply, background for long tasks)
  • agent tool: Spawn a specialist sub-agent for a sub-task; sub-agent runs, returns, parent continues
  • mcp tool: Enable/disable MCP servers on the fly (agent picks the server it needs)

Example from README:

User: "Cross-reference our Postgres metrics with the deployment log"

Agent:
  → mcp.enable(postgres-mcp)        # auto-detected need, no user prompt
  → agent.spawn(log_reader)         # delegates log parsing
  → results merge mid-session
  → mcp.disable(postgres-mcp)       # cleans up

Multi-Model Routing (Deep)

The most sophisticated multi-model system in this batch. Per-role configuration:

Role Model Provider Use
assistant claude-sonnet-4 openrouter Default
task_refiner gpt-4.1-mini openrouter Task refinement
task_researcher gemini-2.5-flash-preview openrouter Research
reduce o4-mini openrouter Summarization

Per-step model selection within workflows:

[[workflows.steps]]
name = "analyze"
layer = "context_researcher"     # gemini-flash, broad context

[[workflows.steps]]
name = "critique"
layer = "senior_reviewer"        # claude-opus, precision

Mid-session model swap: /model anthropic:claude-haiku-4-5 Per-tap model override in [taps] section. Different roles can run on different LLM vendors (not just different models on same vendor).

Execution Modes

  • octomind run — interactive CLI
  • octomind run --format jsonl — non-interactive, structured output for CI
  • octomind run --daemon + send — background agent
  • octomind server — WebSocket server for IDE/dashboard integration
  • octomind acp — ACP stdio server for multi-agent orchestration

ACP Protocol

Octomind implements ACP (Agent Communication Protocol) — agents can be called by other agents. octomind acp developer:general starts an ACP stdio server that any ACP-speaking orchestrator can use.

Policy-as-Code Guardrails

Deterministic safety — no modal clicks:

  • Pre-call guards: Block tool calls by regex match before execution
  • Conditional guards: Only fire when session state conditions are met (when clause)
  • Post-result hooks: Run scripts after tool results; inject output into agent context

Isolation

Sandbox mode (opt-in): Restricts all filesystem writes to current directory via OS-native mechanisms (Linux: Landlock kernel 5.13+; macOS: Seatbelt).

Consensus Mechanism

None — single authoritative session per role, no consensus protocols.

Crash Recovery

Session auto-save enables recovery after crash — restore the saved session to continue.

08

Ui Cli Surface

Octomind — UI & CLI Surface

CLI Binary: octomind

Single Rust binary with 10 subcommands. Full interactive session commands (17 slash-prefixed) for switching models, roles, managing sessions, and cost tracking. Non-interactive mode for CI pipelines.

octomind run developer            # Interactive session
octomind run developer --format plain  # Non-interactive, single message
octomind run developer --schema todos.json --format jsonl  # Structured output
octomind run developer --daemon   # Background daemon
octomind --version
octomind config --show
octomind config --validate
octomind config --list-themes     # List markdown themes

Markdown Rendering

All AI responses rendered with markdown formatting in the terminal. 6 themes: default, dark, light, ocean, solarized, monokai. Configurable:

enable_markdown_rendering = true
markdown_theme = "default"

WebSocket Server

octomind server — WebSocket server for IDE plugins and web dashboards. Enables remote session management from external tools.

ACP (Agent Communication Protocol)

octomind acp <role> — ACP stdio server. Enables Octomind to be called as a sub-agent by other orchestration systems. This is the "inter-agent protocol" surface — Octomind can be both an orchestrator and a sub-agent.

Real-time Cost Display

/info command shows token usage and current session cost. Threshold violations appear inline:

⚠ Session cost: $4.87 / $5.00 limit

Reasoning Effort Control

/effort high    # Set reasoning effort for thinking-capable models
# Values: low | medium | high | xhigh | max

Tap Registry Browser

octomind tap muvon/octomind-tap   # Install official tap
octomind tap yourteam/tap         # Install community tap
octomind run finance:analyst      # Available immediately after tap install

No Local Web Dashboard

Octomind is terminal-only for interactive use. The WebSocket server enables remote connection for tool authors to build dashboards, but no built-in web UI is provided.

Observability

  • Real-time cost tracking: cache_read_tokens, cache_write_tokens separated from input/output
  • /info for per-request and per-session cost breakdown
  • Structured JSONL output for CI/CD pipelines (--format jsonl)
  • log_level = "none|info|debug" config option

Related frameworks

same archetype · same primary tool · same memory type

CodeMachine CLI ★ 2.5k

JavaScript-DSL workflow orchestration engine that captures repeatable AI coding agent workflows with tracks, condition groups,…

Codexia ★ 690

Tauri desktop app providing visual control plane, task scheduler, git worktree manager, and headless REST API for Codex CLI +…

Kagan ★ 88

Kanban TUI for AI coding agents with a structurally enforced human review gate (REVIEW → DONE cannot be automated) — one git…

oh-my-claudecode (Yeachan-Heo) ★ 35k

Zero-learning-curve teams-first multi-agent orchestration for Claude Code with autopilot (6-phase lifecycle), ralph (PRD-driven…

Paseo ★ 6.8k

Multi-provider AI coding agent orchestration daemon with cross-device access (phone/desktop/CLI) and git worktree isolation.

CCG Workflow ★ 5.4k

Routes Claude + Codex + Gemini to task-appropriate collaboration strategies (direct-fix through full-collaborate) with hook-based…