DocBrain

docbrain · last commit 2026-05-25

Primitive shape
No installable primitives
00

Summary

DocBrain — Summary

DocBrain is a shift-left documentation platform that intercepts knowledge at the moment of creation (PR merges, Slack threads, CI deployments, IDE sessions) and automatically transforms it into quality-scored, reviewable, published documentation — exposing a 10-tool MCP server for IDE integration with Claude Code and Cursor.

Problem it solves: Documentation written after the fact is written from memory, under competing priorities, and decays immediately; DocBrain captures the knowledge that was never written down (PR decisions, Slack explanations, deployment gotchas, incident resolutions) at the point of creation, applies 3-layer quality scoring, and routes it through configurable review workflows before publishing.

Distinctive trait: The confidence-based routing pipeline (auto-index high confidence, queue low for review, discard noise) combined with DBSCAN clustering that groups similar knowledge fragments into composed documents — approaching documentation generation as an ML pipeline rather than a writing task.

Target audience: Engineering organizations with knowledge scattered across PRs, Slack, CI pipelines, and IDE sessions who want documentation that "gets better as your team works, not worse," backed by RBAC, SSO, SLA enforcement, and enterprise audit logs.

Production-readiness: Pre-source release. Docker images and Helm charts are public and usable now, but the source code is not yet open (targeted for H1 2026 under the BSL 1.1 license). The Rust backend targets sub-100ms API responses. 21 stars, 3 contributors.

Differs from seeds: DocBrain is unlike all 11 seed frameworks — it is not an AI coding agent methodology but a knowledge management platform with an MCP interface. The closest comparison is ccmemory (knowledge persistence with MCP tools), but DocBrain operates at organizational scale (multi-source ingestion, governance dashboards, team RBAC, SLA policies) rather than per-developer session memory. The 10 MCP tools are knowledge query/capture endpoints, not coding agent primitives.

01

Overview

DocBrain — Origin and Philosophy

Origin

DocBrain is maintained by the docbrain-ai organization (website: docbrainapi.com). The primary maintainer is abhipsnl. The backend is built in Rust, and the web UI and API consume the Rust services. The project is currently in pre-source release: Docker images and configuration are public, but the source code is not yet open.

The repository includes MAINTAINERS.md and CODE_OF_CONDUCT.md, which suggests governance processes typical of professionally maintained open infrastructure.

Philosophy (from README)

"DocBrain doesn't wait for someone to write a doc. It intercepts knowledge at the point of creation and turns it into documentation automatically. We call this shift-left documentation, the same principle that made shift-left testing work. Move the capture upstream, to where the knowledge actually exists."

On the root cause:

"The root cause isn't laziness. It's timing. Documentation written after the work is done is documentation written from memory, without context, under competing priorities. It's a tax that nobody wants to pay, and when they do pay it, the result decays immediately."

On differentiation:

"Every tool in the market solves the wrong problem. They index your existing docs and build a chatbot on top. Now you have a chatbot that surfaces your stale, incomplete, scattered documentation slightly faster. The actual problem is that the knowledge was never captured in the first place."

"An IDE asks your tools a question. DocBrain turns every question your org has ever asked into a system that gets smarter. The 100th person asking about Kubernetes gets a better answer because of the first 99."

Knowledge pipeline

Capture → Quality Score → Cluster & Compose → Review & Publish
  1. Capture: Multi-source ingestion (13+ connectors: Confluence, Slack, Teams, GitHub, GitLab, Jira, PagerDuty, etc.)
  2. Score: 3-layer scoring (structural deterministic + style rules + semantic LLM)
  3. Cluster: DBSCAN clustering groups similar fragments into composed documents
  4. Publish: Multi-stage review workflow → publish to wiki/Confluence/etc.
02

Architecture

DocBrain — Architecture

Distribution

  • Type: Docker image + Helm charts (production); standalone-repo (self-hosted)
  • CLI: docbrain (via Docker)
  • Source: https://github.com/docbrain-ai/docbrain (config + docs public; source code not yet open)
  • Version analyzed: pre-source release (2026-05-25)
  • License: BSL 1.1 (Business Source License)

Install

# Quick start
git clone https://github.com/docbrain-ai/docbrain.git && cd docbrain
./scripts/setup.sh    # Interactive wizard: picks provider, sets keys, starts services

# Manual
cp .env.example .env
docker compose up -d

# Get admin key
docker compose exec server cat /app/admin-bootstrap-key.txt

# Open dashboard (macOS; on Linux use xdg-open)
open http://localhost:3001

Required runtime

  • Docker
  • PostgreSQL (via docker-compose)
  • Redis (via docker-compose)
  • OpenSearch (via docker-compose)
  • LLM provider access: an API key for Anthropic, OpenAI, AWS Bedrock, or Gemini, or a local Ollama instance

Target AI tools

MCP-compatible IDE tools: Claude Code, Cursor, any MCP-compatible editor

Supported LLM providers (14)

Anthropic, OpenAI, AWS Bedrock, Ollama (fully local), Gemini, plus 9 more. Swappable without code changes.

Directory structure

docbrain/
├── config/
│   ├── default.yaml        # All configuration with defaults
│   ├── development.yaml    # Dev overrides
│   ├── production.yaml     # Production overrides
│   └── mcp-manifests/      # MCP server manifests (Confluence, Jira, Slack, etc.)
│       ├── docbrain.json   # Main DocBrain MCP manifest (10 tools)
│       ├── confluence-rest.yaml
│       ├── jira-rest.yaml
│       ├── slack-rest.yaml
│       └── ...
├── docker-compose.yml      # Full stack
├── Dockerfile
├── helm/                   # Kubernetes deployment charts
├── docs/                   # Full documentation
├── examples/               # Style rules examples
│   └── style/
│       └── .docbrain/style.md  # GitOps style policy
└── scripts/
    └── setup.sh            # Interactive install wizard

Ports

  • API server: 3000
  • Web dashboard: 3001
03

Components

DocBrain — Components

MCP tools (10)

Exposed via the docbrain.json MCP manifest:

| Tool | Purpose |
|---|---|
| docbrain_ask | Ask a question across all connected knowledge sources; returns answer with source attribution |
| docbrain_incident | Query documentation in incident context; returns runbooks, past incidents, troubleshooting guides |
| docbrain_freshness | Check documentation freshness/staleness based on code changes and time elapsed |
| docbrain_autopilot_gaps | Analyze documentation for gaps: undocumented or insufficiently documented areas |
| docbrain_autopilot_generate | Generate missing documentation for identified gaps |
| docbrain_autopilot_summary | Get summary of Autopilot activity: generated docs, gaps, freshness alerts, health metrics |
| docbrain_annotate | Link knowledge to exact code locations in IDE |
| docbrain_suggest_capture | Suggest knowledge worth capturing from current context |
| docbrain_commit_capture | Capture knowledge intent at commit time |
| docbrain_capture | Manual fragment submission |

Knowledge capture points (13+ connectors)

| Source | How it works |
|---|---|
| Merged PRs | POST /api/v1/ci/analyze: LLM extracts decisions and procedures from diffs |
| Deployments | POST /api/v1/ci/deploy-capture: captures deployment context + rollback procedures |
| Slack | Thread capture via message shortcut, @DocBrain mention, or /docbrain capture |
| Teams | Same as Slack |
| IDE (MCP) | docbrain_annotate, docbrain_commit_capture: links knowledge to code |
| Conversations | Auto-distillation from Q&A sessions |
| GitHub | PR analysis, issue linking |
| GitLab | Same as GitHub |
| Confluence | Full bidirectional sync |
| Notion | Ingestion |
| Jira | Issue security levels + knowledge linking |
| PagerDuty | Incident resolution capture |
| Manual | POST /api/v1/fragments or docbrain capture CLI |
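
As a rough illustration of the manual capture path, the sketch below builds (but does not send) a request to POST /api/v1/fragments. Only the endpoint path and port 3000 come from the text above. The JSON field names (title, content) and the bearer-token header are assumptions made for illustration; the real schema should be taken from the Swagger UI at /api/docs.

```python
import json
import urllib.request

API_BASE = "http://localhost:3000"  # API server port listed under Architecture


def build_fragment_request(api_key: str, title: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a manual fragment submission.

    Assumptions: the JSON field names and the Authorization scheme are
    placeholders, not the documented DocBrain schema.
    """
    payload = json.dumps({"title": title, "content": body}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/api/v1/fragments",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it to a running server.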

Quality pipeline

| Layer | Method | What it measures |
|---|---|---|
| Structural | Deterministic (no LLM) | Heading structure, code examples, link density, readability |
| Style | Rule engine (YAML rules) | Banned terms, sentence length, required sections, custom regex |
| Semantic | LLM-assessed | Accuracy, clarity, completeness, actionability |

Composite score: structural × 0.4 + style × 0.3 + semantic × 0.3

Confidence routing:

  • > 0.7 → auto-index (available for Q&A immediately)
  • 0.4–0.7 → queued for human review
  • < 0.4 → discarded as noise
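
The composite score and routing thresholds above can be sketched in a few lines. The weights and cut-offs are the ones stated in this section; the function names are illustrative, and the handling of scores that land exactly on 0.4 or 0.7 is an assumption, since the source does not specify it.

```python
# Weights and thresholds as stated above; names are illustrative.
WEIGHTS = {"structural": 0.4, "style": 0.3, "semantic": 0.3}
AUTO_INDEX_ABOVE = 0.7
DISCARD_BELOW = 0.4


def composite_score(structural: float, style: float, semantic: float) -> float:
    """Weighted sum of the three layer scores, each expected in [0, 1]."""
    layers = {"structural": structural, "style": style, "semantic": semantic}
    for name, value in layers.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in layers.items())


def route(score: float) -> str:
    """Map a composite score to a routing decision.

    Assumption: scores of exactly 0.4 or 0.7 go to human review.
    """
    if score > AUTO_INDEX_ABOVE:
        return "auto-index"
    if score < DISCARD_BELOW:
        return "discard"
    return "review"
```

For example, layer scores of 0.9, 0.8, and 0.7 give 0.36 + 0.24 + 0.21 = 0.81, which is auto-indexed.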

API surface

  • 150+ REST endpoints — full OpenAPI 3.1 spec; Swagger UI at /api/docs
  • SSE streaming — real-time event bus
  • Outbound webhooks — HMAC-SHA256, exponential backoff, circuit breakers

Governance features

  • RBAC (4-tier: viewer/editor/analyst/admin) + GitHub/GitLab/OIDC SSO
  • SLA policies: gap acknowledgment (24h), resolution (7d), draft review (48h), freshness
  • Audit log (HIPAA/FedRAMP/SOC2 compliance contexts)
  • Source ACL: per-source permission mirroring (Confluence page restrictions, Slack channel membership, GitHub repo visibility)
05

Prompts

DocBrain — Prompt Files (Verbatim Excerpts)

Note on source availability

DocBrain is in pre-source release. Source code (including LLM prompt templates) is not publicly available. The following is based on the API documentation and README descriptions of the system's behavior.

Excerpt 1: Style rule definition format (examples/style/.docbrain/style.md)

Technique: Declarative rule DSL for documentation quality enforcement

# Export your rules as YAML, version-control them, import across spaces

- rule_type: terminology
  name: no-simple
  description: "Don't assume expertise. Avoid 'simple' and 'easy'"
  config:
    wrong: "simple"
    right: "straightforward"
    match_whole_word: true
  severity: warning

- rule_type: formatting
  name: short-sentences
  description: "Keep sentences under 40 words for readability"
  config:
    max_words: 40
  severity: info

- rule_type: structure
  name: require-intro
  description: "Every doc needs an introduction before the first heading"
  config:
    min_words_before_first_heading: 10
  severity: warning

- rule_type: custom_pattern
  name: no-internal-urls
  description: "Don't leak internal URLs in public docs"
  config:
    pattern: "https?://internal\\."
    message: "Remove internal URL before publishing"
  severity: error
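
To make the rule format concrete, here is a minimal sketch of how three of the rule types above might be evaluated against a draft. It is not DocBrain's rule engine, whose source is unavailable. The rules are written as Python dicts so the sketch needs no YAML parser, and the matching semantics (case-insensitive terminology match, naive sentence splitting, a violation when a sentence has more than max_words words) are assumptions.

```python
import re

# Three rules from the excerpt, transcribed as dicts.
RULES = [
    {"rule_type": "terminology", "name": "no-simple", "severity": "warning",
     "config": {"wrong": "simple", "right": "straightforward", "match_whole_word": True}},
    {"rule_type": "formatting", "name": "short-sentences", "severity": "info",
     "config": {"max_words": 40}},
    {"rule_type": "custom_pattern", "name": "no-internal-urls", "severity": "error",
     "config": {"pattern": r"https?://internal\.",
                "message": "Remove internal URL before publishing"}},
]


def check(text: str, rules=RULES) -> list:
    """Return (rule name, severity, message) for each violation found."""
    findings = []
    for rule in rules:
        cfg = rule["config"]
        if rule["rule_type"] == "terminology":
            word = re.escape(cfg["wrong"])
            pattern = rf"\b{word}\b" if cfg.get("match_whole_word") else word
            if re.search(pattern, text, flags=re.IGNORECASE):
                findings.append((rule["name"], rule["severity"],
                                 f"use '{cfg['right']}' instead of '{cfg['wrong']}'"))
        elif rule["rule_type"] == "formatting":
            # Naive split: sentence ends at ., ! or ? followed by whitespace.
            for sentence in re.split(r"(?<=[.!?])\s+", text):
                if len(sentence.split()) > cfg["max_words"]:
                    findings.append((rule["name"], rule["severity"],
                                     f"sentence exceeds {cfg['max_words']} words"))
        elif rule["rule_type"] == "custom_pattern":
            if re.search(cfg["pattern"], text):
                findings.append((rule["name"], rule["severity"], cfg["message"]))
    return findings
```

With match_whole_word set, "simplest" does not trigger the no-simple rule, while "a simple setup" does.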

Excerpt 2: MCP tool descriptions (from config/glama.json)

Technique: RAG-backed knowledge query with source attribution

[
  {
    "name": "docbrain_ask",
    "description": "Ask a question against your organization's documentation. Returns an answer with source attribution across all connected knowledge sources (Confluence, Notion, GitHub, Slack, etc.)."
  },
  {
    "name": "docbrain_incident",
    "description": "Query documentation in the context of an ongoing incident. Returns relevant runbooks, past incident reports, and troubleshooting guides with source attribution."
  },
  {
    "name": "docbrain_autopilot_gaps",
    "description": "Analyze your documentation for gaps: areas that are undocumented or insufficiently documented based on your codebase and recent activity."
  }
]

Prompting techniques observed

  1. Declarative quality DSL: style rules are YAML-serialized quality policies, not prose instructions; they version-control like code and import/export via API
  2. Confidence-scored routing: knowledge is not sorted into binary good/bad buckets but routed three ways by score (auto-index/review/discard); the routing decision follows directly from the quality score
  3. RAG with source attribution: all answers include citations to source fragments, which enables verification and limits hallucination propagation
  4. GitOps style policy: documentation style is treated as code (PR-reviewed YAML files) rather than a wiki page that anyone can change
09

Uniqueness

DocBrain — Uniqueness & Differentiation

Primary differentiator: shift-left documentation

According to the README, other tools in the knowledge management space index existing documents and build a chatbot on top. DocBrain's thesis is that the root cause of poor documentation is timing: knowledge is lost between the moment of creation and the moment someone writes it down.

DocBrain captures knowledge at the point of creation — in a PR diff, at a deployment event, in a Slack thread, at commit time — before the engineer moves on. The README frames this as "shift-left documentation," drawing an explicit parallel to shift-left testing.

Confidence-scored routing (not binary quality gates)

Most document systems classify content as "good" or "bad" after the fact. DocBrain scores every fragment on a continuous 0–1 scale (structural × 0.4 + style × 0.3 + semantic × 0.3) and routes automatically:

  • High confidence → no human required, indexes immediately
  • Medium confidence → human review queue
  • Low confidence → discarded as noise

This is a probabilistic pipeline, not a boolean gate. The LLM is one input among three, not the sole arbiter.

Declarative style-as-code

Quality rules are YAML-serialized policies stored in .docbrain/style.md in team repositories. They version-control like code, are reviewed via PR, and import/export via API. This is structurally different from wiki-based style guides: a style rule change goes through code review, has a blame history, and is enforced mechanically — not by convention.

Organizational memory accumulation

The platform explicitly models the cumulative value of repeated queries: "the 100th person asking about Kubernetes gets a better answer because of the first 99." Conversation distillation is a documented capture path. This is a stated design goal, not an incidental side-effect.

Source permission mirroring

In enforce mode, a user who queries DocBrain via MCP receives only the fragments they are authorized to see on the source system. A Confluence page restriction is enforced at retrieval time, not just at the Confluence access layer. Three enforcement modes (off/warn/enforce) give administrators gradual rollout capability.

Comparison to seed frameworks

| Dimension | DocBrain | Closest seed (ccmemory) |
|---|---|---|
| Scope | Org-wide knowledge platform | Single developer context store |
| Capture sources | 13+ connectors (CI, Slack, IDE, etc.) | Claude Code projects only |
| Quality pipeline | 3-layer scoring + DBSCAN clustering | None |
| Human review | Configurable multi-stage workflow | None |
| Target user | Engineering org (multi-team) | Individual developer |
| Deployment | Self-hosted Docker + Helm | Local filesystem |

DocBrain is not a coding methodology or prompt framework — it is infrastructure for organizational knowledge that coding agents consume via MCP. The closest archetype is MCP-anchored toolserver, but at an enterprise scale that the seed frameworks do not attempt.

Limitations / caveats

  • Source code is pre-release (BSL 1.1); community contributions are not currently possible
  • Rust backend; extending the platform requires waiting for open-source release
  • Heavy runtime requirements (PostgreSQL + Redis + OpenSearch + LLM API); not a lightweight local tool
  • Primary value is realized when many of the 13+ connectors are integrated; single-source deployments underuse the platform
04

Workflow

DocBrain — Workflow

Knowledge lifecycle

Capture → Quality Score → Cluster & Compose → Review Workflow → Publish

| Phase | What happens | Artifact |
|---|---|---|
| Capture | Multi-source ingestion (PR, Slack, CI, IDE, manual) triggers fragment creation | Knowledge fragment |
| Quality Score | 3-layer scoring: structural + style + semantic | Composite quality score |
| Confidence Routing | > 0.7 → auto-index; 0.4–0.7 → review queue; < 0.4 → discard | Routing decision |
| Clustering | DBSCAN groups similar fragments into document clusters | Fragment clusters |
| Composition | Auto-compose cluster into full document draft | Document draft |
| Review | Multi-stage configurable review pipeline (SME → Writer → Publish Approval) | Review verdict |
| Publish | Approved document published to connected wiki (Confluence, etc.) | Published document |

Review workflow structure

Configurable per space:

  1. SME Review — subject matter expert approves accuracy
  2. Technical Writing Review — writer approves style and structure
  3. Publish Approval — final sign-off before publishing

Each stage: approve / request changes / reject, with threaded comments.
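
The three-stage review can be pictured as a small state machine. The sketch below is one possible reading, not DocBrain's implementation: the stage identifiers are invented for illustration, and the behaviour of "request changes" (the draft stays at the same stage pending revision) is an assumption the source does not confirm.

```python
# Stage order as described above; identifiers are illustrative.
STAGES = ["sme_review", "writing_review", "publish_approval"]
ACTIONS = {"approve", "request_changes", "reject"}


def advance(stage: str, action: str) -> str:
    """Return the state a draft moves to after a reviewer action."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "reject":
        return "rejected"
    if action == "request_changes":
        # Assumption: the draft waits at the same stage for a revision.
        return stage
    index = STAGES.index(stage)
    # Approving the last stage clears the draft for publishing.
    return STAGES[index + 1] if index + 1 < len(STAGES) else "approved"
```

A per-space configuration could shorten or reorder STAGES without changing the transition logic.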

SLA policies

Per-space configurable SLAs:

  • Gap acknowledgment: 24 hours
  • Gap resolution: 7 days
  • Draft review: 48 hours
  • Document freshness: configurable
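
As an illustration of how these windows translate into deadlines, the sketch below computes a due time and a breach check. The durations are the ones listed above; the key names and the rule that a breach occurs strictly after the deadline are assumptions. Document freshness is omitted because no default is given.

```python
from datetime import datetime, timedelta, timezone

# Durations from the list above; key names are illustrative.
SLA_WINDOWS = {
    "gap_acknowledgment": timedelta(hours=24),
    "gap_resolution": timedelta(days=7),
    "draft_review": timedelta(hours=48),
}


def sla_deadline(kind: str, opened_at: datetime) -> datetime:
    """Time by which an item of the given kind must be handled."""
    return opened_at + SLA_WINDOWS[kind]


def is_breached(kind: str, opened_at: datetime, now: datetime) -> bool:
    """True once the current time is past the deadline (assumed strict)."""
    return now > sla_deadline(kind, opened_at)
```

A draft opened at 2026-05-25 09:00 UTC would be due for review by 2026-05-27 09:00 UTC.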

GitOps style policy

.docbrain/style.md in team repos — DocBrain pulls on schedule and applies custom style rules to every draft for that space. Style changes go through normal PR review.

Approval gates

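Knowledge captured → quality scored → if mid confidence (0.4–0.7) → human review queue → approval → index/publish. High-confidence fragments (> 0.7) skip the queue and are indexed directly; fragments below 0.4 are discarded.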

06

Memory Context

DocBrain — Memory & Context

State storage stack

| Layer | Technology | Purpose |
|---|---|---|
| Primary store | PostgreSQL | Document metadata, fragments, review states, audit logs, RBAC assignments |
| Search / Vector | OpenSearch | BM25 full-text + vector semantic search; hybrid retrieval |
| Cache | Redis | API response caching, session state, real-time event queuing |

All three services launch via docker-compose.yml alongside the Rust API server.

RAG pipeline configuration

From config/default.yaml (documented parameters):

rag:
  top_k: 10                  # Number of source fragments returned per query
  bm25_boost: 1.0            # BM25 score weight relative to vector score
  candidate_pool_size: 200   # Candidates fetched before RRF re-ranking
  rrf_k: 60                  # Reciprocal Rank Fusion constant

Retrieval is hybrid: BM25 keyword recall + dense vector semantic recall, merged via Reciprocal Rank Fusion (RRF). Fragments are chunked at ingest and stored in OpenSearch with embeddings generated at capture time.
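
Reciprocal Rank Fusion itself is simple to sketch: each document scores the sum of 1 / (rrf_k + rank) over the ranked lists it appears in. The version below uses the rrf_k and top_k defaults from the config excerpt. How bm25_boost enters the formula is not documented, so applying it as a multiplier on the BM25 contribution is an assumption; in practice each input list would hold up to candidate_pool_size entries.

```python
def rrf_merge(bm25_ranked, vector_ranked, rrf_k=60, bm25_boost=1.0, top_k=10):
    """Fuse two ranked lists of fragment ids with Reciprocal Rank Fusion.

    Assumption: bm25_boost scales the BM25 list's contribution.
    """
    scores = {}
    for rank, frag_id in enumerate(bm25_ranked, start=1):
        scores[frag_id] = scores.get(frag_id, 0.0) + bm25_boost / (rrf_k + rank)
    for rank, frag_id in enumerate(vector_ranked, start=1):
        scores[frag_id] = scores.get(frag_id, 0.0) + 1.0 / (rrf_k + rank)
    # Highest fused score first; ties broken by id for determinism.
    ranked = sorted(scores, key=lambda frag_id: (-scores[frag_id], frag_id))
    return ranked[:top_k]
```

A fragment that appears near the top of both lists outranks one that tops only a single list, which is the property that makes RRF useful for hybrid retrieval.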

Knowledge accumulation model

DocBrain maintains organizational memory across three dimensions:

  1. Fragment store: raw knowledge units captured from 13+ sources (Slack threads, PR diffs, deployment events, manual submissions). Each fragment carries source, timestamp, quality score, and cluster assignment.

  2. Composed documents: DBSCAN-clustered fragments are auto-composed into full documents, and published documents are written back to Confluence or other sinks. Both the fragments and the composed documents are indexed for retrieval.

  3. Interaction history: Q&A sessions via docbrain_ask contribute back to the knowledge graph — "the 100th person asking about Kubernetes gets a better answer because of the first 99." Conversation-level distillation is a documented capture path.

Cross-session persistence

DocBrain is a server-side platform; sessions are stateless from the agent/IDE perspective. All context lives server-side in PostgreSQL and OpenSearch. An MCP client (Claude Code, Cursor) issues a docbrain_ask call; the server assembles fresh context from indexed fragments on every query.

There is no client-side memory file or session state that agents manage.

Source ACL enforcement

Source permissions are mirrored at retrieval time. The platform supports three enforcement modes:

  • off — no ACL check; all fragments returned
  • warn — ACL violations logged but fragments included
  • enforce — fragments excluded if requester lacks permission on source system

In enforce mode, this means a user querying via MCP gets only the fragments they would have access to in Confluence, Slack, or GitHub, not a merged flat view.
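
The three modes can be sketched as a retrieval-time filter. This is an illustration only: the fragment shape (a dict with an allowed_users set) is hypothetical, and a real deployment would resolve permissions against the source system rather than a local set.

```python
import logging
from enum import Enum

logger = logging.getLogger("acl_sketch")


class AclMode(Enum):
    OFF = "off"
    WARN = "warn"
    ENFORCE = "enforce"


def visible_fragments(fragments, user, mode):
    """Filter fragments for a user according to the enforcement mode.

    Assumption: each fragment is a dict with "id" and "allowed_users".
    """
    result = []
    for fragment in fragments:
        permitted = user in fragment["allowed_users"]
        if mode is AclMode.OFF or permitted:
            result.append(fragment)
        elif mode is AclMode.WARN:
            # Violation is logged, but the fragment is still returned.
            logger.warning("ACL violation: %s read %s", user, fragment["id"])
            result.append(fragment)
        # ENFORCE: unpermitted fragments are dropped silently.
    return result
```

Running in warn mode first shows what enforce would hide before anything is actually withheld from users.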

Freshness tracking

docbrain_freshness monitors staleness as a function of code changes and elapsed time. Freshness alerts surface in docbrain_autopilot_summary. The freshness SLA is configurable per space; no default threshold is documented, so administrators must set one.

Confidence-based indexing threshold

Fragments are added to the retrieval index automatically only if their quality score exceeds 0.7. Below that threshold, fragments enter a human review queue (0.4–0.7) or are discarded (< 0.4). This keeps low-quality noise from degrading retrieval precision.

07

Orchestration

DocBrain — Orchestration

Orchestration model

DocBrain is not a multi-agent orchestration framework. It is a server-side knowledge platform that AI coding agents query. The orchestration model is event-driven ingest + pipeline processing on the server, and MCP tool calls from the client (IDE/agent) side.

There are no agent-to-agent handoffs, no task queues delegated to sub-agents, and no prompt chaining that agents must manage. The complexity lives inside the DocBrain server.

Server-side pipeline

Capture event → Quality Score → Confidence Route → Cluster (DBSCAN) → Compose → Review → Publish

This pipeline runs autonomously on the server whenever a capture event fires:

  1. Ingest trigger: CI webhook (POST /api/v1/ci/analyze), Slack shortcut, IDE tool call (docbrain_capture), or scheduled connector poll
  2. Quality scoring: 3-layer (structural + style rules + LLM semantic); composite score computed
  3. Routing: > 0.7 → auto-index; 0.4–0.7 → review queue; < 0.4 → discard
  4. Clustering: DBSCAN groups related fragments; composition fires when a cluster stabilizes
  5. Review workflow: configurable multi-stage (SME → Writer → Publish Approval); humans approve/reject via dashboard or API
  6. Publish: approved document written to Confluence or other connected sink
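
For readers unfamiliar with the clustering step, here is a compact pure-Python DBSCAN. It is a textbook version, not DocBrain's code. The source does not state the distance metric, eps, or min_pts, so Euclidean distance over small coordinate tuples (standing in for fragment embeddings) and the parameter values in the example are assumptions.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices within eps of point i (includes i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep expanding
    return labels
```

With eps=1.5 and min_pts=3, two tight groups of three points form clusters 0 and 1, and an isolated point is labelled noise, which corresponds to a fragment that is not composed into any document.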

Client-side (agent) interaction

From an AI coding agent's perspective, DocBrain exposes 10 MCP tools. The agent orchestrates nothing — it issues one tool call and receives an answer:

IDE/Agent → MCP call → DocBrain server → RAG retrieval → answer + citations

The Autopilot tools (docbrain_autopilot_gaps, docbrain_autopilot_generate) allow an agent to trigger a gap analysis and generation run, but execution happens server-side.

Event bus

DocBrain exposes a real-time SSE (Server-Sent Events) stream. Agents or external systems can subscribe to capture events, quality score results, review state changes, and publish confirmations. This allows external orchestrators to react to DocBrain lifecycle events without polling.
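
A subscriber has to parse the SSE wire format, which is standardized: "event:" and "data:" fields, with a blank line ending each event. The sketch below parses an iterable of text lines. The event name used in the example is made up, since the source does not list DocBrain's event names or the stream URL.

```python
def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line dispatches the event if any data was collected.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is stripped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

In a live client, the lines would come from a streaming HTTP response held open against the server rather than from a list.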

Outbound webhooks

HMAC-SHA256 signed webhooks notify external systems on document lifecycle events. Delivery includes exponential backoff and circuit breakers to handle downstream unavailability.
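
On the receiving side, signature verification could look like the sketch below. It relies on assumptions: the source does not say which header carries the signature, whether it is hex or base64, or exactly which bytes are signed, so a hex digest over the raw request body is assumed and the function names are illustrative.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body (encoding is assumed)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Constant-time comparison of the expected and received signatures."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, received_signature)
```

Verification should run on the exact bytes received, before any JSON parsing or re-serialization, because even whitespace changes alter the digest.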

Governance enforcement points

Two pipeline stages act as control gates:

  • Confidence routing: fragments scoring 0.7 or below never enter the index without human sign-off, and those below 0.4 are discarded outright
  • Review workflow: multi-stage human approval before publication; each stage has SLA timers (48h draft review default)

No agent-to-agent communication

DocBrain does not spawn sub-agents, does not use an orchestrator/worker pattern, and does not manage agent context windows. It is queried as a tool, not coordinated as an agent.

08

Ui Cli Surface

DocBrain — UI & CLI Surface

Web dashboard

  • URL: http://localhost:3001 (configurable)
  • Purpose: review queue management, space configuration, RBAC assignment, SLA monitoring, audit log browsing, knowledge gap visualization

The dashboard is the primary human interface for the review workflow. Reviewers receive notifications (configurable: email, Slack, Teams) when drafts enter their review stage. Each stage has threaded comments and approve/request-changes/reject actions.

API server

  • URL: http://localhost:3000
  • Spec: OpenAPI 3.1 — full machine-readable spec
  • Docs: Swagger UI at /api/docs
  • Endpoints: 150+ REST endpoints
  • Streaming: SSE event bus for real-time lifecycle events
  • Webhooks: outbound HMAC-SHA256 signed, with exponential backoff + circuit breakers

CLI

Invoked via Docker:

# Interactive setup wizard
./scripts/setup.sh

# Manual Docker exec
docker compose exec server <command>

# Admin key retrieval
docker compose exec server cat /app/admin-bootstrap-key.txt

The docbrain binary is available inside the Docker container. A standalone end-user CLI is implied by references to docbrain capture in the Slack integration docs: users run /docbrain capture as a Slack slash command and docbrain capture as a CLI command to submit fragments manually.

MCP integration (IDE surface)

DocBrain registers an MCP server via config/mcp-manifests/docbrain.json. Any MCP-compatible IDE (Claude Code, Cursor) can load this manifest and get 10 tool calls:

| Tool | User-facing action |
|---|---|
| docbrain_ask | Ask a question; get answer + citations |
| docbrain_incident | Incident context query for runbooks |
| docbrain_freshness | Check if a document/area is stale |
| docbrain_autopilot_gaps | Find undocumented areas |
| docbrain_autopilot_generate | Generate missing docs |
| docbrain_autopilot_summary | Health dashboard in-editor |
| docbrain_annotate | Link a knowledge fragment to a code location |
| docbrain_suggest_capture | Get suggestions on what to document now |
| docbrain_commit_capture | Capture intent at commit time |
| docbrain_capture | Manually submit a knowledge fragment |
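
Under the hood, an MCP client invokes any of these tools with a JSON-RPC 2.0 "tools/call" request. The sketch below builds such a message. The envelope follows the MCP specification, but the argument name "question" is a guess, because the tools' input schemas are not reproduced in this document.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids must be unique per session


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical argument name; check the tool's input schema before use.
example = json.dumps(
    tool_call("docbrain_ask", {"question": "How do we roll back a deploy?"})
)
```

IDE clients such as Claude Code and Cursor construct these messages themselves, so the sketch is mainly useful for debugging or for scripted access to the MCP server.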

Messaging platform integrations

| Platform | Surface |
|---|---|
| Slack | /docbrain capture slash command; @DocBrain mention; message shortcut to capture a thread |
| Teams | Same capabilities as Slack |
| PagerDuty | Auto-capture incident resolution |

GitOps style policy surface

.docbrain/style.md placed in any team repository. DocBrain polls on schedule, parses the YAML rule definitions, and applies them to every document draft for the associated space. Style changes go through normal git PR review — no dashboard required to update quality rules.

Performance

Rust backend targets sub-100ms API response times. The dashboard is served by the same Rust process.
