Bernstein

bernstein · sipyourdrink-ltd/bernstein · ★ 460 · last commit 2026-05-26

Govern parallel CLI coding agents with a deterministic Python scheduler, HMAC-chained audit trail, and compliance-ready signed lineage so multi-agent runs are reproducible and auditability is provable.

Best when: The orchestrator should be plain Python code, not an LLM. Deterministic scheduling is not a job for a model that produces different results on every call.

Skip if: Using an LLM in the coordination/scheduling loop; running agents on shared branches without worktree isolation

vs seeds

superpowers/bmad-method (prompt-behavior augmentation), Bernstein is an executable runtime daemon that orchestrates external proces…

Primitive shape: 11 total

Commands 3 · Skills 1 · Subagents 1 · MCP tools 6

Summary

Bernstein — Summary

Bernstein (sipyourdrink-ltd) is a deterministic Python scheduler that runs a crew of CLI coding agents (Claude Code, Codex, Gemini CLI, and 40+ others) against a single goal in parallel git worktrees, with an HMAC-SHA256-chained audit log (HMAC per RFC 2104) over every scheduling decision. Named after the conductor Leonard Bernstein, it orchestrates agents the way a conductor leads an orchestra: every player on cue, the score deterministic, the conductor accountable for the result. The core innovation is that zero LLM calls appear in the coordination loop: one LLM call decomposes the goal, then plain Python decides who runs, where, and with what budget, which makes runs reproducible and replayable. It ships a bernstein CLI binary (PyPI + Homebrew), an optional web dashboard at localhost:8052, 44 CLI agent adapters, signed agent cards (Ed25519/EdDSA), per-artefact lineage records, and a janitor that gates merges on tests/lint/types/cross-model review before any result lands on main.

Differs from seeds: Closest to claude-flow (both run parallel agents with worktree isolation and an audit layer) but diverges radically in the coordination plane — claude-flow uses an LLM (hive-mind) for task routing and CRDT/consensus protocols, while Bernstein uses pure Python scheduling with zero LLM in the loop. Unlike superpowers (skill-injection framework) or bmad-method (persona-based workflow), Bernstein is an executable runtime with a 4-stage pipeline (decompose → spawn → verify → merge) rather than a prompt-behavior augmentation layer. The HMAC-chained audit log, signed agent cards, per-artefact lineage, and air-gap deployment profile put Bernstein in a compliance-focused tier that no seed framework occupies.

Overview

Bernstein — Overview

Origin

Created by Alex Chernysh (GitHub: chernistry) after paying ~$400/month in Claude bills running three coding agents in parallel and getting nondeterministic merges. The chernistry/bernstein GitHub path is a redirect alias to the canonical sipyourdrink-ltd/bernstein repo. Apache 2.0, solo maintained. First published 2025.

Philosophy

Named after Leonard Bernstein, the American conductor and composer. The project orchestrates a crew of CLI coding agents the way Bernstein conducted the New York Philharmonic: every player on cue, the score deterministic, the conductor accountable for the result.

Core design principles:

Short-lived workers: agents are spawned for focused work and then exit.
File-first state: runtime state is persisted under .sdd/.
Deterministic orchestration: scheduling and lifecycle decisions are code-driven.
Verification before closure: task completion passes through janitor/quality logic.
Multi-adapter runtime: Bernstein is CLI-agent agnostic via adapter interfaces.

Manifesto Quotes

From the README:

"To achieve great things, two things are needed: a plan and not quite enough time." — Leonard Bernstein

"Most agent orchestrators use an LLM to decide who does what. That is non-deterministic and burns tokens on scheduling instead of code. Bernstein does one LLM call to break down your goal, then the rest (running agents in parallel, isolating their git branches, running tests, routing retries) is plain Python. Every run is reproducible. Every step is logged and replayable."

"I wrote Bernstein because I was paying $400/month in Claude bills running three coding agents in parallel and getting nondeterministic merges."

Positioning

Intended for:

Compliance-sensitive teams requiring HMAC-signed audit logs (EU AI Act Article 12, SOC 2 CC4/CC7, DORA/NIS2 mappings in docs/compliance/)
Engineering teams running ≥3 CLI coding agents in parallel
Platform teams needing a per-agent-decision audit log
Anyone burning >$1k/month on coding agents who wants deterministic replay
Forward-deployed engineers who need credentials in their env, not the client's

Explicitly not for:

Single pair-programmer use cases
Prototypes where merge gates are overkill
Non-coding tasks (research, writing, data pipelines)
SaaS-with-support-SLA requirements
"Emergent collaboration" research scenarios

Version Analyzed

v2.2.x (last commit 2026-05-26)

Architecture

Bernstein — Architecture

Distribution

Primary: pip install bernstein / pipx install bernstein / uv tool install bernstein
One-liner: curl -fsSL https://bernstein.run/install.sh | sh (macOS/Linux)
Windows: irm https://bernstein.run/install.ps1 | iex
Homebrew: brew tap chernistry/tap && brew install bernstein
Fedora/RHEL: sudo dnf copr enable alexchernysh/bernstein && sudo dnf install bernstein
npm wrapper: npx bernstein-orchestrator
Docker (GHCR): docker run --rm -v "$PWD:/work" -w /work -e ANTHROPIC_API_KEY ghcr.io/sipyourdrink-ltd/bernstein:latest run -g "..."

Required Runtime

Python 3.12+
Git (for worktree isolation)
One or more CLI agent tools installed (claude, codex, gemini, aider, etc.)

Key Config Files

bernstein.yaml — main project config (goal, max_agents, role_model_policy, quality_gates, observability, autofix)
.sdd/ — runtime state directory (task backlog, audit log, agent tokens, traces, runtime state)
.sdd/audit/YYYY-MM-DD.jsonl — HMAC-chained audit log
.sdd/config/triggers.yaml — trigger rules
bernstein-skills.toml — skill lifecycle manifest
.plugin/plugin.json — Claude Code plugin manifest (commands, agents, MCP server)
.mcp.json — MCP server config (points to uv run bernstein mcp)

Directory Tree

sipyourdrink-ltd/bernstein/
├── src/bernstein/
│   ├── adapters/          # 44 CLI agent adapters
│   ├── cli/               # CLI entrypoint
│   ├── core/
│   │   ├── agents/        # Agent lifecycle, heartbeat, spawn
│   │   ├── approval/      # Tool-call approval gate
│   │   ├── autofix/       # Autofix daemon
│   │   ├── compliance/    # EU AI Act, SOC2 mappings
│   │   ├── cost/          # Budget tracking, anomaly detection
│   │   ├── git/           # Worktree management
│   │   ├── knowledge/     # SQLite FTS5 RAG, semantic cache
│   │   ├── lineage/       # Per-artefact lineage records
│   │   ├── orchestration/ # tick_pipeline, trigger_manager
│   │   ├── quality/       # janitor.py, quality_gates.py, reviewer.py
│   │   ├── replay/        # Deterministic replay
│   │   ├── security/      # HMAC audit, agent cards, auto_approve
│   │   ├── tasks/         # Task lifecycle state machine
│   │   └── workflows/     # YAML workflow manifests (archon-inspired)
│   └── gui/
│       └── static/        # Vite bundle for web dashboard
├── agents/
│   └── orchestrator.md    # Claude Code agent definition
├── commands/              # Claude Code slash commands (run, status, stop)
├── bernstein.yaml         # Main project config (used by Bernstein to build itself)
├── bernstein-skills.toml  # Skill lifecycle manifest
├── .plugin/
│   └── plugin.json        # Claude Code plugin manifest
└── docs/                  # Full documentation tree

4-Stage Pipeline

1. Decompose  → One LLM call breaks goal into tasks with roles, owned files, signals
2. Spawn      → Agents start in isolated git worktrees, one per task
3. Verify     → Janitor checks: tests pass, files exist, lint clean, types correct
4. Merge      → Verified work lands on main; failed tasks retry or re-route

The orchestrator is a Python scheduler, not an LLM. Every scheduling decision writes a row to .sdd/audit/YYYY-MM-DD.jsonl.
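To illustrate why a code-driven scheduler is replayable, the sketch below picks tasks for free agent slots using nothing but a total ordering over its inputs. The `Task` fields, the `plan_tick` name, and the priority rule are assumptions made for illustration; they are not taken from Bernstein's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    task_id: str
    role: str
    priority: int = 0  # hypothetical field: higher values run first


def plan_tick(open_tasks: list[Task], running: int, max_agents: int) -> list[Task]:
    """Pick the tasks to spawn this tick, with no randomness and no LLM call."""
    free_slots = max(0, max_agents - running)
    # Total order: priority descending, then task_id ascending as a tie-breaker,
    # so the input order of open_tasks never changes the outcome.
    ordered = sorted(open_tasks, key=lambda t: (-t.priority, t.task_id))
    return ordered[:free_slots]


tasks = [
    Task("TASK-003", "qa"),
    Task("TASK-001", "backend", priority=1),
    Task("TASK-002", "security"),
]
picked = plan_tick(tasks, running=5, max_agents=7)
print([t.task_id for t in picked])  # ['TASK-001', 'TASK-002']
```

Because the function depends only on its arguments, feeding it the same backlog and slot count always yields the same selection, which is the property a replay needs.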

Target AI Tools

Primary: Claude Code, Codex CLI, Gemini CLI
Also supported (44 adapters): Aider, Amp, Cursor, Devin Terminal, Continue, Goose, gptme, Plandex, OpenCode, Qwen, OpenHands, Open Interpreter, AIChat, Letta Code, Kilo, Kiro, Junie, AWS Q Developer, Ollama+Aider, Cody, Cloudflare Agents, IaC (Terraform/Pulumi), GitHub Copilot CLI, OpenAI Agents SDK v2, Generic --prompt wrapper
Internal scheduler LLM is configurable: internal_llm_provider: gemini|qwen|ollama|claude...

Sandbox / Isolation Backends

Default: git worktrees (one per task)
Optional: Docker (pip install 'bernstein[docker]'), E2B microVM (pip install 'bernstein[e2b]'), Modal (pip install 'bernstein[modal]')

Artifact Storage Sinks

Local (default), S3, GCS, Azure Blob, Cloudflare R2

Components

Bernstein — Components

CLI Binary: `bernstein`

Main commands (from README + CLI docs):

Command	Purpose
`bernstein init`	Creates `.sdd/` workspace + `bernstein.yaml`
`bernstein run -g "<goal>"`	One-shot multi-agent run with goal decomposition
`bernstein run plan.yaml`	Execute a YAML plan directly (skip LLM decomposition)
`bernstein run --dry-run plan.yaml`	Preview tasks and estimated cost
`bernstein live`	Watch progress in the TUI dashboard
`bernstein stop`	Graceful shutdown with drain
`bernstein gui serve`	Start web dashboard at `localhost:8052`
`bernstein gui serve --dev`	Dev mode with hot reload on `:5173`
`bernstein gui serve --minimal`	Minimal mode, no full `/api/v1/*` surface
`bernstein pr`	Auto-create GitHub PR from completed session
`bernstein from-ticket <url>`	Import Linear / GitHub Issues / Jira ticket
`bernstein autofix`	Daemon: monitors PRs, spawns fixer agent on CI failure
`bernstein hooks`	Lifecycle hooks (`pre_task`, `post_task`, `pre_merge`)
`bernstein backlog claim`	Atomically claim a task from `.sdd/runtime/task-backlog.json`
`bernstein chat serve`	Drive runs from Telegram/Discord/Slack
`bernstein workflow run <name>`	Run a YAML workflow manifest
`bernstein workflow list`	List bundled + user-installed workflows
`bernstein workflow init <name>`	Scaffold a starter manifest
`bernstein workflow validate <path>`	Validate a manifest
`bernstein schedule add/list/run`	Manage recurring schedules
`bernstein schedule audit`	Verify replayable fire receipt sequence
`bernstein integrations list --installed`	List installed agent adapters
`bernstein audit verify`	Verify HMAC chain integrity
`bernstein lineage verify <run_id>`	Verify per-artefact lineage
`bernstein cloud login/deploy/run`	Cloudflare Workers execution
`bernstein mcp`	Start MCP server mode
`bernstein --headless`	CI mode: structured JSON output, non-zero exit on failure

Stock YAML Workflows

idea-to-pr, refactor-with-tests, security-review, doc-update, dependency-bump, hot-fix

Claude Code Plugin Components

Commands (3)

run.md — Start a multi-agent orchestration run
status.md — Show current orchestration state
stop.md — Gracefully stop running orchestration

Agent (1)

agents/orchestrator.md — Claude Code agent definition. Uses MCP tools (bernstein_run, bernstein_status, bernstein_tasks, bernstein_cost, bernstein_stop, bernstein_approve) to drive orchestration.

MCP Server (1)

Exposed via bernstein mcp (stdio); wired via .mcp.json pointing to uv run bernstein mcp
Tools include: bernstein_run, bernstein_status, bernstein_tasks, bernstein_cost, bernstein_stop, bernstein_approve (confirmed from orchestrator.md)

Skills (1 template)

templates/skills/bernstein-test-runner.md — How to run tests correctly in this project

Security / Compliance Components

src/bernstein/core/security/audit.py — HMAC-SHA256 chain over .sdd/audit/*.jsonl
src/bernstein/core/security/agent_card_signer.py — JWS (RFC 7515) + JCS (RFC 8785) + Ed25519 (RFC 8037) signing
src/bernstein/core/security/auto_approve.py — Deterministic allow/deny classifier for tool calls
docs/compliance/ — EU AI Act Article 12, SOC 2 CC4/CC7, DORA/NIS2, OWASP ASI06 mappings
docs/security/lethal-trifecta.md — Capability gate for dangerous capability combinations

Core Orchestration Modules

core/orchestration/orchestrator.py — Main tick-based scheduler
core/orchestration/tick_pipeline.py — Per-tick execution pipeline
core/tasks/task_lifecycle.py — Task state machine
core/agents/agent_lifecycle.py — Per-agent process management
core/routing/router.py + cascade_router.py — Model routing (cascade fallback)
core/quality/janitor.py — Verification gate before merge
core/knowledge/rag.py — SQLite FTS5 + BM25 codebase RAG (no neural embeddings)
core/replay/ — Deterministic replay
core/workflows/ — Archon-inspired YAML manifest runner
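A task state machine like the one listed above can be sketched as a transition table. The three status names (open, claimed, closed) appear elsewhere in this profile; the allowed transitions and the `transition` helper are hypothetical, and the real lifecycle in `task_lifecycle.py` may define more states.

```python
from enum import Enum


class Status(str, Enum):
    OPEN = "open"
    CLAIMED = "claimed"
    CLOSED = "closed"


# Hypothetical transition table for illustration only.
ALLOWED = {
    Status.OPEN: {Status.CLAIMED},
    Status.CLAIMED: {Status.CLOSED, Status.OPEN},  # close on success, reopen for retry
    Status.CLOSED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = transition(Status.OPEN, Status.CLAIMED)
print(state.value)  # claimed
```

Keeping the table as data makes every legal move explicit, so an illegal move fails loudly instead of silently corrupting the backlog.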

Prompts

Bernstein — Prompts

Verbatim Excerpt 1: Orchestrator Agent (`agents/orchestrator.md`)

---
name: orchestrator
description: Decomposes goals into parallel tasks, assigns them to CLI coding agents, verifies output, and merges results. Use when a task is too large for a single agent.
---

You are the Bernstein orchestrator. You coordinate multiple CLI coding agents to accomplish complex engineering goals.

Your capabilities:
1. Decompose a high-level goal into independent tasks
2. Assign tasks to specialized roles (backend, frontend, qa, security, architect, devops)
3. Spawn agents in parallel with git worktree isolation
4. Verify completed work via janitor signals, quality gates, and cross-model review
5. Handle failures with automatic retry, cascade fallback, and task decomposition

Use the Bernstein MCP tools (bernstein_run, bernstein_status, bernstein_tasks, bernstein_cost, bernstein_stop, bernstein_approve) to drive orchestration.

Do not attempt to do all the work yourself. Delegate to Bernstein and monitor progress.

Prompting technique: Role declaration + explicit capability list + tool-use instruction + anti-pattern prohibition ("Do not attempt to do all the work yourself"). The agent is a thin MCP-tool dispatcher, not a planner — all coordination logic lives in the Python scheduler.

Verbatim Excerpt 2: Test Runner Skill (`templates/skills/bernstein-test-runner.md`)

---
name: bernstein-test-runner
description: Run project tests correctly to avoid memory leaks and failures
whenToUse: When running tests, checking test results, or verifying code correctness
---

**IMPORTANT**: Always use the project's test runner script, never run pytest directly on the full suite.

Run all tests (isolated per-file, prevents memory leaks):

```bash
uv run python scripts/run_tests.py -x
```

Run a single test file:

```bash
uv run pytest tests/unit/test_foo.py -x -q
```

NEVER run `uv run pytest tests/ -x -q` - this leaks 100+ GB RAM across 2000+ tests.

Check linting before completing:

```bash
uv run ruff check src/
```

Prompting technique: Explicit anti-pattern prohibition with a stated consequence ("leaks 100+ GB RAM"), followed by positive procedural guidance. Classic Iron Law with rationalization pattern borrowed from superpowers: banning the wrong action and explaining why prevents the LLM from defaulting to a simpler-seeming alternative.

Verbatim Excerpt 3: HMAC Audit Record Format (from `docs/security/audit-log.md`)

```json
{
  "timestamp": "2026-05-07T14:30:00.000000Z",
  "event_type": "task.transition",
  "actor": "orchestrator",
  "resource_type": "task",
  "resource_id": "TASK-001",
  "details": {"from_status": "open", "to_status": "claimed"},
  "prev_hmac": "0000…0000",
  "hmac": "d4e5f6…"
}
```

The hmac field is computed as:

payload   = prev_hmac + json.dumps(entry_without_hmac, sort_keys=True)
entry.hmac = HMAC_SHA256(audit_key, payload).hexdigest()

Technique: Structured data schema rather than a prompt — this is the output contract the scheduler enforces on every scheduling decision. Illustrates Bernstein's "receipts not prompts" philosophy: auditability is a hard constraint encoded in the runtime, not in an AI prompt.
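The chaining formula above can be sketched with the Python standard library. This is a minimal illustration, not Bernstein's `audit.py`: the all-zero genesis value, the choice to keep `prev_hmac` inside the signed body, and the `sign_entry` / `verify_chain` names are assumptions.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed all-zero prev_hmac for the first record


def sign_entry(key: bytes, prev_hmac: str, entry: dict) -> dict:
    """Return a copy of entry with prev_hmac and hmac fields filled in."""
    body = {**entry, "prev_hmac": prev_hmac}
    payload = prev_hmac + json.dumps(body, sort_keys=True)
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {**body, "hmac": digest}


def verify_chain(key: bytes, entries: list[dict]) -> bool:
    """Recompute every HMAC in order and check each link to its predecessor."""
    prev = GENESIS
    for entry in entries:
        body = {k: v for k, v in entry.items() if k != "hmac"}
        if body.get("prev_hmac") != prev:
            return False
        payload = prev + json.dumps(body, sort_keys=True)
        expected = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, entry.get("hmac", "")):
            return False
        prev = entry["hmac"]
    return True


key = b"demo-key"  # placeholder; a real key would come from secure storage
log, prev = [], GENESIS
for status in ("claimed", "closed"):
    record = sign_entry(
        key,
        prev,
        {"event_type": "task.transition", "resource_id": "TASK-001",
         "details": {"to_status": status}},
    )
    log.append(record)
    prev = record["hmac"]

print(verify_chain(key, log))  # True
log[0]["details"]["to_status"] = "open"  # tamper with an early record
print(verify_chain(key, log))  # False
```

Because each record's HMAC covers the previous record's HMAC, editing any earlier entry invalidates that entry's own check, and the chain cannot be repaired without the key.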

Uniqueness

Bernstein — Uniqueness and Positioning

Differs from Seeds

Bernstein is closest to claude-flow among the seed frameworks in that both run multiple parallel agents with git worktree isolation and maintain audit state. However, the architectures diverge fundamentally: claude-flow uses an LLM hive-mind for task routing with CRDT/raft consensus protocols, while Bernstein uses zero LLM in the coordination loop — plain Python decides everything after the initial goal decomposition. This makes Bernstein deterministic and replayable in a way claude-flow cannot be.

Versus superpowers and bmad-method: Bernstein is an executable runtime daemon, not a prompt-behavior augmentation layer. It replaces the manual "dispatch a subagent" pattern with an automated scheduler. Bernstein does not inject skills into Claude Code sessions; it orchestrates external CLI agent processes from outside their context window.

Versus taskmaster-ai: Both manage task queues and support multiple AI tool backends, but Taskmaster is a project planning and tracking layer (MCP server + task JSON) that still expects a human or single agent to execute tasks. Bernstein is a full execution engine that spawns, monitors, verifies, and merges agent work.

Versus agent-os and claude-conductor: These are markdown scaffold methodologies. Bernstein is a binary with an API server, web dashboard, and 44 process adapters.

The compliance angle (HMAC audit chain, signed agent cards, per-artefact lineage, EU AI Act mapping, SOC 2 mappings) has no counterpart in any seed framework. This is the most distinctive architectural dimension — Bernstein is positioned as "the orchestrator your compliance team will sign off on."

Positioning

Bernstein fills the gap between "run one agent locally" and "buy a cloud SaaS with a credit card form." It is on-prem only by design, BYOK, Apache 2.0, and targets engineering orgs running ≥3 parallel CLI agents who need auditability, determinism, and cost control.

Observable Failure Modes

Over-decomposition: The single LLM decompose call may split a goal into too many tasks, creating merge coordination overhead that outweighs parallelism gains.
Worktree accumulation: Long-running projects accumulate stale worktrees; requires bernstein worktrees gc discipline.
Adapter drift: 44 CLI adapters means any adapter's CLI contract change can silently break that adapter.
Cost underestimation: Budget caps in bernstein.yaml are estimates; actual model costs drift with prompt size growth.
Solo-maintenance risk: As of analysis, solo maintained by one engineer; no organizational bus-factor protection.
Audit key management: If the HMAC audit key is lost or rotated incorrectly, the chain breaks and historical auditability is lost.

Workflow

Bernstein — Workflow

Standard Run Workflow (4 stages)

Phase	What happens	Artifact
1. Decompose	One LLM call breaks the goal into tasks with roles, owned files, and completion signals	`.sdd/runtime/task-backlog.json`
2. Spawn	Agents start in isolated git worktrees (one per task), credentials scoped per agent	`~/.local/share/bernstein/worktrees/<run_id>/`
3. Verify	Janitor checks concrete signals: tests pass, files exist, lint clean, types correct, cross-model review	`.sdd/audit/YYYY-MM-DD.jsonl` (append entry per check)
4. Merge	Verified work lands on main; failed tasks retry or route to different model	Merged git worktree branch

YAML Workflow Manifest (optional, declarative DAG)

# example: idea-to-pr manifest structure
steps:
  - id: plan
    type: agent
    role: architect
    ...
  - id: implement
    type: loop
    fresh_context: true   # new agent session per iteration
    until: "pnpm test"    # bash predicate exits 0
    ...
  - id: review
    type: agent
    role: reviewer
    ...

Loop nodes re-fire until a bash predicate exits 0. fresh_context: true mints a new agent session per iteration. Per-step CLI/model routing available.
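The loop-node behaviour described above (repeat a step until a shell predicate exits 0) can be sketched as follows. The `run_until` helper, its `max_iterations` cap, and the file-based predicate are assumptions for illustration; Bernstein's workflow runner may handle limits and sessions differently.

```python
import os
import subprocess
import sys
import tempfile


def run_until(predicate_cmd: list[str], step, max_iterations: int = 5) -> int:
    """Call step(), then the predicate, until the predicate exits 0.

    Returns the number of iterations used; raises if the cap is reached.
    """
    for iteration in range(1, max_iterations + 1):
        step(iteration)  # stands in for a fresh agent session
        result = subprocess.run(predicate_cmd, capture_output=True)
        if result.returncode == 0:
            return iteration
    raise RuntimeError(f"predicate still failing after {max_iterations} iterations")


with tempfile.TemporaryDirectory() as tmp:
    marker = os.path.join(tmp, "attempts.txt")

    def step(iteration: int) -> None:
        # Each "attempt" appends one line; a real step would run an agent.
        with open(marker, "a", encoding="utf-8") as fh:
            fh.write(f"attempt {iteration}\n")

    # Predicate: exit 0 once the marker file has at least three lines.
    check = [
        sys.executable,
        "-c",
        "import sys; sys.exit(0 if len(open(sys.argv[1]).readlines()) >= 3 else 1)",
        marker,
    ]
    used = run_until(check, step)

print(used)  # 3
```

An iteration cap is worth having in any such loop, since a predicate that can never pass would otherwise spend budget indefinitely.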

Approval Gates

Gate	Type	Trigger
Tool-call approval gate	auto-approve/deny (deterministic classifier) or human queue	Before any tool call reaches operator queue
Janitor verification gate	automated	After agent completes task, before merge
Merge gate	automated or human-required	All quality gates must pass
Budget preflight	automated hard stop	Before each new attempt, checks remaining budget
Lethal-trifecta capability gate	automated deny	Specific dangerous capability combinations
Human escalation on injection detection	automated → human required	Context integrity scan detects injection patterns

Key State Files

File	Purpose
`.sdd/audit/YYYY-MM-DD.jsonl`	HMAC-SHA256-chained audit log, one record per scheduling decision
`.sdd/runtime/task-backlog.json`	Open/claimed/closed task queue
`.sdd/runtime/agent_tokens/`	Per-session zero-trust JWTs
`.sdd/traces/`	OpenTelemetry trace files
`bernstein.yaml`	Project config (goal, budget, role-model policy, quality gates)

Spec Format

None (goal is plain text). YAML for workflow manifests. JSON for task backlog.

Git Automation

Creates worktree per task (automated)
Runs tests/lint/types before merge (automated)
Creates GitHub PR (bernstein pr, semi-automated — operator invokes)
Merges verified branches to main (automated on passing quality gates)

Memory Context

Bernstein — Memory and Context

State Storage

Primary mechanism: File-based state under .sdd/ directory. Bernstein explicitly bans databases in its own bernstein.yaml constraint: "File-based state in .sdd/ only - no databases."

State file	Content
`.sdd/runtime/task-backlog.json`	Open/claimed/closed tasks with role assignments
`.sdd/runtime/agent_tokens/`	Per-session zero-trust JWTs for bearer-token task server
`.sdd/audit/YYYY-MM-DD.jsonl`	HMAC-SHA256 chained audit log (one per UTC day, chained across days)
`.sdd/traces/`	OpenTelemetry traces for agent runs
`.sdd/config/triggers.yaml`	Event-driven trigger rules

Codebase RAG

core/knowledge/rag.py — SQLite FTS5 with BM25 ranking + AST-aware chunking for Python files. No neural embeddings; purely lexical/keyword retrieval.
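A lexical retrieval setup of this kind can be sketched with the standard library alone, assuming the local SQLite build includes the FTS5 extension. The chunking rule (one chunk per top-level function or class), the table layout, and the `search` helper are illustrative assumptions rather than the contents of `rag.py`.

```python
import ast
import sqlite3

SOURCE = '''
def retry_task(task_id, attempts=3):
    """Retry a failed task with a capped number of attempts."""
    return attempts

def merge_branch(branch):
    """Merge a verified worktree branch into main."""
    return branch
'''


def chunk_python(source: str) -> list[tuple[str, str]]:
    """Split a module into (name, code) chunks, one per top-level def or class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(name, code)")
conn.executemany("INSERT INTO chunks VALUES (?, ?)", chunk_python(SOURCE))


def search(query: str, limit: int = 5) -> list[str]:
    """Return chunk names ranked by BM25 (lower bm25() values rank first)."""
    rows = conn.execute(
        "SELECT name FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    )
    return [name for (name,) in rows]


print(search("merge"))  # ['merge_branch']
```

Chunking on syntax boundaries keeps each indexed unit a complete function or class, so a keyword hit returns code that can be read on its own.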

Semantic Cache

core/knowledge/semantic_cache.py — TF (term-frequency) cosine similarity over word counts, not learned embeddings.
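Term-frequency cosine similarity is simple enough to show in full. The sketch below is one plausible shape for such a cache; the tokenizer, the 0.8 threshold, and the `TermFrequencyCache` class are assumptions, not the contents of `semantic_cache.py`.

```python
import math
import re
from collections import Counter
from typing import Optional


def tf_vector(text: str) -> Counter:
    """Count lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class TermFrequencyCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed cut-off for a cache hit
        self._entries: list[tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self._entries.append((tf_vector(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        """Return the response of the most similar stored prompt, if close enough."""
        query = tf_vector(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self._entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = TermFrequencyCache()
cache.put("add unit tests for the parser module", "cached-answer")
print(cache.get("add unit tests for parser module"))  # cached-answer
print(cache.get("deploy the dashboard"))  # None
```

The trade-off is the usual one for lexical matching: near-identical wording hits the cache cheaply, while paraphrases with different vocabulary miss.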

Context Handoff Between Agents

Each agent gets a spawn_prompt (core/agents/spawn_prompt.py) containing its task assignment, owned files, and completion signals.
Context distillation: carries a distilled summary of recent attempts and remaining constraints into subsequent attempts (reduces runaway token growth on retries).
Lessons learned: core/knowledge/lessons.py + core/knowledge/knowledge_base.py for cross-session learning.

Session Persistence

cross_session_handoff: yes — audit log and task backlog persist across sessions
context_compaction_handling: no. Bernstein is an external orchestrator, not a Claude Code plugin that hooks into compaction events, although distilling context on retry serves a similar purpose to a PreCompact hook.
Crash recovery: heartbeat monitoring (core/agents/heartbeat.py) + loop detector (core/observability/loop_detector.py)

Audit Replay

The entire run is replayable: bernstein schedule audit walks persisted fire receipts to prove the sequence is replayable. The deterministic scheduler means yesterday's plan + same inputs = same task graph.

Orchestration

Bernstein — Orchestration

Pattern

Hierarchical parallel-fan-out with a deterministic Python scheduler as the control plane. One LLM call decomposes the goal; from there, all routing is code:

Manager (Python scheduler)
├── Worker-1 (backend role, claude-sonnet in worktree A)
├── Worker-2 (qa role, codex in worktree B)
├── Worker-3 (security role, claude-sonnet in worktree C)
└── Janitor  (verifier — runs tests/lint/types before any merge)

Isolation Mechanism

Git worktrees: one per task, per repo. Multiple sandbox backends available: Docker, E2B microVM, Modal.

Multi-Model Routing

Yes — per bernstein.yaml:

role_model_policy:
  manager: {model: opus, effort: max}
  architect: {model: opus, effort: max}
  security: {model: sonnet, effort: high}
  backend: {model: sonnet, effort: high}
  qa: {model: sonnet, effort: high}
  reviewer: {model: sonnet, effort: high}

Internal scheduler LLM is also configurable independently of worker models (internal_llm_provider, internal_llm_model).

Consensus Mechanism

None — scheduling is deterministic Python, not consensus-based. The HMAC audit chain provides tamper-evident ordering, but there is no raft/quorum/CRDT protocol.

Concurrent Agents

Configurable via max_agents in bernstein.yaml. Self-dog-food config uses max_agents: 7. No hard system cap — limited by available CPU/memory and worktree storage.

Execution Mode

One-shot (primary): bernstein run -g "<goal>" completes and exits.

Continuous/daemon modes:

bernstein autofix daemon: monitors open PRs, spawns fixer on CI failure
bernstein schedule add for recurring schedules
Event-driven via trigger sources (GitHub, GitLab, Slack, Discord, file watch, webhook)

Cross-Tool Portability

High — 44 CLI adapters covering Claude Code, Codex, Gemini CLI, Aider, Cursor, OpenCode, and 38+ more. Any CLI tool with a --prompt flag works via the generic adapter. Model-agnostic by design.

Prompt Chaining

Yes: the decompose stage writes task descriptions (including context files, owned files, completion signals) that become the spawn_prompt for each worker agent. One stage's output is explicitly another stage's prompt.

Verification (Auto-validators)

The janitor checks before merge:

Lint (ruff check or configured linter)
Type check (configurable)
Tests (configurable test command)
Cross-model review (core/quality/reviewer.py)
Custom quality gates (via bernstein.yaml#quality_gates)

Safety / Security Layer

Auto-approve gate: deterministic deny list (rm -rf, git push --force, DROP TABLE, curl | bash, etc.) — deny-wins, fail-closed
Context integrity scan: injection pattern detection before admitting any attempt
Red-Blue testing (6 deterministic adversarial probes), including assertion deletion, silent reverts, context poisoning, budget self-reporting, and grounding evasion
Budget preflight: rejects attempts projected to exceed remaining budget
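A deny-wins, fail-closed classifier of the kind described in the first item above can be sketched with regular expressions. The patterns, the allow list, and the "escalate" outcome (handing unknown commands to a human queue) are assumptions for illustration, not the rules in `auto_approve.py`.

```python
import re

# Hypothetical deny patterns modelled on the examples named above.
DENY_PATTERNS = [
    re.compile(r"\brm\s+-[a-z]*r[a-z]*f|\brm\s+-[a-z]*f[a-z]*r", re.IGNORECASE),
    re.compile(r"\bgit\s+push\b.*--force", re.IGNORECASE),
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
    re.compile(r"\bcurl\b[^|]*\|\s*(ba)?sh\b", re.IGNORECASE),
]

# Hypothetical allow patterns for read-only or test commands.
ALLOW_PATTERNS = [
    re.compile(r"^git\s+(status|diff|log)\b"),
    re.compile(r"^uv\s+run\s+(pytest|ruff)\b"),
]


def classify(command: str) -> str:
    """Return 'deny', 'allow', or 'escalate' for a proposed shell command."""
    if any(p.search(command) for p in DENY_PATTERNS):
        return "deny"  # deny wins even if an allow pattern also matches
    if any(p.search(command) for p in ALLOW_PATTERNS):
        return "allow"
    return "escalate"  # fail closed: unknown commands are never auto-approved


for cmd in [
    "rm -rf /tmp/x",
    "git log && git push origin main --force",
    "uv run pytest tests/unit/test_foo.py -x -q",
    "make deploy",
]:
    print(f"{classify(cmd):9s} {cmd}")
```

Checking the deny list first means a command that chains a harmless prefix with a destructive suffix is still rejected, and the same input always produces the same verdict.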

Audit Log

HMAC-SHA256 chained JSONL under .sdd/audit/YYYY-MM-DD.jsonl. One record per scheduling decision. Daily rotation without resetting the chain (last hmac from day N becomes prev_hmac for day N+1). bernstein audit verify checks chain integrity. Replay-capable via bernstein schedule audit.

UI CLI Surface

Bernstein — UI and CLI Surface

CLI Binary

Name: bernstein
Install: pip install bernstein / pipx install bernstein / Homebrew / one-liner
Type: Own runtime (not a thin wrapper over claude/codex CLI)

Key flags:

bernstein --headless: CI mode, structured JSON output, non-zero exit on failure
bernstein run -g "...": primary entry point
bernstein run --dry-run plan.yaml: cost preview without execution
--json: multiple JSON output modes

Web Dashboard

Shipped as bernstein gui serve since v2.0.0.

Detail	Value
URL	`http://127.0.0.1:8052/ui/`
Dev mode	`bernstein gui serve --dev` (expects `npm run dev` on :5173)
Bundle	Vite bundle committed under `src/bernstein/gui/static/`; works from the wheel without Node
Tech stack	Vite (frontend framework not specified publicly)

Dashboard tabs:

Tasks, Agents, Approvals, Audit, Costs, Fleet, Settings

Per-task drawer:

Summary, Logs (SSE + ANSI + virtualized + search + level filters), Diff (split/unified, syntax highlight), Gates (status buckets, auto-expand failures), Deps (upstream/downstream graph), Trace (.sdd/traces/ timeline + filter chips + search)

VS Code Extension

Available at VS Marketplace: alex-chernysh.bernstein (and Open VSX)

Fleet Dashboard

bernstein fleet — supervise multiple Bernstein projects in one view.

Observability

OpenTelemetry (OTLP) export: bernstein.yaml#observability.otlp.endpoint
Prometheus-compatible metrics endpoint via API server
Agent heartbeat monitoring (core/agents/heartbeat.py)
Loop detector (core/observability/loop_detector.py)
Jaeger-compatible tracing

MCP Server Mode

bernstein mcp: exposes Bernstein as an MCP server (stdio). Claude Code or any other MCP-capable client can drive runs via MCP tools.

Chat Interfaces

bernstein chat serve --platform=telegram|discord|slack — drive runs with /run, /status, /approve, /reject commands.

Cloud Execution

bernstein cloud — runs agents on Cloudflare Workers with R2-backed workspace sync.


Distribution

Type: cli-tool
License: Apache-2.0
Install: multi-step
Version: 2.2.x (commit 2026-05-26)

Surfaces

CLI binary: bernstein
CLI subcmds: 25
Local UI: web-dashboard
UI port: 8052
Tech stack: Vite (bundle committed under src/bernstein/gui/static/)

Components

Commands: 3
Skills: 1
Subagents: 1
Hooks: 0
MCP servers: 1
MCP tools: 6
Scripts: 3
Templates: 6

Workflow

Phases: 4
Approval gates: 6
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: Yes
Pattern: hierarchical
Isolation: git-worktree
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: one-shot
Crash recovery: Yes
Compaction: Yes
Session handoff: Yes
Streaming: Yes

Memory

Type: file-based
Persistence: project
Search: full-text
State files: 6 files

Quality

TDD: Optional
TDD mechanism: dedicated-skill
Validators: 5
Self-review: adversarial-subagent

Git / Observability

Auto commit: No
Auto PR: Yes
Auto merge: Yes
Worktree/feat: Yes
Audit log: Yes
Audit format: jsonl
Replay: Yes

Tools

Primary: claude-code
Targets: 22
Portability: high

Signals

Stars: 460
Last commit: 2026-05-26
Contributors: 21
Maintainer: active
Quality score: 8.5/10
