
Bernstein

bernstein · sipyourdrink-ltd/bernstein · ★ 460 · last commit 2026-05-26

Govern parallel CLI coding agents with a deterministic Python scheduler, HMAC-chained audit trail, and compliance-ready signed lineage so multi-agent runs are reproducible and auditability is provable.

Best when: The orchestrator should be plain Python code, not an LLM. Deterministic scheduling is not a job for a model that produces different results on every call.
Skip if: Using an LLM in the coordination/scheduling loop, or running agents on shared branches without worktree isolation.
vs seeds
superpowers/bmad-method (prompt-behavior augmentation), Bernstein is an executable runtime daemon that orchestrates external proces…
Primitive shape (11 total): Commands 3 · Skills 1 · Subagents 1 · MCP tools 6
00

Summary

Bernstein — Summary

Bernstein (sipyourdrink-ltd) is a deterministic Python scheduler that runs a crew of CLI coding agents (Claude Code, Codex, Gemini CLI, and 40+ others) against a single goal in parallel git worktrees, and records every scheduling decision in an HMAC-SHA256-chained audit log (HMAC per RFC 2104). Named after the conductor Leonard Bernstein, it orchestrates agents the way a conductor leads an orchestra: every player on cue, the score deterministic, the conductor accountable for the result. The core innovation is that no LLM calls appear in the coordination loop: one LLM call decomposes the goal, then plain Python decides who runs, where, and with what budget, which makes runs reproducible and replayable. It ships a bernstein CLI binary (PyPI + Homebrew), an optional web dashboard at localhost:8052, 44 CLI agent adapters, signed agent cards (Ed25519/EdDSA), per-artefact lineage records, and a janitor that gates merges on tests, lint, types, and cross-model review before any result lands on main.

Differs from seeds: Closest to claude-flow (both run parallel agents with worktree isolation and an audit layer) but diverges radically in the coordination plane — claude-flow uses an LLM (hive-mind) for task routing and CRDT/consensus protocols, while Bernstein uses pure Python scheduling with zero LLM in the loop. Unlike superpowers (skill-injection framework) or bmad-method (persona-based workflow), Bernstein is an executable runtime with a 4-stage pipeline (decompose → spawn → verify → merge) rather than a prompt-behavior augmentation layer. The HMAC-chained audit log, signed agent cards, per-artefact lineage, and air-gap deployment profile put Bernstein in a compliance-focused tier that no seed framework occupies.

01

Overview

Bernstein — Overview

Origin

Created by Alex Chernysh (GitHub: chernistry) after paying ~$400/month in Claude bills running three coding agents in parallel and getting nondeterministic merges. The chernistry/bernstein GitHub path is a redirect alias to the canonical sipyourdrink-ltd/bernstein repo. Apache 2.0, solo maintained. First published 2025.

Philosophy

Named after Leonard Bernstein, the American conductor and composer. The project orchestrates a crew of CLI coding agents the way Bernstein conducted the New York Philharmonic: every player on cue, the score deterministic, the conductor accountable for the result.

Core design principles:

  • Short-lived workers: agents are spawned for focused work and then exit.
  • File-first state: runtime state is persisted under .sdd/.
  • Deterministic orchestration: scheduling and lifecycle decisions are code-driven.
  • Verification before closure: task completion passes through janitor/quality logic.
  • Multi-adapter runtime: Bernstein is CLI-agent agnostic via adapter interfaces.

Manifesto Quotes

From the README:

"To achieve great things, two things are needed: a plan and not quite enough time." — Leonard Bernstein

"Most agent orchestrators use an LLM to decide who does what. That is non-deterministic and burns tokens on scheduling instead of code. Bernstein does one LLM call to break down your goal, then the rest (running agents in parallel, isolating their git branches, running tests, routing retries) is plain Python. Every run is reproducible. Every step is logged and replayable."

"I wrote Bernstein because I was paying $400/month in Claude bills running three coding agents in parallel and getting nondeterministic merges."

Positioning

Intended for:

  • Compliance-sensitive teams requiring HMAC-signed audit logs (EU AI Act Article 12, SOC 2 CC4/CC7, DORA/NIS2 mappings in docs/compliance/)
  • Engineering teams running ≥3 CLI coding agents in parallel
  • Platform teams needing a per-agent-decision audit log
  • Anyone burning >$1k/month on coding agents who wants deterministic replay
  • Forward-deployed engineers who need credentials in their env, not the client's

Explicitly not for:

  • Single pair-programmer use cases
  • Prototypes where merge gates are overkill
  • Non-coding tasks (research, writing, data pipelines)
  • SaaS-with-support-SLA requirements
  • "Emergent collaboration" research scenarios

Version Analyzed

v2.2.x (last commit 2026-05-26)

02

Architecture

Bernstein — Architecture

Distribution

  • Primary: pip install bernstein / pipx install bernstein / uv tool install bernstein
  • One-liner: curl -fsSL https://bernstein.run/install.sh | sh (macOS/Linux)
  • Windows: irm https://bernstein.run/install.ps1 | iex
  • Homebrew: brew tap chernistry/tap && brew install bernstein
  • Fedora/RHEL: sudo dnf copr enable alexchernysh/bernstein && sudo dnf install bernstein
  • npm wrapper: npx bernstein-orchestrator
  • Docker (GHCR): docker run --rm -v "$PWD:/work" -w /work -e ANTHROPIC_API_KEY ghcr.io/sipyourdrink-ltd/bernstein:latest run -g "..."

Required Runtime

  • Python 3.12+
  • Git (for worktree isolation)
  • One or more CLI agent tools installed (claude, codex, gemini, aider, etc.)

Key Config Files

  • bernstein.yaml — main project config (goal, max_agents, role_model_policy, quality_gates, observability, autofix)
  • .sdd/ — runtime state directory (task backlog, audit log, agent tokens, traces, runtime state)
  • .sdd/audit/YYYY-MM-DD.jsonl — HMAC-chained audit log
  • .sdd/config/triggers.yaml — trigger rules
  • bernstein-skills.toml — skill lifecycle manifest
  • .plugin/plugin.json — Claude Code plugin manifest (commands, agents, MCP server)
  • .mcp.json — MCP server config (points to uv run bernstein mcp)

Directory Tree

sipyourdrink-ltd/bernstein/
├── src/bernstein/
│   ├── adapters/          # 44 CLI agent adapters
│   ├── cli/               # CLI entrypoint
│   ├── core/
│   │   ├── agents/        # Agent lifecycle, heartbeat, spawn
│   │   ├── approval/      # Tool-call approval gate
│   │   ├── autofix/       # Autofix daemon
│   │   ├── compliance/    # EU AI Act, SOC2 mappings
│   │   ├── cost/          # Budget tracking, anomaly detection
│   │   ├── git/           # Worktree management
│   │   ├── knowledge/     # SQLite FTS5 RAG, semantic cache
│   │   ├── lineage/       # Per-artefact lineage records
│   │   ├── orchestration/ # tick_pipeline, trigger_manager
│   │   ├── quality/       # janitor.py, quality_gates.py, reviewer.py
│   │   ├── replay/        # Deterministic replay
│   │   ├── security/      # HMAC audit, agent cards, auto_approve
│   │   ├── tasks/         # Task lifecycle state machine
│   │   └── workflows/     # YAML workflow manifests (archon-inspired)
│   └── gui/
│       └── static/        # Vite bundle for web dashboard
├── agents/
│   └── orchestrator.md    # Claude Code agent definition
├── commands/              # Claude Code slash commands (run, status, stop)
├── bernstein.yaml         # Main project config (used by Bernstein to build itself)
├── bernstein-skills.toml  # Skill lifecycle manifest
├── .plugin/
│   └── plugin.json        # Claude Code plugin manifest
└── docs/                  # Full documentation tree

4-Stage Pipeline

1. Decompose  → One LLM call breaks goal into tasks with roles, owned files, signals
2. Spawn      → Agents start in isolated git worktrees, one per task
3. Verify     → Janitor checks: tests pass, files exist, lint clean, types correct
4. Merge      → Verified work lands on main; failed tasks retry or re-route

The orchestrator is a Python scheduler, not an LLM. Every scheduling decision writes a row to .sdd/audit/YYYY-MM-DD.jsonl.

Target AI Tools

  • Primary: Claude Code, Codex CLI, Gemini CLI
  • Also supported (44 adapters): Aider, Amp, Cursor, Devin Terminal, Continue, Goose, gptme, Plandex, OpenCode, Qwen, OpenHands, Open Interpreter, AIChat, Letta Code, Kilo, Kiro, Junie, AWS Q Developer, Ollama+Aider, Cody, Cloudflare Agents, IaC (Terraform/Pulumi), GitHub Copilot CLI, OpenAI Agents SDK v2, Generic --prompt wrapper
  • Internal scheduler LLM is configurable: internal_llm_provider: gemini|qwen|ollama|claude...

Sandbox / Isolation Backends

  • Default: git worktrees (one per task)
  • Optional: Docker (pip install 'bernstein[docker]'), E2B microVM (pip install 'bernstein[e2b]'), Modal (pip install 'bernstein[modal]')

Artifact Storage Sinks

Local (default), S3, GCS, Azure Blob, Cloudflare R2

03

Components

Bernstein — Components

CLI Binary: bernstein

Main commands (from README + CLI docs):

| Command | Purpose |
| --- | --- |
| bernstein init | Creates .sdd/ workspace + bernstein.yaml |
| bernstein run -g "<goal>" | One-shot multi-agent run with goal decomposition |
| bernstein run plan.yaml | Execute a YAML plan directly (skip LLM decomposition) |
| bernstein run --dry-run plan.yaml | Preview tasks and estimated cost |
| bernstein live | Watch progress in the TUI dashboard |
| bernstein stop | Graceful shutdown with drain |
| bernstein gui serve | Start web dashboard at localhost:8052 |
| bernstein gui serve --dev | Dev mode with hot reload on :5173 |
| bernstein gui serve --minimal | Minimal mode, no full /api/v1/* surface |
| bernstein pr | Auto-create GitHub PR from completed session |
| bernstein from-ticket <url> | Import Linear / GitHub Issues / Jira ticket |
| bernstein autofix | Daemon: monitors PRs, spawns fixer agent on CI failure |
| bernstein hooks | Lifecycle hooks (pre_task, post_task, pre_merge) |
| bernstein backlog claim | Atomically claim a task from .sdd/runtime/task-backlog.json |
| bernstein chat serve | Drive runs from Telegram/Discord/Slack |
| bernstein workflow run <name> | Run a YAML workflow manifest |
| bernstein workflow list | List bundled + user-installed workflows |
| bernstein workflow init <name> | Scaffold a starter manifest |
| bernstein workflow validate <path> | Validate a manifest |
| bernstein schedule add/list/run | Manage recurring schedules |
| bernstein schedule audit | Verify replayable fire receipt sequence |
| bernstein integrations list --installed | List installed agent adapters |
| bernstein audit verify | Verify HMAC chain integrity |
| bernstein lineage verify <run_id> | Verify per-artefact lineage |
| bernstein cloud login/deploy/run | Cloudflare Workers execution |
| bernstein mcp | Start MCP server mode |
| bernstein --headless | CI mode: structured JSON output, non-zero exit on failure |

Stock YAML Workflows

idea-to-pr, refactor-with-tests, security-review, doc-update, dependency-bump, hot-fix

Claude Code Plugin Components

Commands (3)

  • run.md — Start a multi-agent orchestration run
  • status.md — Show current orchestration state
  • stop.md — Gracefully stop running orchestration

Agent (1)

  • agents/orchestrator.md — Claude Code agent definition. Uses MCP tools (bernstein_run, bernstein_status, bernstein_tasks, bernstein_cost, bernstein_stop, bernstein_approve) to drive orchestration.

MCP Server (1)

  • Exposed via bernstein mcp (stdio); wired via .mcp.json pointing to uv run bernstein mcp
  • Tools include: bernstein_run, bernstein_status, bernstein_tasks, bernstein_cost, bernstein_stop, bernstein_approve (confirmed from orchestrator.md)

Skills (1 template)

  • templates/skills/bernstein-test-runner.md — How to run tests correctly in this project

Security / Compliance Components

  • src/bernstein/core/security/audit.py — HMAC-SHA256 chain over .sdd/audit/*.jsonl
  • src/bernstein/core/security/agent_card_signer.py — JWS (RFC 7515) + JCS (RFC 8785) + Ed25519 (RFC 8037) signing
  • src/bernstein/core/security/auto_approve.py — Deterministic allow/deny classifier for tool calls
  • docs/compliance/ — EU AI Act Article 12, SOC 2 CC4/CC7, DORA/NIS2, OWASP ASI06 mappings
  • docs/security/lethal-trifecta.md — Capability gate for dangerous capability combinations

Core Orchestration Modules

  • core/orchestration/orchestrator.py — Main tick-based scheduler
  • core/orchestration/tick_pipeline.py — Per-tick execution pipeline
  • core/tasks/task_lifecycle.py — Task state machine
  • core/agents/agent_lifecycle.py — Per-agent process management
  • core/routing/router.py + cascade_router.py — Model routing (cascade fallback)
  • core/quality/janitor.py — Verification gate before merge
  • core/knowledge/rag.py — SQLite FTS5 + BM25 codebase RAG (no neural embeddings)
  • core/replay/ — Deterministic replay
  • core/workflows/ — Archon-inspired YAML manifest runner
05

Prompts

Bernstein — Prompts

Verbatim Excerpt 1: Orchestrator Agent (agents/orchestrator.md)

---
name: orchestrator
description: Decomposes goals into parallel tasks, assigns them to CLI coding agents, verifies output, and merges results. Use when a task is too large for a single agent.
---

You are the Bernstein orchestrator. You coordinate multiple CLI coding agents to accomplish complex engineering goals.

Your capabilities:
1. Decompose a high-level goal into independent tasks
2. Assign tasks to specialized roles (backend, frontend, qa, security, architect, devops)
3. Spawn agents in parallel with git worktree isolation
4. Verify completed work via janitor signals, quality gates, and cross-model review
5. Handle failures with automatic retry, cascade fallback, and task decomposition

Use the Bernstein MCP tools (bernstein_run, bernstein_status, bernstein_tasks, bernstein_cost, bernstein_stop, bernstein_approve) to drive orchestration.

Do not attempt to do all the work yourself. Delegate to Bernstein and monitor progress.

Prompting technique: Role declaration + explicit capability list + tool-use instruction + anti-pattern prohibition ("Do not attempt to do all the work yourself"). The agent is a thin MCP-tool dispatcher, not a planner — all coordination logic lives in the Python scheduler.


Verbatim Excerpt 2: Test Runner Skill (templates/skills/bernstein-test-runner.md)

---
name: bernstein-test-runner
description: Run project tests correctly to avoid memory leaks and failures
whenToUse: When running tests, checking test results, or verifying code correctness
---

**IMPORTANT**: Always use the project's test runner script, never run pytest directly on the full suite.

Run all tests (isolated per-file, prevents memory leaks):

```bash
uv run python scripts/run_tests.py -x
```

Run a single test file:

```bash
uv run pytest tests/unit/test_foo.py -x -q
```

NEVER run uv run pytest tests/ -x -q - this leaks 100+ GB RAM across 2000+ tests.

Check linting before completing:

```bash
uv run ruff check src/
```

Prompting technique: Explicit anti-pattern prohibition with consequence ("leaks 100+ GB RAM"), followed by positive procedural guidance. Classic Iron Law with rationalization pattern borrowed from superpowers: banning the wrong action and explaining why prevents the LLM from defaulting to a simpler-seeming alternative.


Verbatim Excerpt 3: HMAC Audit Record Format (from docs/security/audit-log.md)

```json
{
  "timestamp": "2026-05-07T14:30:00.000000Z",
  "event_type": "task.transition",
  "actor": "orchestrator",
  "resource_type": "task",
  "resource_id": "TASK-001",
  "details": {"from_status": "open", "to_status": "claimed"},
  "prev_hmac": "0000…0000",
  "hmac": "d4e5f6…"
}
```

The hmac field is computed as:

payload   = prev_hmac + json.dumps(entry_without_hmac, sort_keys=True)
entry.hmac = HMAC_SHA256(audit_key, payload).hexdigest()

Technique: Structured data schema rather than a prompt. This is the output contract the scheduler enforces on every scheduling decision. It illustrates Bernstein's "receipts not prompts" philosophy: auditability is a hard constraint encoded in the runtime, not in an AI prompt.
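The hmac formula above can be turned into a small runnable sketch using only the Python standard library. This is not Bernstein's audit.py. The function names sign_entry and verify_chain are hypothetical, and the 64-zero genesis value is an assumption (the excerpt abbreviates it as "0000…0000").

```python
import hashlib
import hmac
import json

# Assumption: the first record chains from an all-zero value of digest length.
GENESIS_HMAC = "0" * 64


def sign_entry(audit_key: bytes, prev_hmac: str, entry: dict) -> dict:
    """Return a copy of `entry` with prev_hmac and hmac filled in."""
    # entry_without_hmac: every field except the hmac itself.
    body = {k: v for k, v in entry.items() if k != "hmac"}
    body["prev_hmac"] = prev_hmac
    # payload = prev_hmac + canonical JSON (sorted keys), as in the excerpt.
    payload = prev_hmac + json.dumps(body, sort_keys=True)
    digest = hmac.new(audit_key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {**body, "hmac": digest}


def verify_chain(audit_key: bytes, entries: list[dict],
                 prev_hmac: str = GENESIS_HMAC) -> bool:
    """Recompute every hmac in order; fail on any broken link or edited field."""
    for entry in entries:
        if entry.get("prev_hmac") != prev_hmac:
            return False  # record is out of order or a predecessor is missing
        expected = sign_entry(audit_key, prev_hmac, entry)["hmac"]
        if not hmac.compare_digest(expected, entry.get("hmac", "")):
            return False  # record content was modified after signing
        prev_hmac = entry["hmac"]
    return True
```

Because each record's hmac covers the previous record's hmac, editing or reordering any earlier record invalidates every record after it. To check a later day's file on its own, pass the last hmac of the previous day as prev_hmac.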

09

Uniqueness

Bernstein — Uniqueness and Positioning

Differs from Seeds

Bernstein is closest to claude-flow among the seed frameworks in that both run multiple parallel agents with git worktree isolation and maintain audit state. However, the architectures diverge fundamentally: claude-flow uses an LLM hive-mind for task routing with CRDT/raft consensus protocols, while Bernstein uses zero LLM in the coordination loop — plain Python decides everything after the initial goal decomposition. This makes Bernstein deterministic and replayable in a way claude-flow cannot be.

Versus superpowers and bmad-method: Bernstein is an executable runtime daemon, not a prompt-behavior augmentation layer. It replaces the manual "dispatch a subagent" pattern with an automated scheduler. Bernstein does not inject skills into Claude Code sessions; it orchestrates external CLI agent processes from outside their context window.

Versus taskmaster-ai: Both manage task queues and support multiple AI tool backends, but Taskmaster is a project planning and tracking layer (MCP server + task JSON) that still expects a human or single agent to execute tasks. Bernstein is a full execution engine that spawns, monitors, verifies, and merges agent work.

Versus agent-os and claude-conductor: These are markdown scaffold methodologies. Bernstein is a binary with an API server, web dashboard, and 44 process adapters.

The compliance angle (HMAC audit chain, signed agent cards, per-artefact lineage, EU AI Act mapping, SOC 2 mappings) has no counterpart in any seed framework. This is the most distinctive architectural dimension — Bernstein is positioned as "the orchestrator your compliance team will sign off on."

Positioning

Bernstein fills the gap between "run one agent locally" and "buy a cloud SaaS with a credit card form." It is on-prem only by design, BYOK, Apache 2.0, and targets engineering orgs running ≥3 parallel CLI agents who need auditability, determinism, and cost control.

Observable Failure Modes

  1. Over-decomposition: The single LLM decompose call may split a goal into too many tasks, creating merge coordination overhead that outweighs parallelism gains.
  2. Worktree accumulation: Long-running projects accumulate stale worktrees; requires bernstein worktrees gc discipline.
  3. Adapter drift: 44 CLI adapters means any adapter's CLI contract change can silently break that adapter.
  4. Cost underestimation: Budget caps in bernstein.yaml are estimates; actual model costs drift with prompt size growth.
  5. Solo-maintenance risk: As of analysis, the project is maintained by a single engineer; there is no organizational bus-factor protection.
  6. Audit key management: If the HMAC audit key is lost or rotated incorrectly, the chain breaks and historical auditability is lost.
04

Workflow

Bernstein — Workflow

Standard Run Workflow (4 stages)

| Phase | What happens | Artifact |
| --- | --- | --- |
| 1. Decompose | One LLM call breaks the goal into tasks with roles, owned files, and completion signals | .sdd/runtime/task-backlog.json |
| 2. Spawn | Agents start in isolated git worktrees (one per task), credentials scoped per agent | ~/.local/share/bernstein/worktrees/<run_id>/ |
| 3. Verify | Janitor checks concrete signals: tests pass, files exist, lint clean, types correct, cross-model review | .sdd/audit/YYYY-MM-DD.jsonl (append entry per check) |
| 4. Merge | Verified work lands on main; failed tasks retry or route to different model | Merged git worktree branch |

YAML Workflow Manifest (optional, declarative DAG)

# example: idea-to-pr manifest structure
steps:
  - id: plan
    type: agent
    role: architect
    ...
  - id: implement
    type: loop
    fresh_context: true   # new agent session per iteration
    until: "pnpm test"    # bash predicate exits 0
    ...
  - id: review
    type: agent
    role: reviewer
    ...

Loop nodes re-fire until a bash predicate exits 0. fresh_context: true mints a new agent session per iteration. Per-step CLI/model routing available.
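The loop-node behaviour described here can be sketched as a plain Python loop around a shell predicate. This is an illustration, not Bernstein's workflow runner: run_until, the step callback, and the max_iterations cap are hypothetical names and assumptions (the source does not say whether loops are capped).

```python
import subprocess
from typing import Callable


def run_until(predicate_cmd: str, step: Callable[[int], None],
              max_iterations: int = 5) -> int:
    """Run step(i), then the shell predicate; stop when the predicate exits 0.

    `step` stands in for "spawn a fresh agent session for this iteration".
    Returns the number of iterations used.
    """
    for iteration in range(1, max_iterations + 1):
        step(iteration)  # fresh_context: true would mint a new session here
        result = subprocess.run(predicate_cmd, shell=True)
        if result.returncode == 0:
            return iteration
    raise RuntimeError(
        f"predicate {predicate_cmd!r} still failing after {max_iterations} iterations"
    )
```

With the manifest shown earlier, predicate_cmd would be "pnpm test". A cap of some kind is worth having in any such loop, since a predicate that never passes would otherwise keep spawning sessions.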

Approval Gates

| Gate | Type | Trigger |
| --- | --- | --- |
| Tool-call approval gate | auto-approve/deny (deterministic classifier) or human queue | Before any tool call reaches operator queue |
| Janitor verification gate | automated | After agent completes task, before merge |
| Merge gate | automated or human-required | All quality gates must pass |
| Budget preflight | automated hard stop | Before each new attempt, checks remaining budget |
| Lethal-trifecta capability gate | automated deny | Specific dangerous capability combinations |
| Human escalation on injection detection | automated → human required | Context integrity scan detects injection patterns |
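As an illustration of the budget preflight gate, the check reduces to comparing a projected cost against what is left of the cap before an attempt starts. The sketch below is an assumption about shape only: BudgetState, preflight, and the dollar fields are hypothetical names, not Bernstein's cost module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetState:
    cap_usd: float    # hypothetical: total budget for the run
    spent_usd: float  # hypothetical: cost recorded so far


def preflight(state: BudgetState, projected_usd: float) -> tuple[bool, str]:
    """Hard stop: refuse an attempt whose projected cost exceeds the remainder."""
    if projected_usd < 0:
        raise ValueError("projected cost must be non-negative")
    remaining = state.cap_usd - state.spent_usd
    if projected_usd > remaining:
        return False, (
            f"projected ${projected_usd:.2f} exceeds remaining ${max(remaining, 0):.2f}"
        )
    return True, "ok"
```

The decision is a pure function of recorded state, so the same inputs always yield the same verdict, which is what makes the gate auditable after the fact.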

Key State Files

| File | Purpose |
| --- | --- |
| .sdd/audit/YYYY-MM-DD.jsonl | HMAC-SHA256-chained audit log, one record per scheduling decision |
| .sdd/runtime/task-backlog.json | Open/claimed/closed task queue |
| .sdd/runtime/agent_tokens/ | Per-session zero-trust JWTs |
| .sdd/traces/ | OpenTelemetry trace files |
| bernstein.yaml | Project config (goal, budget, role-model policy, quality gates) |

Spec Format

None (goal is plain text). YAML for workflow manifests. JSON for task backlog.

Git Automation

  • Creates worktree per task (automated)
  • Runs tests/lint/types before merge (automated)
  • Creates GitHub PR (bernstein pr, semi-automated — operator invokes)
  • Merges verified branches to main (automated on passing quality gates)
06

Memory Context

Bernstein — Memory and Context

State Storage

Primary mechanism: File-based state under .sdd/ directory. Bernstein explicitly bans databases in its own bernstein.yaml constraint: "File-based state in .sdd/ only - no databases."

| State file | Content |
| --- | --- |
| .sdd/runtime/task-backlog.json | Open/claimed/closed tasks with role assignments |
| .sdd/runtime/agent_tokens/ | Per-session zero-trust JWTs for bearer-token task server |
| .sdd/audit/YYYY-MM-DD.jsonl | HMAC-SHA256 chained audit log (one per UTC day, chained across days) |
| .sdd/traces/ | OpenTelemetry traces for agent runs |
| .sdd/config/triggers.yaml | Event-driven trigger rules |
| ~/.martin/runs/ | (not used; Bernstein uses .sdd/) |

Codebase RAG

core/knowledge/rag.py — SQLite FTS5 with BM25 ranking + AST-aware chunking for Python files. No neural embeddings; purely lexical/keyword retrieval.
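A minimal sketch of this kind of lexical retrieval, assuming a Python build whose sqlite3 module includes the FTS5 extension. It is not the code in rag.py: chunk_python, build_index, search, and the two-column table layout are hypothetical, and the chunking only splits on top-level functions and classes.

```python
import ast
import sqlite3


def chunk_python(source: str) -> list[tuple[str, str]]:
    """AST-aware chunking: one (name, source text) chunk per top-level def/class."""
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append((node.name, ast.get_source_segment(source, node) or ""))
    return chunks


def build_index(chunks: list[tuple[str, str]]) -> sqlite3.Connection:
    """Load chunks into an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(name, body)")
    conn.executemany("INSERT INTO chunks (name, body) VALUES (?, ?)", chunks)
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 3) -> list[str]:
    """Return chunk names ranked by BM25 (FTS5's bm25() is lower-is-better)."""
    rows = conn.execute(
        "SELECT name FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

The query string is handed to FTS5 as-is, so punctuation in a query is parsed as FTS5 query syntax. A real implementation would need to quote or sanitize user-supplied terms.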

Semantic Cache

core/knowledge/semantic_cache.py — TF (term-frequency) cosine similarity over word counts, not learned embeddings.
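Term-frequency cosine similarity needs nothing beyond word counts, which a short sketch makes concrete. The TfCache class, its 0.8 threshold, and the tokenizer below are assumptions for illustration and are not taken from semantic_cache.py.

```python
import math
import re
from collections import Counter
from typing import Optional


def term_counts(text: str) -> Counter:
    """Lowercased word counts; no stemming, no learned embeddings."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TfCache:
    """Return a cached response when a new prompt is lexically close enough."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # hypothetical default
        self._entries: list[tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self._entries.append((term_counts(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        query = term_counts(prompt)
        best_score, best_response = 0.0, None
        for counts, response in self._entries:
            score = cosine(query, counts)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

The trade-off is the usual one for lexical methods: lookups are cheap and deterministic, but two prompts that mean the same thing in different words will not match.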

Context Handoff Between Agents

  • Each agent gets a spawn_prompt (core/agents/spawn_prompt.py) containing its task assignment, owned files, and completion signals.
  • Context distillation: carries a distilled summary of recent attempts and remaining constraints into subsequent attempts (reduces runaway token growth on retries).
  • Lessons learned: core/knowledge/lessons.py + core/knowledge/knowledge_base.py for cross-session learning.

Session Persistence

  • cross_session_handoff: yes — audit log and task backlog persist across sessions
  • context_compaction_handling: no. Bernstein is an external orchestrator, not a Claude Code plugin that hooks into compaction events, although it has a PreCompact-equivalent in that it distills context on retry.
  • Crash recovery: heartbeat monitoring (core/agents/heartbeat.py) + loop detector (core/observability/loop_detector.py)

Audit Replay

The entire run is replayable: bernstein schedule audit walks persisted fire receipts to prove the sequence is replayable. The deterministic scheduler means yesterday's plan + same inputs = same task graph.

07

Orchestration

Bernstein — Orchestration

Pattern

Hierarchical parallel-fan-out with a deterministic Python scheduler as the control plane. One LLM call decomposes the goal; from there, all routing is code:

Manager (Python scheduler)
├── Worker-1 (backend role, claude-sonnet in worktree A)
├── Worker-2 (qa role, codex in worktree B)
├── Worker-3 (security role, claude-sonnet in worktree C)
└── Janitor  (verifier — runs tests/lint/types before any merge)

Isolation Mechanism

Git worktrees: one per task, per repo. Multiple sandbox backends available: Docker, E2B microVM, Modal.
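One way to picture per-task isolation is as the git command a scheduler would issue for each task. The helpers below only build argument lists for the real `git worktree add` and `git worktree remove` subcommands. The function names and the branch naming scheme are hypothetical, and nothing here is taken from Bernstein's core/git code.

```python
from pathlib import Path


def worktree_add_argv(repo: Path, worktrees_root: Path, run_id: str,
                      task_id: str, base_ref: str = "main") -> list[str]:
    """Build `git worktree add` for one task: new branch, new directory."""
    branch = f"bernstein/{run_id}/{task_id}"     # hypothetical naming scheme
    target = worktrees_root / run_id / task_id   # one directory per task
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", branch, str(target), base_ref]


def worktree_remove_argv(repo: Path, target: Path) -> list[str]:
    """Build `git worktree remove` to clean up after merge or failure."""
    return ["git", "-C", str(repo), "worktree", "remove", str(target)]
```

Each list can be passed to subprocess.run(argv, check=True). Since every task gets its own branch and directory, concurrent agents never share a working tree or index.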

Multi-Model Routing

Yes — per bernstein.yaml:

role_model_policy:
  manager: {model: opus, effort: max}
  architect: {model: opus, effort: max}
  security: {model: sonnet, effort: high}
  backend: {model: sonnet, effort: high}
  qa: {model: sonnet, effort: high}
  reviewer: {model: sonnet, effort: high}

Internal scheduler LLM is also configurable independently of worker models (internal_llm_provider, internal_llm_model).

Consensus Mechanism

None — scheduling is deterministic Python, not consensus-based. The HMAC audit chain provides tamper-evident ordering, but there is no raft/quorum/CRDT protocol.

Concurrent Agents

Configurable via max_agents in bernstein.yaml. Self-dog-food config uses max_agents: 7. No hard system cap — limited by available CPU/memory and worktree storage.
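To show how a max_agents cap combines with deterministic scheduling, here is a sketch of one scheduling tick that picks which open tasks to spawn next. The task dictionary fields ("id", "status", "priority") and the function name are assumptions. Only the open and claimed status names appear in the source.

```python
def next_to_spawn(backlog: list[dict], running: int, max_agents: int) -> list[str]:
    """Pick task ids to spawn this tick, independent of backlog ordering.

    Sorting by (priority, id) gives a total order, so the same backlog
    contents always produce the same selection.
    """
    free_slots = max(max_agents - running, 0)
    open_tasks = sorted(
        (task for task in backlog if task["status"] == "open"),
        key=lambda task: (task.get("priority", 0), task["id"]),
    )
    return [task["id"] for task in open_tasks[:free_slots]]
```

Because the selection depends only on the backlog contents and the two counters, replaying the same state yields the same decision, which is the property an LLM-driven router cannot guarantee.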

Execution Mode

One-shot (primary): bernstein run -g "<goal>" completes and exits. Continuous/daemon modes:

  • bernstein autofix daemon: monitors open PRs, spawns fixer on CI failure
  • bernstein schedule add for recurring schedules
  • Event-driven via trigger sources (GitHub, GitLab, Slack, Discord, file watch, webhook)

Cross-Tool Portability

High: 44 CLI adapters covering Claude Code, Codex, Gemini CLI, Aider, Cursor, OpenCode, and 38 more. Any CLI tool with a --prompt flag works via the generic adapter. Model-agnostic by design.

Prompt Chaining

Yes: the decompose stage writes task descriptions (including context files, owned files, completion signals) that become the spawn_prompt for each worker agent. One stage's output is explicitly another stage's prompt.

Verification (Auto-validators)

The janitor checks before merge:

  • Lint (ruff check or configured linter)
  • Type check (configurable)
  • Tests (configurable test command)
  • Cross-model review (core/quality/reviewer.py)
  • Custom quality gates (via bernstein.yaml#quality_gates)

Safety / Security Layer

  • Auto-approve gate: deterministic deny list (rm -rf, git push --force, DROP TABLE, curl | bash, etc.) — deny-wins, fail-closed
  • Context integrity scan: injection pattern detection before admitting any attempt
  • Red-Blue testing (6 deterministic adversarial probes), including assertion deletion, silent reverts, context poisoning, budget self-reporting, and grounding evasion
  • Budget preflight: rejects attempts projected to exceed remaining budget
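The deny-wins, fail-closed behaviour of the auto-approve gate can be illustrated with a small regex classifier. The four deny patterns mirror the examples listed above. The allow patterns, the "escalate" verdict name, and the classify function are hypothetical and do not reproduce auto_approve.py.

```python
import re

# Deny examples named in the text; the real list is longer ("etc.").
DENY_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bgit\s+push\b.*--force"),
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
    re.compile(r"\bcurl\b[^|]*\|\s*(ba)?sh\b"),
]

# Hypothetical allow list, for illustration only.
ALLOW_PATTERNS = [
    re.compile(r"^git\s+status\b"),
    re.compile(r"^uv\s+run\s+pytest\b"),
]


def classify(command: str) -> str:
    """Return "deny", "allow", or "escalate" for a proposed tool call."""
    # Deny wins: checked first, so an allow match cannot override it.
    if any(pattern.search(command) for pattern in DENY_PATTERNS):
        return "deny"
    if any(pattern.search(command) for pattern in ALLOW_PATTERNS):
        return "allow"
    # Fail closed: anything unrecognized is never auto-approved.
    return "escalate"
```

Checking the deny list first matters for compound commands: a call that starts with an allowed prefix but chains a destructive command is still rejected.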

Audit Log

HMAC-SHA256 chained JSONL under .sdd/audit/YYYY-MM-DD.jsonl. One record per scheduling decision. Daily rotation without resetting the chain (last hmac from day N becomes prev_hmac for day N+1). bernstein audit verify checks chain integrity. Replay-capable via bernstein schedule audit.

08

UI CLI Surface

Bernstein — UI and CLI Surface

CLI Binary

Name: bernstein
Install: pip install bernstein / pipx install bernstein / Homebrew / one-liner
Type: Own runtime (not a thin wrapper over claude/codex CLI)

Key flags:

  • bernstein --headless — CI mode: structured JSON output, non-zero exit on failure
  • bernstein run -g "..." — primary entry point
  • bernstein run --dry-run plan.yaml: cost preview without execution
  • Multiple JSON output modes via --json

Web Dashboard

Shipped as bernstein gui serve since v2.0.0.

| Detail | Value |
| --- | --- |
| URL | http://127.0.0.1:8052/ui/ |
| Dev mode | bernstein gui serve --dev (expects npm run dev on :5173) |
| Bundle | Vite bundle committed under src/bernstein/gui/static/, so it works from the wheel without Node |
| Tech stack | Vite (frontend framework not specified publicly; React is a guess, since Vite is framework-agnostic) |

Dashboard tabs:

  • Tasks, Agents, Approvals, Audit, Costs, Fleet, Settings

Per-task drawer:

  • Summary, Logs (SSE + ANSI + virtualized + search + level filters), Diff (split/unified, syntax highlight), Gates (status buckets, auto-expand failures), Deps (upstream/downstream graph), Trace (.sdd/traces/ timeline + filter chips + search)

VS Code Extension

Available at VS Marketplace: alex-chernysh.bernstein (and Open VSX)

Fleet Dashboard

bernstein fleet — supervise multiple Bernstein projects in one view.

Observability

  • OpenTelemetry (OTLP) export: bernstein.yaml#observability.otlp.endpoint
  • Prometheus-compatible metrics endpoint via API server
  • Agent heartbeat monitoring (core/agents/heartbeat.py)
  • Loop detector (core/observability/loop_detector.py)
  • Jaeger-compatible tracing

MCP Server Mode

bernstein mcp exposes Bernstein as an MCP server (stdio), so Claude Code or any other MCP-capable client can drive runs via MCP tools.

Chat Interfaces

bernstein chat serve --platform=telegram|discord|slack — drive runs with /run, /status, /approve, /reject commands.

Cloud Execution

bernstein cloud — runs agents on Cloudflare Workers with R2-backed workspace sync.
