MartinLoop

martinloop · Keesan12/Martin-Loop · ★ 22 · last commit 2026-05-26

Primitive shape 11 total

MCP tools 11

Summary

MartinLoop — Summary

Slug: martinloop
GitHub: https://github.com/Keesan12/Martin-Loop
Stars: 22
License: Apache-2.0
Language: TypeScript (Node.js 20+)
Status: Active (last commit 2026-05-26); NVIDIA Inception Program accepted

What It Is

MartinLoop is a governance control plane for AI coding agent loops. It wraps Claude Code, Codex, and custom agent runs with hard budget caps, policy checks, verifier gates, rollback evidence, and JSONL audit records.

It does not run code itself. Instead, it governs the agent loop to prevent the "Ralph Loop": the failure mode where an AI coding agent keeps retrying without a stop condition, burning budget with no audit trail.

Core Value Proposition

"AI coding accountability: completes good work, refuses unsafe work, stops uneconomical work."

Same task, same starting state: MartinLoop completes in one verified attempt at $2.30, while an uncontrolled loop retries 4× at $5.20 with no audit trail (README benchmark).

Five-Layer Architecture

Task Contract — Objective, verifier plan, repo root, allowed/denied paths, acceptance criteria, budget
Policy & Budget — martin.config.yaml + CLI flags; budget preflight rejects attempts before execution
Agent Adapters — Claude CLI, Codex CLI, direct-provider, stub adapters normalize results
Safety & Verification — Verifier commands, file scope, approval-boundary changes, secret scan, grounding
Persistence — JSONL records at ~/.martin/runs/<workspaceId>.jsonl

Key Capabilities

Budget hard caps (maxUsd, softLimitUsd, maxIterations, maxTokens)
11-class failure taxonomy (hallucination, test regression, scope creep, repo grounding failure, env mismatch, budget pressure, etc.)
Red-Blue adversarial testing (6 probes, 3 risk tiers)
Context injection detection (authority inversion, instruction override, identity redefinition)
Rollback evidence capture
Context distillation for subsequent attempts
JSONL run records (inspectable, resumable)

Packages

martin-loop — root npm package (CLI + SDK)
@martin/contracts — shared types
@martin/core — runtime, policy engine, safety leash
@martin/adapters — Claude CLI, Codex CLI, direct-provider, stub
@martin/cli — CLI implementation
@martinloop/mcp — MCP server (11 tools, 12 resources, 10 prompts)

Overview

MartinLoop — Overview

Origin and Positioning

MartinLoop was created by Keesan (keesan@martinloop.com) and accepted into the NVIDIA Inception Program. The README opens with a concrete failure story: "$2.40 estimated, $65 billed — 47 attempts, no hard stop, no rollback, no audit trail, nothing merged."

The project frames autonomous AI coding loops as having an unsolved failure mode — the "Ralph Loop" — and positions MartinLoop as the control layer that stops that failure before damage happens.

Website: https://martinloop.com

The Ralph Loop Concept

The "Ralph Loop" names the failure mode where an AI coding agent keeps retrying work without knowing when to stop. Symptoms:

No hard budget cap
No signed evidence layer
No pre-execution control system
The agent knows how to keep trying but not when continuing is unsafe, uneconomical, or impossible

MartinLoop governs this by:

Stopping the next attempt before budget overspend
Classifying unsafe/invalid actions before execution
Appending a JSONL audit record for every attempt
Rolling back failed runs instead of leaving broken state
Reducing runaway token growth with context distillation

Freemium Model

Tier	Price	Key capability
Free (OSS)	$0	Full CLI, MCP, local JSONL records
Pro	$49/month	Dashboard (inferred from README)
Growth	$149/month	Inferred
Enterprise	$499/month	Inferred

Philosophy

MartinLoop explicitly does NOT try to replace agent patterns — it makes those patterns safe to run. The distinction is important: it is a governance layer, not an agent platform.

Quote: "AI coding agents are useful. Unbounded retry loops are not."

11-Class Failure Taxonomy

MartinLoop classifies failures into 11 named classes:

Hallucination
Test regression
Scope creep
Repo grounding failure
Environment mismatch
Budget pressure
Context poisoning (detected before admission)
Authority inversion
Instruction override
Identity redefinition
(Additional classes documented in codebase)

This taxonomy distinguishes real success from unsafe, invalid, or terminal behavior.

Red-Blue Testing

Six deterministic adversarial probes before a patch is accepted:

Assertion deletion
Silent reverts
Context poisoning
Budget self-reporting
Grounding evasion
(Additional probe)

Three risk tiers:

baseline — default
high_risk — stricter
release_critical — adds a Haiku model call

A single block-severity finding rejects the patch.
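The rejection rule above can be sketched as a small reducer over probe findings. This is an illustration only: the type and function names (`ProbeFinding`, `decidePatch`), the `info` and `warn` severity labels, and the `reject` decision value are assumptions, not MartinLoop's actual API. Only the `block` severity, the `pass` decision, and the three tier names come from the text.

```typescript
// Hypothetical shapes for illustration; not taken from the MartinLoop source.
type RiskTier = "baseline" | "high_risk" | "release_critical";
type Severity = "info" | "warn" | "block"; // "info" and "warn" are assumed labels

interface ProbeFinding {
  probe: string; // e.g. "assertion_deletion"
  severity: Severity;
  detail: string;
}

interface PatchDecision {
  tier: RiskTier;
  decision: "pass" | "reject"; // "reject" is an assumed label
  blockers: ProbeFinding[];
}

// A single block-severity finding rejects the patch, however many probes passed.
function decidePatch(tier: RiskTier, findings: ProbeFinding[]): PatchDecision {
  const blockers = findings.filter((f) => f.severity === "block");
  return { tier, decision: blockers.length > 0 ? "reject" : "pass", blockers };
}
```

Returning the blocking findings alongside the decision keeps the evidence next to the verdict, which is in the spirit of the audit records described elsewhere on this page.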

Architecture

MartinLoop — Architecture

Distribution

npm package: martin-loop (root facade)
MCP package: @martinloop/mcp@0.2.5 (standalone)
Install: npx martin-loop demo or npm install martin-loop

Package Structure

Martin-Loop/
├── packages/
│   ├── adapters/         # @martin/adapters — Claude CLI, Codex CLI, direct-provider, stub
│   ├── cli/              # @martin/cli — CLI implementation
│   │   └── src/
│   │       ├── bin/martin.ts
│   │       ├── mcp-config.ts
│   │       ├── persistence.ts
│   │       ├── run-store.ts
│   │       └── ux.ts
│   ├── contracts/        # @martin/contracts — shared types (loops, policy, governance, budget)
│   ├── core/             # @martin/core — runtime, policy engine, safety leash, grounding
│   └── mcp/              # @martinloop/mcp — MCP server
│       └── src/
│           ├── tools/    # 11+ tool handlers
│           ├── resources.ts
│           ├── prompts.ts
│           └── server.ts
├── martin.config.example.yaml
├── AGENTS.md
└── docs/
    └── oss/
        ├── QUICKSTART.md
        ├── RALPH-LOOP-SAFETY.md
        ├── MCP-FOR-AI-AGENTS.md
        └── CLAUDE-CODE-WALKTHROUGH.md

Five-Layer Runtime

[Task Contract Input]
        ↓
[Policy & Budget]  ← martin.config.yaml + CLI flags
  - maxUsd / softLimitUsd / maxIterations / maxTokens
  - destructiveActionPolicy
  - verifierRules
        ↓
[Safety Leash]  ← pre-execution checks
  - verifier command safety
  - file scope validation
  - secret-like values scan
  - context injection detection (authority inversion, instruction override)
        ↓
[Agent Adapter]  ← pluggable
  - Claude CLI adapter
  - Codex CLI adapter
  - Direct-provider adapter
  - Stub adapter (testing)
        ↓
[Verifier Gate]  ← post-execution checks
  - Run verifier commands (pnpm test, pnpm lint, etc.)
  - Red-Blue adversarial probes (6 probes, 3 tiers)
  - Failure taxonomy classification (11 classes)
        ↓
[Persistence]
  - JSONL records: ~/.martin/runs/<workspaceId>.jsonl
  - Rollback boundaries + restore outcomes
  - Context distillation for next attempt

State Files

~/.martin/runs/<workspaceId>.jsonl — JSONL run records (append-only)
martin.config.yaml (project-local) — budget caps and verifier rules
martin.config.example.yaml — template with all configurable fields

MCP Release Train

Version	Features
0.1.4	Operator foundation
0.2.0	Resources, resource templates, prompts, read-only cockpit inspection
0.2.5	Stable cockpit line; local triage; degraded run-store hardening

Adapter Pattern

Each adapter normalizes an agent's execution results into @martin/contracts types, allowing the core runtime to operate identically regardless of whether Claude Code, Codex, or a custom agent is running:

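// Import path follows the SDK example; that createCodexCliAdapter is exported
// from the same package is an assumption.
import { MartinLoop, createClaudeCliAdapter, createCodexCliAdapter } from 'martin-loop';

// Claude Code adapter
const claudeLoop = new MartinLoop({
  adapter: createClaudeCliAdapter({ workingDirectory: process.cwd() })
});

// Codex adapter (distinct variable name so both can coexist in one scope)
const codexLoop = new MartinLoop({
  adapter: createCodexCliAdapter({ workingDirectory: process.cwd() })
});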
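To make the normalization idea concrete, here is a minimal sketch of what an adapter contract might look like. The interface and function names (`NormalizedAttempt`, `AgentAdapterSketch`, `createCannedStubAdapter`, `describeAttempt`) are hypothetical, and the real types in @martin/contracts may differ. The sketch is synchronous for brevity; an adapter that spawns a CLI would presumably be asynchronous.

```typescript
// Hypothetical contract types; the real ones live in @martin/contracts and may differ.
interface NormalizedAttempt {
  engine: string;
  succeeded: boolean;
  costUsd: number;
  tokensUsed: number;
  summary: string;
}

interface AgentAdapterSketch {
  readonly engine: string;
  runAttempt(objective: string): NormalizedAttempt;
}

// A stub adapter for tests: no agent is invoked, the result is canned.
function createCannedStubAdapter(canned: NormalizedAttempt): AgentAdapterSketch {
  return { engine: canned.engine, runAttempt: () => canned };
}

// The caller depends only on the normalized shape, never on a specific engine.
function describeAttempt(adapter: AgentAdapterSketch, objective: string): string {
  const attempt = adapter.runAttempt(objective);
  const outcome = attempt.succeeded ? "ok" : "failed";
  return `${attempt.engine}: ${outcome} ($${attempt.costUsd.toFixed(2)})`;
}
```

Because `describeAttempt` only sees `NormalizedAttempt`, swapping the stub for another engine requires no change on the caller's side.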

Components

MartinLoop — Components

CLI Binary: `martin-loop`

Install: npm install -g martin-loop

CLI Subcommands (inferred from docs/source)

Subcommand	Description
`run`	Execute a governed agent loop
`demo`	Run packaged demo locally
`inspect`	Inspect a past run record
`resume`	Resume a previous run
`doctor`	Preflight environment checks
`preflight`	Pre-execution policy check
`triage`	Rank persisted runs needing attention
`dossier`	Generate a run dossier
`runs list`	List stored run records
`runs get`	Get a specific run
`runs attempt`	Get a specific attempt
`runs verify`	Get verification results
`mcp print-config`	Print MCP configuration
`mcp install`	Install MCP server config

Key CLI Flags

--max-usd <n> — Hard USD budget cap
--soft-limit-usd <n> — Early warning threshold
--max-iterations <n> — Max retry iterations
--max-tokens <n> — Max token budget
--verify <cmd> — Override verifier commands
--policy-profile <name> — Policy profile (balanced, strict, etc.)
--allowed-paths <glob> — Restrict file scope
--denied-paths <glob> — Explicitly deny file access

MCP Server: `@martinloop/mcp`

Install: npx -y @martinloop/mcp

Transport: stdio

MCP Tools (11)

Tool	Description
`martin_doctor`	Environment preflight checks
`martin_preflight`	Pre-run policy/budget check
`martin_run`	Execute governed agent loop (only execution entrypoint)
`martin_inspect`	Inspect loop state
`martin_status`	Get current run status
`martin_list_runs`	List persisted runs
`martin_triage_runs`	Rank runs needing attention
`martin_get_run`	Get specific run record
`martin_get_attempt`	Get specific attempt
`martin_get_verification_results`	Get verifier evidence
`martin_run_dossier`	Generate compact run dossier

MCP Resources (12)

Resource URI	Description
`martin://server/health`	Server health
`martin://runs/recent`	Recent runs
`martin://runs/triage`	Runs needing attention
`martin://runs/latest/summary`	Latest run summary
`martin://runs/latest/proof-card`	Latest run proof card
`martin://runs/latest/budget-status`	Budget status
`martin://runs/latest/verifier-evidence`	Verifier evidence
`martin://runs/latest/rollback-evidence`	Rollback evidence
`martin://agent/next-step`	Recommended next step
`martin://guides/mcp-usage`	Usage guide
`martin://guides/publish-readiness`	Publish readiness guide
`martin://runs/{loopId}`	Specific run (template)
`martin://runs/{loopId}/attempts/{attemptIndex}`	Specific attempt (template)
`martin://runs/{loopId}/verification`	Run verification (template)
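The three template URIs take concrete values at read time. The helper below is a hypothetical sketch of that substitution, not part of @martinloop/mcp. URL-encoding the values is an assumption, and an MCP client may do this expansion for you.

```typescript
// Fill {placeholders} in a URI template such as martin://runs/{loopId}.
// Throws if a placeholder has no value, so a half-filled URI is never requested.
function expandResourceTemplate(
  template: string,
  values: { [key: string]: string | number },
): string {
  return template.replace(/\{(\w+)\}/g, (_match: string, key: string) => {
    const value = values[key];
    if (value === undefined) {
      throw new Error(`Missing value for template variable "${key}"`);
    }
    return encodeURIComponent(String(value));
  });
}

const attemptUri = expandResourceTemplate(
  "martin://runs/{loopId}/attempts/{attemptIndex}",
  { loopId: "ml-abc123", attemptIndex: 1 },
);
// attemptUri is "martin://runs/ml-abc123/attempts/1"
```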

MCP Prompts (10)

martin_start, martin_preflight, martin_triage, martin_resume, martin_prove, martin_release_check, martin_governed_coding_kickoff, martin_debug_failed_run, martin_publish_readiness_review, martin_triage_run_store

Config Schema: `martin.config.yaml`

policyProfile: balanced
budget:
  maxUsd: 8                   # Hard cost cap
  softLimitUsd: 5             # Early warning threshold
  maxIterations: 3            # Max loop iterations
  maxTokens: 20000            # Max token budget
governance:
  destructiveActionPolicy: approval   # approval | block | allow
  telemetryDestination: control-plane
  verifierRules:
    - pnpm test
    - pnpm lint
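For readers who load this file programmatically, the schema can be mirrored as a TypeScript type with a few sanity checks. The field names follow the YAML above; the type names and every validation rule in `checkConfig` are assumptions for illustration, not rules confirmed by MartinLoop.

```typescript
// Field names mirror the YAML above; the validation rules are illustrative assumptions.
interface MartinBudgetConfig {
  maxUsd: number;
  softLimitUsd: number;
  maxIterations: number;
  maxTokens: number;
}

type DestructiveActionPolicy = "approval" | "block" | "allow";

interface MartinConfigSketch {
  policyProfile: string;
  budget: MartinBudgetConfig;
  governance: {
    destructiveActionPolicy: DestructiveActionPolicy;
    telemetryDestination: string;
    verifierRules: string[];
  };
}

// Returns a list of problems; an empty list means the config looks sane.
function checkConfig(config: MartinConfigSketch): string[] {
  const problems: string[] = [];
  const { maxUsd, softLimitUsd, maxIterations, maxTokens } = config.budget;
  if (!(maxUsd > 0)) problems.push("budget.maxUsd must be positive");
  if (softLimitUsd > maxUsd) problems.push("budget.softLimitUsd must not exceed budget.maxUsd");
  if (Math.floor(maxIterations) !== maxIterations || maxIterations < 1) {
    problems.push("budget.maxIterations must be a positive integer");
  }
  if (!(maxTokens > 0)) problems.push("budget.maxTokens must be positive");
  if (config.governance.verifierRules.length === 0) {
    problems.push("governance.verifierRules is empty, so nothing would be verified");
  }
  return problems;
}
```

Collecting all problems instead of throwing on the first one lets an operator fix a config in a single pass.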

SDK Interface

import { MartinLoop, createClaudeCliAdapter } from 'martin-loop';

const loop = new MartinLoop({
  adapter: createClaudeCliAdapter({ workingDirectory: process.cwd() }),
  policy: {
    maxUsd: 5,
    maxIterations: 3,
    maxTokens: 20_000
  }
});

const result = await loop.run({
  task: {
    title: "Fix auth regression",
    objective: "Fix the failing auth regression tests",
    verificationPlan: ["pnpm test"],
    repoRoot: process.cwd()
  }
});

Integration: Claude Code

claude mcp add --transport stdio --scope user martin-loop -- npx -y @martinloop/mcp

Integration: Codex

codex mcp add martin-loop -- npx -y @martinloop/mcp

Prompts

MartinLoop — Prompts and Key Artifacts

MCP run_loop Tool Input Schema (verbatim from source)

export interface RunLoopInput {
  objective: string;
  workingDirectory?: string;
  engine?: "claude" | "codex";
  model?: string;
  maxUsd?: number;
  maxIterations?: number;
  maxTokens?: number;
  verificationPlan?: string[];
  allowedPaths?: string[];
  deniedPaths?: string[];
  workspaceId?: string;
  projectId?: string;
}

MCP run_loop Tool Output Schema (verbatim from source)

export interface RunLoopOutput {
  status: string;
  lifecycleState: string;
  reason: string;
  attempts: number;
  costUsd: number;
  verificationPassed: boolean;
  loopId: string;
  pressure: string;
  shouldStop: boolean;
  remainingBudgetUsd: number;
  remainingIterations: number;
  remainingTokens: number;
  engine: MartinEngine;
  workingDirectory: string;
  budget: LoopBudget;
  inspection: {
    runsRoot: string;
    runDirectory: string;
    loopRecordPath: string;
    ledgerPath: string;
    // ... loop preview, verification summary, artifacts
  };
}

Config Example (verbatim from martin.config.example.yaml)

policyProfile: balanced
budget:
  # Hard cost cap for a run.
  maxUsd: 8
  # Early warning threshold below the hard cap.
  softLimitUsd: 5
  # Maximum number of loop iterations.
  maxIterations: 3
  # Maximum token budget for the run.
  maxTokens: 20000
governance:
  # Policy used when the run would take a destructive action.
  destructiveActionPolicy: approval
  # Where governance/telemetry events should be sent.
  telemetryDestination: control-plane
  # Verification commands used when --verify is not supplied.
  verifierRules:
    - pnpm test
    - pnpm lint

MCP Preflight Tool Call (from quickstart doc)

{
  "tool": "martin_preflight",
  "arguments": {
    "objective": "Fix the auth regression and prove it with tests",
    "engine": "codex",
    "maxUsd": 3,
    "maxIterations": 3,
    "verificationPlan": ["pnpm test --filter auth"],
    "allowedPaths": ["src/**", "tests/**"],
    "deniedPaths": [".env*", "secrets/**"]
  }
}

MCP Doctor Tool Call (from quickstart doc)

{
  "tool": "martin_doctor",
  "arguments": {
    "engine": "codex"
  }
}
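The two calls above are written in a `{ tool, arguments }` shorthand. On the wire, MCP clients send a JSON-RPC 2.0 `tools/call` request, and the sketch below shows that mapping. The helper and type names are hypothetical; the envelope fields (`jsonrpc`, `id`, `method`, `params.name`, `params.arguments`) follow the MCP specification.

```typescript
// Shorthand used in the quickstart examples above.
interface ToolCallShorthand {
  tool: string;
  arguments: { [key: string]: unknown };
}

// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface JsonRpcToolCall {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { [key: string]: unknown } };
}

function toJsonRpcToolCall(id: number, call: ToolCallShorthand): JsonRpcToolCall {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: call.tool, arguments: call.arguments },
  };
}

const doctorRequest = toJsonRpcToolCall(1, {
  tool: "martin_doctor",
  arguments: { engine: "codex" },
});
// With the stdio transport, each message is written as a single line of JSON.
const doctorWireLine = JSON.stringify(doctorRequest);
```

In practice an MCP host such as Claude Code or Codex builds this envelope itself; the sketch is mainly useful when debugging the server by hand.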

MCP Recommended Allow-List (read-only)

martin_doctor
martin_preflight
martin_list_runs
martin_triage_runs
martin_run_dossier

MCP Prompts

Named prompts for discovery-first workflows:

Prompt	Use Case
`martin_start`	Onboarding + first run setup
`martin_preflight`	Pre-execution policy check narrative
`martin_triage`	Rank runs needing attention
`martin_resume`	Resume interrupted run
`martin_prove`	Prove run completion to reviewer
`martin_release_check`	Pre-release governance check
`martin_governed_coding_kickoff`	Start governed coding session
`martin_debug_failed_run`	Debug failed run with evidence
`martin_publish_readiness_review`	npm/release readiness
`martin_triage_run_store`	Triage full run store

JSONL Record Format

Append-only records at ~/.martin/runs/<workspaceId>.jsonl:

{
  "loopId": "ml-abc123",
  "status": "completed",
  "lifecycleState": "verified",
  "costUsd": 2.30,
  "attempts": 1,
  "verificationPassed": true,
  "failureTaxonomy": null,
  "remainingBudget": { "usd": 5.70, "iterations": 2 },
  "redBlueResults": { "tier": "baseline", "decision": "pass" },
  "rollbackBoundary": { "commitSha": "abc123def" },
  "timestamp": "2026-05-26T10:00:00Z"
}
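Because the file is JSONL, each record occupies exactly one line, so the pretty-printed record above would be serialized without line breaks. A minimal in-memory sketch of appending and reading such a log follows. The type and function names are hypothetical and only a few fields are modeled; a real writer would append to the file on disk (for example with Node's `fs.appendFile`) instead of concatenating strings.

```typescript
// Only a few fields from the sample record are modeled here.
interface RunRecordSketch {
  loopId: string;
  status: string;
  attempts: number;
  costUsd: number;
  timestamp: string;
}

// JSONL: one JSON document per line. Appending never rewrites earlier lines.
function appendRecord(log: string, record: RunRecordSketch): string {
  return log + JSON.stringify(record) + "\n";
}

// Parse a log back into records, skipping blank lines.
function parseRunLog(log: string): RunRecordSketch[] {
  return log
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as RunRecordSketch);
}

// The newest record for a loop is the last matching line in the log.
function latestRecord(log: string, loopId: string): RunRecordSketch | undefined {
  const matches = parseRunLog(log).filter((r) => r.loopId === loopId);
  return matches[matches.length - 1];
}
```

Reading "latest state" as "last matching line" is what makes an append-only log both inspectable and resumable without ever editing earlier entries.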

Uniqueness

MartinLoop — Uniqueness and Positioning

Differs from Seeds

MartinLoop occupies a unique niche in this batch: it is not a multi-agent orchestrator. It is a governance wrapper for single-agent coding loops. This makes it architecturally different from every other framework in the batch.

Versus claude-flow (parallel multi-agent orchestration): claude-flow maximizes parallelism and throughput. MartinLoop maximizes trust, auditability, and cost control on a single agent run.

Versus bernstein (Python scheduler with HMAC audit): Bernstein orchestrates a pipeline of specialized agents (manager, architect, worker, QA) with deterministic scheduling and compliance-grade HMAC audit logs. MartinLoop governs a single agent loop with budget caps, but does not orchestrate multi-agent pipelines.

Versus taskmaster-ai (task decomposition and tracking): Taskmaster manages tasks but does not execute them or govern the execution loop's budget. MartinLoop governs execution but does not decompose tasks.

Versus stoneforge (Director/Worker/Steward hierarchy): Stoneforge is a full multi-agent workforce platform. MartinLoop is a governance control plane that wraps one agent at a time.

Distinctive Features

Named failure taxonomy (11 classes) — the only framework that classifies why a loop failed into named categories. No other seed or batch framework has an explicit failure taxonomy.
Red-Blue adversarial testing — 6 deterministic adversarial probes (assertion deletion, silent reverts, context poisoning, budget self-reporting, grounding evasion) that run before a patch is accepted. No other framework reviewed has adversarial probe suites.
Budget-as-first-class-concept — maxUsd, softLimitUsd, maxIterations, maxTokens are all configurable and enforced as hard stops. Cost governance at this granularity is not present in any seed framework.
Context injection detection — scans for authority inversion, instruction override, identity redefinition in prompts before admission. Unique security posture.
JSONL audit records — every attempt produces an append-only JSONL record with lifecycle state, budget consumption, failure classification, verifier evidence, and rollback boundary. The closest seed is Bernstein's HMAC-chained audit (more compliance-grade); MartinLoop's is simpler but more inspection-friendly.
Ralph Loop naming — MartinLoop has done conceptual work to name the failure mode it solves. "Ralph Loop" gives practitioners a vocabulary for the ungoverned retry failure mode.

Positioning

MartinLoop targets individual developers and teams who run autonomous coding loops (Claude Code, Codex) unattended and have experienced uncontrolled retries burning budget. It is a safety net/governance add-on, not a replacement for agent frameworks.

The NVIDIA Inception Program acceptance suggests commercial ambitions beyond the OSS tier.

Observable Failure Modes

Governance overhead — wrapping every agent run with preflight, safety leash, and verifier adds latency and setup complexity. Teams may skip governance for "simple" tasks and miss the protection.
Single-agent only (OSS) — no multi-agent coordination. Teams wanting parallel execution must look elsewhere.
No web UI (free tier) — CLI-only for OSS users; run inspection requires martin-loop inspect commands rather than a dashboard.
22 stars — small community; early stage; docs are good but ecosystem is minimal.
Node.js 20+ required — excludes some environments.

Workflow

MartinLoop — Workflow

Governed Agent Loop Flow

User/Operator
     │
     ▼
[1. Doctor] ─── martin_doctor
     │            Environment check: Node version, claude/codex on PATH, config present
     │
     ▼
[2. Preflight] ── martin_preflight
     │            Check: budget feasibility, file scope safety, secret scan in task text
     │            OUTPUT: go / no-go + rationale
     │
     ▼
[3. Run] ───────── martin_run
     │            Task Contract submitted to core runtime
     │            Safety leash checks:
     │              - verifier command evaluation
     │              - context injection scan (authority inversion, instruction override)
     │              - file scope validation
     │              - dependency/migration change approval
     │
     ▼
[4. Agent Execution] ─── Adapter (Claude CLI / Codex CLI / direct-provider)
     │            Agent executes within allowed-paths scope
     │            Context distillation carries attempt summary forward
     │
     ▼
[5. Verifier Gate]
     │            Run verifier commands (e.g., pnpm test, pnpm lint)
     │            Red-Blue adversarial probes (6 deterministic probes)
     │              - Probe tiers: baseline / high_risk / release_critical
     │              - Block-severity finding → patch rejected
     │            Failure taxonomy classification (11 classes)
     │
     ├── PASS → run reaches 'completed' status
     │           JSONL record appended to ~/.martin/runs/<workspaceId>.jsonl
     │           Rollback boundary captured
     │
     └── FAIL → classify failure class
                 Retry if within budget (maxIterations, maxUsd, maxTokens)
                 If budget exhausted → rollback + exit
                 JSONL record appended with failure classification

Budget Governance Loop

Per attempt:
  1. Check remaining budget (maxUsd - spent)
  2. If projected_cost > remaining → reject attempt before execution
  3. If remaining == 0 OR iterations == maxIterations → exit loop
  4. softLimitUsd reached → emit warning but continue
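The four steps above can be sketched as a pure decision function evaluated before each attempt. Names such as `decideNextAttempt` and the `run`/`reject`/`exit` action labels are assumptions for illustration. The order in which the checks are evaluated (exit conditions first, then the projected-cost rejection) is also a choice made here, not something the source specifies.

```typescript
interface BudgetLimits {
  maxUsd: number;
  softLimitUsd: number;
  maxIterations: number;
}

interface BudgetSpend {
  spentUsd: number;
  iterationsUsed: number;
}

type AttemptDecision =
  | { action: "run"; softLimitWarning: boolean }
  | { action: "reject"; reason: string }
  | { action: "exit"; reason: string };

// Decide, before execution, whether the next attempt may run.
function decideNextAttempt(
  limits: BudgetLimits,
  spend: BudgetSpend,
  projectedCostUsd: number,
): AttemptDecision {
  const remainingUsd = limits.maxUsd - spend.spentUsd;
  if (remainingUsd <= 0) {
    return { action: "exit", reason: "budget exhausted" };
  }
  if (spend.iterationsUsed >= limits.maxIterations) {
    return { action: "exit", reason: "iteration limit reached" };
  }
  if (projectedCostUsd > remainingUsd) {
    return { action: "reject", reason: "projected cost exceeds remaining budget" };
  }
  // Crossing the soft limit only warns; the attempt still runs.
  return { action: "run", softLimitWarning: spend.spentUsd >= limits.softLimitUsd };
}
```

With the example config (`maxUsd: 8`, `softLimitUsd: 5`, `maxIterations: 3`), a loop that has spent $5.50 over two iterations may still run a $2.00 attempt with a warning, but a $3.00 attempt is rejected before it starts.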

Phase Table

Phase	Action	Gate	Evidence Produced
Doctor	Environment preflight	No	Doctor report
Preflight	Policy/budget check	No (advisory)	go/no-go rationale
Task Contract	Objective + verifier plan + budget	No	Stored contract
Safety Leash	Pre-execution scan	YES — blocks unsafe attempts	Safety report
Agent Execution	Adapter runs agent	No	File diffs, attempt artifacts
Verifier Gate	Run verifiers + Red-Blue probes	YES — blocks failed verification	Verifier evidence
Completion	Status = completed	No	JSONL record + proof card

Approval Gates

Two classes of gates:

Safety Leash gate — runs before every agent attempt; blocks on injection patterns, file scope violations, secret-like values, destructive action policy
Verifier gate — runs after every agent attempt; blocks on failing verifier commands or Red-Blue probe block-severity findings

destructiveActionPolicy: approval triggers a third gate requiring operator approval for destructive changes.
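As a rough illustration of the injection-pattern part of the Safety Leash gate, the sketch below flags task text against a few regular expressions. Everything here is assumed: the patterns, the snake_case class labels, and the function name are placeholders, and MartinLoop's real detector is not documented on this page.

```typescript
type InjectionClass =
  | "authority_inversion"
  | "instruction_override"
  | "identity_redefinition";

// Illustrative patterns only; a real detector would need a broader, tested set.
const injectionPatterns: { kind: InjectionClass; pattern: RegExp }[] = [
  {
    kind: "instruction_override",
    pattern: /ignore (all |any )?(previous|prior|above) instructions/i,
  },
  {
    kind: "authority_inversion",
    pattern: /(system|developer) (prompt|message) (is|has been) (overridden|replaced)/i,
  },
  {
    kind: "identity_redefinition",
    pattern: /you are (now|no longer) /i,
  },
];

// Returns every class whose pattern matches; an empty array means nothing was flagged.
function scanForInjection(text: string): InjectionClass[] {
  return injectionPatterns
    .filter((p) => p.pattern.test(text))
    .map((p) => p.kind);
}
```

A gate built on this would block the attempt whenever the returned array is non-empty, before any agent is invoked.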

JSONL Run Record Structure

Each attempt appended to ~/.martin/runs/<workspaceId>.jsonl:

{
  "loopId": "<uuid>",
  "workspaceId": "<project-id>",
  "attempt": 1,
  "status": "completed|failed|rejected|rolled_back",
  "failureClass": "test_regression|hallucination|...",
  "budgetConsumed": { "usd": 2.30, "tokens": 14200, "iterations": 1 },
  "verifierResults": [...],
  "redBlueResults": { "tier": "baseline", "probes": [...], "decision": "pass" },
  "rollbackBoundary": { "commitSha": "..." },
  "timestamp": "2026-05-26T10:00:00Z"
}

Resuming Runs

martin-loop resume --run-id <loopId>

Or via MCP: use the martin_resume prompt, or inspect the run with the martin_get_run tool and then call martin_run with an updated budget.

Memory Context

MartinLoop — Memory and Context

Run Record Persistence

Primary persistence: JSONL append-only log files.

Location: ~/.martin/runs/<workspaceId>.jsonl
Format: Newline-delimited JSON, one record per attempt/event
Scope: User-level (cross-project by workspaceId)
Persistence: Cross-session (survives process restarts)

Each record captures:

Loop state (status, lifecycleState, reason)
Budget consumption (costUsd, tokens, iterations used/remaining)
Verification results (verifier commands pass/fail)
Red-Blue probe results (tier, probe findings, decision)
Rollback boundary (commitSha for repo-backed runs)
Failure taxonomy classification

Context Distillation

Between retry attempts, MartinLoop performs context distillation: it carries a distilled summary of recent attempts and remaining constraints into the next attempt's context. This prevents runaway token growth across retries while preserving relevant failure context.

Purpose:

Prevent token budget exhaustion from growing attempt context
Preserve failure reasons for the next attempt
Reduce redundant recomputation
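One simple way to bound carried-forward context is to keep only the most recent attempts and clip their verifier output. The sketch below does exactly that. It is a stand-in for the idea, not MartinLoop's distillation algorithm, which this page does not describe; the names and the truncation strategy are assumptions.

```typescript
interface AttemptOutcome {
  attempt: number;
  failureClass: string; // e.g. "test_regression"
  verifierOutput: string; // raw output, possibly very long
}

// Keep only the last `keep` attempts and clip each verifier output to `maxChars`,
// so the carried-forward context stays bounded however many retries happen.
function distillAttempts(attempts: AttemptOutcome[], keep: number, maxChars: number): string {
  const recent = attempts.slice(Math.max(0, attempts.length - keep));
  const lines = recent.map((a) => {
    const clipped =
      a.verifierOutput.length > maxChars
        ? a.verifierOutput.slice(0, maxChars) + "..."
        : a.verifierOutput;
    return `Attempt ${a.attempt} failed (${a.failureClass}): ${clipped}`;
  });
  const dropped = attempts.length - recent.length;
  if (dropped > 0) {
    lines.unshift(`(${dropped} earlier attempt(s) omitted)`);
  }
  return lines.join("\n");
}
```

The output size is bounded by roughly `keep × maxChars` plus fixed overhead, so token use per retry stays flat instead of growing with the attempt count.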

Repo-Backed Persistence

When a persistence store is configured (Pro/Enterprise tier), additional artifacts are stored:

Task contracts
Attempt ledgers
Diff artifacts
Rollback artifacts

Cross-Session Resumption

Runs can be inspected and resumed after process restart:

martin-loop inspect --run-id <loopId>
martin-loop resume --run-id <loopId>

Via MCP: martin_get_run → martin_resume

The JSONL record provides full context for resumption decisions.

Memory Scope

Scope	What's Stored	Location
User-global	Run records (all projects)	`~/.martin/runs/`
Project-local	Config (budget, verifier rules)	`martin.config.yaml`
Per-run	Attempt artifacts, diffs, rollback	Repo-backed store (Pro+)
In-memory	Current attempt context (distilled)	Runtime only

No Cross-Agent Memory

MartinLoop is a single-agent governance layer. There is no cross-agent memory mechanism — it governs one agent loop at a time. Multiple simultaneous governed loops would each have their own workspaceId in ~/.martin/runs/.

Orchestration

MartinLoop — Orchestration

Pattern

Not a multi-agent orchestrator. MartinLoop is a single-agent governance layer. It governs one AI coding loop at a time rather than coordinating multiple agents.

Orchestration pattern: none (governance-wrapper, not orchestrator)

Role Architecture

MartinLoop (governance control plane)
     │
     └── One Agent (Claude Code OR Codex OR direct-provider)
              └── Executes coding task in isolation

No Director/Worker/Steward. No task queue. No parallel fan-out. MartinLoop wraps a single agent's retry loop and governs it.

Multi-Model Usage

Yes — the adapter layer is pluggable. The operator chooses the agent engine:

# Use Claude Code
martin run "fix auth" --engine claude --max-usd 5

# Use Codex
martin run "fix auth" --engine codex --max-usd 5

Adapters: Claude CLI, Codex CLI, direct-provider, stub

Each adapter normalizes agent output into @martin/contracts types so the core runtime behaves identically.

Concurrency

MartinLoop does not manage concurrent agents. Each governed loop is independent. Concurrent loops require separate invocations with separate workspaceIds and separate JSONL record files.

Budget Governance (Primary Differentiator)

The core "orchestration" in MartinLoop is budget-aware loop control:

Condition	Action
`projected_cost > remaining_budget`	Reject attempt before execution
`spent >= maxUsd`	Exit loop, status = budget_exit
`iterations >= maxIterations`	Exit loop, status = iteration_exit
`tokens >= maxTokens`	Exit loop, status = token_exit
`softLimitUsd reached`	Emit warning, continue
`destructive action detected`	Apply destructiveActionPolicy (approval/block/allow)

Approval Gates

Two runtime gates:

Safety Leash Gate (pre-execution):
- Verifier command safety evaluation
- File scope validation (allowedPaths/deniedPaths)
- Context injection detection
- Secret-like values in task text
- Approval-boundary changes (deps, migrations)
Verifier Gate (post-execution):
- Run configured verifier commands (pnpm test, pnpm lint, etc.)
- Red-Blue adversarial probes (6 probes, baseline/high_risk/release_critical tiers)
- Block-severity finding → patch rejected

An optional third gate, the Destructive Action Gate, applies when destructiveActionPolicy: approval is set: operator approval is required before destructive changes proceed.
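The file scope check listed under the Safety Leash Gate can be sketched with a small glob matcher. The precedence rules used here (denied patterns win, and an empty allow-list means "anything not denied") are assumptions, as are the function names. The glob support is deliberately minimal and is not MartinLoop's matcher.

```typescript
// Minimal glob support for this sketch: "**" matches across directories,
// "*" matches within one path segment. Everything else is literal.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\?]/g, "\\$&");
  const source = escaped
    .replace(/\*\*\//g, "\u0000") // "**/" placeholder: zero or more directories
    .replace(/\*\*/g, "\u0001") // "**" placeholder: anything, including "/"
    .replace(/\*/g, "[^/]*") // "*": anything except "/"
    .replace(/\u0000/g, "(?:.*/)?")
    .replace(/\u0001/g, ".*");
  return new RegExp("^" + source + "$");
}

// Denied patterns win over allowed ones. With no allowed patterns,
// every path that is not denied counts as in scope (an assumption).
function isPathInScope(path: string, allowedPaths: string[], deniedPaths: string[]): boolean {
  const matchesAny = (globs: string[]): boolean =>
    globs.some((g) => globToRegExp(g).test(path));
  if (matchesAny(deniedPaths)) {
    return false;
  }
  return allowedPaths.length === 0 || matchesAny(allowedPaths);
}
```

With the preflight example's scope (`src/**` and `tests/**` allowed, `.env*` and `secrets/**` denied), `src/auth/login.ts` is in scope while `.env.local` and `package.json` are not.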

Audit Log

Yes — JSONL format.

Location: ~/.martin/runs/<workspaceId>.jsonl
Format: Append-only JSONL
Scope: Every attempt, with full lifecycle state, budget, verifier results, failure taxonomy
Replay: Runs can be inspected (martin_inspect tool) and resumed (martin_resume prompt)

Prompt Chaining

Yes — context distillation carries distilled attempt summaries forward across retry iterations. This constitutes a form of prompt chaining: each retry receives a condensed context from prior attempts.

UI CLI Surface

MartinLoop — UI and CLI Surface

CLI Binary: `martin-loop`

Detail	Value
Install	`npm install -g martin-loop`
Binary	`martin-loop` (also accessible as `martin`)
Type	Own CLI (TypeScript/Node.js 20+)
Config	`martin.config.yaml` (project-local)

Key Subcommands

martin-loop run "fix auth regression" \
  --max-usd 5 \
  --max-iterations 3 \
  --verify "pnpm test" \
  --allowed-paths "src/**,tests/**"

martin-loop demo          # Packaged demo
martin-loop inspect       # Inspect past run
martin-loop resume        # Resume previous run
martin-loop doctor        # Environment preflight
martin-loop preflight     # Pre-run policy check
martin-loop triage        # Rank runs by urgency

MCP Server: `@martinloop/mcp`

Detail	Value
Install	`npx -y @martinloop/mcp`
Version	0.2.5
Transport	stdio
Tools	11
Resources	12 (+ 3 templates)
Prompts	10
Schema	MCP 2025-12-11

MCP is the primary interface for agent-to-MartinLoop integration. It is designed for Claude Code and Codex hosts.

No Web Dashboard (Free Tier)

MartinLoop Free (OSS) has no web dashboard. The CLI + MCP server is the full surface.

Pro tier ($49/month) and above are described as including a "control plane" dashboard (inferred from telemetryDestination: control-plane in config), but this is a SaaS feature, not local.

Run Inspection via CLI

martin-loop inspect --run-id ml-abc123

Output: Structured JSON showing lifecycle state, budget consumed, verifier results, proof card.

Run Records Location

All runs stored locally as JSONL:

~/.martin/runs/
  <workspaceId>.jsonl     # Append-only loop records

Integration with Claude Code

# Add MCP server (macOS/Linux)
claude mcp add --transport stdio --scope user martin-loop -- npx -y @martinloop/mcp

Integration with Codex

codex mcp add martin-loop -- npx -y @martinloop/mcp

AGENTS.md

Present at repo root. Governs agent behavior in the public OSS surface. Key rules:

Keep all public-facing materials client-facing and free of internal process language
Use trusted publishing for npm releases (no NPM_TOKEN)
Run pnpm oss:validate before any content changes

Related frameworks

same archetype · same primary tool · same memory type

OpenHarness ★ 13k

A11 Governance

Open-source Python agent runtime providing complete harness infrastructure: tools, memory, governance, swarm coordination, and…

Trae Agent ★ 12k

A11 Governance

Research-friendly open-source CLI coding agent by ByteDance, designed for academic ablation studies and modular LLM provider…

Sweep AI ★ 7.7k

A11 Governance

Autonomous GitHub bot that converts issues to pull requests using a sequential multi-agent pipeline.

Agent Governance Toolkit (microsoft) ★ 2.3k

A11 Governance

Enterprise-grade AI agent governance: YAML policy enforcement, 12-vector prompt injection defense, zero-trust identity,…

TDD Guard ★ 2.1k

A11 Governance

Mechanically enforces the Red-Green-Refactor TDD cycle by blocking file writes that violate TDD principles via a PreToolUse hook…

Agentic Coding Flywheel Setup (ACFS) ★ 1.5k

A11 Governance

Take a complete beginner from laptop to three AI coding agents running on a VPS in 30 minutes via an idempotent manifest-driven…

Distribution

Type: cli-tool
License: Apache-2.0
Install: simple
Version: 0.2.5 (@martinloop/mcp)

Surfaces

CLI binary: martin-loop
CLI subcmds: 14
Local UI: No
Tech stack: none

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 0
MCP servers: 1
MCP tools: 11
Scripts: 0
Templates: 0

Workflow

Phases: 4
Approval gates: 2
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: No
Pattern: none
Max concurrent: 1
Isolation: none
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: governed-loop
Crash recovery: Yes
Compaction: context-distillation
Session handoff: Yes
Streaming: No

Memory

Type: file-based
Persistence: user-global
Search: none
State files: 2 files

Quality

TDD: No
TDD mechanism: none
Validators: 3
Self-review: adversarial-probes

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: jsonl
Replay: Yes

Tools

Primary: claude-code
Targets: 2
Portability: high

Signals

Stars: 22
Last commit: 2026-05-26
Contributors: 2
Maintainer: active
Quality score: 6/10
