
MartinLoop

martinloop · Keesan12/Martin-Loop · ★ 22 · last commit 2026-05-26

00

Summary

MartinLoop — Summary

Slug: martinloop
GitHub: https://github.com/Keesan12/Martin-Loop
Stars: 22
License: Apache-2.0
Language: TypeScript (Node.js 20+)
Status: Active (last commit 2026-05-26); NVIDIA Inception Program accepted

What It Is

MartinLoop is a governance control plane for AI coding agent loops. It wraps Claude Code, Codex, and custom agent runs with hard budget caps, policy checks, verifier gates, rollback evidence, and JSONL audit records.

It does not run code itself. It governs the agent's retry loop to prevent the "Ralph Loop": the failure mode where an AI coding agent keeps retrying without a stop condition, burning budget with no audit trail.

Core Value Proposition

"AI coding accountability: completes good work, refuses unsafe work, stops uneconomical work."

Same task, same starting state: MartinLoop completes in one verified attempt at $2.30, while an uncontrolled loop retries 4× at $5.20 with no audit trail (README benchmark).

Five-Layer Architecture

  1. Task Contract — Objective, verifier plan, repo root, allowed/denied paths, acceptance criteria, budget
  2. Policy & Budget — martin.config.yaml + CLI flags; budget preflight rejects attempts before execution
  3. Agent Adapters — Claude CLI, Codex CLI, direct-provider, stub adapters normalize results
  4. Safety & Verification — Verifier commands, file scope, approval-boundary changes, secret scan, grounding
  5. Persistence — JSONL records at ~/.martin/runs/<workspaceId>.jsonl

Key Capabilities

  • Budget hard caps (maxUsd, softLimitUsd, maxIterations, maxTokens)
  • 11-class failure taxonomy (hallucination, test regression, scope creep, repo grounding failure, env mismatch, budget pressure, etc.)
  • Red-Blue adversarial testing (6 probes, 3 risk tiers)
  • Context injection detection (authority inversion, instruction override, identity redefinition)
  • Rollback evidence capture
  • Context distillation for subsequent attempts
  • JSONL run records (inspectable, resumable)

Packages

  • martin-loop — root npm package (CLI + SDK)
  • @martin/contracts — shared types
  • @martin/core — runtime, policy engine, safety leash
  • @martin/adapters — Claude CLI, Codex CLI, direct-provider, stub
  • @martin/cli — CLI implementation
  • @martinloop/mcp — MCP server (11 tools, 12 resources, 10 prompts)
01

Overview

MartinLoop — Overview

Origin and Positioning

MartinLoop was created by Keesan (keesan@martinloop.com) and accepted into the NVIDIA Inception Program. The README opens with a concrete failure story: "$2.40 estimated, $65 billed — 47 attempts, no hard stop, no rollback, no audit trail, nothing merged."

The project frames autonomous AI coding loops as having an unsolved failure mode — the "Ralph Loop" — and positions MartinLoop as the control layer that stops that failure before damage happens.

Website: https://martinloop.com

The Ralph Loop Concept

The "Ralph Loop" names the failure mode where an AI coding agent keeps retrying work without knowing when to stop. Symptoms:

  • No hard budget cap
  • No signed evidence layer
  • No pre-execution control system
  • The agent knows how to keep trying but not when continuing is unsafe, uneconomical, or impossible

MartinLoop governs this by:

  1. Stopping the next attempt before budget overspend
  2. Classifying unsafe/invalid actions before execution
  3. Appending a JSONL audit record for every attempt
  4. Rolling back failed runs instead of leaving broken state
  5. Reducing runaway token growth with context distillation

Freemium Model

Tier         Price        Key capability
Free (OSS)   $0           Full CLI, MCP, local JSONL records
Pro          $49/month    Dashboard (inferred from README)
Growth       $149/month   Inferred
Enterprise   $499/month   Inferred

Philosophy

MartinLoop explicitly does NOT try to replace agent patterns — it makes those patterns safe to run. The distinction is important: it is a governance layer, not an agent platform.

Quote: "AI coding agents are useful. Unbounded retry loops are not."

11-Class Failure Taxonomy

MartinLoop classifies failures into 11 named classes:

  1. Hallucination
  2. Test regression
  3. Scope creep
  4. Repo grounding failure
  5. Environment mismatch
  6. Budget pressure
  7. Context poisoning (detected before admission)
  8. Authority inversion
  9. Instruction override
  10. Identity redefinition
  11. (Additional classes documented in codebase)

This taxonomy distinguishes real success from unsafe, invalid, or terminal behavior.

Red-Blue Testing

Six deterministic adversarial probes before a patch is accepted:

  1. Assertion deletion
  2. Silent reverts
  3. Context poisoning
  4. Budget self-reporting
  5. Grounding evasion
  6. (Additional probe)

Three risk tiers:

  • baseline — default
  • high_risk — stricter
  • release_critical — adds a Haiku model call

A single block-severity finding rejects the patch.
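
The rejection rule above can be sketched in a few lines. This is an illustration only: the ProbeFinding and RedBlueVerdict shapes, the "info" and "warn" severity names, the "reject" decision label, and the snake_case probe names are assumptions, not types taken from the MartinLoop source.

```typescript
// Hypothetical shapes for illustration; not taken from the MartinLoop source.
type Severity = "info" | "warn" | "block";
type RiskTier = "baseline" | "high_risk" | "release_critical";

interface ProbeFinding {
  probe: string; // e.g. "assertion_deletion" (name format is an assumption)
  severity: Severity;
  detail: string;
}

interface RedBlueVerdict {
  tier: RiskTier;
  decision: "pass" | "reject";
  blocking: ProbeFinding[];
}

// A single block-severity finding is enough to reject the patch.
function decideRedBlue(tier: RiskTier, findings: ProbeFinding[]): RedBlueVerdict {
  const blocking = findings.filter((f) => f.severity === "block");
  return { tier, decision: blocking.length > 0 ? "reject" : "pass", blocking };
}

const verdict = decideRedBlue("baseline", [
  { probe: "silent_revert", severity: "warn", detail: "one hunk restored old code" },
  { probe: "assertion_deletion", severity: "block", detail: "an assertion was removed from a test" },
]);
console.log(verdict.decision, verdict.blocking.length);
```

Non-blocking findings are kept out of the decision here but could still be recorded as evidence in the run record.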

02

Architecture

MartinLoop — Architecture

Distribution

  • npm package: martin-loop (root facade)
  • MCP package: @martinloop/mcp@0.2.5 (standalone)
  • Install: npx martin-loop demo or npm install martin-loop

Package Structure

Martin-Loop/
├── packages/
│   ├── adapters/         # @martin/adapters — Claude CLI, Codex CLI, direct-provider, stub
│   ├── cli/              # @martin/cli — CLI implementation
│   │   └── src/
│   │       ├── bin/martin.ts
│   │       ├── mcp-config.ts
│   │       ├── persistence.ts
│   │       ├── run-store.ts
│   │       └── ux.ts
│   ├── contracts/        # @martin/contracts — shared types (loops, policy, governance, budget)
│   ├── core/             # @martin/core — runtime, policy engine, safety leash, grounding
│   └── mcp/              # @martinloop/mcp — MCP server
│       └── src/
│           ├── tools/    # 11+ tool handlers
│           ├── resources.ts
│           ├── prompts.ts
│           └── server.ts
├── martin.config.example.yaml
├── AGENTS.md
└── docs/
    └── oss/
        ├── QUICKSTART.md
        ├── RALPH-LOOP-SAFETY.md
        ├── MCP-FOR-AI-AGENTS.md
        └── CLAUDE-CODE-WALKTHROUGH.md

Five-Layer Runtime

[Task Contract Input]
        ↓
[Policy & Budget]  ← martin.config.yaml + CLI flags
  - maxUsd / softLimitUsd / maxIterations / maxTokens
  - destructiveActionPolicy
  - verifierRules
        ↓
[Safety Leash]  ← pre-execution checks
  - verifier command safety
  - file scope validation
  - secret-like values scan
  - context injection detection (authority inversion, instruction override)
        ↓
[Agent Adapter]  ← pluggable
  - Claude CLI adapter
  - Codex CLI adapter
  - Direct-provider adapter
  - Stub adapter (testing)
        ↓
[Verifier Gate]  ← post-execution checks
  - Run verifier commands (pnpm test, pnpm lint, etc.)
  - Red-Blue adversarial probes (6 probes, 3 tiers)
  - Failure taxonomy classification (11 classes)
        ↓
[Persistence]
  - JSONL records: ~/.martin/runs/<workspaceId>.jsonl
  - Rollback boundaries + restore outcomes
  - Context distillation for next attempt

State Files

  • ~/.martin/runs/<workspaceId>.jsonl — JSONL run records (append-only)
  • martin.config.yaml (project-local) — budget caps and verifier rules
  • martin.config.example.yaml — template with all configurable fields

MCP Release Train

Version   Features
0.1.4     Operator foundation
0.2.0     Resources, resource templates, prompts, read-only cockpit inspection
0.2.5     Stable cockpit line; local triage; degraded run-store hardening

Adapter Pattern

Each adapter normalizes an agent's execution results into @martin/contracts types, allowing the core runtime to operate identically regardless of whether Claude Code, Codex, or a custom agent is running:

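// createCodexCliAdapter is assumed to be exported alongside createClaudeCliAdapter
import { MartinLoop, createClaudeCliAdapter, createCodexCliAdapter } from 'martin-loop';

// Claude Code adapter
const claudeLoop = new MartinLoop({
  adapter: createClaudeCliAdapter({ workingDirectory: process.cwd() })
});

// Codex adapter: same runtime, different engine
const codexLoop = new MartinLoop({
  adapter: createCodexCliAdapter({ workingDirectory: process.cwd() })
});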
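
To make the normalization idea concrete, here is a minimal sketch of what an adapter contract could look like. The names AgentAdapter, AttemptRequest, NormalizedResult, makeStubAdapter, and runOnce are hypothetical; the real types live in @martin/contracts and may differ.

```typescript
// Hypothetical contract types; the real ones live in @martin/contracts and may differ.
interface AttemptRequest {
  objective: string;
  workingDirectory: string;
}

interface NormalizedResult {
  engine: string;
  summary: string;
  changedFiles: string[];
  costUsd: number;
  tokensUsed: number;
}

interface AgentAdapter {
  readonly engine: string;
  execute(request: AttemptRequest): Promise<NormalizedResult>;
}

// A stub adapter returns canned results, which keeps runtime tests offline.
function makeStubAdapter(canned: Omit<NormalizedResult, "engine">): AgentAdapter {
  return {
    engine: "stub",
    async execute(_request: AttemptRequest): Promise<NormalizedResult> {
      return { engine: "stub", ...canned };
    },
  };
}

// The runtime depends only on the interface, never on a concrete engine.
async function runOnce(adapter: AgentAdapter, objective: string): Promise<NormalizedResult> {
  return adapter.execute({ objective, workingDirectory: "." });
}

const stub = makeStubAdapter({ summary: "no-op", changedFiles: [], costUsd: 0, tokensUsed: 0 });
runOnce(stub, "Fix auth regression").then((r) => console.log(r.engine, r.costUsd));
```

Because every engine returns the same result shape, budget accounting and verification can be written once against that shape.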
03

Components

MartinLoop — Components

CLI Binary: martin-loop

Install: npm install -g martin-loop

CLI Subcommands (inferred from docs/source)

Subcommand         Description
run                Execute a governed agent loop
demo               Run packaged demo locally
inspect            Inspect a past run record
resume             Resume a previous run
doctor             Preflight environment checks
preflight          Pre-execution policy check
triage             Rank persisted runs needing attention
dossier            Generate a run dossier
runs list          List stored run records
runs get           Get a specific run
runs attempt       Get a specific attempt
runs verify        Get verification results
mcp print-config   Print MCP configuration
mcp install        Install MCP server config

Key CLI Flags

  • --max-usd <n> — Hard USD budget cap
  • --soft-limit-usd <n> — Early warning threshold
  • --max-iterations <n> — Max retry iterations
  • --max-tokens <n> — Max token budget
  • --verify <cmd> — Override verifier commands
  • --policy-profile <name> — Policy profile (balanced, strict, etc.)
  • --allowed-paths <glob> — Restrict file scope
  • --denied-paths <glob> — Explicitly deny file access
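
How --allowed-paths and --denied-paths might combine can be sketched as follows. This is a minimal illustration, not MartinLoop's matcher: globToRegExp and isPathAllowed are hypothetical helpers, and both the deny-wins precedence and the treatment of an empty allow list as unrestricted are assumptions.

```typescript
// Converts a simple glob to a RegExp: "**" spans directories, "*" stays within one segment.
function globToRegExp(glob: string): RegExp {
  let out = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        out += ".*";
        i++; // consume the second "*"
      } else {
        out += "[^/]*";
      }
    } else {
      // Escape regex metacharacters so "." in ".env" matches literally.
      out += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${out}$`);
}

function isPathAllowed(path: string, allowed: string[], denied: string[]): boolean {
  const matches = (globs: string[]) => globs.some((g) => globToRegExp(g).test(path));
  if (matches(denied)) return false; // assumption: deny rules win over allow rules
  return allowed.length === 0 || matches(allowed); // assumption: empty allow list means unrestricted
}

const allowed = "src/**,tests/**".split(",");
const denied = [".env*", "secrets/**"];
console.log(isPathAllowed("src/auth/login.ts", allowed, denied));
console.log(isPathAllowed(".env.local", allowed, denied));
```

A production matcher would more likely rely on an established glob library than a hand-rolled conversion like this one.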

MCP Server: @martinloop/mcp

Install: npx -y @martinloop/mcp

Transport: stdio

MCP Tools (11)

Tool                              Description
martin_doctor                     Environment preflight checks
martin_preflight                  Pre-run policy/budget check
martin_run                        Execute governed agent loop (only execution entrypoint)
martin_inspect                    Inspect loop state
martin_status                     Get current run status
martin_list_runs                  List persisted runs
martin_triage_runs                Rank runs needing attention
martin_get_run                    Get specific run record
martin_get_attempt                Get specific attempt
martin_get_verification_results   Get verifier evidence
martin_run_dossier                Generate compact run dossier

MCP Resources (12)

Resource URI                                     Description
martin://server/health                           Server health
martin://runs/recent                             Recent runs
martin://runs/triage                             Runs needing attention
martin://runs/latest/summary                     Latest run summary
martin://runs/latest/proof-card                  Latest run proof card
martin://runs/latest/budget-status               Budget status
martin://runs/latest/verifier-evidence           Verifier evidence
martin://runs/latest/rollback-evidence           Rollback evidence
martin://agent/next-step                         Recommended next step
martin://guides/mcp-usage                        Usage guide
martin://guides/publish-readiness                Publish readiness guide
martin://runs/{loopId}                           Specific run (template)
martin://runs/{loopId}/attempts/{attemptIndex}   Specific attempt (template)
martin://runs/{loopId}/verification              Run verification (template)

MCP Prompts (10)

martin_start, martin_preflight, martin_triage, martin_resume, martin_prove, martin_release_check, martin_governed_coding_kickoff, martin_debug_failed_run, martin_publish_readiness_review, martin_triage_run_store

Config Schema: martin.config.yaml

policyProfile: balanced
budget:
  maxUsd: 8                   # Hard cost cap
  softLimitUsd: 5             # Early warning threshold
  maxIterations: 3            # Max loop iterations
  maxTokens: 20000            # Max token budget
governance:
  destructiveActionPolicy: approval   # approval | block | allow
  telemetryDestination: control-plane
  verifierRules:
    - pnpm test
    - pnpm lint

SDK Interface

import { MartinLoop, createClaudeCliAdapter } from 'martin-loop';

const loop = new MartinLoop({
  adapter: createClaudeCliAdapter({ workingDirectory: process.cwd() }),
  policy: {
    maxUsd: 5,
    maxIterations: 3,
    maxTokens: 20_000
  }
});

const result = await loop.run({
  task: {
    title: "Fix auth regression",
    objective: "Fix the failing auth regression tests",
    verificationPlan: ["pnpm test"],
    repoRoot: process.cwd()
  }
});

Integration: Claude Code

claude mcp add --transport stdio --scope user martin-loop -- npx -y @martinloop/mcp

Integration: Codex

codex mcp add martin-loop -- npx -y @martinloop/mcp
05

Prompts

MartinLoop — Prompts and Key Artifacts

MCP run_loop Tool Input Schema (verbatim from source)

export interface RunLoopInput {
  objective: string;
  workingDirectory?: string;
  engine?: "claude" | "codex";
  model?: string;
  maxUsd?: number;
  maxIterations?: number;
  maxTokens?: number;
  verificationPlan?: string[];
  allowedPaths?: string[];
  deniedPaths?: string[];
  workspaceId?: string;
  projectId?: string;
}

MCP run_loop Tool Output Schema (verbatim from source)

export interface RunLoopOutput {
  status: string;
  lifecycleState: string;
  reason: string;
  attempts: number;
  costUsd: number;
  verificationPassed: boolean;
  loopId: string;
  pressure: string;
  shouldStop: boolean;
  remainingBudgetUsd: number;
  remainingIterations: number;
  remainingTokens: number;
  engine: MartinEngine;
  workingDirectory: string;
  budget: LoopBudget;
  inspection: {
    runsRoot: string;
    runDirectory: string;
    loopRecordPath: string;
    ledgerPath: string;
    // ... loop preview, verification summary, artifacts
  };
}
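
A caller might turn this output into a next action along the following lines. The sketch redeclares only a subset of the fields above, and the NextStep labels and the exact meaning given to shouldStop are assumptions for illustration.

```typescript
// Subset of the RunLoopOutput fields shown above; other fields are omitted for brevity.
interface RunLoopSummary {
  status: string;
  reason: string;
  verificationPassed: boolean;
  shouldStop: boolean;
  remainingBudgetUsd: number;
  remainingIterations: number;
}

// Hypothetical labels for what the operator (or host agent) does next.
type NextStep = "accept" | "retry" | "stop";

function nextStep(out: RunLoopSummary): NextStep {
  if (out.verificationPassed) return "accept";
  // Assumption: shouldStop is authoritative, and exhausted budget also forces a stop.
  const exhausted = out.remainingBudgetUsd <= 0 || out.remainingIterations <= 0;
  return out.shouldStop || exhausted ? "stop" : "retry";
}

console.log(
  nextStep({
    status: "failed",
    reason: "verifier failed",
    verificationPassed: false,
    shouldStop: false,
    remainingBudgetUsd: 2.5,
    remainingIterations: 1,
  })
);
```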

Config Example (verbatim from martin.config.example.yaml)

policyProfile: balanced
budget:
  # Hard cost cap for a run.
  maxUsd: 8
  # Early warning threshold below the hard cap.
  softLimitUsd: 5
  # Maximum number of loop iterations.
  maxIterations: 3
  # Maximum token budget for the run.
  maxTokens: 20000
governance:
  # Policy used when the run would take a destructive action.
  destructiveActionPolicy: approval
  # Where governance/telemetry events should be sent.
  telemetryDestination: control-plane
  # Verification commands used when --verify is not supplied.
  verifierRules:
    - pnpm test
    - pnpm lint
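
The budget comments above imply a few invariants, most notably that softLimitUsd sits below maxUsd. A minimal validation sketch follows; BudgetConfig and validateBudget are hypothetical names, and the integer and positivity rules are assumptions rather than documented behavior.

```typescript
// Mirrors the budget block of martin.config.yaml after it has been parsed.
interface BudgetConfig {
  maxUsd: number;
  softLimitUsd: number;
  maxIterations: number;
  maxTokens: number;
}

// Returns a list of problems; an empty list means the budget looks consistent.
function validateBudget(b: BudgetConfig): string[] {
  const errors: string[] = [];
  if (!(b.maxUsd > 0)) errors.push("maxUsd must be positive");
  if (!(b.softLimitUsd > 0 && b.softLimitUsd < b.maxUsd)) {
    errors.push("softLimitUsd must be positive and below maxUsd");
  }
  if (!Number.isInteger(b.maxIterations) || b.maxIterations < 1) {
    errors.push("maxIterations must be an integer >= 1");
  }
  if (!Number.isInteger(b.maxTokens) || b.maxTokens < 1) {
    errors.push("maxTokens must be an integer >= 1");
  }
  return errors;
}

// Values from the example config above.
console.log(validateBudget({ maxUsd: 8, softLimitUsd: 5, maxIterations: 3, maxTokens: 20000 }));
```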

MCP Preflight Tool Call (from quickstart doc)

{
  "tool": "martin_preflight",
  "arguments": {
    "objective": "Fix the auth regression and prove it with tests",
    "engine": "codex",
    "maxUsd": 3,
    "maxIterations": 3,
    "verificationPlan": ["pnpm test --filter auth"],
    "allowedPaths": ["src/**", "tests/**"],
    "deniedPaths": [".env*", "secrets/**"]
  }
}
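
The shorthand above names a tool and its arguments. On the wire, MCP is JSON-RPC 2.0 and a tool invocation is a tools/call request, which MCP hosts such as Claude Code normally build themselves. The sketch below shows that mapping; ToolCall and toToolsCallRequest are hypothetical helper names, and the mapping of "tool" to params.name is an assumption about how the shorthand is meant to be read.

```typescript
// Shorthand form used in this document.
interface ToolCall {
  tool: string;
  arguments: Record<string, unknown>;
}

// JSON-RPC 2.0 envelope for an MCP tool invocation.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toToolsCallRequest(id: number, call: ToolCall): ToolsCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: call.tool, arguments: call.arguments },
  };
}

const preflightCall: ToolCall = {
  tool: "martin_preflight",
  arguments: {
    objective: "Fix the auth regression and prove it with tests",
    engine: "codex",
    maxUsd: 3,
    maxIterations: 3,
  },
};

// Over the stdio transport, each message is a single line of JSON.
const wire = JSON.stringify(toToolsCallRequest(1, preflightCall));
console.log(wire);
```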

MCP Doctor Tool Call (from quickstart doc)

{
  "tool": "martin_doctor",
  "arguments": {
    "engine": "codex"
  }
}
martin_doctor
martin_preflight
martin_list_runs
martin_triage_runs
martin_run_dossier

MCP Prompts

Named prompts for discovery-first workflows:

Prompt                            Use Case
martin_start                      Onboarding + first run setup
martin_preflight                  Pre-execution policy check narrative
martin_triage                     Rank runs needing attention
martin_resume                     Resume interrupted run
martin_prove                      Prove run completion to reviewer
martin_release_check              Pre-release governance check
martin_governed_coding_kickoff    Start governed coding session
martin_debug_failed_run           Debug failed run with evidence
martin_publish_readiness_review   npm/release readiness
martin_triage_run_store           Triage full run store

JSONL Record Format

Append-only records at ~/.martin/runs/<workspaceId>.jsonl:

{
  "loopId": "ml-abc123",
  "status": "completed",
  "lifecycleState": "verified",
  "costUsd": 2.30,
  "attempts": 1,
  "verificationPassed": true,
  "failureTaxonomy": null,
  "remainingBudget": { "usd": 5.70, "iterations": 2 },
  "redBlueResults": { "tier": "baseline", "decision": "pass" },
  "rollbackBoundary": { "commitSha": "abc123def" },
  "timestamp": "2026-05-26T10:00:00Z"
}
09

Uniqueness

MartinLoop — Uniqueness and Positioning

Differs from Seeds

MartinLoop occupies a unique niche in this batch: it is not a multi-agent orchestrator. It is a governance wrapper for single-agent coding loops. This makes it architecturally different from every other framework in the batch.

Versus claude-flow (parallel multi-agent orchestration): claude-flow maximizes parallelism and throughput. MartinLoop maximizes trust, auditability, and cost control on a single agent run.

Versus bernstein (Python scheduler with HMAC audit): Bernstein orchestrates a pipeline of specialized agents (manager, architect, worker, QA) with deterministic scheduling and compliance-grade HMAC audit logs. MartinLoop governs a single agent loop with budget caps, but does not orchestrate multi-agent pipelines.

Versus taskmaster-ai (task decomposition and tracking): Taskmaster manages tasks but does not execute them or govern the execution loop's budget. MartinLoop governs execution but does not decompose tasks.

Versus stoneforge (Director/Worker/Steward hierarchy): Stoneforge is a full multi-agent workforce platform. MartinLoop is a governance control plane that wraps one agent at a time.

Distinctive Features

  1. Named failure taxonomy (11 classes) — the only framework that classifies why a loop failed into named categories. No other seed or batch framework has an explicit failure taxonomy.

  2. Red-Blue adversarial testing — 6 deterministic adversarial probes (assertion deletion, silent reverts, context poisoning, budget self-reporting, grounding evasion) that run before a patch is accepted. No other framework reviewed has adversarial probe suites.

  3. Budget-as-first-class-concept — maxUsd, maxIterations, and maxTokens are all configurable and enforced as hard stops, with softLimitUsd as an early warning threshold. Cost governance at this granularity is not present in any seed framework.

  4. Context injection detection — scans for authority inversion, instruction override, identity redefinition in prompts before admission. Unique security posture.

  5. JSONL audit records — every attempt produces an append-only JSONL record with lifecycle state, budget consumption, failure classification, verifier evidence, and rollback boundary. The closest seed is Bernstein's HMAC-chained audit (more compliance-grade); MartinLoop's is simpler but more inspection-friendly.

  6. Ralph Loop naming — MartinLoop has done conceptual work to name the failure mode it solves. "Ralph Loop" gives practitioners a vocabulary for the ungoverned retry failure mode.

Positioning

MartinLoop targets individual developers and teams who run autonomous coding loops (Claude Code, Codex) unattended and have experienced uncontrolled retries burning budget. It is a safety net/governance add-on, not a replacement for agent frameworks.

The NVIDIA Inception Program acceptance suggests commercial ambitions beyond the OSS tier.

Observable Failure Modes

  1. Governance overhead — wrapping every agent run with preflight, safety leash, and verifier adds latency and setup complexity. Teams may skip governance for "simple" tasks and miss the protection.
  2. Single-agent only (OSS) — no multi-agent coordination. Teams wanting parallel execution must look elsewhere.
  3. No web UI (free tier) — CLI-only for OSS users; run inspection requires martin-loop inspect commands rather than a dashboard.
  4. 22 stars — small community; early stage; docs are good but ecosystem is minimal.
  5. Node.js 20+ required — excludes some environments.
04

Workflow

MartinLoop — Workflow

Governed Agent Loop Flow

User/Operator
     │
     ▼
[1. Doctor] ─── martin_doctor
     │            Environment check: Node version, claude/codex on PATH, config present
     │
     ▼
[2. Preflight] ── martin_preflight
     │            Check: budget feasibility, file scope safety, secret scan in task text
     │            OUTPUT: go / no-go + rationale
     │
     ▼
[3. Run] ───────── martin_run
     │            Task Contract submitted to core runtime
     │            Safety leash checks:
     │              - verifier command evaluation
     │              - context injection scan (authority inversion, instruction override)
     │              - file scope validation
     │              - dependency/migration change approval
     │
     ▼
[4. Agent Execution] ─── Adapter (Claude CLI / Codex CLI / direct-provider)
     │            Agent executes within allowed-paths scope
     │            Context distillation carries attempt summary forward
     │
     ▼
[5. Verifier Gate]
     │            Run verifier commands (e.g., pnpm test, pnpm lint)
     │            Red-Blue adversarial probes (6 deterministic probes)
     │              - Probe tiers: baseline / high_risk / release_critical
     │              - Block-severity finding → patch rejected
     │            Failure taxonomy classification (11 classes)
     │
     ├── PASS → run reaches 'completed' status
     │           JSONL record appended to ~/.martin/runs/<workspaceId>.jsonl
     │           Rollback boundary captured
     │
     └── FAIL → classify failure class
                 Retry if within budget (maxIterations, maxUsd, maxTokens)
                 If budget exhausted → rollback + exit
                 JSONL record appended with failure classification
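
The PASS/FAIL branch at the end of the flow can be sketched as a small loop. This is a simplified model under stated assumptions: AttemptFn is a hypothetical stand-in for "run the adapter, then the verifier gate", only the iteration cap is modelled, and the "unclassified" fallback label is invented for the example.

```typescript
type AttemptStatus = "completed" | "failed" | "rolled_back";

interface AttemptRecord {
  attempt: number;
  status: AttemptStatus;
  failureClass: string | null;
}

// Hypothetical callback standing in for adapter execution plus the verifier gate.
type AttemptFn = (attempt: number) => { passed: boolean; failureClass?: string };

function runGovernedLoop(runAttempt: AttemptFn, maxIterations: number): AttemptRecord[] {
  const records: AttemptRecord[] = [];
  for (let attempt = 1; attempt <= maxIterations; attempt++) {
    const outcome = runAttempt(attempt);
    if (outcome.passed) {
      // PASS: record the verified attempt and stop.
      records.push({ attempt, status: "completed", failureClass: null });
      return records;
    }
    // FAIL: classify, then retry unless this was the last allowed iteration.
    const isLast = attempt === maxIterations;
    records.push({
      attempt,
      status: isLast ? "rolled_back" : "failed",
      failureClass: outcome.failureClass ?? "unclassified",
    });
  }
  return records;
}

// The second attempt passes, so the loop stops after two records.
const demoRecords = runGovernedLoop(
  (n) => (n < 2 ? { passed: false, failureClass: "test_regression" } : { passed: true }),
  3
);
console.log(demoRecords.map((r) => r.status).join(","));
```

Every attempt yields a record whether it passes or fails, which is what makes the run inspectable afterwards.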

Budget Governance Loop

Per attempt:
  1. Check remaining budget (maxUsd - spent)
  2. If projected_cost > remaining → reject attempt before execution
  3. If remaining == 0 OR iterations == maxIterations → exit loop
  4. softLimitUsd reached → emit warning but continue

Phase Table

Phase             Action                               Gate                                Evidence Produced
Doctor            Environment preflight                No                                  Doctor report
Preflight         Policy/budget check                  No (advisory)                       go/no-go rationale
Task Contract     Objective + verifier plan + budget   No                                  Stored contract
Safety Leash      Pre-execution scan                   YES (blocks unsafe attempts)        Safety report
Agent Execution   Adapter runs agent                   No                                  File diffs, attempt artifacts
Verifier Gate     Run verifiers + Red-Blue probes      YES (blocks failed verification)    Verifier evidence
Completion        Status = completed                   No                                  JSONL record + proof card

Approval Gates

Two classes of gates:

  1. Safety Leash gate — runs before every agent attempt; blocks on injection patterns, file scope violations, secret-like values, destructive action policy
  2. Verifier gate — runs after every agent attempt; blocks on failing verifier commands or Red-Blue probe block-severity findings

destructiveActionPolicy: approval triggers a third gate requiring operator approval for destructive changes.

JSONL Run Record Structure

Each attempt appended to ~/.martin/runs/<workspaceId>.jsonl:

{
  "loopId": "<uuid>",
  "workspaceId": "<project-id>",
  "attempt": 1,
  "status": "completed|failed|rejected|rolled_back",
  "failureClass": "test_regression|hallucination|...",
  "budgetConsumed": { "usd": 2.30, "tokens": 14200, "iterations": 1 },
  "verifierResults": [...],
  "redBlueResults": { "tier": "baseline", "probes": [...], "decision": "pass" },
  "rollbackBoundary": { "commitSha": "..." },
  "timestamp": "2026-05-26T10:00:00Z"
}
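
Because each attempt is one JSON object per line, the log is easy to post-process. The sketch below parses JSONL text and totals spend per loop. It assumes budgetConsumed.usd is per-attempt rather than cumulative, which the source does not state, and the helper names and sample values are invented for illustration.

```typescript
// Subset of the record fields shown above.
interface AttemptLine {
  loopId: string;
  attempt: number;
  status: string;
  budgetConsumed: { usd: number; tokens: number; iterations: number };
}

// Splits JSONL text into records, skipping blank lines.
function parseJsonl(text: string): AttemptLine[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as AttemptLine);
}

// Assumption: budgetConsumed.usd is the cost of that attempt alone.
function totalUsdByLoop(lines: AttemptLine[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const l of lines) {
    totals.set(l.loopId, (totals.get(l.loopId) ?? 0) + l.budgetConsumed.usd);
  }
  return totals;
}

// Invented sample data in the shape described above.
const sampleLog = [
  '{"loopId":"ml-1","attempt":1,"status":"failed","budgetConsumed":{"usd":1.25,"tokens":6000,"iterations":1}}',
  '{"loopId":"ml-1","attempt":2,"status":"completed","budgetConsumed":{"usd":1,"tokens":5000,"iterations":1}}',
  "",
  '{"loopId":"ml-2","attempt":1,"status":"completed","budgetConsumed":{"usd":0.5,"tokens":2000,"iterations":1}}',
].join("\n");

const usdTotals = totalUsdByLoop(parseJsonl(sampleLog));
console.log(usdTotals.get("ml-1"), usdTotals.get("ml-2"));
```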

Resuming Runs

martin-loop resume --run-id <loopId>

Or via MCP: use the martin_resume prompt, or inspect the run with the martin_get_run tool and then call martin_run with an updated budget.

06

Memory Context

MartinLoop — Memory and Context

Run Record Persistence

Primary persistence: JSONL append-only log files.

  • Location: ~/.martin/runs/<workspaceId>.jsonl
  • Format: Newline-delimited JSON, one record per attempt/event
  • Scope: User-level (cross-project by workspaceId)
  • Persistence: Cross-session (survives process restarts)

Each record captures:

  • Loop state (status, lifecycleState, reason)
  • Budget consumption (costUsd, tokens, iterations used/remaining)
  • Verification results (verifier commands pass/fail)
  • Red-Blue probe results (tier, probe findings, decision)
  • Rollback boundary (commitSha for repo-backed runs)
  • Failure taxonomy classification

Context Distillation

Between retry attempts, MartinLoop performs context distillation: it carries a distilled summary of recent attempts and remaining constraints into the next attempt's context. This prevents runaway token growth across retries while preserving relevant failure context.

Purpose:

  • Prevent token budget exhaustion from growing attempt context
  • Preserve failure reasons for the next attempt
  • Reduce redundant recomputation
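
One simple way to keep carried context bounded is to retain only the most recent attempts and cap the length of each failure reason. The sketch below shows that idea. It is not MartinLoop's distillation algorithm, which the source does not describe, and AttemptSummary, distill, and the default limits are all hypothetical.

```typescript
interface AttemptSummary {
  attempt: number;
  failureClass: string;
  reason: string;
}

// Keeps only the most recent attempts and caps each reason, so the carried
// context stays bounded no matter how many retries have happened.
function distill(history: AttemptSummary[], keepLast = 2, maxReasonChars = 120): string {
  const recent = keepLast > 0 ? history.slice(-keepLast) : [];
  const lines = recent.map((a) => {
    const reason =
      a.reason.length > maxReasonChars ? a.reason.slice(0, maxReasonChars) + "..." : a.reason;
    return `Attempt ${a.attempt} failed (${a.failureClass}): ${reason}`;
  });
  const dropped = history.length - recent.length;
  if (dropped > 0) lines.unshift(`${dropped} earlier attempt(s) omitted.`);
  return lines.join("\n");
}

const attemptHistory: AttemptSummary[] = [
  { attempt: 1, failureClass: "hallucination", reason: "imported a module that does not exist" },
  { attempt: 2, failureClass: "test_regression", reason: "auth tests still failing" },
  { attempt: 3, failureClass: "test_regression", reason: "x".repeat(500) },
];
console.log(distill(attemptHistory));
```

With these defaults the output is at most three lines, so prompt size stops growing with the number of attempts.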

Repo-Backed Persistence

When a persistence store is configured (Pro/Enterprise tier), additional artifacts are stored:

  • Task contracts
  • Attempt ledgers
  • Diff artifacts
  • Rollback artifacts

Cross-Session Resumption

Runs can be inspected and resumed after process restart:

martin-loop inspect --run-id <loopId>
martin-loop resume --run-id <loopId>

Via MCP: martin_get_run → martin_resume

The JSONL record provides full context for resumption decisions.

Memory Scope

Scope           What's Stored                         Location
User-global     Run records (all projects)            ~/.martin/runs/
Project-local   Config (budget, verifier rules)       martin.config.yaml
Per-run         Attempt artifacts, diffs, rollback    Repo-backed store (Pro+)
In-memory       Current attempt context (distilled)   Runtime only

No Cross-Agent Memory

MartinLoop is a single-agent governance layer. There is no cross-agent memory mechanism — it governs one agent loop at a time. Multiple simultaneous governed loops would each have their own workspaceId in ~/.martin/runs/.

07

Orchestration

MartinLoop — Orchestration

Pattern

Not a multi-agent orchestrator. MartinLoop is a single-agent governance layer. It governs one AI coding loop at a time rather than coordinating multiple agents.

Orchestration pattern: none (governance-wrapper, not orchestrator)

Role Architecture

MartinLoop (governance control plane)
     │
     └── One Agent (Claude Code OR Codex OR direct-provider)
              └── Executes coding task in isolation

No Director/Worker/Steward. No task queue. No parallel fan-out. MartinLoop wraps a single agent's retry loop and governs it.

Multi-Model Usage

Yes — the adapter layer is pluggable. The operator chooses the agent engine:

# Use Claude Code
martin run "fix auth" --engine claude --max-usd 5

# Use Codex
martin run "fix auth" --engine codex --max-usd 5

Adapters: Claude CLI, Codex CLI, direct-provider, stub

Each adapter normalizes agent output into @martin/contracts types so the core runtime behaves identically.

Concurrency

MartinLoop does not manage concurrent agents. Each governed loop is independent. Concurrent loops require separate invocations with separate workspaceIds and separate JSONL record files.

Budget Governance (Primary Differentiator)

The core "orchestration" in MartinLoop is budget-aware loop control:

Condition                           Action
projected_cost > remaining_budget   Reject attempt before execution
spent >= maxUsd                     Exit loop, status = budget_exit
iterations >= maxIterations         Exit loop, status = iteration_exit
tokens >= maxTokens                 Exit loop, status = token_exit
softLimitUsd reached                Emit warning, continue
destructive action detected         Apply destructiveActionPolicy (approval/block/allow)
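
The budget rows of this table translate naturally into a pure decision function. The sketch below is one possible reading: the exit statuses come from the table, while the type names, the order in which the checks run, and the "reject_attempt" and "proceed" labels are assumptions. The destructive-action row is left out because it is not a budget check.

```typescript
interface BudgetPolicy {
  maxUsd: number;
  softLimitUsd: number;
  maxIterations: number;
  maxTokens: number;
}

interface BudgetState {
  spentUsd: number;
  iterations: number;
  tokens: number;
}

type BudgetDecision =
  | { action: "exit"; status: "budget_exit" | "iteration_exit" | "token_exit" }
  | { action: "reject_attempt" }
  | { action: "proceed"; warning: boolean };

function checkBudget(policy: BudgetPolicy, state: BudgetState, projectedCostUsd: number): BudgetDecision {
  // Hard stops first (check order is an assumption).
  if (state.spentUsd >= policy.maxUsd) return { action: "exit", status: "budget_exit" };
  if (state.iterations >= policy.maxIterations) return { action: "exit", status: "iteration_exit" };
  if (state.tokens >= policy.maxTokens) return { action: "exit", status: "token_exit" };
  // Preflight: refuse an attempt that is projected to overshoot what is left.
  if (projectedCostUsd > policy.maxUsd - state.spentUsd) return { action: "reject_attempt" };
  // The soft limit only warns; the attempt still runs.
  return { action: "proceed", warning: state.spentUsd >= policy.softLimitUsd };
}

const examplePolicy: BudgetPolicy = { maxUsd: 8, softLimitUsd: 5, maxIterations: 3, maxTokens: 20000 };
console.log(JSON.stringify(checkBudget(examplePolicy, { spentUsd: 5.5, iterations: 2, tokens: 9000 }, 3)));
```

In that last call, $2.50 remains of the $8 cap, so an attempt projected at $3 is rejected before it runs.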

Approval Gates

Two runtime gates:

  1. Safety Leash Gate (pre-execution):

    • Verifier command safety evaluation
    • File scope validation (allowedPaths/deniedPaths)
    • Context injection detection
    • Secret-like values in task text
    • Approval-boundary changes (deps, migrations)
  2. Verifier Gate (post-execution):

    • Run configured verifier commands (pnpm test, pnpm lint, etc.)
    • Red-Blue adversarial probes (6 probes, baseline/high_risk/release_critical tiers)
    • Block-severity finding → patch rejected

Optional third gate:

  3. Destructive Action Gate: destructiveActionPolicy: approval requires operator approval before destructive changes proceed.

Audit Log

Yes — JSONL format.

  • Location: ~/.martin/runs/<workspaceId>.jsonl
  • Format: Append-only JSONL
  • Scope: Every attempt, with full lifecycle state, budget, verifier results, failure taxonomy
  • Replay: Runs can be inspected (martin_inspect) and resumed (martin_resume)

Prompt Chaining

Yes — context distillation carries distilled attempt summaries forward across retry iterations. This constitutes a form of prompt chaining: each retry receives a condensed context from prior attempts.

08

Ui Cli Surface

MartinLoop — UI and CLI Surface

CLI Binary: martin-loop

Detail    Value
Install   npm install -g martin-loop
Binary    martin-loop (also accessible as martin)
Type      Own CLI (TypeScript/Node.js 20+)
Config    martin.config.yaml (project-local)

Key Subcommands

martin-loop run "fix auth regression" \
  --max-usd 5 \
  --max-iterations 3 \
  --verify "pnpm test" \
  --allowed-paths "src/**,tests/**"

martin-loop demo          # Packaged demo
martin-loop inspect       # Inspect past run
martin-loop resume        # Resume previous run
martin-loop doctor        # Environment preflight
martin-loop preflight     # Pre-run policy check
martin-loop triage        # Rank runs by urgency

MCP Server: @martinloop/mcp

Detail      Value
Install     npx -y @martinloop/mcp
Version     0.2.5
Transport   stdio
Tools       11
Resources   12 (+ 3 templates)
Prompts     10
Schema      MCP 2025-12-11

MCP is the primary interface for agent-to-MartinLoop integration. Designed for Claude Code and Codex hosts.

No Web Dashboard (Free Tier)

MartinLoop Free (OSS) has no web dashboard. The CLI + MCP server is the full surface.

Pro tier ($49/month) and above are described as including a "control plane" dashboard (inferred from telemetryDestination: control-plane in config), but this is a SaaS feature, not local.

Run Inspection via CLI

martin-loop inspect --run-id ml-abc123

Output: Structured JSON showing lifecycle state, budget consumed, verifier results, proof card.

Run Records Location

All runs stored locally as JSONL:

~/.martin/runs/
  <workspaceId>.jsonl     # Append-only loop records

Integration with Claude Code

# Add MCP server (macOS/Linux)
claude mcp add --transport stdio --scope user martin-loop -- npx -y @martinloop/mcp

Integration with Codex

codex mcp add martin-loop -- npx -y @martinloop/mcp

AGENTS.md

Present at repo root. Governs agent behavior in the public OSS surface. Key rules:

  • Keep all public-facing materials client-facing and free of internal process language
  • Use trusted publishing for npm releases (no NPM_TOKEN)
  • Run pnpm oss:validate before any content changes
