hankweave · SouthBridgeAI/hankweave-runtime · ★ 123 · last commit 2026-03-20

Production operations runtime for executing frozen, long-horizon agentic programs (hanks) reliably, with single-threaded execution, git checkpointing, and sentinel monitoring

Best when: Parallel agentic systems are unmaintainable; single-threaded execution with codon boundaries is the only path to brownfield reliability
Skip if: Parallel agentic execution, Interactive human-in-the-loop during runs
vs seeds
Unlike all 11 seeds (which augment interactive AI coding sessions), hankweave is a production operations runtime for running froze…
Primitive shape
No installable primitives
00

Summary

hankweave — Summary

hankweave is a JavaScript/TypeScript runtime (npm: hankweave, binary: hankweave) for long-horizon headless agent execution, developed at Southbridge AI. It orchestrates existing AI harnesses (Claude Code, Codex, Gemini CLI, Pi, OpenCode) rather than reimplementing an agent loop, letting those tools do the actual model calls while hankweave handles the execution scaffolding. Programs are called "hanks" — JSON configuration files defining sequences of "codons" (sealed agentic blocks) with "rigs" (deterministic setup scripts), "sentinels" (parallel real-time monitors), and "budgets" (cost/time/token limits). The runtime provides checkpointing + rollbacks at every codon boundary, a structured event journal tracing all tool calls, preflight validation, and a WebSocket event stream for external consumers. The single-agentic-thread constraint is explicit by design: "Much like time travel in stories, parallel systems make it incredibly hard to reason about behavior."

hankweave is unlike all 11 seeds. Seeds are developer-productivity harnesses; hankweave is a production operations runtime for running frozen, previously developed agentic programs (hanks) reliably for hours or days: the "deploy" half of the development lifecycle, not the "build" half.

01

Overview

hankweave — Overview

Origin

Developed at Southbridge AI (SouthBridgeAI), used internally for production AI workloads: platform migrations, research compilation, codebook generation, planning. The FAQ states: "Claude Code is where you develop. Hankweave is where you ship."

Core Philosophy

From the README:

"Past a certain complexity — or task horizon — agentic systems become impossible to maintain and very hard to debug. The ultimate bottleneck isn't the model. It's the human being able to understand and reason about the behavior of an agent."

"Hankweave is not a coding agent. It lacks the interactivity and emergent flow-states where machine and minds fuse together. It trades some of the fun of developing something new to make repairing and maintaining systems easier. Hanks are harder to write, but far more reliable in execution, and orders of magnitude easier to debug."

"Hankweave is not a framework. It makes some opinionated choices to make longer and longer hanks easier to reason about and control, but the runtime remains highly configurable."

Opinionated Choices

  • Single agentic thread: one agent executing at any time — "Much like time travel in stories, parallel systems make it incredibly hard to reason about behavior."
  • Simple tools, used well: file edits, scripting, shell commands. No MCPs, no skill trees.
  • Non-interactive: no chat, no back-and-forth. Managed agentically or programmatically.

Brownfield Engineering

The philosophy is "brownfield AI engineering" — systems you can maintain, improve, and hand to someone else without "it works but you'll need me" attached. Hanks accumulate wisdom: edge cases become fixes, fixes become knowledge.

Unusual Terminology

The README explains: "We believe that the future consumers of hanks will be AI models that edit, modify, and reweave them. Distinct names reduce hallucinations from models assuming they know what something is without looking it up."

02

Architecture

hankweave — Architecture

Distribution

  • Type: npm package (hankweave on npm)
  • Version analyzed: 0.6.2
  • Branch: release/alpha (not main)
  • Install: bunx hankweave (no global install required); bun install -g hankweave
  • Binary name: hankweave
  • Required runtime: Bun (primary), with Node.js compatibility
  • Target harnesses: Claude Code, Codex CLI, Gemini CLI, Pi, OpenCode (all external)

Repository Layout

server/                         # Main TypeScript runtime
├── archive-manifest.ts         # Archive/snapshot management
├── checkpoint-git.ts           # Git checkpointing at codon boundaries
├── claude-agent-sdk-manager.ts # Claude Code harness driver
├── codex-runtime-extractor.ts  # Codex CLI harness driver
├── codon-runner.ts             # Core: execute a single codon
├── event-journal.ts            # Structured event log
├── execution-planner.ts        # Hank execution planning
├── execution-thread.ts         # Single-threaded execution engine
├── hankweave-runtime.ts        # Main runtime entry
├── sentinels/                  # Sentinel monitoring system
├── state-manager.ts            # State machine management
├── storage/                    # Storage backends
├── telemetry/                  # Usage telemetry
├── wizard/                     # `hankweave init` wizard
├── prompt-builder.ts           # Prompt assembly from .md files
├── prompt-frontmatter.ts       # YAML frontmatter parser for prompts
└── llm/                        # LLM proxy (for sentinel calls)

schemas/
├── hank.schema.json            # JSON Schema for hank.json files
├── hankweave.schema.json       # Runtime config schema
└── sentinel.schema.json        # Sentinel definition schema

learning/
├── hank-basics.md              # Getting started guide
└── examples/                   # Annotated production hanks
    ├── clausetta/              # Auto-generates shims for changing harnesses
    └── plan-gen-v2-general/    # Production planning workflow

Config Files

  • hank.json — main workflow definition (JSON, validated against hank.schema.json)
  • Runtime config: API keys, model settings, data directory (separate from hank)
  • hankweave.schema.json — runtime configuration schema

Supported Harnesses

The runtime orchestrates external processes:

  • Claude Code (via claude-agent-sdk-manager.ts)
  • Codex CLI (via codex-runtime-extractor.ts)
  • Gemini CLI
  • Pi
  • OpenCode
  • Custom shims (via Clausetta example)
03

Components

hankweave — Components

Core Primitives

Hank (the program)

A hank.json file defining the complete agentic workflow:

{
  "$schema": "https://unpkg.com/hankweave@latest/schemas/hank.schema.json",
  "meta": { "name": "My Workflow", "version": "1.0.0" },
  "overrides": {
    "model": "sonnet",
    "budget": { "maxDollars": 10.0, "maxTimeSeconds": 3600 }
  },
  "codons": [...],
  "sentinels": [...]
}

Codon (agentic block)

Sealed unit of agentic work:

  • id, name — identifier
  • promptFile — path to markdown prompt file (with YAML frontmatter)
  • model — model override for this codon
  • continuationMode: "fresh" (new context) or "continue" (append to previous)
  • checkpointedFiles — glob patterns for files tracked by this codon
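
The codon fields above can be modeled as TypeScript types. This is a minimal sketch, not taken from hank.schema.json: the type names, the choice of which fields are optional, and the validateCodons helper are assumptions made for illustration. Only the field names come from the list above.

```typescript
// Hypothetical types mirroring the codon fields listed above.
// The real schema (hank.schema.json) may define more fields or other constraints.
type ContinuationMode = "fresh" | "continue";

interface Codon {
  id: string;
  name: string;
  promptFile: string;            // path to a Markdown prompt file
  model?: string;                // per-codon model override (assumed optional)
  continuationMode: ContinuationMode;
  checkpointedFiles: string[];   // glob patterns tracked by this codon
}

// Illustrative check (not the runtime's preflight): returns a list of problems.
function validateCodons(codons: Codon[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const codon of codons) {
    if (seen.has(codon.id)) problems.push(`duplicate codon id: ${codon.id}`);
    seen.add(codon.id);
    if (!codon.promptFile.endsWith(".md")) {
      problems.push(`${codon.id}: promptFile should be a Markdown file`);
    }
    if (codon.checkpointedFiles.length === 0) {
      problems.push(`${codon.id}: no checkpointedFiles patterns`);
    }
  }
  return problems;
}

const example: Codon = {
  id: "build-schema",
  name: "Build Zod Schemas",
  promptFile: "./prompts/schema-builder.md",
  model: "sonnet",
  continuationMode: "fresh",
  checkpointedFiles: ["src/schema/**/*.ts"],
};
```

Calling validateCodons([example]) yields an empty list, while passing the same codon twice reports the duplicate id.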

Rig (deterministic setup)

Deterministic code that runs before a codon to prepare its environment. Shell scripts or TypeScript.

Sentinel (real-time monitor)

Runs in parallel to the main agent thread, tapping the event stream:

  • Trigger: event pattern or LLM condition
  • Action: deterministic code, LLM evaluation, or both
  • Use cases: guardrails, cost tracking, drift detection, live documentation
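
The trigger/action split can be sketched as follows. The event shape, the Sentinel interface, and the observe function are hypothetical and do not reflect sentinel.schema.json. Only the deterministic-action case is shown; LLM-based triggers and evaluations are left out.

```typescript
// Hypothetical event and sentinel shapes; hankweave's sentinel.schema.json may differ.
interface RunEvent {
  type: string;                      // e.g. "tool_call", "file_write" (assumed names)
  codonId: string;
  payload: Record<string, unknown>;
}

interface Sentinel {
  name: string;
  trigger: (event: RunEvent) => boolean;   // event-pattern trigger
  action: (event: RunEvent) => string;     // deterministic action returning an observation
}

// Feed one event to every sentinel and collect observations from those that triggered.
// Nothing here blocks or modifies the main agent thread: sentinels only observe.
function observe(event: RunEvent, sentinels: Sentinel[]): string[] {
  return sentinels
    .filter((s) => s.trigger(event))
    .map((s) => `[${s.name}] ${s.action(event)}`);
}

// Example guardrail: flag shell commands that contain "rm -rf".
const destructiveShell: Sentinel = {
  name: "destructive-shell",
  trigger: (e) => {
    const command = e.payload["command"];
    return e.type === "tool_call" && typeof command === "string" && command.includes("rm -rf");
  },
  action: (e) => `suspicious command in codon ${e.codonId}`,
};
```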

Budget

Enforced independently per codon + globally per hank:

  • maxDollars — cost budget in USD
  • maxTimeSeconds — wall-clock time limit
  • allocation: "shared", "proportional", "proportional-strict"

CLI Subcommands

| Command | Purpose |
| --- | --- |
| hankweave | Interactive wizard to set up and run a hank |
| hankweave init | Scaffold a hank in the current folder |
| hankweave run <hank> | Execute a hank |
| hankweave replay <checkpoint> | Replay from a previous checkpoint |

Event Journal

Structured log of all tool calls, file writes, and decisions. Used for debugging 18+ hour runs. WebSocket stream for external consumers (CI systems, custom UIs, data pipelines).

Checkpointing

Git snapshots at every codon boundary. Rollback to any previous codon: hankweave replay.

No Skills, No Hooks, No Slash Commands

hankweave ships no Claude Code hooks, no .claude/ files, and no slash-command markdown. It is a standalone execution runtime.

05

Prompts

hankweave — Prompts

Excerpt 1: hank.schema.json — Codon definition (verbatim excerpt)

{
  "id": "build-schema",
  "name": "Build Zod Schemas",
  "promptFile": "./prompts/schema-builder.md",
  "model": "sonnet",
  "continuationMode": "fresh",
  "checkpointedFiles": ["src/schema/**/*.ts"]
}

Prompting technique: File-based prompts — each codon references a Markdown file. The promptFile separates the prompt from the hank definition, making prompts self-documenting and version-controllable independently of the workflow structure.

Excerpt 2: hank-basics.md — Codon philosophy (verbatim)

From the learning docs:

A codon is a single block - a prompt, a model, and the files it should track.

┌─────────────────────────────────────────────────────┐
│  CODON: build-schema                                │
├─────────────────────────────────────────────────────┤
│                                                     │
│  PROMPT                                             │
│  "Read the CSV files in data/ and create            │
│   strict Zod schemas in src/schema/"                │
│                                                     │
│  MODEL: claude-sonnet                               │
│  TRACKS: ["src/schema/**/*.ts"]                     │
│                                                     │
└─────────────────────────────────────────────────────┘

Because codons run through standard agent harnesses, developing them is straightforward: get something working in Claude Code or Codex, then capture that working state into a codon.

Prompting technique: Frozen prompt capture — prompts are developed interactively in a coding agent, then frozen into Markdown files referenced by codons. The prompt itself can include YAML frontmatter for metadata. Template variables are supported in prompts.

Prompting Architecture

  • Prompts are Markdown files with YAML frontmatter (prompt-frontmatter.ts parser)
  • prompt-builder.ts assembles prompts with variable substitution
  • Prompts are version-controlled alongside the hank definition
  • Sentinels can also use LLM calls (via llm-proxy.ts) for real-time evaluation prompts
  • No slash commands or static system prompts for the harness itself
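
A rough sketch of the two prompt-handling steps described above (frontmatter parsing, then variable substitution). It assumes frontmatter is delimited by "---" lines and holds flat key: value pairs, and that template variables are written {{name}}. None of these details are confirmed by the source, and the function names are made up. A real parser would use a full YAML library.

```typescript
// Assumptions: "---" delimiters, flat "key: value" frontmatter, {{name}} variables.
// hankweave's prompt-frontmatter.ts and prompt-builder.ts may work differently.
interface ParsedPrompt {
  meta: Map<string, string>;
  body: string;
}

function parsePrompt(source: string): ParsedPrompt {
  const lines = source.split("\n");
  const meta = new Map<string, string>();
  if (lines[0] !== "---") return { meta, body: source };
  const end = lines.indexOf("---", 1);
  if (end === -1) return { meta, body: source };   // unterminated frontmatter: treat as body
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon > 0) meta.set(line.slice(0, colon).trim(), line.slice(colon + 1).trim());
  }
  return { meta, body: lines.slice(end + 1).join("\n") };
}

// Replace each {{name}} with its value; unknown variables are left untouched.
function fillTemplate(body: string, vars: Map<string, string>): string {
  return body.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) => vars.get(name) ?? match);
}
```

Leaving unknown variables in place rather than substituting an empty string makes a missing value visible in the assembled prompt.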
09

Uniqueness

hankweave — Uniqueness

Differs from Seeds

hankweave is unlike all 11 seeds. Seeds are developer-productivity harnesses for interactive AI coding sessions. hankweave is a production operations runtime for executing frozen agentic programs reliably. The closest analogy is claude-flow in providing a runtime with checkpointing, but claude-flow augments the agent loop; hankweave replaces the developer's interaction with a frozen, version-controlled execution plan. The single-agentic-thread constraint is the clearest differentiator from every other framework in this batch and all seeds: while others add parallelism, hankweave explicitly removes it for maintainability reasons.

Distinctive Position

  1. Only framework that explicitly prohibits parallel agents — single agentic thread as a first-class design constraint
  2. Only framework that orchestrates OTHER agent harnesses rather than implementing its own: Claude Code, Codex, and Gemini CLI are driven as subprocesses
  3. Git checkpointing at every codon boundary — not just session checkpointing; full git snapshot for rollback
  4. Sentinels — the most sophisticated monitoring primitive in this batch: parallel LLM-powered observers that tap the event stream without interrupting execution
  5. Brownfield engineering philosophy — explicitly targets the "maintain and hand to someone else" problem rather than the "build new features" problem
  6. CCEPL-driven development — a named methodology for freezing interactive work into reproducible hanks
  7. Clausetta — a production hank that auto-generates harness shims when AI tool APIs change, using hankweave to maintain itself

Explicit Anti-Patterns

From the README:

  • Parallel agentic systems ("make it incredibly hard to reason about behavior")
  • Interactive chat (non-interactive by design)
  • MCPs, skill trees, "latest cool thing" (simple tools only)
  • Using hankweave for "greenfield ease" when brownfield maintainability isn't needed

Observable Failure Modes

  • release/alpha branch as default: not production-stable semver
  • Stars (123) suggest niche/early audience
  • Bun dependency: limits use in environments where Bun isn't available
  • No human-in-the-loop: hanks must be self-healing via sentinels + retry logic or fail entirely
  • Harness-specific runtime extractors: adding a new harness requires writing a new extractor

Inspired By

"Antibrittle Agents" blog post by Southbridge, CCEPL-driven development methodology.

04

Workflow

hankweave — Workflow

Development → Deployment Lifecycle

  1. Develop interactively — use Claude Code, Codex, or Gemini CLI to build and test the agentic behavior
  2. Freeze into a codon — capture the working prompt and file patterns into hank.json
  3. Run via hankweave: hankweave run my-hank.json for reliable, inspectable execution
  4. Debug with event journal — inspect which tool call or decision failed
  5. Rollback and repair: hankweave replay <checkpoint> to resume from any codon boundary
  6. Accumulate wisdom — edge cases become codon-level fixes that stay in version control

CCEPL-Driven Development

From the README: "CCEPL-driven development explains how hanks get built — from coding agent to frozen codon." The workflow: Code → Capture → Execute → Persist → Learn.

Execution Phases

| Phase | Description | Artifact |
| --- | --- | --- |
| Preflight | Validate API keys, model availability, file paths, rig configs, sentinel schemas | Error list or OK |
| Budget allocation | Distribute budget across codons | Per-codon budget limits |
| Codon N execution | Spawn harness (Claude Code/Codex/etc.) with prompt, wait for completion | Tool call events, file changes |
| Sentinel monitoring | Parallel event stream monitoring during each codon | Observations, interventions |
| Checkpoint | Git snapshot of checkpointedFiles | Git commit |
| Next codon | Pass context + results to next codon (or end) | Next codon input |
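
The ordering of these phases can be sketched as a plain sequential loop. The Steps interface, the RunResult type, and runHank are hypothetical names, and budget allocation and sentinel monitoring are omitted for brevity. A real runtime would await a harness process; this sketch is synchronous so that it stays short.

```typescript
// Hypothetical step functions; names and signatures are assumptions for illustration.
interface Steps {
  preflight: () => string[];                 // list of errors, empty when OK
  runCodon: (codonId: string) => boolean;    // true when the harness finished successfully
  checkpoint: (codonId: string) => void;     // e.g. a git snapshot
}

type RunResult =
  | { status: "preflight-failed"; errors: string[] }
  | { status: "codon-failed"; codonId: string; completed: string[] }
  | { status: "ok"; completed: string[] };

// Runs codons strictly one at a time and checkpoints after each success,
// so a failure always leaves a checkpoint at the last good codon boundary.
function runHank(codonIds: string[], steps: Steps): RunResult {
  const errors = steps.preflight();
  if (errors.length > 0) return { status: "preflight-failed", errors };

  const completed: string[] = [];
  for (const codonId of codonIds) {
    if (!steps.runCodon(codonId)) {
      return { status: "codon-failed", codonId, completed };
    }
    steps.checkpoint(codonId);
    completed.push(codonId);
  }
  return { status: "ok", completed };
}
```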

Loop / Retry Pattern

Agentic Dynamic Programming: sequence multiple codons that repeat similar tasks, trading compute for reliability. Loops are defined in the hank schema.

Approval Gates

None. hankweave is fully non-interactive. "No chat, no back-and-forth. Hankweave is designed to be managed agentically or programmatically through the socket protocol."

06

Memory Context

hankweave — Memory and Context

State Between Codons

  • Each codon runs in a fresh or continuing context (continuationMode: "fresh" | "continue")
  • "fresh": new context window — no bleed from previous codon
  • "continue": context from previous codon is available
  • checkpointedFiles tracks which files are the output of each codon

Git Checkpointing

  • At every codon boundary, hankweave creates a git snapshot of checkpointedFiles
  • This provides a complete, navigable history of all file states throughout the run
  • hankweave replay <checkpoint> can restore to any codon boundary
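
One way to picture a codon-boundary snapshot is as two git invocations: stage the tracked globs, then commit. The sketch below only builds the argument lists and runs nothing. The checkpointCommands function and the commit message format are assumptions, and checkpoint-git.ts may do this quite differently.

```typescript
// Builds the git invocations for one checkpoint as argument arrays (no shell involved).
// Assumption: a checkpoint is "stage the tracked globs, then commit".
function checkpointCommands(codonId: string, checkpointedFiles: string[]): string[][] {
  // ":(glob)" is git pathspec magic under which "**/" matches zero or more directories.
  const pathspecs = checkpointedFiles.map((pattern) => `:(glob)${pattern}`);
  return [
    ["git", "add", "--", ...pathspecs],
    // --allow-empty records a checkpoint even when the codon changed no tracked files.
    ["git", "commit", "--allow-empty", "-m", `checkpoint: ${codonId}`],
  ];
}
```

Note that git add exits with an error when a pathspec matches no files, so a real implementation would need to handle codons that have not yet produced any output.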

Event Journal

  • event-journal.ts — structured log of every tool call, file write, and decision
  • Stored as a structured log file during the run
  • Used for post-run debugging of long executions (18+ hour runs documented)
  • WebSocket stream: external consumers can tap the event stream in real time
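
As an illustration of post-run debugging, the sketch below counts journal events per codon and event type. It assumes the journal is stored as one JSON object per line with type and codonId fields. That format, the field names, and summarizeJournal are assumptions, since the source does not document the journal's on-disk layout.

```typescript
// Assumed journal format: one JSON object per line with "type" and "codonId" fields.
interface JournalEvent {
  type: string;
  codonId: string;
}

// Count events per "codonId/type", skipping blank or malformed lines.
function summarizeJournal(jsonLines: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of jsonLines.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // malformed line: skip it rather than abort the whole summary
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const event = parsed as Partial<JournalEvent>;
    if (typeof event.type !== "string" || typeof event.codonId !== "string") continue;
    const key = `${event.codonId}/${event.type}`;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```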

Data (read-only mount)

  • Input data files are mounted read-only to the hank
  • The agent operates on the data via its harness tools; hankweave enforces the read-only contract
  • Output goes to the hankweave workspace

Context Boundaries

Codons act as context circuit breakers:

  • Problems in codon 3 don't leak into codon 7
  • Each codon starts fresh or with a controlled continuation
  • Complexity grows linearly, then plateaus (vs. exponential without structure)

Budget State

  • Budget tracker (cost-tracker.ts) maintains real-time spend per codon and globally
  • Budget exhaustion triggers orderly termination or sentinel intervention
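
Per-codon and global tracking can be sketched like this. The field names maxDollars and maxTimeSeconds come from the budget description, but createBudgetTracker is hypothetical and is not the runtime's cost-tracker.ts. For simplicity it applies the same per-codon limit to every codon, which ignores the allocation modes.

```typescript
interface Budget {
  maxDollars: number;
  maxTimeSeconds: number;
}

// Hypothetical tracker: records spend per codon and in total, and reports
// which limit (if any) has been reached for a given codon.
function createBudgetTracker(hankBudget: Budget, codonBudget: Budget) {
  const spend = new Map<string, { dollars: number; seconds: number }>();
  let totalDollars = 0;
  let totalSeconds = 0;

  return {
    record(codonId: string, dollars: number, seconds: number): void {
      const current = spend.get(codonId) ?? { dollars: 0, seconds: 0 };
      spend.set(codonId, {
        dollars: current.dollars + dollars,
        seconds: current.seconds + seconds,
      });
      totalDollars += dollars;
      totalSeconds += seconds;
    },
    // The global limit is checked first because it ends the whole hank.
    exceeded(codonId: string): "global" | "codon" | null {
      if (totalDollars >= hankBudget.maxDollars || totalSeconds >= hankBudget.maxTimeSeconds) {
        return "global";
      }
      const current = spend.get(codonId) ?? { dollars: 0, seconds: 0 };
      if (current.dollars >= codonBudget.maxDollars || current.seconds >= codonBudget.maxTimeSeconds) {
        return "codon";
      }
      return null;
    },
  };
}
```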
07

Orchestration

hankweave — Orchestration

Multi-Agent Pattern

None — single agentic thread by explicit design.

"Much like time travel in stories, parallel systems make it incredibly hard to reason about behavior. There is only ever one agent executing at any given time."

This is the defining constraint. No parallel agents, no sub-agent spawning.

Orchestration Pattern

sequential — codons run in order. Looping (repeat a codon sequence) is supported but still single-threaded.

Harness Multiplexing

hankweave can use different harnesses for different codons:

  • Codon 1: Claude Code (targeted work)
  • Codon 2: Codex (planning)
  • Codon 3: Gemini (writing/specifications)

From the README: "Need high-context understanding AND high-reasoning? Mix and match harnesses."

This is the primary multi-model mechanism: different harnesses (which use different underlying models) for different codons, specified per-codon in the hank definition.

Sentinel Parallelism

Sentinels run in parallel to the main agent thread as observers:

  • They do NOT spawn additional agents
  • They tap the event stream without interrupting the main thread
  • They can trigger deterministic actions or LLM evaluations
  • They function as real-time monitors, not orchestrators

Execution Mode

background-daemon — hankweave is designed for long-running headless execution (minutes to 18+ hours). The WebSocket event stream enables CI systems and custom UIs to monitor without interaction.

Isolation Mechanism

Git worktrees are implied by the checkpointing system, but not documented as explicit per-feature isolation. The workspace isolation is codon-level (fresh context + git snapshot) rather than container-level.

No Human-in-the-Loop

Fully non-interactive. Any human interaction must happen between hank runs, not during.

08

UI/CLI Surface

hankweave — UI/CLI Surface

CLI Binary: hankweave

  • Binary name: hankweave (from dist/index.js)
  • Install: bunx hankweave (no global install) or bun install -g hankweave
  • Thin wrapper: no, it is the full runtime
  • Subcommands:
    • hankweave — interactive wizard (setup + run)
    • hankweave init — scaffold a hank in the current folder
    • hankweave run <hank> — execute a hank
    • hankweave replay <checkpoint> — resume from a checkpoint
    • server/index.ts --headless — headless server mode

Terminal Dashboard (Development Only)

A simple bundled TUI (basic-tui.ts) exists for development-time monitoring ("watching your hank while you're building it, not while it's in production"). This is explicitly NOT for production use.

WebSocket Event Stream

Production monitoring happens via WebSocket:

  • All tool calls and decisions are emitted as events
  • External consumers: CI systems, data pipelines, custom UIs
  • Replay capability: the event log can reconstruct the execution history

IDE Integration

None. hankweave has no .claude/ files, no Cursor integration.

Clausetta: Auto-Generated Shims

A production hank (learning/examples/clausetta/) auto-generates shims when underlying harnesses change their APIs. This is hankweave used to maintain itself — meta-automation.

Cross-Tool Portability

medium — hankweave targets specific harnesses (Claude Code, Codex, Gemini, Pi, OpenCode) by name. Adding a new harness requires a new runtime extractor or shim.
