hankweave

hankweave · SouthBridgeAI/hankweave-runtime · ★ 123 · last commit 2026-03-20

Production operations runtime for executing frozen, long-horizon agentic programs (hanks) reliably, with single-threaded execution, git checkpointing, and sentinel monitoring

Best when: Parallel agentic systems are unmaintainable; single-threaded execution with codon boundaries is the only path to brownfield reliability

Skip if: You need parallel agentic execution or interactive human-in-the-loop during runs

vs seeds

Unlike all 11 seeds (which augment interactive AI coding sessions), hankweave is a production operations runtime for running froze…

Primitive shape

No installable primitives

Summary


hankweave is a JavaScript/TypeScript runtime (npm: hankweave, binary: hankweave) for long-horizon headless agent execution, developed at Southbridge AI. It orchestrates existing AI harnesses (Claude Code, Codex, Gemini CLI, Pi, OpenCode) rather than reimplementing an agent loop, letting those tools do the actual model calls while hankweave handles the execution scaffolding. Programs are called "hanks" — JSON configuration files defining sequences of "codons" (sealed agentic blocks) with "rigs" (deterministic setup scripts), "sentinels" (parallel real-time monitors), and "budgets" (cost/time/token limits). The runtime provides checkpointing + rollbacks at every codon boundary, a structured event journal tracing all tool calls, preflight validation, and a WebSocket event stream for external consumers. The single-agentic-thread constraint is explicit by design: "Much like time travel in stories, parallel systems make it incredibly hard to reason about behavior."

hankweave is unlike all 11 seeds. The seeds are developer-productivity harnesses; hankweave is a production operations runtime for running frozen, previously developed agentic programs (hanks) reliably for hours or days. It covers the "deploy" half of the development lifecycle, not the "build" half.

Overview


Origin

Developed at Southbridge AI (SouthBridgeAI), used internally for production AI workloads: platform migrations, research compilation, codebook generation, planning. The FAQ states: "Claude Code is where you develop. Hankweave is where you ship."

Core Philosophy

From the README:

"Past a certain complexity — or task horizon — agentic systems become impossible to maintain and very hard to debug. The ultimate bottleneck isn't the model. It's the human being able to understand and reason about the behavior of an agent."

"Hankweave is not a coding agent. It lacks the interactivity and emergent flow-states where machine and minds fuse together. It trades some of the fun of developing something new to make repairing and maintaining systems easier. Hanks are harder to write, but far more reliable in execution, and orders of magnitude easier to debug."

"Hankweave is not a framework. It makes some opinionated choices to make longer and longer hanks easier to reason about and control, but the runtime remains highly configurable."

Opinionated Choices

Single agentic thread: one agent executing at any time — "Much like time travel in stories, parallel systems make it incredibly hard to reason about behavior."
Simple tools, used well: file edits, scripting, shell commands. No MCPs, no skill trees.
Non-interactive: no chat, no back-and-forth. Managed agentically or programmatically.

Brownfield Engineering

The philosophy is "brownfield AI engineering" — systems you can maintain, improve, and hand to someone else without "it works but you'll need me" attached. Hanks accumulate wisdom: edge cases become fixes, fixes become knowledge.

Unusual Terminology

The README explains: "We believe that the future consumers of hanks will be AI models that edit, modify, and reweave them. Distinct names reduce hallucinations from models assuming they know what something is without looking it up."

Architecture


Distribution

Type: npm package (hankweave on npm)
Version analyzed: 0.6.2
Branch: release/alpha (not main)
Install: bunx hankweave (no global install required); bun install -g hankweave
Binary name: hankweave
Required runtime: Bun (primary), Node.js compatibility
Target harnesses: Claude Code, Codex CLI, Gemini CLI, Pi, OpenCode (all external)

Repository Layout

server/                         # Main TypeScript runtime
├── archive-manifest.ts         # Archive/snapshot management
├── checkpoint-git.ts           # Git checkpointing at codon boundaries
├── claude-agent-sdk-manager.ts # Claude Code harness driver
├── codex-runtime-extractor.ts  # Codex CLI harness driver
├── codon-runner.ts             # Core: execute a single codon
├── event-journal.ts            # Structured event log
├── execution-planner.ts        # Hank execution planning
├── execution-thread.ts         # Single-threaded execution engine
├── hankweave-runtime.ts        # Main runtime entry
├── sentinels/                  # Sentinel monitoring system
├── state-manager.ts            # State machine management
├── storage/                    # Storage backends
├── telemetry/                  # Usage telemetry
├── wizard/                     # `hankweave init` wizard
├── prompt-builder.ts           # Prompt assembly from .md files
├── prompt-frontmatter.ts       # YAML frontmatter parser for prompts
└── llm/                        # LLM proxy (for sentinel calls)

schemas/
├── hank.schema.json            # JSON Schema for hank.json files
├── hankweave.schema.json       # Runtime config schema
└── sentinel.schema.json        # Sentinel definition schema

learning/
├── hank-basics.md              # Getting started guide
└── examples/                   # Annotated production hanks
    ├── clausetta/              # Auto-generates shims for changing harnesses
    └── plan-gen-v2-general/    # Production planning workflow

Config Files

hank.json — main workflow definition (JSON, validated against hank.schema.json)
Runtime config: API keys, model settings, data directory (separate from hank)
hankweave.schema.json — runtime configuration schema

Supported Harnesses

The runtime orchestrates external processes:

Claude Code (via claude-agent-sdk-manager.ts)
Codex CLI (via codex-runtime-extractor.ts)
Gemini CLI
Pi
OpenCode
Custom shims (via Clausetta example)

Components


Core Primitives

Hank (the program)

A hank.json file defining the complete agentic workflow:

{
  "$schema": "https://unpkg.com/hankweave@latest/schemas/hank.schema.json",
  "meta": { "name": "My Workflow", "version": "1.0.0" },
  "overrides": {
    "model": "sonnet",
    "budget": { "maxDollars": 10.0, "maxTimeSeconds": 3600 }
  },
  "codons": [...],
  "sentinels": [...]
}

Codon (agentic block)

Sealed unit of agentic work:

id, name — identifier
promptFile — path to markdown prompt file (with YAML frontmatter)
model — model override for this codon
continuationMode — "fresh" (new context) or "continue" (append to previous)
checkpointedFiles — glob patterns for files tracked by this codon
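
The fields listed above can be written down as a TypeScript type. This is a minimal sketch based only on those fields; the authoritative definition is hank.schema.json, and which fields are required, the `validateCodon` helper, and its error messages are all assumptions made for illustration.

```typescript
// Sketch of a codon's shape, based only on the fields listed above.
// The source of truth is schemas/hank.schema.json; treat this as an approximation.
type ContinuationMode = "fresh" | "continue";

interface Codon {
  id: string;
  name: string;
  promptFile: string;           // path to a Markdown prompt file
  model?: string;               // optional per-codon model override
  continuationMode: ContinuationMode;
  checkpointedFiles: string[];  // glob patterns tracked by this codon
}

// Hypothetical helper: returns a list of problems, empty when the value looks like a codon.
function validateCodon(value: unknown): string[] {
  if (typeof value !== "object" || value === null) {
    return ["codon must be an object"];
  }
  const problems: string[] = [];
  const c = value as Record<string, unknown>;
  for (const key of ["id", "name", "promptFile"]) {
    if (typeof c[key] !== "string" || c[key] === "") {
      problems.push(`${key} must be a non-empty string`);
    }
  }
  if (c["model"] !== undefined && typeof c["model"] !== "string") {
    problems.push("model must be a string when present");
  }
  if (c["continuationMode"] !== "fresh" && c["continuationMode"] !== "continue") {
    problems.push('continuationMode must be "fresh" or "continue"');
  }
  const files = c["checkpointedFiles"];
  if (!Array.isArray(files) || !files.every((f) => typeof f === "string")) {
    problems.push("checkpointedFiles must be an array of glob strings");
  }
  return problems;
}

// The codon from the excerpt later in this page passes the check.
const exampleCodon: Codon = {
  id: "build-schema",
  name: "Build Zod Schemas",
  promptFile: "./prompts/schema-builder.md",
  model: "sonnet",
  continuationMode: "fresh",
  checkpointedFiles: ["src/schema/**/*.ts"],
};
const exampleProblems = validateCodon(exampleCodon);
```

In practice a JSON Schema validator run against hank.schema.json would replace the hand-written checks; the sketch only makes the field list concrete.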

Rig (deterministic setup)

Deterministic code that runs before a codon to prepare its environment. Shell scripts or TypeScript.

Sentinel (real-time monitor)

Runs in parallel to the main agent thread, tapping the event stream:

Trigger: event pattern or LLM condition
Action: deterministic code, LLM evaluation, or both
Use cases: guardrails, cost tracking, drift detection, live documentation
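
The trigger/action split above can be sketched as a small observer over an event stream. This is an illustration under assumptions, not hankweave's sentinel API: the `RuntimeEvent` shape, the `Sentinel` interface, and the `dispatch` function are hypothetical, and only the deterministic-code path is shown (no LLM evaluation).

```typescript
// Hypothetical event shape; the real event format is not shown on this page.
interface RuntimeEvent {
  type: string;          // e.g. "tool_call" or "file_write"
  codonId: string;
  costDollars?: number;  // present only on events that carry a cost
}

// Hypothetical sentinel: a trigger (event pattern) plus an action (deterministic code).
interface Sentinel {
  name: string;
  trigger: (event: RuntimeEvent) => boolean;
  action: (event: RuntimeEvent) => void;
}

// Fan each event out to every sentinel whose trigger matches.
// In this sketch sentinels only observe; they never block the caller.
function dispatch(event: RuntimeEvent, sentinels: Sentinel[]): void {
  for (const sentinel of sentinels) {
    if (sentinel.trigger(event)) {
      sentinel.action(event);
    }
  }
}

// Example use case from the list above: cost tracking per codon.
const spendByCodon = new Map<string, number>();
const costTracker: Sentinel = {
  name: "cost-tracker",
  trigger: (e) => e.costDollars !== undefined,
  action: (e) => {
    const previous = spendByCodon.get(e.codonId) ?? 0;
    spendByCodon.set(e.codonId, previous + (e.costDollars ?? 0));
  },
};

const sampleEvents: RuntimeEvent[] = [
  { type: "tool_call", codonId: "build-schema", costDollars: 0.25 },
  { type: "file_write", codonId: "build-schema" },
  { type: "tool_call", codonId: "build-schema", costDollars: 0.5 },
];
for (const event of sampleEvents) {
  dispatch(event, [costTracker]);
}
```

Keeping the trigger separate from the action means a cheap pattern match filters events before any more expensive evaluation runs.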

Budget

Enforced independently per codon + globally per hank:

maxDollars — cost budget in USD
maxTimeSeconds — wall-clock time limit
allocation — "shared", "proportional", "proportional-strict"
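
Enforcement at two levels (per codon and per hank) can be sketched as follows. This assumes both limits are optional and that a limit is breached once usage exceeds it; the function and type names are hypothetical, and the allocation modes are deliberately left out because their semantics are not described here.

```typescript
interface Budget {
  maxDollars?: number;      // cost budget in USD
  maxTimeSeconds?: number;  // wall-clock limit
}

interface Usage {
  spentDollars: number;
  elapsedSeconds: number;
}

type BudgetStatus = "ok" | "over-dollars" | "over-time";

// Compare usage against a single budget. Missing limits are treated as unlimited (an assumption).
function checkBudget(budget: Budget, usage: Usage): BudgetStatus {
  if (budget.maxDollars !== undefined && usage.spentDollars > budget.maxDollars) {
    return "over-dollars";
  }
  if (budget.maxTimeSeconds !== undefined && usage.elapsedSeconds > budget.maxTimeSeconds) {
    return "over-time";
  }
  return "ok";
}

// Check the codon's own limit first, then the hank-wide limit.
function checkBoth(
  codonBudget: Budget,
  codonUsage: Usage,
  hankBudget: Budget,
  hankUsage: Usage,
): { scope: "codon" | "hank" | "none"; status: BudgetStatus } {
  const codonStatus = checkBudget(codonBudget, codonUsage);
  if (codonStatus !== "ok") return { scope: "codon", status: codonStatus };
  const hankStatus = checkBudget(hankBudget, hankUsage);
  if (hankStatus !== "ok") return { scope: "hank", status: hankStatus };
  return { scope: "none", status: "ok" };
}

// A codon within its own limit can still trip the hank-wide limit.
const budgetVerdict = checkBoth(
  { maxDollars: 5 },
  { spentDollars: 3, elapsedSeconds: 600 },
  { maxDollars: 10, maxTimeSeconds: 3600 },
  { spentDollars: 11, elapsedSeconds: 1800 },
);
```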

CLI Subcommands

| Command | Purpose |
| --- | --- |
| `hankweave` | Interactive wizard to set up and run a hank |
| `hankweave init` | Scaffold a hank in the current folder |
| `hankweave run <hank>` | Execute a hank |
| `hankweave replay <checkpoint>` | Replay from a previous checkpoint |

Event Journal

Structured log of all tool calls, file writes, and decisions, used for debugging 18+ hour runs. A WebSocket stream makes events available to external consumers (CI systems, custom UIs, data pipelines).

Checkpointing

Git snapshots at every codon boundary. Roll back to any previous codon boundary with hankweave replay <checkpoint>.

No Skills, No Hooks, No Slash Commands

hankweave ships no Claude Code hooks, no .claude/ files, and no slash-command markdown. It is a standalone execution runtime.

Prompts


Excerpt 1: hank.schema.json — Codon definition (verbatim excerpt)

{
  "id": "build-schema",
  "name": "Build Zod Schemas",
  "promptFile": "./prompts/schema-builder.md",
  "model": "sonnet",
  "continuationMode": "fresh",
  "checkpointedFiles": ["src/schema/**/*.ts"]
}

Prompting technique: File-based prompts — each codon references a Markdown file. The promptFile separates the prompt from the hank definition, making prompts self-documenting and version-controllable independently of the workflow structure.

Excerpt 2: hank-basics.md — Codon philosophy (verbatim)

From the learning docs:

A codon is a single block - a prompt, a model, and the files it should track.

┌─────────────────────────────────────────────────────┐
│  CODON: build-schema                                │
├─────────────────────────────────────────────────────┤
│                                                     │
│  PROMPT                                             │
│  "Read the CSV files in data/ and create            │
│   strict Zod schemas in src/schema/"                │
│                                                     │
│  MODEL: claude-sonnet                               │
│  TRACKS: ["src/schema/**/*.ts"]                     │
│                                                     │
└─────────────────────────────────────────────────────┘

Because codons run through standard agent harnesses, developing them is straightforward: get something working in Claude Code or Codex, then capture that working state into a codon.

Prompting technique: Frozen prompt capture — prompts are developed interactively in a coding agent, then frozen into Markdown files referenced by codons. The prompt itself can include YAML frontmatter for metadata. Template variables are supported in prompts.

Prompting Architecture

Prompts are Markdown files with YAML frontmatter (prompt-frontmatter.ts parser)
prompt-builder.ts assembles prompts with variable substitution
Prompts are version-controlled alongside the hank definition
Sentinels can also use LLM calls (via llm-proxy.ts) for real-time evaluation prompts
No slash commands or static system prompts for the harness itself
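
The frontmatter-plus-substitution flow described above can be sketched in a few lines. This is not the code in prompt-frontmatter.ts or prompt-builder.ts: the `{{name}}` placeholder syntax is an assumption (this page only says template variables are supported), and the frontmatter handling covers flat `key: value` lines only, where a real YAML parser would handle far more.

```typescript
interface ParsedPrompt {
  meta: Record<string, string>;  // flat key/value pairs from the frontmatter
  body: string;                  // prompt text after the frontmatter
}

// Split a Markdown prompt into frontmatter and body.
// Only simple "key: value" lines are understood; this is not a YAML parser.
function parsePrompt(source: string): ParsedPrompt {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (match === null) {
    return { meta: {}, body: source };
  }
  const meta: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon > 0) {
      meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
    }
  }
  return { meta, body: match[2] ?? "" };
}

// Replace {{name}} placeholders (hypothetical syntax). Unknown names are left untouched
// so that a missing variable is visible in the assembled prompt.
function renderPrompt(body: string, vars: Record<string, string>): string {
  return body.replace(/\{\{\s*(\w+)\s*\}\}/g, (whole: string, name: string) => {
    const value = vars[name];
    return value === undefined ? whole : value;
  });
}

const promptSource = "---\ntitle: Schema builder\n---\nRead the CSV files in {{dataDir}}.";
const parsedPrompt = parsePrompt(promptSource);
const renderedPrompt = renderPrompt(parsedPrompt.body, { dataDir: "data/" });
```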

Uniqueness


Differs from Seeds

hankweave is unlike all 11 seeds. Seeds are developer-productivity harnesses for interactive AI coding sessions. hankweave is a production operations runtime for executing frozen agentic programs reliably. The closest analogy is claude-flow in providing a runtime with checkpointing, but claude-flow augments the agent loop; hankweave replaces the developer's interaction with a frozen, version-controlled execution plan. The single-agentic-thread constraint is the clearest differentiator from every other framework in this batch and all seeds: while others add parallelism, hankweave explicitly removes it for maintainability reasons.

Distinctive Position

Only framework that explicitly prohibits parallel agents — single agentic thread as a first-class design constraint
Only framework that orchestrates OTHER agent harnesses rather than implementing its own (Claude Code, Codex, and Gemini CLI are driven as external processes)
Git checkpointing at every codon boundary — not just session checkpointing; full git snapshot for rollback
Sentinels — the most sophisticated monitoring primitive in this batch: parallel LLM-powered observers that tap the event stream without interrupting execution
Brownfield engineering philosophy — explicitly targets the "maintain and hand to someone else" problem rather than the "build new features" problem
CCEPL-driven development — a named methodology for freezing interactive work into reproducible hanks
Clausetta — a production hank that auto-generates harness shims when AI tool APIs change, using hankweave to maintain itself

Explicit Anti-Patterns

From the README:

Parallel agentic systems ("make it incredibly hard to reason about behavior")
Interactive chat (non-interactive by design)
MCPs, skill trees, "latest cool thing" (simple tools only)
Using hankweave for "greenfield ease" when brownfield maintainability isn't needed

Observable Failure Modes

release/alpha branch as default: not production-stable semver
Stars (123) suggest niche/early audience
Bun dependency: limits use in environments where Bun isn't available
No human-in-the-loop: hanks must be self-healing via sentinels + retry logic or fail entirely
Harness-specific runtime extractors: adding a new harness requires writing a new extractor

Inspired By

"Antibrittle Agents" blog post by Southbridge, CCEPL-driven development methodology.

Workflow


Development → Deployment Lifecycle

Develop interactively — use Claude Code, Codex, or Gemini CLI to build and test the agentic behavior
Freeze into a codon — capture the working prompt and file patterns into hank.json
Run via hankweave — hankweave run my-hank.json for reliable, inspectable execution
Debug with event journal — inspect which tool call or decision failed
Rollback and repair — hankweave replay <checkpoint> to resume from any codon boundary
Accumulate wisdom — edge cases become codon-level fixes that stay in version control

CCEPL-Driven Development

From the README: "CCEPL-driven development explains how hanks get built — from coding agent to frozen codon." The workflow: Code → Capture → Execute → Persist → Learn.

Execution Phases

| Phase | Description | Artifact |
| --- | --- | --- |
| Preflight | Validate API keys, model availability, file paths, rig configs, sentinel schemas | Error list or OK |
| Budget allocation | Distribute budget across codons | Per-codon budget limits |
| Codon N execution | Spawn harness (Claude Code/Codex/etc.) with prompt, wait for completion | Tool call events, file changes |
| Sentinel monitoring | Parallel event stream monitoring during each codon | Observations, interventions |
| Checkpoint | Git snapshot of `checkpointedFiles` | Git commit |
| Next codon | Pass context + results to next codon (or end) | Next codon input |
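
The phase order above amounts to a sequential loop: preflight once, then run and checkpoint each codon in turn. The sketch below shows that control flow only. `runHank`, `RuntimeHooks`, and the stub hooks are hypothetical names, the budget and sentinel phases are omitted, and stopping at the first failed codon is an assumption rather than documented behavior.

```typescript
interface CodonSpec {
  id: string;
  checkpointedFiles: string[];
}

// Hypothetical injection points standing in for the real runtime's internals.
interface RuntimeHooks {
  preflight: (codons: CodonSpec[]) => string[];        // error list; empty means OK
  runCodon: (codon: CodonSpec) => Promise<boolean>;    // spawn harness, resolve true on success
  checkpoint: (codon: CodonSpec) => Promise<void>;     // snapshot checkpointedFiles
}

// One agent at a time: each codon is awaited before the next one starts.
async function runHank(codons: CodonSpec[], hooks: RuntimeHooks): Promise<string[]> {
  const errors = hooks.preflight(codons);
  if (errors.length > 0) {
    throw new Error(`preflight failed: ${errors.join("; ")}`);
  }
  const completed: string[] = [];
  for (const codon of codons) {
    const ok = await hooks.runCodon(codon);
    if (!ok) break;                    // assumption: stop at the first failed codon
    await hooks.checkpoint(codon);     // checkpoint only after a successful codon
    completed.push(codon.id);
  }
  return completed;
}

// Demo with stub hooks that only record the order of operations.
// "write-tests" is a made-up second codon for illustration.
const trace: string[] = [];
const demoRun = runHank(
  [
    { id: "build-schema", checkpointedFiles: ["src/schema/**/*.ts"] },
    { id: "write-tests", checkpointedFiles: ["tests/**/*.ts"] },
  ],
  {
    preflight: () => [],
    runCodon: async (codon) => {
      trace.push(`run ${codon.id}`);
      return true;
    },
    checkpoint: async (codon) => {
      trace.push(`checkpoint ${codon.id}`);
    },
  },
);
```

Because every step is awaited inside a plain `for` loop, the trace is strictly ordered, which is the property the single-thread constraint is meant to preserve.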

Loop / Retry Pattern

Agentic Dynamic Programming: sequence multiple codons that repeat similar tasks, trading compute for reliability. Loops are defined in the hank schema.

Approval Gates

None. hankweave is fully non-interactive. "No chat, no back-and-forth. Hankweave is designed to be managed agentically or programmatically through the socket protocol."

Memory and Context


State Between Codons

Each codon runs in a fresh or continuing context (continuationMode: "fresh" | "continue")
"fresh": new context window — no bleed from previous codon
"continue": context from previous codon is available
checkpointedFiles tracks which files are the output of each codon
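
One way to picture the two modes is to compute, for each codon, which earlier codons' context it can see. The sketch below is a toy model under assumptions: how a harness actually carries context forward is harness-specific and not described here, and `visibleContext` is a hypothetical helper.

```typescript
type ContinuationMode = "fresh" | "continue";

interface CodonStep {
  id: string;
  continuationMode: ContinuationMode;
}

// For each codon, list the codon ids whose context is visible to it (itself included).
// "fresh" resets the chain; "continue" extends the chain from the previous codon.
function visibleContext(steps: CodonStep[]): Map<string, string[]> {
  const result = new Map<string, string[]>();
  let chain: string[] = [];
  for (const step of steps) {
    chain = step.continuationMode === "continue" ? [...chain, step.id] : [step.id];
    result.set(step.id, chain);
  }
  return result;
}

// Hypothetical four-codon hank: "c" starts fresh, so nothing from "a" or "b" reaches "c" or "d".
const contextMap = visibleContext([
  { id: "a", continuationMode: "fresh" },
  { id: "b", continuationMode: "continue" },
  { id: "c", continuationMode: "fresh" },
  { id: "d", continuationMode: "continue" },
]);
```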

Git Checkpointing

At every codon boundary, hankweave creates a git snapshot of checkpointedFiles
This provides a complete, navigable history of all file states throughout the run
hankweave replay <checkpoint> can restore to any codon boundary
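
A snapshot of checkpointedFiles maps naturally onto ordinary git commands. The sketch below only builds the argument lists and does not claim to match what checkpoint-git.ts does: the commit message format and both function names are hypothetical. It relies on git's `:(glob)` pathspec magic so that `**` matches across directories.

```typescript
// Build the git invocations for one checkpoint. Arguments are returned as arrays so they
// can be handed to a process spawner without shell quoting.
function checkpointCommands(codonId: string, globs: string[]): string[][] {
  // ":(glob)" turns on glob matching in git pathspecs, including "**".
  const pathspecs = globs.map((glob) => `:(glob)${glob}`);
  return [
    // Note: git add fails if a pathspec matches no files, so callers may need to handle that.
    ["add", "--", ...pathspecs],
    // --allow-empty keeps one commit per codon boundary even when nothing changed.
    ["commit", "--allow-empty", "-m", `checkpoint: ${codonId}`],
  ];
}

// Build the invocation that restores the tracked files from an earlier checkpoint commit.
function restoreCommand(commit: string, globs: string[]): string[] {
  return ["checkout", commit, "--", ...globs.map((glob) => `:(glob)${glob}`)];
}

const snapshotArgs = checkpointCommands("build-schema", ["src/schema/**/*.ts"]);
const restoreArgs = restoreCommand("abc123", ["src/schema/**/*.ts"]);
```

Each argument list would be run as `git <args>` in the workspace; restricting both commands to the codon's own globs keeps a rollback from touching unrelated files.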

Event Journal

event-journal.ts — structured log of every tool call, file write, and decision
Stored as a structured log file during the run
Used for post-run debugging of long executions (18+ hour runs documented)
WebSocket stream: external consumers can tap the event stream in real time
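
Debugging from a journal usually starts with finding the first thing that went wrong. The sketch below assumes a JSON Lines file (one JSON object per line, matching the jsonl audit format listed in the fact sheet further down); the `JournalEntry` fields and helper names are hypothetical and do not reflect event-journal.ts.

```typescript
// Hypothetical journal entry; the real field names are not shown on this page.
interface JournalEntry {
  timestamp: string;
  codonId: string;
  kind: "tool_call" | "file_write" | "decision";
  ok: boolean;
  detail: string;
}

// Serialize entries as JSON Lines: one JSON object per line, newline-terminated.
function toJsonl(entries: JournalEntry[]): string {
  return entries.map((entry) => JSON.stringify(entry)).join("\n") + "\n";
}

// Parse JSON Lines back into entries, skipping blank lines.
function parseJsonl(text: string): JournalEntry[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as JournalEntry);
}

// Find the first failed entry, which is usually where debugging starts.
function firstFailure(entries: JournalEntry[]): JournalEntry | undefined {
  return entries.find((entry) => !entry.ok);
}

const journalText = toJsonl([
  { timestamp: "2026-03-20T10:00:00Z", codonId: "build-schema", kind: "tool_call", ok: true, detail: "read data/" },
  { timestamp: "2026-03-20T10:00:05Z", codonId: "build-schema", kind: "file_write", ok: false, detail: "write failed" },
  { timestamp: "2026-03-20T10:00:09Z", codonId: "build-schema", kind: "decision", ok: true, detail: "retry" },
]);
const failedEntry = firstFailure(parseJsonl(journalText));
```

An append-only, line-oriented format suits long runs because a partial file is still readable up to the last complete line.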

Data (read-only mount)

Input data files are mounted read-only to the hank
The agent operates on the data via its harness tools; hankweave enforces the read-only contract
Output goes to the hankweave workspace

Context Boundaries

Codons act as context circuit breakers:

Problems in codon 3 don't leak into codon 7
Each codon starts fresh or with a controlled continuation
Complexity grows linearly, then plateaus (vs. exponential without structure)

Budget State

Budget tracker (cost-tracker.ts) maintains real-time spend per codon and globally
Budget exhaustion triggers orderly termination or sentinel intervention

Orchestration


Multi-Agent Pattern

None — single agentic thread by explicit design.

"Much like time travel in stories, parallel systems make it incredibly hard to reason about behavior. There is only ever one agent executing at any given time."

This is the defining constraint. No parallel agents, no sub-agent spawning.

Orchestration Pattern

sequential — codons run in order. Looping (repeat a codon sequence) is supported but still single-threaded.

Harness Multiplexing

hankweave can use different harnesses for different codons:

Codon 1: Claude Code (targeted work)
Codon 2: Codex (planning)
Codon 3: Gemini (writing/specifications)

From the README: "Need high-context understanding AND high-reasoning? Mix and match harnesses."

This is the primary multi-model mechanism: different harnesses (which use different underlying models) for different codons, specified per-codon in the hank definition.

Sentinel Parallelism

Sentinels run in parallel to the main agent thread as observers:

They do NOT spawn additional agents
They tap the event stream without interrupting the main thread
They can trigger deterministic actions or LLM evaluations
They function as real-time monitors, not orchestrators

Execution Mode

background-daemon — hankweave is designed for long-running headless execution (minutes to 18+ hours). The WebSocket event stream enables CI systems and custom UIs to monitor without interaction.

Isolation Mechanism

Git worktrees are implied by the checkpointing system, but not documented as explicit per-feature isolation. The workspace isolation is codon-level (fresh context + git snapshot) rather than container-level.

No Human-in-the-Loop

Fully non-interactive. Any human interaction must happen between hank runs, not during.

UI/CLI Surface


CLI Binary: `hankweave`

Binary name: hankweave (from dist/index.js)
Install: bunx hankweave (no global install) or bun install -g hankweave
Is thin wrapper: No — it is the full runtime
Subcommands:
- hankweave — interactive wizard (setup + run)
- hankweave init — scaffold a hank in the current folder
- hankweave run <hank> — execute a hank
- hankweave replay <checkpoint> — resume from a checkpoint
- server/index.ts --headless — headless server mode

Terminal Dashboard (Development Only)

A simple bundled TUI (basic-tui.ts) exists for development-time monitoring ("watching your hank while you're building it, not while it's in production"). This is explicitly NOT for production use.

WebSocket Event Stream

Production monitoring happens via WebSocket:

All tool calls and decisions are emitted as events
External consumers: CI systems, data pipelines, custom UIs
Replay capability: the event log can reconstruct the execution history

IDE Integration

None. hankweave has no .claude/ files, no Cursor integration.

Clausetta: Auto-Generated Shims

A production hank (learning/examples/clausetta/) auto-generates shims when underlying harnesses change their APIs. This is hankweave used to maintain itself — meta-automation.

Cross-Tool Portability

medium — hankweave targets specific harnesses (Claude Code, Codex, Gemini, Pi, OpenCode) by name. Adding a new harness requires a new runtime extractor or shim.

Related frameworks

same archetype · same primary tool · same memory type

claude-mem (thedotmack) ★ 78k

A8 Cross-runtime harness

Background worker service captures every tool call as an observation, AI-compresses sessions, and auto-injects relevant past…

pi (badlogic/earendil) ★ 55k

A8 Cross-runtime harness

A minimal, hackable, multi-provider terminal coding agent that adapts to your workflows via npm-installable TypeScript Extensions…

Agent Skills (Addy Osmani) ★ 46k

A8 Cross-runtime harness

Encodes senior-engineer software development lifecycle as 23 auto-routed skills and 7 slash commands for any AI coding agent.

wshobson/agents Plugin Marketplace ★ 36k

A8 Cross-runtime harness

Single Markdown source for 83 domain-specialized plugins that auto-generates idiomatic artifacts for five AI coding harnesses.

TabbyML/Tabby ★ 34k

A8 Cross-runtime harness

Self-hosted AI coding assistant server (alternative to GitHub Copilot) with admin dashboard, RAG-based completions, and multi-IDE…

Compound Engineering ★ 17k

A8 Cross-runtime harness

Make each unit of engineering work compound into easier future work via brainstorm→plan→execute→review→learn cycles.

Distribution

Type: npm-package
License: Apache-2.0
Install: one-liner
Version: 0.6.2

Surfaces

CLI binary: hankweave
CLI subcmds: 4
Local UI: terminal-tui
Tech stack: basic-tui.ts (Bun/TypeScript) — development only

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 2

Workflow

Phases: 7
Approval gates: 0
Spec format: json
Spec storage: per-feature-folder
Delta or full: whole-file

Orchestration

Multi-agent: No
Pattern: sequential
Max concurrent: 1
Isolation: git-branch
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: background-daemon
Crash recovery: Yes
Compaction: No
Session handoff: Yes
Streaming: Yes

Memory

Type: file-based
Persistence: project
Search: none
State files: 4 files

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: Yes
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: jsonl
Replay: Yes

Tools

Primary: Claude Code
Targets: 5
Portability: medium

Signals

Stars: 123
Last commit: 2026-03-20
Quality score: 5.5/10
