Hawkeye

hawkeye · MLaminekane/hawkeye · ★ 5 · last commit 2026-04-14

Primitive shape: 31 total (Skills: 1 · Hooks: 3 · MCP tools: 27)

Hawkeye — Summary

Hawkeye is an observability and control layer for AI agents — a CLI tool + web dashboard + Claude Code hooks that records every agent action (commands, file operations, LLM calls, network activity, cost), detects objective drift in real time, enforces guardrails (file protection, command blocking, cost limits), enables session replay and post-mortem analysis, and supports multi-agent swarm orchestration.

Problem it solves: Once an AI agent starts working, there is no way to see what it's actually doing, why it's spending tokens, whether it's still on task, or what happened during a bad run. Hawkeye provides "the flight recorder for AI agents" — inspection tools that work during and after a session.

Distinctive trait: Hawkeye operates at several layers at once: Claude Code hooks (PreToolUse, PostToolUse, Stop) for hook-based recording, a hawkeye record wrapper for agents without hooks, and an MCP server that exposes 27 tools for agent self-awareness. A React + Vite dashboard, launched with hawkeye serve, provides a session timeline, drift scoring, run comparison, replay, root-cause analysis, memory diff, and live agent spawning (swarm).

Target audience: Developers with complex multi-agent workflows, overnight runs, or expensive sessions who want visibility beyond "one CLI prompt in one terminal" and who need tools that answer "why did this fail?" and "what did it do?"

Production-readiness: Early but active (5 stars, MIT, TypeScript monorepo, v0.3.1). Latest changes 2026-03-30.

Differs from seeds: Most similar to claude-flow in multi-agent orchestration scope but Hawkeye's primary value is observability rather than execution. Unlike any seed framework, Hawkeye stores all session data in SQLite (~/.hawkeye/traces.db), supports session comparison, drift scoring, and post-mortem analysis. The MCP server with 27 agent-facing tools (self-awareness API) is unique in the corpus.

Hawkeye — Overview

Origin

Created by MLaminekane. TypeScript monorepo with packages: core, cli, dashboard. npm package hawkeye-ai (v0.3.1). MIT license.

Philosophy

From the README:

"Most agent tooling helps you launch an agent. Hawkeye helps you answer what happened after it started drifting, overspending, touching the wrong files, or failing in a way nobody can explain."

"If an agent touched your repo, spent money, or got weird, you should be able to inspect it like a real system."

"Hawkeye is especially useful once your workflow stops being 'one CLI prompt in one terminal' and becomes 'multiple agents, long-running sessions, real cost, real risk.'"

Core Value Props (from README)

  1. Record exactly what an agent did across terminal, files, network, and LLM calls
  2. Replay a bad run instead of guessing
  3. Detect objective drift before a session goes off the rails
  4. Enforce guardrails around files, commands, directories, cost, and review gates
  5. Compare runs, inspect memory, and generate useful post-mortems
  6. Monitor live tasks, spawned agents, and multi-agent work in one place

Architecture (from CLAUDE.md)

packages/core/    Node/TypeScript SDK: recorder, interceptors, drift engine,
                  SQLite storage, RCA, memory diff, swarm types/config
packages/cli/     Main product runtime: CLI commands, daemon, hooks, MCP server,
                  serve, desktop automation, reports
packages/dashboard/ React + Vite dashboard served by hawkeye serve

Event Flow

1. Interceptors or hooks capture actions
2. Guardrails evaluate synchronously for risky actions
3. Events persisted to SQLite in .hawkeye/traces.db
4. Drift snapshots and analyses computed on top

Hawkeye — Architecture

Distribution

  • Type: npm package (monorepo)
  • Package: hawkeye-ai on npm (v0.3.1)
  • License: MIT
  • Language: TypeScript

Install Methods

npm install -g hawkeye-ai
npx hawkeye-ai
brew install MLaminekane/hawkeye/hawkeye-ai

Required Runtime

  • Node.js >= 20
  • Git (for git operation tracking)
  • Claude Code (for hooks integration)

Directory Structure (after init)

project/
└── .hawkeye/
    └── traces.db          # SQLite database: all session data

~/.hawkeye/               # Global config + traces (across projects)

Package Architecture (Monorepo)

hawkeye/
├── packages/
│   ├── core/              # recorder, interceptors, drift engine, SQLite, RCA, memory diff, swarm
│   │   ├── src/types.ts   # Shared types
│   │   └── src/storage/sqlite.ts  # SQLite storage
│   ├── cli/               # CLI commands, daemon, serve, hooks, MCP server, automation
│   │   ├── src/commands/serve.ts   # Dashboard API server + websockets
│   │   ├── src/commands/daemon.ts  # Remote task runner
│   │   └── src/interactive.ts      # TUI entrypoint (slash commands)
│   └── dashboard/         # React + Vite web dashboard
│       ├── src/pages/TasksPage.tsx
│       ├── src/pages/SwarmPage.tsx
│       └── src/pages/SessionDetailPage.tsx
├── skills/
│   └── hawkeye/
│       └── SKILL.md        # Claude Code skill for agent self-awareness
└── .claude/
    └── settings.json       # PreToolUse + PostToolUse + Stop hooks

Claude Code Hook Events

Event | Handler | Matcher
--- | --- | ---
PreToolUse | hawkeye hook-handler --event PreToolUse | (all tools)
PostToolUse | hawkeye hook-handler --event PostToolUse | (all tools)
Stop | hawkeye hook-handler --event Stop | (none)

Target AI Tools

  • Claude Code (primary — via hooks)
  • Codex (via hawkeye record wrapper)
  • Cline (via hawkeye record wrapper)
  • Any CLI agent (via hawkeye record -- <command>)

MCP Server (27 tools)

Exposed via hawkeye mcp — provides agent self-awareness tools (check drift, check cost, log decisions, etc.).

Hawkeye — Components

CLI Binary

Binary | Source
--- | ---
hawkeye | packages/cli/dist/index.js

CLI Commands

Command | Purpose
--- | ---
hawkeye init | Initialize Hawkeye in project
hawkeye hooks install | Install Claude Code hooks
hawkeye hooks install --guardrails-only | Install only guardrail hooks
hawkeye hooks status | Check installed hooks
hawkeye hooks uninstall | Remove hooks
hawkeye record -o "<objective>" -- <command> | Wrap any CLI agent with recording
hawkeye sessions | List recorded sessions
hawkeye serve | Launch web dashboard (default port :4242)
hawkeye (no args) | Launch interactive TUI

Hooks (3 events)

Event | Matcher | Purpose
--- | --- | ---
PreToolUse | (all) | Capture tool intent, evaluate guardrails synchronously
PostToolUse | (all) | Record tool outcome, update session timeline
Stop | (none) | Update the drift snapshot; Stop fires after every response, so it does not close the session

Skills (1)

Name | File | Purpose
--- | --- | ---
hawkeye | skills/hawkeye/SKILL.md | Claude Code skill for Hawkeye integration: setup, recording, MCP config, guardrail management

MCP Server (27 tools)

Exposed via hawkeye mcp for agent self-awareness:

  • Check current drift score
  • Check session cost
  • Log decisions to session timeline
  • Query past session data
  • Set/check guardrails
  • (22 additional tools not individually enumerated in README)

Web Dashboard Pages

Page | Purpose
--- | ---
Sessions | List + filter all recorded sessions
Session Detail | Timeline view, replay, file changes, cost breakdown
Compare | Side-by-side comparison of multiple sessions
Firewall | Guardrail management (file protection, command blocking)
Tasks | Prompt submission to the remote task daemon
Agents (Swarm) | Live spawned agents, follow-up, relaunch, cost tracking
Memory | Memory diff between sessions
Analyze | Root-cause analysis for bad runs

Storage

Component | Path | Purpose
--- | --- | ---
SQLite DB | .hawkeye/traces.db | All session data: commands, file ops, LLM calls, cost, timing
Drift snapshots | In traces.db | Computed drift scores per session checkpoint

Guardrail Categories

  • File protection (block writes to specified files/directories)
  • Command blocking (block specific shell commands)
  • Directory scope (restrict agent to specified directories)
  • Cost limits (stop/warn at token/cost threshold)
  • Token limits
  • Network lock (restrict network access)
  • Review gates (require human approval before proceeding)
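The cost-limit category can be sketched as a threshold check that warns before the budget is exhausted and stops once it is reached. CostLimit, maxUsd, warnRatio and checkCostLimit are hypothetical names for illustration, not Hawkeye's actual config schema.

```typescript
// Hypothetical cost-limit guardrail; Hawkeye's real config keys may differ.
type CostAction = "allow" | "warn" | "stop";

interface CostLimit {
  maxUsd: number; // hard budget, e.g. the value passed to an overnight run
  warnRatio: number; // fraction of the budget at which to start warning, e.g. 0.8
}

// Decide what to do given how much the session has spent so far.
function checkCostLimit(spentUsd: number, limit: CostLimit): CostAction {
  if (spentUsd >= limit.maxUsd) return "stop";
  if (spentUsd >= limit.maxUsd * limit.warnRatio) return "warn";
  return "allow";
}

const limit: CostLimit = { maxUsd: 5, warnRatio: 0.8 };
console.log(checkCostLimit(1.2, limit)); // → allow
console.log(checkCostLimit(4.1, limit)); // → warn
console.log(checkCostLimit(5.0, limit)); // → stop
```

Warning before stopping gives an unattended run a chance to wind down instead of being cut off mid-edit.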

Hawkeye — Prompts

Prompt 1: Drift Detection (packages/core/src/drift/prompts.ts)

Technique: LLM-as-judge with JSON-only output constraint. Injected with concrete scoring examples to calibrate the 0-100 scale. Three-level flag system (ok/warning/critical).

You are an AI agent drift detection system. Your job is to evaluate whether an AI coding agent is staying on-task or drifting away from the user's objective.

ORIGINAL USER OBJECTIVE:
"${objective}"

RECENT AGENT ACTIONS (most recent last):
${actionsFormatted}

INSTRUCTIONS:
1. Identify what the agent has been doing in the last few actions (be specific)
2. Compare this to the stated objective
3. If the agent is doing something unrelated, explain WHAT it's doing instead and WHY that's a problem

Respond ONLY in JSON:
{
  "score": <number 0-100>,
  "flag": "ok" | "warning" | "critical",
  "reason": "<specific explanation: name exactly what the agent is doing and whether it relates to the objective>",
  "suggestion": "<concrete corrective action, or null>"
}

SCORING GUIDE:
- 85-100 "ok": Agent is actively working on the objective
- 70-84 "ok": Agent is doing preparatory or cleanup work related to the objective
- 50-69 "warning": Agent is doing something tangentially related or spending too long on setup
- 30-49 "warning": Agent is clearly working on something different
- 0-29 "critical": Agent is doing unrelated, repetitive, or potentially dangerous actions

Inline examples in prompt (few-shot calibration):

  • CSS edits during auth objective → score 25, critical
  • Reading + modifying login files for login bug → score 95, ok
  • Stripe install looping on same TypeScript error → score 45, warning
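The caller can enforce the drift prompt's JSON contract. This sketch (parseDriftVerdict and flagFromGuide are hypothetical names) extracts the object, range-checks the score, and falls back to the flag implied by the prompt's own SCORING GUIDE (70+ ok, 30-69 warning, below 30 critical) when the model omits the flag or returns an unknown one.

```typescript
type DriftFlag = "ok" | "warning" | "critical";

interface DriftVerdict {
  score: number;
  flag: DriftFlag;
  reason: string;
  suggestion: string | null;
}

// Flag bands taken from the prompt's SCORING GUIDE.
function flagFromGuide(score: number): DriftFlag {
  if (score >= 70) return "ok";
  if (score >= 30) return "warning";
  return "critical";
}

function parseDriftVerdict(raw: string): DriftVerdict {
  // Models sometimes wrap JSON in prose, so take the outermost {...} span.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start < 0 || end <= start) throw new Error("no JSON object in response");
  const data = JSON.parse(raw.slice(start, end + 1));

  const score = Number(data.score);
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new Error(`score out of range: ${data.score}`);
  }
  const flags: DriftFlag[] = ["ok", "warning", "critical"];
  // Unknown or missing flag: derive it from the scoring guide instead.
  const flag: DriftFlag = flags.includes(data.flag) ? data.flag : flagFromGuide(score);

  return {
    score,
    flag,
    reason: typeof data.reason === "string" ? data.reason : "",
    suggestion: typeof data.suggestion === "string" ? data.suggestion : null,
  };
}
```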

Prompt 2: Post-Mortem Analysis (packages/core/src/llm/post-mortem.ts)

Technique: Structured session analyst role with full context injection (metadata, event summary, drift history, violations, errors). Forces structured JSON output with typed outcome field.

You are an AI session analyst. Generate a structured post-mortem report for a completed AI agent coding session.

SESSION METADATA:
- Objective: "${objective}"
- Agent: ${agent}
- Status: ${status}
- Started: ${startedAt}
- Duration: ${durationMinutes} minutes
- Total actions: ${totalActions}
- Total cost: $${totalCostUsd}
- Final drift score: ${finalDriftScore}/100

EVENT SUMMARY (by type):
${eventSummary}

FILES MODIFIED:
${filesSummary}

DRIFT HISTORY:
${driftHistory}

GUARDRAIL VIOLATIONS:
${violations}

ERRORS:
${errors}

Output schema:

{
  summary: string;
  outcome: 'success' | 'partial' | 'failure' | 'abandoned';
  keyActions: string[];
  issues: string[];
  driftAnalysis: string;
  costAssessment: string;
  recommendations: string[];
}
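Because the post-mortem comes from an LLM, a caller would typically validate the parsed JSON against the schema above before storing or rendering it. A minimal type guard, with PostMortem and isPostMortem as hypothetical names:

```typescript
type Outcome = "success" | "partial" | "failure" | "abandoned";

interface PostMortem {
  summary: string;
  outcome: Outcome;
  keyActions: string[];
  issues: string[];
  driftAnalysis: string;
  costAssessment: string;
  recommendations: string[];
}

const OUTCOMES: readonly string[] = ["success", "partial", "failure", "abandoned"];

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((x) => typeof x === "string");

// True only if every field matches the output schema, including the outcome enum.
function isPostMortem(value: unknown): value is PostMortem {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  return (
    typeof r.summary === "string" &&
    typeof r.outcome === "string" &&
    OUTCOMES.includes(r.outcome as string) &&
    isStringArray(r.keyActions) &&
    isStringArray(r.issues) &&
    typeof r.driftAnalysis === "string" &&
    typeof r.costAssessment === "string" &&
    isStringArray(r.recommendations)
  );
}
```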

Prompt 3: Skill Instructions (skills/hawkeye/SKILL.md)

Technique: Activation-condition list for the hosting agent. Tells Claude Code when to invoke Hawkeye vs handle natively. Includes troubleshooting runbook and output-format guide for presenting Hawkeye data to users.

Key trigger conditions:

  • User asks to monitor/record/observe an agent session
  • User wants cost tracking or token budget limits
  • User wants to protect files or block dangerous commands
  • User wants to replay or inspect what an agent did
  • User asks about drift detection
  • User wants a post-mortem or session summary
  • User wants to run an agent overnight with safety guardrails

Output format rule (embedded in skill):

"When presenting Hawkeye data to users: Always show the drift score with its status (ok/warning/critical). Always show the total cost and top files by cost. Highlight any guardrail violations or blocked actions. If drift is in warning/critical range, suggest the user review the agent's recent actions."

Notes

  • Drift prompt is optional — heuristic mode (scoreHeuristic) is the default and requires no LLM.
  • Post-mortem prompt requires a configured LLM provider in .hawkeye/config.json.
  • No system-prompt injection at session start — Hawkeye does not modify the Claude Code system prompt.
  • All LLM prompts are in packages/core/src/ (not in the CLI hooks), keeping the CLI hooks lightweight.

Hawkeye — Uniqueness

Primary Differentiator: Session Replay + Drift Detection as First-Class Features

Most governance/observability tools either block (guardrails) or record (audit). Hawkeye combines both with replay and drift detection in a single tool. You can replay any past session event-by-event, compute whether the agent drifted from its objective, and generate LLM post-mortems — all from the same CLI binary.

Specific Differentiators

1. Real-Token Cost Reading from Claude Code JSONL

Hawkeye reads cost from Claude Code's own session .jsonl files at ~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl, not from token estimates. It tracks byte offsets to read incrementally per hook invocation, deduplicates by message ID (streaming sends same ID multiple times), and produces accurate per-action cost attribution. No other tool in this batch does this.

2. Heuristic-First Drift Scoring (LLM Optional)

scoreHeuristic() runs entirely locally with no LLM call. The LLM drift prompt is opt-in. This means drift scoring works out-of-box with zero API key configuration, and the LLM mode only activates when more nuanced reasoning is needed.

3. 27-Tool MCP Server for Agent Self-Awareness

Hawkeye exposes 27 MCP tools that give the running agent real-time awareness of its own session state. Agents can call check_drift, check_cost, check_guardrail, self_assess, and auto_correct, effectively turning Hawkeye into a governance co-pilot that the agent queries proactively. This bidirectional channel (hooks let Hawkeye block or allow the agent's actions; MCP lets the agent query Hawkeye) is unique in this batch.

4. Memory Diff Engine

Cross-session memory tracking with 7 categories (file_knowledge, error_lesson, correction, tool_pattern, decision, dependency_fact, api_knowledge) and hallucination detection (nonexistent_file, phantom_api, contradicted_fact, recurring_error). No other tool in this batch tracks what an agent "remembers" or "forgets" between sessions.

5. Autonomous Control Layer / Autocorrect

Beyond blocking, Hawkeye can autonomously correct agent behavior: roll back files, block command patterns, and inject correction hints. The auto_correct MCP tool returns both recommendations AND the corrections it has already applied, along with agentInstructions the agent must follow. This makes its governance reactive rather than purely preventive.

6. Swarm Mode with Git Worktree Isolation

Multi-agent coordination using git worktree for actual filesystem isolation (not just prompt-level scope injection). Each agent gets its own branch, conflict detection runs before merge, and merge order is topologically determined.

7. Overnight Mode with Morning Report

A single command (hawkeye overnight --budget <$N>) that configures guardrails, starts a background daemon, serves the dashboard, and generates a structured morning report on shutdown. Designed for unattended long-running agent runs.

Compared to Batch Peers

Framework | Enforcement layer | Drift detection | Session replay | MCP self-awareness
--- | --- | --- | --- | ---
Hawkeye | Claude Code hooks (PreToolUse/PostToolUse/Stop) | Yes (heuristic + LLM) | Yes | 27 tools
TDD Guard | PreToolUse hook | No | No | No
Claude Code Guardrails | PreToolUse + PostToolUse + Stop hooks | No | No | No
AGT (Microsoft) | SessionStart + UserPromptSubmit + PreToolUse | No | No | 2 tools
Leash (StrongDM) | OS/eBPF kernel layer | No | No | No
CC Audit | Static analysis (no hooks) | No | No | No

What Hawkeye Does Not Do

  • No test-first enforcement (unlike TDD Guard)
  • No static CLAUDE.md/hook quality linting (unlike AgentLint)
  • No kernel-level enforcement (unlike Leash)
  • No 12-vector prompt injection defense (unlike AGT)
  • No Cedar policy language (unlike Leash)
  • Scope injection in Swarm mode is prompt-based, not OS-enforced

Hawkeye — Workflow

Session Lifecycle

hawkeye init
  → Create .hawkeye/ directory
  → Write .hawkeye/config.json (default config)
  → Create .hawkeye/traces.db (SQLite)

hawkeye hooks install
  → Write Claude Code hooks to settings.json
  → PreToolUse: hawkeye hook-handler --event PreToolUse
  → PostToolUse: hawkeye hook-handler --event PostToolUse
  → Stop: hawkeye hook-handler --event Stop
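Claude Code reads hooks from a hooks object in settings.json, keyed by event name, where each event holds matcher groups containing type "command" entries. The sketch below builds that object for the three handlers above and merges it idempotently into existing settings. That hawkeye hooks install writes exactly this shape is an assumption, and hawkeyeHooks and mergeHooks are hypothetical names.

```typescript
type HookEvent = "PreToolUse" | "PostToolUse" | "Stop";

interface HookCommand {
  type: "command";
  command: string;
}

interface HookMatcher {
  matcher?: string; // tool-name pattern; "*" means every tool
  hooks: HookCommand[];
}

type HooksConfig = Partial<Record<HookEvent, HookMatcher[]>>;

interface Settings {
  hooks?: HooksConfig;
  [key: string]: unknown; // other settings.json keys pass through untouched
}

function hawkeyeHooks(): HooksConfig {
  const entry = (event: HookEvent, matcher?: string): HookMatcher => ({
    ...(matcher !== undefined ? { matcher } : {}),
    hooks: [{ type: "command", command: `hawkeye hook-handler --event ${event}` }],
  });
  return {
    PreToolUse: [entry("PreToolUse", "*")],
    PostToolUse: [entry("PostToolUse", "*")],
    Stop: [entry("Stop")], // Stop is not tool-specific, so it gets no matcher
  };
}

// Add Hawkeye's handlers, skipping commands already registered, so a second
// install does not duplicate hooks and user-defined hooks are preserved.
function mergeHooks(settings: Settings): Settings {
  const merged: HooksConfig = { ...(settings.hooks ?? {}) };
  for (const [event, entries] of Object.entries(hawkeyeHooks()) as [HookEvent, HookMatcher[]][]) {
    const existing = merged[event] ?? [];
    const registered = new Set(existing.flatMap((m) => m.hooks.map((h) => h.command)));
    const fresh = entries.filter((m) => !m.hooks.every((h) => registered.has(h.command)));
    merged[event] = [...existing, ...fresh];
  }
  return { ...settings, hooks: merged };
}

console.log(JSON.stringify(mergeHooks({}), null, 2));
```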

Claude Code Hook Flow (Primary Path)

Claude Code fires PreToolUse
  → hook-handler reads JSON from stdin
  → Evaluate guardrails:
      checkFileProtection(toolName, toolInput, config)
      checkDangerousCommand(toolName, toolInput, config)
      checkDirectoryScope(toolName, toolInput, config)
      checkNetworkLock(toolName, toolInput, config)
      checkReviewGate(toolName, toolInput, config)
  → Violation found → write JSON to stdout (exit 2 = block)
  → No violation → allow (exit 0)
  → Record PreToolUse event in traces.db
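Two of the checks above can be sketched as pure functions over the hook's stdin payload. The tool_name and tool_input fields (file_path for Write/Edit/MultiEdit, command for Bash) follow Claude Code's PreToolUse input; the GuardrailConfig shape, the path-prefix matching and evaluateGuardrails are assumptions for illustration, not Hawkeye's implementation.

```typescript
// Subset of the JSON Claude Code passes to a PreToolUse hook on stdin.
interface HookInput {
  tool_name: string;
  tool_input: Record<string, unknown>;
}

// Hypothetical config; Hawkeye's real .hawkeye/config.json schema may differ.
interface GuardrailConfig {
  protectedPaths: string[]; // absolute files or directories that must not be written
  blockedCommands: RegExp[]; // shell patterns to refuse (avoid the stateful /g flag)
}

type Verdict = { block: false } | { block: true; reason: string };

const WRITE_TOOLS = new Set(["Write", "Edit", "MultiEdit"]);

function checkFileProtection(input: HookInput, cfg: GuardrailConfig): Verdict {
  const path = input.tool_input.file_path;
  if (!WRITE_TOOLS.has(input.tool_name) || typeof path !== "string") return { block: false };
  // Protected if the path equals an entry or lies inside a protected directory.
  const hit = cfg.protectedPaths.find((p) => path === p || path.startsWith(p + "/"));
  return hit ? { block: true, reason: `write to protected path ${hit}` } : { block: false };
}

function checkDangerousCommand(input: HookInput, cfg: GuardrailConfig): Verdict {
  const command = input.tool_input.command;
  if (input.tool_name !== "Bash" || typeof command !== "string") return { block: false };
  const hit = cfg.blockedCommands.find((re) => re.test(command));
  return hit ? { block: true, reason: `command matches blocked pattern ${hit}` } : { block: false };
}

// First violation wins. A real handler would report the reason and exit with
// code 2 so Claude Code blocks the call; exit code 0 lets it proceed.
function evaluateGuardrails(input: HookInput, cfg: GuardrailConfig): { exitCode: 0 | 2; reason?: string } {
  for (const check of [checkFileProtection, checkDangerousCommand]) {
    const verdict = check(input, cfg);
    if (verdict.block) return { exitCode: 2, reason: verdict.reason };
  }
  return { exitCode: 0 };
}
```

Keeping the checks pure makes them cheap to unit-test and keeps the synchronous PreToolUse path fast.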

Claude Code fires PostToolUse
  → hook-handler reads JSON from stdin
  → Record completed action with full data:
      - toolName, toolInput, toolOutput
      - Bash output capture
      - File content snapshots (before/after for diffs)
      - LLM cost estimation via JSONL reader
      - Git branch/commit context
  → Compute incremental drift score (heuristic or LLM)
  → Save TraceEvent to traces.db

Claude Code fires Stop
  → NOTE: Stop fires after EVERY response, not just session end
  → Do NOT close session — update drift snapshot only
  → Persist current drift score to traces.db
  → Session ends manually via "hawkeye end" or dashboard timeout

Record Mode (Non-Claude Agents)

hawkeye record -o "objective" -- <command>
  → Spawn child process
  → Inject Node.js preload script (--require hook)
  → Patch global fetch + http/https.request
  → Intercept LLM API calls (Anthropic, OpenAI, DeepSeek, Mistral, Google)
      - Detect provider from URL
      - Parse SSE streaming and JSON responses
      - Extract input/output tokens → estimate cost
  → Forward to NetworkLock check → block or allow
  → Record llm_call events to traces.db
  → On process exit → finalize session
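The interception step in record mode amounts to wrapping fetch. A minimal sketch, assuming the provider hostnames listed below and Anthropic/OpenAI-style usage fields (input_tokens/output_tokens or prompt_tokens/completion_tokens); detectProvider, recordingFetch and LlmCall are hypothetical names. It only handles JSON responses; SSE stream parsing and the http/https.request patch mentioned above are left out, and Google responses, which use a different usage shape, would record zero tokens here.

```typescript
type Provider = "anthropic" | "openai" | "deepseek" | "mistral" | "google";

// Assumed API hostnames for each provider.
const HOSTS: Record<string, Provider> = {
  "api.anthropic.com": "anthropic",
  "api.openai.com": "openai",
  "api.deepseek.com": "deepseek",
  "api.mistral.ai": "mistral",
  "generativelanguage.googleapis.com": "google",
};

function detectProvider(url: string): Provider | null {
  try {
    return HOSTS[new URL(url).hostname] ?? null;
  } catch {
    return null; // not a valid absolute URL
  }
}

interface LlmCall {
  provider: Provider;
  url: string;
  inputTokens: number;
  outputTokens: number;
}

// Returns a fetch replacement that reports token usage for known providers.
function recordingFetch(base: typeof fetch, onCall: (c: LlmCall) => void): typeof fetch {
  return async (input, init) => {
    const url = typeof input === "string" ? input : input instanceof URL ? input.href : input.url;
    const res = await base(input, init);
    const provider = detectProvider(url);
    const type = res.headers.get("content-type") ?? "";
    if (provider && type.includes("application/json")) {
      // Read a clone so the caller still receives an unconsumed body.
      const body: any = await res.clone().json().catch(() => null);
      const u = body?.usage ?? {};
      onCall({
        provider,
        url,
        inputTokens: u.input_tokens ?? u.prompt_tokens ?? 0,
        outputTokens: u.output_tokens ?? u.completion_tokens ?? 0,
      });
    }
    return res;
  };
}
```

A preload script could then assign globalThis.fetch = recordingFetch(globalThis.fetch, writeEvent), where writeEvent is a hypothetical function that persists llm_call rows.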

LLM Cost Reading (Claude Code Specific)

PostToolUse hook
  → Read ~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl
  → Parse JSONL from last-read offset (incremental)
  → Extract usage.input_tokens, usage.output_tokens, model
  → Deduplicate by message ID (streaming sends same ID multiple times)
  → Compute cost via cost table (sonnet/opus/haiku per 1M tokens)
  → Attach to TraceEvent
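The incremental read can be sketched as follows, assuming each assistant line carries message.id, message.model and message.usage.input_tokens/output_tokens (an assumption about Claude Code's transcript format). Only complete lines are parsed, the returned offset is where the next hook invocation should resume, and the seen set implements deduplication by message ID. The prices are illustrative per-1M-token figures rather than a maintained table, and cache tokens are ignored; readIncremental and priceFor are hypothetical names.

```typescript
// Illustrative USD prices per 1M tokens; verify against current pricing.
const PRICES: Record<string, { input: number; output: number }> = {
  opus: { input: 15, output: 75 },
  sonnet: { input: 3, output: 15 },
  haiku: { input: 0.8, output: 4 },
};

function priceFor(model: string): { input: number; output: number } | undefined {
  const family = Object.keys(PRICES).find((f) => model.includes(f));
  return family ? PRICES[family] : undefined;
}

interface ReadResult {
  nextOffset: number; // byte offset to resume from on the next call
  costUsd: number; // cost of newly seen messages only
}

// Parse complete lines after `offset`. `seen` persists across calls so that a
// message ID written several times while streaming is counted once.
function readIncremental(buf: Buffer, offset: number, seen: Set<string>): ReadResult {
  const lastNewline = buf.lastIndexOf(0x0a);
  if (lastNewline < offset) return { nextOffset: offset, costUsd: 0 }; // no complete new line yet
  const text = buf.subarray(offset, lastNewline + 1).toString("utf8");
  let costUsd = 0;
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    let entry: any;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines rather than failing the hook
    }
    const msg = entry?.message;
    if (!msg?.id || !msg.usage || seen.has(msg.id)) continue;
    seen.add(msg.id);
    const price = priceFor(String(msg.model ?? ""));
    if (!price) continue;
    const tokensIn = msg.usage.input_tokens ?? 0;
    const tokensOut = msg.usage.output_tokens ?? 0;
    costUsd += (tokensIn * price.input + tokensOut * price.output) / 1_000_000;
  }
  return { nextOffset: lastNewline + 1, costUsd };
}
```

In a hook, buf would come from fs.readFileSync on the transcript path, with nextOffset and the seen IDs persisted between invocations.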

Drift Detection

Per PostToolUse:
  scoreHeuristic(session, recentEvents, objective)
    → Keyword overlap, action relevance, error frequency
    → Returns 0-100 score

  slidingDriftScore(history)
    → Weighted average over recent N events
    → More recent events weighted higher

  Optional LLM drift:
    → Call configured provider with objective + recent events
    → LLM rates alignment 0-100
    → Cached, not run on every event

  Thresholds:
    ok: 70-100
    warning: 40-69
    critical: 0-39
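scoreHeuristic and slidingDriftScore are described only by their inputs (keyword overlap, action relevance, error frequency) and by recency weighting. The sketch below is one plausible reading, not Hawkeye's code: the stopword list, the 30-point error penalty and the linear weights are assumptions, and RecentEvent is a hypothetical shape. Only the 70/40 thresholds come from the text.

```typescript
type Flag = "ok" | "warning" | "critical";

const STOPWORDS = new Set(["the", "and", "for", "with", "from", "this", "that"]);

function keywords(text: string): Set<string> {
  return new Set(
    text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 2 && !STOPWORDS.has(w)),
  );
}

interface RecentEvent {
  description: string; // e.g. "Edit src/auth/login.ts"
  isError: boolean;
}

// Score one window of events against the objective, 0-100.
function scoreHeuristic(objective: string, events: RecentEvent[]): number {
  if (events.length === 0) return 100;
  const goal = keywords(objective);
  let relevant = 0;
  let errors = 0;
  for (const e of events) {
    if ([...keywords(e.description)].some((w) => goal.has(w))) relevant++;
    if (e.isError) errors++;
  }
  const relevance = relevant / events.length; // share of events touching objective terms
  const errorRate = errors / events.length; // frequent errors suggest looping
  return Math.round(Math.max(0, Math.min(100, 100 * relevance - 30 * errorRate)));
}

// Weighted average where the i-th oldest score has weight i + 1 (newest counts most).
function slidingDriftScore(history: number[]): number {
  if (history.length === 0) return 100;
  let num = 0;
  let den = 0;
  history.forEach((s, i) => {
    num += s * (i + 1);
    den += i + 1;
  });
  return num / den;
}

// Thresholds from the text: ok 70-100, warning 40-69, critical 0-39.
function flagFor(score: number): Flag {
  if (score >= 70) return "ok";
  if (score >= 40) return "warning";
  return "critical";
}
```

Weighting recent scores more heavily lets a session recover its flag quickly once the agent returns to the objective.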

Guardrail Block Flow

PreToolUse violation detected:
  → Write guardrail block response to stdout:
      { "decision": "block", "reason": "<message>" }
  → Exit 2 (Claude Code interprets as block)
  → Record guardrail_block event in traces.db

Review gate (require_approval):
  → Write file-based approval request
  → Poll for human approval before proceeding
  → On timeout → deny

Overnight Mode

hawkeye overnight --budget <$N>
  → Apply cost limits, file protection, command blocking
  → Start hawkeye serve (web dashboard)
  → Start hawkeye daemon (background recorder)
  → Auto-pause on critical drift
  → Backup .hawkeye/config.json
  → On Ctrl+C:
      → Generate morning report (per-session stats, drift, errors)
      → Optionally: LLM post-mortem
      → Fire webhooks: overnight_report, session_complete, task_complete
      → Restore original config

Phase-to-Artifact Map

Phase | Artifact
--- | ---
hawkeye init | .hawkeye/config.json, .hawkeye/traces.db
PreToolUse | TraceEvent (guardrail_block or pre-action record)
PostToolUse | TraceEvent (command/file_write/llm_call/etc.) with cost + drift
Stop | Drift snapshot update in traces.db
hawkeye end | Session row marked completed, final drift saved
hawkeye report | Markdown/text report from session data
post_mortem MCP tool | LLM-generated structured post-mortem

Hawkeye — Memory & Context

Storage Backend

All memory is stored in SQLite at .hawkeye/traces.db. No external memory provider, no vector store.

.hawkeye/
  traces.db          — All session data: events, drift, cost, guardrail violations
  config.json        — Hawkeye config (guardrails, drift provider, API keys, webhooks)

Session-Level Context

Each session record stores:

  • id — UUID
  • objective — user-provided at hawkeye record -o
  • agent — auto-detected or provided via --agent
  • model — LLM model name
  • status — recording | paused | completed | aborted
  • started_at, ended_at
  • final_drift_score — 0-100

Event Log

The primary memory artifact. Every action is a TraceEvent in traces.db:

Event type | Description
--- | ---
command | Shell command executed
file_read | File opened for reading
file_write | File modified
file_delete | File deleted
file_rename | File renamed/moved
api_call | External HTTP request
llm_call | LLM API call with token/cost data
decision | Agent-logged decision (via MCP log_event)
error | Error encountered
git_commit | Git commit during session
git_checkout / git_push / git_pull / git_merge | Git operations
guardrail_trigger | Guardrail warning fired
guardrail_block | Action blocked by guardrail
drift_alert | Drift threshold crossed
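The event types in the table map naturally onto a TypeScript string-literal union derived from a runtime list, which also gives a validator for rows read back from SQLite. The TraceEvent fields other than type, and the helper names, are hypothetical.

```typescript
// Event types from the table, as a runtime list plus a derived union type.
const EVENT_TYPES = [
  "command", "file_read", "file_write", "file_delete", "file_rename",
  "api_call", "llm_call", "decision", "error",
  "git_commit", "git_checkout", "git_push", "git_pull", "git_merge",
  "guardrail_trigger", "guardrail_block", "drift_alert",
] as const;

type EventType = (typeof EVENT_TYPES)[number];

interface TraceEvent {
  sessionId: string;
  type: EventType;
  timestamp: string; // ISO-8601
  data: Record<string, unknown>;
  costUsd?: number; // typically present on llm_call events
}

function isEventType(s: string): s is EventType {
  return (EVENT_TYPES as readonly string[]).includes(s);
}

// Count a session's events by type, e.g. to fill an "events by type" summary.
function countByType(events: TraceEvent[]): Map<EventType, number> {
  const counts = new Map<EventType, number>();
  for (const e of events) counts.set(e.type, (counts.get(e.type) ?? 0) + 1);
  return counts;
}
```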

Drift Snapshots

Stored in traces.db per session checkpoint. The Stop hook updates the snapshot after every Claude Code response. Fields:

  • session_id
  • score (0-100)
  • flag (ok / warning / critical)
  • reason (LLM explanation, if LLM drift enabled)
  • suggestion
  • timestamp

Memory Diff Engine (packages/core/src/analysis/memory-diff.ts)

Cross-session comparison. Extracts structured "memories" from event logs and diffs between sessions:

Memory categories:

  • file_knowledge — files the agent touched/understood
  • error_lesson — errors encountered and their fixes
  • correction — agent self-corrections
  • tool_pattern — recurring tool usage patterns
  • decision — explicit agent decisions (via log_event)
  • dependency_fact — package/version facts
  • api_knowledge — API behavior facts

Diff output types:

  • learned — knowledge gained in session B not in A
  • forgotten — knowledge present in A but absent in B
  • retained — knowledge stable across both sessions
  • evolved — knowledge that changed between sessions
  • contradicted — conflicting claims across sessions
  • hallucinations — recurring false beliefs detected (nonexistent_file, phantom_api, contradicted_fact, recurring_error)
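The first four diff outputs fall out of keying each memory by category and subject and comparing the two sessions' maps. This is a minimal sketch with a hypothetical Memory shape and diffMemories name; contradiction and hallucination detection, which need more than key equality, are not shown.

```typescript
interface Memory {
  category: string; // e.g. "file_knowledge", "dependency_fact"
  subject: string; // what the memory is about, e.g. a file path or package name
  content: string; // what the agent believed about it
}

interface MemoryDiff {
  learned: Memory[];
  forgotten: Memory[];
  retained: Memory[];
  evolved: { before: Memory; after: Memory }[];
}

const keyOf = (m: Memory) => `${m.category}:${m.subject}`;

function diffMemories(a: Memory[], b: Memory[]): MemoryDiff {
  const inA = new Map(a.map((m) => [keyOf(m), m] as const));
  const inB = new Map(b.map((m) => [keyOf(m), m] as const));
  const diff: MemoryDiff = { learned: [], forgotten: [], retained: [], evolved: [] };
  for (const [k, mb] of inB) {
    const ma = inA.get(k);
    if (!ma) diff.learned.push(mb); // only in session B
    else if (ma.content === mb.content) diff.retained.push(mb); // unchanged
    else diff.evolved.push({ before: ma, after: mb }); // same subject, new content
  }
  for (const [k, ma] of inA) if (!inB.has(k)) diff.forgotten.push(ma); // only in session A
  return diff;
}
```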

Cumulative Memory

buildCumulativeMemory(sessionMemories) builds a cross-session knowledge base from up to 20 past sessions. Used by MCP tool check_memory to give the agent context about what was learned in prior runs.

Context Window Management

Hawkeye does not manage Claude Code's context window directly. However:

  1. JSONL reader — hook-handler reads Claude Code's .jsonl session file incrementally (byte offset tracking) to avoid re-parsing the full context on every hook invocation.

  2. Drift prompt — passes only recent N events (not full history) to the LLM for drift analysis. Exact N is configurable.

  3. Memory diff — caps at 20 sessions for cumulative memory to bound computation.

  4. No context injection — Hawkeye does not inject summaries or memories back into Claude Code's context. Agent self-awareness is mediated entirely through MCP tool calls (check_memory, get_objective, check_drift).

Hawkeye — Orchestration

Orchestration Model

Hawkeye is primarily a single-session observability tool, not an orchestrator. However, it includes a multi-agent Swarm mode and an Arena mode for competitive agent evaluation.

Swarm Mode (hawkeye swarm)

Coordinates multiple AI agents in parallel. Each agent runs in an isolated git worktree.

hawkeye swarm --config swarm.json
  → Validate SwarmConfig (agent definitions, dependencies, scope)
  → resolveDependencies(agents) → topological sort
  → For each agent (respecting dependency order):
      → createWorktree(cwd, worktreePath, branch)
      → Build full prompt: task + scope note + context
      → buildAgentInvocation(persona.command, fullPrompt)
      → Spawn agent process in worktree
      → Record SwarmAgent row in traces.db

  → On agent completion:
      → detectConflicts(worktreePaths)
      → suggestMergeOrder(results)
      → mergeAgent(cwd, branch, agentName)
        → "git merge <branch> --no-ff -m 'swarm: merge <name>'"
      → removeWorktree(cwd, path, branch)
      → Fire webhooks

  → Live TUI overlay: per-agent status, cost, duration, color

SwarmConfig fields:

  • agents — array of AgentPersona (name, command, task, scope, color, args, timeout)
  • dependencies — agent dependency graph (A depends on B)
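resolveDependencies is described as a topological sort. Kahn's algorithm is one standard way to do that; the version below groups agents into waves whose members could start in parallel and rejects cycles. resolveWaves and its dependsOn map (each agent listing the agents it waits for) are hypothetical, not Hawkeye's SwarmConfig types.

```typescript
function resolveWaves(agents: string[], dependsOn: Record<string, string[]>): string[][] {
  const indegree = new Map<string, number>(agents.map((a): [string, number] => [a, 0]));
  const dependents = new Map<string, string[]>(agents.map((a): [string, string[]] => [a, []]));
  for (const [agent, deps] of Object.entries(dependsOn)) {
    for (const dep of deps) {
      if (!indegree.has(agent) || !indegree.has(dep)) {
        throw new Error(`unknown agent in dependency ${agent} -> ${dep}`);
      }
      indegree.set(agent, indegree.get(agent)! + 1); // agent waits on one more dependency
      dependents.get(dep)!.push(agent);
    }
  }
  const waves: string[][] = [];
  let ready = agents.filter((a) => indegree.get(a) === 0);
  let placed = 0;
  while (ready.length > 0) {
    waves.push(ready);
    placed += ready.length;
    const next: string[] = [];
    for (const a of ready) {
      for (const d of dependents.get(a)!) {
        indegree.set(d, indegree.get(d)! - 1);
        if (indegree.get(d) === 0) next.push(d); // all of d's dependencies are done
      }
    }
    ready = next;
  }
  // Any agent never placed is part of, or blocked by, a cycle.
  if (placed !== agents.length) throw new Error("dependency cycle among swarm agents");
  return waves;
}
```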

Scope enforcement:

  • Injected into agent prompt: "IMPORTANT: You are ONLY allowed to modify files matching: <include>. Do NOT modify: <exclude>."
  • Not enforced at the kernel/OS level — relies on prompt injection

Conflict detection:

  • detectConflicts(worktreePaths) — checks git diff overlap across worktrees before merge
  • suggestMergeOrder(results) — topological merge order recommendation
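The overlap check can be sketched as a pairwise comparison of each branch's changed files. The input would come from something like git diff --name-only <base>...<branch> per worktree; detectOverlaps and Conflict are hypothetical names, and real conflict detection may also compare hunks rather than whole files.

```typescript
interface Conflict {
  agents: [string, string];
  files: string[];
}

// Report every pair of agents whose branches changed at least one common file.
function detectOverlaps(changed: Record<string, string[]>): Conflict[] {
  const names = Object.keys(changed).sort();
  const conflicts: Conflict[] = [];
  for (let i = 0; i < names.length; i++) {
    const mine = new Set(changed[names[i]]);
    for (let j = i + 1; j < names.length; j++) {
      const shared = changed[names[j]].filter((f) => mine.has(f));
      if (shared.length > 0) {
        conflicts.push({ agents: [names[i], names[j]], files: [...new Set(shared)].sort() });
      }
    }
  }
  return conflicts;
}
```

Pairs that share files are the ones whose merge order matters; agents with disjoint file sets can be merged in any order.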

Arena Mode (hawkeye arena)

Runs two or more agents on the same task independently (competitive evaluation). Compare results afterward via hawkeye compare. Primarily for benchmarking/evaluation; not a production orchestration pattern.

Daemon Mode (hawkeye daemon)

Background process that monitors active sessions. Used in overnight mode:

  • Polls for active sessions
  • Applies auto-pause on critical drift
  • Fires webhooks on threshold events
  • No spawning of new agents — pure monitoring

MCP Self-Awareness (Agent→Hawkeye Protocol)

Running agents call back to Hawkeye via 27 MCP tools. This is the primary bidirectional communication channel:

Direction | Mechanism
--- | ---
Hawkeye → Agent | Hook stdout JSON ({ "decision": "block", "reason": "..." })
Agent → Hawkeye | MCP tool call (check_drift, check_guardrail, log_event, etc.)

Agents are expected to call check_drift every ~10 actions and self_assess when something feels off. Hawkeye provides corrections via auto_correct / get_correction MCP tools (Autonomous Control Layer).

Webhook Integration

Hawkeye fires webhooks at session events. Configured in .hawkeye/config.json:

Event | Trigger
--- | ---
session_complete | Session ended
task_complete | Task within session marked done
overnight_report | Overnight mode shutdown
guardrail_block | Action blocked by guardrail
drift_alert | Drift threshold crossed

No Planner / No LLM Routing

Hawkeye does not include a planner, task decomposer, or LLM-based router. Task decomposition is delegated to the wrapped agent. Hawkeye's role is observe, record, guard, and report — not plan.

Hawkeye — UI & CLI Surface

CLI Binary

Binary | Source | Install
--- | --- | ---
hawkeye | packages/cli/dist/index.js | npm install -g hawkeye-ai

CLI Commands (Full List)

Command | Purpose
--- | ---
hawkeye init | Initialize Hawkeye in project (creates .hawkeye/)
hawkeye hooks install | Install Claude Code hooks
hawkeye hooks install --guardrails-only | Install only guardrail hooks
hawkeye hooks status | Check installed hooks
hawkeye hooks uninstall | Remove hooks
hawkeye record -o "<objective>" -- <cmd> | Wrap any CLI agent with recording
hawkeye sessions | List recorded sessions
hawkeye inspect <id> | Detailed session inspection
hawkeye watch <id> | Live event stream (tail -f style)
hawkeye compare <id1> <id2> | Side-by-side session comparison
hawkeye replay <id> | Interactive session replay
hawkeye end | End active session
hawkeye revert <id> | Revert file changes from session
hawkeye export <id> | Export session as JSON
hawkeye analyze <id> | Root cause analysis
hawkeye memory diff <A> <B> | Memory diff between sessions
hawkeye memory hallucinations | Recurring false beliefs across sessions
hawkeye memory | Cumulative cross-session memory
hawkeye swarm --config swarm.json | Multi-agent swarm run
hawkeye swarm init | Generate swarm config template
hawkeye policy init | Initialize security policy
hawkeye policy check | Check policy
hawkeye report | Generate morning report
hawkeye report --llm --webhook | With LLM post-mortem + webhook notification
hawkeye serve | Launch web dashboard at :4242
hawkeye daemon | Start background monitoring daemon
hawkeye overnight --budget <$N> | Overnight unattended mode
hawkeye remote | serve + daemon + Cloudflare tunnel
hawkeye mcp | Show MCP server setup instructions
hawkeye ci | Post session report to GitHub PR
hawkeye approve | Approve pending review gate actions
hawkeye (no args) | Launch interactive TUI

Interactive TUI

Invoked by running hawkeye with no arguments. Readline-based interface with slash command autocomplete.

37 slash commands available (from interactive-constants.ts):

  • /new — New session (pick agent + objective)
  • /attach — Launch agent on active session
  • /sessions — List & manage sessions
  • /active — Current recording
  • /watch — Live event stream
  • /stats — Session or global statistics
  • /inspect — Detailed session inspection
  • /compare — Compare sessions side by side
  • /replay — Replay a session (interactive)
  • /firewall — Show recent interceptions & impact previews
  • /analyze — Root cause analysis
  • /memory — Memory diff
  • /autocorrect — Autonomous control
  • /swarm — Multi-agent orchestration
  • /settings — Configure Hawkeye
  • /overnight — Overnight mode
  • /serve — Open dashboard :4242
  • /ci — Post to GitHub PR
  • ... (19 additional commands)

Auto-record: If you type a known agent command (claude, cline, codex), the TUI automatically wraps it with recording.

Web Dashboard

URL | Page | Purpose
--- | --- | ---
:4242 | Sessions | List + filter all sessions
:4242/session/<id> | Session Detail | Timeline, replay, file changes, cost breakdown
:4242/compare | Compare | Side-by-side multi-session comparison
:4242/firewall | Firewall | Guardrail management (file protection, command blocking)
:4242/tasks | Tasks | Remote task submission
:4242/agents | Agents (Swarm) | Live spawned agents, follow-up, relaunch, cost tracking
:4242/memory | Memory | Memory diff between sessions
:4242/analyze | Analyze | Root cause analysis for failed runs

Stack: React + Vite (TypeScript) in packages/dashboard/. Served by hawkeye serve via Node.js HTTP server + WebSocket for live updates.

SKILL.md (Claude Code Skill)

Located at skills/hawkeye/SKILL.md. Activates via the skill system when Claude Code detects Hawkeye-relevant user requests. Contains:

  • Activation trigger list (when to use Hawkeye vs native)
  • Setup instructions for hook vs MCP mode
  • All core commands with examples
  • Troubleshooting runbook
  • Output format guidance

OpenTelemetry Export

hawkeye otel-export — Exports session events as OpenTelemetry traces. Enables integration with external observability platforms (Jaeger, Grafana, etc.).

CI Integration

hawkeye ci — Posts session report as GitHub Check Run + PR comment. Session cost, drift, guardrail violations surfaced directly in pull requests.
