Hawkeye

hawkeye · MLaminekane/hawkeye · ★ 5 · last commit 2026-04-14

Primitive shape: 31 total (Skills 1 · Hooks 3 · MCP tools 27)

Summary

Hawkeye — Summary

Hawkeye is an observability and control layer for AI agents, delivered as a CLI tool, a web dashboard, and a set of Claude Code hooks. It records every agent action (commands, file operations, LLM calls, network activity, cost), detects objective drift in real time, enforces guardrails (file protection, command blocking, cost limits), supports session replay and post-mortem analysis, and orchestrates multi-agent swarms.

Problem it solves: Once an AI agent starts working, there is no way to see what it's actually doing, why it's spending tokens, whether it's still on task, or what happened during a bad run. Hawkeye provides "the flight recorder for AI agents" — inspection tools that work during and after a session.

Distinctive trait: Hawkeye operates at several layers at once: Claude Code hooks (PreToolUse, PostToolUse, Stop) for recording and guardrails, a hawkeye record wrapper for agents that have no hook system, and an MCP server that exposes 27 tools for agent self-awareness. A React + Vite dashboard, served by hawkeye serve, provides session timeline, drift scoring, run comparison, replay, root-cause analysis, memory diff, and live agent spawning (swarm).

Target audience: Developers with complex multi-agent workflows, overnight runs, or expensive sessions who want visibility beyond "one CLI prompt in one terminal" and who need tools that answer "why did this fail?" and "what did it do?"

Production-readiness: Early but active (5 stars, MIT, TypeScript monorepo, v0.3.1). Latest changes 2026-03-30.

Differs from seeds: Most similar to claude-flow in multi-agent orchestration scope but Hawkeye's primary value is observability rather than execution. Unlike any seed framework, Hawkeye stores all session data in SQLite (~/.hawkeye/traces.db), supports session comparison, drift scoring, and post-mortem analysis. The MCP server with 27 agent-facing tools (self-awareness API) is unique in the corpus.

Overview

Hawkeye — Overview

Origin

Created by MLaminekane. TypeScript monorepo with packages: core, cli, dashboard. npm package hawkeye-ai (v0.3.1). MIT license.

Philosophy

From the README:

"Most agent tooling helps you launch an agent. Hawkeye helps you answer what happened after it started drifting, overspending, touching the wrong files, or failing in a way nobody can explain."

"If an agent touched your repo, spent money, or got weird, you should be able to inspect it like a real system."

"Hawkeye is especially useful once your workflow stops being 'one CLI prompt in one terminal' and becomes 'multiple agents, long-running sessions, real cost, real risk.'"

Core Value Props (from README)

Record exactly what an agent did across terminal, files, network, and LLM calls
Replay a bad run instead of guessing
Detect objective drift before a session goes off the rails
Enforce guardrails around files, commands, directories, cost, and review gates
Compare runs, inspect memory, and generate useful post-mortems
Monitor live tasks, spawned agents, and multi-agent work in one place

Architecture (from CLAUDE.md)

packages/core/    Node/TypeScript SDK: recorder, interceptors, drift engine,
                  SQLite storage, RCA, memory diff, swarm types/config
packages/cli/     Main product runtime: CLI commands, daemon, hooks, MCP server,
                  serve, desktop automation, reports
packages/dashboard/ React + Vite dashboard served by hawkeye serve

Event Flow

1. Interceptors or hooks capture actions
2. Guardrails evaluate synchronously for risky actions
3. Events persisted to SQLite in .hawkeye/traces.db
4. Drift snapshots and analyses computed on top

Architecture

Hawkeye — Architecture

Distribution

Type: npm package (monorepo)
Package: hawkeye-ai on npm (v0.3.1)
License: MIT
Language: TypeScript

Install Methods

npm install -g hawkeye-ai
npx hawkeye-ai
brew install MLaminekane/hawkeye/hawkeye-ai

Required Runtime

Node.js >= 20
Git (for git operation tracking)
Claude Code (for hooks integration)

Directory Structure (after init)

project/
└── .hawkeye/
    └── traces.db          # SQLite database: all session data

~/.hawkeye/               # Global config + traces (across projects)

Package Architecture (Monorepo)

hawkeye/
├── packages/
│   ├── core/              # recorder, interceptors, drift engine, SQLite, RCA, memory diff, swarm
│   │   ├── src/types.ts   # Shared types
│   │   └── src/storage/sqlite.ts  # SQLite storage
│   ├── cli/               # CLI commands, daemon, serve, hooks, MCP server, automation
│   │   ├── src/commands/serve.ts   # Dashboard API server + websockets
│   │   ├── src/commands/daemon.ts  # Remote task runner
│   │   └── src/interactive.ts      # TUI entrypoint (slash commands)
│   └── dashboard/         # React + Vite web dashboard
│       ├── src/pages/TasksPage.tsx
│       ├── src/pages/SwarmPage.tsx
│       └── src/pages/SessionDetailPage.tsx
├── skills/
│   └── hawkeye/
│       └── SKILL.md        # Claude Code skill for agent self-awareness
└── .claude/
    └── settings.json       # PreToolUse + PostToolUse + Stop hooks

Claude Code Hook Events

Event	Handler	Matcher
`PreToolUse`	`hawkeye hook-handler --event PreToolUse`	(all tools)
`PostToolUse`	`hawkeye hook-handler --event PostToolUse`	(all tools)
`Stop`	`hawkeye hook-handler --event Stop`	(none)
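
As a rough sketch, the three registrations in the table could be expressed as the following settings object, written in TypeScript so it can be serialized to JSON. The nesting (event name, matcher group, command hook) follows Claude Code's hooks settings format. The "*" match-all matcher and the helper name hawkeyeHook are assumptions, since the source lists only the handler commands and "(all tools)".

```typescript
// Sketch of the hook registrations listed in the table above.
// Assumption: "*" is the match-all matcher; the source only says "(all tools)".
type CommandHook = { type: "command"; command: string };
type HookGroup = { matcher?: string; hooks: CommandHook[] };
type HookSettings = { hooks: Record<string, HookGroup[]> };

// hawkeyeHook is a hypothetical helper, not part of Hawkeye.
function hawkeyeHook(event: string, matcher?: string): HookGroup {
  const hook: CommandHook = {
    type: "command",
    command: `hawkeye hook-handler --event ${event}`,
  };
  // Stop has no matcher, so the key is omitted entirely.
  return matcher === undefined ? { hooks: [hook] } : { matcher, hooks: [hook] };
}

const settings: HookSettings = {
  hooks: {
    PreToolUse: [hawkeyeHook("PreToolUse", "*")],
    PostToolUse: [hawkeyeHook("PostToolUse", "*")],
    Stop: [hawkeyeHook("Stop")],
  },
};

// Serialized form, suitable for comparing against .claude/settings.json.
const settingsJson: string = JSON.stringify(settings, null, 2);
```

In practice hawkeye hooks install writes this configuration, so the sketch is mainly useful for reading or auditing what was installed.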

Target AI Tools

Claude Code (primary — via hooks)
Codex (via hawkeye record wrapper)
Cline (via hawkeye record wrapper)
Any CLI agent (via hawkeye record -- <command>)

MCP Server (27 tools)

Exposed via hawkeye mcp — provides agent self-awareness tools (check drift, check cost, log decisions, etc.).

Components

Hawkeye — Components

CLI Binary

Binary	Source
`hawkeye`	`packages/cli/dist/index.js`

CLI Commands

Command	Purpose
`hawkeye init`	Initialize Hawkeye in project
`hawkeye hooks install`	Install Claude Code hooks
`hawkeye hooks install --guardrails-only`	Install only guardrail hooks
`hawkeye hooks status`	Check installed hooks
`hawkeye hooks uninstall`	Remove hooks
`hawkeye record -o "<objective>" -- <command>`	Wrap any CLI agent with recording
`hawkeye sessions`	List recorded sessions
`hawkeye serve`	Launch web dashboard (default port 4242)
`hawkeye` (interactive)	Launch TUI

Hooks (3 events)

Event	Matcher	Purpose
`PreToolUse`	(all)	Capture tool intent, evaluate guardrails synchronously
`PostToolUse`	(all)	Record tool outcome, update session timeline
`Stop`	(none)	Finalize session, compute drift score, save summary

Skills (1)

Name	File	Purpose
`hawkeye`	`skills/hawkeye/SKILL.md`	Claude Code skill for Hawkeye integration: setup, recording, MCP config, guardrail management

MCP Server (27 tools)

Exposed via hawkeye mcp for agent self-awareness:

Check current drift score
Check session cost
Log decisions to session timeline
Query past session data
Set/check guardrails
(22 additional tools not individually enumerated in README)

Web Dashboard Pages

Page	Purpose
Sessions	List + filter all recorded sessions
Session Detail	Timeline view, replay, file changes, cost breakdown
Compare	Side-by-side comparison of multiple sessions
Firewall	Guardrail management (file protection, command blocking)
Tasks	Remote task daemon prompt submission
Agents (Swarm)	Live spawned agents, follow-up, relaunch, cost tracking
Memory	Memory diff between sessions
Analyze	Root-cause analysis for bad runs

Storage

Component	Path	Purpose
SQLite DB	`.hawkeye/traces.db`	All session data: commands, file ops, LLM calls, cost, timing
Drift snapshots	In traces.db	Computed drift scores per session checkpoint

Guardrail Categories

File protection (block writes to specified files/directories)
Command blocking (block specific shell commands)
Directory scope (restrict agent to specified directories)
Cost limits (stop/warn at token/cost threshold)
Token limits
Network lock (restrict network access)
Review gates (require human approval before proceeding)

Prompts

Hawkeye — Prompts

Prompt 1: Drift Detection (`packages/core/src/drift/prompts.ts`)

Technique: LLM-as-judge with JSON-only output constraint. Injected with concrete scoring examples to calibrate the 0-100 scale. Three-level flag system (ok/warning/critical).

You are an AI agent drift detection system. Your job is to evaluate whether an AI coding agent is staying on-task or drifting away from the user's objective.

ORIGINAL USER OBJECTIVE:
"${objective}"

RECENT AGENT ACTIONS (most recent last):
${actionsFormatted}

INSTRUCTIONS:
1. Identify what the agent has been doing in the last few actions (be specific)
2. Compare this to the stated objective
3. If the agent is doing something unrelated, explain WHAT it's doing instead and WHY that's a problem

Respond ONLY in JSON:
{
  "score": <number 0-100>,
  "flag": "ok" | "warning" | "critical",
  "reason": "<specific explanation: name exactly what the agent is doing and whether it relates to the objective>",
  "suggestion": "<concrete corrective action, or null>"
}

SCORING GUIDE:
- 85-100 "ok": Agent is actively working on the objective
- 70-84 "ok": Agent is doing preparatory or cleanup work related to the objective
- 50-69 "warning": Agent is doing something tangentially related or spending too long on setup
- 30-49 "warning": Agent is clearly working on something different
- 0-29 "critical": Agent is doing unrelated, repetitive, or potentially dangerous actions

Inline examples in prompt (few-shot calibration):

CSS edits during auth objective → score 25, critical
Reading + modifying login files for login bug → score 95, ok
Stripe install looping on same TypeScript error → score 45, warning
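
Because the prompt demands JSON-only output, a caller has to validate the reply before trusting it. The following is a minimal sketch of such a validator, assuming the four fields from the prompt's JSON template. The names DriftVerdict and parseDriftVerdict are illustrative and are not taken from the Hawkeye codebase.

```typescript
type DriftFlag = "ok" | "warning" | "critical";

interface DriftVerdict {
  score: number;
  flag: DriftFlag;
  reason: string;
  suggestion: string | null;
}

const FLAGS: readonly string[] = ["ok", "warning", "critical"];

// Parses the model's reply and rejects anything outside the JSON contract.
// Returns null instead of throwing so a caller can fall back to heuristic scoring.
function parseDriftVerdict(raw: string): DriftVerdict | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // the reply was not valid JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  if (typeof d.score !== "number" || d.score < 0 || d.score > 100) return null;
  if (typeof d.flag !== "string" || FLAGS.indexOf(d.flag) === -1) return null;
  if (typeof d.reason !== "string") return null;
  // The prompt allows "suggestion" to be null; anything non-string is treated as null.
  const suggestion = typeof d.suggestion === "string" ? d.suggestion : null;
  return { score: d.score, flag: d.flag as DriftFlag, reason: d.reason, suggestion };
}
```

Returning null on any contract violation keeps the failure mode simple: the caller either has a fully typed verdict or none at all.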

Prompt 2: Post-Mortem Analysis (`packages/core/src/llm/post-mortem.ts`)

Technique: Structured session analyst role with full context injection (metadata, event summary, drift history, violations, errors). Forces structured JSON output with typed outcome field.

You are an AI session analyst. Generate a structured post-mortem report for a completed AI agent coding session.

SESSION METADATA:
- Objective: "${objective}"
- Agent: ${agent}
- Status: ${status}
- Started: ${startedAt}
- Duration: ${durationMinutes} minutes
- Total actions: ${totalActions}
- Total cost: $${totalCostUsd}
- Final drift score: ${finalDriftScore}/100

EVENT SUMMARY (by type):
${eventSummary}

FILES MODIFIED:
${filesSummary}

DRIFT HISTORY:
${driftHistory}

GUARDRAIL VIOLATIONS:
${violations}

ERRORS:
${errors}

Output schema:

{
  summary: string;
  outcome: 'success' | 'partial' | 'failure' | 'abandoned';
  keyActions: string[];
  issues: string[];
  driftAnalysis: string;
  costAssessment: string;
  recommendations: string[];
}

Prompt 3: Skill Instructions (`skills/hawkeye/SKILL.md`)

Technique: Activation-condition list for the hosting agent. Tells Claude Code when to invoke Hawkeye vs handle natively. Includes troubleshooting runbook and output-format guide for presenting Hawkeye data to users.

Key trigger conditions:

User asks to monitor/record/observe an agent session
User wants cost tracking or token budget limits
User wants to protect files or block dangerous commands
User wants to replay or inspect what an agent did
User asks about drift detection
User wants a post-mortem or session summary
User wants to run an agent overnight with safety guardrails

Output format rule (embedded in skill):

"When presenting Hawkeye data to users: Always show the drift score with its status (ok/warning/critical). Always show the total cost and top files by cost. Highlight any guardrail violations or blocked actions. If drift is in warning/critical range, suggest the user review the agent's recent actions."

Notes

Drift prompt is optional — heuristic mode (scoreHeuristic) is the default and requires no LLM.
Post-mortem prompt requires a configured LLM provider in .hawkeye/config.json.
No system-prompt injection at session start — Hawkeye does not modify the Claude Code system prompt.
All LLM prompts are in packages/core/src/ (not in the CLI hooks), keeping the CLI hooks lightweight.

Uniqueness

Hawkeye — Uniqueness

Primary Differentiator: Session Replay + Drift Detection as First-Class Features

Most governance/observability tools either block (guardrails) or record (audit). Hawkeye combines both with replay and drift detection in a single tool. You can replay any past session event-by-event, compute whether the agent drifted from its objective, and generate LLM post-mortems — all from the same CLI binary.

Specific Differentiators

1. Real-Token Cost Reading from Claude Code JSONL

Hawkeye reads cost from Claude Code's own session .jsonl files at ~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl rather than from token estimates. It tracks byte offsets so each hook invocation reads only new data, deduplicates by message ID (streaming writes the same ID multiple times), and attributes cost to individual actions. No other tool in this batch does this.

2. Heuristic-First Drift Scoring (LLM Optional)

scoreHeuristic() runs entirely locally with no LLM call, and the LLM drift prompt is opt-in. Drift scoring therefore works out of the box with no API key configured; LLM mode is available for cases that need more nuanced reasoning.

3. 27-Tool MCP Server for Agent Self-Awareness

Hawkeye exposes 27 MCP tools that give the running agent real-time awareness of its own session state. Agents can call check_drift, check_cost, check_guardrail, self_assess, auto_correct — effectively turning Hawkeye into a governance co-pilot that the agent queries proactively. This bidirectional channel (hooks block/allow inbound; MCP answers queries outbound) is unique in this batch.

4. Memory Diff Engine

Cross-session memory tracking with 7 categories (file_knowledge, error_lesson, correction, tool_pattern, decision, dependency_fact, api_knowledge) and hallucination detection (nonexistent_file, phantom_api, contradicted_fact, recurring_error). No other tool in this batch tracks what an agent "remembers" or "forgets" between sessions.

5. Autonomous Control Layer / Autocorrect

Beyond blocking, Hawkeye can correct agent behavior autonomously: roll back files, block command patterns, and inject correction hints. The auto_correct MCP tool returns both recommendations and the corrections already applied, along with agentInstructions the agent must follow. This makes governance reactive rather than purely preventive.

6. Swarm Mode with Git Worktree Isolation

Multi-agent coordination uses git worktree so that each agent works in its own checkout and branch, which isolates agents from each other at the filesystem level rather than only through prompt instructions. Conflict detection runs before merge, and merge order is determined topologically.

7. Overnight Mode with Morning Report

A single command (hawkeye overnight --budget <$N>) that configures guardrails, starts a background daemon, serves the dashboard, and generates a structured morning report on shutdown. Designed for unattended long-running agent runs.

Compared to Batch Peers

Framework	Enforcement Layer	Drift Detection	Session Replay	MCP Self-Awareness
Hawkeye	Claude Code hooks (PreToolUse/PostToolUse/Stop)	Yes (heuristic + LLM)	Yes	27 tools
TDD Guard	PreToolUse hook	No	No	No
Claude Code Guardrails	PreToolUse + PostToolUse + Stop hooks	No	No	No
AGT (Microsoft)	SessionStart + UserPromptSubmit + PreToolUse	No	No	2 tools
Leash (StrongDM)	OS/eBPF kernel layer	No	No	No
CC Audit	Static analysis (no hooks)	No	No	No

What Hawkeye Does Not Do

No test-first enforcement (unlike TDD Guard)
No static CLAUDE.md/hook quality linting (unlike AgentLint)
No kernel-level enforcement (unlike Leash)
No 12-vector prompt injection defense (unlike AGT)
No Cedar policy language (unlike Leash)
Scope injection in Swarm mode is prompt-based, not OS-enforced

Workflow

Hawkeye — Workflow

Session Lifecycle

hawkeye init
  → Create .hawkeye/ directory
  → Write .hawkeye/config.json (default config)
  → Create .hawkeye/traces.db (SQLite)

hawkeye hooks install
  → Write Claude Code hooks to settings.json
  → PreToolUse: hawkeye hook-handler --event PreToolUse
  → PostToolUse: hawkeye hook-handler --event PostToolUse
  → Stop: hawkeye hook-handler --event Stop

Claude Code Hook Flow (Primary Path)

Claude Code fires PreToolUse
  → hook-handler reads JSON from stdin
  → Evaluate guardrails:
      checkFileProtection(toolName, toolInput, config)
      checkDangerousCommand(toolName, toolInput, config)
      checkDirectoryScope(toolName, toolInput, config)
      checkNetworkLock(toolName, toolInput, config)
      checkReviewGate(toolName, toolInput, config)
  → Violation found → write JSON to stdout (exit 2 = block)
  → No violation → allow (exit 0)
  → Record PreToolUse event in traces.db

Claude Code fires PostToolUse
  → hook-handler reads JSON from stdin
  → Record completed action with full data:
      - toolName, toolInput, toolOutput
      - Bash output capture
      - File content snapshots (before/after for diffs)
      - LLM cost via JSONL reader (token usage read from Claude Code's session file)
      - Git branch/commit context
  → Compute incremental drift score (heuristic or LLM)
  → Save TraceEvent to traces.db

Claude Code fires Stop
  → NOTE: Stop fires after EVERY response, not just session end
  → Do NOT close session — update drift snapshot only
  → Persist current drift score to traces.db
  → Session ends manually via "hawkeye end" or dashboard timeout

Record Mode (Non-Claude Agents)

hawkeye record -o "objective" -- <command>
  → Spawn child process
  → Inject Node.js preload script (--require hook)
  → Patch global fetch + http/https.request
  → Intercept LLM API calls (Anthropic, OpenAI, Deepseek, Mistral, Google)
      - Detect provider from URL
      - Parse SSE streaming and JSON responses
      - Extract input/output tokens → estimate cost
  → Forward to NetworkLock check → block or allow
  → Record llm_call events to traces.db
  → On process exit → finalize session
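
The "Detect provider from URL" step in the flow above can be sketched as a hostname lookup. This is an illustration, not Hawkeye's interceptor: the hostnames are assumptions based on each vendor's public API endpoint, and detectProvider and hostnameOf are hypothetical names.

```typescript
type Provider = "anthropic" | "openai" | "deepseek" | "mistral" | "google";

// Assumed API hostnames for the five providers named in the flow.
const PROVIDER_HOSTS: ReadonlyArray<[string, Provider]> = [
  ["api.anthropic.com", "anthropic"],
  ["api.openai.com", "openai"],
  ["api.deepseek.com", "deepseek"],
  ["api.mistral.ai", "mistral"],
  ["generativelanguage.googleapis.com", "google"],
];

// Extracts the hostname from an absolute http(s) URL, skipping any userinfo part.
function hostnameOf(url: string): string | null {
  const match = /^https?:\/\/(?:[^\/?#@]*@)?([^\/:?#]+)/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

// Returns the provider for a request URL, or null for non-LLM traffic.
function detectProvider(url: string): Provider | null {
  const host = hostnameOf(url);
  if (host === null) return null;
  for (const [suffix, provider] of PROVIDER_HOSTS) {
    // Match the exact host or a subdomain of it, never a lookalike domain.
    if (host === suffix || host.endsWith("." + suffix)) return provider;
  }
  return null;
}
```

Matching on the full hostname rather than a substring avoids classifying a lookalike domain such as api.openai.com.example.org as an LLM provider.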

LLM Cost Reading (Claude Code Specific)

PostToolUse hook
  → Read ~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl
  → Parse JSONL from last-read offset (incremental)
  → Extract usage.input_tokens, usage.output_tokens, model
  → Deduplicate by message ID (streaming sends same ID multiple times)
  → Compute cost via cost table (sonnet/opus/haiku per 1M tokens)
  → Attach to TraceEvent
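
The incremental read and deduplication steps above can be sketched as follows. Several details are assumptions: the entry shape (message.id, message.model, message.usage) is inferred from the fields named in the flow, offsets are string indices rather than the byte offsets the source describes, the first occurrence of a message ID is the one kept, and the rates passed to costUsd are placeholders rather than real prices.

```typescript
interface UsageRecord {
  messageId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
}

interface ReadState {
  offset: number;        // where the previous read stopped
  seenIds: Set<string>;  // message IDs already counted
}

// Reads only complete lines that start at state.offset. A trailing partial
// line is left unread so the next call can pick it up once it is complete.
function readNewUsage(content: string, state: ReadState): UsageRecord[] {
  const records: UsageRecord[] = [];
  let pos = state.offset;
  while (true) {
    const end = content.indexOf("\n", pos);
    if (end === -1) break; // incomplete line: wait for more data
    const line = content.slice(pos, end).trim();
    pos = end + 1;
    if (line === "") continue;
    let entry: any;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines instead of aborting the read
    }
    const msg = entry?.message;
    const usage = msg?.usage;
    if (!msg || typeof msg.id !== "string" || !usage) continue;
    if (state.seenIds.has(msg.id)) continue; // streaming repeats the same ID
    state.seenIds.add(msg.id);
    records.push({
      messageId: msg.id,
      model: String(msg.model ?? "unknown"),
      inputTokens: Number(usage.input_tokens ?? 0),
      outputTokens: Number(usage.output_tokens ?? 0),
    });
  }
  state.offset = pos;
  return records;
}

// Rates are USD per 1M tokens and must come from a real cost table.
function costUsd(r: UsageRecord, inputRate: number, outputRate: number): number {
  return (r.inputTokens * inputRate + r.outputTokens * outputRate) / 1_000_000;
}
```

Keeping the offset and the seen-ID set in one state object means a hook invocation can persist both between runs and never re-parse earlier lines.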

Drift Detection

Per PostToolUse:
  scoreHeuristic(session, recentEvents, objective)
    → Keyword overlap, action relevance, error frequency
    → Returns 0-100 score

  slidingDriftScore(history)
    → Weighted average over recent N events
    → More recent events weighted higher

  Optional LLM drift:
    → Call configured provider with objective + recent events
    → LLM rates alignment 0-100
    → Cached, not run on every event

  Thresholds:
    ok: 70-100
    warning: 40-69
    critical: 0-39
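
A minimal sketch of the sliding score and the thresholds listed above. The source says only that recent events are weighted higher, so the linear weights, the default window of 5, and the choice to return 100 for an empty history are assumptions. The function below illustrates the idea and is not Hawkeye's slidingDriftScore implementation.

```typescript
type DriftFlag = "ok" | "warning" | "critical";

// Weighted average over the last `window` scores.
// The i-th score in the window gets weight i + 1, so newer scores count more.
function slidingDriftScore(history: number[], window: number = 5): number {
  const recent = history.slice(-window);
  if (recent.length === 0) return 100; // assumption: no events means no drift yet
  let weighted = 0;
  let totalWeight = 0;
  recent.forEach((score, i) => {
    const weight = i + 1;
    weighted += score * weight;
    totalWeight += weight;
  });
  return Math.round(weighted / totalWeight);
}

// Thresholds as listed above: ok 70-100, warning 40-69, critical 0-39.
function flagFor(score: number): DriftFlag {
  if (score >= 70) return "ok";
  if (score >= 40) return "warning";
  return "critical";
}

const combined = slidingDriftScore([90, 90, 30]); // (90*1 + 90*2 + 30*3) / 6 = 60
const combinedFlag = flagFor(combined);           // "warning"
```

With linear weights, a single low score at the end of an otherwise healthy run is enough to move the session from ok to warning, which is the behavior the "more recent events weighted higher" rule aims for.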

Guardrail Block Flow

PreToolUse violation detected:
  → Write guardrail block response to stdout:
      { "decision": "block", "reason": "<message>" }
  → Exit 2 (Claude Code interprets as block)
  → Record guardrail_block event in traces.db

Review gate (require_approval):
  → Write file-based approval request
  → Poll for human approval before proceeding
  → On timeout → deny
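
The block flow above can be sketched as a pure function that returns the exit code and stdout payload a hook handler would emit. The function names and parameter order (toolName, toolInput, config) mirror the hook flow described earlier. The config field names protectedPaths and blockedCommands are hypothetical, as is the assumption that file protection applies to the Write and Edit tools and command blocking to Bash. The sketch reproduces the response shape and exit code as the flow describes them; it does not cover how Claude Code consumes them.

```typescript
interface GuardrailConfig {
  protectedPaths: string[];   // hypothetical field name
  blockedCommands: string[];  // hypothetical field name
}

interface ToolInput {
  file_path?: string;
  command?: string;
}

interface HookResult {
  exitCode: 0 | 2;
  stdout: string;
}

// Blocks writes to a protected file, or to anything under a protected directory.
function checkFileProtection(toolName: string, toolInput: ToolInput, config: GuardrailConfig): string | null {
  const path = toolInput.file_path;
  if ((toolName !== "Write" && toolName !== "Edit") || path === undefined) return null;
  const hit = config.protectedPaths.find(
    (p) => path === p || path.startsWith(p.endsWith("/") ? p : p + "/")
  );
  return hit !== undefined ? `Write to protected path: ${hit}` : null;
}

// Blocks shell commands that contain a configured pattern.
function checkDangerousCommand(toolName: string, toolInput: ToolInput, config: GuardrailConfig): string | null {
  const command = toolInput.command;
  if (toolName !== "Bash" || command === undefined) return null;
  const hit = config.blockedCommands.find((pattern) => command.includes(pattern));
  return hit !== undefined ? `Blocked command pattern: ${hit}` : null;
}

// First violation wins; no violation means the tool call is allowed.
function evaluatePreToolUse(toolName: string, toolInput: ToolInput, config: GuardrailConfig): HookResult {
  const reason =
    checkFileProtection(toolName, toolInput, config) ??
    checkDangerousCommand(toolName, toolInput, config);
  if (reason === null) return { exitCode: 0, stdout: "" };
  return { exitCode: 2, stdout: JSON.stringify({ decision: "block", reason }) };
}
```

Keeping the evaluation free of I/O makes each guardrail easy to unit-test; a real handler would wrap this with stdin parsing, the write to stdout, and the process exit.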

Overnight Mode

hawkeye overnight --budget <$N>
  → Apply cost limits, file protection, command blocking
  → Start hawkeye serve (web dashboard)
  → Start hawkeye daemon (background recorder)
  → Auto-pause on critical drift
  → Backup .hawkeye/config.json
  → On Ctrl+C:
      → Generate morning report (per-session stats, drift, errors)
      → Optionally: LLM post-mortem
      → Fire webhooks: overnight_report, session_complete, task_complete
      → Restore original config

Phase-to-Artifact Map

Phase	Artifact
hawkeye init	`.hawkeye/config.json`, `.hawkeye/traces.db`
PreToolUse	TraceEvent (guardrail_block or pre-action record)
PostToolUse	TraceEvent (command/file_write/llm_call/etc.) with cost+drift
Stop	Drift snapshot update in traces.db
hawkeye end	Session row marked completed, final drift saved
hawkeye report	Markdown/text report from session data
post_mortem MCP tool	LLM-generated structured post-mortem

Memory & Context

Hawkeye — Memory & Context

Storage Backend

All memory is stored in SQLite at .hawkeye/traces.db. No external memory provider, no vector store.

.hawkeye/
  traces.db          — All session data: events, drift, cost, guardrail violations
  config.json        — Hawkeye config (guardrails, drift provider, API keys, webhooks)

Session-Level Context

Each session record stores:

id — UUID
objective — user-provided at hawkeye record -o
agent — auto-detected or provided via --agent
model — LLM model name
status — recording | paused | completed | aborted
started_at, ended_at
final_drift_score — 0-100

Event Log

The primary memory artifact. Every action is a TraceEvent in traces.db:

Event Type	Description
`command`	Shell command executed
`file_read`	File opened for reading
`file_write`	File modified
`file_delete`	File deleted
`file_rename`	File renamed/moved
`api_call`	External HTTP request
`llm_call`	LLM API call with token/cost data
`decision`	Agent-logged decision (via MCP `log_event`)
`error`	Error encountered
`git_commit`	Git commit during session
`git_checkout` / `git_push` / `git_pull` / `git_merge`	Git operations
`guardrail_trigger`	Guardrail warning fired
`guardrail_block`	Action blocked by guardrail
`drift_alert`	Drift threshold crossed

Drift Snapshots

Stored in traces.db per session checkpoint. The Stop hook updates the snapshot after every Claude Code response. Fields:

session_id
score (0-100)
flag (ok / warning / critical)
reason (LLM explanation, if LLM drift enabled)
suggestion
timestamp

Memory Diff Engine (`packages/core/src/analysis/memory-diff.ts`)

Cross-session comparison. Extracts structured "memories" from event logs and diffs between sessions:

Memory categories:

file_knowledge — files the agent touched/understood
error_lesson — errors encountered and their fixes
correction — agent self-corrections
tool_pattern — recurring tool usage patterns
decision — explicit agent decisions (via log_event)
dependency_fact — package/version facts
api_knowledge — API behavior facts

Diff output types:

learned — knowledge gained in session B not in A
forgotten — knowledge present in A but absent in B
retained — knowledge stable across both sessions
evolved — knowledge that changed between sessions
contradicted — conflicting claims across sessions
hallucinations — recurring false beliefs detected (nonexistent_file, phantom_api, contradicted_fact, recurring_error)
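
The first four diff output types can be sketched as a keyed comparison between two sessions' memories. The Memory shape and the rule for matching entries (category plus key) are assumptions, since the source does not show how memories are identified. The contradicted and hallucinations outputs need semantic checks and are left out of this sketch.

```typescript
type DiffKind = "learned" | "forgotten" | "retained" | "evolved";

// Hypothetical memory shape: a category from the list above, a key, and a value.
interface Memory {
  category: string;
  key: string;
  value: string;
}

// Compares session A (earlier) with session B (later).
// Entries are matched on category + key; equal values are "retained",
// differing values are "evolved".
function diffMemories(a: Memory[], b: Memory[]): Record<DiffKind, Memory[]> {
  const id = (m: Memory): string => `${m.category}:${m.key}`;
  const inA = new Map<string, Memory>();
  for (const m of a) inA.set(id(m), m);

  const result: Record<DiffKind, Memory[]> = {
    learned: [],
    forgotten: [],
    retained: [],
    evolved: [],
  };
  const seenInB = new Set<string>();
  for (const m of b) {
    const k = id(m);
    seenInB.add(k);
    const previous = inA.get(k);
    if (previous === undefined) result.learned.push(m);
    else if (previous.value === m.value) result.retained.push(m);
    else result.evolved.push(m);
  }
  // Anything in A that never appeared in B was forgotten.
  for (const m of a) {
    if (!seenInB.has(id(m))) result.forgotten.push(m);
  }
  return result;
}
```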

Cumulative Memory

buildCumulativeMemory(sessionMemories) builds a cross-session knowledge base from up to 20 past sessions. Used by MCP tool check_memory to give the agent context about what was learned in prior runs.

Context Window Management

Hawkeye does not manage Claude Code's context window directly. However:

JSONL reader — hook-handler reads Claude Code's .jsonl session file incrementally (byte offset tracking) to avoid re-parsing the full context on every hook invocation.
Drift prompt — passes only recent N events (not full history) to the LLM for drift analysis. Exact N is configurable.
Memory diff — caps at 20 sessions for cumulative memory to bound computation.
No context injection — Hawkeye does not inject summaries or memories back into Claude Code's context. Agent self-awareness is mediated entirely through MCP tool calls (check_memory, get_objective, check_drift).

Orchestration

Hawkeye — Orchestration

Orchestration Model

Hawkeye is primarily a single-session observability tool, not an orchestrator. However, it includes a multi-agent Swarm mode and an Arena mode for competitive agent evaluation.

Swarm Mode (`hawkeye swarm`)

Coordinates multiple AI agents in parallel. Each agent runs in an isolated git worktree.

hawkeye swarm --config swarm.json
  → Validate SwarmConfig (agent definitions, dependencies, scope)
  → resolveDependencies(agents) → topological sort
  → For each agent (respecting dependency order):
      → createWorktree(cwd, worktreePath, branch)
      → Build full prompt: task + scope note + context
      → buildAgentInvocation(persona.command, fullPrompt)
      → Spawn agent process in worktree
      → Record SwarmAgent row in traces.db

  → On agent completion:
      → detectConflicts(worktreePaths)
      → suggestMergeOrder(results)
      → mergeAgent(cwd, branch, agentName)
        → "git merge <branch> --no-ff -m 'swarm: merge <name>'"
      → removeWorktree(cwd, path, branch)
      → Fire webhooks

  → Live TUI overlay: per-agent status, cost, duration, color

SwarmConfig fields:

agents — array of AgentPersona (name, command, task, scope, color, args, timeout)
dependencies — agent dependency graph (A depends on B)
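
The dependency ordering that resolveDependencies(agents) performs can be illustrated with Kahn's algorithm, a standard topological sort. This is a sketch under assumptions: resolveOrder is a stand-in name, and the shape of the dependency map (agent name to the names it depends on) is inferred from "A depends on B" rather than taken from the SwarmConfig type.

```typescript
// deps maps an agent name to the agents it depends on:
// if A depends on B, then deps["A"] includes "B" and B must run first.
function resolveOrder(agents: string[], deps: Record<string, string[]>): string[] {
  const remaining = new Map<string, number>();   // unmet dependency count
  const dependents = new Map<string, string[]>(); // reverse edges
  for (const a of agents) {
    remaining.set(a, 0);
    dependents.set(a, []);
  }
  for (const a of agents) {
    for (const d of deps[a] ?? []) {
      const list = dependents.get(d);
      if (list === undefined) throw new Error(`Unknown dependency: ${d}`);
      remaining.set(a, (remaining.get(a) ?? 0) + 1);
      list.push(a);
    }
  }
  // Start with agents that depend on nothing, in declaration order.
  const ready = agents.filter((a) => remaining.get(a) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const next = ready.shift() as string;
    order.push(next);
    for (const child of dependents.get(next) ?? []) {
      const left = (remaining.get(child) ?? 0) - 1;
      remaining.set(child, left);
      if (left === 0) ready.push(child);
    }
  }
  // If some agents were never released, their dependencies form a cycle.
  if (order.length !== agents.length) throw new Error("Dependency cycle detected");
  return order;
}

const order = resolveOrder(["api", "ui", "tests"], { ui: ["api"], tests: ["api", "ui"] });
// order is ["api", "ui", "tests"]
```

Failing fast on cycles and unknown names matters here, because a swarm that starts with an unsatisfiable graph would otherwise leave some agents waiting indefinitely.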

Scope enforcement:

Injected into agent prompt: "IMPORTANT: You are ONLY allowed to modify files matching: <include>. Do NOT modify: <exclude>."
Not enforced at the kernel/OS level — relies on prompt injection
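
For illustration, the scope note quoted above could be assembled like this. The function name buildScopeNote, the comma-separated rendering of patterns, and the handling of empty lists are assumptions; only the wording of the two sentences comes from the source.

```typescript
// Builds the prompt fragment that tells an agent which files it may touch.
// This is advisory text only: nothing here prevents the agent from ignoring it.
function buildScopeNote(include: string[], exclude: string[]): string {
  const parts: string[] = [];
  if (include.length > 0) {
    parts.push(`IMPORTANT: You are ONLY allowed to modify files matching: ${include.join(", ")}.`);
  }
  if (exclude.length > 0) {
    parts.push(`Do NOT modify: ${exclude.join(", ")}.`);
  }
  return parts.join(" ");
}
```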

Conflict detection:

detectConflicts(worktreePaths) — checks git diff overlap across worktrees before merge
suggestMergeOrder(results) — topological merge order recommendation
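
A file-level approximation of the overlap check, as a sketch. It assumes the changed files per agent have already been collected (for example with git diff --name-only against the base branch in each worktree) and reports every file touched by more than one agent. The name findOverlaps and the input shape are hypothetical, and the real detectConflicts may compare diffs at a finer granularity.

```typescript
// changes maps an agent name to the files changed in that agent's worktree.
// Returns a map from file path to the agents that changed it,
// containing only files changed by two or more agents.
function findOverlaps(changes: Record<string, string[]>): Map<string, string[]> {
  const owners = new Map<string, string[]>();
  for (const agent of Object.keys(changes)) {
    for (const file of changes[agent]) {
      const list = owners.get(file) ?? [];
      // Guard against the same file being listed twice for one agent.
      if (list.indexOf(agent) === -1) list.push(agent);
      owners.set(file, list);
    }
  }
  const conflicts = new Map<string, string[]>();
  owners.forEach((agents, file) => {
    if (agents.length > 1) conflicts.set(file, agents);
  });
  return conflicts;
}
```

Two agents editing the same file does not guarantee a merge conflict, so a result from this check is best read as "needs review before merge" rather than as a hard failure.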

Arena Mode (`hawkeye arena`)

Runs two or more agents on the same task independently (competitive evaluation). Compare results afterward via hawkeye compare. Primarily for benchmarking/evaluation; not a production orchestration pattern.

Daemon Mode (`hawkeye daemon`)

Background process that monitors active sessions. Used in overnight mode:

Polls for active sessions
Applies auto-pause on critical drift
Fires webhooks on threshold events
No spawning of new agents — pure monitoring

MCP Self-Awareness (Agent→Hawkeye Protocol)

Running agents call back to Hawkeye via 27 MCP tools. This is the primary bidirectional communication channel:

Direction	Mechanism
Hawkeye → Agent	Hook stdout JSON (`{ "decision": "block", "reason": "..." }`)
Agent → Hawkeye	MCP tool call (`check_drift`, `check_guardrail`, `log_event`, etc.)

Agents are expected to call check_drift every ~10 actions and self_assess when something feels off. Hawkeye provides corrections via auto_correct / get_correction MCP tools (Autonomous Control Layer).

Webhook Integration

Hawkeye fires webhooks at session events. Configured in .hawkeye/config.json:

Event	Trigger
`session_complete`	Session ended
`task_complete`	Task within session marked done
`overnight_report`	Overnight mode shutdown
`guardrail_block`	Action blocked by guardrail
`drift_alert`	Drift threshold crossed

No Planner / No LLM Routing

Hawkeye does not include a planner, task decomposer, or LLM-based router. Task decomposition is delegated to the wrapped agent. Hawkeye's role is observe, record, guard, and report — not plan.

UI & CLI Surface

Hawkeye — UI & CLI Surface

CLI Binary

Binary	Source	Install
`hawkeye`	`packages/cli/dist/index.js`	`npm install -g hawkeye-ai`

CLI Commands (Full List)

Command	Purpose
`hawkeye init`	Initialize Hawkeye in project (creates .hawkeye/)
`hawkeye hooks install`	Install Claude Code hooks
`hawkeye hooks install --guardrails-only`	Install only guardrail hooks
`hawkeye hooks status`	Check installed hooks
`hawkeye hooks uninstall`	Remove hooks
`hawkeye record -o "<objective>" -- <cmd>`	Wrap any CLI agent with recording
`hawkeye sessions`	List recorded sessions
`hawkeye inspect <id>`	Detailed session inspection
`hawkeye watch <id>`	Live event stream (tail -f style)
`hawkeye compare <id1> <id2>`	Side-by-side session comparison
`hawkeye replay <id>`	Interactive session replay
`hawkeye end`	End active session
`hawkeye revert <id>`	Revert file changes from session
`hawkeye export <id>`	Export session as JSON
`hawkeye analyze <id>`	Root cause analysis
`hawkeye memory diff <A> <B>`	Memory diff between sessions
`hawkeye memory hallucinations`	Recurring false beliefs across sessions
`hawkeye memory`	Cumulative cross-session memory
`hawkeye swarm --config swarm.json`	Multi-agent swarm run
`hawkeye swarm init`	Generate swarm config template
`hawkeye policy init`	Initialize security policy
`hawkeye policy check`	Check policy
`hawkeye report`	Generate morning report
`hawkeye report --llm --webhook`	With LLM post-mortem + webhook notification
`hawkeye serve`	Launch web dashboard at :4242
`hawkeye daemon`	Start background monitoring daemon
`hawkeye overnight --budget <$N>`	Overnight unattended mode
`hawkeye remote`	serve + daemon + Cloudflare tunnel
`hawkeye mcp`	Show MCP server setup instructions
`hawkeye ci`	Post session report to GitHub PR
`hawkeye approve`	Approve pending review gate actions
`hawkeye` (no args)	Launch interactive TUI

Interactive TUI

Invoked by running hawkeye with no arguments. Readline-based interface with slash command autocomplete.

37 slash commands available (from interactive-constants.ts):

/new — New session (pick agent + objective)
/attach — Launch agent on active session
/sessions — List & manage sessions
/active — Current recording
/watch — Live event stream
/stats — Session or global statistics
/inspect — Detailed session inspection
/compare — Compare sessions side by side
/replay — Replay a session (interactive)
/firewall — Show recent interceptions & impact previews
/analyze — Root cause analysis
/memory — Memory diff
/autocorrect — Autonomous control
/swarm — Multi-agent orchestration
/settings — Configure Hawkeye
/overnight — Overnight mode
/serve — Open dashboard :4242
/ci — Post to GitHub PR
... (19 additional commands)

Auto-record: If you type a known agent command (claude, cline, codex), the TUI automatically wraps it with recording.

Web Dashboard

URL	Page	Purpose
`:4242`	Sessions	List + filter all sessions
`:4242/session/<id>`	Session Detail	Timeline, replay, file changes, cost breakdown
`:4242/compare`	Compare	Side-by-side multi-session comparison
`:4242/firewall`	Firewall	Guardrail management (file protection, command blocking)
`:4242/tasks`	Tasks	Remote task submission
`:4242/agents`	Agents (Swarm)	Live spawned agents, follow-up, relaunch, cost tracking
`:4242/memory`	Memory	Memory diff between sessions
`:4242/analyze`	Analyze	Root cause analysis for failed runs

Stack: React + Vite (TypeScript) in packages/dashboard/. Served by hawkeye serve via Node.js HTTP server + WebSocket for live updates.

SKILL.md (Claude Code Skill)

Located at skills/hawkeye/SKILL.md. Activates through the skill system when Claude Code detects a Hawkeye-relevant user request. Contains:

Activation trigger list (when to use Hawkeye vs native)
Setup instructions for hook vs MCP mode
All core commands with examples
Troubleshooting runbook
Output format guidance

OpenTelemetry Export

hawkeye otel-export exports session events as OpenTelemetry traces, which enables integration with external observability platforms (Jaeger, Grafana, etc.).

CI Integration

hawkeye ci posts the session report as a GitHub Check Run and a PR comment, so session cost, drift, and guardrail violations surface directly in pull requests.

Related frameworks

same archetype · same primary tool · same memory type

Taskmaster AI ★ 27k

A3 MCP-anchored

Converts a PRD into a dependency-ordered JSON task graph that AI coding agents execute one task at a time, eliminating context…

ccmemory ★ 1

A3 MCP-anchored

Accumulates decisions, corrections, and failed approaches from Claude Code sessions into a queryable Neo4j graph so each new…

Pimzino spec-workflow-mcp ★ 4.2k

A3 MCP-anchored

MCP server providing spec-driven development workflow with dashboard-backed approval gates, implementation logging, and VSCode…

MCP Shrimp Task Manager ★ 2.1k

A3 MCP-anchored

Convert natural language requests into structured AI development tasks with chain-of-thought enforcement, reflection gates, and…

Bernstein ★ 460

A3 MCP-anchored

Govern parallel CLI coding agents with a deterministic Python scheduler, HMAC-chained audit trail, and compliance-ready signed…

LeanSpec ★ 252

A3 MCP-anchored

Provides a unified spec CLI and MCP server over any existing spec backend (markdown, GitHub Issues, ADO), making spec-driven…

Distribution

Type: npm-package
License: MIT
Install: simple
Version: 0.3.1

Surfaces

CLI binary: hawkeye
CLI subcmds: 35
Local UI: web-dashboard
UI port: 4242
Tech stack: React + Vite (TypeScript) + WebSocket + Node.js HTTP server

Components

Commands: 0
Skills: 1
Subagents: 0
Hooks: 3
MCP servers: 1
MCP tools: 27
Scripts: 0
Templates: 0

Workflow

Phases: 5
Approval gates: 6
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: Yes
Pattern: swarm-git-worktree
Isolation: git-worktree
Consensus: topological-merge-order
Prompt chaining: No

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: event-driven
Crash recovery: No
Compaction: No
Session handoff: Yes
Streaming: Yes

Memory

Type: sqlite
Persistence: project
Search: sql-query
State files: 2 files

Quality

TDD: No
TDD mechanism: none
Self-review: external-llm

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: Yes
Worktree/feat: Yes
Audit log: Yes
Audit format: sqlite
Replay: Yes

Tools

Primary: claude-code
Targets: 6
Portability: high

Signals

Stars: 5
Last commit: 2026-04-14
Contributors: 1
Maintainer: active
Quality score: 5.7/10
