Batch 17 — Hooks / TDD / Governance / Observability

Roster (10)

slug	stars	distribution	cli_binary	local_ui	hooks.events	enforcement_layer	tier
tdd-guard	2149	claude-plugin	yes (tdd-guard)	no	PreToolUse, UserPromptSubmit, SessionStart	PreToolUse LLM-validator (exit 2)	A
vibesec	21	methodology-repo	no	no	none	static analysis (pre-write rule files)	A
agentlint	14	npm-package	yes (4 binaries)	no	SessionStart	static linter (CLAUDE.md + hooks quality)	A
cc-audit	12	npm-package	no	no	none	static analysis (PostToolUse output scan)	A
claude-code-guardrails	11	bash-bundle	no	no	PreToolUse, PostToolUse, Stop, PreCompact	shell scripts (exit 2 block + auto-commit)	A
ai-governor-framework	9 (archived)	methodology-repo	no	no	none	prompt-only (protocols + review command)	A
gaai-framework	35	claude-plugin	no	no	none	skill injection + OS-process isolation	A
leash-strongdm	197	go-binary	yes (leash)	yes (Next.js :18080)	none (OS-layer)	eBPF LSM + Cedar policy (kernel-level)	A
agent-governance-toolkit	892	multi-lang-sdk	yes (agt)	no	SessionStart, UserPromptSubmit, PreToolUse	YAML policy engine + 12-vector prompt defense	A
hawkeye	5	npm-package	yes (hawkeye)	yes (React :4242)	PreToolUse, PostToolUse, Stop	guardrails (exit 2) + full session recording	A

Intra-batch patterns

All 10 frameworks share the same underlying concern (preventing AI agents from doing unsafe or off-task things) but intervene at very different layers. The enforcement layer is the clearest axis of variation:

Layer 1 — OS/Kernel: Leash intercepts at the eBPF LSM level, below Node.js, below the shell. Policies are Cedar (AWS verification-grade), not YAML. This is the only framework that cannot be bypassed by a compromised agent process.

Layer 2 — Claude Code hooks: TDD Guard (PreToolUse LLM validator), Claude Code Guardrails (shell scripts on Write/Edit + Stop), AGT (SessionStart injection + UserPromptSubmit defense + PreToolUse policy), Hawkeye (PreToolUse guardrails + PostToolUse recording + Stop drift snapshot). These four are mechanically similar in hook structure but diverge sharply in purpose: TDD Guard enforces phase discipline, Guardrails enforces branch safety, AGT enforces injection defense, Hawkeye enforces guardrails while also building a full observability record.

Layer 3 — Static analysis: CC Audit (pattern-matching on session output for risky signals) and AgentLint (linting CLAUDE.md and hook configs for quality issues) operate on artifacts, not live tool calls. Neither can block; both surface findings.

Layer 4 — Methodology/Prompt: AI Governor Framework and VibeSec operate at the convention level. AI Governor is a pure prompt-discipline framework (archived, superseded by GAAI). VibeSec's differentiator is the dual-format build system (definitions/ → .mdc + .md) for cross-IDE rule distribution.

GAAI (Fr-e-d/GAAI-framework) straddles layers: its 61 skills enforce workflow discipline and OS-process isolation for discovery vs delivery roles, but no Claude Code hooks are deployed. It is the most ambitious framework in the batch architecturally (v2.39.0, ELv2 license, backlog-as-contract pattern) but also the most opinionated about workflow as the safety mechanism.
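
The exit-2 blocking pattern shared by the Layer 2 hook frameworks can be sketched as follows. This is a minimal illustration, not code from any of the four repos: the protected-name policy and the function names decide and run_hook are hypothetical, and the payload fields (tool_name, tool_input.file_path) are assumed to follow Claude Code's PreToolUse hook input.

```python
import json
from pathlib import PurePosixPath

# Hypothetical policy for illustration; these names do not come from
# any framework in this batch.
PROTECTED_NAMES = {".git", ".env", "secrets"}
WRITE_TOOLS = {"Write", "Edit"}


def decide(event: dict) -> tuple[int, str]:
    """Map a PreToolUse payload to (exit_code, message_for_stderr)."""
    if event.get("tool_name") not in WRITE_TOOLS:
        return 0, ""  # not a write: allow
    tool_input = event.get("tool_input") or {}
    if not isinstance(tool_input, dict):
        return 2, "Blocked: unexpected tool_input shape"
    path = str(tool_input.get("file_path", ""))
    # Block when any path component is a protected name.
    if PROTECTED_NAMES & set(PurePosixPath(path).parts):
        return 2, f"Blocked write to protected path: {path}"
    return 0, ""


def run_hook(stdin_text: str) -> tuple[int, str]:
    """Parse the raw stdin payload; block if it cannot be understood."""
    try:
        event = json.loads(stdin_text)
    except json.JSONDecodeError:
        return 2, "Blocked: unreadable hook payload"
    if not isinstance(event, dict):
        return 2, "Blocked: unexpected hook payload shape"
    return decide(event)
```

A wrapper script would call run_hook(sys.stdin.read()), print the message to stderr, and pass the code to sys.exit. Exit code 2 is what turns the check into a block rather than a suggestion; the sketch also blocks on unparseable input so that a malformed payload cannot slip through.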

Most interesting finds

TDD Guard's LLM-as-secondary-validator pattern: A PreToolUse hook calls a second LLM (not Claude Code's primary session) to evaluate whether the pending write violates the Red/Green/Refactor cycle. This is the only framework in the batch (and likely in the entire corpus) where a TDD violation produces a structural exit-2 block rather than relying on a prompt instruction. The secondary LLM receives the full phase context (current tests, implementation, active phase) and reasons about TDD compliance before the primary agent can write.
Hawkeye's JSONL cost reader: Hawkeye reads real token costs from Claude Code's own ~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl files by tracking byte offsets and deduplicating by message ID. This produces accurate per-action cost attribution without API key passthrough. No other observability tool in the corpus does this.
Leash's Cedar + eBPF architecture: Cedar is an AWS-developed, formally-verified policy language. Combined with eBPF LSM enforcement, Leash enforces policies at the kernel level — the agent process cannot bypass a Cedar deny any more than a userspace process can bypass a kernel ACL. This is categorically stronger than hook-based enforcement, which operates inside the same Node.js runtime as the agent.
AGT's 12-vector prompt injection defense: The UserPromptSubmit hook runs a 12-vector analysis (unicode homoglyphs, RTL override, Base64 obfuscation, length attacks, context poisoning, MCP tool poisoning, etc.) and fails closed (exit 2) on detection. This is the most comprehensive prompt injection defense at the hook layer in the corpus.
GAAI's author continuity (Fr-e-d): The ai-governor-framework README explicitly states it is "superseded by GAAI-framework" and links to the newer repo. This is a rare case of documented framework succession under the same author. GAAI represents a full architectural rewrite: from pure prompt protocols to 61 skills, OS-process isolation, and an ELv2 commercial license.
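
The byte-offset and message-ID technique described for Hawkeye's JSONL cost reader can be sketched as below. This is an illustration under stated assumptions, not Hawkeye's code: the class name UsageTailer is hypothetical, and the record shape (a message object carrying id and usage.input_tokens / usage.output_tokens) is assumed. Converting tokens to currency is left out.

```python
import json
from pathlib import Path


class UsageTailer:
    """Incrementally sum token usage from an append-only JSONL session log."""

    def __init__(self, path):
        self.path = Path(path)
        self.offset = 0        # byte offset of the first unread line
        self.seen_ids = set()  # message IDs already counted
        self.totals = {"input_tokens": 0, "output_tokens": 0}

    def poll(self) -> int:
        """Read lines appended since the last poll; return messages counted."""
        counted = 0
        with self.path.open("rb") as fh:
            fh.seek(self.offset)
            while True:
                line = fh.readline()
                if not line.endswith(b"\n"):
                    break  # EOF or a half-written line: retry on next poll
                self.offset = fh.tell()
                try:
                    record = json.loads(line)
                except ValueError:
                    continue  # skip malformed lines
                message = record.get("message") if isinstance(record, dict) else None
                if not isinstance(message, dict):
                    continue
                msg_id, usage = message.get("id"), message.get("usage")
                if not msg_id or not isinstance(usage, dict) or msg_id in self.seen_ids:
                    continue  # no usage data, or a duplicate of a counted message
                self.seen_ids.add(msg_id)
                for key in self.totals:
                    self.totals[key] += int(usage.get(key, 0) or 0)
                counted += 1
        return counted
```

Remembering the offset means each poll reads only the newly appended bytes, and the seen-ID set keeps a message that appears on more than one line from being counted twice.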
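
A few of the vectors named for AGT's UserPromptSubmit defense (RTL override, homoglyphs, Base64 obfuscation, length attacks) can be approximated with simple checks. The sketch below is not AGT's implementation and covers only four of the twelve vectors; the thresholds, the mixed-script heuristic, and the names scan and hook_exit_code are assumptions made for illustration.

```python
import base64
import binascii
import re
import unicodedata

# Hypothetical thresholds; real deployments would tune these.
MAX_PROMPT_CHARS = 20_000
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")
BIDI_CONTROLS = {"\u202a", "\u202b", "\u202d", "\u202e", "\u2066", "\u2067", "\u2068"}
LOOKALIKE_SCRIPTS = {"CYRILLIC", "GREEK"}


def _scripts(word: str) -> set:
    """Return the Unicode script prefixes (e.g. LATIN) of a word's letters."""
    return {unicodedata.name(ch, "").split(" ")[0] for ch in word if ch.isalpha()}


def scan(prompt: str) -> list:
    """Return the names of the vectors detected in the prompt."""
    findings = []
    if len(prompt) > MAX_PROMPT_CHARS:
        findings.append("length")
    if any(ch in BIDI_CONTROLS for ch in prompt):
        findings.append("bidi-override")
    for word in prompt.split():
        scripts = _scripts(word)
        # A single word mixing Latin with a lookalike script is suspicious.
        if "LATIN" in scripts and scripts & LOOKALIKE_SCRIPTS:
            findings.append("homoglyph")
            break
    for match in BASE64_RUN.finditer(prompt):
        try:
            base64.b64decode(match.group(), validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid Base64, so not treated as a hidden payload
        findings.append("base64")
        break
    return findings


def hook_exit_code(prompt: str) -> int:
    """Fail closed: block (2) on any finding or on any scanner error."""
    try:
        return 2 if scan(prompt) else 0
    except Exception:
        return 2
```

Returning 2 when the scanner itself raises is the fail-closed part of the design: an input crafted to crash the checker is blocked rather than waved through.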

Items written as Tier C

None. All 10 frameworks had sufficient source code and documentation for full 11-file analysis.

Cross-references discovered

ai-governor-framework → gaai-framework: Same author (Fr-e-d), explicit succession. AI Governor (archived) is the protocol-only predecessor; GAAI is the skill-based successor with OS-process isolation.
tdd-guard → probity: README recommends nizos/probity for new projects as the evolution of TDD Guard.
agentlint → vibesec: AgentLint's 58-check system includes H1-H8 checks specifically for Claude Code hook quality; VibeSec's dual-format rule system generates content that AgentLint would lint.
claude-code-guardrails → cc-audit: Guardrails prevents dangerous writes via PreToolUse blocks; CC Audit scans the session output (PostToolUse) for risky patterns — complementary rather than overlapping.
hawkeye → claude-code-guardrails: Hawkeye's PreToolUse guardrail engine covers the same 7 categories (file protection, command blocking, directory scope, cost limits, token limits, network lock, review gates) but adds PostToolUse recording and Stop drift scoring that Guardrails lacks entirely.
agent-governance-toolkit: Part of a larger Microsoft ecosystem (agent-os, agent-mesh, agent-runtime, agent-sre, agent-compliance, agent-marketplace, agent-lightning, agent-hypervisor) — the Claude Code plugin is the entry-point layer; the Python packages form the full stack.