Phase B Batch 17

Batch 17 — Hooks / TDD / Governance / Observability

Roster (10)

| slug | stars | distribution | CLI binary | local UI | hook events | enforcement layer | tier |
|---|---|---|---|---|---|---|---|
| tdd-guard | 2149 | claude-plugin | yes (tdd-guard) | no | PreToolUse, UserPromptSubmit, SessionStart | PreToolUse LLM-validator (exit 2) | A |
| vibesec | 21 | methodology-repo | no | no | none | static analysis (pre-write rule files) | A |
| agentlint | 14 | npm-package | yes (4 binaries) | no | SessionStart | static linter (CLAUDE.md + hooks quality) | A |
| cc-audit | 12 | npm-package | no | no | none | static analysis (PostToolUse output scan) | A |
| claude-code-guardrails | 11 | bash-bundle | no | no | PreToolUse, PostToolUse, Stop, PreCompact | shell scripts (exit 2 block + auto-commit) | A |
| ai-governor-framework | 9 (archived) | methodology-repo | no | no | none | prompt-only (protocols + review command) | A |
| gaai-framework | 35 | claude-plugin | no | no | none | skill injection + OS-process isolation | A |
| leash-strongdm | 197 | go-binary | yes (leash) | yes (Next.js :18080) | none (OS-layer) | eBPF LSM + Cedar policy (kernel-level) | A |
| agent-governance-toolkit | 892 | multi-lang-sdk | yes (agt) | no | SessionStart, UserPromptSubmit, PreToolUse | YAML policy engine + 12-vector prompt defense | A |
| hawkeye | 5 | npm-package | yes (hawkeye) | yes (React :4242) | PreToolUse, PostToolUse, Stop | guardrails (exit 2) + full session recording | A |

Intra-batch patterns

All 10 frameworks share the same underlying concern — preventing AI agents from doing unsafe or off-task things — but intervene at radically different layers. The enforcement layer is the clearest axis of variation:

Layer 1 — OS/Kernel: Leash intercepts at the eBPF LSM level, below Node.js and below the shell. Policies are written in Cedar, AWS's formally verified authorization language, rather than YAML. It is the only framework in the batch whose enforcement cannot be bypassed by a compromised agent process.
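Cedar's decision rule is what makes a deny absolute within the policy set: a request is allowed only if at least one `permit` matches and no `forbid` matches, and anything unmatched is denied by default. The toy model below illustrates that rule only; the `Policy` class, the lambda conditions and action names such as `file.read` are hypothetical stand-ins for real Cedar syntax and for whatever entities Leash defines. The eBPF side, which keeps the agent from sidestepping the check, is not modelled.

```python
"""Toy model of Cedar's authorization semantics (not Leash's code).

Deny by default; Allow only if some `permit` matches AND no `forbid` matches.
A matching forbid therefore wins regardless of how many permits also match.
"""
from dataclasses import dataclass
from typing import Callable, Literal

Request = dict  # hypothetical shape, e.g. {"action": ..., "resource": ...}


@dataclass(frozen=True)
class Policy:
    effect: Literal["permit", "forbid"]
    # Stand-in for Cedar's policy scope plus `when`/`unless` conditions.
    matches: Callable[[Request], bool]


def is_authorized(policies: list[Policy], request: Request) -> bool:
    matched = [p for p in policies if p.matches(request)]
    if any(p.effect == "forbid" for p in matched):
        return False  # forbid overrides every permit
    return any(p.effect == "permit" for p in matched)  # no permit -> default deny


# Hypothetical policies: allow file reads, but never anything under /etc.
policies = [
    Policy("permit", lambda r: r["action"] == "file.read"),
    Policy("forbid", lambda r: r["resource"].startswith("/etc/")),
]

print(is_authorized(policies, {"action": "file.read", "resource": "/src/app.py"}))   # True
print(is_authorized(policies, {"action": "file.read", "resource": "/etc/passwd"}))   # False
print(is_authorized(policies, {"action": "net.connect", "resource": "example.com"}))  # False
```

Because there is no "allow overrides deny" mode, adding permits can never reopen a path that a forbid has closed.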

Layer 2 — Claude Code hooks: TDD Guard (PreToolUse LLM validator), Claude Code Guardrails (shell scripts on Write/Edit + Stop), AGT (SessionStart injection + UserPromptSubmit defense + PreToolUse policy), Hawkeye (PreToolUse guardrails + PostToolUse recording + Stop drift snapshot). These four are mechanically similar in hook structure but diverge sharply in purpose: TDD Guard enforces phase discipline, Guardrails enforces branch safety, AGT enforces injection defense, and Hawkeye pairs general-purpose guardrails with a full observability record.
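All four Layer 2 tools rest on the same contract: Claude Code passes the pending tool call to the hook as JSON on stdin, and exit code 2 blocks the call and returns stderr to the model. A minimal sketch of that contract, using a Guardrails-style branch-safety rule, follows; the protected-branch set, the `decide`/`run_hook` names and the message text are hypothetical, not taken from claude-code-guardrails.

```python
"""Sketch of a Claude Code PreToolUse hook that blocks edits on protected branches.

Hypothetical rule for illustration; not claude-code-guardrails' actual script.
"""
import json
import subprocess
import sys

PROTECTED_BRANCHES = {"main", "master"}  # hypothetical policy
WRITE_TOOLS = {"Write", "Edit", "MultiEdit"}


def decide(event: dict, branch: str) -> tuple[int, str]:
    """Pure decision for one PreToolUse event: (exit_code, message)."""
    if event.get("tool_name") in WRITE_TOOLS and branch in PROTECTED_BRANCHES:
        path = event.get("tool_input", {}).get("file_path", "?")
        return 2, f"Blocked: {path} would be modified directly on '{branch}'. Create a feature branch first."
    return 0, ""


def current_branch(cwd: str) -> str:
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        cwd=cwd, capture_output=True, text=True,
    )
    return result.stdout.strip()


def run_hook() -> None:
    """Entry point: read the event from stdin, print the reason, exit with the code."""
    event = json.load(sys.stdin)
    code, message = decide(event, current_branch(event.get("cwd", ".")))
    if message:
        print(message, file=sys.stderr)  # on exit 2, stderr is fed back to Claude
    sys.exit(code)
```

Registered as a `PreToolUse` command hook with a `Write|Edit|MultiEdit` matcher, the script would end by calling `run_hook()`. Keeping the decision in a pure function makes the rule testable without a live session.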

Layer 3 — Static analysis: CC Audit (pattern-matching on session output for risky signals) and AgentLint (linting CLAUDE.md and hook configs for quality issues) operate on artifacts, not live tool calls. Neither can block; both surface findings.
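The difference from Layer 2 is visible in the shape of the code: an artifact scanner returns a list of findings and has no exit-code path that could stop a tool call. The sketch below shows CC Audit-style pattern matching over captured session output; the rule names and regexes are hypothetical examples, not CC Audit's rule set.

```python
"""Sketch of artifact-level scanning: report findings, never block.

Rule names and patterns are hypothetical, not CC Audit's actual rules.
"""
import re
from dataclasses import dataclass

RULES = {
    "destructive-rm": re.compile(r"\brm\s+-(?:rf|fr)\b"),
    "curl-pipe-shell": re.compile(r"\bcurl\b[^\n|]*\|\s*(?:ba)?sh\b"),
    "aws-access-key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "force-push": re.compile(r"\bgit\s+push\b.*\s(?:--force|-f)\b"),
}


@dataclass(frozen=True)
class Finding:
    rule: str
    line_no: int
    excerpt: str


def scan(text: str) -> list[Finding]:
    """Check every line against every rule; the caller decides what to do with hits."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append(Finding(rule, line_no, line.strip()[:80]))
    return findings
```

A scanner like this can run after the fact over logs or transcripts, which is exactly why it can surface but not prevent.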

Layer 4 — Methodology/Prompt: AI Governor Framework and VibeSec operate at the convention level. AI Governor is a pure prompt-discipline framework (archived, superseded by GAAI). VibeSec's differentiator is the dual-format build system (definitions/ → .mdc + .md) for cross-IDE rule distribution.
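VibeSec's dual-format idea, one source definition rendered into both a Cursor `.mdc` rule and a plain `.md` file, can be sketched as two renderers over a shared record. The `RuleDefinition` fields, the output paths and the "no globs means always apply" choice are assumptions for illustration, not VibeSec's actual `definitions/` schema; the `.mdc` frontmatter keys (`description`, `globs`, `alwaysApply`) follow Cursor's rule format.

```python
"""Sketch of a dual-format rule build: one definition -> Cursor .mdc + plain .md.

RuleDefinition and the output layout are hypothetical, not VibeSec's schema.
"""
from dataclasses import dataclass, field


@dataclass
class RuleDefinition:
    name: str
    description: str
    body: str
    globs: list[str] = field(default_factory=list)


def to_mdc(rule: RuleDefinition) -> str:
    """Cursor-style rule: YAML-like frontmatter, then the rule body."""
    frontmatter = [
        "---",
        f"description: {rule.description}",
        f"globs: {','.join(rule.globs)}",
        # Assumption of this sketch: rules without globs apply everywhere.
        f"alwaysApply: {'false' if rule.globs else 'true'}",
        "---",
    ]
    return "\n".join(frontmatter) + "\n\n" + rule.body.strip() + "\n"


def to_md(rule: RuleDefinition) -> str:
    """Plain Markdown for tools that read ordinary instruction files."""
    return f"# {rule.name}\n\n{rule.description}\n\n{rule.body.strip()}\n"


def build(rules: list[RuleDefinition]) -> dict[str, str]:
    """Map output paths to file contents; writing them to disk is left out."""
    out = {}
    for rule in rules:
        out[f"cursor/{rule.name}.mdc"] = to_mdc(rule)
        out[f"markdown/{rule.name}.md"] = to_md(rule)
    return out
```

The point of the single source is that both formats stay in sync: a rule edited once is redistributed to every IDE target on the next build.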

GAAI (Fr-e-d/GAAI-framework) straddles layers: its 61 skills enforce workflow discipline and OS-process isolation between the discovery and delivery roles, but it deploys no Claude Code hooks. Architecturally it is the most ambitious framework in the batch (v2.39.0, ELv2 license, backlog-as-contract pattern), and also the most opinionated, treating workflow itself as the safety mechanism.

Most interesting finds

  1. TDD Guard's LLM-as-secondary-validator pattern: A PreToolUse hook calls a second LLM (separate from Claude Code's primary session) to evaluate whether the pending write violates the Red/Green/Refactor cycle. It is the only framework in the batch (and likely in the entire corpus) where a TDD violation triggers a structural exit-2 block rather than relying on a prompt instruction. The secondary LLM receives the full phase context (current tests, implementation, active phase) and judges TDD compliance before the primary agent can write.

  2. Hawkeye's JSONL cost reader: Hawkeye reads real token costs from Claude Code's own ~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl files by tracking byte offsets and deduplicating by message ID. This produces accurate per-action cost attribution without API key passthrough. No other observability tool in the corpus does this.

  3. Leash's Cedar + eBPF architecture: Cedar is an AWS-developed, formally verified policy language. Combined with eBPF LSM enforcement, Leash enforces policies at the kernel level: the agent process cannot bypass a Cedar deny any more than a userspace process can bypass a kernel ACL. This is categorically stronger than hook-based enforcement, which runs in userspace and only takes effect when the agent's own Node.js runtime invokes the hook.

  4. AGT's 12-vector prompt injection defense: The UserPromptSubmit hook runs a 12-vector analysis (unicode homoglyphs, RTL override, Base64 obfuscation, length attacks, context poisoning, MCP tool poisoning, etc.) and fails closed (exit 2) on detection. This is the most comprehensive prompt injection defense at the hook layer in the corpus.

  5. GAAI's author continuity with AI Governor: The ai-governor-framework README explicitly states it is "superseded by GAAI-framework" and links to the newer repo. This is a rare case of documented framework succession by the same author (Fr-e-d). GAAI is a full architectural rewrite: from pure prompt protocols to 61 skills, OS-process isolation, and a source-available ELv2 license.
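TDD Guard's secondary-validator pattern (item 1) reduces to: build a prompt from the phase context and the pending change, ask a second model for a verdict, and translate that verdict into an exit code. The sketch below assumes a hypothetical `ask_validator` callable standing in for the second LLM, plus an invented prompt and JSON verdict shape; none of these are TDD Guard's actual prompt or API. Failing closed on an unusable reply is a choice of this sketch.

```python
"""Sketch of the LLM-as-secondary-validator pattern (not TDD Guard's code).

`ask_validator`, the prompt text and the verdict JSON shape are hypothetical.
"""
import json
from typing import Callable

PROMPT = """You are reviewing a pending file change for TDD compliance.
Current phase: {phase}
Latest test output:
{test_output}
Pending change to {path}:
{change}
Reply with JSON: {{"decision": "allow" or "block", "reason": "..."}}"""


def validate(event: dict, phase: str, test_output: str,
             ask_validator: Callable[[str], str]) -> tuple[int, str]:
    """Return (exit_code, reason) for a PreToolUse event; exit code 2 blocks the write."""
    tool_input = event.get("tool_input", {})
    prompt = PROMPT.format(
        phase=phase,
        test_output=test_output,
        path=tool_input.get("file_path", "?"),
        # Write sends the full `content`; Edit sends `new_string`.
        change=tool_input.get("content") or tool_input.get("new_string", ""),
    )
    try:
        verdict = json.loads(ask_validator(prompt))
    except (json.JSONDecodeError, TypeError):
        verdict = None
    if not isinstance(verdict, dict) or verdict.get("decision") not in ("allow", "block"):
        # Fail closed on an unusable reply (a design choice of this sketch).
        return 2, "Validator reply was unusable; blocking to stay safe."
    if verdict["decision"] == "block":
        return 2, verdict.get("reason", "TDD violation")
    return 0, ""
```

Injecting the model call as a plain function keeps the hook logic testable with a stub, and makes it easy to swap which model acts as the validator.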
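Hawkeye's cost reader (item 2) combines two techniques: remembering a byte offset so each poll reads only newly appended transcript lines, and deduplicating by message ID so a response logged on several lines is counted once. The sketch below assumes assistant lines shaped roughly like `{"type": "assistant", "message": {"id": ..., "usage": {...}}}`; that shape and the usage field names are assumptions about Claude Code's transcript files, and this is not Hawkeye's implementation. Converting tokens to dollars would additionally need a per-model price table.

```python
"""Sketch of incremental JSONL reading with byte offsets + message-ID dedup.

The transcript line shape is an assumption; this is not Hawkeye's code.
"""
import json
from pathlib import Path

USAGE_FIELDS = ("input_tokens", "output_tokens",
                "cache_creation_input_tokens", "cache_read_input_tokens")


class TranscriptReader:
    def __init__(self, path: Path) -> None:
        self.path = path
        self.offset = 0                      # bytes already consumed
        self.seen_ids: set[str] = set()
        self.totals = dict.fromkeys(USAGE_FIELDS, 0)

    def poll(self) -> int:
        """Read bytes appended since the last poll; return newly counted messages."""
        new_messages = 0
        with self.path.open("rb") as f:
            f.seek(self.offset)
            for raw in f:
                if not raw.endswith(b"\n"):
                    break                    # partial line still being written; retry later
                self.offset += len(raw)
                try:
                    entry = json.loads(raw)
                except json.JSONDecodeError:
                    continue
                if not isinstance(entry, dict) or entry.get("type") != "assistant":
                    continue
                message = entry.get("message")
                if not isinstance(message, dict):
                    continue
                msg_id, usage = message.get("id"), message.get("usage")
                if not msg_id or not usage or msg_id in self.seen_ids:
                    continue                 # malformed, or a repeat of a counted message
                self.seen_ids.add(msg_id)
                for key in USAGE_FIELDS:
                    self.totals[key] += usage.get(key, 0)
                new_messages += 1
        return new_messages
```

Stopping at an unterminated line, without advancing the offset, is what makes polling safe while Claude Code is still appending to the file.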
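AGT's UserPromptSubmit defense (item 4) follows a detect-then-fail-closed shape. The sketch below implements only four illustrative vectors (bidi override characters, mixed-script homoglyph words, decodable Base64 payloads, and oversized prompts); the thresholds, patterns and function names are hypothetical and are not AGT's detectors. It assumes the hook receives the prompt text in the event's `prompt` field and that exit code 2 blocks the prompt.

```python
"""Sketch of a fail-closed UserPromptSubmit check (4 illustrative vectors, not AGT's 12)."""
import base64
import binascii
import re
import unicodedata

BIDI_CONTROLS = {"\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
                 "\u2066", "\u2067", "\u2068", "\u2069"}
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")
MAX_PROMPT_CHARS = 20_000  # hypothetical length-attack threshold


def scripts_in(word: str) -> set[str]:
    """Rough script detection from Unicode character names (LATIN, CYRILLIC, ...)."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in word if ch.isalpha()}


def detect(prompt: str) -> list[str]:
    hits = []
    if any(ch in BIDI_CONTROLS for ch in prompt):
        hits.append("bidi-override")
    if any(len(scripts_in(word)) > 1 for word in prompt.split()):
        hits.append("mixed-script-homoglyph")
    for run in BASE64_RUN.findall(prompt):
        try:
            decoded = base64.b64decode(run, validate=True)
        except binascii.Error:
            continue  # not valid Base64 (e.g. wrong padding)
        # Flag only runs that decode to readable ASCII text, i.e. a hidden instruction.
        if decoded.isascii() and all(c.isprintable() or c.isspace() for c in decoded.decode("ascii")):
            hits.append("base64-payload")
            break
    if len(prompt) > MAX_PROMPT_CHARS:
        hits.append("length-attack")
    return hits


def hook_decision(event: dict) -> tuple[int, str]:
    """Any detection blocks the prompt (exit 2): fail closed."""
    hits = detect(event.get("prompt", ""))
    if hits:
        return 2, "Prompt blocked (possible injection): " + ", ".join(hits)
    return 0, ""
```

The mixed-script check is deliberately crude and would misfire on legitimate multi-script text such as Japanese; a production detector needs far more context than this.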

Items written as Tier C

None. All 10 frameworks had sufficient source code and documentation for full 11-file analysis.

Cross-references discovered

  • ai-governor-framework → gaai-framework: Same author (Fr-e-d), explicit succession. AI Governor (archived) is the protocol-only predecessor; GAAI is the skill-based successor with OS-process isolation.
  • tdd-guard → probity: README recommends nizos/probity for new projects as the evolution of TDD Guard.
  • agentlint → vibesec: AgentLint's 58-check system includes H1-H8 checks specifically for Claude Code hook quality; VibeSec's dual-format rule system generates content that AgentLint would lint.
  • claude-code-guardrails → cc-audit: Guardrails prevents dangerous writes via PreToolUse blocks; CC Audit scans the session output (PostToolUse) for risky patterns — complementary rather than overlapping.
  • hawkeye → claude-code-guardrails: Hawkeye's PreToolUse guardrail engine covers the same 7 categories (file protection, command blocking, directory scope, cost limits, token limits, network lock, review gates) but adds PostToolUse recording and Stop drift scoring that Guardrails lacks entirely.
  • agent-governance-toolkit: Part of a larger Microsoft ecosystem (agent-os, agent-mesh, agent-runtime, agent-sre, agent-compliance, agent-marketplace, agent-lightning, agent-hypervisor) — the Claude Code plugin is the entry-point layer; the Python packages form the full stack.
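AgentLint's hook-quality checks operate on configuration rather than live calls. The sketch below lints the `hooks` block of a Claude Code `settings.json` (event name → list of `{matcher, hooks: [{type, command}]}` groups) with three hypothetical checks: unknown event names, tool events with no narrowing matcher, and empty commands. These checks and the assumed event list are illustrations, not AgentLint's H1-H8 rules.

```python
"""Sketch of hook-quality lint checks over a Claude Code settings.json "hooks" block.

The checks are hypothetical examples, not AgentLint's H1-H8 rules.
"""
import json

KNOWN_EVENTS = {"PreToolUse", "PostToolUse", "Notification", "UserPromptSubmit",
                "Stop", "SubagentStop", "PreCompact", "SessionStart", "SessionEnd"}
TOOL_EVENTS = {"PreToolUse", "PostToolUse"}


def lint_hooks(settings: dict) -> list[str]:
    problems = []
    for event, groups in settings.get("hooks", {}).items():
        if event not in KNOWN_EVENTS:
            problems.append(f"{event}: unknown hook event (typo?)")
            continue
        for i, group in enumerate(groups):
            where = f"{event}[{i}]"
            # An empty or "*" matcher applies the hook to every tool call.
            if event in TOOL_EVENTS and group.get("matcher", "") in ("", "*"):
                problems.append(f"{where}: no narrowing matcher; runs on every tool call")
            for hook in group.get("hooks", []):
                if hook.get("type") == "command" and not hook.get("command", "").strip():
                    problems.append(f"{where}: command hook with an empty command")
    return problems


example = json.loads("""
{"hooks": {
  "PreToolUse": [{"hooks": [{"type": "command", "command": ""}]}],
  "PreTooluse": [{"matcher": "Bash", "hooks": []}]
}}
""")
for problem in lint_hooks(example):
    print(problem)
```

A misspelled event name is the kind of defect only a linter catches: Claude Code would never fire that hook, so nothing fails visibly at runtime.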