Batch 17 — Hooks / TDD / Governance / Observability
Roster (10)
| slug | stars | distribution | cli_binary | local_ui | hooks.events | enforcement_layer | tier |
|---|---|---|---|---|---|---|---|
| tdd-guard | 2149 | claude-plugin | yes (tdd-guard) | no | PreToolUse, UserPromptSubmit, SessionStart | PreToolUse LLM-validator (exit 2) | A |
| vibesec | 21 | methodology-repo | no | no | none | static analysis (pre-write rule files) | A |
| agentlint | 14 | npm-package | yes (4 binaries) | no | SessionStart | static linter (CLAUDE.md + hooks quality) | A |
| cc-audit | 12 | npm-package | no | no | none | static analysis (PostToolUse output scan) | A |
| claude-code-guardrails | 11 | bash-bundle | no | no | PreToolUse, PostToolUse, Stop, PreCompact | shell scripts (exit 2 block + auto-commit) | A |
| ai-governor-framework | 9 (archived) | methodology-repo | no | no | none | prompt-only (protocols + review command) | A |
| gaai-framework | 35 | claude-plugin | no | no | none | skill injection + OS-process isolation | A |
| leash-strongdm | 197 | go-binary | yes (leash) | yes (Next.js :18080) | none (OS-layer) | eBPF LSM + Cedar policy (kernel-level) | A |
| agent-governance-toolkit | 892 | multi-lang-sdk | yes (agt) | no | SessionStart, UserPromptSubmit, PreToolUse | YAML policy engine + 12-vector prompt defense | A |
| hawkeye | 5 | npm-package | yes (hawkeye) | yes (React :4242) | PreToolUse, PostToolUse, Stop | guardrails (exit 2) + full session recording | A |
Intra-batch patterns
All 10 frameworks share the same underlying concern — preventing AI agents from doing unsafe or off-task things — but intervene at radically different layers. The enforcement layer is the clearest axis of variation:
Layer 1 — OS/Kernel: Leash intercepts at the eBPF LSM level, below Node.js, below the shell. Policies are Cedar (AWS verification-grade), not YAML. This is the only framework that cannot be bypassed by a compromised agent process.
Layer 2 — Claude Code hooks: TDD Guard (PreToolUse LLM validator), Claude Code Guardrails (shell scripts on Write/Edit + Stop), AGT (SessionStart injection + UserPromptSubmit defense + PreToolUse policy), Hawkeye (PreToolUse guardrails + PostToolUse recording + Stop drift snapshot). These four are mechanically similar in hook structure but diverge sharply in purpose: TDD Guard enforces phase discipline, Guardrails enforces branch safety, AGT enforces injection defense, Hawkeye enforces guardrails while also building a full observability record.
Layer 3 — Static analysis: CC Audit (pattern-matching on session output for risky signals) and AgentLint (linting CLAUDE.md and hook configs for quality issues) operate on artifacts, not live tool calls. Neither can block; both surface findings.
Layer 4 — Methodology/Prompt: AI Governor Framework and VibeSec operate at the convention level. AI Governor is a pure prompt-discipline framework (archived, superseded by GAAI). VibeSec's differentiator is the dual-format build system (definitions/ → .mdc + .md) for cross-IDE rule distribution.
GAAI (Fr-e-d/GAAI-framework) straddles layers: its 61 skills enforce workflow discipline and OS-process isolation for discovery vs delivery roles, but no Claude Code hooks are deployed. It is the most ambitious framework in the batch architecturally (v2.39.0, ELv2 license, backlog-as-contract pattern) but also the most opinionated about workflow as the safety mechanism.
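The Layer 2 tools above rely on the same contract: Claude Code passes the pending tool call to the hook as JSON on stdin, and an exit code of 2 blocks the call and feeds stderr back to the agent. The following is a minimal sketch of that contract, not code from any framework in this batch. `PROTECTED_NAMES` and the `decide` helper are hypothetical; the `tool_name` and `tool_input.file_path` fields are assumed to match the hook input Claude Code sends for Write and Edit calls.

```python
import json
import sys
from pathlib import PurePosixPath

# Hypothetical protected path components, chosen only for illustration.
PROTECTED_NAMES = {".git", ".env", "secrets"}


def decide(event: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one PreToolUse event."""
    if event.get("tool_name") not in {"Write", "Edit"}:
        return 0, ""  # only file writes are inspected in this sketch
    path = event.get("tool_input", {}).get("file_path", "")
    # Match on path components so absolute and relative paths behave alike.
    blocked = PROTECTED_NAMES.intersection(PurePosixPath(path).parts)
    if blocked:
        return 2, f"Blocked write to {path}: protected name {sorted(blocked)[0]}"
    return 0, ""


def main(stdin=sys.stdin, stderr=sys.stderr) -> int:
    """Read one hook event from stdin and return the exit code to use."""
    code, message = decide(json.load(stdin))
    if message:
        print(message, file=stderr)  # stderr is what the agent sees on a block
    return code
```

Registered as a `PreToolUse` command hook, a script like this would end with `sys.exit(main())`. Keeping the decision in a pure function makes the rule testable without spawning a hook process.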
Most interesting finds
TDD Guard's LLM-as-secondary-validator pattern: A PreToolUse hook calls a second LLM (separate from Claude Code's primary session) to evaluate whether the pending write violates the Red/Green/Refactor cycle. This is the only framework in the batch (and likely in the entire corpus) where a TDD violation triggers a structural exit-2 block rather than relying on a prompt instruction. The secondary LLM receives the full phase context (current tests, implementation, and the active phase) and reasons about TDD compliance before the primary agent can write.
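The secondary-validator flow can be sketched as below. This is an illustration of the pattern under stated assumptions, not TDD Guard's actual code: `PhaseContext`, `build_prompt`, `check_write`, and the ALLOW/BLOCK reply format are all hypothetical, and the validator is injected as a plain callable so that no particular model API is implied. Treating anything other than an explicit ALLOW as a block is also an assumed design choice.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PhaseContext:
    phase: str           # "red", "green", or "refactor"
    test_output: str     # latest test-runner output
    pending_change: str  # content the primary agent wants to write


# A validator maps a prompt to the secondary model's raw text reply.
Validator = Callable[[str], str]


def build_prompt(ctx: PhaseContext) -> str:
    """Assemble the phase context into a single review prompt."""
    return (
        "You are reviewing a pending file write for TDD compliance.\n"
        f"Active phase: {ctx.phase}\n"
        f"Latest test output:\n{ctx.test_output}\n"
        f"Pending change:\n{ctx.pending_change}\n"
        "Reply with ALLOW or BLOCK on the first line, then a reason."
    )


def check_write(ctx: PhaseContext, validator: Validator) -> tuple[int, str]:
    """Return (exit_code, reason): 2 blocks the write, 0 lets it through."""
    reply = validator(build_prompt(ctx)).strip()
    verdict, _, reason = reply.partition("\n")
    if verdict.strip().upper() == "ALLOW":
        return 0, reason.strip()
    # Assumed fail-closed behavior: an empty or malformed reply also blocks.
    return 2, reason.strip() or "validator did not return ALLOW"
```

In a real hook the callable would wrap a model client; in tests it can be a lambda that returns a canned verdict, which keeps the blocking logic deterministic.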
Hawkeye's JSONL cost reader: Hawkeye reads real token costs from Claude Code's own `~/.claude/projects/<encoded-cwd>/<sessionId>.jsonl` files by tracking byte offsets and deduplicating by message ID. This produces accurate per-action cost attribution without API key passthrough. No other observability tool in the corpus does this.
Leash's Cedar + eBPF architecture: Cedar is an AWS-developed, formally-verified policy language. Combined with eBPF LSM enforcement, Leash enforces policies at the kernel level: the agent process cannot bypass a Cedar deny any more than a userspace process can bypass a kernel ACL. This is categorically stronger than hook-based enforcement, which operates inside the same Node.js runtime as the agent.
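The byte-offset and message-ID technique described for Hawkeye's JSONL cost reader can be approximated as follows. This is a sketch, not Hawkeye's implementation: the `UsageReader` class is hypothetical, and the record layout (a `message` object carrying `id` and `usage`) is an assumption about the session file format rather than something confirmed by the source.

```python
import json
from pathlib import Path


class UsageReader:
    """Incrementally read usage records from a growing JSONL file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.offset = 0                  # byte position of the next unread line
        self.seen_ids: set[str] = set()  # message IDs already reported

    def poll(self) -> list[dict]:
        """Return usage dicts for messages not reported by earlier polls."""
        new_usage: list[dict] = []
        with self.path.open("rb") as f:
            f.seek(self.offset)
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # partial line still being written; retry next poll
                self.offset += len(raw)
                try:
                    record = json.loads(raw)
                except json.JSONDecodeError:
                    continue  # skip blank or malformed lines
                message = record.get("message") if isinstance(record, dict) else None
                if not isinstance(message, dict):
                    continue
                msg_id, usage = message.get("id"), message.get("usage")
                if not msg_id or not usage or msg_id in self.seen_ids:
                    continue  # no usage data, or a duplicate of a seen message
                self.seen_ids.add(msg_id)
                new_usage.append(usage)
        return new_usage
```

Tracking the offset in bytes (the file is opened in binary mode) means each poll only parses lines appended since the last one, and the ID set keeps a message that appears on several lines from being counted twice.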
AGT's 12-vector prompt injection defense: The `UserPromptSubmit` hook runs a 12-vector analysis (unicode homoglyphs, RTL override, Base64 obfuscation, length attacks, context poisoning, MCP tool poisoning, etc.) and fails closed (exit 2) on detection. This is the most comprehensive prompt injection defense at the hook layer in the corpus.
GAAI's Fr-e-d author continuity: The `ai-governor-framework` README explicitly states it is "superseded by GAAI-framework" and links to the newer repo. This is a rare case of documented framework succession with the same author (Fr-e-d). GAAI represents a full architectural rewrite: from pure prompt protocols to 61 skills + OS-process isolation + ELv2 commercial license.
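Four of the vectors named for AGT's `UserPromptSubmit` defense above (length, RTL override, homoglyphs, Base64 obfuscation) lend themselves to simple static checks. The sketch below illustrates the general idea only; it is not AGT's detection logic, and the function names, the `MAX_PROMPT_CHARS` threshold, and the 40-character Base64 run length are all hypothetical choices.

```python
import base64
import binascii
import re
import unicodedata

# Unicode bidirectional embedding, override, and isolate controls.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")
MAX_PROMPT_CHARS = 20_000  # hypothetical length limit
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")


def has_bidi_control(text: str) -> bool:
    return any(ch in BIDI_CONTROLS for ch in text)


def has_mixed_script_word(text: str) -> bool:
    """Flag words that mix Latin letters with Cyrillic or Greek ones."""
    for word in re.findall(r"\w+", text):
        scripts = set()
        for ch in word:
            name = unicodedata.name(ch, "")
            for script in ("LATIN", "CYRILLIC", "GREEK"):
                if name.startswith(script):
                    scripts.add(script)
        if "LATIN" in scripts and len(scripts) > 1:
            return True
    return False


def has_decodable_base64(text: str) -> bool:
    """Flag long runs that decode cleanly as Base64."""
    for run in BASE64_RUN.findall(text):
        try:
            base64.b64decode(run, validate=True)
        except (binascii.Error, ValueError):
            continue
        return True
    return False


def scan_prompt(prompt: str) -> list[str]:
    """Return the names of the checks that fired; empty means clean."""
    findings = []
    if len(prompt) > MAX_PROMPT_CHARS:
        findings.append("length")
    if has_bidi_control(prompt):
        findings.append("bidi-control")
    if has_mixed_script_word(prompt):
        findings.append("mixed-script")
    if has_decodable_base64(prompt):
        findings.append("base64")
    return findings
```

A fail-closed hook built on this would exit with code 2 whenever `scan_prompt` returns a non-empty list. Checks this simple will produce false positives (legitimately multilingual text, long hashes), so the thresholds would need tuning against real prompts.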
Items written as Tier C
None. All 10 frameworks had sufficient source code and documentation for full 11-file analysis.
Cross-references discovered
- ai-governor-framework → gaai-framework: Same author (Fr-e-d), explicit succession. AI Governor (archived) is the protocol-only predecessor; GAAI is the skill-based successor with OS-process isolation.
- tdd-guard → probity: README recommends `nizos/probity` for new projects as the evolution of TDD Guard.
- agentlint → vibesec: AgentLint's 58-check system includes H1-H8 checks specifically for Claude Code hook quality; VibeSec's dual-format rule system generates content that AgentLint would lint.
- claude-code-guardrails → cc-audit: Guardrails prevents dangerous writes via PreToolUse blocks; CC Audit scans the session output (PostToolUse) for risky patterns — complementary rather than overlapping.
- hawkeye → claude-code-guardrails: Hawkeye's PreToolUse guardrail engine covers the same 7 categories (file protection, command blocking, directory scope, cost limits, token limits, network lock, review gates) but adds PostToolUse recording and Stop drift scoring that Guardrails lacks entirely.
- agent-governance-toolkit: Part of a larger Microsoft ecosystem (`agent-os`, `agent-mesh`, `agent-runtime`, `agent-sre`, `agent-compliance`, `agent-marketplace`, `agent-lightning`, `agent-hypervisor`). The Claude Code plugin is the entry-point layer; the Python packages form the full stack.