Batch 32 — Verification, Review Surfaces, Eval Marketplaces + Secondary Isolation

Roster (7)

| slug | stars | distribution | cli_binary | local_ui | orchestration | multi_model | tier |
|---|---|---|---|---|---|---|---|
| vet-imbue | 385 | cli-tool (PyPI) | vet | none | none | yes (BYOK) | A |
| clearwing | 982 | standalone-repo | clearwing (20 subcmds) | web-dashboard + TUI | hierarchical | yes (4 roles) | A |
| eval-marketplace | 22 | claude-plugin (deprecated) | none | none | sequential | no | B |
| aiignore-cli | 8 | npm-package | aiignore (4 subcmds) | none | none | no (no LLM) | A |
| crit-review | 350 | cli-tool (Go binary) | crit (10 subcmds) | web-dashboard (local) | sequential | no | A |
| sd0x-dev-flow | 157 | claude-plugin | none | none | hierarchical | yes (opus subagent) | A |
| mirrord | 5,089 | cli-tool + vscode-ext | mirrord (5 subcmds) | vscode-extension | none | no | A |

Intra-batch Patterns

All 7 frameworks address the verification/trust problem, but at different levels of the stack and through architecturally distinct mechanisms. Vet and Clearwing both use LLMs to review artifacts (Vet reviews code diffs; Clearwing scans source code for vulnerabilities), but Vet is lightweight and cross-agent while Clearwing is a deep security research platform. Crit and sd0x-dev-flow both enforce review gates, one human and one mechanical: crit through a browser UI that requires explicit human action, sd0x-dev-flow through hooks that parse stdout sentinel markers. Three frameworks (vet-imbue, aiignore-cli, crit-review) achieve high cross-tool portability (7–13 agent integrations) by using plain-file protocols instead of agent-specific APIs. mirrord is architecturally orthogonal to the others: it operates at the syscall/OS level rather than the prompt/file level.

On the "human-in-loop vs machine-checkable" comparison axis from the batch brief, crit is pure human-in-loop (browser gate, no automation) and vet is pure machine-checkable (exit code 10 = issues found, CI-compatible). sd0x-dev-flow sits between them: hooks mechanically enforce that the review runs, but the review itself is agent-generated and human-readable. eval-marketplace and clearwing both use LLM analysis to generate human-readable reports that humans then act on.
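
To make the machine-checkable end of that axis concrete, here is a minimal sketch of a CI step that turns a checker's exit code into a verdict. Only "exit code 10 = issues found" comes from the notes above; treating 0 as a clean run and every other code as a tool failure is an assumption, and the names `classify_exit` and `run_gate` are hypothetical. The stand-in command simply exits with 10 and is not the real vet invocation.

```python
import subprocess
import sys
from typing import Sequence

ISSUES_FOUND = 10  # from the notes: exit code 10 = issues found


def classify_exit(code: int) -> str:
    """Map a checker's exit code to a CI verdict."""
    if code == 0:
        return "pass"    # assumption: 0 means no issues were reported
    if code == ISSUES_FOUND:
        return "issues"
    return "error"       # assumption: any other code is a tool failure


def run_gate(cmd: Sequence[str]) -> str:
    """Run a checker command and return its verdict."""
    completed = subprocess.run(cmd, check=False)
    return classify_exit(completed.returncode)


if __name__ == "__main__":
    # Stand-in command that exits with 10; swap in the real checker call.
    stand_in = [sys.executable, "-c", "import sys; sys.exit(10)"]
    print(run_gate(stand_in))
```

Keeping "issues" separate from "error" lets a pipeline fail the build on findings while reporting a crashed checker differently.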

Most Interesting Finds

  1. sd0x-dev-flow: The sentinel-driven state machine (✅ Ready/⛔ Blocked stdout markers parsed by PostToolUse hooks) is a genuinely novel harness engineering primitive. No other framework in this corpus implements review-gate state as parseable stdout rather than file state or LLM memory. The explicit naming of 10 canonical harness sub-problems with code evidence for each makes this the clearest reference implementation of harness engineering theory encountered in Phase B.

  2. crit-review: The PermissionRequest/ExitPlanMode hook with a 4-day timeout is architecturally significant — it is the only example in this corpus of hooking Claude Code's plan-mode exit as a mandatory human review gate (rather than a Stop gate or PostToolUse gate). The persistent round-to-round diff with drifted: true detection for stale comments is production-quality engineering that other review tools lack.
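
The sentinel-driven gate described in item 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not sd0x-dev-flow's actual hook code: the marker strings are taken from the notes above, while the `Gate` states, the "last marker wins" rule, and the function names are hypothetical.

```python
from enum import Enum
from typing import Optional

# Marker strings as quoted in the notes; exact matching rules are assumed.
READY_MARK = "✅ Ready"
BLOCKED_MARK = "⛔ Blocked"


class Gate(Enum):
    PENDING = "pending"
    READY = "ready"
    BLOCKED = "blocked"


def parse_sentinel(stdout: str) -> Optional[Gate]:
    """Return the state named by the last sentinel line in stdout, or None."""
    found: Optional[Gate] = None
    for line in stdout.splitlines():
        if BLOCKED_MARK in line:
            found = Gate.BLOCKED
        elif READY_MARK in line:
            found = Gate.READY
    return found


def next_state(current: Gate, stdout: str) -> Gate:
    """Advance the gate: a sentinel replaces the state, silence keeps it."""
    seen = parse_sentinel(stdout)
    return current if seen is None else seen
```

A hook that receives tool output would call `next_state` after each review command and refuse to proceed while the result is not `Gate.READY`. The state lives entirely in what the command printed, which is the property the notes single out.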

Items Written as Tier C

None. All 7 frameworks had sufficient public material for full 11-file reports.

Cross-references Discovered

  • vet-imbue installs skills into .claude/, .codex/, .opencode/, .agents/ simultaneously — the same multi-harness install pattern used by aiignore-cli (which generates ignore files for 9 tools simultaneously). Both are "do the research once, apply across all tools" frameworks.
  • crit-review and sd0x-dev-flow both enforce review gates, but through opposite mechanisms: crit requires a human browser click while sd0x-dev-flow requires sentinel stdout markers. Neither references the other.
  • eval-marketplace is deprecated in favor of jeredblu-marketplace — the canonical successor repo was not in this batch but should be flagged for a future analysis.
  • mirrord's Agent Skills (in metalbear-co/skills) follow the same SKILL.md format as superpowers (seed: Archetype 1), confirming that the Agent Skills standard has been adopted beyond the frameworks that invented it.
  • clearwing explicitly positions itself as an open-source reimplementation of Anthropic's internal Glasswing tool — a direct named relationship to a closed-source Anthropic product.
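
The multi-harness install pattern from the first cross-reference can be sketched as below. The four directory names come from the notes; the `skills/<name>/SKILL.md` layout inside each directory and the `install_skill` helper are assumptions made for illustration, not vet-imbue's actual installer.

```python
from pathlib import Path
from typing import List

# Harness directories named in the notes above.
HARNESS_DIRS = [".claude", ".codex", ".opencode", ".agents"]


def install_skill(project_root: Path, skill_name: str, content: str) -> List[Path]:
    """Write one skill file into every harness directory and return the paths."""
    written: List[Path] = []
    for harness in HARNESS_DIRS:
        # Assumed layout: <harness>/skills/<skill_name>/SKILL.md
        target = project_root / harness / "skills" / skill_name / "SKILL.md"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written
```

Because the protocol is a plain file, supporting another agent means appending one directory name to the list; no agent-specific API is involved.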