Batch 25 — Skills/Verification Tools & Code-Audit Utilities
Roster (10)
| slug | stars | distribution | cli_binary | local_ui | orchestration | multi_model | tier |
|---|---|---|---|---|---|---|---|
| sightglass | unknown | unknown | unknown | unknown | unknown | unknown | C (repo 404) |
| heavy3-code-audit | 44 | skill-pack | no | no | parallel-fan-out | yes (GPT/Gemini/Grok via OpenRouter) | A |
| ui-ux-pro-max | 83,052 | claude-plugin | yes (uipro) | no | parallel-fan-out | no | A |
| aurite-agent-verifier | 38 | skill-pack | no | no | sequential | no | A |
| subtask-zippoxer | 330 | cli-tool (Go) | yes (subtask) | yes (TUI) | hierarchical | yes (lead/worker split) | A |
| skill-optimizer | 57 | npm-package | yes (tsx CLI) | no | none | yes (model matrix) | A |
| setup-structure-index | 15 | skill-pack | no | no | none | yes (Haiku for YAML gen) | A |
| unslop | 44 | claude-plugin | yes (python) | no | none | no | A |
| nlpm-xiaolai | 55 | claude-plugin | yes (nlpm-check) | no | hierarchical | yes (haiku/sonnet/opus by task) | A |
| vibe-check-mcp | 486 | mcp-server | yes (npx) | no | none | yes (meta-mentor LLM configurable) | A |
Intra-Batch Patterns
This batch coheres around a single theme: verification and quality assurance for AI-generated artifacts and AI agent behavior. Seven of the nine analyzable tools act as quality checks or validators, but they target very different layers: heavy3 and aurite-agent-verifier check code and agent output, nlpm-xiaolai checks the quality of natural-language (NL) artifacts, unslop checks prose style, skill-optimizer checks skill reliability through behavioral evals, vibe-check-mcp checks metacognitive alignment, and setup-structure-index keeps a structural index accurate. Only subtask-zippoxer and ui-ux-pro-max are primarily workflow or design tools.
A striking sub-pattern: multi-model checking as a quality mechanism appears independently in heavy3 (three LLMs per review), vibe-check-mcp (a second LLM acting as meta-mentor), and skill-optimizer (evals across a model matrix). All three rest on the same architectural premise: a single model cannot reliably validate its own output.
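One simple way to combine verdicts from independent models is a quorum vote. The sketch below is a hypothetical illustration, not the implementation used by heavy3, vibe-check-mcp, or skill-optimizer: the reviewer callables stand in for real model calls (which could, for example, be routed through OpenRouter), and `consensus_verdict`, the verdict strings, and the `quorum` parameter are all assumptions.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical: a reviewer wraps one model and maps an artifact (e.g. a diff) to a verdict label.
Reviewer = Callable[[str], str]


def consensus_verdict(artifact: str, reviewers: Sequence[Reviewer], quorum: int = 2) -> str:
    """Return the most common verdict if at least `quorum` reviewers agree on it."""
    if not reviewers:
        raise ValueError("need at least one reviewer")
    votes = Counter(review(artifact) for review in reviewers)
    verdict, count = votes.most_common(1)[0]
    # No sufficiently large agreement: escalate instead of trusting any single model.
    return verdict if count >= quorum else "needs-human-review"


# Usage with stub reviewers standing in for three different models.
stubs = [lambda a: "pass", lambda a: "pass", lambda a: "fail"]
print(consensus_verdict("diff --git a/x b/x", stubs))  # prints "pass"
```

Requiring a quorum rather than accepting a plurality means a three-way split escalates to a human instead of silently picking a winner.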
The batch also reveals a new pattern: tools that eat their own dog food. Unslop's README is written following unslop's own rules, nlpm-xiaolai carries an nlpm-badge.json, and subtask claims to have been built using subtask. Self-application of this kind is emerging as a credibility signal in the ecosystem.
Most Interesting Finds
nlpm-xiaolai: The only tool in the entire corpus that acts as a meta-linter for the artifact types other frameworks produce. Its manifest-vs-disk consistency check (a SKILL.md present on disk but missing from plugin.json is invisible after install) targets a real bug class that no other validator covers; this gap was verified against Anthropic's own plugin-validator. Its self-evolving GitHub Actions pipeline, which audits real repos, harvests exemplars, and opens PRs that feed improvements back into its own rule catalog, is architecturally novel.
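The manifest-vs-disk check can be sketched as a set difference between the skill directories found on disk and the skills declared in the manifest. This is a hypothetical illustration, not nlpm's code: the function name `manifest_disk_drift` is invented, and the assumption that plugin.json carries a top-level `"skills"` list of directory paths relative to the plugin root is a simplification that may not match the real plugin manifest format.

```python
import json
from pathlib import Path


def manifest_disk_drift(plugin_root: str, manifest_name: str = "plugin.json") -> dict[str, list[str]]:
    """Compare skills declared in the manifest with SKILL.md files found on disk.

    Assumption (hypothetical schema): the manifest has a top-level "skills" list of
    skill directory paths relative to the plugin root, e.g. ["skills/lint"].
    """
    root = Path(plugin_root)
    manifest = json.loads((root / manifest_name).read_text(encoding="utf-8"))
    # Normalize declared paths so "./skills/a" and "skills/a" compare equal.
    declared = {Path(p).as_posix() for p in manifest.get("skills", [])}
    # Every directory containing a SKILL.md counts as a skill present on disk.
    on_disk = {md.parent.relative_to(root).as_posix() for md in root.rglob("SKILL.md")}
    return {
        # On disk but absent from the manifest: the "invisible after install" case.
        "undeclared": sorted(on_disk - declared),
        # Declared in the manifest but with no SKILL.md behind it.
        "missing": sorted(declared - on_disk),
    }
```

Reporting both directions of the difference catches the invisible-skill case as well as stale manifest entries that point at deleted skills.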
subtask-zippoxer: The most technically sophisticated tool in the batch. It is a Go binary with event-sourced state (history.jsonl as the append-only source of truth), git worktree pool management, bidirectional lead-worker communication via subtask ask/send, a TUI, and a self-reported claim that it was built using its own workflow. The "workspace opacity" (the lead never picks worktrees) and "history wins over SQLite" invariants show careful system design.
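The "history wins over SQLite" invariant amounts to treating the database as a disposable index that can always be rebuilt by replaying the log. The sketch below (in Python rather than subtask's Go) illustrates that idea under stated assumptions: the one-object-per-line event shape `{"task": ..., "status": ...}`, the `tasks` table, and the function names `replay` and `rebuild_index` are hypothetical, not subtask's actual schema.

```python
import json
import sqlite3
from typing import Iterable


def replay(history_lines: Iterable[str]) -> dict[str, str]:
    """Fold an append-only JSONL event log, in order, into task -> latest status."""
    state: dict[str, str] = {}
    for line in history_lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        event = json.loads(line)  # hypothetical shape: {"task": ..., "status": ...}
        state[event["task"]] = event["status"]  # later events override earlier ones
    return state


def rebuild_index(db: sqlite3.Connection, history_lines: Iterable[str]) -> None:
    """History wins: discard whatever SQLite holds and re-derive it from the log."""
    truth = replay(history_lines)
    with db:  # commit on success, roll back on error
        db.execute("CREATE TABLE IF NOT EXISTS tasks (name TEXT PRIMARY KEY, status TEXT)")
        db.execute("DELETE FROM tasks")
        db.executemany("INSERT INTO tasks (name, status) VALUES (?, ?)", truth.items())
```

Because the log is append-only and replay is deterministic, a stale or corrupted SQLite file can simply be regenerated, which is what makes it safe for history to win any disagreement.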
Items Written as Tier C
| Slug | Reason |
|---|---|
| sightglass | Repository at https://github.com/sightglass-ai/sightglass returns HTTP 404 (not found, private, or URL incorrect) |
Cross-References Discovered
- vibe-check-mcp cites a companion CPI (Chain-Pattern Interrupt) repo at https://github.com/PV-Bhat/cpi; the two are separate repositories designed to work together
- unslop explicitly positions itself as eating its own dog food (README written following unslop rules)
- nlpm-xiaolai is available via both the xiaolai marketplace AND Anthropic's official community marketplace (the latter with a ~24h lag), making it the first tool in the corpus I've seen listed on both
- heavy3-code-audit selects Grok 4 for security analysis based on published benchmark scores (Kilo AI exploit test, WMDP-Cyber, CyBench); its model-selection rationale is more rigorous than that of any other tool in the corpus
- ui-ux-pro-max, at 83,052 stars, stands far apart from the rest of the batch (next highest: vibe-check-mcp at 486). This star count warrants scrutiny: it may reflect sharing across the broader nextlevelbuilder community rather than organic adoption of the tool itself.