Aurite Agent Verifier

aurite-agent-verifier · aurite-ai/agent-verifier · ★ 38 · last commit 2026-05-26

Primitive shape 5 total

Skills 5

Summary

Aurite Agent Verifier — Summary

Agent Verifier is a coding agent skill that runs automated verification on AI-generated code, specifically targeting agent-specific failure patterns (hallucinated tools, unbounded loops, missing retry limits) alongside security gaps and language-specific quality checks. It is installed once via npx skills add aurite-ai/agent-verifier and thereafter triggered by natural-language commands like "verify agent" or "audit agent."

The skill is structured as a modular orchestrator that dispatches to four focused sub-skills: verify-security, verify-patterns, verify-quality, and verify-language. Checks are classified as either pattern-matched (deterministic, same result every run) or heuristic (judgment-based, may vary). The tool is explicitly not a replacement for linters; it targets checks that conventional static analysis tools typically do not cover: context-size analysis, tool-registry consistency, and LangGraph cycle detection.

All analysis runs locally; no code leaves the user's machine. The skill supports Claude Code, Cursor, Windsurf, Roo Code, Codex, and 30+ other agents.

Compared to seeds: it resembles spec-kit's pre-ship validation gates, but agent-verifier focuses exclusively on the correctness of AI agent code (loops, tool hallucinations, context limits) rather than spec compliance. Unlike superpowers' verification-before-completion (which is a process skill), agent-verifier applies deterministic check patterns gated by framework detection (LangGraph, CrewAI, AutoGen, LangChain).

Overview

Aurite Agent Verifier — Overview

Origin

Built by Aurite AI (aurite.ai), a startup focused on enterprise-grade secure agent infrastructure. The tool was released under the MIT license as an open-source community offering; enterprise capabilities (shared context pools, administrative controls, centralized hosting) are available commercially.

Philosophy

AI coding agents are powerful but skip linting, ignore security basics, and hallucinate tool calls. Code reviews catch some of this, but not consistently. Agent Verifier acts as an automated reviewer that specifically targets patterns conventional static analyzers are not built to detect: AI agent anti-patterns that arise when code is generated by an AI operating in an agent loop.

"AI coding agents are powerful — but they skip linting, ignore security basics, and hallucinate tool calls."

"Agent Verifier is an AI agent skill that acts as an automated reviewer."

Two-Tier Reliability Model

The skill introduces a formal distinction between check types:

Pattern-matched checks [P]: Mechanical rule applied to code structure — same answer on every run (high reliability)
Heuristic checks [H]: Requires interpretation of intent or quality — best-effort (may vary)

This explicit reliability tagging lets users understand which findings are authoritative vs. advisory.
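As a rough illustration of the two-tier model, a finding can carry its reliability tag as data so that reports can separate authoritative results from advisory ones. This is only a sketch: the `Finding` and `Reliability` names, the field layout, and the `authoritative` helper are hypothetical and are not taken from the skill's files.

```python
from dataclasses import dataclass
from enum import Enum


class Reliability(Enum):
    PATTERN = "P"    # mechanical rule, same answer on every run
    HEURISTIC = "H"  # judgment-based, may vary between runs


@dataclass(frozen=True)
class Finding:
    # Hypothetical record layout for one reported finding.
    check: str
    severity: str            # "issue" or "warning" (illustrative labels)
    reliability: Reliability
    message: str

    def render(self) -> str:
        # Prefix the finding with its reliability tag, e.g. "[P]".
        return f"[{self.reliability.value}] {self.severity}: {self.check}: {self.message}"


def authoritative(findings: list[Finding]) -> list[Finding]:
    """Keep only pattern-matched findings, which should be reproducible."""
    return [f for f in findings if f.reliability is Reliability.PATTERN]
```

Filtering on the tag, as `authoritative` does, is one way a reader could act on the distinction: treat [P] findings as blocking and review [H] findings by hand.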

Supported Agent Frameworks

The skill auto-detects the agent framework used:

langgraph imports → LangGraph (enables cycle analysis)
crewai imports → CrewAI
autogen imports → AutoGen
langchain imports → LangChain
Direct SDK usage → Custom agent
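The import-based detection listed above could be approximated as follows. This is a minimal sketch under stated assumptions: the skill itself performs this step through the agent's Read/Grep tools guided by SKILL.md, not through Python code, and the `detect_framework` function, the regular expression, and the prefix-matching rule are all hypothetical.

```python
import re

# Ordered (package prefix, label) pairs; the order mirrors the list above.
FRAMEWORKS = [
    ("langgraph", "LangGraph"),
    ("crewai", "CrewAI"),
    ("autogen", "AutoGen"),
    ("langchain", "LangChain"),
]

# Captures the module named in "import x.y" or "from x.y import z" lines.
IMPORT_RE = re.compile(r"^\s*(?:from|import)\s+([A-Za-z_][\w.]*)", re.MULTILINE)


def _matches(module: str, prefix: str) -> bool:
    # Accept the package itself, its submodules, and sibling packages
    # such as "langchain_core" (an assumption made for this sketch).
    return module == prefix or module.startswith((prefix + ".", prefix + "_"))


def detect_framework(source: str) -> str:
    """Return the first framework whose package appears in an import line."""
    modules = IMPORT_RE.findall(source)
    for prefix, label in FRAMEWORKS:
        if any(_matches(m, prefix) for m in modules):
            return label
    return "Custom agent"
```

Because the list is scanned in order, a file that imports both `langgraph` and `langchain` is labeled LangGraph, which is the case where cycle analysis applies.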

Explicit Antipatterns Targeted

Hardcoded API keys and secrets
Unbounded retry loops (@retry without explicit stop)
while True / for {} without break in scope
Tool names referenced in prompts but not defined in registry
System prompts exceeding recommended token limits
Unpinned dependencies
TypeScript any types
Go errors silently discarded (_ = fn())
LangGraph cycles with no reachable END node
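To make the loop-safety item concrete, the sketch below flags Python `while True` loops that contain no `break` belonging to that loop. It is an illustration built on the standard `ast` module, not the skill's actual rule text; the `unbounded_loops` name is hypothetical, and the sketch deliberately follows the "without break in scope" wording, so it ignores `return` and `raise` as exits.

```python
import ast

# A break inside these nodes does not exit the enclosing loop.
_SCOPE_BARRIERS = (
    ast.For, ast.AsyncFor, ast.While,
    ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
)


def _breaks_out(nodes) -> bool:
    """True if any node is, or contains, a break for the current loop."""
    for node in nodes:
        if isinstance(node, ast.Break):
            return True
        if isinstance(node, _SCOPE_BARRIERS):
            continue  # breaks in nested loops/functions belong elsewhere
        if _breaks_out(ast.iter_child_nodes(node)):
            return True
    return False


def unbounded_loops(source: str) -> list[int]:
    """Line numbers of `while True` loops with no break in scope."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.While)
            and isinstance(node.test, ast.Constant)
            and node.test.value is True
            and not _breaks_out(node.body)
        ):
            flagged.append(node.lineno)
    return sorted(flagged)
```

A loop whose only `break` sits inside a nested `for` is still flagged, since that `break` exits the inner loop only.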

Architecture

Aurite Agent Verifier — Architecture

Distribution

Type: skill-pack (multi-agent skill with orchestrator + 4 sub-skills)
License: MIT
Install complexity: one-liner via npx skills

Install Commands

# Primary (npx skills CLI)
npx skills add aurite-ai/agent-verifier -a claude-code -a cursor -a <your-agent>

# Then trigger via natural language:
verify agent

Directory Layout

skills/
├── verification/
│   └── SKILL.md          # Orchestrator: full verification suite
├── verify-security/      # Security checks (secrets, deps, input validation)
├── verify-patterns/      # Agent pattern checks (loops, retries, tools, context)
├── verify-quality/       # Code quality (naming, org, documentation)
└── verify-language/      # Language-specific (Python type hints, TS strict, Go errors)
docs/
├── getting-started.md
└── tutorials/
    └── 01-installation.md
tests/
└── fixtures/             # Test directories with known issues for validation
assets/
└── agent_verifier.gif    # Demo animation

Target AI Tools

Claude Code (primary)
Cursor
Windsurf
Roo Code
Codex
30+ other agents

Required Runtime

No runtime dependencies beyond the AI agent (runs locally via agent's existing capabilities — Read, Glob, Grep tools).

Config Files

.kahuna/context-guide.md — optional organizational rules (Kahuna integration)

Components

Aurite Agent Verifier — Components

Skills (Sub-skill Architecture)

| Skill | Trigger Phrase | What It Checks |
|-------|----------------|----------------|
| `verification` | "verify agent", "audit agent", "full verification" | Orchestrator: runs all 4 sub-skills and consolidates |
| `verify-security` | "verify agent security" | Secrets, dependency pinning, input validation, error exposure, secure defaults |
| `verify-patterns` | "verify agent patterns" | Loop safety, retry limits, tool registry consistency, context size, LangGraph cycles |
| `verify-quality` | "verify agent quality" | Naming conventions, code organization, documentation |
| `verify-language` | "verify agent language" | Python type hints, TypeScript strict mode, Go error handling |

Pattern-Matched Checks (Deterministic)

| Check | Pattern | Outcome |
|-------|---------|---------|
| Retry limits | `@retry`/`@backoff`/`p-retry`/`urllib3.Retry` without explicit stop/total | ❌ Issue |
| Loop safety | `while True`/`for {}`/`while (true)` without `break` in scope | ❌ Issue |
| Tool registry | Tool names in prompts absent from definitions | ❌ Issue |
| Context size | `len(prompt)/4` vs token thresholds | ⚠️ Warning / ❌ Issue |
| Requirements pinning | `>=`, `>`, or unpinned deps in requirements.txt/pyproject.toml | ❌ Issue |
| Hardcoded secrets | Assignments to `API_KEY`/`SECRET`/`PASSWORD`/`TOKEN` string literals | ❌ Issue |
| No `any` types (TS) | Unqualified `: any` annotations | ⚠️ Warning |
| Ignored errors (Go) | `_ = functionCall()` where function returns `error` | ❌ Issue |
| LangGraph cycles | Graph cycles with no reachable `END` in edge mappings | ❌ Issue |
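The context-size row relies on a character-count estimate of roughly four characters per token. A minimal sketch of that arithmetic follows; the `warn_at` and `issue_at` thresholds are placeholder values chosen for illustration, since the source does not state the skill's actual limits, and the function names are hypothetical.

```python
def estimate_tokens(prompt: str) -> int:
    # Rough estimate from the table above: about four characters per token.
    # Integer division is used here so the result is a whole token count.
    return len(prompt) // 4


def classify_prompt(prompt: str, warn_at: int = 4000, issue_at: int = 8000) -> str:
    """Map a prompt to "pass", "warning", or "issue".

    warn_at and issue_at are hypothetical thresholds, not the skill's values.
    """
    tokens = estimate_tokens(prompt)
    if tokens >= issue_at:
        return "issue"
    if tokens >= warn_at:
        return "warning"
    return "pass"
```

The estimate is deterministic for a given prompt string, which is consistent with the check being listed as pattern-matched rather than heuristic.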

Heuristic Checks (Best-Effort)

Code organization appropriateness
Naming convention consistency
Input validation sufficiency
Docstring quality/usefulness
Tool error handling adequacy

Report Format

# Verification Report
✅ 8 checks passed | ⚠️ 3 warnings | ❌ 2 issues

### By Category
| Category | Pass | Warn | Issue |
|----------|------|------|-------|
| Code Quality | 5 | 1 | 0 |
| Security | 2 | 0 | 1 |
| Agent Patterns | 1 | 2 | 1 |
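A report in this shape can be assembled from a flat list of check outcomes. The sketch below is illustrative only: the `render_report` function and the `(category, outcome)` input format are assumptions, and the real skill produces its report as agent-written Markdown rather than through code. The sketch emits a Markdown separator row under the table header so the table renders.

```python
from collections import Counter

CATEGORIES = ["Code Quality", "Security", "Agent Patterns"]


def render_report(results: list[tuple[str, str]]) -> str:
    """results holds (category, outcome) pairs; outcome is 'pass', 'warn', or 'issue'."""
    totals = Counter(outcome for _, outcome in results)
    per_category = Counter(results)  # keyed by (category, outcome)
    lines = [
        "# Verification Report",
        f"✅ {totals['pass']} checks passed | ⚠️ {totals['warn']} warnings | ❌ {totals['issue']} issues",
        "",
        "### By Category",
        "| Category | Pass | Warn | Issue |",
        "|----------|------|------|-------|",
    ]
    for cat in CATEGORIES:
        counts = [per_category[(cat, o)] for o in ("pass", "warn", "issue")]
        lines.append(f"| {cat} | {counts[0]} | {counts[1]} | {counts[2]} |")
    return "\n".join(lines)
```

Feeding it the thirteen outcomes implied by the sample above (8 passes, 3 warnings, 2 issues) reproduces the same totals and per-category rows.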

Prompts

Aurite Agent Verifier — Prompt Excerpts

Excerpt 1: Verification Orchestrator (from skills/verification/SKILL.md)

Technique: Natural-language trigger matching + modular sub-skill dispatch

---
name: verification
version: "1.0.0"
description: Full agent verification suite. Runs security, patterns, quality, and language-specific checks. Use when asked to "verify agent", "verify my agent", "audit agent", or "full verification".
---

## When to Use

Trigger this skill when the user asks to:
- **"verify agent"** (primary invocation)
- "verify my agent"
- "audit agent"
- "full verification"
- "verify my code" (when agent patterns are detected)
- "check compliance"

Analysis: The trigger specification is a concrete list of exact phrases rather than an abstract description. This reduces false-positive skill activation, since other skills' descriptions may also match generic requests such as "verify code". The version: "1.0.0" field enables future compatibility tracking.
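The effect of exact trigger phrases can be pictured as a lookup from phrase to scope. This is a simplified model, not how the skill works internally: in practice the host agent matches the request against each skill's description in natural language. The `SCOPES` table and `resolve_scope` function are hypothetical; the phrases and sub-skill names come from the document.

```python
SUB_SKILLS = ["verify-security", "verify-patterns", "verify-quality", "verify-language"]

# Trigger phrase -> sub-skills to run.
SCOPES = {
    "verify agent security": ["verify-security"],
    "verify agent patterns": ["verify-patterns"],
    "verify agent quality": ["verify-quality"],
    "verify agent language": ["verify-language"],
    "verify agent": SUB_SKILLS,
}


def resolve_scope(request: str) -> list[str]:
    """Pick the most specific matching phrase; empty list if nothing matches."""
    text = request.lower().strip()
    # Try longer phrases first so "verify agent security" wins over "verify agent".
    for phrase in sorted(SCOPES, key=len, reverse=True):
        if phrase in text:
            return list(SCOPES[phrase])
    return []
```

Checking longer phrases first is the design choice that keeps a scoped request from falling through to the full suite.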

Excerpt 2: Context Discovery Protocol (from skills/verification/SKILL.md)

Technique: Ordered priority scan with framework-specific branch logic

### Step 1: Context Discovery

Scan the project to identify:

1. **Primary language:**
   - Check for `pyproject.toml`, `package.json`, `go.mod`
   - Look at file extensions in `src/` or project root

2. **Agent framework (if any):**
   - `langgraph` in imports → LangGraph
   - `crewai` in imports → CrewAI
   - `autogen` in imports → AutoGen
   - `langchain` in imports → LangChain
   - Direct SDK usage → Custom agent

3. **Kahuna integration:**
   - Check if `.kahuna/` directory exists
   - If yes, read `.kahuna/context-guide.md` for organizational rules

Record the detected context for reporting.

Analysis: Framework detection is pattern-matching on file system artifacts and import statements, not heuristic. The ordered scan creates a deterministic priority for language and framework identification. Framework detection gates which checks fire (e.g., LangGraph cycles only checked if LangGraph detected).
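The LangGraph cycle check mentioned above amounts to a reachability question: does any node sit on a cycle from which the end of the graph cannot be reached? The sketch below works on a plain adjacency mapping, not on a real LangGraph object; the `END` string is a stand-in sentinel, and `reachable` and `trapped_cycle_nodes` are hypothetical helper names.

```python
END = "END"  # stand-in sentinel for this sketch, not imported from LangGraph


def reachable(edges: dict[str, list[str]], start: str) -> set[str]:
    """All nodes reachable from `start` by following one or more edges."""
    seen: set[str] = set()
    stack = list(edges.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return seen


def trapped_cycle_nodes(edges: dict[str, list[str]]) -> list[str]:
    """Nodes that lie on a cycle from which END cannot be reached."""
    trapped = []
    for node in edges:
        targets = reachable(edges, node)
        # On a cycle: the node can reach itself. Trapped: END is not reachable.
        if node in targets and END not in targets:
            trapped.append(node)
    return sorted(trapped)
```

For example, a `plan` to `act` to `plan` loop is acceptable when `act` also has an edge to `END`, and is reported when it does not.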

Excerpt 3: Pattern-Matched vs Heuristic Classification (from README)

Technique: Explicit reliability taxonomy embedded in output

| Tier | Tag | How it's applied | Reliability |
|------|-----|-----------------|-------------|
| Pattern-matched | `[P]` | Mechanical — rule applied exactly as specified to code structure | High — same answer on every run |
| Heuristic | `[H]` | Judgment — requires interpretation of intent or quality | Best-effort — may vary |

**Pattern-matched checks** (reliable):
| Check | What it looks for |
|-------|------------------|
| Retry limits | `@retry`/`@backoff`/`p-retry`/`urllib3.Retry` without explicit stop/total parameter → ❌ Issue |
| Loop safety | `while True`/`for {}`/`while (true)` without `break` in scope → ❌ Issue |
| Tool registry | Tool names referenced in prompts but absent from definitions → ❌ Issue |

Analysis: Explicitly communicates confidence levels to the user. This "reliability contract" prevents over-trust of heuristic findings and validates pattern-matched findings as authoritative. The distinction is built into the output format, not just documentation.
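The retry-limit row in the excerpt above can be illustrated for the Python `@retry` case with the standard `ast` module. This sketch covers only decorators named `retry` and only the `stop=` keyword (the style used by the tenacity library); it does not cover `@backoff`, `p-retry`, or `urllib3.Retry`, and the `retries_without_stop` function is a hypothetical name, not part of the skill.

```python
import ast


def retries_without_stop(source: str) -> list[str]:
    """Names of functions decorated with @retry that pass no `stop=` argument."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for deco in node.decorator_list:
            # A decorator is either a bare name (@retry) or a call (@retry(...)).
            call = deco if isinstance(deco, ast.Call) else None
            target = call.func if call else deco
            if isinstance(target, ast.Attribute):
                name = target.attr            # e.g. tenacity.retry
            else:
                name = getattr(target, "id", None)
            if name != "retry":
                continue
            keywords = {kw.arg for kw in call.keywords} if call else set()
            if "stop" not in keywords:
                flagged.append(node.name)
    return sorted(flagged)
```

Because the rule looks only at syntax, it gives the same answer on every run for the same file, which is what the [P] tag promises.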

Uniqueness

Aurite Agent Verifier — Uniqueness & Positioning

Differs From Seeds

Closest seed: superpowers (verification-before-completion skill). But superpowers' verification skill is a process guide — it tells the agent how to verify work (run tests, check linting, etc.). Agent Verifier is a content checker — it applies specific rules to detect specific anti-patterns in AI-generated code. Unlike superpowers, agent-verifier introduces a formal reliability taxonomy (pattern-matched [P] vs heuristic [H]) and targets AI-agent-specific failure modes (hallucinated tools, unbounded loops) that no seed framework explicitly covers.

Also distinct from spec-kit (spec compliance validation) and kiro (IDE-level steering hooks). Agent Verifier operates after implementation, not during it, and focuses on agent code specifically rather than general code quality.

Observable Failure Modes

Heuristic variability: [H]-tagged checks may give different results on successive runs — users must know which findings to trust.
No auto-fix: The tool reports but never applies fixes; remediation is entirely manual.
No continuous integration: Must be manually triggered; cannot be wired as a pre-commit hook.
Framework detection may fail: Custom agent patterns that do not match a known framework are treated as "Direct SDK usage", so framework-specific issues may be missed.
Token context limits: Very large codebases may cause context overflow during Glob/Read phases.

Distinctive Opinion

AI agent code has a distinct failure category not covered by any existing linter: the class of errors that only exist because the code was generated by an AI in an agent loop (hallucinated tool references, context size mismanagement, missing termination in retry/loop patterns). Static analyzers cannot detect these. Dedicated AI-agent verification tooling is a new required layer in the quality stack.

Positioning

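| Capability | Agent Verifier | ESLint/Biome | Semgrep | Manual Review |
|------------|----------------|--------------|---------|---------------|
| AI agent patterns | ✅ | ❌ | ❌ | Sometimes |
| Security checks | ✅ | Partial | ✅ | Sometimes |
| Context-size analysis | ✅ | ❌ | ❌ | Rarely |
| Tool hallucination detection | ✅ | ❌ | ❌ | Rarely |
| Runs inside AI agent | ✅ | ❌ | ❌ | N/A |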
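Tool hallucination detection, as listed in the table above, comes down to comparing the tool names a prompt mentions against the tools that are actually defined. The sketch below assumes, purely for illustration, that prompts mention tools in backticks; the source does not say how the skill extracts tool names, and `hallucinated_tools` is a hypothetical function.

```python
import re


def hallucinated_tools(prompt: str, registry: set[str]) -> list[str]:
    """Backticked names in the prompt that are missing from the registry.

    Assumes (for this sketch only) that prompts mention tools as `tool_name`.
    """
    mentioned = set(re.findall(r"`([A-Za-z_]\w*)`", prompt))
    return sorted(mentioned - registry)
```

A prompt that tells the agent to call `send_email` when only `search_web` and `summarize` are registered would yield `send_email` as the single finding.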

Workflow

Aurite Agent Verifier — Workflow

Verification Phases

| Phase | Action | Artifact |
|-------|--------|----------|
| 1. Invocation | User says "verify agent" or variant | Trigger |
| 2. Context Discovery | Scan project: detect primary language, agent framework, Kahuna integration | Detection context |
| 3. Security Checks | Run verify-security sub-skill | Security findings |
| 4. Pattern Checks | Run verify-patterns sub-skill | Pattern findings |
| 5. Quality Checks | Run verify-quality sub-skill | Quality findings |
| 6. Language Checks | Run verify-language sub-skill | Language findings |
| 7. Consolidation | Aggregate findings, tag each as [P] or [H] | Unified report |
| 8. Output | Structured report with pass/warn/issue counts by category | Verification Report |

Context Discovery Details

Primary language detection: Check pyproject.toml, package.json, go.mod; look at file extensions in src/ or project root
Agent framework detection: Scan imports for langgraph, crewai, autogen, langchain
Organizational rules: Check if .kahuna/ directory exists; if yes, read .kahuna/context-guide.md
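The primary-language step described above (manifest files first, then file extensions in src/ or the project root) might look like this in code. It is a sketch under stated assumptions: the skill performs this scan through the agent's Glob/Read tools, and the `detect_language` function, the label strings, and the extension table are hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Optional

# Manifest files named in the discovery step, checked in this order.
MANIFESTS = {
    "pyproject.toml": "Python",
    "package.json": "JavaScript/TypeScript",
    "go.mod": "Go",
}
# Illustrative extension table for the fallback step.
EXTENSIONS = {
    ".py": "Python",
    ".ts": "JavaScript/TypeScript",
    ".js": "JavaScript/TypeScript",
    ".go": "Go",
}


def detect_language(root: Path) -> Optional[str]:
    # 1. A manifest file takes priority.
    for name, lang in MANIFESTS.items():
        if (root / name).is_file():
            return lang
    # 2. Otherwise count source-file extensions in src/ and the project root.
    counts: Counter = Counter()
    for folder in (root / "src", root):
        if folder.is_dir():
            for path in folder.iterdir():
                lang = EXTENSIONS.get(path.suffix)
                if lang and path.is_file():
                    counts[lang] += 1
    return counts.most_common(1)[0][0] if counts else None
```

Returning `None` when nothing matches leaves the caller free to skip language-specific checks instead of guessing.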

Approval Gates

None — the tool produces a report and stops. Remediation is at the user's discretion. No automatic fix application.

Manual Triggering Modes

| Command | Scope |
|---------|-------|
| "verify agent" | Full suite (all 4 sub-skills) |
| "verify agent security" | Security only |
| "verify agent patterns" | Agent patterns only |
| "verify agent quality" | Quality only |
| "verify agent language" | Language-specific only |
| "verify this code for agent patterns" | Targeted on specific directory/files |

Fixtures for Testing

tests/fixtures/ contains directories with known issues (e.g. infinite_loop/) for validating the skill's detection accuracy.

Memory Context

Aurite Agent Verifier — Memory & Context

State Storage

No persistent memory. Each verification is a fresh scan of the current project state.

Context Built Per Invocation

Language detection result
Agent framework detection result
Organizational rules from .kahuna/context-guide.md (if present)
File scan results (Read, Glob, Grep tools)

Cross-Session Handoff

None. The tool is stateless between invocations.

Audit Output

The verification report is output to the conversation only — not written to a file unless the user explicitly asks. No VERIFICATION.md or similar artifact is created automatically.

Memory Type

None — the skill relies entirely on the agent's native file-reading capabilities for context. No external state store.

Orchestration

Aurite Agent Verifier — Orchestration

Multi-Agent Pattern

Pattern: sequential (a single orchestrator skill dispatches to 4 sub-skills in order within one agent session; no separate agents are spawned)

The verification skill acts as an orchestrator. It runs context discovery, then serially dispatches to:

verify-security
verify-patterns
verify-quality
verify-language

Each sub-skill is a separate SKILL.md file loaded by the orchestrator. The orchestrator consolidates all findings into a unified report.
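The serial dispatch and consolidation described above can be modeled as a loop over named checks that share one context. This is a conceptual sketch only: the real sub-skills are SKILL.md instruction files loaded by the agent, not Python callables, and `run_verification`, `SUITE`, and the four stand-in functions with their findings are invented for illustration.

```python
from typing import Callable

Context = dict[str, str]
SubSkill = Callable[[Context], list[str]]


def run_verification(
    context: Context, sub_skills: list[tuple[str, SubSkill]]
) -> dict[str, list[str]]:
    """Run each sub-skill in order and collect its findings under its name."""
    report: dict[str, list[str]] = {}
    for name, check in sub_skills:
        report[name] = check(context)
    return report


# Hypothetical stand-ins for the four sub-skills.
def security(ctx: Context) -> list[str]:
    return ["hardcoded secret in config.py"]


def patterns(ctx: Context) -> list[str]:
    return ["while True without break"] if ctx.get("framework") else []


def quality(ctx: Context) -> list[str]:
    return []


def language(ctx: Context) -> list[str]:
    return [f"{ctx['language']}: missing type hints"]


SUITE = [
    ("verify-security", security),
    ("verify-patterns", patterns),
    ("verify-quality", quality),
    ("verify-language", language),
]
```

Passing the discovery context into every check mirrors the way the orchestrator records the detected language and framework once and reuses them for each sub-skill.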

Isolation Mechanism

None — the skill reads in-place. No worktrees or containers.

Multi-Model

No. All verification runs in the primary agent (Claude Code, Cursor, etc.). No external model API calls.

Execution Mode

One-shot — triggered by user phrase, runs to completion, outputs report. No daemon, no loop.

Subagent Definition Format

The sub-skills (verify-security, verify-patterns, etc.) are loaded as skills by the orchestrator, but they are not subagents spawned in separate contexts — they are sequential skill-loading instructions within the same agent session.

Crash Recovery

None — stateless tool; no partial state to recover.

UI / CLI Surface

Aurite Agent Verifier — UI / CLI Surface

CLI Binary

No dedicated CLI. Install via npx skills add aurite-ai/agent-verifier. No binary is produced; skill files are written to the agent's skills directory.

UI / Dashboard

None. Output is a structured markdown report in the agent conversation.

IDE Integration

Installed as a skill available to:

Claude Code
Cursor
Windsurf
Roo Code
Codex
30+ other agents supporting the skills ecosystem

Observability

Report includes counts: ✅ N checks passed | ⚠️ N warnings | ❌ N issues
Per-category breakdown: Code Quality / Security / Agent Patterns
Per-finding: severity, rule number, line number, fix suggestion, reliability tag [P] or [H]
No audit log or replay

Privacy

All analysis runs locally. No external API calls. No telemetry. No code leaves the machine.

Distribution

Type: skill-pack
License: MIT
Install: one-liner
Version: 1.0.0

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: No

Components

Commands: 0
Skills: 5
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 0

Workflow

Phases: 8
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: No
Pattern: sequential
Max concurrent: 1
Isolation: none
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: No
BYOK: No
Modal: text

Execution

Mode: one-shot
Crash recovery: No
Compaction: No
Session handoff: No
Streaming: No

Memory

Type: none
Persistence: none
Search: none

Quality

TDD: No
TDD mechanism: none
Validators: 2
Self-review: inline-self

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: No

Tools

Primary: claude-code
Targets: 5
Portability: high

Signals

Stars: 38
Last commit: 2026-05-26
Maintainer: active
Quality score: 0.4/10