Aurite Agent Verifier

aurite-agent-verifier · aurite-ai/agent-verifier · ★ 38 · last commit 2026-05-26

Primitive shape 5 total
Skills 5
00

Summary

Aurite Agent Verifier — Summary

Agent Verifier is a coding agent skill that runs automated verification on AI-generated code, specifically targeting agent-specific failure patterns (hallucinated tools, unbounded loops, missing retry limits) alongside security gaps and language-specific quality checks. It is installed once via npx skills add aurite-ai/agent-verifier and thereafter triggered by natural-language commands like "verify agent" or "audit agent."

The skill is structured as a modular orchestrator that dispatches to four focused sub-skills: verify-security, verify-patterns, verify-quality, and verify-language. Checks are classified as either pattern-matched (deterministic, same result every run) or heuristic (judgment-based, may vary). The tool is explicitly not a replacement for linters — it catches what static analysis tools cannot: context-size analysis, tool-registry consistency, and LangGraph cycle detection.

All analysis runs locally; no code leaves the user's machine. The skill supports Claude Code, Cursor, Windsurf, Roo Code, Codex, and 30+ other agents.

Compared to seeds: it resembles spec-kit's pre-ship validation gates, but focuses exclusively on the correctness of AI agent code (loops, tool hallucinations, context limits) rather than spec compliance. Its closest counterpart is superpowers' verification-before-completion, which is a process skill; agent-verifier instead applies deterministic check patterns, with framework-specific checks enabled by framework detection (LangGraph, CrewAI, AutoGen, LangChain).

01

Overview

Aurite Agent Verifier — Overview

Origin

Built by Aurite AI (aurite.ai), a startup focused on enterprise-grade secure agent infrastructure. The tool is released under the MIT license as a community offering; enterprise capabilities (shared context pools, administrative controls, centralized hosting) are available commercially.

Philosophy

AI coding agents are powerful but skip linting, ignore security basics, and hallucinate tool calls. Code reviews catch some of this, but not consistently. Agent Verifier acts as an automated reviewer that specifically checks for patterns no static analyzer can detect: AI agent anti-patterns that only exist because code is generated by an AI operating in an agent loop.

"AI coding agents are powerful — but they skip linting, ignore security basics, and hallucinate tool calls."

"Agent Verifier is an AI agent skill that acts as an automated reviewer."

Two-Tier Reliability Model

The skill introduces a formal distinction between check types:

  • Pattern-matched checks [P]: Mechanical rule applied to code structure — same answer on every run (high reliability)
  • Heuristic checks [H]: Requires interpretation of intent or quality — best-effort (may vary)

This explicit reliability tagging lets users understand which findings are authoritative vs. advisory.
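
As a rough illustration of how the two tiers could be carried through to a report, the sketch below tags each finding with its tier. The `Tier`, `Finding`, and `authoritative` names are hypothetical and do not come from the skill, which is defined in SKILL.md prompts rather than Python.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PATTERN = "P"    # mechanical rule, same answer on every run
    HEURISTIC = "H"  # judgment call, may vary between runs


@dataclass(frozen=True)
class Finding:
    check: str
    message: str
    tier: Tier

    def label(self) -> str:
        # Prefix the finding with its reliability tag, e.g. "[P] ..."
        return f"[{self.tier.value}] {self.check}: {self.message}"


def authoritative(findings: list[Finding]) -> list[Finding]:
    """Keep only pattern-matched findings, which repeat identically across runs."""
    return [f for f in findings if f.tier is Tier.PATTERN]
```

Filtering on the tier is one way a reader could separate findings to act on immediately from findings that deserve a second look.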

Supported Agent Frameworks

The skill auto-detects the agent framework used:

  • langgraph imports → LangGraph (enables cycle analysis)
  • crewai imports → CrewAI
  • autogen imports → AutoGen
  • langchain imports → LangChain
  • Direct SDK usage → Custom agent
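
The detection rules above can be sketched as a first-match scan over import lines. This is an assumption-laden illustration: the `detect_framework` function, the prefix-matching regex, and the choice to treat the listed order as a priority order are all hypothetical, since in the skill the agent performs this step by reading files.

```python
import re

# Checked in the order the list above gives; first match wins.
# Treating the order as a priority is an assumption: LangGraph projects
# often import langchain too, so the more specific name is tested first.
FRAMEWORKS = [
    ("langgraph", "LangGraph"),
    ("crewai", "CrewAI"),
    ("autogen", "AutoGen"),
    ("langchain", "LangChain"),
]


def detect_framework(source: str) -> str:
    """Return the framework implied by the import lines in `source`."""
    for module, name in FRAMEWORKS:
        # Prefix match, so "langchain_core" and "langgraph.graph" both count.
        pattern = rf"^\s*(?:from|import)\s+{module}"
        if re.search(pattern, source, flags=re.MULTILINE):
            return name
    # No known framework import found: fall back to the last rule in the list.
    return "Custom agent"
```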

Explicit Antipatterns Targeted

  • Hardcoded API keys and secrets
  • Unbounded retry loops (@retry without explicit stop)
  • while True / for {} without break in scope
  • Tool names referenced in prompts but not defined in registry
  • System prompts exceeding recommended token limits
  • Unpinned dependencies
  • TypeScript any types
  • Go errors silently discarded (_ = fn())
  • LangGraph cycles with no reachable END node
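
To make the loop-safety item concrete for Python sources, here is a minimal sketch using the standard `ast` module. The `unbounded_while_loops` function is hypothetical and is not how the skill works internally; it only shows why this kind of check can be deterministic.

```python
import ast


def unbounded_while_loops(source: str) -> list[int]:
    """Return line numbers of `while True` loops that contain no `break`."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.While):
            continue
        test = node.test
        is_constant_true = isinstance(test, ast.Constant) and test.value is True
        if not is_constant_true:
            continue
        # Simplification: any `break` nested anywhere inside the loop body
        # counts, even one that belongs to an inner loop. `return` and
        # `raise` are ignored, so some safe loops are still flagged.
        has_break = any(
            isinstance(child, ast.Break)
            for stmt in node.body
            for child in ast.walk(stmt)
        )
        if not has_break:
            flagged.append(node.lineno)
    return sorted(flagged)
```

Because the result depends only on the syntax tree, the same source always yields the same line numbers.
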
02

Architecture

Aurite Agent Verifier — Architecture

Distribution

  • Type: skill-pack (multi-agent skill with orchestrator + 4 sub-skills)
  • License: MIT
  • Install complexity: one-liner via npx skills

Install Commands

# Primary (npx skills CLI)
npx skills add aurite-ai/agent-verifier -a claude-code -a cursor -a <your-agent>

# Then trigger via natural language:
verify agent

Directory Layout

skills/
├── verification/
│   └── SKILL.md          # Orchestrator: full verification suite
├── verify-security/      # Security checks (secrets, deps, input validation)
├── verify-patterns/      # Agent pattern checks (loops, retries, tools, context)
├── verify-quality/       # Code quality (naming, org, documentation)
└── verify-language/      # Language-specific (Python type hints, TS strict, Go errors)
docs/
├── getting-started.md
└── tutorials/
    └── 01-installation.md
tests/
└── fixtures/             # Test directories with known issues for validation
assets/
└── agent_verifier.gif    # Demo animation

Target AI Tools

  • Claude Code (primary)
  • Cursor
  • Windsurf
  • Roo Code
  • Codex
  • 30+ other agents

Required Runtime

No runtime dependencies beyond the AI agent (runs locally via agent's existing capabilities — Read, Glob, Grep tools).

Config Files

  • .kahuna/context-guide.md — optional organizational rules (Kahuna integration)
03

Components

Aurite Agent Verifier — Components

Skills (Sub-skill Architecture)

| Skill | Trigger Phrase | What It Checks |
|-------|----------------|----------------|
| verification | "verify agent", "audit agent", "full verification" | Orchestrator: runs all 4 sub-skills and consolidates |
| verify-security | "verify agent security" | Secrets, dependency pinning, input validation, error exposure, secure defaults |
| verify-patterns | "verify agent patterns" | Loop safety, retry limits, tool registry consistency, context size, LangGraph cycles |
| verify-quality | "verify agent quality" | Naming conventions, code organization, documentation |
| verify-language | "verify agent language" | Python type hints, TypeScript strict mode, Go error handling |
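
The trigger table can be read as a lookup from phrase to the sub-skills that run. The sketch below is a simplification under stated assumptions: in practice the host agent matches phrases in natural language, whereas this hypothetical `sub_skills_for` helper does an exact lookup after normalizing case and whitespace.

```python
SUB_SKILLS = ["verify-security", "verify-patterns", "verify-quality", "verify-language"]

# Phrases taken from the trigger table; the dict itself is illustrative.
TRIGGERS = {
    "verify agent": SUB_SKILLS,
    "audit agent": SUB_SKILLS,
    "full verification": SUB_SKILLS,
    "verify agent security": ["verify-security"],
    "verify agent patterns": ["verify-patterns"],
    "verify agent quality": ["verify-quality"],
    "verify agent language": ["verify-language"],
}


def sub_skills_for(phrase: str) -> list[str]:
    """Return the sub-skills a trigger phrase maps to, or [] if unknown."""
    key = " ".join(phrase.lower().split())  # normalize case and spacing
    return list(TRIGGERS.get(key, []))
```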

Pattern-Matched Checks (Deterministic)

| Check | Pattern | Outcome |
|-------|---------|---------|
| Retry limits | @retry/@backoff/p-retry/urllib3.Retry without explicit stop/total | ❌ Issue |
| Loop safety | while True/for {}/while (true) without break in scope | ❌ Issue |
| Tool registry | Tool names in prompts absent from definitions | ❌ Issue |
| Context size | len(prompt)/4 vs token thresholds | ⚠️ Warning / ❌ Issue |
| Requirements pinning | >=, >, or unpinned deps in requirements.txt/pyproject.toml | ❌ Issue |
| Hardcoded secrets | Assignments to API_KEY/SECRET/PASSWORD/TOKEN string literals | ❌ Issue |
| No any types (TS) | Unqualified : any annotations | ⚠️ Warning |
| Ignored errors (Go) | _ = functionCall() where function returns error | ❌ Issue |
| LangGraph cycles | Graph cycles with no reachable END in edge mappings | ❌ Issue |
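
The context-size row relies on the common approximation of about four characters per token. A minimal sketch of that arithmetic follows; the two threshold constants are hypothetical placeholders, since the source does not state the actual limits.

```python
WARN_TOKENS = 4_000   # hypothetical warning threshold, not from the source
ISSUE_TOKENS = 8_000  # hypothetical issue threshold, not from the source


def estimate_tokens(prompt: str) -> int:
    # Rough rule from the table: about four characters per token.
    return len(prompt) // 4


def classify_prompt(prompt: str) -> str:
    """Map an estimated token count to pass, warning, or issue."""
    tokens = estimate_tokens(prompt)
    if tokens >= ISSUE_TOKENS:
        return "issue"
    if tokens >= WARN_TOKENS:
        return "warning"
    return "pass"
```

For example, a 16,000-character system prompt is estimated at 4,000 tokens, which would land on the warning threshold under these assumed limits.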

Heuristic Checks (Best-Effort)

  • Code organization appropriateness
  • Naming convention consistency
  • Input validation sufficiency
  • Docstring quality/usefulness
  • Tool error handling adequacy

Report Format

# Verification Report
✅ 8 checks passed | ⚠️ 3 warnings | ❌ 2 issues

### By Category
| Category | Pass | Warn | Issue |
|----------|------|------|-------|
| Code Quality | 5 | 1 | 0 |
| Security | 2 | 0 | 1 |
| Agent Patterns | 1 | 2 | 1 |
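
The header line of the report is the column-wise sum of the category table. The sketch below reproduces that relationship; `EXAMPLE` mirrors the sample numbers above, and `summary_line` is a hypothetical helper rather than part of the skill.

```python
# Per-category counts copied from the sample report above.
EXAMPLE = {
    "Code Quality": {"pass": 5, "warn": 1, "issue": 0},
    "Security": {"pass": 2, "warn": 0, "issue": 1},
    "Agent Patterns": {"pass": 1, "warn": 2, "issue": 1},
}


def summary_line(by_category: dict[str, dict[str, int]]) -> str:
    """Sum each outcome across categories and format the report header."""
    totals = {"pass": 0, "warn": 0, "issue": 0}
    for counts in by_category.values():
        for outcome in totals:
            totals[outcome] += counts.get(outcome, 0)
    return (
        f"✅ {totals['pass']} checks passed | "
        f"⚠️ {totals['warn']} warnings | "
        f"❌ {totals['issue']} issues"
    )
```
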
05

Prompts

Aurite Agent Verifier — Prompt Excerpts

Excerpt 1: Verification Orchestrator (from skills/verification/SKILL.md)

Technique: Natural-language trigger matching + modular sub-skill dispatch

---
name: verification
version: "1.0.0"
description: Full agent verification suite. Runs security, patterns, quality, and language-specific checks. Use when asked to "verify agent", "verify my agent", "audit agent", or "full verification".
---
## When to Use

Trigger this skill when the user asks to:
- **"verify agent"** (primary invocation)
- "verify my agent"
- "audit agent"
- "full verification"
- "verify my code" (when agent patterns are detected)
- "check compliance"

Analysis: The trigger specification is a concrete list of exact phrases rather than an abstract description. This reduces false-positive skill activation, since other skills' descriptions may overlap with generic phrases such as "verify code". The version: "1.0.0" field enables future compatibility tracking.


Excerpt 2: Context Discovery Protocol (from skills/verification/SKILL.md)

Technique: Ordered priority scan with framework-specific branch logic

### Step 1: Context Discovery

Scan the project to identify:

1. **Primary language:**
   - Check for `pyproject.toml`, `package.json`, `go.mod`
   - Look at file extensions in `src/` or project root

2. **Agent framework (if any):**
   - `langgraph` in imports → LangGraph
   - `crewai` in imports → CrewAI
   - `autogen` in imports → AutoGen
   - `langchain` in imports → LangChain
   - Direct SDK usage → Custom agent

3. **Kahuna integration:**
   - Check if `.kahuna/` directory exists
   - If yes, read `.kahuna/context-guide.md` for organizational rules

Record the detected context for reporting.

Analysis: Framework detection is pattern-matching on file system artifacts and import statements, not heuristic. The ordered scan creates a deterministic priority for language and framework identification. Framework detection gates which checks fire (e.g., LangGraph cycles only checked if LangGraph detected).
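
The primary-language step can be sketched as a manifest check followed by an extension count. This is an illustration under assumptions: `detect_language` is hypothetical, it takes paths relative to the project root, and the priority order of manifests and the extension-to-language map are guesses at what the prompt intends.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Optional

# Manifest files named in the excerpt, checked in listed order (assumed priority).
MANIFESTS = [
    ("pyproject.toml", "Python"),
    ("package.json", "TypeScript/JavaScript"),
    ("go.mod", "Go"),
]
# Hypothetical extension map for the fallback step.
EXTENSIONS = {
    ".py": "Python",
    ".ts": "TypeScript/JavaScript",
    ".js": "TypeScript/JavaScript",
    ".go": "Go",
}


def detect_language(paths: list[str]) -> Optional[str]:
    """Guess the primary language from root-relative file paths."""
    root_files = set(paths)
    for manifest, language in MANIFESTS:
        if manifest in root_files:  # only a manifest at the project root counts
            return language
    # Fallback: the most common known source extension.
    counts = Counter(
        EXTENSIONS[PurePosixPath(p).suffix]
        for p in paths
        if PurePosixPath(p).suffix in EXTENSIONS
    )
    return counts.most_common(1)[0][0] if counts else None
```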


Excerpt 3: Pattern-Matched vs Heuristic Classification (from README)

Technique: Explicit reliability taxonomy embedded in output

| Tier | Tag | How it's applied | Reliability |
|------|-----|-----------------|-------------|
| Pattern-matched | `[P]` | Mechanical — rule applied exactly as specified to code structure | High — same answer on every run |
| Heuristic | `[H]` | Judgment — requires interpretation of intent or quality | Best-effort — may vary |

**Pattern-matched checks** (reliable):
| Check | What it looks for |
|-------|-------------------|
| Retry limits | `@retry`/`@backoff`/`p-retry`/`urllib3.Retry` without explicit stop/total parameter → ❌ Issue |
| Loop safety | `while True`/`for {}`/`while (true)` without `break` in scope → ❌ Issue |
| Tool registry | Tool names referenced in prompts but absent from definitions → ❌ Issue |

Analysis: Explicitly communicates confidence levels to the user. This "reliability contract" prevents over-trust of heuristic findings and validates pattern-matched findings as authoritative. The distinction is built into the output format, not just documentation.
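
The tool-registry row reduces to a set difference between tool names a prompt mentions and tool names that are actually defined. The sketch below assumes, purely for illustration, that prompts wrap tool names in backticks; `missing_tools` is a hypothetical helper and the skill may identify references differently.

```python
import re


def missing_tools(prompt: str, registry: set[str]) -> set[str]:
    """Return tool names mentioned in `prompt` that are not in `registry`."""
    # Assumed convention: tools appear in backticks, e.g. `search_web`.
    referenced = set(re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", prompt))
    return referenced - registry
```

A non-empty result corresponds to a hallucinated tool reference: the prompt tells the agent to call something that no definition provides.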

09

Uniqueness

Aurite Agent Verifier — Uniqueness & Positioning

Differs From Seeds

Closest seed: superpowers (verification-before-completion skill). But superpowers' verification skill is a process guide — it tells the agent how to verify work (run tests, check linting, etc.). Agent Verifier is a content checker — it applies specific rules to detect specific anti-patterns in AI-generated code. Unlike superpowers, agent-verifier introduces a formal reliability taxonomy (pattern-matched [P] vs heuristic [H]) and targets AI-agent-specific failure modes (hallucinated tools, unbounded loops) that no seed framework explicitly covers.

Also distinct from spec-kit (spec compliance validation) and kiro (IDE-level steering hooks). Agent Verifier operates after implementation, not during it, and focuses on agent code specifically rather than general code quality.

Observable Failure Modes

  1. Heuristic variability: [H]-tagged checks may give different results on successive runs — users must know which findings to trust.
  2. No auto-fix: The tool reports but never applies fixes; remediation is entirely manual.
  3. No continuous integration: Must be manually triggered; cannot be wired as a pre-commit hook.
  4. Framework detection may fail: Custom agent patterns that do not match a known framework are treated as "Direct SDK usage", so framework-specific issues may be missed.
  5. Token context limits: Very large codebases may cause context overflow during Glob/Read phases.

Distinctive Opinion

AI agent code has a distinct failure category not covered by any existing linter: the class of errors that only exist because the code was generated by an AI in an agent loop (hallucinated tool references, context size mismanagement, missing termination in retry/loop patterns). Static analyzers cannot detect these. Dedicated AI-agent verification tooling is a new required layer in the quality stack.

Positioning

Tool Agent Verifier ESLint/Biome Semgrep Manual Review
AI agent patterns Sometimes
Security checks Partial Sometimes
Context-size analysis Rarely
Tool hallucination detection Rarely
Runs inside AI agent N/A
04

Workflow

Aurite Agent Verifier — Workflow

Verification Phases

| Phase | Action | Artifact |
|-------|--------|----------|
| 1. Invocation | User says "verify agent" or variant | Trigger |
| 2. Context Discovery | Scan project: detect primary language, agent framework, Kahuna integration | Detection context |
| 3. Security Checks | Run verify-security sub-skill | Security findings |
| 4. Pattern Checks | Run verify-patterns sub-skill | Pattern findings |
| 5. Quality Checks | Run verify-quality sub-skill | Quality findings |
| 6. Language Checks | Run verify-language sub-skill | Language findings |
| 7. Consolidation | Aggregate findings, tag each as [P] or [H] | Unified report |
| 8. Output | Structured report with pass/warn/issue counts by category | Verification Report |

Context Discovery Details

  1. Primary language detection: Check pyproject.toml, package.json, go.mod; look at file extensions in src/ or project root
  2. Agent framework detection: Scan imports for langgraph, crewai, autogen, langchain
  3. Organizational rules: Check if .kahuna/ directory exists; if yes, read .kahuna/context-guide.md

Approval Gates

None — the tool produces a report and stops. Remediation is at the user's discretion. No automatic fix application.

Manual Triggering Modes

| Command | Scope |
|---------|-------|
| "verify agent" | Full suite (all 4 sub-skills) |
| "verify agent security" | Security only |
| "verify agent patterns" | Agent patterns only |
| "verify agent quality" | Quality only |
| "verify agent language" | Language-specific only |
| "verify this code for agent patterns" | Targeted on specific directory/files |

Fixtures for Testing

tests/fixtures/ contains directories with known issues (e.g. infinite_loop/) for validating the skill's detection accuracy.

06

Memory Context

Aurite Agent Verifier — Memory & Context

State Storage

No persistent memory. Each verification is a fresh scan of the current project state.

Context Built Per Invocation

  1. Language detection result
  2. Agent framework detection result
  3. Organizational rules from .kahuna/context-guide.md (if present)
  4. File scan results (Read, Glob, Grep tools)

Cross-Session Handoff

None. The tool is stateless between invocations.

Audit Output

The verification report is output to the conversation only — not written to a file unless the user explicitly asks. No VERIFICATION.md or similar artifact is created automatically.

Memory Type

None — the skill relies entirely on the agent's native file-reading capabilities for context. No external state store.

07

Orchestration

Aurite Agent Verifier — Orchestration

Multi-Agent Pattern

Pattern: hierarchical (orchestrator dispatches to 4 sub-skills sequentially)

The verification skill acts as an orchestrator. It runs context discovery, then serially dispatches to:

  1. verify-security
  2. verify-patterns
  3. verify-quality
  4. verify-language

Each sub-skill is a separate SKILL.md file loaded by the orchestrator. The orchestrator consolidates all findings into a unified report.
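
The serial dispatch described above can be modelled as a loop over sub-skills that share one detected context. This is only an analogy in code: the real orchestrator is a SKILL.md prompt, and the `SubSkill` type and `run_verification` function here are hypothetical.

```python
from typing import Callable

# A sub-skill is modelled as a function from detected context to findings.
SubSkill = Callable[[dict], list[str]]


def run_verification(
    context: dict, sub_skills: list[tuple[str, SubSkill]]
) -> dict[str, list[str]]:
    """Run each sub-skill in order and collect findings under its name."""
    report: dict[str, list[str]] = {}
    for name, check in sub_skills:  # strictly sequential, one session
        report[name] = check(context)
    return report
```

Passing the same `context` to every sub-skill mirrors the way framework detection gates later checks, for example enabling cycle analysis only when LangGraph was detected.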

Isolation Mechanism

None — the skill reads in-place. No worktrees or containers.

Multi-Model

No. All verification runs in the primary agent (Claude Code, Cursor, etc.). No external model API calls.

Execution Mode

One-shot — triggered by user phrase, runs to completion, outputs report. No daemon, no loop.

Subagent Definition Format

The sub-skills (verify-security, verify-patterns, etc.) are loaded as skills by the orchestrator, but they are not subagents spawned in separate contexts — they are sequential skill-loading instructions within the same agent session.

Crash Recovery

None — stateless tool; no partial state to recover.

08

UI / CLI Surface

Aurite Agent Verifier — UI / CLI Surface

CLI Binary

No dedicated CLI. Install via npx skills add aurite-ai/agent-verifier. No binary is produced; skill files are written to the agent's skills directory.

UI / Dashboard

None. Output is a structured markdown report in the agent conversation.

IDE Integration

Installed as a skill available to:

  • Claude Code
  • Cursor
  • Windsurf
  • Roo Code
  • Codex
  • 30+ other agents supporting the skills ecosystem

Observability

  • Report includes counts: ✅ N checks passed | ⚠️ N warnings | ❌ N issues
  • Per-category breakdown: Code Quality / Security / Agent Patterns
  • Per-finding: severity, rule number, line number, fix suggestion, reliability tag [P] or [H]
  • No audit log or replay

Privacy

All analysis runs locally. No external API calls. No telemetry. No code leaves the machine.
