Vet (Verify Everything)

vet-imbue · imbue-ai/vet · ★ 385 · last commit 2026-05-13

Primitive shape 1 total
Skills 1
00

Summary

Vet (Verify Everything) — Summary

Vet is a standalone, BYOK code-verification CLI (binary: vet, PyPI package: verify-everything) that uses LLM analysis to find issues in git diffs that linters and test suites miss. It ships a companion agent skill that installs into Claude Code, Codex, OpenCode, and Gemini CLI simultaneously, triggering automatically after every logical unit of code change. Unlike most quality tools that check syntax or style, Vet cross-examines the diff against the original stated goal and the agent's conversation history to catch intent-implementation mismatches — the case where code changed but not in the direction the user actually wanted. It supports BYOK via OpenAI-compatible endpoints, ships an --agentic mode that routes analysis through an installed agent harness instead of raw API calls, and publishes findings as GitHub PR review comments via a reusable GitHub Actions workflow. Exit code semantics (0 = clean, 10 = issues found, 1 = runtime error, 2 = usage error) are explicitly designed for CI and agent tool-loops.

Differs from seeds: Closest to superpowers (skills-only behavioral framework, zero commands, activates proactively). Key delta: Vet is a verification artifact — it produces a machine-readable JSON/text issue list — whereas superpowers produces behavioral Iron Laws. Vet also ships a real CLI binary that CI can invoke independently; superpowers has no binary. Most distinct from kiro and claude-flow which manage workflow phases; Vet inserts only at the verification gate.

01

Overview

Vet — Origin, Philosophy, and Manifesto

Origin

Built by Imbue AI (imbue.com), a lab focused on building AI systems with genuine reasoning capabilities. Vet emerged from the problem of AI coding agents confidently implementing the wrong thing — passing all tests while misreading user intent or introducing subtle logic errors. Released under AGPL-3.0, publicly available on PyPI as verify-everything.

Philosophy

"Vet reviews intent and code: checks agent conversations for goal adherence and code changes for correctness."

"Run vet immediately after ANY logical unit of code changes. Do not batch your changes, do not wait to be asked to run vet, make sure you are proactive."

The core opinion: tests and linters verify mechanical correctness; Vet verifies semantic intent. The SKILL.md explicitly instructs agents to not batch changes and not wait to be asked — treating verification as a continuous obligation, not a terminal step.

Multi-harness philosophy

Vet's install script deposits the same skill file into .agents/, .opencode/, .claude/, and .codex/ simultaneously. The stated goal is to work identically across all major coding agents so the user's choice of harness doesn't degrade verification quality.

Bring-Your-Own-Model

Vet explicitly rejects the model-lock pattern. Custom model definitions via OpenAI-compatible endpoints are supported through a JSON config file at $XDG_CONFIG_HOME/vet/models.json or .vet/models.json. A --update-models flag fetches community model definitions from a remote registry, decoupling model support updates from CLI version bumps.
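
As a rough illustration of the two config locations named above, the sketch below lists them in Python. The helper names are hypothetical. The fallback to ~/.config when XDG_CONFIG_HOME is unset follows the XDG Base Directory convention and is an assumption, not something confirmed from Vet's source. How Vet merges or prioritizes the two files is not covered here.

```python
import os
from pathlib import Path


def candidate_model_configs(repo_root, env=None):
    """Return the two documented models.json locations (hypothetical helper).

    Assumption: when XDG_CONFIG_HOME is unset or empty we fall back to
    ~/.config, as the XDG convention suggests; Vet itself may behave differently.
    """
    env = os.environ if env is None else env
    xdg_home = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return [
        Path(xdg_home) / "vet" / "models.json",    # user-level definitions
        Path(repo_root) / ".vet" / "models.json",  # per-repo override
    ]


def existing_model_configs(repo_root, env=None):
    """Keep only the candidate files that are actually present on disk."""
    return [p for p in candidate_model_configs(repo_root, env) if p.is_file()]
```

A caller could print existing_model_configs(".") to see which of the two files would be available in the current repository.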

Explicit design choice: argparse over click/typer

From the CLI source comments: "The choice to use argparse was primarily driven by the idea that vet will be called by agents / llms. Given this, we want to have the most standardized outputs possible." — a rare explicit statement of agent-first CLI design.

02

Architecture

Vet — Architecture, Distribution, and Installation

Distribution

  • PyPI package: verify-everything (pip/pipx/uv install)
  • GitHub Action: imbue-ai/vet@main (reusable workflow for PR review)
  • Agent skill: installed via install-skill.sh or manual curl loop into .claude/skills/vet/, .agents/skills/vet/, .opencode/skills/vet/, .codex/skills/vet/

Version analyzed: 0.2.12

Installation

# CLI
pip install verify-everything
pipx install verify-everything
uv tool install verify-everything

# Agent skill (project level)
curl -fsSL https://raw.githubusercontent.com/imbue-ai/vet/main/install-skill.sh | bash

# GitHub Actions (PR review bot)
# see .github/workflows/vet.yml with uses: imbue-ai/vet@main

Directory Tree

vet/
├── skills/
│   └── vet/
│       ├── SKILL.md            # Agent skill file (multi-harness)
│       └── scripts/
│           ├── export_claude_code_session.py
│           ├── export_codex_session.py
│           ├── export_opencode_session.py
│           └── export_gemini_cli_session.py
├── vet/                        # Python package
│   ├── cli/
│   │   ├── main.py             # argparse-based CLI (agent-first design)
│   │   └── config/
│   ├── issue_identifiers/      # issue detection logic
│   ├── imbue_core/             # LLM client abstraction
│   └── vet_types/
├── registry/
│   └── models.json             # community model definitions
├── action.yml                  # GitHub Actions reusable workflow
├── pyproject.toml              # entry: vet = "vet.cli.main:main"
└── .vet/                       # local config (models.json override)

Required Runtime

  • Python >= 3.11
  • Git (for diff computation)
  • At least one: ANTHROPIC_API_KEY, OPENAI_API_KEY, or OpenAI-compatible endpoint, OR an installed agent harness (claude/codex/opencode) when using --agentic

Target AI Tools

  • Claude Code (primary — SKILL.md installs to ~/.claude/skills/)
  • Codex CLI (~/.codex/skills/)
  • OpenCode (~/.opencode/skills/)
  • Gemini CLI (~/.gemini/skills/ — via export script)
  • GitHub Actions CI (via reusable workflow)

Config Files

  • $XDG_CONFIG_HOME/vet/models.json — custom model definitions
  • .vet/models.json — per-repo model override
  • Supports --config flag for preset selection

03

Components

Vet — Components

CLI Binary: vet

Single-binary CLI with no subcommands — all configuration is via flags.

Key flags:

  • vet "goal" — run verification against current git diff
  • --base-commit REF — diff base (default: HEAD)
  • --staged — only analyze staged changes
  • --history-loader "cmd" — shell command that outputs agent conversation history
  • --model MODEL — LLM to use (default: claude-opus-4-7)
  • --agentic — route analysis through locally installed agent harness
  • --agent-harness claude|codex|opencode — select harness for --agentic mode
  • --confidence-threshold N — minimum confidence 0.0–1.0 (default: 0.8)
  • --output-format text|json|github — output format
  • --max-spend — spending cap in USD
  • --list-models — list all supported models
  • --update-models — fetch community model definitions from remote registry
  • --list-issue-codes, --enabled-issue-codes, --disabled-issue-codes — selective check control

Exit codes: 0 = clean, 1 = runtime error, 2 = usage error, 10 = issues found.
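
Because everything is flag-driven, a wrapper script can assemble the command line mechanically. The sketch below is one way to do that in Python using only flags listed above. The function name build_vet_command and its validation rules are illustrative assumptions, and it makes no claim about which flag combinations Vet accepts together.

```python
def build_vet_command(goal, base_commit=None, staged=False,
                      output_format=None, confidence_threshold=None):
    """Assemble an argv list for the vet CLI from the flags listed above."""
    argv = ["vet", goal]
    if base_commit is not None:
        argv += ["--base-commit", base_commit]
    if staged:
        argv.append("--staged")
    if output_format is not None:
        # The three formats named in this document.
        if output_format not in ("text", "json", "github"):
            raise ValueError(f"unsupported output format: {output_format}")
        argv += ["--output-format", output_format]
    if confidence_threshold is not None:
        if not 0.0 <= confidence_threshold <= 1.0:
            raise ValueError("confidence threshold must be within 0.0 to 1.0")
        argv += ["--confidence-threshold", str(confidence_threshold)]
    return argv
```

The resulting list can be handed to subprocess.run without shell quoting concerns, since the goal string travels as a single argument.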

Agent Skill: vet/SKILL.md

One skill file compatible with Claude Code, Codex, OpenCode, and Gemini CLI. Instruction: run proactively after every logical unit of code changes. Contains:

  • Session export commands per harness (using history-loader scripts)
  • --session-file / --session-id discovery instructions per harness
  • Interpretation notes (filter issues from other agents in same repo)
  • Update instructions when CLI/skill becomes stale

Session Export Scripts (4)

  • export_claude_code_session.py: reads ~/.claude/projects/<encoded-path>/<session-uuid>.jsonl
  • export_codex_session.py: reads ~/.codex/sessions/YYYY/MM/DD/<file>.jsonl
  • export_opencode_session.py: uses opencode session list --format json
  • export_gemini_cli_session.py: reads ~/.gemini/tmp/<project>/chats/

GitHub Action: imbue-ai/vet@main

Reusable GitHub Actions workflow. Inputs: agentic (bool). Handles: Python setup, vet install, merge-base computation, posting review comments to PR. Requires ANTHROPIC_API_KEY as repo secret.

Model Registry: registry/models.json

Community-maintained model definitions for OpenAI-compatible endpoints. Fetched remotely via --update-models flag.

Issue Identifiers Module

vet/issue_identifiers/ — Python modules implementing individual check types (goal adherence, code correctness, etc.). Not enumerated publicly in README; internal implementation detail.

05

Prompts

Vet — Prompt Files and Techniques

Prompt 1: SKILL.md — Proactive Trigger Instruction (Iron Law pattern)

---
name: vet
description: Run vet immediately after ANY logical unit of code changes. Do not batch your changes, do not wait to be asked to run vet, make sure you are proactive.
---

# Vet

**Run vet immediately after ANY logical unit of code changes. Do not batch changes, do not wait to be asked to run vet, make sure you are proactive.**

Vet reviews git diffs and conversation history to find issues in code changes and conversation history. It is most effective when run frequently with conversation history, which helps it catch misunderstandings between what was requested and what was implemented. Despite this, vet is not a replacement for running tests.

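Technique: Iron Law with prohibitions. The instruction appears twice, once in the frontmatter description and once in bold in the body, and both times in the negative form ("Do not batch...", "do not wait"). This mirrors superpowers' Iron Law pattern, where the critical behavior is repeated and phrased as a prohibition ("never", "do not"). The skill description field (used by the agent harness for activation) repeats the trigger condition almost verbatim; only "your changes" becomes "changes" in the body.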

Prompt 2: Session Discovery Instructions (Verification-loop awareness pattern)

**Claude Code:** Your current session UUID is `${CLAUDE_SESSION_ID}`. Session files are stored in `~/.claude/projects/<encoded-path>/` as `<session-uuid>.jsonl`. Find the session file matching your UUID and verify it belongs to this conversation. If the UUID above was not replaced with an actual value (e.g. older Claude Code versions), fall back to a manual search:
1. Find the most unique sentence / question / string in the current conversation.
2. Run: `grep -rl "UNIQUE_MESSAGE" ~/.claude/projects/` to find the matching session file.
    - IMPORTANT: Verify the conversation you found matches the current conversation and that it is not another conversation with the same search string.
3. Pass the matched file path as `--session-file`.

Technique: Explicit disambiguation protocol. The prompt injects the live session UUID (${CLAUDE_SESSION_ID}) as a template variable, then provides a fallback grep-based search. The IMPORTANT: prefix on the verification warning is a common emphasis marker in agent prompts. This addresses a real failure mode (the agent attaches the wrong session file) through step-by-step disambiguation.
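
The grep fallback can be approximated in Python as a minimal sketch. The function names are hypothetical, and restricting the search to .jsonl files is an assumption based on the session file naming quoted above. The match is a plain substring test, so a message containing characters that JSON escapes (quotes, newlines) may not match its on-disk form. The sketch refuses to pick a file when the match is ambiguous, which is the situation the IMPORTANT warning describes.

```python
from pathlib import Path


def find_session_files(unique_message, projects_dir):
    """Return every .jsonl file under projects_dir whose text contains unique_message."""
    matches = []
    for path in sorted(Path(projects_dir).rglob("*.jsonl")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        if unique_message in text:
            matches.append(path)
    return matches


def pick_session_file(unique_message, projects_dir):
    """Return the single matching session file, or raise if the match is ambiguous."""
    matches = find_session_files(unique_message, projects_dir)
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one session file, found {len(matches)}; "
            "choose a more distinctive string or inspect the candidates manually"
        )
    return matches[0]
```

Even with a single match, the skill's instruction to verify that the file belongs to the current conversation still applies; uniqueness of the string does not prove ownership.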

Prompt 3: Issue Interpretation Instruction (Scope-limiting pattern)

## Interpreting Results

Vet analyzes the full git diff from the base commit. This may include changes from other agents or sessions working in the same repository. If vet reports issues that relate to changes you did not make in this session, disregard them, assuming they belong to another agent or the user.

Technique: Scope-limiting guardrail. Prevents the agent from acting on false positives caused by multi-agent shared repos. Grounds responsibility attribution — "changes you did not make" establishes that the agent should only own its own session's changes.
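
A caller that wants to apply this guardrail mechanically could partition findings by file, as in the sketch below. This is an assumption-heavy illustration: the "file" key is a made-up field name, since Vet's JSON schema is not documented here, and file-level ownership is coarser than the skill's wording, which speaks of changes rather than files.

```python
def split_by_ownership(issues, session_files):
    """Partition issues into those touching files this session edited and the rest.

    Assumption: each issue is a dict with a "file" key holding a repo-relative
    path. The field name is illustrative, not Vet's documented schema.
    """
    owned_paths = set(session_files)
    mine, others = [], []
    for issue in issues:
        target = mine if issue.get("file") in owned_paths else others
        target.append(issue)
    return mine, others
```

Issues without a recognizable path land in the second list, so they are set aside for review rather than silently dropped.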

09

Uniqueness

Vet — Uniqueness and Positioning

Differs from Seeds

Vet is closest to superpowers in distribution pattern (skills-only, zero commands, proactive activation via skill description). The fundamental delta: Vet produces a machine-checkable verification artifact (JSON issue list with confidence scores) while superpowers produces behavioral instructions. Vet ships a real CLI binary that CI, pre-commit hooks, and human engineers can invoke independently of any agent; superpowers has no such external interface. Compared to spec-kit (Archetype 2: mirror commands + skills with hooks), Vet has no commands and no hooks, just a skill and a binary. Compared to claude-flow (MCP-anchored toolserver), Vet has no MCP layer. Across all seeds, Vet is best described as a lightweight insertion at the verification gate only: it does not touch the planning, spec, or commit phases.

Positioning

Vet occupies the narrow but technically interesting position of "goal-intent verifier" — it is not a linter (no syntax rules), not a formatter, not a test runner. It uses an LLM to answer the question "did the code change actually accomplish what was requested?" This positions it as a post-implementation sanity check that can catch the failure mode where all tests pass but the wrong behavior was implemented.

The multi-harness install (Claude Code + Codex + OpenCode + Gemini simultaneously) is unusual — most tools in this corpus are single-harness. Vet explicitly avoids harness lock-in as a design principle.

Observable Failure Modes

  1. Session file mismatch: The history-loader grep can match the wrong session, causing Vet to analyze a different conversation's context. The SKILL.md has explicit disambiguation warnings for this.
  2. Cost blowup on large diffs: No automatic diff size limit mentioned; --max-spend is the only governor.
  3. AGPL copyleft: Under AGPL-3.0, anyone who distributes a modified version of Vet, or lets users interact with a modified version over a network, must make the corresponding source available under the same license. This may limit adoption in closed commercial products.
  4. False positives on shared repos: Multi-agent repos where several agents edit the same codebase will cause Vet to flag other agents' changes, requiring manual filtering.
  5. No replay capability: Findings are not persisted; re-running on the same diff may return different results if the model response varies.

Explicit Antipatterns

  • Batching changes before running Vet (contradicts the skill's Iron Law)
  • Using Vet as a replacement for running tests (explicitly stated in SKILL.md: "not a replacement for running tests")

04

Workflow

Vet — Workflow

Usage Modes

Mode 1: Agent-embedded verification (primary)

| Phase | What happens | Artifact |
|---|---|---|
| Agent edits code | Claude Code / Codex / OpenCode makes changes | git diff |
| Skill auto-triggers vet | SKILL.md fires proactively after each logical unit | |
| Session export | History-loader script extracts conversation history | .jsonl session |
| LLM analysis | Vet snapshots repo, diff + conversation, runs checks | |
| Issue report | Filtered, deduplicated issue list | text / json output |
| Agent acts | Agent reads findings, decides whether to fix | |

Mode 2: CLI one-shot

vet "Implement X without breaking Y"
vet "Refactor storage layer" --base-commit main

| Phase | Artifact |
|---|---|
| User states goal | goal string |
| Diff computed | git diff from --base-commit or HEAD |
| LLM analysis | checks run in parallel per --max-workers |
| Output | text/json/github formatted issue list |

Mode 3: CI / GitHub Actions

| Phase | Artifact |
|---|---|
| PR opened/updated | GitHub event |
| Action: checkout + vet install | |
| Merge base computed | git merge-base |
| Vet runs against PR diff | |
| PR review posted | GitHub PR review comment |

Approval Gates

None — Vet is a reporting tool. It does not block or gate. Exit code 10 signals issues found; it is the caller's (CI system or agent) responsibility to decide whether to block.
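
Since the blocking decision rests with the caller, a CI step or agent loop needs its own policy on top of the exit code. The sketch below shows one possible policy in Python. The exit-code meanings come from this document; the function names and the block_on_issues switch are hypothetical.

```python
import subprocess

# Exit codes as listed in this document.
EXIT_MEANINGS = {0: "clean", 1: "runtime error", 2: "usage error", 10: "issues found"}


def describe_exit(code):
    """Map a vet exit code to the meaning documented above."""
    return EXIT_MEANINGS.get(code, "unknown")


def should_block(code, block_on_issues=True):
    """Caller-side policy (hypothetical): decide whether a pipeline step should fail."""
    if code == 0:
        return False
    if code == 10:
        return block_on_issues  # findings are advisory unless the caller opts in
    return True  # runtime, usage, or unknown codes: the check itself did not complete


def run_gate(argv, block_on_issues=True):
    """Run a command line such as ["vet", "goal"] and return (meaning, blocked)."""
    code = subprocess.run(argv).returncode
    return describe_exit(code), should_block(code, block_on_issues)
```

Treating codes 1 and 2 as blocking is a design choice in this sketch: a verification step that failed to run says nothing about the diff, so passing it silently would hide the gap.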

Confidence Filtering

Issues below --confidence-threshold (default 0.8) are filtered out before output. The --output-fields flag controls which fields appear in output. --max-spend provides a cost ceiling.
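
The filtering step can be pictured as a simple comparison, sketched below in Python. Two things are assumed here: that each issue carries a numeric "confidence" field (an illustrative name, not Vet's documented schema) and that an issue exactly at the threshold is kept, which the text above implies by filtering only issues below it.

```python
def filter_by_confidence(issues, threshold=0.8):
    """Keep issues whose confidence meets the threshold.

    Assumptions: each issue is a dict with a numeric "confidence" key (an
    illustrative field name), and the comparison is inclusive.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within 0.0 to 1.0")
    return [issue for issue in issues if issue.get("confidence", 0.0) >= threshold]
```

Issues with no confidence value are treated as 0.0 in this sketch and therefore dropped at any positive threshold.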

06

Memory Context

Vet — Memory and Context

Session History as Context

Vet's key differentiator is reading the agent's conversation history as part of its analysis. It uses four session-export scripts (one per supported harness) to extract the session history, which is supplied to the analysis through the shell command given to --history-loader.

This is file-based ephemeral context: the session file exists on disk during the agent session; Vet reads it per-invocation. There is no persistent memory store.

Disk-based State

  • Git diff is computed fresh each invocation from the live repo state.
  • Config presets can be stored at $XDG_CONFIG_HOME/vet/ or .vet/.
  • Model registry cache is written by --update-models to a local disk location.
  • No SQLite, no vector DB, no cross-session state.

Cross-session Handoff

None. Each Vet invocation is independent. It does not write findings to a durable store that a subsequent session reads.

Context Compaction

Not applicable. Vet keeps no conversational context of its own that would need compaction; it runs as a separate subprocess, launched from within the agent session or from CI.

Log Output

--log-file flag writes verbose logs to a user-specified path. Not a structured audit log — this is diagnostic output.

07

Orchestration

Vet — Orchestration

Multi-agent

No. Vet is a single-process verification tool. No subagents, no spawning.

Execution Mode

  • As agent skill: event-driven (fires after code changes)
  • As CLI: one-shot per invocation
  • As GitHub Action: event-driven (PR opened/updated webhook)

Isolation Mechanism

None. Vet reads the repo in place (git diff) and does not modify anything. No worktree, no container.

Multi-model

Yes, in the sense that the user may choose any model via --model. Vet itself does not route different roles to different models internally. But the --agentic mode delegates to the installed agent harness which may have its own model configuration.

Parallelism

--max-workers flag controls parallel issue identifier execution within a single Vet run. This is intra-process parallelism on issue checking, not multi-agent.
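
A bounded worker pool of this kind might look like the Python sketch below. It is an illustration of the pattern only: the run_checks name, the dict-of-callables shape, the use of threads, and the default of 4 workers are all assumptions, not Vet's internals.

```python
from concurrent.futures import ThreadPoolExecutor


def run_checks(checks, diff_text, max_workers=4):
    """Run each check callable on the diff in a bounded thread pool.

    `checks` maps a check name to a callable that takes the diff text and
    returns a list of findings. This shape is illustrative only.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(check, diff_text) for name, check in checks.items()}
        # Collect in submission order so the result is deterministic.
        return {name: future.result() for name, future in futures.items()}
```

Threads suit this sketch because checks that wait on a remote model are dominated by network latency rather than CPU work.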

Orchestration Pattern

None (single-tool sequential analysis). Issue identifiers may run in parallel per --max-workers but this is a performance optimization, not a multi-agent pattern.

Consensus Mechanism

None. Each run uses one model (or one agent harness delegation), with no voting across models.

08

Ui Cli Surface

Vet — UI and CLI Surface

Dedicated CLI Binary

Yes. Binary name: vet. Installed via pip install verify-everything. Entry point: vet.cli.main:main.

Vet is a single-command CLI with no subcommands — all behavior is configured via flags. Design choice explicitly stated in code: argparse chosen over click/typer because agents expect standardized outputs.

Key flag groups:

  • Diff options: --base-commit, --staged
  • Context: --history-loader, --extra-context
  • Analysis: --enabled-issue-codes, --disabled-issue-codes, --confidence-threshold
  • Model: --model, --temperature, --agentic, --agent-harness
  • Parallelism and budget: --max-workers, --max-spend
  • Output: --output-format text|json|github, --output-fields, --quiet, --verbose, --log-file

Local UI

None. Vet is terminal/CLI only.

IDE Integration

None directly. Once installed, the agent skill activates automatically in Claude Code, Codex, OpenCode, and Gemini CLI.

CI/CD Integration

GitHub Actions reusable workflow (imbue-ai/vet@main). Posts findings as PR review comments via pull-requests: write permission. --output-format github formats output for GitHub code review annotations.

Observability

  • Text/JSON/GitHub output formats
  • --log-file for diagnostic logs
  • Exit codes designed for CI pipelines (a dedicated code 10 for issues found lets callers distinguish findings from failures, or append || true to ignore them)
  • --quiet suppresses all output except findings
