
AgentLint

agentlint · 0xmariowu/AgentLint · ★ 33 · last commit 2026-05-24

Primitive shape: 3 total (Commands 2, Hooks 1)
00

Summary

AgentLint — Summary

AgentLint is a harness linter: a CLI tool and Claude Code plugin that scores agent configuration files (CLAUDE.md, AGENTS.md, .cursor/rules/, CI configs, hooks, .gitignore) with 51 deterministic checks across 6 dimensions (Findability, Instructions, Workability, Continuity, Safety, Harness), plus 7 opt-in extended checks (3 Deep, 4 Session), for 58 checks in total.

Problem it solves: Agent harnesses (the files that wrap an LLM and turn it into a coding agent) are routinely misconfigured — instruction files too long, hook event names misspelled, secrets in .gitignore blind spots, pre-commit hooks too slow for Claude Code, missing handoff files. AgentLint quantifies harness quality with a 0-100 score backed by empirical data from 265 versions of Anthropic's Claude Code system prompt and analysis of 492 public CLAUDE.md files.

Distinctive trait: Every check cites a source (Anthropic system prompt changelog, Claude Code source code, academic papers, production audits). The Harness dimension checks Claude Code-specific behaviors that no other tool validates — e.g., H1 (hook event name typos silently prevent hooks from firing), H3 (Stop hooks without circuit breakers run forever), H4 (wildcard auto-approve grants unlimited tool execution).

Target audience: Development teams running Claude Code, Cursor, or Codex who want a measurable, evidence-backed quality score for their agent harness, with a guided fix plan.

Production-readiness: Active (v1.1.13, last commit 2026-05-24), MIT license, 33 stars. Ships as an npm package with 4 binary aliases (agentlint, agentlint-ai, agent-lint, al-scan) and a /al Claude Code slash command.

Differs from seeds: Most similar to spec-kit (CLI tool that checks agent configurations), but AgentLint audits the harness rather than the workflow. Unlike spec-kit's command/skill mirror pattern enforcing spec-driven development, AgentLint is a static analyzer with 51 deterministic, evidence-backed checks across six harness quality dimensions, plus 7 opt-in extended checks. The opt-in Session checks read Claude Code session logs, a capability unique in the corpus.

01

Overview

AgentLint — Overview

Origin

AgentLint was created by 0xmariowu and framed around Mitchell Hashimoto's February 2026 formulation: "Agent = Model + Harness." The tool was built after analyzing 265 versions of Anthropic's Claude Code system prompt, reading Claude Code source code for hard limits, and running audits against 492 public CLAUDE.md files.

Philosophy

From the README:

"ESLint was for the code humans wrote. AgentLint is for the context agents read."

"Agent = Model + Harness. If you're not the model, you're the harness."

"LangChain's February 2026 report: 70% of agent performance lives outside the model. Same weights, different harness, different results."

"A bad harness is worse than no harness. And almost nobody knows what a good one looks like. AgentLint is the first linter for the harness itself."

Evidence-First Methodology

Every check is backed by empirical data:

  • 265 versions of Claude Code's system prompt tracked word-by-word (e.g., "When they cut IMPORTANT from 12 uses to 4, we knew.")
  • Claude Code source code for hard limits (40K char truncation, 256KB file read limit, pre-commit hook timeout behavior)
  • Real production audits of open-source codebases
  • 6 academic papers on instruction compliance, context-file effectiveness, documentation decay

Key data points from the README:

  • Median CLAUDE.md compliance across 492 files: 3/12 rules
  • Perfect (12/12) scores: 0
  • Most-missed rule: "don't edit out of scope" — 98% missed
  • ETH Zurich: auto-generated context files reduce agent success in 5/8 settings

Check Dimensions

| Dimension | Weight | Checks | Focus |
|---|---|---|---|
| Instructions | 25% | 8 | Rule quality, length, emphasis density |
| Findability | 20% | 9 | Can AI find required files? |
| Safety | 15% | 9 | Secrets, CI, supply chain |
| Workability | 18% | 11 | Build/test commands, CI, file sizes |
| Continuity | 12% | 6 | Handoff, changelog, cross-session |
| Harness | 10% | 8 | Hook correctness, permissions |
| Deep (opt-in) | - | 3 | AI-powered instruction analysis |
| Session (opt-in) | - | 4 | Session log analysis |
02

Architecture

AgentLint — Architecture

Distribution

  • Type: npm package + Claude Code plugin (opt-in)
  • Package: agentlint-ai on npm (v1.1.13)
  • License: MIT

Install Methods

npm install -g agentlint-ai        # CLI only — no Claude plugin yet
npx agentlint-ai install           # opt-in: register /al Claude Code plugin

CLI Binaries (4 aliases)

| Binary | Source |
|---|---|
| agentlint | scripts/agentlint.sh |
| agentlint-ai | postinstall.js |
| agent-lint | scripts/agentlint.sh |
| al-scan | src/scanner.sh |

Required Runtime

  • Node.js
  • jq (JSON processing)
  • Optional: Claude Code (for Deep/Session extended checks)

Directory Structure

AgentLint/
├── .claude-plugin/          # Claude plugin manifest
├── commands/
│   ├── al.md               # /al slash command (interactive scan-fix-report)
│   └── setup.md            # Setup command
├── hooks/
│   └── hooks.json          # SessionStart hook (dependency check)
├── scripts/
│   ├── agentlint.sh        # Main CLI entry point
│   └── (scanner)
├── src/
│   ├── scanner.sh          # al-scan binary
│   └── (check implementations)
├── standards/              # Documentation of checks and evidence
├── templates/              # Report templates
├── tests/                  # Test suite
├── docs/                   # GitBook documentation
└── package.json

Hook Events

| Event | Matcher | Purpose |
|---|---|---|
| SessionStart | (none) | Check if jq and node are installed; warn if missing dependencies |

Single hook — only a readiness check at session start, not an enforcement hook.

Plugin Data

Config stored at ${CLAUDE_PLUGIN_DATA}/config.json:

{
  "projects_root": "~/Projects",
  "modules": { "core": true, "deep": false, "session": false }
}
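
A minimal sketch of how a script might read this config and fall back to the values shown above when the file is missing or partial. The function name `load_config` and the merge behaviour are assumptions for illustration, not AgentLint's actual implementation; only the file name and keys come from the config above.

```python
import json
import os
from pathlib import Path

# Defaults mirror the config shown above; treating them as fallbacks is an assumption.
DEFAULT_CONFIG = {
    "projects_root": "~/Projects",
    "modules": {"core": True, "deep": False, "session": False},
}


def load_config(plugin_data_dir: str) -> dict:
    """Read config.json from the plugin data directory, falling back to defaults."""
    path = Path(plugin_data_dir) / "config.json"
    config = {
        "projects_root": DEFAULT_CONFIG["projects_root"],
        "modules": dict(DEFAULT_CONFIG["modules"]),
    }
    if path.is_file():
        stored = json.loads(path.read_text(encoding="utf-8"))
        config["projects_root"] = stored.get("projects_root", config["projects_root"])
        # Only override the module flags that the stored file actually sets.
        config["modules"].update(stored.get("modules", {}))
    # Expand "~" so later directory scans receive a usable path.
    config["projects_root"] = os.path.expanduser(config["projects_root"])
    return config
```

A caller would typically pass the value of the `CLAUDE_PLUGIN_DATA` environment variable as `plugin_data_dir`.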

Target AI Tools

  • Claude Code (primary — via /al slash command and plugin)
  • Codex and Cursor (via CLI agentlint check)
  • GitHub Actions (via action.yml)
03

Components

AgentLint — Components

CLI Binaries (4)

| Binary | Purpose |
|---|---|
| agentlint | Main scan CLI; runs 51 core checks, outputs score and fix plan |
| agentlint-ai | npm package entry; triggers postinstall.js for plugin registration |
| agent-lint | Alias for agentlint |
| al-scan | Scanner-specific binary for focused scanning |

Commands (2 Claude Code slash commands)

| Command | File | Purpose |
|---|---|---|
| /al | commands/al.md | Interactive scan-fix-report flow: module selection → project detection → run checks → guided/assisted fix plan → HTML report |
| /al setup | commands/setup.md | First-run setup: detect projects root, register config |

Hooks (1)

| Event | Purpose |
|---|---|
| SessionStart | Dependency readiness check: verify jq and node are available; print readiness message |

GitHub Action

action.yml — runs core 51 checks in CI on every PR that modifies CLAUDE.md / AGENTS.md. Reports compliance score in GitHub Actions summary.

Check Catalog (58 total)

Findability (F1–F9): Entry file exists, project description in first 10 lines, conditional loading guidance, large directory INDEX, resolved references, standard naming, @include resolution, glob frontmatter, no unfilled template placeholders.

Instructions (I1–I8): Emphasis keyword count, keyword density, rule specificity, action-oriented headings, no identity language, length 60-120 lines, under 40K chars, total injected content within 200K budget.

Workability (W1–W11): Build/test commands documented, CI exists, tests exist (non-empty), linter configured, no files over 256KB, fast pre-commit hooks, local fast test command, npm test script, release workflow version consistency, test cost tiers, feat/fix commits paired with test commits.

Continuity (C1–C6): Document freshness, handoff file exists, changelog has "why", plans in repo, CLAUDE.local.md not in git, HANDOFF.md has verify conditions.

Safety (S1–S9): .env in .gitignore, Actions SHA pinned, secret scanning configured, SECURITY.md exists, workflow permissions minimized, no hardcoded secrets (pattern match), no personal paths, no pull_request_target, no personal email in git history.

Harness (H1–H8): Hook event names valid, PreToolUse hooks have matcher, Stop hook has circuit breaker, no dangerous auto-approve, env deny coverage complete, hook scripts network access, gate workflows are blocking, hook errors use structured format.

Deep (opt-in, 3): AI-powered instruction contradiction detection, dead weight identification, vague rule analysis.

Session (opt-in, 4): Claude Code session log analysis for recurring issues, missed checks, tool abuse patterns, session efficiency.

05

Prompts

AgentLint — Prompts

Prompt 1: /al Command (commands/al.md)

Technique: Step-numbered interactive protocol with pre-selected defaults, explicit bash variable management for config state, AskUserQuestion interaction pattern.

---
description: "Run AgentLint diagnostic across all projects. Use when: user says /al, 'check all projects', 'agent lint', or '体检'."
allowed-tools: Bash(*), Read(*), Write(*), Edit(*), Glob(*), Grep(*), Agent(*)
---

# /al — AgentLint

Diagnose, plan, fix. One command. User presses Enter twice at most.

## Flow

### Step 1: Module Selection

AskUserQuestion with **defaults pre-selected** (user can press Enter to accept):

AgentLint — which checks to run?

Core (deterministic, no AI calls) — default ON:
  ☑ Findability — can AI find what it needs?
  ☑ Instruction Quality — are your rules well-written?
  ☑ Workability — can AI build and test?
  ☑ Continuity — can next session pick up?
  ☑ Safety — are secrets and CI locked down?
  ☑ Harness — are Claude Code hooks/permissions safe?

Extended (opt-in, runtime-dependent):
  ☐ Deep Analysis — find contradictions, dead weight, vague rules (uses AI)
  ☐ Session Analysis — discover issues from your Claude Code session logs

[Enter to run with defaults]


**Default: all 6 core dimensions.** Extended analyzers are optional and will
show as `n/a` in the output unless explicitly checked.

Record the normalized choices in shell variables for the config write in
Step 2. Core is currently all-or-nothing and defaults on; Deep/Session are
the only runtime-selectable modules.

```bash
RUN_CORE=true
RUN_DEEP=false     # set true only if Deep Analysis was selected
RUN_SESSION=false  # set true only if Session Analysis was selected
```

Prompt 2: Harness Checks H1–H8 (from README documentation)

Technique: Evidence-backed rule table with "What/Why" pairs; each check has a concrete failure mode. The H-dimension checks are Claude Code-specific:

H1: Hook event names valid
  What: Check for typos in PreToolUse, PostToolUse, SessionStart, Stop, etc.
  Why: A typo such as "PoToolUse" for "PostToolUse" silently prevents the hook from ever firing

H3: Stop hook has circuit breaker
  What: Stop hooks must have an exit condition
  Why: Stop hooks without an exit condition run forever

H4: No dangerous auto-approve
  What: Check permissions for * or .* patterns
  Why: * or .* grant unlimited tool execution with no human check

H8: Hook errors use structured format
  What: Hook error output must include what/rule/fix fields
  Why: "what/rule/fix" lets the agent self-correct; unstructured errors leave it stuck
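
To make H1 concrete, here is a minimal sketch of an event-name check, with the H2 matcher check from the catalog added alongside. It is not AgentLint's code: the function name, the finding format, and the `KNOWN_HOOK_EVENTS` list are assumptions, and the list should be verified against current Claude Code documentation. The settings shape assumed is a `hooks` object keyed by event name, each holding a list of entries.

```python
import difflib

# Assumed list of Claude Code hook events; verify against the current docs.
KNOWN_HOOK_EVENTS = [
    "PreToolUse", "PostToolUse", "UserPromptSubmit", "Notification",
    "Stop", "SubagentStop", "PreCompact", "SessionStart", "SessionEnd",
]


def check_hook_events(settings: dict) -> list[str]:
    """Return one finding per problem in the "hooks" section of a settings dict."""
    findings = []
    for event, entries in settings.get("hooks", {}).items():
        if event not in KNOWN_HOOK_EVENTS:
            # Suggest the closest known event so the typo is easy to fix.
            guess = difflib.get_close_matches(event, KNOWN_HOOK_EVENTS, n=1)
            hint = f" (did you mean {guess[0]!r}?)" if guess else ""
            findings.append(f"H1: unknown hook event {event!r}{hint}")
        elif event == "PreToolUse":
            # H2: each PreToolUse entry should say which tools it applies to.
            for i, entry in enumerate(entries):
                if not entry.get("matcher"):
                    findings.append(f"H2: PreToolUse entry {i} has no matcher")
    return findings
```

Given a settings dict with a `PoToolUse` key, the sketch reports the unknown event and suggests `PostToolUse`, which is the silent-failure case H1 describes.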


Prompt 3: Check I6 (Instruction Length)

Technique: Data-backed constraint with source citation and specific numeric target.

I6: Entry file length
  What: 60–120 lines is the sweet spot. Longer dilutes priority.
  Source: Tracked across 265 versions of Claude Code system prompt (optimal compression range)

I7: Under 40,000 characters
  What: Claude Code hard limit. Above this, your file is truncated, silently.
  Source: Claude Code source code
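
The two limits above reduce to simple counts. The sketch below is an illustration under assumptions: the function name and finding strings are hypothetical, and whether AgentLint counts characters or bytes, or how it treats blank lines, is not stated in the source.

```python
def check_entry_file_length(text: str) -> list[str]:
    """Flag an entry file that falls outside the I6 line range or the I7 size limit."""
    findings = []
    line_count = len(text.splitlines())
    # I6: 60-120 lines is described as the sweet spot.
    if not 60 <= line_count <= 120:
        findings.append(f"I6: {line_count} lines (target 60-120)")
    # I7: the file must stay under 40,000 characters (counted here as Python characters).
    if len(text) >= 40_000:
        findings.append(f"I7: {len(text)} characters (limit 40,000)")
    return findings
```

A caller would pass the contents of CLAUDE.md or AGENTS.md read as text; an empty result means both checks pass.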


09

Uniqueness

AgentLint — Uniqueness

Differs from Seeds

AgentLint is closest to spec-kit (CLI tool for agent configuration quality) but operates at a fundamentally different layer. Spec-kit checks whether a project follows a spec-driven development workflow; AgentLint checks whether the agent's configuration files are correctly structured for reliable operation. Unlike any seed framework, AgentLint has a Harness dimension that specifically validates Claude Code hook configurations, checking for silent failure modes such as typo'd event names, missing matchers on PreToolUse, Stop hooks without circuit breakers, and dangerous wildcard auto-approve. The Session extended mode is unique in the corpus: no other framework reads past Claude Code session logs to analyze agent behavior patterns. Unlike ccmemory, which adds memory capabilities, AgentLint audits whether existing configuration enables reliable agent operation.

Key Differentiators

  1. Harness-specific checks: H1-H8 validate Claude Code hook configurations at a level no other framework addresses — event name validation, matcher requirements, circuit breaker enforcement.
  2. Session log analysis: Extended Session checks read ~/.claude/ session logs to find recurring issues from actual agent behavior.
  3. Evidence-backed checks: 265 system prompt versions tracked, source code read for hard limits, 492 public CLAUDE.md files audited for empirical baselines.
  4. ETH Zurich finding: Explicitly warns against auto-generated context files (which the study found reduced agent success in 5/8 settings).
  5. 4 CLI aliases: Usability focus — agentlint, agent-lint, agentlint-ai, al-scan.

Positioning

AgentLint is the only framework in the corpus positioned as a "harness linter" — applying software quality tooling concepts (ESLint → AgentLint) to the configuration layer of AI coding agents. It occupies the meta-tooling niche: a tool to assess the quality of other tools' configurations.

Observable Failure Modes

  • False positives on emphasis keyword count: I1 checks the IMPORTANT keyword count against Anthropic's own usage (4 uses), but projects can legitimately have different emphasis requirements and still be flagged.
  • Static-only H checks: H6 checks "hook scripts network access" but can't actually execute the hook to verify; relies on pattern matching.
  • Session analysis depends on local logs: The Session module requires Claude Code session logs to be present at ~/.claude/ — unavailable in CI.
  • Evidence dated to May 2026: System prompt analysis reflects 265 historical versions; the 266th version could change optimal parameters.
  • Score gaming: The 12-rule keyword signal detection in the related cc-audit tool is deliberately permissive; a well-formatted nonsense file could score well.

Explicit Antipatterns

  • * or .* auto-approve in Claude Code permissions (unlimited tool execution)
  • Hook event name typos (PoToolUse vs PostToolUse)
  • Stop hooks without circuit breakers
  • IMPORTANT keyword overuse (diminishing compliance returns)
  • Auto-generated context files from AI tools
  • Files over 40K characters (silently truncated)
  • Pre-commit hooks that take >30s (Claude Code never uses --no-verify)
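
The wildcard auto-approve antipattern (H4) can be sketched as a pattern match over a permissions allow-list. This is an assumption-laden illustration, not AgentLint's check: it assumes a settings dict with `permissions.allow` as a list of rule strings, and it additionally treats a tool rule whose entire argument is a wildcard, such as `Bash(*)`, as dangerous, which the source does not confirm.

```python
import re

# Matches "*", ".*", and tool rules whose whole argument is a wildcard, e.g. "Bash(*)".
# Treating "Tool(*)" as dangerous is an assumption made for this sketch.
WILDCARD_RULE = re.compile(r"^(?:\*|\.\*|\w+\((?:\*|\.\*)\))$")


def find_wildcard_approvals(settings: dict) -> list[str]:
    """Return the allow rules that would auto-approve everything they cover."""
    allow = settings.get("permissions", {}).get("allow", [])
    return [rule for rule in allow if WILDCARD_RULE.match(rule.strip())]
```

Scoped rules such as `Bash(npm run test:*)` pass through untouched, because the wildcard there only covers the tail of a specific command.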
04

Workflow

AgentLint — Workflow

Interactive /al Flow

1. Module Selection (user presses Enter to accept defaults)
   ☑ Core checks (F/I/W/C/S/H — always on)
   ☐ Deep Analysis (AI sub-agents — opt-in)
   ☐ Session Analysis (Claude Code logs — opt-in)

2. First-Run Init
   "Where are your projects? [~/Projects]:" → saves to config.json

3. Project Discovery
   Scan projects_root for repos with CLAUDE.md / AGENTS.md

4. Run Checks
   For each project:
   - Run 51 deterministic checks across 6 dimensions
   - (If Deep) Spawn AI subagents for instruction analysis
   - (If Session) Read session logs under ~/.claude/

5. Score + Report
   Score: NN/100 (core) — weighted dimension average
   Fix Plan: guided items + assisted items

6. Interactive Fix
   User selects items → AgentLint applies fixes → re-scores → saves HTML report
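
Step 3 of the flow above (project discovery) can be sketched as a directory scan. The function name is hypothetical, and scanning only the direct children of the projects root is an assumption; the source does not say how deep AgentLint searches.

```python
from pathlib import Path

# Entry files named in the project discovery step.
ENTRY_FILES = ("CLAUDE.md", "AGENTS.md")


def discover_projects(projects_root: str) -> list[Path]:
    """Return direct child directories of projects_root that contain an entry file."""
    root = Path(projects_root).expanduser()
    if not root.is_dir():
        return []
    return sorted(
        child
        for child in root.iterdir()
        if child.is_dir() and any((child / name).is_file() for name in ENTRY_FILES)
    )
```

Each returned path would then be handed to the check runner in step 4.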

Phase-to-Artifact Map

| Phase | Artifact |
|---|---|
| Scan | Score + dimension breakdown (stdout) |
| Fix Plan | Ordered list of guided/assisted items |
| HTML Report | Saved to project directory (path from config) |
| Config | ${CLAUDE_PLUGIN_DATA}/config.json |

Approval Gates

| Gate | Trigger | Type |
|---|---|---|
| Module selection | User interaction | choice-list (defaults pre-selected) |
| Projects root | First run | freetext-clarify (default accepted on Enter) |
| Fix item selection | Per fix | choice-list |

CI Integration

The GitHub Action runs only core checks (agentlint check) — no AI sub-agents in CI. Exits 0 on pass, 1 on warning, 2 on critical failure (leaked secrets).
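
The exit-code contract described above can be written as a small mapping. The function and its two inputs are hypothetical; what AgentLint counts as a warning is not specified in the source, so this only illustrates the precedence of a critical failure over a warning.

```python
def exit_code(warnings: int, leaked_secrets: int) -> int:
    """Map scan results to the 0/1/2 exit codes described for the CI run."""
    if leaked_secrets > 0:
        return 2  # critical failure takes precedence over warnings
    if warnings > 0:
        return 1  # warning
    return 0  # pass
```

In a CI step, a nonzero code can be used to fail the job or to gate only on code 2, depending on how strict the pipeline should be.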

Fix Categories

  • [guided] — AgentLint explains the fix, user applies manually
  • [assisted] — AgentLint writes the fix directly (e.g., generates HANDOFF.md)

Score Formula

Total = weighted average of dimensions that actually ran
Core-only run shows "Score: NN/100 (core)" — Extended dims show n/a, never 0
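
A minimal sketch of a weighted average that skips dimensions that did not run, using the weights from the Check Dimensions table. The function name, the 0-10 per-dimension scale, and the rounding are assumptions; the source gives no weight for Deep or Session, so this sketch ignores any dimension without a weight.

```python
from typing import Optional

# Weights (percent) taken from the Check Dimensions table.
WEIGHTS = {
    "Instructions": 25, "Findability": 20, "Workability": 18,
    "Safety": 15, "Continuity": 12, "Harness": 10,
}


def total_score(dimension_scores: dict) -> Optional[int]:
    """Weighted 0-100 score over dimensions that ran; None (n/a) if none ran.

    dimension_scores maps a dimension name to a score in [0, 10], or None for n/a.
    """
    ran = {d: s for d, s in dimension_scores.items() if s is not None and d in WEIGHTS}
    weight_sum = sum(WEIGHTS[d] for d in ran)
    if weight_sum == 0:
        return None
    weighted = sum(WEIGHTS[d] * s for d, s in ran.items())
    # Renormalize by the weights that actually ran, then scale 0-10 up to 0-100.
    return round(10 * weighted / weight_sum)
```

Dividing by the sum of the weights that ran, rather than by 100, is what keeps a skipped dimension from counting as a zero.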
06

Memory Context

AgentLint — Memory & Context

State Storage

| Storage | Path | Purpose |
|---|---|---|
| Plugin config | ${CLAUDE_PLUGIN_DATA}/config.json | Projects root path, module selections (core/deep/session) |
| HTML reports | Project directory (configurable) | Per-scan audit report with score + findings |

Persistence Model

  • Plugin-scoped: config.json persists across sessions — user only needs to set projects root once.
  • Report-scoped: HTML reports are written per scan to the project directory.
  • No cross-session audit log: Each /al run is independent; historical score tracking is not built in.

Context Injection

At SessionStart, AgentLint's single hook runs a dependency readiness check (jq and node available). The hook is informational only: it adds no context to the session.

Session Log Access (opt-in)

The Session extended module reads Claude Code session logs from ~/.claude/ to find recurring issues, tool abuse patterns, and session efficiency signals. This is the only framework in the corpus that analyzes past agent behavior logs rather than current configuration files.

Compaction Handling

Not applicable — AgentLint runs as a discrete check process, not a continuous agent.

Cross-Session Handoff

Implicit via HTML reports saved to project directories — the next session can reference the most recent report to understand current harness quality.

07

Orchestration

AgentLint — Orchestration

Multi-Agent

Yes, in the Deep extended mode. The /al command can spawn AI subagents to perform instruction analysis (contradiction detection, dead weight identification, vague rule analysis). These are spawned via Claude Code's Agent tool, which the command's allowed-tools front matter permits as Agent(*).

Orchestration Pattern

Sequential — core checks run first (deterministic), then Deep AI subagents if opted in, then Session analysis if opted in. No parallelism within a single scan.

Execution Mode

Interactive-loop — the /al command drives a multi-step interactive session (module selection → scan → fix → re-score). The CLI agentlint check is one-shot.

Multi-Model

No — all checks use the same Claude Code session or no LLM at all (51 core checks are fully deterministic bash/node scripts).

Isolation Mechanism

None — AgentLint reads files in place, does not modify repos except through the assisted fix plan.

Consensus Mechanism

None.

Prompt Chaining

Yes — in the interactive /al flow: module selection output → config write → project scan → fix plan → interactive fix → re-score. Each stage's output is input to the next.

Crash Recovery

No explicit crash recovery. Config persists, so the next run picks up the same projects root.

Streaming Output

No — outputs the full score table at once.

08

UI / CLI Surface

AgentLint — UI / CLI Surface

CLI Binaries (4)

| Binary | Purpose |
|---|---|
| agentlint | agentlint check: runs 51 core checks, outputs score |
| agentlint-ai | npm package entry for plugin registration |
| agent-lint | Alias for agentlint |
| al-scan | Focused scanner |

CLI Output Format

$ agentlint check

AgentLint — Score: 72/100 (core)

Findability      ██████████████░░░░░░  7/10
Instructions     ████████████████░░░░  8/10
Workability      ████████████░░░░░░░░  6/10
Safety           ██████████░░░░░░░░░░  5/10
Continuity       ██████████████░░░░░░  7/10
Harness          ████████████████████  10/10
Deep             ░░░░░░░░░░░░░░░░░░░░  n/a
Session          ░░░░░░░░░░░░░░░░░░░░  n/a

Fix Plan (7 items):
  [guided]   Pin 8 GitHub Actions to SHA (supply chain risk)
  [guided]   Add .env to .gitignore (AI exposes secrets)
  [assisted] Generate HANDOFF.md
  [guided]   Reduce IMPORTANT keywords (7 found, Anthropic uses 4)

/al Claude Code Command

Interactive TUI-like flow within Claude Code:

  • Pre-selected checkbox UI for module selection
  • Interactive fix selection (user selects which items to address)
  • Re-scores after fixes
  • Saves HTML report to project directory

GitHub Actions Integration

action.yml — CI check on PR/push for CLAUDE.md/AGENTS.md changes:

  • Outputs: score, rules-hit, leaked-secrets, status
  • Exits 0/1/2 based on pass/warn/fail

Website

Documentation at agentlint.app with 20+ long-form guides and the full check catalog. Blog posts include "Writing a Good CLAUDE.md", "The 33-Check Catalog", "AGENTS.md vs CLAUDE.md".

No Local Web Dashboard

All output is CLI text + saved HTML reports. No localhost server.
