xiaolai/nlpm-for-claude

nlpm-xiaolai · xiaolai/nlpm-for-claude · ★ 55 · last commit 2026-05-26

Primitive shape: 34 total
Commands 8 · Skills 17 · Subagents 7 · Hooks 1 · MCP tools 1
00

Summary

xiaolai/nlpm-for-claude — Summary

NLPM (Natural-Language Programming Manager) is a comprehensive linting, scoring, and testing system for AI agent "NL artifacts" — the markdown files that drive AI behavior: skills, agents, commands, rules, hooks, prompts, CLAUDE.md, and memory files. It treats natural language artifacts as programs that can be scored (100-point penalty-based scale), linted (50 rules with named penalties), auto-fixed, tested with NL-TDD specs, and security-scanned.

The system ships 8 slash commands, 7 agent definitions, a standalone Python validator (bin/nlpm-check) for pre-commit hooks and CI, and a self-evolving GitHub Actions pipeline that audits real plugin repos, harvests exemplars, and feeds learnings back into the rule catalog. It also checks manifest-vs-disk consistency, catching the bug where a SKILL.md exists on disk but is missing from plugin.json; its README states that no other validator does this.

NLPM is multi-tool and tier-aware: it applies a universal set of rules (Tier 1) plus tool-specific overlays for Claude Code (Tier 2-Claude), Codex CLI (Tier 2-Codex), and Antigravity (Tier 2-Antigravity). It also supports an NL-TDD workflow: write .nlpm-test/my-agent.spec.md before the artifact exists (red), write the artifact, then iterate until the tests pass (green).

Compared to seeds: no direct equivalent. The closest analogy is spec-kit (linting + validation) but for NL artifacts rather than code. Uniquely designed to police the quality of other frameworks' skill/agent/command files.

01

Overview

xiaolai/nlpm-for-claude — Overview

Origin

Built by xiaolai (Li Xiaolai, prolific Chinese author and developer). Released under ISC. Part of the xiaolai plugin marketplace. Available via Anthropic's official community marketplace (with ~24h lag) and via the xiaolai marketplace directly.

Philosophy

Natural language artifacts (skills, agents, commands, CLAUDE.md) are programs. Just as ESLint scores JavaScript and ruff scores Python, NLPM scores the markdown files that drive AI behavior. The quality of these artifacts directly determines agent reliability and predictability.

"Just as ESLint scores JavaScript and ruff scores Python, NLPM scores the markdown files that drive AI behavior."

"NLPM is the only multi-tool NL artifact validator that systematically checks manifest-vs-disk consistency."

Key Research Finding

The most novel claim: a class of bugs exists where a SKILL.md exists on disk but is silently missing from plugin.json — invisible after claude plugin install. No other validator (including Anthropic's official plugin-validator and Linux Foundation's skills-ref) catches this. NLPM does.
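
The bug class described above can be illustrated with a minimal sketch. This is not NLPM's implementation: the manifest location (plugin.json at the plugin root) and the "skills" key holding a list of relative paths are assumptions made for illustration, and the function name unlisted_skills is hypothetical.

```python
import json
from pathlib import Path


def unlisted_skills(plugin_root: Path) -> list[str]:
    """Return SKILL.md paths that exist on disk but are absent from the manifest.

    ASSUMPTIONS (not taken from the NLPM source): the manifest is
    plugin.json at the plugin root, and it lists skills under a "skills"
    key as relative paths. Adjust both to the real manifest layout.
    """
    manifest = json.loads((plugin_root / "plugin.json").read_text(encoding="utf-8"))
    declared = {Path(p).as_posix() for p in manifest.get("skills", [])}

    on_disk = {
        path.relative_to(plugin_root).as_posix()
        for path in plugin_root.rglob("SKILL.md")
    }
    # Files present on disk that the manifest never mentions.
    return sorted(on_disk - declared)
```

A non-empty result is the silent failure the page describes: the skill file is in the repo, yet nothing in the manifest points at it.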

The 50 Rules of Natural Language Programming

Formalized as skills/nlpm/rules/SKILL.md. Examples:

  • R01: No vague quantifiers without criteria ("appropriate", "relevant", "as needed" → penalty -2 each, cap -20)
  • R02: Every line must earn its tokens
  • R03: Positive framing over prohibitions
  • R04: Description is a trigger, not a summary (3+ specific action phrases)
  • R05: Under 500 lines (over 500 = context bloat)
  • R09: <example> blocks are mandatory in agents (minimum 2)
  • R10: Model must match task complexity (haiku = mechanical, sonnet = reasoning, opus = complex judgment)
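
The R01 penalty arithmetic (-2 per vague quantifier, capped at -20) can be sketched as follows. The word list is the one quoted in the rule text; the whole-word, case-insensitive matching is an assumption, and the real scorer is an agent that also judges whether criteria accompany the word, so this sketch will over-flag.

```python
import re

# Word list quoted from rule R01. The matching strategy is an assumption.
VAGUE_TERMS = (
    "appropriate", "relevant", "as needed", "sufficient", "adequate",
    "reasonable", "properly", "correctly", "some", "several", "various",
)
PER_HIT = 2   # points lost per occurrence, per R01
CAP = 20      # maximum points lost to R01, per R01

_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in VAGUE_TERMS) + r")\b",
    flags=re.IGNORECASE,
)


def r01_penalty(text: str) -> int:
    """Return the points lost to R01 as a non-negative integer."""
    hits = _PATTERN.findall(text)
    return min(PER_HIT * len(hits), CAP)
```

Applied to the rule's own bad example, "Use appropriate error handling.", this yields a penalty of 2; a text with fifteen hits would be held at the cap of 20.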

Self-Evolving Pipeline

The auditor pipeline (GitHub Actions):

  • Audits real plugin repos weekly
  • Repos scoring ≥ 90 produce exemplars under auditor/exemplars/
  • auditor-cite-exemplars.yml opens human-gated PRs adding real-world examples to rule catalog
  • Drift detector validates rule IDs against rubric (found 990 mislabeled rule IDs in a 2026-05-13 sweep)
02

Architecture

xiaolai/nlpm-for-claude — Architecture

Distribution

  • Type: claude-plugin + standalone-repo (with CLI binary)
  • License: ISC
  • Install complexity: one-liner

Install Commands

# Via xiaolai marketplace (latest)
claude plugin marketplace add xiaolai/claude-plugin-marketplace
claude plugin install nlpm@xiaolai --scope project

# Via Anthropic community marketplace
claude plugin marketplace add anthropics/claude-plugins-community
claude plugin install nlpm@claude-community --scope project

# Standalone CLI (no Claude Code required)
curl -fsSL -o /usr/local/bin/nlpm-check \
  https://raw.githubusercontent.com/xiaolai/nlpm/main/bin/nlpm-check
chmod +x /usr/local/bin/nlpm-check

Directory Layout

skills/nlpm/
├── rules/SKILL.md          # 50 rules of NL programming
├── scoring/SKILL.md        # Scoring rubric with penalty tables
├── testing/SKILL.md        # NL-TDD spec format
├── security/SKILL.md       # Security scan skill
├── conventions/SKILL.md    # Universal conventions
├── conventions-claude/     # Claude Code tier-specific conventions
├── conventions-codex/      # Codex tier-specific conventions
├── conventions-antigravity/ # Antigravity tier-specific conventions
├── patterns/               # Pattern library
├── vocabulary/             # Vocabulary drift tracking
├── writing-agents/         # Agent authoring guidance
├── writing-hooks/          # Hook authoring guidance
├── writing-plugins/        # Plugin authoring guidance
├── writing-skills/         # Skill authoring guidance
├── ...

agents/
├── scorer.md               # Scoring agent (Sonnet)
├── checker.md              # Cross-component consistency checker
├── scanner.md              # Artifact scanner
├── security-scanner.md     # Security scan agent
├── tester.md               # NL-TDD test runner
├── vague-scanner.md        # Vague quantifier detector
└── vocab-drift-scanner.md  # Vocabulary drift detector

commands/
├── ls.md, score.md, check.md, fix.md, trend.md, test.md, init.md, security-scan.md, ...

bin/
└── nlpm-check              # Standalone Python 3.11+ validator

hooks/
└── hooks.json              # PostToolUse hook (Write|Edit|MultiEdit)

auditor/
├── exemplars/              # 62 published teaching artifacts
├── scripts/                # Drift detection, rule health, CI scripts
├── audits/                 # Historical audit records
└── findings.jsonl          # Audit findings log

Required Runtime

  • Claude Code (primary, for slash commands)
  • Python 3.11+ (for bin/nlpm-check standalone)
  • No external dependencies for standalone binary
03

Components

xiaolai/nlpm-for-claude — Components

Commands (8 slash commands)

Command              Purpose
/nlpm:ls             Discover and inventory all NL artifacts in a repo
/nlpm:score          Score artifact quality (100-point scale with named penalties)
/nlpm:check          Cross-component consistency checks (including manifest-vs-disk)
/nlpm:fix            Auto-fix fixable issues
/nlpm:trend          Track quality score trends over time
/nlpm:test           Run NL artifact tests against spec files (NL-TDD)
/nlpm:init           Initialize NLPM for a project
/nlpm:security-scan  Scan plugins for security risks in executable artifacts

Agents (7 AI agents)

Agent                Model    Purpose
scorer               sonnet   Score NL artifacts on 100-point scale, apply penalty tables
checker              sonnet   Cross-component consistency (manifest-vs-disk, etc.)
scanner              unknown  Artifact inventory and classification
security-scanner     unknown  Security risk detection in executable artifacts
tester               unknown  NL-TDD test runner
vague-scanner        unknown  Detect vague quantifiers (R01 violations)
vocab-drift-scanner  unknown  Detect vocabulary drift (R51)

Hooks

Event        Matcher               Purpose
PostToolUse  Write|Edit|MultiEdit  Advise when an NL artifact is written/edited; remind to run /nlpm:score

Standalone CLI

Name            Language                             Purpose
bin/nlpm-check  Python 3.11+ (single file, no deps)  Pre-commit hook or CI validator; runs deterministic subset including manifest-vs-disk check

Templates

File                                Purpose
templates/pre-commit-nlpm.sh        Drop-in git pre-commit hook
templates/workflows/nlpm-check.yml  Drop-in GitHub Actions workflow

Scoring System

  • Starts at 100, penalties subtracted
  • Score 90-100: Excellent (production-ready)
  • Score 80-89: Good (minor gaps)
  • Score 70-79: Adequate (meets threshold)
  • Score 60-69: Weak (below threshold)
  • Score <60: Rewrite (fundamental problems)
  • Default pass threshold: 70 (configurable in .claude/nlpm.local.md)
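
The bands and the default threshold above map directly onto a small lookup. The function names band and passes are hypothetical; only the cut-offs and labels come from the list.

```python
DEFAULT_THRESHOLD = 70  # default pass threshold stated above


def band(score: int) -> str:
    """Map a score to the band label listed above."""
    if score >= 90:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 70:
        return "Adequate"
    if score >= 60:
        return "Weak"
    return "Rewrite"


def passes(score: int, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """A score passes when it meets or exceeds the threshold."""
    return score >= threshold
```

With the default threshold, "Adequate" is the lowest passing band; raising the threshold to 80 would make "Good" the lowest.
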
05

Prompts

xiaolai/nlpm-for-claude — Prompt Excerpts

Excerpt 1: The 50 Rules — Universal Examples (from skills/nlpm/rules/SKILL.md)

Technique: Bad/Good paired examples for every rule

**R01. No vague quantifiers without criteria.** "appropriate", "relevant", "as needed", 
"sufficient", "adequate", "reasonable", "properly", "correctly", "some", "several", "various" 
are meaningless without specifics. Replace with measurable criteria. Penalty: -2 each, cap -20.

Bad: "Use appropriate error handling."
Good: "Return `Result<T, AppError>` from all API handlers. Map errors to HTTP status codes 
via the `From<AppError> for StatusCode` impl."

**R04. Description is a trigger, not a summary.** 3+ specific action phrases matching real 
user queries. "Use when debugging React re-renders, fixing hook dependency arrays, optimizing 
with useMemo" — not "Helpful React skill."

**R10. Model must match task complexity.** haiku = mechanical (parsing, counting). sonnet = 
reasoning (analysis, review). opus = complex judgment (orchestration). Wrong tier wastes 
money or produces weak results.

Analysis: Each rule has a named penalty, a concrete bad example, and a concrete good example. R10 is particularly notable — it encodes a cost-correctness tradeoff as a rule, preventing both over-spending (using opus for parsing) and under-spending (using haiku for complex orchestration).


Excerpt 2: Scorer Agent Instructions (from agents/scorer.md)

Technique: 5-step verification gate before reporting findings

## Do Not Invent Findings

Apply ONLY penalties enumerated in `nlpm:scoring`. Do not invent penalty categories. Before 
reporting any finding, run this 5-step check:

1. **Rubric check** — Does the penalty appear in the `nlpm:scoring` penalty tables for this 
   artifact type? If no, do not report (unless marked...

Analysis: The "5-step check before reporting" is an anti-hallucination gate specifically for the scoring agent. It prevents the scorer from inventing rules or over-applying penalties. This is a meta-level quality gate on the quality checker itself.


Excerpt 3: Manifest-vs-Disk Check Rationale (from README)

Technique: Gap analysis embedded as motivation for the tool's existence

NLPM is the only multi-tool NL artifact validator that systematically checks 
**manifest-vs-disk consistency** — the bug class where a SKILL.md exists on disk but is 
silently missing from `plugin.json` (and therefore invisible after `claude plugin install`). 
Verified across 8+ tools including Anthropic's official `plugin-validator` and the Linux 
Foundation's `skills-ref`. See `analysis/ecosystem-gap.md` for the research.

Analysis: The tool's primary value proposition is stated as a named bug class (manifest-vs-disk inconsistency) with a specific mechanism (silently invisible after install), and the README reports checking it against 8+ other tools. It points to analysis/ecosystem-gap.md for the evidence, so the positioning claim is presented as research-backed rather than as marketing alone.

09

Uniqueness

xiaolai/nlpm-for-claude — Uniqueness & Positioning

Differs From Seeds

No direct equivalent in the 11 seeds. NLPM occupies a meta-layer: it validates the quality of the artifact types that other frameworks in the corpus produce (skills, agents, commands, hooks, plugin manifests). Conceptually closest to spec-kit (linting + validation) but spec-kit validates code against specs; NLPM validates NL artifacts against a formal rubric of 50 rules.

The manifest-vs-disk consistency check is positioned as an ecosystem gap: according to the README, no other tool in the corpus or in the broader ecosystem (including Anthropic's official plugin-validator) catches this class of bug. On that basis NLPM is the only "meta-linter" for AI agent plugins in the corpus.

Observable Failure Modes

  1. Score determinism depends on rule adherence: The scorer agent is instructed not to invent findings, but heuristic violations (like "vague quantifiers") involve judgment.
  2. R51 vocabulary drift is opt-in: The most advanced check (vocabulary registry) requires explicit config — may be missed by casual users.
  3. Self-evolving pipeline complexity: The GitHub Actions auditor pipeline is sophisticated — reproducing it in a fork requires understanding the full pipeline.
  4. Marketplace lag: Anthropic community marketplace may be ~24h behind xiaolai marketplace — stale version risk.
  5. ISC license: Minor licensing difference from MIT; may matter for some enterprise contexts.

Distinctive Opinion

Natural language artifacts are programs. They can be linted, scored, tested, and have quality metrics. The quality of skills/agents/commands directly determines agent reliability — and poor NL artifact quality is a systematic, measurable problem that should be treated like code quality.

Self-Referential Feature

NLPM carries an nlpm-badge.json in its own repo ([![Validated by NLPM]...]) — the tool runs on itself and displays its own quality score. This is eating its own dog food at the meta level.

04

Workflow

xiaolai/nlpm-for-claude — Workflow

NL-TDD Workflow

Step  Action                                   State
1     Write spec: .nlpm-test/my-agent.spec.md  RED (artifact doesn't exist)
2     /nlpm:test                               Fails (artifact missing)
3     Write artifact: agents/my-agent.md       Artifact created
4     /nlpm:test                               Check trigger accuracy, output format, score
5     /nlpm:score                              Verify quality score ≥ threshold
6     Iterate                                  Fix until GREEN
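
The red/green progression in the table can be read as a small state function. This is an illustrative sketch only: the function name, its parameters, and the NO_SPEC label are hypothetical, and the inputs stand in for what /nlpm:test and /nlpm:score would report.

```python
def tdd_state(
    spec_exists: bool,
    artifact_exists: bool,
    tests_pass: bool,
    score: int,
    threshold: int = 70,
) -> str:
    """Classify where an artifact sits in the NL-TDD loop."""
    if not spec_exists:
        return "NO_SPEC"  # hypothetical label: step 1 has not happened yet
    if not artifact_exists:
        return "RED"      # steps 1-2: spec written, artifact missing
    if tests_pass and score >= threshold:
        return "GREEN"    # steps 4-5 both satisfied
    return "RED"          # step 6: keep iterating
```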

Standard Validation Workflow

/nlpm:ls        → see all NL artifacts
/nlpm:score     → score them all (or specific path)
/nlpm:check     → cross-component consistency
/nlpm:fix       → auto-fix fixable issues
/nlpm:trend     → track score history

CI/Pre-Commit Workflow (Standalone)

nlpm-check .    # exit 1 on high-confidence findings

No Claude Code dependency. Runs in pre-commit hooks or GitHub Actions.
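
For pipelines driven from Python rather than shell, the same gate can be wrapped as below. The only behavior taken from the source is the invocation nlpm-check . and the meaning of exit code 1; the wrapper name run_nlpm_check is hypothetical, and the sketch assumes nlpm-check is on PATH.

```python
import subprocess
import sys
from collections.abc import Sequence


def run_nlpm_check(cmd: Sequence[str] = ("nlpm-check", ".")) -> int:
    """Run the validator and return its exit code.

    Per the source, exit code 1 signals high-confidence findings.
    The command is a parameter so the wrapper can be exercised
    without the real binary.
    """
    result = subprocess.run(list(cmd), check=False)
    return result.returncode


if __name__ == "__main__":
    # Propagate the validator's exit code so the commit or CI job fails with it.
    sys.exit(run_nlpm_check())
```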

Self-Evolving Auditor Pipeline

  1. Audit (weekly GitHub Actions): audits real plugin repos in the ecosystem
  2. Score (≥90): repos passing at 90+ produce exemplar teaching artifacts
  3. Cite (auditor-cite-exemplars.yml): opens human-gated PRs adding real-world examples to rules
  4. Drift detection (validate-rule-ids.py): re-validates rule_id in historical audits against rubric and semantic keyword match
  5. Rule health (rule-health.py): reports validated_hits and exemplars_count per rule

Approval Gates

  • /nlpm:fix produces auto-fixes; human reviews before applying
  • Auditor citation PRs require human approval before merging
  • NL-TDD spec files require passing tests before artifact is considered complete
06

Memory Context

xiaolai/nlpm-for-claude — Memory & Context

State Storage

File-based, project-scoped. Score trends and audit history written to disk.

Score Trend Storage

/nlpm:trend tracks score history over time. Stored in project-level files (exact path unknown from public sources — likely .claude/nlpm/ or similar).

Audit History

auditor/audits/ — historical audit records per repo. auditor/findings.jsonl — per-finding log.

Exemplar Library

auditor/exemplars/ — 62 published teaching artifacts as of v0.8.17+. Used to add real-world positive references to rule documentation.

Config File

.claude/nlpm.local.md — project-level NLPM configuration:

  • Default pass threshold (default 70)
  • Rule overrides (suppress, max_penalty, threshold adjustments)
  • R51 vocabulary drift opt-in (rule_overrides.R51.enabled: true)
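
How rule overrides might interact with the penalty-based score is sketched below. The key names suppress and max_penalty come from the list above, but their exact semantics, the in-memory shape of the overrides, and the function name are assumptions; the sketch does not parse .claude/nlpm.local.md.

```python
def score_with_overrides(
    penalties: dict[str, int],
    overrides: dict[str, dict],
) -> int:
    """Return 100 minus the penalties that survive the overrides.

    penalties maps a rule ID to points lost (a positive integer).
    ASSUMED semantics: "suppress" drops a rule's penalty entirely and
    "max_penalty" caps the points a single rule may cost.
    """
    total = 0
    for rule_id, points in penalties.items():
        rule = overrides.get(rule_id, {})
        if rule.get("suppress"):
            continue
        cap = rule.get("max_penalty")
        if cap is not None:
            points = min(points, cap)
        total += points
    return 100 - total
```

For example, penalties of 12 for R01 and 5 for R05 give 83 with no overrides; capping R01 at 6 and suppressing R05 gives 94.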

Vocabulary Registry

skills/nlpm/vocabulary/registry.yaml — vocabulary drift rules registry (optional, only loaded if R51 is enabled).

Cross-Session Handoff

Score trends persist in project files. Rules and scoring rubric are loaded from installed skill files.

07

Orchestration

xiaolai/nlpm-for-claude — Orchestration

Multi-Agent Pattern

Pattern: hierarchical — slash commands dispatch to named agents (scorer, checker, tester, etc.) which load specific sub-skills. The scorer agent loads nlpm:scoring, nlpm:conventions, nlpm:conventions-claude, etc.

Agent Coordination

Each command invokes a specific agent:

  • /nlpm:score → scorer agent (loads scoring + conventions skills)
  • /nlpm:check → checker agent
  • /nlpm:test → tester agent
  • /nlpm:security-scan → security-scanner agent

Hook-Driven Advisory

PostToolUse hook on Write|Edit|MultiEdit — advises when NL artifacts are modified. Fail-open (exit 0 on error). Advisory only, not blocking.

Self-Evolving Pipeline (GitHub Actions)

The pipeline's scheduled workflow and supporting scripts:

  • auditor-cite-exemplars.yml — weekly exemplar citation runs
  • validate-rule-ids.py — drift detection on each new audit
  • rule-health.py — rule coverage reporting

Multi-Model

Agents specify their model in frontmatter: scorer and checker use sonnet. The models of the other agents are not recorded here (listed as unknown in the Agents table); R10 prescribes haiku for mechanical work, sonnet for reasoning, and opus for complex judgment.

Execution Mode

One-shot (slash commands) + event-driven (PostToolUse hook) + scheduled (GitHub Actions auditor pipeline).

Cross-Tool Support

Two-tier artifact classification, with three tool-specific Tier 2 overlays:

  • Tier 1 Universal: SKILL.md, AGENTS.md
  • Tier 2-Claude: commands, agents, skills, hooks, manifests, CLAUDE.md
  • Tier 2-Codex: Codex-specific paths
  • Tier 2-Antigravity: Gemini/Antigravity paths
08

UI / CLI Surface

xiaolai/nlpm-for-claude — UI / CLI Surface

CLI Binary (Standalone)

Yes — nlpm-check (Python 3.11+ single file, no dependencies).

  • Not a thin wrapper: own deterministic validation engine
  • Usage: nlpm-check . [exit 1 on high-confidence findings]
  • Drop-in CI: templates/workflows/nlpm-check.yml for GitHub Actions
  • Drop-in pre-commit: templates/pre-commit-nlpm.sh

Slash Commands (Claude Code)

8 commands: /nlpm:ls, /nlpm:score, /nlpm:check, /nlpm:fix, /nlpm:trend, /nlpm:test, /nlpm:init, /nlpm:security-scan

Score arguments:

  • /nlpm:score agents/ — score just agents directory
  • /nlpm:score --changed — score only git-changed files

IDE Integration

  • Claude Code: plugin marketplace (primary)
  • Also available in Codex, Antigravity with tier-aware rule overlays

Observability

  • Score trends tracked in project files
  • auditor/exemplars/ — 62 public teaching artifacts
  • auditor/findings.jsonl — persistent audit log
  • GitHub Actions CI pipeline visible on the repo

Dependency

  • Standalone bin/nlpm-check has zero dependencies (Python 3.11+ stdlib only)
  • Full Claude Code plugin requires Claude Code installed
