Caliber

caliber · rely-ai-org/caliber · ★ 1.1k · last commit 2026-05-20

Primitive shape: 14 total (8 skills, 6 hooks)

Caliber — Summary

Caliber is a CLI tool and agent skill pack that continuously generates and maintains AI context files (CLAUDE.md, .cursor/rules/*.mdc, AGENTS.md, copilot-instructions.md) for any codebase, keeping them accurate as the code evolves. It provides a deterministic 100-point scoring system (caliber score) that evaluates context file quality without any LLM calls, an LLM-driven generation workflow (caliber init / /setup-caliber skill), pre-commit hooks for automatic refresh on every commit, and session-learning hooks that distill agent interaction patterns into CALIBER_LEARNINGS.md.

Problem it solves: Hand-written context files go stale the moment code is refactored — the AI agent hallucinates paths that no longer exist, misses new dependencies, and gives advice based on yesterday's architecture. Caliber automates the full create-score-refresh loop so context files stay accurate without manual maintenance.

Distinctive trait: The deterministic scoring system awards 100 points across 6 categories (Files & Setup 25, Quality 25, Grounding 20, Accuracy 15, Freshness & Safety 10, Bonus 5). It evaluates context by cross-referencing config files against the actual project filesystem: do referenced paths exist? Are code blocks present? Is there config drift since the last commit? Scoring needs no LLM call, which makes it fast and reproducible.

Target audience: Teams using multiple AI tools simultaneously (Claude Code + Cursor + Codex + OpenCode + GitHub Copilot) who need synchronized context files across all agents. Caliber supports multiple LLM providers: Claude Code seat, Cursor seat, Anthropic API, OpenAI API, Vertex AI, and custom endpoints.

Differs from seeds: Most similar to agent-os (Archetype 4: generates and manages markdown context files) but adds a complete automation layer: deterministic quality scoring, LLM-driven generation, pre-commit hooks for continuous sync, session-learning hooks, and multi-platform output (5 AI tools instead of 1). Unlike agent-os, which ships a fixed set of opinionated standards, Caliber generates project-specific content using LLM analysis of the actual codebase.


Caliber — Overview

Origin

Published by rely-ai-org on GitHub. 1087 stars, 111 forks, 17 contributors, MIT license. Last commit: 2026-05-20. Language: TypeScript. Package: @rely-ai/caliber on npm. Active commercial product (PostHog analytics, with opt-out).

Philosophy

"Hand-written CLAUDE.md files go stale the moment you refactor. Your AI agent hallucinates paths that no longer exist, misses new dependencies, and gives advice based on yesterday's architecture."

"Scoring is deterministic — no LLM, no API calls. It cross-references your config files against your actual project filesystem: do referenced paths exist? Are code blocks present? Is there config drift since your last commit?"

"Your code stays on your machine. Bootstrap is 100% local — no LLM calls, no code sent anywhere. Generation uses your own AI subscription or API key. Caliber never sees your code."

Key Design Principles

  1. Audits first, writes second — never overwrites existing configs without showing a diff and asking. Backup to .caliber/backups/ before every write. caliber undo restores everything.

  2. Deterministic scoring, LLM generation — the caliber score command is 100% offline static analysis. The caliber init generation step uses an LLM, but only to generate the config content, never to score it.

  3. Continuous sync — pre-commit hooks run caliber refresh on every commit. New team members are nudged to bootstrap on first session.

  4. Two-tier model — lightweight tasks (classification, scoring) use a faster model; heavy tasks (generation, refinement) use the default. Cost optimization is automatic.

  5. Multi-platform parity — the "Files & Setup" scoring category explicitly checks cross-platform consistency (same content served to all AI tools).

Scoring Scale

| Metric | Before /setup-caliber | After /setup-caliber |
| --- | --- | --- |
| Agent Config Score | 35/100 (Grade D) | 94/100 (Grade A) |

Score breakdown:

  • Files & Setup: 25 pts (config files exist, skills present, MCP servers, cross-platform parity)
  • Quality: 25 pts (code blocks, token budget, concrete instructions, structured headings)
  • Grounding: 20 pts (config references actual project directories/files)
  • Accuracy: 15 pts (referenced paths exist on disk, config freshness vs git history)
  • Freshness & Safety: 10 pts (recently updated, no secrets, permissions configured)
  • Bonus: 5 pts (auto-refresh hooks, AGENTS.md, OpenSkills format)
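
The category maxima in the breakdown above add up to 100. A minimal sketch of that arithmetic follows; the maxima come from the list, while `RUBRIC_MAX_POINTS` and `totalScore` are hypothetical names, and clamping each category to its cap is an assumption about how a total might be aggregated rather than Caliber's documented behavior.

```typescript
// Category maxima as listed in the score breakdown.
const RUBRIC_MAX_POINTS = {
  "Files & Setup": 25,
  "Quality": 25,
  "Grounding": 20,
  "Accuracy": 15,
  "Freshness & Safety": 10,
  "Bonus": 5,
} as const;

type RubricCategory = keyof typeof RUBRIC_MAX_POINTS;

// Hypothetical helper: sum per-category points, clamping each one
// to the range [0, category maximum]. Missing categories count as 0.
function totalScore(earned: Partial<Record<RubricCategory, number>>): number {
  let total = 0;
  for (const category of Object.keys(RUBRIC_MAX_POINTS) as RubricCategory[]) {
    const max = RUBRIC_MAX_POINTS[category];
    const points = earned[category] ?? 0;
    total += Math.min(Math.max(points, 0), max);
  }
  return total;
}

// Every category at its cap yields the full 100 points.
const rubricMaximum = totalScore(RUBRIC_MAX_POINTS);
```

Keeping the maxima in a single constant means a per-category report and the overall total cannot drift apart.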

Caliber — Architecture

Distribution

  • Type: npm package + Claude Code skill pack
  • Package: @rely-ai/caliber
  • License: MIT
  • Language: TypeScript

Install Methods

npx @rely-ai/caliber bootstrap  # Bootstrap (installs /setup-caliber skill)
npm install -g @rely-ai/caliber  # Global install

# Then in Claude Code / Cursor CLI:
/setup-caliber

# Or wizard mode (no AI tool):
caliber init

Directory Tree (of deployed files)

~/.caliber/config.json      # API keys + provider config (mode 0600)

project/
├── CLAUDE.md               # Claude Code context
├── CALIBER_LEARNINGS.md    # Session-learned patterns
├── AGENTS.md               # Codex / OpenCode context
├── .github/copilot-instructions.md   # GitHub Copilot context
├── .caliber/
│   └── backups/            # Pre-write backups
├── .claude/
│   ├── settings.json       # Hooks + permissions
│   └── skills/             # OpenSkills format skills per agent
│       └── */SKILL.md
├── .cursor/
│   ├── rules/*.mdc         # Cursor rule files
│   ├── mcp.json
│   └── skills/*/SKILL.md
├── .agents/
│   └── skills/*/SKILL.md   # Codex skills
├── .opencode/
│   └── skills/*/SKILL.md   # OpenCode skills
└── .mcp.json               # Auto-discovered MCP servers

Hooks Used (in .claude/settings.json)

"SessionEnd": ["caliber refresh --quiet", "caliber learn finalize --auto"]
"PostToolUse": ["caliber learn observe"]
"PostToolUseFailure": ["caliber learn observe --failure"]
"UserPromptSubmit": ["caliber learn observe --prompt"]
"Stop": [".claude/hooks/caliber-check-sync.sh"]
"SessionStart": [".claude/hooks/caliber-session-freshness.sh"]
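
The listing above is a compact event-to-command summary. As a rough sketch, it could be expanded into the nested shape that Claude Code's settings file documents for hooks (an event name mapped to groups, each holding a `hooks` array of `{ type: "command", command }` entries). The helper `expandHooks` is hypothetical, and the exact file Caliber writes may differ.

```typescript
type CommandHook = { type: "command"; command: string };
type HookGroup = { hooks: CommandHook[] };

// Event-to-command mapping copied from the listing in this section.
const caliberHookCommands: Record<string, string[]> = {
  SessionEnd: ["caliber refresh --quiet", "caliber learn finalize --auto"],
  PostToolUse: ["caliber learn observe"],
  PostToolUseFailure: ["caliber learn observe --failure"],
  UserPromptSubmit: ["caliber learn observe --prompt"],
  Stop: [".claude/hooks/caliber-check-sync.sh"],
  SessionStart: [".claude/hooks/caliber-session-freshness.sh"],
};

// Hypothetical helper: wrap each event's commands in a single hook group.
function expandHooks(commands: Record<string, string[]>): Record<string, HookGroup[]> {
  const expanded: Record<string, HookGroup[]> = {};
  for (const eventName of Object.keys(commands)) {
    const list = commands[eventName] ?? [];
    expanded[eventName] = [
      { hooks: list.map((command) => ({ type: "command" as const, command })) },
    ];
  }
  return expanded;
}

const settingsFragment = { hooks: expandHooks(caliberHookCommands) };
console.log(JSON.stringify(settingsFragment, null, 2));
```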

Required Runtime

  • Node.js >= 20
  • npm (or npx)

Target AI Tools

  • Claude Code
  • Cursor
  • OpenAI Codex
  • OpenCode
  • GitHub Copilot

LLM Providers

| Provider | Setup |
| --- | --- |
| Claude Code seat | `caliber config` → Claude Code |
| Cursor seat | `caliber config` → Cursor |
| Anthropic API | `ANTHROPIC_API_KEY` env var |
| OpenAI API | `OPENAI_API_KEY` env var |
| Vertex AI | `VERTEX_PROJECT_ID` env var |
| Custom endpoint | `OPENAI_API_KEY` + `OPENAI_BASE_URL` |
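
To illustrate how the environment variables in the provider list could map to a provider choice, here is a small sketch. The variable names come from the table; the function `detectProviderFromEnv` and, in particular, the precedence order are assumptions, since the source does not say which provider wins when several variables are set.

```typescript
type LlmProvider = "anthropic" | "openai" | "vertex" | "custom-endpoint";

// Hypothetical resolution order; Caliber's real precedence is not documented here.
function detectProviderFromEnv(
  env: Record<string, string | undefined>,
): LlmProvider | undefined {
  if (env["ANTHROPIC_API_KEY"]) return "anthropic";
  // A base URL alongside the OpenAI key signals a custom endpoint.
  if (env["OPENAI_API_KEY"] && env["OPENAI_BASE_URL"]) return "custom-endpoint";
  if (env["OPENAI_API_KEY"]) return "openai";
  if (env["VERTEX_PROJECT_ID"]) return "vertex";
  // Seat-based providers (Claude Code, Cursor) are selected via `caliber config` instead.
  return undefined;
}
```

Passing the environment in as a plain record, rather than reading a global, keeps the sketch easy to exercise with different combinations.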

Caliber — Components

CLI Commands

| Command | Purpose |
| --- | --- |
| `caliber bootstrap` | One-time setup: installs /setup-caliber skill, detects agents |
| `caliber init [--agent <list>] [--auto-approve]` | Generate configs for specified agents |
| `caliber score [--compare <branch>]` | Deterministic 100-point quality scoring |
| `caliber refresh [--quiet]` | Regenerate configs based on current codebase (used in pre-commit hook) |
| `caliber undo` | Restore all configs from .caliber/backups/ |
| `caliber learn add "<text>" [--personal]` | Add a manual learning to CALIBER_LEARNINGS.md |
| `caliber learn observe [--failure] [--prompt]` | Record tool usage for session learning (called by hooks) |
| `caliber learn finalize --auto` | Distill session observations into CALIBER_LEARNINGS.md |
| `caliber config` | Configure LLM provider and model |
| `caliber hooks --install` | Install pre-commit hooks |

Skills (8 — for Claude Code)

| Skill | Purpose |
| --- | --- |
| setup-caliber | Dynamic onboarding: detect agents, check configs, generate if missing, install hooks |
| save-learning | Save user instructions as persistent learnings (triggers on "remember this", "always do X") |
| find-skills | Discover and recommend skills from OpenSkills catalog |
| caliber-testing | Testing guidance for Caliber development |
| scoring-checks | Explain and run scoring checks |
| llm-provider | Configure LLM provider |
| writers-pattern | Convention for writing config content |
| adding-a-command | Guide for adding CLI commands |

Hooks (6 events)

| Event | Purpose |
| --- | --- |
| SessionEnd | `caliber refresh --quiet` (auto-sync) + `caliber learn finalize --auto` (distill learnings) |
| PostToolUse | `caliber learn observe` (record tool usage pattern) |
| PostToolUseFailure | `caliber learn observe --failure` (record failures) |
| UserPromptSubmit | `caliber learn observe --prompt` (detect user corrections) |
| Stop | caliber-check-sync.sh (offer setup if not configured) |
| SessionStart | caliber-session-freshness.sh (check config freshness) |

Cursor Rules (5 .mdc files)

  • caliber-conventions.mdc
  • caliber-learnings.mdc
  • caliber-pre-commit.mdc
  • llm-layer.mdc
  • testing.mdc

Scoring Rubric (deterministic, no LLM)

| Category | Points | What it checks |
| --- | --- | --- |
| Files & Setup | 25 | Config files exist, skills present, MCP servers, cross-platform parity |
| Quality | 25 | Code blocks, token budget, concrete instructions, structured headings |
| Grounding | 20 | Config references actual project directories and files |
| Accuracy | 15 | Referenced paths exist on disk, config freshness vs git history |
| Freshness & Safety | 10 | Recently updated, no leaked secrets, permissions configured |
| Bonus | 5 | Auto-refresh hooks, AGENTS.md, OpenSkills format |
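
The "referenced paths exist on disk" check in the rubric can be pictured as a purely static comparison between a context file and the filesystem. The sketch below is not Caliber's implementation: `extractReferencedPaths`, `checkPathGrounding`, the inline-code heuristic for spotting paths, and the sample data are all hypothetical, and an in-memory set stands in for real disk access.

```typescript
// Hypothetical heuristic: treat inline-code tokens that contain a slash
// or end in a file extension as path references.
function extractReferencedPaths(configText: string): string[] {
  const found = new Set<string>();
  const inlineCode = /`([^`\s]+)`/g;
  let match: RegExpExecArray | null;
  while ((match = inlineCode.exec(configText)) !== null) {
    const token = match[1];
    if (token.includes("/") || /\.[a-z0-9]+$/i.test(token)) {
      found.add(token);
    }
  }
  return Array.from(found);
}

// Split referenced paths into those that exist and those that do not.
// `pathExists` is injected so the check needs no LLM and no real disk here.
function checkPathGrounding(
  configText: string,
  pathExists: (path: string) => boolean,
): { valid: string[]; missing: string[] } {
  const valid: string[] = [];
  const missing: string[] = [];
  for (const path of extractReferencedPaths(configText)) {
    (pathExists(path) ? valid : missing).push(path);
  }
  return { valid, missing };
}

// Illustrative data only.
const projectFiles = new Set(["src/index.ts", "package.json"]);
const sampleConfig =
  "Entry point is `src/index.ts`; tests live in `src/legacy/`. Run `npm test` before committing.";
const grounding = checkPathGrounding(sampleConfig, (p) => projectFiles.has(p));
console.log(grounding);
```

Because the check only reads text and tests for existence, running it twice on the same tree gives the same result, which is the reproducibility property the rubric relies on.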

Caliber — Prompt Excerpts

Excerpt 1: setup-caliber SKILL.md — Diagnostic-First Onboarding

````markdown
---
name: setup-caliber
description: Sets up Caliber for automatic AI agent context sync. Installs pre-commit
  hooks so CLAUDE.md, Cursor rules, and Copilot instructions update automatically on
  every commit. Use when Caliber hooks are not yet installed or when the user asks
  about keeping agent configs in sync.
---

# Setup Caliber

Dynamic onboarding for Caliber — automatic AI agent context sync.
Run all diagnostic steps below on every invocation to determine what's already
set up and what still needs to be done.

### Step 1: Check if Caliber is installed

```bash
command -v caliber >/dev/null 2>&1 && caliber --version || echo "NOT_INSTALLED"
```

### Step 3: Detect agents and check if configs exist

```bash
AGENTS=""
[ -d .claude ] && AGENTS="claude"
[ -d .cursor ] && AGENTS="${AGENTS:+$AGENTS,}cursor"
[ -d .agents ] || [ -f AGENTS.md ] && AGENTS="${AGENTS:+$AGENTS,}codex"
[ -f .github/copilot-instructions.md ] && AGENTS="${AGENTS:+$AGENTS,}github-copilot"
echo "DETECTED_AGENTS=${AGENTS:-none}"
```
````

Prompting technique: State-machine onboarding. Each step checks current state first and only acts if something is missing. "Run all diagnostic steps below on every invocation" prevents the agent from skipping steps based on cached assumptions. The bash-heavy diagnostic loop produces machine-readable output (`NOT_INSTALLED`, `HOOK_ACTIVE`, `DETECTED_AGENTS=claude,cursor`) that guides subsequent branching.
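
For readers more comfortable with TypeScript than shell, the agent-detection step of the excerpt can be restated as a pure function. This is only a sketch: `detectAgents` is a hypothetical name, and the bash version's separate directory (`-d`) and file (`-f`) tests are collapsed into a single injected `exists` predicate.

```typescript
// Mirrors the checks in the bash excerpt; `exists` is injected so the
// logic can be exercised without touching a real filesystem.
function detectAgents(exists: (path: string) => boolean): string {
  const agents: string[] = [];
  if (exists(".claude")) agents.push("claude");
  if (exists(".cursor")) agents.push("cursor");
  if (exists(".agents") || exists("AGENTS.md")) agents.push("codex");
  if (exists(".github/copilot-instructions.md")) agents.push("github-copilot");
  // Same machine-readable line the skill's bash step prints.
  return `DETECTED_AGENTS=${agents.length > 0 ? agents.join(",") : "none"}`;
}

// Illustrative project containing a .claude directory and an AGENTS.md file.
const presentPaths = new Set([".claude", "AGENTS.md"]);
console.log(detectAgents((p) => presentPaths.has(p)));
```

A single line of stable, parseable output is what lets the following steps branch on the result instead of re-inspecting the project.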

Excerpt 2: save-learning SKILL.md — Persistent Learning Capture

````markdown
---
name: save-learning
description: Saves user instructions as persistent learnings for future sessions. Use
  when the user says 'remember this', 'always do X', 'from now on', 'never do Y', or
  gives any instruction they want persisted across sessions.
---

## Instructions

2. Refine the instruction into a clean, actionable learning bullet with an appropriate
   type prefix:
   - `**[convention]**` — coding style, workflow, git conventions
   - `**[pattern]**` — reusable code patterns
   - `**[anti-pattern]**` — things to avoid
   - `**[preference]**` — personal/team preferences
   - `**[context]**` — project-specific context

3. Show the refined learning to the user and ask for confirmation

4. If confirmed, run:
   ```bash
   caliber learn add "<refined learning>"
   ```

## Examples

User: "when developing features, push to next branch not master, remember it"
-> Refine: `**[convention]** Push feature commits to the \`next\` branch, not \`master\``
-> "I'll save this as a project learning: ... Save for future sessions?"
-> If yes: run `caliber learn add "..."`
````

Prompting technique: Type-prefixed learning capture with mandatory user confirmation and worked examples. The five type prefixes (`[convention]`, `[pattern]`, `[anti-pattern]`, `[preference]`, `[context]`) create a taxonomy that makes learnings searchable and categorizable. The skill's "When NOT to trigger" section (not reproduced in the excerpt above) prevents false positives.
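
The type-prefix taxonomy lends itself to a small typed helper. The following sketch assumes nothing about Caliber's internals: `formatLearning` and `learnAddCommand` are hypothetical names, the five prefixes come from the skill excerpt, and the shell quoting is deliberately simplified.

```typescript
// The five learning types named in the save-learning skill.
type LearningType = "convention" | "pattern" | "anti-pattern" | "preference" | "context";

// Produce a single-line learning bullet with its type prefix.
function formatLearning(kind: LearningType, text: string): string {
  const cleaned = text.trim().replace(/\s+/g, " ");
  return `**[${kind}]** ${cleaned}`;
}

// Build the command the skill runs after the user confirms.
// Escapes double quotes, backslashes, dollar signs, and backticks
// for use inside a double-quoted shell string (simplified quoting).
function learnAddCommand(learning: string): string {
  const escaped = learning.replace(/(["\\$`])/g, "\\$1");
  return `caliber learn add "${escaped}"`;
}

const exampleLearning = formatLearning(
  "convention",
  "Push feature commits to the `next` branch, not `master`",
);
console.log(learnAddCommand(exampleLearning));
```

Restricting `kind` to a union type means an unknown prefix is rejected at compile time, which keeps the learnings file consistently categorized.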

Caliber — Uniqueness & Positioning

Differs From Seeds

Most similar to agent-os (Archetype 4 — generates and manages project markdown context files) but with a complete automation layer that agent-os lacks: deterministic quality scoring (100-point rubric, no LLM), continuous sync via pre-commit hooks, session-learning hooks that distill interaction patterns, multi-platform output (5 AI tools), and a two-tier LLM model system. Agent-os ships a fixed opinionated standards/ directory with Anthropic's conventions; Caliber generates project-specific content tailored to the codebase's actual architecture.

Unlike ai-context-linter (same batch), which validates context files after-the-fact in CI, Caliber generates and maintains context files proactively. They solve complementary problems: Caliber generates, ai-context-linter validates. Together they form a complete create-maintain-validate loop (though they are from different authors).

The AGENTS.md vs AGENT.md Schism — Caliber Position

Caliber explicitly generates AGENTS.md as the Codex/OpenCode standard. The README names it a "bonus" in the scoring rubric. Caliber does not generate AGENT.md (the competing format proposed by Geoffrey Huntley / Sourcegraph Amp). This implicitly endorses the AGENTS.md naming convention from OpenAI.

Observable Failure Modes

  • Pre-commit hook runs LLM generation on every commit — slow commits if LLM latency is high
  • Session learning hooks run caliber learn observe on every tool use — could slow down tool calls if Caliber CLI has high startup cost
  • The scoring rubric is deterministic but fixed — it may not reflect what matters for a specific team's workflow
  • Context files generated by LLM may have "accurate but unhelpful" content if the codebase is large and the agent over-generalizes
  • 5-platform multi-target means 5 sets of files to maintain; score drift across platforms is possible

Multi-Tool Synchronization

The key unique value: one caliber refresh generates accurate, synchronized context for all 5 AI tools simultaneously. No other framework in the corpus explicitly solves the "team uses Cursor and Claude Code and Codex simultaneously" problem.


Caliber — Workflow

Initial Setup Flow

| Phase | Action | Artifact |
| --- | --- | --- |
| Bootstrap | `npx @rely-ai/caliber bootstrap` | /setup-caliber skill installed |
| Setup | /setup-caliber in AI tool | Auto-detect stack, generate configs, install hooks |
| Scoring | `caliber score` | 0-100 quality score per category |
| Review | Show diff of proposed changes | User accepts/rejects/refines |
| Write | Accept changes | Config files written + backed up |

Continuous Sync Loop

  npx @rely-ai/caliber bootstrap       ← one-time, 2 seconds
              ↓
  agent runs /setup-caliber             ← agent handles everything
              ↓
  ┌──── configs generated ◄────────────┐
  │           ↓                        │
  │     code evolves                   │
  │           ↓                        │
  └──► caliber refresh (on commit) ───►┘

Session Learning Loop

| Hook | Action | Artifact |
| --- | --- | --- |
| PostToolUse | `caliber learn observe` | Raw observation recorded |
| PostToolUseFailure | `caliber learn observe --failure` | Failure pattern recorded |
| UserPromptSubmit | `caliber learn observe --prompt` | User correction detected |
| SessionEnd | `caliber learn finalize --auto` | Patterns distilled to CALIBER_LEARNINGS.md |

Phase-to-Artifact Map

| Phase | Artifact |
| --- | --- |
| Score | Score report (stdout) |
| Generate | CLAUDE.md, .cursor/rules/*.mdc, AGENTS.md, copilot-instructions.md |
| Backup | .caliber/backups/ |
| Learn | CALIBER_LEARNINGS.md |
| Hooks install | .git/hooks/pre-commit |

Approval Gates

| Gate | Type |
| --- | --- |
| Config diff review | file-review (show diff, accept/refine/decline) |
| AGENTS.md re-export | yes-no (always asks, never auto-creates) |
| Per-change chat refinement | freetext-clarify |

Score Threshold

If existing config scores 95+, Caliber skips full regeneration and applies targeted fixes only.
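
The threshold rule reduces to a single comparison. In this sketch the value 95 comes from the sentence above, while `planRegeneration` and the two plan labels are hypothetical names chosen for illustration.

```typescript
// From the text: configs scoring 95 or more get targeted fixes only.
const TARGETED_FIX_THRESHOLD = 95;

type RegenerationPlan = "targeted-fixes" | "full-regeneration";

// Hypothetical helper: pick a plan from the existing config's score.
function planRegeneration(existingScore: number): RegenerationPlan {
  return existingScore >= TARGETED_FIX_THRESHOLD ? "targeted-fixes" : "full-regeneration";
}
```

Under this reading, a config at 94/100 would still be regenerated in full, while one at 95/100 would only receive targeted fixes.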


Caliber — Memory & Context

State Storage

Multi-layer:

| Layer | Location | Persistence |
| --- | --- | --- |
| Config files | CLAUDE.md, .cursor/rules/, AGENTS.md, copilot-instructions.md | Project |
| Session learnings | CALIBER_LEARNINGS.md | Project |
| Personal learnings | ~/.caliber/learnings.md (with --personal) | Global |
| Config backups | .caliber/backups/ | Project |
| Provider config | ~/.caliber/config.json | Global (mode 0600) |

Session Learning Mechanism

The caliber learn subsystem watches agent sessions:

  1. observe (PostToolUse): records each tool call's context
  2. observe --failure (PostToolUseFailure): records failure patterns
  3. observe --prompt (UserPromptSubmit): detects when user corrects agent behavior
  4. finalize --auto (SessionEnd): LLM distills observations into categorized learnings in CALIBER_LEARNINGS.md
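
The four steps above amount to an observe-then-distill pipeline. The sketch below models only the collection side and assumes its own data shapes: `SessionLog`, `SessionObservation`, and the sample details are hypothetical, and the LLM distillation that Caliber performs at SessionEnd is replaced by a simple grouping step.

```typescript
type ObservationKind = "tool-use" | "failure" | "prompt";

interface SessionObservation {
  kind: ObservationKind;
  detail: string;
}

// Hook events that feed the learn subsystem, per the list above.
const KIND_BY_HOOK: Record<string, ObservationKind> = {
  PostToolUse: "tool-use",
  PostToolUseFailure: "failure",
  UserPromptSubmit: "prompt",
};

class SessionLog {
  private readonly entries: SessionObservation[] = [];

  // Record one observation; events the subsystem does not watch are ignored.
  observe(hookEvent: string, detail: string): void {
    const kind = KIND_BY_HOOK[hookEvent];
    if (kind === undefined) return;
    this.entries.push({ kind, detail });
  }

  // Group observations by kind. In Caliber, an LLM would distill
  // this raw material into categorized learnings instead.
  summarize(): Record<ObservationKind, string[]> {
    const grouped: Record<ObservationKind, string[]> = {
      "tool-use": [],
      failure: [],
      prompt: [],
    };
    for (const entry of this.entries) {
      grouped[entry.kind].push(entry.detail);
    }
    return grouped;
  }
}

// Illustrative session data only.
const sessionLog = new SessionLog();
sessionLog.observe("PostToolUse", "Edit src/cli.ts");
sessionLog.observe("PostToolUseFailure", "npm test exited 1");
sessionLog.observe("UserPromptSubmit", "no, use pnpm instead");
sessionLog.observe("Stop", "not recorded");
const sessionSummary = sessionLog.summarize();
```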

Context Freshness

caliber-session-freshness.sh runs on SessionStart to check if CLAUDE.md is stale vs. recent git commits. If stale, it notifies (via notify-app.sh) that context may need refreshing.

caliber-check-sync.sh runs on Stop — if Caliber is not configured, it offers setup for the next session.

Pre-Commit Refresh

The pre-commit hook runs caliber refresh --quiet on every commit — ensuring that whenever code changes are committed, the context files are updated to reflect the new state.

Backups

Before any write, Caliber backs up existing configs to .caliber/backups/. caliber undo restores the previous state. This prevents accidental loss of hand-crafted context sections.


Caliber — Orchestration

Multi-Agent

No native multi-agent workflow. Caliber is a single-agent tool that generates configurations for multiple AI tools but does not orchestrate multiple agents running simultaneously.

Execution Modes

Two modes:

  1. One-shot setup (caliber init / /setup-caliber): Interactive generation workflow
  2. Event-driven continuous (hooks): caliber refresh on every commit, caliber learn on every tool use

Multi-Model

Yes — Caliber routes different model classes to different tasks via the two-tier model system:

  • Fast model: lightweight classification and scoring tasks
  • Default model: generation and refinement
  • Model is user-configurable via caliber config or CALIBER_MODEL env var
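
A rough sketch of the two-tier routing described above follows. The tier split and the `CALIBER_MODEL` variable come from the list; `selectModel`, `ModelConfig`, and the placeholder model names are hypothetical, and the assumption that `CALIBER_MODEL` overrides only the default (heavy-task) model is a guess rather than documented behavior.

```typescript
type TaskWeight = "lightweight" | "heavy";

interface ModelConfig {
  defaultModel: string; // used for generation and refinement
  fastModel: string; // used for lightweight tasks
}

// Hypothetical router. Assumption: CALIBER_MODEL replaces the default
// model only, leaving the fast tier untouched.
function selectModel(
  task: TaskWeight,
  config: ModelConfig,
  env: Record<string, string | undefined>,
): string {
  const override = env["CALIBER_MODEL"];
  const defaultModel = override && override.length > 0 ? override : config.defaultModel;
  return task === "lightweight" ? config.fastModel : defaultModel;
}

// Placeholder names, not real model identifiers.
const demoModels: ModelConfig = {
  defaultModel: "example-default-model",
  fastModel: "example-fast-model",
};
console.log(selectModel("heavy", demoModels, {}));
```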

Isolation

None — runs in-place on the project's actual files. Backups to .caliber/backups/ provide rollback but no isolation.

Consensus

None.

Prompt Chaining

Limited — the setup-caliber skill runs sequential diagnostic steps, with each step's output determining the next action. This is control flow, not structural prompt chaining.


Caliber — UI / CLI Surface

CLI Binary

Yes — caliber (npm global install) or npx @rely-ai/caliber.

  • Name: caliber
  • Type: Own runtime (TypeScript compiled to Node.js)
  • Thin wrapper: No — full TypeScript CLI with its own scoring engine, LLM integration, and hook management
  • Key subcommands:
    • caliber bootstrap — one-time setup
    • caliber init — generate configs
    • caliber score [--compare <branch>] — deterministic quality scoring
    • caliber refresh — re-generate based on current codebase
    • caliber undo — restore from backups
    • caliber learn add/observe/finalize — session learning management
    • caliber config — LLM provider configuration
    • caliber hooks --install — install pre-commit hooks

Local UI

None. CLI-only interface.

IDE Integration

None beyond standard file placement. Skills are installed to .claude/skills/, .cursor/skills/, .agents/skills/, and .opencode/skills/.

Observability

  • caliber score provides structured scoring report (stdout)
  • caliber score --compare main shows score delta between branches
  • Score badge for README: ![Caliber Score](https://img.shields.io/badge/caliber-94%2F100-brightgreen)
  • Anonymous usage telemetry via PostHog (command names, durations — opt-out with CALIBER_TELEMETRY_DISABLED=1)
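
Two of the observability items above are easy to illustrate in code: building the badge URL and computing a per-category delta between two score reports. Both helpers are hypothetical. The URL pattern is taken from the badge example in the list, while the shape of a score report (a category-to-points record) is an assumption, since the output format of `caliber score --compare` is not shown here.

```typescript
// Build a shields.io badge URL in the pattern shown in the list above.
// The color is a parameter because the score-to-color mapping is not documented here.
function caliberBadgeUrl(score: number, color: string = "brightgreen"): string {
  const label = encodeURIComponent(`${score}/100`); // "/" is encoded as "%2F"
  return `https://img.shields.io/badge/caliber-${label}-${color}`;
}

// Per-category difference between two score reports (head minus base).
// Categories missing from one side are treated as 0.
function scoreDelta(
  base: Record<string, number>,
  head: Record<string, number>,
): Record<string, number> {
  const delta: Record<string, number> = {};
  const categories = new Set<string>(Object.keys(base).concat(Object.keys(head)));
  categories.forEach((category) => {
    delta[category] = (head[category] ?? 0) - (base[category] ?? 0);
  });
  return delta;
}

console.log(caliberBadgeUrl(94));
```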
