Caliber

caliber · rely-ai-org/caliber · ★ 1.1k · last commit 2026-05-20

Primitive shape 14 total

Skills 8 Hooks 6

Summary

Caliber — Summary

Caliber is a CLI tool and agent skill pack that continuously generates and maintains AI context files (CLAUDE.md, .cursor/rules/*.mdc, AGENTS.md, copilot-instructions.md) for any codebase, keeping them accurate as the code evolves. It provides a deterministic 100-point scoring system (caliber score) that evaluates context file quality without any LLM calls, an LLM-driven generation workflow (caliber init / /setup-caliber skill), pre-commit hooks for automatic refresh on every commit, and session-learning hooks that distill agent interaction patterns into CALIBER_LEARNINGS.md.

Problem it solves: Hand-written context files go stale the moment code is refactored — the AI agent hallucinates paths that no longer exist, misses new dependencies, and gives advice based on yesterday's architecture. Caliber automates the full create-score-refresh loop so context files stay accurate without manual maintenance.

Distinctive trait: The deterministic scoring system awards 100 points across 6 categories (Files & Setup 25, Quality 25, Grounding 20, Accuracy 15, Freshness & Safety 10, Bonus 5). It evaluates context by cross-referencing config files against the actual project filesystem: do referenced paths exist? Are code blocks present? Is there config drift since the last commit? Scoring needs no LLM call, which makes it fast and reproducible.

Target audience: Teams using multiple AI tools simultaneously (Claude Code + Cursor + Codex + OpenCode + GitHub Copilot) who need synchronized context files across all agents. Caliber supports multiple LLM providers: Claude Code seat, Cursor seat, Anthropic API, OpenAI API, Vertex AI, or a custom endpoint.

Differs from seeds: Most similar to agent-os (Archetype 4 — generates and manages markdown context files) but adds a complete automation layer: deterministic quality scoring, LLM-driven generation, pre-commit hooks for continuous sync, session-learning hooks, and multi-platform output (5 AI tools instead of 1). Unlike agent-os which ships a fixed set of opinionated standards, Caliber generates project-specific content using LLM analysis of the actual codebase.

Overview

Caliber — Overview

Origin

Published by rely-ai-org on GitHub. 1087 stars, 111 forks, 17 contributors, MIT license. Last commit: 2026-05-20. Language: TypeScript. Package: @rely-ai/caliber on npm. Active commercial product (PostHog analytics, with opt-out).

Philosophy

"Hand-written CLAUDE.md files go stale the moment you refactor. Your AI agent hallucinates paths that no longer exist, misses new dependencies, and gives advice based on yesterday's architecture."

"Scoring is deterministic — no LLM, no API calls. It cross-references your config files against your actual project filesystem: do referenced paths exist? Are code blocks present? Is there config drift since your last commit?"

"Your code stays on your machine. Bootstrap is 100% local — no LLM calls, no code sent anywhere. Generation uses your own AI subscription or API key. Caliber never sees your code."

Key Design Principles

Audits first, writes second — never overwrites existing configs without showing a diff and asking. Backup to .caliber/backups/ before every write. caliber undo restores everything.
Deterministic scoring, LLM generation — the caliber score command is 100% offline static analysis. The caliber init generation step uses LLM but only for generating the config content, not for scoring.
Continuous sync — pre-commit hooks run caliber refresh on every commit. New team members are nudged to bootstrap on first session.
Two-tier model — lightweight tasks (classification, scoring) use a faster model; heavy tasks (generation, refinement) use the default. Cost optimization is automatic.
Multi-platform parity — the "Files & Setup" scoring category explicitly checks cross-platform consistency (same content served to all AI tools).

Scoring Scale

Before /setup-caliber → After /setup-caliber
Agent Config Score 35/100 (Grade D) → 94/100 (Grade A)

Score breakdown:

Files & Setup: 25 pts (config files exist, skills present, MCP servers, cross-platform parity)
Quality: 25 pts (code blocks, token budget, concrete instructions, structured headings)
Grounding: 20 pts (config references actual project directories/files)
Accuracy: 15 pts (referenced paths exist on disk, config freshness vs git history)
Freshness & Safety: 10 pts (recently updated, no secrets, permissions configured)
Bonus: 5 pts (auto-refresh hooks, AGENTS.md, OpenSkills format)
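As a rough illustration of how the six category maxima combine into the 100-point total, the sketch below clamps each category to its cap and sums the results. The names `CATEGORY_MAX` and `totalScore` are hypothetical and only mirror the breakdown above; this is not Caliber's actual scoring code.

```typescript
// Category maxima as listed in the score breakdown above.
const CATEGORY_MAX = {
  filesAndSetup: 25,
  quality: 25,
  grounding: 20,
  accuracy: 15,
  freshnessAndSafety: 10,
  bonus: 5,
} as const;

type Category = keyof typeof CATEGORY_MAX;
type CategoryScores = Record<Category, number>;

// Hypothetical helper: clamp each category to [0, max] and add them up.
function totalScore(scores: CategoryScores): number {
  let total = 0;
  for (const key of Object.keys(CATEGORY_MAX) as Category[]) {
    const max = CATEGORY_MAX[key];
    total += Math.min(Math.max(scores[key], 0), max);
  }
  return total;
}

// A config that earns every point in every category reaches the full 100.
const maxPossible = totalScore({
  filesAndSetup: 25,
  quality: 25,
  grounding: 20,
  accuracy: 15,
  freshnessAndSafety: 10,
  bonus: 5,
});
```

Clamping per category keeps one over-counted check from pushing the total past 100.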

Architecture

Caliber — Architecture

Distribution

Type: npm package + Claude Code skill pack
Package: @rely-ai/caliber
License: MIT
Language: TypeScript

Install Methods

npx @rely-ai/caliber bootstrap  # Bootstrap (installs /setup-caliber skill)
npm install -g @rely-ai/caliber  # Global install

# Then in Claude Code / Cursor CLI:
/setup-caliber

# Or wizard mode (no AI tool):
caliber init

Directory Tree (of deployed files)

~/.caliber/config.json      # API keys + provider config (mode 0600)

project/
├── CLAUDE.md               # Claude Code context
├── CALIBER_LEARNINGS.md    # Session-learned patterns
├── AGENTS.md               # Codex / OpenCode context
├── .github/copilot-instructions.md   # GitHub Copilot context
├── .caliber/
│   └── backups/            # Pre-write backups
├── .claude/
│   ├── settings.json       # Hooks + permissions
│   └── skills/             # OpenSkills format skills per agent
│       └── */SKILL.md
├── .cursor/
│   ├── rules/*.mdc         # Cursor rule files
│   ├── mcp.json
│   └── skills/*/SKILL.md
├── .agents/
│   └── skills/*/SKILL.md   # Codex skills
├── .opencode/
│   └── skills/*/SKILL.md   # OpenCode skills
└── .mcp.json               # Auto-discovered MCP servers

Hooks Used (in `.claude/settings.json`)

"SessionEnd": ["caliber refresh --quiet", "caliber learn finalize --auto"]
"PostToolUse": ["caliber learn observe"]
"PostToolUseFailure": ["caliber learn observe --failure"]
"UserPromptSubmit": ["caliber learn observe --prompt"]
"Stop": [".claude/hooks/caliber-check-sync.sh"]
"SessionStart": [".claude/hooks/caliber-session-freshness.sh"]

Required Runtime

Node.js >= 20
npm (or npx)

Target AI Tools

Claude Code
Cursor
OpenAI Codex
OpenCode
GitHub Copilot

LLM Providers

Provider	Setup
Claude Code seat	`caliber config` → Claude Code
Cursor seat	`caliber config` → Cursor
Anthropic API	`ANTHROPIC_API_KEY` env var
OpenAI API	`OPENAI_API_KEY` env var
Vertex AI	`VERTEX_PROJECT_ID` env var
Custom endpoint	`OPENAI_API_KEY` + `OPENAI_BASE_URL`
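The provider table can be read as a small resolution rule over environment variables. The sketch below is one possible reading: the precedence order, the `Provider` labels, and the `detectProvider` function are assumptions made for illustration, and the source does not say how Caliber resolves conflicts when several variables are set.

```typescript
type Provider = "anthropic" | "openai" | "custom-endpoint" | "vertex" | "seat";
type Env = Record<string, string | undefined>;

// Hypothetical resolver: the env var names come from the table above,
// but the precedence order is an assumption of this sketch.
function detectProvider(env: Env): Provider {
  if (env["ANTHROPIC_API_KEY"]) return "anthropic";
  // An OpenAI key plus a base URL is treated as a custom endpoint.
  if (env["OPENAI_API_KEY"] && env["OPENAI_BASE_URL"]) return "custom-endpoint";
  if (env["OPENAI_API_KEY"]) return "openai";
  if (env["VERTEX_PROJECT_ID"]) return "vertex";
  // Nothing in the environment: fall back to a seat chosen via `caliber config`.
  return "seat";
}

const resolved = detectProvider({
  OPENAI_API_KEY: "example-key",
  OPENAI_BASE_URL: "http://localhost:8080/v1",
});
```

Passing the environment in as a plain object, rather than reading a global, keeps the rule easy to test.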

Components

Caliber — Components

CLI Commands

Command	Purpose
`caliber bootstrap`	One-time setup: installs /setup-caliber skill, detects agents
`caliber init [--agent <list>] [--auto-approve]`	Generate configs for specified agents
`caliber score [--compare <branch>]`	Deterministic 100-point quality scoring
`caliber refresh [--quiet]`	Re-generate configs based on current codebase (used in pre-commit hook)
`caliber undo`	Restore all configs from `.caliber/backups/`
`caliber learn add "<text>" [--personal]`	Add a manual learning to CALIBER_LEARNINGS.md
`caliber learn observe [--failure] [--prompt]`	Record tool usage for session learning (called by hooks)
`caliber learn finalize --auto`	Distill session observations into CALIBER_LEARNINGS.md
`caliber config`	Configure LLM provider and model
`caliber hooks --install`	Install pre-commit hooks

Skills (8 — for Claude Code)

Skill	Purpose
`setup-caliber`	Dynamic onboarding: detect agents, check configs, generate if missing, install hooks
`save-learning`	Save user instructions as persistent learnings (triggers on "remember this", "always do X")
`find-skills`	Discover and recommend skills from OpenSkills catalog
`caliber-testing`	Testing guidance for Caliber development
`scoring-checks`	Explain and run scoring checks
`llm-provider`	Configure LLM provider
`writers-pattern`	Convention for writing config content
`adding-a-command`	Guide for adding CLI commands

Hooks (6 events)

Event	Purpose
`SessionEnd`	`caliber refresh --quiet` (auto-sync) + `caliber learn finalize --auto` (distill learnings)
`PostToolUse`	`caliber learn observe` (record tool usage pattern)
`PostToolUseFailure`	`caliber learn observe --failure` (record failures)
`UserPromptSubmit`	`caliber learn observe --prompt` (detect user corrections)
`Stop`	`caliber-check-sync.sh` (offer setup if not configured)
`SessionStart`	`caliber-session-freshness.sh` (check config freshness)
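To make the event-to-command mapping concrete, the sketch below expands it into a nested hook structure of the kind Claude Code reads from `.claude/settings.json`. The nested shape (an array of entries, each holding `type: "command"` hooks) reflects the general Claude Code hook settings format as understood here; whether Caliber writes matchers or additional fields is not stated in the source, and `toClaudeSettings` is a hypothetical helper.

```typescript
type HookCommand = { type: "command"; command: string };
type HookEntry = { matcher?: string; hooks: HookCommand[] };
type ClaudeSettings = { hooks: Record<string, HookEntry[]> };

// Event-to-command map taken from the hook listings in this document.
const caliberHooks: Record<string, string[]> = {
  SessionEnd: ["caliber refresh --quiet", "caliber learn finalize --auto"],
  PostToolUse: ["caliber learn observe"],
  PostToolUseFailure: ["caliber learn observe --failure"],
  UserPromptSubmit: ["caliber learn observe --prompt"],
  Stop: [".claude/hooks/caliber-check-sync.sh"],
  SessionStart: [".claude/hooks/caliber-session-freshness.sh"],
};

// Hypothetical helper: wrap each command list in a single entry per event.
function toClaudeSettings(map: Record<string, string[]>): ClaudeSettings {
  const hooks: Record<string, HookEntry[]> = {};
  for (const event of Object.keys(map)) {
    const commands = map[event] ?? [];
    hooks[event] = [
      { hooks: commands.map((command) => ({ type: "command" as const, command })) },
    ];
  }
  return { hooks };
}

const settingsJson = JSON.stringify(toClaudeSettings(caliberHooks), null, 2);
```

`settingsJson` holds the serialized structure, which can be compared against a real `.claude/settings.json` to see where the two differ.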

Cursor Rules (5 `.mdc` files)

caliber-conventions.mdc
caliber-learnings.mdc
caliber-pre-commit.mdc
llm-layer.mdc
testing.mdc

Scoring Rubric (deterministic, no LLM)

Category	Points	What it checks
Files & Setup	25	Config files exist, skills present, MCP servers, cross-platform parity
Quality	25	Code blocks, token budget, concrete instructions, structured headings
Grounding	20	Config references actual project directories and files
Accuracy	15	Referenced paths exist on disk, config freshness vs git history
Freshness & Safety	10	Recently updated, no leaked secrets, permissions configured
Bonus	5	Auto-refresh hooks, AGENTS.md, OpenSkills format
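The Accuracy row ("referenced paths exist on disk") lends itself to a small deterministic check. The sketch below shows one way such a check could work, under stated assumptions: the path-detection heuristic, the proportional point award, and the names `extractReferencedPaths`, `checkReferencedPaths`, and `accuracyPoints` are all hypothetical, and the filesystem lookup is injected as a function so the logic stays pure.

```typescript
type PathCheck = { path: string; exists: boolean };

// Collect inline-code spans that look like file paths: no spaces, and either
// a slash or a file extension. This heuristic is an assumption of the sketch.
function extractReferencedPaths(markdown: string): string[] {
  const paths: string[] = [];
  const inlineCode = /`([^`\n]+)`/g;
  let match: RegExpExecArray | null;
  while ((match = inlineCode.exec(markdown)) !== null) {
    const candidate = (match[1] ?? "").trim();
    const looksLikePath =
      /^[\w.\/-]+$/.test(candidate) &&
      (candidate.indexOf("/") !== -1 || /\.\w+$/.test(candidate));
    if (looksLikePath && paths.indexOf(candidate) === -1) paths.push(candidate);
  }
  return paths;
}

// `exists` stands in for a real filesystem lookup (e.g. fs.existsSync).
function checkReferencedPaths(markdown: string, exists: (p: string) => boolean): PathCheck[] {
  return extractReferencedPaths(markdown).map((path) => ({ path, exists: exists(path) }));
}

// Assumed scoring rule: points proportional to the share of paths that exist.
function accuracyPoints(checks: PathCheck[], maxPoints: number): number {
  if (checks.length === 0) return 0;
  const found = checks.filter((c) => c.exists).length;
  return Math.round((found / checks.length) * maxPoints);
}

const sample = "Entry point is `src/index.ts`; see `docs/old.md`. Run `npm test`.";
const checks = checkReferencedPaths(sample, (p) => p === "src/index.ts");
const points = accuracyPoints(checks, 15);
```

Because the check needs only the file contents and a directory listing, it gives the same result on every run, which is the property the rubric relies on.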

Prompts

Caliber — Prompt Excerpts

Excerpt 1: `setup-caliber` SKILL.md — Diagnostic-First Onboarding

````markdown
---
name: setup-caliber
description: Sets up Caliber for automatic AI agent context sync. Installs pre-commit
  hooks so CLAUDE.md, Cursor rules, and Copilot instructions update automatically on
  every commit. Use when Caliber hooks are not yet installed or when the user asks
  about keeping agent configs in sync.
---

# Setup Caliber

Dynamic onboarding for Caliber — automatic AI agent context sync.
Run all diagnostic steps below on every invocation to determine what's already
set up and what still needs to be done.

### Step 1: Check if Caliber is installed

```bash
command -v caliber >/dev/null 2>&1 && caliber --version || echo "NOT_INSTALLED"
```

### Step 3: Detect agents and check if configs exist

```bash
AGENTS=""
[ -d .claude ] && AGENTS="claude"
[ -d .cursor ] && AGENTS="${AGENTS:+$AGENTS,}cursor"
[ -d .agents ] || [ -f AGENTS.md ] && AGENTS="${AGENTS:+$AGENTS,}codex"
[ -f .github/copilot-instructions.md ] && AGENTS="${AGENTS:+$AGENTS,}github-copilot"
echo "DETECTED_AGENTS=${AGENTS:-none}"
```
````

**Prompting technique:** State-machine onboarding. Each step checks current state first and only acts if something is missing. "Run all diagnostic steps below on every invocation" prevents the agent from skipping steps based on cached assumptions. The bash-heavy diagnostic loop produces machine-readable output (`NOT_INSTALLED`, `HOOK_ACTIVE`, `DETECTED_AGENTS=claude,cursor`) that guides subsequent branching.
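The branching that this machine-readable output enables can be sketched as a parser plus a decision function. The marker strings (`NOT_INSTALLED`, `HOOK_ACTIVE`, `DETECTED_AGENTS=`) come from the excerpt and the paragraph above; the step names and the exact branching rules in `nextSteps` are assumptions for illustration, not the skill's actual logic.

```typescript
type Diagnostics = { installed: boolean; hookActive: boolean; agents: string[] };

// Parse the combined stdout of the diagnostic commands into a state object.
function parseDiagnostics(output: string): Diagnostics {
  const lines = output.split("\n").map((line) => line.trim());
  let agents: string[] = [];
  for (const line of lines) {
    if (line.indexOf("DETECTED_AGENTS=") === 0) {
      const value = line.slice("DETECTED_AGENTS=".length);
      agents = value === "none" || value === "" ? [] : value.split(",");
    }
  }
  return {
    installed: lines.indexOf("NOT_INSTALLED") === -1,
    hookActive: lines.indexOf("HOOK_ACTIVE") !== -1,
    agents,
  };
}

// Hypothetical decision step: act only on what the diagnostics show is missing.
function nextSteps(state: Diagnostics): string[] {
  const steps: string[] = [];
  if (!state.installed) steps.push("install-caliber");
  if (state.agents.length === 0) steps.push("generate-configs");
  if (!state.hookActive) steps.push("install-hooks");
  return steps;
}

const freshRepo = nextSteps(parseDiagnostics("NOT_INSTALLED\nDETECTED_AGENTS=none"));
const configuredRepo = nextSteps(
  parseDiagnostics("1.0.0\nHOOK_ACTIVE\nDETECTED_AGENTS=claude,cursor"),
);
```

Re-deriving the state from fresh command output on every invocation is what keeps the flow idempotent: a fully configured repository yields an empty step list.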

Excerpt 2: `save-learning` SKILL.md — Persistent Learning Capture

````markdown
---
name: save-learning
description: Saves user instructions as persistent learnings for future sessions. Use
  when the user says 'remember this', 'always do X', 'from now on', 'never do Y', or
  gives any instruction they want persisted across sessions.
---

## Instructions

2. Refine the instruction into a clean, actionable learning bullet with an appropriate
   type prefix:
   - `**[convention]**` — coding style, workflow, git conventions
   - `**[pattern]**` — reusable code patterns
   - `**[anti-pattern]**` — things to avoid
   - `**[preference]**` — personal/team preferences
   - `**[context]**` — project-specific context

3. Show the refined learning to the user and ask for confirmation

4. If confirmed, run:
   ```bash
   caliber learn add "<refined learning>"
   ```

## Examples

User: "when developing features, push to next branch not master, remember it"
-> Refine: **[convention]** Push feature commits to the `next` branch, not `master`
-> "I'll save this as a project learning: ... Save for future sessions?"
-> If yes: run `caliber learn add "..."`
````

**Prompting technique:** Type-prefixed learning capture with mandatory user confirmation and worked examples. The five type prefixes (`[convention]`, `[pattern]`, `[anti-pattern]`, `[preference]`, `[context]`) create a taxonomy that makes learnings searchable and categorizable. The "When NOT to trigger" section prevents false positives.
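A minimal sketch of how the five-prefix taxonomy could be produced and read back programmatically follows. The prefix list comes from the excerpt; `formatLearning` and `parseLearning` are hypothetical helpers and do not describe how `caliber learn add` stores entries.

```typescript
// The five type prefixes named in the skill excerpt.
const LEARNING_TYPES = ["convention", "pattern", "anti-pattern", "preference", "context"] as const;
type LearningType = (typeof LEARNING_TYPES)[number];

// Build a bullet body such as "**[convention]** Push to next".
function formatLearning(type: LearningType, text: string): string {
  const cleaned = text.trim().replace(/\s+/g, " ");
  return `**[${type}]** ${cleaned}`;
}

// Recover the type and text from a bullet; returns null for unknown prefixes.
function parseLearning(bullet: string): { type: LearningType; text: string } | null {
  const match = /^\*\*\[([a-z-]+)\]\*\*\s+(.+)$/.exec(bullet.trim());
  if (!match) return null;
  const type = LEARNING_TYPES.find((t) => t === match[1]);
  return type ? { type, text: match[2] ?? "" } : null;
}

const bullet = formatLearning(
  "convention",
  "  Push feature commits to the `next` branch,  not `master` ",
);
const roundTrip = parseLearning(bullet);
```

A fixed prefix set is what makes the learnings file filterable: a reader or script can select every `[anti-pattern]` entry with a plain text match.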

Uniqueness

Caliber — Uniqueness & Positioning

Differs From Seeds

Most similar to agent-os (Archetype 4 — generates and manages project markdown context files) but with a complete automation layer that agent-os lacks: deterministic quality scoring (100-point rubric, no LLM), continuous sync via pre-commit hooks, session-learning hooks that distill interaction patterns, multi-platform output (5 AI tools), and a two-tier LLM model system. Agent-os ships a fixed opinionated standards/ directory with Anthropic's conventions; Caliber generates project-specific content tailored to the codebase's actual architecture.

Unlike ai-context-linter (same batch), which validates context files after-the-fact in CI, Caliber generates and maintains context files proactively. They solve complementary problems: Caliber generates, ai-context-linter validates. Together they form a complete create-maintain-validate loop (though they are from different authors).

The AGENTS.md vs AGENT.md Schism — Caliber Position

Caliber explicitly generates AGENTS.md as the Codex/OpenCode standard. The README names it a "bonus" in the scoring rubric. Caliber does not generate AGENT.md (the competing format proposed by Geoffrey Huntley / Sourcegraph Amp). This implicitly endorses the AGENTS.md naming convention from OpenAI.

Observable Failure Modes

Pre-commit hook runs LLM generation on every commit — slow commits if LLM latency is high
Session learning hooks run caliber learn observe on every tool use — could slow down tool calls if Caliber CLI has high startup cost
The scoring rubric is deterministic but fixed — it may not reflect what matters for a specific team's workflow
Context files generated by LLM may have "accurate but unhelpful" content if the codebase is large and the agent over-generalizes
5-platform multi-target means 5 sets of files to maintain; score drift across platforms is possible

Multi-Tool Synchronization

The key unique value: one caliber refresh generates accurate, synchronized context for all 5 AI tools simultaneously. No other framework in the corpus explicitly solves the "team uses Cursor and Claude Code and Codex simultaneously" problem.

Workflow

Caliber — Workflow

Initial Setup Flow

Phase	Action	Artifact
Bootstrap	`npx @rely-ai/caliber bootstrap`	`/setup-caliber` skill installed
Setup	`/setup-caliber` in AI tool	Auto-detect stack, generate configs, install hooks
Scoring	`caliber score`	0-100 quality score with a per-category breakdown
Review	Show diff of proposed changes	User accepts/rejects/refines
Write	Accept changes	Config files written + backed up

Continuous Sync Loop

  npx @rely-ai/caliber bootstrap       ← one-time, 2 seconds
              ↓
  agent runs /setup-caliber             ← agent handles everything
              ↓
  ┌──── configs generated ◄────────────┐
  │           ↓                        │
  │     code evolves                   │
  │           ↓                        │
  └──► caliber refresh (on commit) ───►┘

Session Learning Loop

Hook	Action	Artifact
PostToolUse	`caliber learn observe`	Raw observation recorded
PostToolUseFailure	`caliber learn observe --failure`	Failure pattern recorded
UserPromptSubmit	`caliber learn observe --prompt`	User correction detected
SessionEnd	`caliber learn finalize --auto`	Patterns distilled to `CALIBER_LEARNINGS.md`

Phase-to-Artifact Map

Phase	Artifact
Score	Score report (stdout)
Generate	CLAUDE.md, .cursor/rules/*.mdc, AGENTS.md, copilot-instructions.md
Backup	`.caliber/backups/`
Learn	`CALIBER_LEARNINGS.md`
Hooks install	`.git/hooks/pre-commit`

Approval Gates

Gate	Type
Config diff review	file-review (show diff, accept/refine/decline)
AGENTS.md re-export	yes-no (always asks, never auto-creates)
Per-change chat refinement	freetext-clarify

Score Threshold

If existing config scores 95+, Caliber skips full regeneration and applies targeted fixes only.
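The threshold rule amounts to a single comparison, sketched below. The 95-point cutoff is from the text; the strategy labels and the `chooseStrategy` name are hypothetical.

```typescript
type RefreshStrategy = "targeted-fixes" | "full-regeneration";

// Cutoff stated in the text: configs scoring 95 or more skip full regeneration.
const TARGETED_FIX_THRESHOLD = 95;

function chooseStrategy(existingScore: number): RefreshStrategy {
  return existingScore >= TARGETED_FIX_THRESHOLD ? "targeted-fixes" : "full-regeneration";
}

const highScoring = chooseStrategy(96);
const lowScoring = chooseStrategy(35);
```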

Memory Context

Caliber — Memory & Context

State Storage

Multi-layer:

Layer	Location	Persistence
Config files	CLAUDE.md, .cursor/rules/, AGENTS.md, copilot-instructions.md	Project
Session learnings	CALIBER_LEARNINGS.md	Project
Personal learnings	`~/.caliber/learnings.md` (with `--personal`)	Global
Config backups	`.caliber/backups/`	Project
Provider config	`~/.caliber/config.json`	Global (mode 0600)

Session Learning Mechanism

The caliber learn subsystem watches agent sessions:

observe (PostToolUse): records each tool call's context
observe --failure (PostToolUseFailure): records failure patterns
observe --prompt (UserPromptSubmit): detects when user corrects agent behavior
finalize --auto (SessionEnd): LLM distills observations into categorized learnings in CALIBER_LEARNINGS.md
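The observe/finalize split can be pictured as an append-only log that is grouped and cleared at session end. The sketch below is an assumption-laden model: `SessionLog`, the observation kinds, and the in-memory storage are hypothetical, and the `finalize` method only groups raw observations, standing in for the LLM distillation step that the real command performs.

```typescript
type ObservationKind = "tool" | "failure" | "prompt";
type Observation = { kind: ObservationKind; detail: string };

// Hook events from the list above, mapped to the kind of observation they record.
const EVENT_TO_KIND: Record<string, ObservationKind | undefined> = {
  PostToolUse: "tool",
  PostToolUseFailure: "failure",
  UserPromptSubmit: "prompt",
};

class SessionLog {
  private observations: Observation[] = [];

  // Called once per hook event; unknown events are ignored.
  observe(event: string, detail: string): void {
    const kind = EVENT_TO_KIND[event];
    if (kind) this.observations.push({ kind, detail });
  }

  // Stand-in for distillation: group details by kind, then reset the log.
  finalize(): Record<ObservationKind, string[]> {
    const grouped: Record<ObservationKind, string[]> = { tool: [], failure: [], prompt: [] };
    for (const o of this.observations) grouped[o.kind].push(o.detail);
    this.observations = [];
    return grouped;
  }
}

const log = new SessionLog();
log.observe("PostToolUse", "ran test suite");
log.observe("PostToolUseFailure", "lint failed on src/cli.ts");
log.observe("UserPromptSubmit", "use pnpm, not npm");
const grouped = log.finalize();
```

Keeping `observe` cheap and deferring all summarization to `finalize` matters because the observe hooks fire on every tool call.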

Context Freshness

caliber-session-freshness.sh runs on SessionStart to check if CLAUDE.md is stale vs. recent git commits. If stale, it notifies (via notify-app.sh) that context may need refreshing.

caliber-check-sync.sh runs on Stop — if Caliber is not configured, it offers setup for the next session.
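One plausible form of the staleness test compares the context file's modification time with commit timestamps. The sketch below assumes that approach; the actual logic of `caliber-session-freshness.sh` is not shown in the source, and `commitsSince` and `freshnessNotice` are hypothetical names.

```typescript
// Count commits made after the context file was last modified.
// Timestamps are milliseconds since the epoch.
function commitsSince(contextModifiedMs: number, commitTimesMs: number[]): number {
  return commitTimesMs.filter((t) => t > contextModifiedMs).length;
}

// Return a notice when the context file trails the history, otherwise null.
function freshnessNotice(contextModifiedMs: number, commitTimesMs: number[]): string | null {
  const behind = commitsSince(contextModifiedMs, commitTimesMs);
  return behind === 0
    ? null
    : `CLAUDE.md is ${behind} commit(s) behind; consider running caliber refresh`;
}

const staleNotice = freshnessNotice(1000, [500, 1500, 2000]);
const freshResult = freshnessNotice(1000, [500, 900]);
```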

Pre-Commit Refresh

The pre-commit hook runs caliber refresh --quiet on every commit — ensuring that whenever code changes are committed, the context files are updated to reflect the new state.

Backups

Before any write, Caliber backs up existing configs to .caliber/backups/. caliber undo restores the previous state. This prevents accidental loss of hand-crafted context sections.

Orchestration

Caliber — Orchestration

Multi-Agent

No native multi-agent workflow. Caliber is a single-agent tool that generates configurations for multiple AI tools but does not orchestrate multiple agents running simultaneously.

Execution Modes

Two modes:

One-shot setup (caliber init / /setup-caliber): Interactive generation workflow
Event-driven continuous (hooks): caliber refresh on every commit, caliber learn on every tool use

Multi-Model

Yes. Caliber routes tasks to different model classes via its two-tier model system:

Fast model: lightweight classification and scoring tasks
Default model: generation and refinement
Model is user-configurable via caliber config or CALIBER_MODEL env var
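The routing rule can be sketched as a lookup on task type. The task names follow the list above and `CALIBER_MODEL` is the variable named there; the `ModelConfig` shape, the placeholder model names, and the assumption that `CALIBER_MODEL` overrides only the default tier are hypothetical.

```typescript
type Task = "classification" | "scoring" | "generation" | "refinement";
type ModelConfig = { defaultModel: string; fastModel: string };

// Tasks the text describes as lightweight.
const LIGHTWEIGHT_TASKS: Task[] = ["classification", "scoring"];

// Assumption: CALIBER_MODEL replaces the default tier only; the fast tier is unaffected.
function pickModel(
  task: Task,
  config: ModelConfig,
  env: Record<string, string | undefined> = {},
): string {
  if (LIGHTWEIGHT_TASKS.indexOf(task) !== -1) return config.fastModel;
  return env["CALIBER_MODEL"] || config.defaultModel;
}

// Placeholder model names, not real identifiers.
const config: ModelConfig = { defaultModel: "default-model", fastModel: "fast-model" };
const forClassification = pickModel("classification", config);
const forGeneration = pickModel("generation", config, { CALIBER_MODEL: "custom-model" });
```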

Isolation

None — runs in-place on the project's actual files. Backups to .caliber/backups/ provide rollback but no isolation.

Consensus

None.

Prompt Chaining

Limited — the setup-caliber skill runs sequential diagnostic steps, with each step's output determining the next action. This is control flow, not structural prompt chaining.

UI / CLI Surface

Caliber — UI / CLI Surface

CLI Binary

Yes — caliber (npm global install) or npx @rely-ai/caliber.

Name: caliber
Type: Own runtime (TypeScript compiled to Node.js)
Thin wrapper: No — full TypeScript CLI with its own scoring engine, LLM integration, and hook management
Key subcommands:
- caliber bootstrap — one-time setup
- caliber init — generate configs
- caliber score [--compare <branch>] — deterministic quality scoring
- caliber refresh — re-generate based on current codebase
- caliber undo — restore from backups
- caliber learn add/observe/finalize — session learning management
- caliber config — LLM provider configuration
- caliber hooks --install — install pre-commit hooks

Local UI

None. CLI-only interface.

IDE Integration

None beyond standard file placement. Skills are installed to .claude/skills/, .cursor/skills/, .agents/skills/, and .opencode/skills/.

Observability

caliber score provides structured scoring report (stdout)
caliber score --compare main shows score delta between branches
Score badge for README: ![Caliber Score](https://img.shields.io/badge/caliber-94%2F100-brightgreen)
Anonymous usage telemetry via PostHog (command names, durations — opt-out with CALIBER_TELEMETRY_DISABLED=1)
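The README badge shown above follows the shields.io static badge URL pattern, with the slash in the score percent-encoded. The sketch below builds such a URL from a numeric score; the color thresholds and the function names are assumptions, since the source shows only the single 94/100 example.

```typescript
// Assumed color bands; only "brightgreen" at 94 is confirmed by the example above.
function badgeColor(score: number): string {
  if (score >= 90) return "brightgreen";
  if (score >= 70) return "yellow";
  return "red";
}

// Build the static badge URL; encodeURIComponent turns "/" into "%2F".
function badgeUrl(score: number): string {
  const message = encodeURIComponent(`${score}/100`);
  return `https://img.shields.io/badge/caliber-${message}-${badgeColor(score)}`;
}

function badgeMarkdown(score: number): string {
  return `![Caliber Score](${badgeUrl(score)})`;
}

const badge = badgeMarkdown(94);
```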

Related frameworks

same archetype · same primary tool · same memory type

claude-mem (thedotmack) ★ 78k

A8 Cross-runtime harness

Background worker service captures every tool call as an observation, AI-compresses sessions, and auto-injects relevant past…

pi (badlogic/earendil) ★ 55k

A8 Cross-runtime harness

A minimal, hackable, multi-provider terminal coding agent that adapts to your workflows via npm-installable TypeScript Extensions…

Agent Skills (Addy Osmani) ★ 46k

A8 Cross-runtime harness

Encodes senior-engineer software development lifecycle as 23 auto-routed skills and 7 slash commands for any AI coding agent.

wshobson/agents Plugin Marketplace ★ 36k

A8 Cross-runtime harness

Single Markdown source for 83 domain-specialized plugins that auto-generates idiomatic artifacts for five AI coding harnesses.

TabbyML/Tabby ★ 34k

A8 Cross-runtime harness

Self-hosted AI coding assistant server (alternative to GitHub Copilot) with admin dashboard, RAG-based completions, and multi-IDE…

Compound Engineering ★ 17k

A8 Cross-runtime harness

Make each unit of engineering work compound into easier future work via brainstorm→plan→execute→review→learn cycles.

Distribution

Type: npm-package
License: MIT
Install: npm-install
Version: unknown (npm @rely-ai/caliber; last commit 2026-05-20)

Surfaces

CLI binary: caliber
CLI subcmds: 10
Local UI: No

Components

Commands: 0
Skills: 8
Subagents: 0
Hooks: 6
MCP servers: 0
MCP tools: 0
Scripts: 4
Templates: 0

Workflow

Phases: 6
Approval gates: 3
Spec format: markdown
Spec storage: flat-files
Delta or full: whole-file

Orchestration

Multi-agent: No
Pattern: sequential
Max concurrent: 1
Isolation: none
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: event-driven
Crash recovery: Yes
Compaction: No
Session handoff: Yes
Streaming: No

Memory

Type: file-based
Persistence: project
Search: none
State files: 5 files

Quality

TDD: No
TDD mechanism: none
Validators: 1
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: structured-md
Replay: No

Tools

Primary: claude-code
Targets: 5
Portability: high

Signals

Stars: 1.1k
Last commit: 2026-05-20
Contributors: 17
Maintainer: active
Quality score: 4.4/10
