
agent-plugins-skills (richfrem)

richfrem-agent-plugins-skills · richfrem/agent-plugins-skills · ★ 3 · last commit 2026-05-23

Provide a self-improving, cross-platform plugin ecosystem where skills evolve through eval-gated iteration rather than human-written intuition.

Best when: All current AI agent frameworks are Transitional Architectures; they should be designed for deliberate obsolescence as native SDKs mature.
Skip if: Hard dependencies between sibling plugins; assuming a specific framework is running
vs seeds: superpowers (SKILL.md format, skills-primary) in structure but the eval-gated self-improvement loop (Karpathy Autoresearch Loop: mut…
Primitive shape: 192 total (Skills 137, Subagents 52, MCP tools 3)
00

Summary

agent-plugins-skills (richfrem)

A self-improving, cross-platform universal plugin ecosystem (23 plugins, 137+ skills, 52 sub-agents) that treats AI agent frameworks as "Transitional Architectures": deliberately ephemeral scaffolding until native SDKs mature. The system was built by Richard Fremmerlid, and its flagship component is agent-agentic-os (v1.7.0), a continuous self-improvement OS with eval-gated skill evolution, the Karpathy Autoresearch Loop (mutate SKILL.md → evaluate.py → KEEP/DISCARD), longitudinal experiment tracking, and a hub-and-spoke ADR pattern. The framework supports Claude Code, GitHub Copilot, Gemini CLI, Roo Code, Windsurf, Cursor, Antigravity, and any compliant agent framework via a single .agents/ folder standard. The Super-RAG 3-tier retrieval stack combines RLM keyword lookup (O(1)), vector semantic search (O(log N)), and Obsidian wiki concept nodes.

differs_from_seeds: Most similar to superpowers in SKILL.md format and skills-primary architecture, but richfrem's repo is architecturally closer to a meta-framework: it treats all other frameworks (kiro, spec-kit, agent-os) as transitional bridges and implements an eval-gated self-improvement loop that no seed has. The "eval-gated skill evolution" pattern (Karpathy Loop: mutate → evaluate.py KEEP/DISCARD) is a novel engineering contribution not present in any seed.

01

Overview

Overview — agent-plugins-skills (richfrem)

Origin

Created by Richard Fremmerlid. 3 stars (very new/niche). Last commit: 2026-05-23. Python + bash.

Philosophy (verbatim from README)

"This repository is built on a pragmatic acceptance of the current AI engineering landscape: the ecosystem changes weekly, and workflows that were revolutionary six months ago are obsolete today.

Frameworks like agent-agentic-os and spec-kitty are treated as Transitional Architectures — bridges between what agents need to do today and what native SDKs will eventually handle. When Anthropic, Google, and GitHub harden native memory persistence, execution safety, and multi-agent orchestration, large swaths of this tooling will be happily discarded.

Skills are Applications; the SDK is the OS. Individual skills must function in complete isolation — no hard dependencies on sibling plugins, no assumptions about which framework is running."

The Improvement OS (verbatim)

"The OS implements an eval-gated improvement pipeline for autonomous skill evolution:

os-architect        ← intent classifier + ecosystem router
    ↓
os-improvement-loop ← learning engine: orchestrates multi-iteration improvement
    ↓
os-eval-runner      ← inner gate: KEEP/DISCARD per iteration (evaluate.py)
    ↓
os-eval-backport    ← human gate: review before lab winner → production
    ↓
os-experiment-log   ← scientific backbone: longitudinal tracking + synthesis
"

Karpathy Autoresearch Loop (verbatim)

"Skills that score HIGH on the autoresearch viability rubric (objectivity + speed + frequency + utility) can run fully autonomous self-improvement loops:

mutate SKILL.md → evaluate.py → exit 0 (KEEP) or exit 1 (DISCARD) → repeat
"
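
The README names four rubric criteria but this page does not show how they are combined. As a rough sketch only, the rubric could be pictured as an additive score. The 0-3 scale per criterion, the threshold of 10, and the names `RubricScores` and `viability` are all assumptions made for illustration, not taken from the repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricScores:
    """Hypothetical 0-3 ratings for the four criteria named in the README."""
    objectivity: int
    speed: int
    frequency: int
    utility: int


def viability(scores: RubricScores, high_threshold: int = 10) -> str:
    """Sum the four criteria and bucket the total (thresholds are assumed)."""
    total = (
        scores.objectivity + scores.speed + scores.frequency + scores.utility
    )
    if total >= high_threshold:
        return "HIGH"
    if total >= high_threshold // 2:
        return "MEDIUM"
    return "LOW"
```

Under these assumed thresholds, only a skill rated near the top on most criteria would qualify for a fully autonomous loop.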

Super-RAG architecture

Three-tier retrieval: O(1) RLM keyword → O(log N) vector semantic → full concept nodes from Obsidian wiki engine.

Platforms

Linux, macOS, Windows. All plugins deploy to .agents/ folder standard.

02

Architecture

Architecture — agent-plugins-skills (richfrem)

Distribution

Multiple install methods: uvx, bootstrap.py, npx skills, Plugin Marketplace / Extension CLI.

Install

# All plugins (recommended)
uvx --from git+https://github.com/richfrem/agent-plugins-skills plugin-add richfrem/agent-plugins-skills

# Via bootstrap.py
python bootstrap.py

# Via npx skills
npx -y skills add --all richfrem/agent-plugins-skills

Required runtime

  • Python 3 (uvx, bootstrap.py, evaluate.py scripts)
  • Node.js (npx skills)
  • Bash/zsh
  • Claude Code or any compliant agent framework

Directory tree (abbreviated)

.
├── plugins/
│   ├── agent-agentic-os/       # Flagship: Improvement OS (v1.7.0)
│   │   ├── agents/             # 5 OS-level agents
│   │   ├── skills/             # 17 OS skills
│   │   ├── commands/           # (OS commands)
│   │   ├── evals/              # Evaluation scripts
│   │   ├── hooks/              # (hooks)
│   │   ├── scripts/            # evaluate.py, plot_eval_progress.py
│   │   └── plugin.yaml
│   ├── agent-loops/            # 6 execution primitives
│   │   └── skills/
│   │       ├── agent-swarm/
│   │       ├── dual-loop/
│   │       ├── learning-loop/
│   │       ├── orchestrator/
│   │       ├── red-team-review/
│   │       └── triple-loop-learning/
│   ├── agent-scaffolders/      # Scaffolding tools
│   ├── claude-cli/             # Claude CLI bridge
│   ├── coding-conventions/     # Code standards
│   ├── context-bundler/        # Context management
│   ├── copilot-cli/            # GitHub Copilot bridge
│   ├── dependency-management/  # Dep management
│   ├── exploration-cycle-plugin/
│   ├── gemini-cli/             # Gemini CLI bridge
│   ├── huggingface-utils/      # HuggingFace tools
│   ├── link-checker/           # Link validation
│   ├── memory-management/      # Memory tools
│   ├── mermaid-to-png/         # Diagram conversion (live eval example)
│   ├── obsidian-wiki-engine/   # Super-RAG tier 3
│   ├── plugin-manager/         # Plugin lifecycle
│   ├── rlm-factory/            # Super-RAG tier 1 (O(1) keyword)
│   ├── rsvp-speed-reader/      # Reading tool
│   ├── spec-kitty-plugin/      # Spec-Kitty bridge
│   ├── task-manager/           # Task tracking
│   ├── tools_manifest.json
│   ├── vector-db/              # Super-RAG tier 2 (O(log N) vector)
│   └── voice-writer/           # Voice input
├── .agents/                   # Deployment target (not committed)
├── bootstrap.py
├── INSTALL.md
├── skills-lock.json
└── symlinks.json

Target AI tools

Claude Code, GitHub Copilot, Gemini CLI, Roo Code, Windsurf, Cursor, Antigravity, and any compliant agent framework.

Config files

  • plugins/<name>/plugin.yaml — per-plugin manifest
  • skills-lock.json — skills lock file (like package-lock.json)
  • symlinks.json — hub-and-spoke symlink map
  • .mcp.json — MCP config
03

Components

Components — agent-plugins-skills (richfrem)

Plugins (23 total)

agent-agentic-os, agent-loops, agent-scaffolders, claude-cli, coding-conventions, context-bundler, copilot-cli, dependency-management, exploration-cycle-plugin, gemini-cli, huggingface-utils, link-checker, memory-management, mermaid-to-png, obsidian-wiki-engine, plugin-manager, rlm-factory, rsvp-speed-reader, spec-kitty-plugin, task-manager, vector-db, voice-writer + (1 unnamed)

Skills (137+ total — selected)

agent-agentic-os (17 skills)

optimize-agent-instructions, os-architect, os-clean-locks, os-environment-probe, os-eval-backport, os-eval-lab-setup, os-eval-runner, os-evolution-planner, os-evolution-verifier, os-experiment-log, os-guide, os-improvement-loop, os-improvement-report, os-init, os-memory-manager, self-evolution, todo-check

agent-loops (6 skills)

agent-swarm, dual-loop, learning-loop, orchestrator, red-team-review, triple-loop-learning

Other plugins: estimated 114+ additional skills across 15+ plugins

Subagents (52 total — selected from agent-agentic-os)

os-architect-agent, os-architect-tester-agent, improvement-intake-agent, os-health-check, agentic-os-setup

Hooks

Unknown count. agent-agentic-os has a hooks/ directory.

Scripts

  • plugins/agent-agentic-os/scripts/evaluate.py — KEEP/DISCARD gate
  • plugins/agent-agentic-os/scripts/plot_eval_progress.py — visualize eval progress
  • bootstrap.py — install script
  • Hub-and-spoke shared scripts in plugins/<name>/scripts/ referenced via symlinks

MCP servers

.mcp.json exists. agent-agentic-os plugin.yaml lists provides_tools: [eval_runner, experiment_log, init_agentic_os].

05

Prompts

Prompts — agent-plugins-skills (richfrem)

Verbatim excerpt 1 — plugins/agent-agentic-os/skills/os-architect/SKILL.md

---
name: os-architect
plugin: agent-agentic-os
description: >
  SME-facing front-door skill for Agentic OS ecosystem evolution. Invokes the os-architect
  interview flow: classifies intent, audits existing capabilities, proposes evolution path
  (orchestrate / update / create), and dispatches work.
model: inherit
color: purple
tools: ["Bash", "Read", "Write"]
---

## Role

os-architect is the single entry point to the Agentic OS evolution ecosystem. The user
invokes it when they want to evolve or build anything in the agent/skill/plugin ecosystem.
It interviews, audits, and routes — never implements directly. The full behavior spec lives
in `agents/os-architect-agent.md`.

## Dispatch Paths

| Path | When | Mechanism |
|------|------|-----------|
| A+ — No Action | audit shows full match + all patterns present | tell user, no dispatch |
| A — Orchestrate | capability exists, current | route to existing agent/skill + run_agent.py |
| B — Update | capability exists, outdated/incomplete | `os-evolution-planner` writes plan + prompt → dispatch + optional improvement loop |
| C — Create | gap confirmed | `create-sub-agent` scaffold → `os-evolution-planner` plan + prompt → eval lab → evals HARD-GATE → `os-architect-tester` validates |

Prompting technique: Decision-table routing with explicit dispatch paths. The SKILL.md is a router that never implements; it only classifies and dispatches. The A+/A/B/C table encodes the routing decision as a markdown table, one row per outcome. The HARD-GATE label on Path C's evals step signals a non-negotiable quality requirement.
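
To make the routing concrete, the table can be read as a small decision function. This is a sketch of one plausible reading, not the skill's actual logic (which lives in prose in the agent spec). The three boolean inputs and the names `Path` and `choose_path` are hypothetical, and the way A+ is separated from A here is an interpretation of the "When" column.

```python
from enum import Enum


class Path(Enum):
    NO_ACTION = "A+"    # full match, all patterns present
    ORCHESTRATE = "A"   # capability exists and is current
    UPDATE = "B"        # capability exists but is outdated or incomplete
    CREATE = "C"        # gap confirmed


def choose_path(exists: bool, current: bool, all_patterns_present: bool) -> Path:
    """Map hypothetical audit findings onto the four dispatch paths."""
    if not exists:
        return Path.CREATE
    if not current:
        return Path.UPDATE
    if all_patterns_present:
        return Path.NO_ACTION
    return Path.ORCHESTRATE
```

For example, `choose_path(True, False, True)` yields `Path.UPDATE`: an outdated capability is routed to Path B regardless of which patterns are present.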

Verbatim excerpt 2 — plugins/agent-agentic-os/plugin.yaml

name: agent-agentic-os
version: 1.7.0
description: "Opinionated learning layer above Claude Code — structured memory hierarchy, continuous improvement loop, multi-agent event bus coordination, and eval-gated skill improvement."
author: Richard Fremmerlid
kind: standalone
provides_tools:
  - eval_runner
  - experiment_log
  - init_agentic_os

Prompting technique: Declarative plugin manifest — this is not a prompt but defines the plugin's contract: tools it provides, skills it registers, and its architectural identity. "Opinionated learning layer" is the core framing — the OS has opinions about how skills should improve.

09

Uniqueness

Uniqueness — agent-plugins-skills (richfrem)

differs_from_seeds

Closest to superpowers (SKILL.md format, skills-primary) and structurally similar to claude-flow (large component count, multi-tier memory), but richfrem's defining contribution is the eval-gated self-improvement loop — no seed has this. The Karpathy Autoresearch Loop (mutate SKILL.md → evaluate.py KEEP/DISCARD) applies machine learning methodology to prompt engineering: skills are improved through empirical evaluation, not human intuition. Against all seeds, this is the only framework that treats skill quality as a measurable, optimizable variable with longitudinal tracking.

Positioning

"Universal cross-platform plugin ecosystem with a self-improving OS." The "Transitional Architecture" philosophy is intellectually honest: this repo explicitly plans for its own obsolescence, treating itself as scaffolding until native SDK capabilities mature. The Karpathy Loop implementation (documented with a live convert-mermaid example showing 0.61 → 1.00 score improvement over 26 iterations) provides concrete evidence of the system's effectiveness.

Observable failure modes

  • 3 stars: minimal adoption — the complexity and philosophical depth may be inaccessible to typical Claude Code users.
  • Self-referential eval: evaluate.py for skills is written by the maintainer — if the eval criteria are wrong, the entire autoresearch loop optimizes toward the wrong goal.
  • Hub-and-spoke symlinks: npm/npx drops directory-level symlinks, making the symlink architecture fragile in automated install paths.
  • "Weekly obsolescence" acceptance: the framework's own philosophy predicts it will need to be discarded, which may reduce motivation for external contributors.
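
The symlink fragility noted above could be detected after install with a check along these lines. This is a minimal sketch: the shape of the link map (link path to target path) is an assumption, since the real symlinks.json format is not shown here, and `missing_symlinks` and `demo` are hypothetical names.

```python
import os
import tempfile
from pathlib import Path


def missing_symlinks(root: Path, link_map: dict[str, str]) -> list[str]:
    """Return link paths from link_map that are not symlinks under root."""
    return sorted(link for link in link_map if not (root / link).is_symlink())


def demo() -> list[str]:
    """Build a throwaway tree with one real symlink and one flattened copy."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "hub").mkdir()
        # A spoke that is still a real symlink to the hub directory.
        os.symlink(root / "hub", root / "spoke_ok", target_is_directory=True)
        # A spoke that arrived as a plain directory, as a symlink-dropping
        # installer would leave it.
        (root / "spoke_flat").mkdir()
        return missing_symlinks(
            root, {"spoke_ok": "hub", "spoke_flat": "hub"}
        )
```

On a platform that permits creating symlinks, `demo()` reports only the flattened spoke.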

Cross-references

  • Contains spec-kitty-plugin — a bridge to the Spec-Kitty workflow
  • Contains claude-cli, copilot-cli, gemini-cli plugins — bridges to 3 agent frameworks
  • Explicitly treats agent-agentic-os and spec-kitty as "Transitional Architectures"
04

Workflow

Workflow — agent-plugins-skills (richfrem)

Phases

| Phase | Description | Artifact |
|-------|-------------|----------|
| Install | uvx ... plugin-add richfrem/agent-plugins-skills | .agents/ populated |
| OS init | /os-init | OS state files, memory hierarchy |
| Intent | /os-architect + describe intent | Classified evolution request |
| Audit | os-architect audits ecosystem | Capability map |
| Propose | os-architect proposes Path A/B/C | Task plan + dispatch prompt |
| Execute | os-evolution-planner + dispatch | Updated skill or new plugin |
| Eval | evaluate.py → exit 0 (KEEP) or 1 (DISCARD) | Eval result |
| Backport | os-eval-backport | Human review gate → production |
| Log | os-experiment-log | Longitudinal tracking record |

Karpathy Autoresearch Loop

1. Score skill with eval-autoresearch-fit
2. Mutate SKILL.md
3. Run evaluate.py
4. exit 0 → KEEP (new best if score > baseline)
5. exit 1 → DISCARD (revert to previous)
6. Repeat up to N iterations

Live example: convert-mermaid skill improved from 0.61 → 1.00 over 26 iterations across 2 rounds.
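
The loop steps can be sketched as a hill-climbing routine. This is an illustration under stated assumptions, not the repo's implementation: `mutate` and `score` are hypothetical callables standing in for the SKILL.md mutation step and for evaluate.py (which signals its verdict via exit code rather than a return value), and the sketch simplifies KEEP to mean "strictly better than the current best".

```python
from typing import Callable


def autoresearch_loop(
    skill_text: str,
    mutate: Callable[[str], str],
    score: Callable[[str], float],
    iterations: int,
) -> tuple[str, float, list[str]]:
    """Mutate, evaluate, then keep or revert, for a fixed number of rounds."""
    best_text, best_score = skill_text, score(skill_text)
    decisions: list[str] = []
    for _ in range(iterations):
        candidate = mutate(best_text)          # step 2: mutate SKILL.md
        candidate_score = score(candidate)     # step 3: run the evaluation
        if candidate_score > best_score:       # step 4: KEEP
            best_text, best_score = candidate, candidate_score
            decisions.append("KEEP")
        else:                                  # step 5: DISCARD and revert
            decisions.append("DISCARD")
    return best_text, best_score, decisions
```

Because every mutation starts from the current best text, a DISCARD needs no explicit revert step: the rejected candidate is simply dropped.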

Approval gates

  1. os-eval-backport — human gate before lab winner moves to production
  2. os-architect proposes paths A/B/C — user selects dispatch path

Outer vs Inner flywheel

  • OUTER flywheel (os-improvement-loop): improves OS-level protocols and session ledgers between sessions
  • INNER flywheel (os-eval-runner): KEEP/DISCARD gate per iteration within a session
06

Memory Context

Memory & Context — agent-plugins-skills (richfrem)

State storage

Multi-tier: file-based (SKILL.md evals, experiment logs) + vector DB (O(log N)) + Obsidian wiki (full concept nodes).

Experiment log

os-experiment-log skill maintains a longitudinal record of eval results, enabling "synthesis" — understanding trends across many improvement iterations.
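
As a rough idea of what "synthesis" over such a log might involve, the sketch below summarizes a TSV of eval results. The column names (`iteration`, `score`, `decision`) and the function name `summarize` are assumptions; the actual TSV schema is not shown on this page.

```python
import csv
import io


def summarize(tsv_text: str) -> dict[str, float]:
    """Summarize a tab-separated eval log with assumed columns."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    scores = [float(row["score"]) for row in rows]
    kept = sum(1 for row in rows if row["decision"] == "KEEP")
    return {
        "iterations": len(rows),
        "first": scores[0],
        "best": max(scores),
        "keep_rate": kept / len(rows),
    }
```

A made-up four-row log with scores 0.5, 0.4, 0.75, 0.75 and two KEEP decisions would summarize to a best score of 0.75 and a keep rate of 0.5.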

RLM Factory (Super-RAG tier 1)

O(1) keyword retrieval using RLM (Retrieval Language Model) for instant lookup.

Vector DB (Super-RAG tier 2)

O(log N) semantic search for approximate concept matching.

Obsidian Wiki Engine (Super-RAG tier 3)

Full concept node retrieval from Obsidian-format wiki files — highest fidelity, highest latency.
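
One way to picture how the three tiers might fit together is a fall-through lookup, cheapest tier first. This is a toy sketch, not the plugins' code: it assumes the tiers are tried in order, uses a brute-force cosine scan where a real vector index would use an approximate-nearest-neighbour structure, and all names (`retrieve`, `keyword_index`, `vectors`, `wiki`, `min_sim`) are hypothetical.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    term: str,
    query_vec: list[float],
    keyword_index: dict[str, str],
    vectors: dict[str, list[float]],
    wiki: dict[str, str],
    min_sim: float = 0.8,
) -> tuple[str, Optional[str]]:
    """Try keyword, then vector, then wiki; report which tier answered."""
    # Tier 1: exact keyword hit via dict lookup (O(1) on average).
    if term in keyword_index:
        return ("keyword", keyword_index[term])
    # Tier 2: nearest stored vector. Brute force here, so O(N) in this toy.
    if vectors:
        best_id = max(vectors, key=lambda k: cosine(query_vec, vectors[k]))
        if cosine(query_vec, vectors[best_id]) >= min_sim:
            return ("vector", best_id)
    # Tier 3: scan full concept notes for the term (slowest, most complete).
    for title, body in wiki.items():
        if term.lower() in body.lower():
            return ("wiki", title)
    return ("miss", None)
```

The ordering reflects the latency trade-off described in this section: a query only pays for the wiki scan when the two cheaper tiers have nothing confident to offer.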

Memory management plugin

Dedicated memory-management plugin for memory lifecycle.

Persistence

  • Global: .agents/ (installed plugins)
  • Project: skills-lock.json, symlinks.json
  • Experiment: evals/ directory per skill (TSV format)

Compaction handling

The context-bundler plugin handles context management, suggesting context compaction is a first-class concern.

07

Orchestration

Orchestration — agent-plugins-skills (richfrem)

Multi-agent

Yes. 52 subagents. agent-loops provides 6 composable primitives: learning-loop, dual-loop, agent-swarm, red-team-review, triple-loop-learning, orchestrator.

Orchestration pattern

Hierarchical (os-architect → os-improvement-loop → os-eval-runner → os-eval-backport) + swarm for agent-swarm primitive.

Isolation mechanism

Process isolation (evaluate.py scripts run in isolation; hub-and-spoke symlinks prevent coupling).

Multi-model

Not explicitly configured. Inherits model from hosting framework (model: inherit in SKILL.md).

Execution mode

Continuous-ralph for the improvement OS. One-shot for individual skills.

Consensus mechanism

None (KEEP/DISCARD is a binary gate, not consensus).

Prompt chaining

Yes. The improvement pipeline is explicit chaining: os-architect → os-evolution-planner → os-eval-runner → os-eval-backport → os-experiment-log.

Red-team pattern

The red-team-review execution primitive explicitly models adversarial review — one of six composable loop patterns.

08

UI/CLI Surface

UI/CLI Surface — agent-plugins-skills (richfrem)

Dedicated CLI binary

None from this repo. Uses uvx, npx skills, or bootstrap.py for install.

Local web dashboard

No web dashboard is described. The closest equivalent is plot_eval_progress.py, a Python plotting script for live eval monitoring:

python plugins/agent-agentic-os/scripts/plot_eval_progress.py --tsv <lab>/evals/ --live

Renders a live eval progress chart (blue = baseline, green = new best, amber = kept but not record).
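
The colour legend implies a simple per-point classification, sketched below. This is not the script's code: the function name `classify_points`, the input shape, and the "discarded" label (the legend does not say how discarded iterations are drawn) are assumptions.

```python
def classify_points(scores: list[float], kept: list[bool]) -> list[str]:
    """Label each eval point using the legend's three colours."""
    labels: list[str] = []
    best = float("-inf")
    for index, (score, is_kept) in enumerate(zip(scores, kept)):
        if index == 0:
            labels.append("blue")        # baseline run
            best = score
        elif not is_kept:
            labels.append("discarded")   # assumed label, not in the legend
        elif score > best:
            labels.append("green")       # kept and a new best
            best = score
        else:
            labels.append("amber")       # kept but not a record
    return labels
```

A kept run that only ties the current best is labelled amber, since it sets no new record.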

IDE integration

Cross-platform via .agents/ folder standard — works with Claude Code, Copilot, Gemini CLI, Roo Code, Windsurf, Cursor, Antigravity.

Observability

  • os-experiment-log skill — longitudinal TSV tracking of eval results
  • plot_eval_progress.py — live visualization during autoresearch runs
  • os-improvement-report skill — generates improvement reports
  • os-health-check agent — validates OS state

Install detail

The .agents/ folder is not committed — always empty in the repo. All install methods populate it at install time.
