Flokay (Codagent Agent Skills)

flokay · pacaplan/flokay · ★ 26 · last commit 2026-05-17

Full SDLC skill pack from idea evaluation through CI-green PR, with TDD Iron Law enforcement and external validator quality gates.

Best when: every task should be implemented by a dedicated subagent with TDD enforcement, validated by an external CLI, and merged only when CI is green.
Skip if: you want to write production code before a failing test, or you need parallel task implementation (ruled out to avoid race conditions).
vs seeds
superpowers: but includes a meta-instruction ('That's rationalization') that preemptively blocks self-justification; stronger enforc…
Primitive shape 22 total
Skills 20 Subagents 2
00

Summary

Flokay (Codagent Agent Skills) — Summary

Flokay (GitHub: pacaplan/flokay, display name "Codagent Agent Skills" per README) is a comprehensive Claude Code and Codex plugin that provides a full software development lifecycle from idea evaluation through PR finalization, with 20 skills organized around TDD, multi-agent task execution, and automated CI management. The workflow begins with propose (GO/NO-GO verdict on whether an idea is worth building), progresses through spec (interview-driven requirements), design (technical architecture), plan-tasks (task breakdown for subagents), and implement-change (autonomous implementation via per-task subagents with TDD enforcement), and ends with finalize-pr (push, CI watch, fix failures, repeat until green). The implement-with-tdd skill contains an explicit Iron Law: "NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST — No exceptions" with instructions to delete any pre-written code and start over. An external agent-validator CLI is a required dependency for automated quality verification.

Among the seeds, flokay most closely resembles spec-driver (comprehensive skill-based SDLC) combined with openspec's artifact structure, but distinguishes itself with end-to-end PR lifecycle automation (wait-ci, fix-pr, finalize-pr), a quantitative quality gate (validator CLI), and the TDD Iron Law enforcement pattern borrowed from superpowers.

01

Overview

Flokay (Codagent Agent Skills) — Overview

Origin

Authored by pacaplan (GitHub), repo name flokay, display name "Codagent Agent Skills" (the README uses "Codagent" throughout). Last pushed 2026-05-17. 26 stars. MIT license. 1 contributor.

Philosophy

From the README:

"Evaluate before building — the agent critiques your idea, researches alternatives, and decides if it's worth pursuing"

"Right-sized tasks — breaks work into self-contained task files, each scoped for a single subagent to implement"

"Automated quality gates — Agent Validator runs static checks and AI code reviews for each task before moving on"

"End-to-end PR lifecycle — creates the PR, waits for CI, fixes failures, and addresses reviewer comments automatically"

The core philosophy: every piece of work should be evaluated before building, broken into subagent-sized tasks, implemented with TDD, validated by an external tool, and merged only when CI is green.

Relationship to Superpowers

The README states: "Inspired by obra/superpowers and OpenSpec." The TDD Iron Law in implement-with-tdd directly echoes superpowers' "Iron Law" approach to behavioral enforcement.

External Dependency: agent-validator

The agent-validator CLI is a required dependency:

npm install -g agent-validator
agent-validator init

This is the automated quality verification tool that runs static checks and AI code reviews. It is not bundled — it is a separate product.

Supported Platforms

  • Claude Code (primary): claude plugin marketplace add Codagent-AI/agent-skills && claude plugin install codagent
  • Codex (secondary): via marketplace and plugin manifest at .codex-plugin/plugin.json
  • Cursor: cursor plugins install Codagent-AI/agent-skills
02

Architecture

Flokay (Codagent Agent Skills) — Architecture

Distribution

  • Claude Code plugin via marketplace
  • Codex plugin via .codex-plugin/plugin.json
  • Cursor via cursor plugins install

Required Runtime

  • Claude Code / Codex / Cursor
  • agent-validator CLI (npm install -g agent-validator)

Repository Structure

flokay/
├── skills/                        # 20 skill directories
│   ├── ask-questions/SKILL.md
│   ├── design/SKILL.md
│   ├── finalize-pr/SKILL.md
│   ├── fix-pr/SKILL.md
│   ├── handoff/SKILL.md
│   ├── implement-and-validate/SKILL.md
│   ├── implement-change/SKILL.md
│   ├── implement-with-tdd/SKILL.md
│   ├── init/SKILL.md
│   ├── plan-tasks/SKILL.md
│   ├── proposal-review/SKILL.md
│   ├── propose/SKILL.md
│   ├── push-pr/SKILL.md
│   ├── review-assumptions/SKILL.md
│   ├── review-spec/SKILL.md
│   ├── session-report/SKILL.md
│   ├── simple-plan/SKILL.md
│   ├── spec/SKILL.md
│   ├── task-compliance/SKILL.md
│   └── wait-ci/SKILL.md
├── .claude-plugin/
│   ├── plugin.json
│   └── marketplace.json
├── .agents/skills/                # Cross-agent skills directory
├── .codex-plugin/plugin.json      # Codex plugin manifest
├── .cursor-plugin/                # Cursor plugin files
├── .claude/                       # Claude Code settings
├── .validator/                    # Validator configuration
├── CLAUDE.md                      # Development notes
└── AGENTS.md                      # Agent usage guide

Target AI Tools

  • Claude Code (primary, namespaced as codagent:<skill>)
  • Codex (invoked by name in chat, e.g., use the codagent:propose skill)
  • Cursor

Plugin Invocation

# Claude Code
/codagent:propose
/codagent:spec
/codagent:implement-change

# Codex
use the codagent:implement-with-tdd skill
03

Components

Flokay (Codagent Agent Skills) — Components

Skills (20)

| Skill Name | Purpose |
|---|---|
| init | Initializes Codagent; checks agent-validator CLI is installed. Safe to re-run. |
| propose | Evaluates idea worth building; GO / GO WITH CAVEATS / NO-GO verdict; writes proposal.md |
| spec | Interview-driven requirement discovery; WHEN/THEN scenarios per capability |
| design | Collaborative technical design; 2-3 approaches with tradeoffs; writes design.md |
| review-spec | Reviews artifacts (proposal, specs, design, tasks) for consistency and gaps |
| plan-tasks | Structured task breakdown; each task file is self-contained for a single subagent |
| simple-plan | Lightweight version of propose+spec+design+plan-tasks for small quick changes |
| implement-change | Autonomous tech lead; dispatches implement-and-validate per task sequentially; runs validator; archives; finalizes PR |
| implement-with-tdd | TDD enforcement: write failing test → watch fail → write minimal code → refactor. Iron Law: no production code without failing test first |
| implement-and-validate | Per-task executor; calls implement-with-tdd; runs Agent Validator; commits on success |
| push-pr | Commits changes, pushes to remote, creates or updates PR; runs validator pre-commit |
| wait-ci | Polls CI status for current branch PR; enriches failures with Actions logs |
| fix-pr | Fixes CI failures and review comments; dispatches fixer subagent; verifies with validator |
| finalize-pr | Orchestrates full post-implementation loop: push → CI watch → fix failures → repeat until green (max 3 cycles) |
| ask-questions | Structured question-asking for clarification (utility) |
| review-assumptions | Reviews assumptions made during design/planning |
| proposal-review | Focused proposal artifact review |
| session-report | Generates session summary report |
| handoff | Prepares context for handing off to another agent or session |
| task-compliance | Checks task implementation against spec requirements |
05

Prompts

Flokay (Codagent Agent Skills) — Prompts

Excerpt 1 — propose Skill (skills/propose/SKILL.md)

---
name: propose
description: >
  Evaluate whether a software idea is worth building, then write the proposal document.
  Use when the user wants to assess an idea, says "evaluate", "propose", "is this worth 
  building", or "should we build". If the idea passes evaluation, write the proposal 
  document using the provided template.
---

# Propose

Evaluate whether an idea is worth formalizing, and if so, write the proposal document. 
This sits between optional freeform exploration and the formal design artifact.

**The proposal is a "why + high-level what + high-level how" document.** Deeply understand 
and articulate the motivation — the problem, the opportunity, and why it matters now. 
Scope "what" at a high level — enough to bound the change and identify capabilities, but 
leave detailed behavioral requirements for specs. Sketch the high-level technical approach 
and architecture — enough to ground the change in reality and surface structural risks 
early — but leave detailed design for design.md.

## Principles

- **Conversational evaluation, documentary proposal** — The evaluation phase is a conversation. 
  The proposal phase produces `proposal.md`. Keep these phases distinct.
- **Research before recommending** — Use web search and codebase exploration before deciding.

Prompting technique: Phase-separation instruction. The skill explicitly distinguishes "evaluation phase (conversation)" from "proposal phase (document writing)" — preventing the agent from conflating exploration with artifact production.

Excerpt 2 — implement-with-tdd Skill (Iron Law)

---
name: implement-with-tdd
description: >
  Enforces test-driven development for feature work, bug fixes, and refactoring, activating 
  for requests such as "implement", "fix", "add feature", "write tests first", or "TDD".
---

# Test-Driven Development (TDD)

## Overview

Write the test first. Watch it fail. Write minimal code to pass.

**Core principle:** Watch the test fail. Only then is it proven to test the right thing.

**Violating the letter of the rules is violating the spirit of the rules.**

## The Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST


Write code before the test? Delete it. Start over.

**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it for inspiration
Delete it. Write the test. Watch it fail.

## Exceptions (skip TDD only for these)

- Throwaway spikes that will never be merged (branch must be deleted before implementation 
  begins; any code carried forward requires full TDD)
- Generated code (scaffolded, not behavior-bearing)
- Configuration-only changes

Thinking "skip TDD just this once"? Stop. That's rationalization.

Prompting technique: Iron Law with explicit rationalization-blocker. The phrase "Thinking 'skip TDD just this once'? Stop. That's rationalization." is a meta-instruction that preemptively blocks the agent's own self-justification patterns. This is the same technique as superpowers' Iron Laws.
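
As a rough illustration of the red-green ordering the excerpt demands, the sketch below runs the same test twice: once against a stub with no behavior (the test must fail) and once against a minimal implementation (the test must pass). The function `slugify` and the helper `run_test` are hypothetical names invented for this example; they are not part of flokay.

```python
import unittest


def run_test(func):
    """Run one test case against `func`; return True if it passed."""

    class SlugifyTest(unittest.TestCase):
        def test_replaces_spaces_with_hyphens(self):
            self.assertEqual(func("Hello World"), "hello-world")

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


# Step 1 (red): the test exists, the behavior does not.
def slugify_stub(text):
    raise NotImplementedError


# Step 2 (green): the smallest implementation that satisfies the test.
def slugify(text):
    return text.lower().replace(" ", "-")


red = run_test(slugify_stub)  # should fail, proving the test is able to fail
green = run_test(slugify)     # should pass once the behavior exists
```

Watching `red` come back false before writing `slugify` is the step the Iron Law protects: a test that has never failed has not been shown to test anything.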

Excerpt 3 — implement-change Sequential Constraint

**implement-and-validate** subagent per task (sequentially, never in parallel)

Prompting technique: Explicit anti-parallelism instruction. The word "never" prevents the agent from attempting to spawn multiple subagents simultaneously, which would create race conditions in file editing.
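
To make the race-condition concern concrete, here is a minimal sketch of a lost update, assuming two subagents each read a file, edit their own copy, and write the whole file back. The interleaving is simulated by hand so the outcome is deterministic; the file contents and agent roles are made up for illustration.

```python
# A dict stands in for the file system in this simulation.
files = {"app.py": "def a(): pass\n"}


def read(path):
    return files[path]


def write(path, content):
    files[path] = content


# Parallel interleaving: both agents read before either one writes.
copy_one = read("app.py")
copy_two = read("app.py")
write("app.py", copy_one + "def b(): pass\n")  # agent one adds b()
write("app.py", copy_two + "def c(): pass\n")  # agent two overwrites; b() is lost
parallel_result = files["app.py"]

# Sequential ordering: agent two reads only after agent one has written.
files["app.py"] = "def a(): pass\n"
write("app.py", read("app.py") + "def b(): pass\n")
write("app.py", read("app.py") + "def c(): pass\n")
sequential_result = files["app.py"]
```

In the parallel case the second write silently discards the first agent's work, which is the failure the "never in parallel" wording rules out.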

09

Uniqueness

Flokay (Codagent Agent Skills) — Uniqueness

Differs from Seeds

Flokay is a comprehensive SDLC skill pack that most closely resembles spec-driver, which also provides a full suite of development lifecycle skills. It extends that model with three capabilities absent from all seeds:

  1. End-to-end PR lifecycle automation: wait-ci, fix-pr, and finalize-pr poll CI status, enrich failure logs, and iterate until CI is green or a 3-cycle limit is hit.
  2. An external validator CLI (agent-validator) as a mandatory quality gate, rather than LLM self-review.
  3. The session-report and handoff skills for session continuity, which spec-driver lacks.

The TDD Iron Law in implement-with-tdd is borrowed directly from superpowers' Iron Law pattern but is stated more explicitly, adding the "That's rationalization" preemptive blocker. The sequential-only constraint on implement-change is spelled out more explicitly than in any seed: the prompt text itself forbids parallelism to avoid race conditions, which makes it a visible, deliberate architectural decision.

Positioning

Flokay covers the longest code-to-merged-PR lifecycle of any skills-only framework in the corpus. Where openspec stops at artifact generation and spec-driver stops at implementation, flokay continues through CI watching, PR fix loops, and merge. The finalize-pr 3-cycle limit is a form of bounded autonomy — the agent will try to fix CI up to three times before stopping, preventing infinite loops.

Observable Failure Modes

  1. agent-validator dependency: The entire quality gate depends on an external CLI that is a separate product. If agent-validator is unavailable, quality gates fail silently.
  2. Sequential implementation at scale: For large features with many tasks, sequential execution is slow. The explicit "never in parallel" constraint is safe but slow.
  3. 3-cycle PR fix limit: The finalize-pr loop stops after 3 cycles regardless of whether CI is green, potentially leaving broken PRs.
  4. No spec format portability: The artifact format (proposal.md, design.md, tasks.md) is specific to Codagent and not compatible with OpenSpec or other tooling.
04

Workflow

Flokay (Codagent Agent Skills) — Workflow

Full SDLC Workflow

| Phase | Skill | Artifact | Gate |
|---|---|---|---|
| 1. Evaluate | propose | proposal.md (GO/NO-GO verdict) | User review of GO/NO-GO |
| 2. Specify | spec | Spec files with WHEN/THEN scenarios | Per-capability Q&A |
| 3. Design | design | design.md (2-3 approaches + tradeoffs) | User selects approach |
| 4. Review | review-spec | Review findings | User approval |
| 5. Plan | plan-tasks | Self-contained task files per subagent | None |
| 6. Implement | implement-change | Code + commits (per task via subagent) | Validator pass per task |
| 7. Push PR | push-pr | Pull request on remote | Validator pre-commit |
| 8. Watch CI | wait-ci | CI status + enriched failure logs | None |
| 9. Fix failures | fix-pr | Patched code, updated PR | Validator pass |
| 10. Finalize | finalize-pr | Merged or review-ready PR | CI green (max 3 fix cycles) |

Quick Path (small changes)

simple-plan → implement-change → finalize-pr

Per-Task Execution (in implement-change)

For each task file:

  1. Dispatch implement-and-validate subagent
  2. Subagent calls implement-with-tdd (write test → fail → write code → pass)
  3. Subagent runs Agent Validator (static checks + AI code review)
  4. If validator passes: commit task
  5. If validator fails: fix and retry
  6. Move to next task (sequential, never parallel)
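
The six steps above can be sketched as a plain loop. This is a minimal sketch, not flokay's implementation: `implement`, `validate`, and `commit` are hypothetical callables standing in for the subagent, the Agent Validator run, and the git commit, and the `max_attempts` cap is an assumption, since the source does not say how many retries are allowed.

```python
def run_tasks(tasks, implement, validate, commit, max_attempts=3):
    """Process tasks one at a time; return the list of committed tasks.

    All callables and max_attempts are illustrative assumptions.
    """
    committed = []
    for task in tasks:  # strictly sequential: one task finishes before the next starts
        for attempt in range(1, max_attempts + 1):
            implement(task, attempt)  # steps 1-2: subagent does the TDD work
            if validate(task):        # step 3: validator run
                commit(task)          # step 4: commit on success
                committed.append(task)
                break
            # step 5: validator failed, so loop around and retry
        else:
            # Retries exhausted: stop rather than build on unvalidated work.
            raise RuntimeError(f"validator never passed for {task}")
    return committed
```

Raising when retries run out is one possible design choice; it keeps later tasks from being layered on top of code that never passed the gate.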

Finalize PR Loop (max 3 cycles)

push-pr → wait-ci → fix-pr → wait-ci → fix-pr → wait-ci → STOP (if still failing)
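
A bounded loop of this shape might look like the sketch below. It assumes a "cycle" means one CI check, which matches the diagram above (three wait-ci steps, two fix-pr steps); the source does not define the term precisely. `push_pr`, `wait_ci`, and `fix_pr` are hypothetical callables, not the real skills.

```python
def finalize_pr(push_pr, wait_ci, fix_pr, max_cycles=3):
    """Return True if CI went green within max_cycles checks, else False."""
    push_pr()
    for cycle in range(1, max_cycles + 1):
        if wait_ci():           # True means CI is green
            return True
        if cycle < max_cycles:  # no point fixing after the final check
            fix_pr()
    return False                # bounded autonomy: stop and hand back to the user
```

Returning `False` instead of looping forever is what makes the limit a safety bound: the caller learns the PR is still red and can decide what to do next.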
06

Memory & Context

Flokay (Codagent Agent Skills) — Memory & Context

State Storage

| State Type | Storage | Persistence |
|---|---|---|
| Proposal | proposal.md | Project |
| Specifications | Spec files with WHEN/THEN scenarios | Project |
| Technical design | design.md | Project |
| Task files | Per-task markdown files | Project |
| Git history | Git | Project |
| PR state | GitHub PR | Remote |

Per-Task Context (Self-Contained Design)

Each task file generated by plan-tasks is designed to be self-contained: it includes everything a subagent needs to implement it (context from proposal, design, specs). This is a key architectural decision — the task file is the unit of context handed to the implement-and-validate subagent, avoiding the need for the subagent to re-read all prior artifacts.
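
One way to picture a self-contained task file is as a bundle that copies in the relevant excerpts up front. The sketch below is illustrative only: the `TaskFile` fields and the rendered layout are assumptions, not flokay's actual task format.

```python
from dataclasses import dataclass, field


@dataclass
class TaskFile:
    """Hypothetical shape of a self-contained task; field names are assumptions."""

    title: str
    goal: str
    proposal_excerpt: str
    design_excerpt: str
    scenarios: list = field(default_factory=list)  # WHEN/THEN scenarios from the spec

    def render(self):
        """Produce the single markdown document handed to the subagent."""
        lines = [
            f"# Task: {self.title}",
            "", "## Goal", self.goal,
            "", "## Why (from proposal)", self.proposal_excerpt,
            "", "## How (from design)", self.design_excerpt,
            "", "## Scenarios",
        ]
        lines += [f"- {scenario}" for scenario in self.scenarios]
        return "\n".join(lines)
```

Because everything the subagent needs is rendered into one document, the subagent's context stays small and does not depend on it locating the upstream artifacts itself.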

Cross-Session Resumability

Because all artifacts are committed to git, the workflow can resume at any point. The simple-plan skill can generate a placeholder tasks.md so small changes still fit into the artifact structure and can be resumed.
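
Resuming from committed artifacts can be pictured as finding the first phase whose output is missing. This is a sketch under assumptions: `proposal.md`, `design.md`, and `tasks.md` are the artifact names used in this document, but the `specs` directory name and the detection logic itself are hypothetical.

```python
from pathlib import Path

# (skill, artifact path relative to the change directory).
# The "specs" directory name is an assumption made for this example.
PHASES = [
    ("propose", "proposal.md"),
    ("spec", "specs"),
    ("design", "design.md"),
    ("plan-tasks", "tasks.md"),
]


def next_skill(change_dir):
    """Return the first skill whose artifact is missing, else implement-change."""
    root = Path(change_dir)
    for skill, artifact in PHASES:
        if not (root / artifact).exists():
            return skill
    return "implement-change"
```

A fresh session could call `next_skill` on the change directory and pick up at the returned phase without any state beyond what git already holds.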

Memory Type

file-based — proposal.md, design.md, spec files, task files. No database or vector store.

Context Handoff Skill

The handoff skill explicitly prepares context for handing off to another agent or session, making cross-session continuity a first-class workflow concern.

07

Orchestration

Flokay (Codagent Agent Skills) — Orchestration

Multi-Agent

Yes. implement-change dispatches one implement-and-validate subagent per task, and both finalize-pr and fix-pr dispatch fixer subagents.

Orchestration Pattern

sequential: implement-change dispatches subagents "sequentially, never in parallel." This is an explicit design choice to prevent race conditions in file editing.

Subagent Spawn Mechanism

Skills invoke subagents via Claude Code's Task tool delegation mechanism. Each task gets its own subagent instance.

Subagent Definition Format

skill-md — subagents are invoked by skill name (implement-and-validate, implement-with-tdd), not defined as standalone persona files.

Isolation Mechanism

none — all work happens in the main project directory. No worktrees or containers.

Execution Mode

one-shot per phase — each skill runs to completion before the next is invoked. The overall workflow is interactive-loop because user input is required at multiple gates.

Multi-Model

No. All skills run on the same underlying model, the one configured in the host tool (for example, Claude Code's configured model).

Prompt Chaining

Yes — strong chaining: proposal → specs → design → tasks → implement-and-validate (per task). Each stage's output is the next stage's input. The task file is the formalized chain: it bundles context from all prior stages for the implementation subagent.
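
The chaining described above amounts to feeding each stage everything produced before it. A minimal sketch, with stand-in stage functions that only record what they received; the real stages are LLM-driven skills, and the dictionary keys here are assumptions.

```python
def run_chain(idea, stages):
    """Feed the accumulated artifacts through each stage in order."""
    artifacts = {"idea": idea}
    for name, stage in stages:
        # Each stage sees a copy of everything produced so far and adds one artifact.
        artifacts[name] = stage(dict(artifacts))
    return artifacts


def make_stage(name):
    """Stand-in stage: reports which upstream artifacts it was given."""
    return lambda seen: f"{name} built from {sorted(seen)}"


stages = [(n, make_stage(n)) for n in ("proposal", "specs", "design", "tasks")]
result = run_chain("add login", stages)
```

The final `tasks` entry is built from every earlier artifact, which mirrors how the task file bundles context from all prior stages.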

Consensus Mechanism

None.

Quality Gate: Agent Validator

An external CLI (agent-validator) runs after each task implementation. It is an automated external validator, not an LLM self-review, and it runs static checks and AI code reviews. Of the frameworks in this batch, this quality gate relies most heavily on an external tool.

08

UI & CLI Surface

Flokay (Codagent Agent Skills) — UI & CLI Surface

Dedicated CLI Binary

No. All invocation is via Claude Code slash commands or Codex/Cursor skill invocations.

Local Web Dashboard

No.

Skill Invocation Surface

Claude Code

/codagent:init
/codagent:propose
/codagent:spec
/codagent:design
/codagent:review-spec
/codagent:plan-tasks
/codagent:simple-plan
/codagent:implement-change
/codagent:implement-with-tdd
/codagent:implement-and-validate
/codagent:push-pr
/codagent:wait-ci
/codagent:fix-pr
/codagent:finalize-pr

Codex

use the codagent:propose skill
use the codagent:implement-with-tdd skill

Cursor

cursor plugins install Codagent-AI/agent-skills

Installation

# Claude Code
claude plugin marketplace add Codagent-AI/agent-skills
claude plugin install codagent

# Initialize (after install)
/codagent:init

# Codex
codex plugin marketplace add Codagent-AI/agent-skills
# Restart Codex → /plugins → select Codagent

# Cursor
cursor plugins install Codagent-AI/agent-skills

Update Process

claude plugin marketplace update codagent
claude plugin update codagent@codagent
/codagent:init  # refresh skills in Claude Code

Observability

  • Agent Validator provides quality metrics per task
  • Git history serves as audit trail
  • PR comments + CI logs surface as structured context via wait-ci
