Flokay (Codagent Agent Skills)

flokay · pacaplan/flokay · ★ 26 · last commit 2026-05-17

Full SDLC skill pack from idea evaluation through CI-green PR, with TDD Iron Law enforcement and external validator quality gates.

Best when: Every task should be implemented by a dedicated subagent with TDD enforcement, validated by an external CLI, and merged only when CI is green.

Skip if: Writing production code before a failing test; parallel task implementation (race conditions)

vs seeds

superpowers: but includes a meta-instruction ('That's rationalization') that preemptively blocks self-justification; stronger enforc…

Primitive shape: 22 total (Skills 20, Subagents 2)

Summary

Flokay (GitHub: pacaplan/flokay, display name "Codagent Agent Skills" per the README) is a Claude Code and Codex plugin, with a Cursor install path as well, that covers the software development lifecycle from idea evaluation through PR finalization. Its 20 skills are organized around TDD, multi-agent task execution, and automated CI management. The workflow begins with propose (a GO/NO-GO verdict on whether an idea is worth building), progresses through spec (interview-driven requirements), design (technical architecture), plan-tasks (task breakdown for subagents), and implement-change (autonomous implementation via per-task subagents with TDD enforcement), and ends with finalize-pr (push, watch CI, fix failures, and repeat until CI is green or the 3-cycle limit is reached). The implement-with-tdd skill contains an explicit Iron Law, "NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST", followed by a "No exceptions" list and an instruction to delete any code written before its test and start over. An external agent-validator CLI is a required dependency for automated quality verification.

Among the seeds, flokay most closely resembles spec-driver (comprehensive skill-based SDLC) combined with openspec's artifact structure, but distinguishes itself with end-to-end PR lifecycle automation (wait-ci, fix-pr, finalize-pr), a quantitative quality gate (validator CLI), and the TDD Iron Law enforcement pattern borrowed from superpowers.

Overview

Origin

Authored by pacaplan (GitHub), repo name flokay, display name "Codagent Agent Skills" (the README uses "Codagent" throughout). Last pushed 2026-05-17. 26 stars. MIT license. 1 contributor.

Philosophy

From the README:

"Evaluate before building — the agent critiques your idea, researches alternatives, and decides if it's worth pursuing"

"Right-sized tasks — breaks work into self-contained task files, each scoped for a single subagent to implement"

"Automated quality gates — Agent Validator runs static checks and AI code reviews for each task before moving on"

"End-to-end PR lifecycle — creates the PR, waits for CI, fixes failures, and addresses reviewer comments automatically"

The core philosophy: every piece of work should be evaluated before building, broken into subagent-sized tasks, implemented with TDD, validated by an external tool, and merged only when CI is green.

Relationship to Superpowers

The README states: "Inspired by obra/superpowers and OpenSpec." The TDD Iron Law in implement-with-tdd directly echoes superpowers' "Iron Law" approach to behavioral enforcement.

External Dependency: agent-validator

The agent-validator CLI is a required dependency:

npm install -g agent-validator
agent-validator init

This is the automated quality verification tool that runs static checks and AI code reviews. It is not bundled — it is a separate product.
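
Because the validator is a separate install, a setup step has to confirm it is reachable before any quality gate can run. The sketch below shows one way such a check could look. It assumes only the binary name `agent-validator` taken from the install command above; the function names `find_validator` and `require_validator` are hypothetical and are not part of flokay or of agent-validator.

```python
import shutil
from typing import Optional


def find_validator(binary: str = "agent-validator") -> Optional[str]:
    """Return the absolute path of the validator CLI, or None if it is not on PATH."""
    return shutil.which(binary)


def require_validator(binary: str = "agent-validator") -> str:
    """Fail loudly, rather than silently, when the CLI is missing."""
    path = find_validator(binary)
    if path is None:
        raise RuntimeError(
            f"{binary} not found on PATH; install it with: npm install -g {binary}"
        )
    return path
```

Raising an error instead of returning a default is a deliberate choice in this sketch: a missing validator then stops the workflow instead of letting a quality gate be skipped unnoticed.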

Supported Platforms

Claude Code (primary): claude plugin marketplace add Codagent-AI/agent-skills && claude plugin install codagent
Codex (secondary): via marketplace and plugin manifest at .codex-plugin/plugin.json
Cursor: cursor plugins install Codagent-AI/agent-skills

Architecture

Distribution

Claude Code plugin via marketplace
Codex plugin via .codex-plugin/plugin.json
Cursor via cursor plugins install

Required Runtime

Claude Code / Codex / Cursor
agent-validator CLI (npm install -g agent-validator)

Repository Structure

flokay/
├── skills/                        # 20 skill directories
│   ├── ask-questions/SKILL.md
│   ├── design/SKILL.md
│   ├── finalize-pr/SKILL.md
│   ├── fix-pr/SKILL.md
│   ├── handoff/SKILL.md
│   ├── implement-and-validate/SKILL.md
│   ├── implement-change/SKILL.md
│   ├── implement-with-tdd/SKILL.md
│   ├── init/SKILL.md
│   ├── plan-tasks/SKILL.md
│   ├── proposal-review/SKILL.md
│   ├── propose/SKILL.md
│   ├── push-pr/SKILL.md
│   ├── review-assumptions/SKILL.md
│   ├── review-spec/SKILL.md
│   ├── session-report/SKILL.md
│   ├── simple-plan/SKILL.md
│   ├── spec/SKILL.md
│   ├── task-compliance/SKILL.md
│   └── wait-ci/SKILL.md
├── .claude-plugin/
│   ├── plugin.json
│   └── marketplace.json
├── .agents/skills/                # Cross-agent skills directory
├── .codex-plugin/plugin.json      # Codex plugin manifest
├── .cursor-plugin/                # Cursor plugin files
├── .claude/                       # Claude Code settings
├── .validator/                    # Validator configuration
├── CLAUDE.md                      # Development notes
└── AGENTS.md                      # Agent usage guide

Target AI Tools

Claude Code (primary, namespaced as codagent:<skill>)
Codex (invoked by name in chat, e.g., use the codagent:propose skill)
Cursor

Plugin Invocation

# Claude Code
/codagent:propose
/codagent:spec
/codagent:implement-change

# Codex
use the codagent:implement-with-tdd skill

Components

Skills (20)

Skill Name	Purpose
`init`	Initializes Codagent; checks agent-validator CLI is installed. Safe to re-run.
`propose`	Evaluates idea worth building; GO / GO WITH CAVEATS / NO-GO verdict; writes proposal.md
`spec`	Interview-driven requirement discovery; WHEN/THEN scenarios per capability
`design`	Collaborative technical design; 2-3 approaches with tradeoffs; writes design.md
`review-spec`	Reviews artifacts (proposal, specs, design, tasks) for consistency and gaps
`plan-tasks`	Structured task breakdown; each task file is self-contained for a single subagent
`simple-plan`	Lightweight version of propose+spec+design+plan-tasks for small quick changes
`implement-change`	Autonomous tech lead; dispatches `implement-and-validate` per task sequentially; runs validator; archives; finalizes PR
`implement-with-tdd`	TDD enforcement: write failing test → watch fail → write minimal code → refactor. Iron Law: no production code without failing test first
`implement-and-validate`	Per-task executor; calls `implement-with-tdd`; runs Agent Validator; commits on success
`push-pr`	Commits changes, pushes to remote, creates or updates PR; runs validator pre-commit
`wait-ci`	Polls CI status for current branch PR; enriches failures with Actions logs
`fix-pr`	Fixes CI failures and review comments; dispatches fixer subagent; verifies with validator
`finalize-pr`	Orchestrates full post-implementation loop: push → CI watch → fix failures → repeat until green (max 3 cycles)
`ask-questions`	Structured question-asking for clarification (utility)
`review-assumptions`	Reviews assumptions made during design/planning
`proposal-review`	Focused proposal artifact review
`session-report`	Generates session summary report
`handoff`	Prepares context for handing off to another agent or session
`task-compliance`	Checks task implementation against spec requirements

Prompts

Excerpt 1 — propose Skill (skills/propose/SKILL.md)

---
name: propose
description: >
  Evaluate whether a software idea is worth building, then write the proposal document.
  Use when the user wants to assess an idea, says "evaluate", "propose", "is this worth 
  building", or "should we build". If the idea passes evaluation, write the proposal 
  document using the provided template.
---

# Propose

Evaluate whether an idea is worth formalizing, and if so, write the proposal document. 
This sits between optional freeform exploration and the formal design artifact.

**The proposal is a "why + high-level what + high-level how" document.** Deeply understand 
and articulate the motivation — the problem, the opportunity, and why it matters now. 
Scope "what" at a high level — enough to bound the change and identify capabilities, but 
leave detailed behavioral requirements for specs. Sketch the high-level technical approach 
and architecture — enough to ground the change in reality and surface structural risks 
early — but leave detailed design for design.md.

## Principles

- **Conversational evaluation, documentary proposal** — The evaluation phase is a conversation. 
  The proposal phase produces `proposal.md`. Keep these phases distinct.
- **Research before recommending** — Use web search and codebase exploration before deciding.

Prompting technique: Phase-separation instruction. The skill explicitly distinguishes "evaluation phase (conversation)" from "proposal phase (document writing)" — preventing the agent from conflating exploration with artifact production.

Excerpt 2 — implement-with-tdd Skill (Iron Law)

---
name: implement-with-tdd
description: >
  Enforces test-driven development for feature work, bug fixes, and refactoring, activating 
  for requests such as "implement", "fix", "add feature", "write tests first", or "TDD".
---

# Test-Driven Development (TDD)

## Overview

Write the test first. Watch it fail. Write minimal code to pass.

**Core principle:** Watch the test fail. Only then is it proven to test the right thing.

**Violating the letter of the rules is violating the spirit of the rules.**

## The Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST


Write code before the test? Delete it. Start over.

**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it for inspiration
Delete it. Write the test. Watch it fail.

## Exceptions (skip TDD only for these)

- Throwaway spikes that will never be merged (branch must be deleted before implementation 
  begins; any code carried forward requires full TDD)
- Generated code (scaffolded, not behavior-bearing)
- Configuration-only changes

Thinking "skip TDD just this once"? Stop. That's rationalization.

Prompting technique: Iron Law with explicit rationalization-blocker. The phrase "Thinking 'skip TDD just this once'? Stop. That's rationalization." is a meta-instruction that preemptively blocks the agent's own self-justification patterns. This is the same technique as superpowers' Iron Laws.

Excerpt 3 — implement-change Sequential Constraint

**implement-and-validate** subagent per task (sequentially, never in parallel)

Prompting technique: Explicit anti-parallelism instruction. The word "never" prevents the agent from attempting to spawn multiple subagents simultaneously, which would create race conditions in file editing.

Uniqueness

Differs from Seeds

Flokay is a comprehensive SDLC skill pack most closely resembling spec-driver (which also provides a full suite of development lifecycle skills), but it adds three capabilities absent from all seeds: (1) end-to-end PR lifecycle automation, in which wait-ci, fix-pr, and finalize-pr poll CI status, enrich failure logs, and iterate until CI is green or a 3-cycle limit is hit; (2) an external validator CLI (agent-validator) as a mandatory quality gate rather than LLM self-review; and (3) the session-report and handoff skills for session continuity, which spec-driver lacks. The TDD Iron Law in implement-with-tdd is borrowed directly from superpowers' Iron Law pattern but is stated more explicitly, with the "That's rationalization" preemptive blocker. The sequential-only constraint on implement-change is spelled out in the prompt text itself, which no seed does; it rules out parallelism to avoid race conditions and is a deliberate architectural decision.

Positioning

Flokay covers the longest code-to-PR lifecycle of any skills-only framework in the corpus. Where openspec stops at artifact generation and spec-driver stops at implementation, flokay continues through CI watching and PR fix loops to a CI-green PR that is merged or ready for review. The finalize-pr 3-cycle limit is a form of bounded autonomy: the agent tries to get CI green for at most three cycles before stopping, which prevents infinite loops.

Observable Failure Modes

agent-validator dependency: The entire quality gate depends on an external CLI that is a separate product. If agent-validator is unavailable, quality gates fail silently.
Sequential implementation at scale: For large features with many tasks, sequential execution is slow. The explicit "never in parallel" constraint is safe but slow.
3-cycle PR fix limit: The finalize-pr loop stops after 3 cycles regardless of whether CI is green, potentially leaving broken PRs.
No spec format portability: The artifact format (proposal.md, design.md, tasks.md) is specific to Codagent and not compatible with OpenSpec or other tooling.

Workflow

Full SDLC Workflow

Phase	Skill	Artifact	Gate
1. Evaluate	`propose`	proposal.md (GO/NO-GO verdict)	User review of GO/NO-GO
2. Specify	`spec`	Spec files with WHEN/THEN scenarios	Per-capability Q&A
3. Design	`design`	design.md (2-3 approaches + tradeoffs)	User selects approach
4. Review	`review-spec`	Review findings	User approval
5. Plan	`plan-tasks`	Self-contained task files per subagent	None
6. Implement	`implement-change`	Code + commits (per task via subagent)	Validator pass per task
7. Push PR	`push-pr`	Pull request on remote	Validator pre-commit
8. Watch CI	`wait-ci`	CI status + enriched failure logs	None
9. Fix failures	`fix-pr`	Patched code, updated PR	Validator pass
10. Finalize	`finalize-pr`	Merged or review-ready PR	CI green (max 3 fix cycles)

Quick Path (small changes)

simple-plan → implement-change → finalize-pr

Per-Task Execution (in `implement-change`)

For each task file:

1. Dispatch an implement-and-validate subagent
2. Subagent calls implement-with-tdd (write test → fail → write code → pass)
3. Subagent runs Agent Validator (static checks + AI code review)
4. If validator passes: commit task
5. If validator fails: fix and retry
6. Move to next task (sequential, never parallel)
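
The per-task steps above can be sketched as a plain sequential loop. This is a minimal illustration, not flokay's implementation: the `implement`, `validate`, and `commit` callables are hypothetical stand-ins for the TDD subagent, Agent Validator, and the commit step, and the `max_attempts` cap and the stop-on-failure behavior are assumptions, since the source only says "fix and retry".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TaskResult:
    task: str
    attempts: int
    committed: bool


def run_tasks_sequentially(
    tasks: List[str],
    implement: Callable[[str], None],  # hypothetical: stands in for the TDD subagent
    validate: Callable[[str], bool],   # hypothetical: stands in for Agent Validator
    commit: Callable[[str], None],     # hypothetical: stands in for the commit step
    max_attempts: int = 3,             # assumption: the source gives no retry cap
) -> List[TaskResult]:
    results: List[TaskResult] = []
    for task in tasks:                 # one task at a time, never in parallel
        attempts = 0
        committed = False
        while attempts < max_attempts:
            attempts += 1
            implement(task)            # first pass, or a fix after a failed validation
            if validate(task):
                commit(task)           # commit only after the validator passes
                committed = True
                break
        results.append(TaskResult(task, attempts, committed))
        if not committed:
            break                      # assumption: do not build on a failed task
    return results
```

Because each iteration finishes (commit or give up) before the next task starts, no two subagents ever edit the working tree at the same time, which is the property the "never in parallel" instruction is protecting.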

Finalize PR Loop (max 3 cycles)

push-pr → wait-ci → fix-pr → wait-ci → fix-pr → wait-ci → STOP (if still failing)
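
The bounded loop above can be sketched as follows. This is an illustration under stated assumptions: the source does not define what counts as a "cycle", so the sketch counts one CI wait per cycle, which reproduces the exact sequence shown. The callables `push_pr`, `wait_ci`, and `fix_pr` are hypothetical stand-ins for the skills of the same names, and the boolean return of `wait_ci` is an assumption.

```python
from typing import Callable


def finalize_pr(
    push_pr: Callable[[], None],
    wait_ci: Callable[[], bool],   # hypothetical: returns True when CI is green
    fix_pr: Callable[[], None],
    max_cycles: int = 3,
) -> bool:
    """Return True if CI went green within max_cycles checks, else False."""
    push_pr()
    for cycle in range(1, max_cycles + 1):
        if wait_ci():
            return True            # CI green: the PR is ready
        if cycle < max_cycles:
            fix_pr()               # patch and update the PR, then re-check CI
    return False                   # still failing after the last check: stop
```

With three failing checks the call order is push-pr, wait-ci, fix-pr, wait-ci, fix-pr, wait-ci, and the function returns `False`, leaving the still-red PR for a human to pick up.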

Memory & Context

State Storage

State Type	Storage	Persistence
Proposal	`proposal.md`	Project
Specifications	Spec files with WHEN/THEN scenarios	Project
Technical design	`design.md`	Project
Task files	Per-task markdown files	Project
Git history	Git	Project
PR state	GitHub PR	Remote

Per-Task Context (Self-Contained Design)

Each task file generated by plan-tasks is designed to be self-contained: it includes everything a subagent needs to implement it (context from proposal, design, specs). This is a key architectural decision — the task file is the unit of context handed to the implement-and-validate subagent, avoiding the need for the subagent to re-read all prior artifacts.
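
To make the idea of a self-contained task file concrete, the sketch below bundles excerpts from the earlier artifacts into one markdown document. The source does not show flokay's actual task file layout, so the `TaskFile` class, its field names, and the section headings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskFile:
    """Hypothetical container for everything one subagent needs."""
    title: str
    motivation: str        # excerpt carried over from proposal.md
    design_notes: str      # excerpt carried over from design.md
    scenarios: List[str]   # WHEN/THEN scenarios carried over from the specs
    steps: List[str]

    def render(self) -> str:
        """Render one markdown document the subagent can work from on its own."""
        lines = [
            f"# Task: {self.title}", "",
            "## Why", self.motivation, "",
            "## Design notes", self.design_notes, "",
            "## Scenarios",
        ]
        lines += [f"- {s}" for s in self.scenarios]
        lines += ["", "## Steps"]
        lines += [f"{i}. {s}" for i, s in enumerate(self.steps, start=1)]
        return "\n".join(lines) + "\n"
```

The trade-off is duplication for independence: context is copied into every task file, but a subagent started with only that file does not have to locate or re-read the proposal, design, or specs.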

Cross-Session Resumability

Because all artifacts are committed to git, the workflow can resume at any point. The simple-plan skill can generate a placeholder tasks.md so small changes still fit into the artifact structure and can be resumed.
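
One way to picture file-based resumption is to infer the next phase from which artifacts already exist. The sketch below is an assumption-laden illustration, not flokay's logic: `proposal.md`, `design.md`, and `tasks.md` are named in this document, while the `specs` directory name, the `change_dir` layout, and the `next_phase` function are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple

# (phase, artifact) pairs in workflow order. The "specs" name is an assumption.
PHASES: List[Tuple[str, str]] = [
    ("propose", "proposal.md"),
    ("spec", "specs"),
    ("design", "design.md"),
    ("plan-tasks", "tasks.md"),
]


def next_phase(change_dir: Path) -> str:
    """Return the first phase whose artifact is missing, else 'implement-change'."""
    for phase, artifact in PHASES:
        if not (change_dir / artifact).exists():
            return phase
    return "implement-change"
```

A new session only needs the checked-out repository to call `next_phase` on the change folder; no conversation history has to survive.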

Memory Type

file-based — proposal.md, design.md, spec files, task files. No database or vector store.

Context Handoff Skill

The handoff skill explicitly prepares context for handing off to another agent or session, making cross-session continuity a first-class workflow concern.

Orchestration

Multi-Agent

Yes. implement-change dispatches one implement-and-validate subagent per task; finalize-pr and fix-pr both dispatch fixer subagents.

Orchestration Pattern

sequential — implement-change dispatches subagents "sequentially, never in parallel." This is an explicit design choice to prevent race conditions in file editing.

Subagent Spawn Mechanism

Skills invoke subagents via Claude Code's Task tool delegation mechanism. Each task gets its own subagent instance.

Subagent Definition Format

skill-md — subagents are invoked by skill name (implement-and-validate, implement-with-tdd), not defined as standalone persona files.

Isolation Mechanism

none — all work happens in the main project directory. No worktrees or containers.

Execution Mode

one-shot per phase — each skill runs to completion before the next is invoked. The overall workflow is interactive-loop because user input is required at multiple gates.

Multi-Model

No — all skills run on the same underlying model (Claude Code's configured model).

Prompt Chaining

Yes — strong chaining: proposal → specs → design → tasks → implement-and-validate (per task). Each stage's output is the next stage's input. The task file is the formalized chain: it bundles context from all prior stages for the implementation subagent.

Consensus Mechanism

None.

Quality Gate: Agent Validator

An external CLI (agent-validator) runs after each task implementation. It is an automated external validator that runs static checks and AI code reviews, not an LLM self-review by the implementing agent. Of the frameworks in this batch, flokay's quality gate is the most tightly integrated with an external tool.

UI & CLI Surface

Dedicated CLI Binary

No. All invocation is via Claude Code slash commands or Codex/Cursor skill invocations.

Local Web Dashboard

No.

Skill Invocation Surface

Claude Code

/codagent:init
/codagent:propose
/codagent:spec
/codagent:design
/codagent:review-spec
/codagent:plan-tasks
/codagent:simple-plan
/codagent:implement-change
/codagent:implement-with-tdd
/codagent:implement-and-validate
/codagent:push-pr
/codagent:wait-ci
/codagent:fix-pr
/codagent:finalize-pr

Codex

use the codagent:propose skill
use the codagent:implement-with-tdd skill

Cursor

cursor plugins install Codagent-AI/agent-skills

Installation

# Claude Code
claude plugin marketplace add Codagent-AI/agent-skills
claude plugin install codagent

# Initialize (after install)
/codagent:init

# Codex
codex plugin marketplace add Codagent-AI/agent-skills
# Restart Codex → /plugins → select Codagent

# Cursor
cursor plugins install Codagent-AI/agent-skills

Update Process

claude plugin marketplace update codagent
claude plugin update codagent@codagent
/codagent:init  # refresh skills in Claude Code

Observability

Agent Validator provides quality metrics per task
Git history serves as audit trail
PR comments + CI logs surface as structured context via wait-ci

Related frameworks

same archetype · same primary tool · same memory type

Symphony (OpenAI) ★ 25k

A7 PR-lifecycle platform

A language-agnostic specification and Elixir reference daemon that continuously polls Linear and dispatches isolated Codex…

cli-agent-orchestrator (awslabs) ★ 634

A7 PR-lifecycle platform

tmux-based supervisor-worker orchestration across 7 AI coding CLIs, with MCP coordination primitives, persistent memory,…

Agent Orchestrator (ComposioHQ)

A7 PR-lifecycle platform

Continuous Claude ★ 3.8k

A7 PR-lifecycle platform

Compound learning across Claude Code sessions via PostgreSQL memory extraction from thinking blocks, YAML handoffs for session…

ORCH ★ 68

A7 PR-lifecycle platform

Orchestrates teams of parallel AI agents (CTO + workers + reviewer) on a single codebase with YAML flat-file state, a TUI…

CUA ★ 17k

A9 Sandbox substrate

Unified SDK for building, benchmarking, and deploying agents that interact with full OS GUIs via isolated VMs.

Distribution

Type: claude-plugin
License: MIT
Install: multi-step

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: No

Components

Commands: 0
Skills: 20
Subagents: 2
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 0

Workflow

Phases: 10
Approval gates: 4
Spec format: markdown
Spec storage: per-feature-folder
Delta or full: whole-file

Orchestration

Multi-agent: Yes
Pattern: sequential
Max concurrent: 2
Isolation: none
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: No
BYOK: Yes
Modal: text

Execution

Mode: interactive-loop
Crash recovery: Yes
Compaction: Yes
Session handoff: Yes
Streaming: No

Memory

Type: file-based
Persistence: project
Search: none
State files: 4 files

Quality

TDD: Yes
TDD mechanism: prompt-iron-law
Validators: 3
Self-review: adversarial-subagent

Git / Observability

Auto commit: Yes
Auto PR: Yes
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: Yes

Tools

Primary: Claude Code
Targets: 3
Portability: medium

Signals

Stars: 26
Last commit: 2026-05-17
Contributors: 1
Maintainer: active
Quality score: 5.8/10
