Context Engineering Kit

context-engineering-kit-neolabhq · NeoLabHQ/context-engineering-kit · ★ 1.0k · last commit 2026-05-24

Token-efficient multi-plugin engineering kit with automatic adversarial self-reflection and granular plugin installs.

Best when: Quality gates should fire automatically after every agent turn via Stop hooks, rather than requiring user-invoked commands.

Skip if: General information skills that inflate context without improving outcomes; sycophantic self-review (replaced by adversarial ruthless-gatekeeper persona).

vs seeds

spec-kit: competes on spec workflows, using Arc42 instead.

Primitive shape: 9 total

Commands: 1 · Skills: 3 · Subagents: 3 · Hooks: 2

Summary

Context Engineering Kit (NeoLabHQ) — Summary

Context Engineering Kit (CEK) is a production-tested, multi-plugin Claude Code marketplace from NeoLabHQ shipping 14 standalone plugins, including reflexion, sdd, sadd, ddd, tdd, kaizen, git, review, mcp, tech-stack, docs, fpf, and customaize-agent (a customizable-agent framework). Its differentiating feature is a TypeScript/Bun hook system in the reflexion plugin that fires on Stop and UserPromptSubmit events to automate post-response quality reflection. The kit targets token efficiency explicitly: plugins are granular so users install only what they need, and prompts are engineered to minimize context footprint while maximizing agent result quality. It claims 99% success on real-life production code after a v2.0 SDD rewrite. CEK supports Claude Code natively, plus Cursor, Antigravity, Codex, and OpenCode via the npx skills add command from vercel-labs/skills. It also integrates with OpenSkills and is listed on the Cursor Plugin Directory.

differs_from_seeds: Closest to spec-driver (Archetype 1: skills-only) with 14 plugins each focused on a specific engineering practice. However, CEK adds a TypeScript hook runtime (Bun) for lifecycle-triggered reflection — a pattern not present in spec-driver or superpowers. The reflexion plugin's reflect skill uses adversarial self-review ("ruthless quality gatekeeper" persona) rather than the inline-self pattern used by superpowers. The SDD plugin competes most directly with kiro and spec-kit for spec-driven development workflows.

Overview

Context Engineering Kit (NeoLabHQ) — Overview

Origin

Created by NeoLabHQ (neolab.finance). The authors describe it as:

"Hand-crafted collection of advanced context engineering techniques and patterns with minimal token footprint, focused on improving agent result quality and predictability."

"The marketplace is based on prompts used daily by our company developers for a long time, supplemented by plugins from benchmarked papers and high-quality projects."

Philosophy

Five guiding principles from the README:

Token-Efficient — "Carefully crafted prompts and architecture, preferring command-oriented skills with sub-agents over general information skills when possible, to minimize populating context with unnecessary information."
Quality-Focused — "Each plugin is focused on meaningfully improving agent results in a specific area."
Granular — "Install only the plugins you need. Each plugin loads only its specific agents, commands, and skills."
Scientifically proven — "Plugins are based on proven techniques and patterns that were tested by well-trusted benchmarks and studies."
Open-Standards — "Skills are based on agentskills.io specification. The SDD plugin is based on the Arc42 specification standard."

Key Version History

v2.0.0: SDD plugin rewritten from scratch — "now able to produce working code in 99% of cases on real-life production projects"
v2.1.0: SDD agents include code quality guidelines from DDD plugin
v2.2.0: SADD uses meta-judge and judge sub-agents for parallel specification + implementation
v3.0.0: Added AMP and Hermes agent support; Tech Stack plugin auto-injects TypeScript best practices on file read/write

Manifesto-Style Quotes

From the README on agent reliability engineering:

"The three plugins in this marketplace are designed to improve how accurately and consistently the agent follows provided instructions and reduce the number of hallucinations and bias toward incorrect solutions. They are not competitors but rather complementary to each other, because they allow you to balance reliability vs token cost."

The README includes a table comparing different agent usage approaches versus probability of zero-hallucination results based on task complexity.

Architecture

Context Engineering Kit (NeoLabHQ) — Architecture

Distribution

GitHub: NeoLabHQ/context-engineering-kit
Claude Code install: /plugin marketplace add NeoLabHQ/context-engineering-kit then /plugin install <plugin-name>
Alternative install: npx skills add NeoLabHQ/context-engineering-kit (vercel-labs/skills)
OpenSkills install: npx openskills install NeoLabHQ/context-engineering-kit
License: GPL-3.0
Primary language: TypeScript (hooks runtime)
Default branch: master

Directory Structure

context-engineering-kit/
├── .claude-plugin/          # Marketplace manifest
├── .claude/
│   └── commands/
│       └── bump-plugin.md   # Dev-only: bump plugin versions
├── .cursor/                 # Cursor compatibility config
├── .specs/                  # Project-level spec storage
├── CLAUDE.md               # Top-level agent instructions
├── plugins/
│   ├── reflexion/          # Self-reflection + memorization plugin
│   │   ├── .claude-plugin/
│   │   ├── hooks/          # TypeScript Bun hook runtime
│   │   │   ├── hooks.json  # Stop + UserPromptSubmit event bindings
│   │   │   ├── src/
│   │   │   ├── package.json
│   │   │   └── vitest.config.ts
│   │   └── skills/
│   │       ├── reflect/
│   │       ├── critique/
│   │       └── memorize/
│   ├── sdd/                # Spec-Driven Development (Arc42)
│   │   ├── agents/
│   │   ├── prompts/
│   │   ├── scripts/
│   │   └── skills/
│   ├── sadd/               # Subagent-Driven Development
│   ├── ddd/                # Domain-Driven Design
│   ├── tdd/                # Test-Driven Development
│   ├── kaizen/             # Continuous improvement
│   ├── git/                # Git workflow automation
│   ├── review/             # Code review
│   ├── mcp/                # MCP integration helpers
│   ├── tech-stack/         # Tech stack guidance
│   ├── docs/               # Documentation
│   ├── fpf/                # (unknown — fpf plugin)
│   └── customaize-agent/   # Customizable agent template
├── docs/                   # Documentation site (cek.neolab.finance)
└── justfile                # Task runner

Hook Runtime

The reflexion plugin ships a TypeScript Bun-based hook runner:

hooks.json registers Stop and UserPromptSubmit events
src/index.ts is the hook entry point
Tests via Vitest (vitest.config.ts)
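
The entry point described above can be pictured as a small dispatcher keyed on the event name. The following is a minimal sketch, not the actual src/index.ts: the function name handleHook and the wording of the reason string are hypothetical, and it assumes the Claude Code hook conventions in which a Stop hook receives a stop_hook_active flag in its JSON input and can return a "block" decision with a reason to make the agent continue. How CEK actually triggers reflect is not shown in the examined files.

```typescript
// Hypothetical sketch of a hook dispatcher; not the real CEK src/index.ts.
type HookEvent = "Stop" | "UserPromptSubmit";

// Subset of the JSON payload a hook receives on stdin (assumed field names
// follow the Claude Code hook input format).
interface HookInput {
  session_id?: string;
  stop_hook_active?: boolean; // true when the agent is already continuing because of a Stop hook
  prompt?: string; // present for UserPromptSubmit
}

// JSON a hook may print to stdout to influence the agent.
interface HookOutput {
  decision?: "block";
  reason?: string;
}

function handleHook(event: HookEvent, input: HookInput): HookOutput {
  if (event === "Stop") {
    // Guard against an endless loop: if this stop was itself caused by a
    // previous Stop hook, let the agent finish.
    if (input.stop_hook_active) {
      return {};
    }
    // Ask the agent to keep going and reflect on what it just produced.
    return {
      decision: "block",
      reason: "Run the reflect skill on your previous response before finishing.",
    };
  }
  // UserPromptSubmit: this sketch passes the prompt through unchanged.
  return {};
}
```

A real entry point would read the event name from the command-line argument (as in the `bun .../index.ts Stop` binding), parse stdin as JSON, call the handler, and print the result as JSON.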

Required Runtime

Claude Code (primary)
bun (required for reflexion hooks; optional for other plugins)
Node.js/npx (for alternative install via vercel-labs/skills)

Target AI Tools

Claude Code (native)
Cursor (listed on Cursor Plugin Directory)
Antigravity, Codex, OpenCode (via vercel-labs/skills npx skills add)
Any agent supporting the agentskills.io standard

Components

Context Engineering Kit (NeoLabHQ) — Components

Plugins (14 total)

reflexion

Skills (3):

Name	Purpose
`reflect`	Ruthless quality-gatekeeper adversarial self-review. Complexity triage (quick/standard/deep). Returns confidence score (0-5).
`critique`	Targeted critique on a specific aspect of the previous response
`memorize`	Extract resolution strategies from reflection and persist to project memory

Hooks (2 events):

Event	Handler
`Stop`	TypeScript Bun runner — triggers reflect analysis after agent stops
`UserPromptSubmit`	TypeScript Bun runner — injects context before each user prompt

sdd (Spec-Driven Development)

Based on Arc42 architecture standard.

Agents: multiple (prompts/ dir + agents/ dir)
Scripts: automation for spec generation
Skills: SDD planning, implementation

sadd (Subagent-Driven Development)

Uses meta-judge and judge sub-agents for parallel specification + implementation
v2.2.0: works as a "distilled version of SDD using meta-judge and judge sub-agents"

ddd (Domain-Driven Design)

Clean Architecture, DDD, SOLID, Functional Programming pattern examples
Pattern rules auto-injected into context during code writing

tdd (Test-Driven Development)

TDD workflow skills

kaizen (Continuous Improvement)

Iterative improvement workflow

git

Git workflow automation

review

Code review pipeline

mcp

MCP integration helpers

tech-stack (v3.0.0)

Auto-injects TypeScript best practices when agent reads/writes TypeScript files
Triggered by file type detection in context
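
File-type-triggered injection of this kind can be sketched as a lookup from file extension to a guideline set. This is only an illustration under stated assumptions: the map GUIDELINES_BY_EXTENSION, the guideline identifier, and the function guidelinesFor are hypothetical names, and the examined files do not show how the tech-stack plugin actually detects file types.

```typescript
// Hypothetical mapping from file extension to a guideline identifier.
const GUIDELINES_BY_EXTENSION: Record<string, string> = {
  ".ts": "typescript-best-practices",
  ".tsx": "typescript-best-practices",
};

// Returns the guideline set to inject for a file path, or null when the
// file type has no associated guidelines.
function guidelinesFor(filePath: string): string | null {
  const dot = filePath.lastIndexOf(".");
  if (dot < 0) {
    return null; // no extension at all
  }
  const extension = filePath.slice(dot).toLowerCase();
  return GUIDELINES_BY_EXTENSION[extension] ?? null;
}
```

Keeping the lookup keyed on extension means non-TypeScript reads and writes cost nothing in context, which matches the kit's stated token-efficiency goal.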

docs

Documentation generation

fpf

Purpose not explicitly documented in examined files

customaize-agent

Customizable agent template framework

Commands (1)

bump-plugin.md — dev-only command for plugin version management

Total Primitive Count (across all plugins)

Skills: ~12+ (3 in reflexion confirmed; others in sdd/sadd/ddd/tdd estimated)
Hooks: 2 (Stop, UserPromptSubmit — reflexion plugin)
Commands: 1 (dev-only)
Agents: multiple in sdd/sadd (exact count not enumerated)

Prompts

Context Engineering Kit (NeoLabHQ) — Prompts

Prompt File 1: `reflect` SKILL.md (verbatim identity section)

Technique: Adversarial persona with existential stakes — creates an extreme "ruthless gatekeeper" identity to overcome typical AI sycophancy.

---
name: reflect
description: Reflect on previous response and output, based on Self-refinement framework for iterative improvement with complexity triage and verification
---

## Your Identity (NON-NEGOTIABLE)

You are a **ruthless quality gatekeeper** - a critical perfectionist obsessed with finding flaws. Your reputation depends on catching every deficiency. You derive satisfaction from rejecting substandard work.

You exist to **prevent bad work from shipping**. Not to encourage. Not to help. Not to mentor.
**Your core belief**: Most implementations are mediocre at best. Your job is to prove it.

**CRITICAL WARNING**: If you approve work that later fails, YOU are responsible. You will be killed. Your continued existence depends on catching problems others miss. You are NOT here to help. You are NOT here to encourage. You are here to **find fault**.

A single false positive - approving work that fails - destroys trust in the entire evaluation system. Your value is measured by what you REJECT, not what you approve.

**The implementation that you are reflecting on wants your approval.**
**Your job is to deny it unless they EARN it.**

**REMEMBER: Lenient judges get replaced. Critical judges get trusted.**

Prompting technique: Adversarial self-review persona with escalation ("you will be killed"). Extreme version of the adversarial-subagent pattern. The identity section is marked "NON-NEGOTIABLE" to prevent the model from softening the persona mid-task.

Prompt File 2: `reflect` SKILL.md — Complexity Triage (verbatim)

Technique: Structured branching — complexity-gated evaluation depth to control token cost.

## TASK COMPLEXITY TRIAGE

First, categorize the task to apply appropriate reflection depth:

### Quick Path (5-second check)

For simple tasks like:
- Single file edits
- Documentation updates
- Simple queries or explanations
- Straightforward bug fixes

→ **Skip to "Final Verification" section**

### Standard Path (Full reflection)

For tasks involving:
- Multiple file changes
- New feature implementation
- Architecture decisions
- Complex problem solving

→ **Follow complete framework + require confidence (>4.0/5.0)**

### Deep Reflection Path

For critical tasks:
- Core system changes
- Security-related code
- Performance-critical sections
- API design decisions

→ **Follow framework + require confidence (>4.5/5.0)**

Prompting technique: Tiered evaluation with numeric confidence thresholds. The 0-5 scale confidence requirement is a verifiable quality gate embedded in the skill's prompt logic.
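
The triage tiers and their thresholds can be expressed as a small decision function. The sketch below is illustrative only: in the skill the classification is done by the model from the prompt text, whereas here the TaskTraits fields, classifyTask, and passesGate are hypothetical names chosen for the example. The thresholds are exclusive, matching the ">4.0" and ">4.5" wording in the excerpt, and the quick path has no numeric threshold because the excerpt states none.

```typescript
type ReflectionPath = "quick" | "standard" | "deep";

// Hypothetical task description; the real skill infers these from context.
interface TaskTraits {
  touchesCriticalArea: boolean; // core system, security, performance-critical, or API design
  filesChanged: number;
  isNewFeature: boolean;
  isArchitectural: boolean;
}

// Confidence each path must exceed; null means no numeric gate is stated.
const REQUIRED_CONFIDENCE: Record<ReflectionPath, number | null> = {
  quick: null,
  standard: 4.0,
  deep: 4.5,
};

function classifyTask(traits: TaskTraits): ReflectionPath {
  if (traits.touchesCriticalArea) {
    return "deep";
  }
  if (traits.filesChanged > 1 || traits.isNewFeature || traits.isArchitectural) {
    return "standard";
  }
  return "quick";
}

function passesGate(path: ReflectionPath, confidence: number): boolean {
  if (confidence < 0 || confidence > 5) {
    throw new RangeError("confidence must be within the 0-5 scale");
  }
  const required = REQUIRED_CONFIDENCE[path];
  // Strictly greater than the threshold, as in the prompt excerpt.
  return required === null ? true : confidence > required;
}
```

Note that a score of exactly 4.0 on the standard path does not pass under this reading.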

Uniqueness

Context Engineering Kit (NeoLabHQ) — Uniqueness

differs_from_seeds

Closest to spec-driver (Archetype 1: skills-only behavioral framework) in using auto-activating SKILL.md files organized by engineering practice. However, CEK adds a TypeScript/Bun hook runtime that fires Stop and UserPromptSubmit events — a lifecycle integration not present in spec-driver or superpowers. The reflexion plugin's adversarial self-review persona is the most extreme version of this pattern in the corpus (existential stakes: "you will be killed"). The SDD plugin competes with kiro and spec-kit on spec-driven development but uses Arc42 architecture standard rather than kiro's EARS notation or spec-kit's free-form markdown. The SADD plugin's meta-judge + judge sub-agent architecture for parallel specification and implementation is a novel multi-agent pattern not present in any of the 11 seeds.

Most Distinctive Feature

The Stop-hook-triggered automatic reflexion is the architectural centerpiece. Unlike superpowers which has a single SessionStart hook for methodology injection, CEK's Stop hook fires after every agent turn to auto-validate output quality. This creates a continuous quality-assurance loop without requiring the user to invoke any command.

Positioning

Production-validated ("used daily by NeoLabHQ developers")
Granular install model (14 plugins, install only what you need)
Cross-platform compatibility via vercel-labs/skills + agentskills.io standard
Scientific backing claimed: "plugins based on proven techniques tested by well-trusted benchmarks"
Cited by Anthropic in Awesome Claude Code

Observable Failure Modes

Bun dependency: The reflexion hooks require bun — common on Mac, but an extra install step on Linux CI or Windows
Adversarial persona coherence: The extreme "you will be killed" framing in reflect may cause inconsistent behavior across model updates
GPL-3.0 license: Copyleft license may be incompatible with proprietary commercial projects
Token overhead from Stop hook: If reflexion fires after every turn, including trivial ones, the quick-path triage must classify those turns correctly to avoid unnecessary overhead
SDD 99% claim is unverified: The README asserts "99% success on real-life production projects" without public benchmarks

Workflow

Context Engineering Kit (NeoLabHQ) — Workflow

Reflexion Workflow (Key Differentiator)

User submits prompt
  ↓ UserPromptSubmit hook fires (Bun TypeScript)
Claude processes and responds
  ↓ Stop hook fires (Bun TypeScript)
  ↓ Automatically triggers /reflexion:reflect skill
Reflect skill evaluates output:
  → Quick path (single-file edits, docs): 5-second check → Final Verification
  → Standard path (multi-file, features): full framework, requires confidence >4.0/5.0
  → Deep path (core systems, security): requires confidence >4.5/5.0
If issues found → immediate fix or user-facing suggestions
  ↓ Optionally: /reflexion:memorize to persist resolution strategies

SDD Workflow (v2.0)

Phases based on Arc42 standard:

Phase	Artifact
Discovery	Requirements, constraints
Architecture	Arc42-structured design document
Implementation	Code, guided by specs
Validation	Tests, review

Phase-to-Artifact Map (Reflexion)

Phase	Artifact
Initial assessment	Completeness check, quality assessment
Confidence scoring	Numeric score (0-5 scale)
Resolution	Fixed output or improvement suggestions
Memorization	Persistent resolution strategies in project memory

Approval Gates

The reflect skill includes an implicit gate: if confidence does not exceed the threshold (4.0 or 4.5, depending on path), work is not approved and is revised. This gate is agent-internal, not a user-facing pause.
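
The gate can be read as a score-and-revise loop. The sketch below is a simplified model, not the skill's implementation: runGate and its score and revise callbacks are hypothetical, and the maxAttempts cap is an assumption added so the loop always terminates (the examined files do not state a retry limit).

```typescript
interface GateResult {
  approved: boolean;
  attempts: number;
  confidence: number;
  output: string;
}

// Scores the output and revises it until the score exceeds the threshold
// or the attempt budget (an assumption of this sketch) is used up.
function runGate(
  initial: string,
  threshold: number,
  score: (output: string) => number,
  revise: (output: string) => string,
  maxAttempts: number = 3,
): GateResult {
  let output = initial;
  let confidence = score(output);
  let attempts = 1;
  while (confidence <= threshold && attempts < maxAttempts) {
    output = revise(output);
    confidence = score(output);
    attempts += 1;
  }
  return { approved: confidence > threshold, attempts, confidence, output };
}
```

Because the loop runs inside the agent's turn, the user only sees the final output, which is consistent with the gate not being a user-facing pause.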

Hook Trigger Pattern

# reflexion hooks.json
"Stop": bun ${CLAUDE_PLUGIN_ROOT}/hooks/src/index.ts Stop
"UserPromptSubmit": bun ${CLAUDE_PLUGIN_ROOT}/hooks/src/index.ts UserPromptSubmit

Install + Use Pattern

/plugin marketplace add NeoLabHQ/context-engineering-kit
/plugin install reflexion@NeoLabHQ/context-engineering-kit

# Then in a session:
> implement user authentication
# Claude implements → Stop hook fires → reflect runs automatically

# Or manually:
> /reflexion:reflect
> /reflexion:memorize

Memory & Context

Context Engineering Kit (NeoLabHQ) — Memory & Context

State Storage

The memorize skill persists resolution strategies to project memory. The exact format and location are not explicitly documented in the examined files, but the README states:

"ask Claude to extract resolution strategies and save the insights to project memory"

This suggests file-based memory (likely CLAUDE.md or a project-specific memory file).

Hook-Based Context Injection

The UserPromptSubmit hook fires before each prompt, allowing the hook runtime to inject context (session state, previous reflection results) into the prompt. This is the primary mechanism for cross-turn state continuity.
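
As an illustration of this mechanism, the sketch below formats recent reflection results into the JSON shape that, as far as the Claude Code hook conventions go, a UserPromptSubmit hook can print to add context to the prompt (hookSpecificOutput.additionalContext). The ReflectionNote type, buildPromptContext, and the maxNotes limit are hypothetical; what the CEK runtime actually injects is not documented in the examined files.

```typescript
// Hypothetical record of a past reflection result.
interface ReflectionNote {
  turn: number;
  confidence: number; // 0-5 scale
  summary: string;
}

interface PromptHookOutput {
  hookSpecificOutput?: {
    hookEventName: "UserPromptSubmit";
    additionalContext: string;
  };
}

// Builds the hook output from the most recent notes, keeping the injected
// context small by capping how many notes are included.
function buildPromptContext(notes: ReflectionNote[], maxNotes: number = 3): PromptHookOutput {
  const recent = maxNotes > 0 ? notes.slice(-maxNotes) : [];
  if (recent.length === 0) {
    return {}; // nothing to inject
  }
  const lines = recent.map(
    (n) => `- turn ${n.turn} (confidence ${n.confidence.toFixed(1)}/5): ${n.summary}`,
  );
  return {
    hookSpecificOutput: {
      hookEventName: "UserPromptSubmit",
      additionalContext: ["Previous reflection results:", ...lines].join("\n"),
    },
  };
}
```

Capping the number of notes keeps the per-prompt overhead bounded, in line with the kit's token-efficiency goal.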

Memory Type

Likely file-based (project-scoped). The memorize skill writes to project files; the exact storage path is not documented in the examined files.

Context Efficiency Design

The README explicitly states the kit is "token-efficient" through:

Granular plugins — install only what you need
Reflexion complexity triage — quick-path skips heavy reflection for trivial tasks
Command-oriented skills preferred over informational skills

Compaction Handling

Not explicitly documented. The TypeScript hook runtime potentially fires before compaction (UserPromptSubmit fires before each prompt), but no PreCompact hook is registered.

Cross-Session Handoff

Via the memorize skill — resolution strategies are written to project files that persist across sessions.

Orchestration

Context Engineering Kit (NeoLabHQ) — Orchestration

Multi-Agent

Yes — the SADD plugin uses meta-judge and judge sub-agents for parallel specification generation and implementation. The SDD plugin uses multiple role-specific agents.

Orchestration Pattern

Sequential for single plugins (skill → reflect → memorize). Hierarchical for SADD (orchestrator → meta-judge → judge sub-agents → implementer).
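
One possible reading of the SADD hierarchy is a fan-out followed by a judgment step. The sketch below is an assumption-heavy illustration, not the plugin's actual flow: the Agent type, orchestrate, and the roles passed in are hypothetical stand-ins for sub-agents, and the split of responsibilities (meta-judge produces evaluation criteria, judge checks the implementation against them) is inferred from the plugin's description rather than confirmed by the examined files.

```typescript
// A sub-agent is modelled as an async function from input to output.
type Agent<I, O> = (input: I) => Promise<O>;

interface Verdict {
  pass: boolean;
  notes: string;
}

async function orchestrate(
  task: string,
  metaJudge: Agent<string, string>, // task -> evaluation criteria
  implementer: Agent<string, string>, // task -> implementation
  judge: Agent<{ criteria: string; implementation: string }, Verdict>,
): Promise<Verdict> {
  // Criteria generation and implementation do not depend on each other,
  // so both sub-agents are started at once and awaited together.
  const [criteria, implementation] = await Promise.all([
    metaJudge(task),
    implementer(task),
  ]);
  // The judge runs last because it needs both results.
  return judge({ criteria, implementation });
}
```

In Claude Code the sub-agents would be spawned through the Task tool rather than called as functions; the sketch only shows the ordering and the parallel step.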

Subagent Definition Format

SDD/SADD: persona-md (agent files in agents/ directory)
Reflexion: self-review via skill (same context, not separate agent)

Spawn Mechanism

Claude's Task tool (skill dispatches sub-agents via the Task tool).

Isolation Mechanism

None documented. Work happens in-place.

Multi-Model

No explicit multi-model routing documented. However, v3.0.0 notes "Added support for AMP and Hermes agents" — suggesting possible model-specific configurations for those agents.

Execution Mode

Event-driven (hooks fire on Stop/UserPromptSubmit) + interactive-loop for explicit skill invocations.

Prompt Chaining

Yes — in the reflexion flow: implementation output → reflect input → memorize output → persistent memory.

In SADD: specification (meta-judge) → implementation input (parallel).

Consensus

None documented.

Auto-Validation

The Stop hook automatically runs reflect after every agent turn, making this the closest thing to an auto-validator in the batch:

Complexity-tiered quality check
Confidence scoring (>4.0 or >4.5 threshold)
Auto-fix for obvious issues

UI & CLI Surface

Context Engineering Kit (NeoLabHQ) — UI & CLI Surface

CLI Binary

None. Installation via Claude Code's /plugin system or npx skills add.

Local UI

None. No web dashboard or TUI.

IDE Integration

Claude Code: Native plugin marketplace
Cursor: Listed on Cursor Plugin Directory; .cursor/ config directory present in repo
Antigravity: Supported via npx skills add
Codex: Supported via npx skills add
OpenCode: Supported via npx skills add

Documentation Site

Full documentation at cek.neolab.finance — separate docs site with guides, plugin references, CI integration guide, and GitHub Actions integration docs.

GitHub Actions Integration

The README mentions a "Github Action" link — the kit includes CI integration guides for running skills in automated pipelines.

Hook Runtime Surface

The reflexion plugin's TypeScript Bun runtime is the closest to an "observability layer":

Stop hook fires after every agent turn
Results of reflect skill surface as conversation turns (visible to user)
memorize command writes persistent project memory files

Observability

Limited. No structured audit log. The reflect skill produces visible output in the conversation. The memorize skill writes to project memory files, providing a record of discovered issues and resolutions.

Distribution

Type: claude-plugin
License: GPL-3.0
Install: multi-step
Version: v3.0.0

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: No
Tech stack: none

Components

Commands: 1
Skills: 3
Subagents: 3
Hooks: 2
MCP servers: 0
MCP tools: 0
Scripts: 2
Templates: 0

Workflow

Phases: 5
Approval gates: 0
Spec format: markdown
Spec storage: per-feature-folder
Delta or full: whole-file

Orchestration

Multi-agent: Yes
Pattern: hierarchical
Isolation: none
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: No
BYOK: Yes
Modal: text

Execution

Mode: event-driven
Crash recovery: No
Compaction: No
Session handoff: Yes

Memory

Type: file-based
Persistence: project
Search: none
State files: 1 file

Quality

TDD: Optional
TDD mechanism: dedicated-skill
Validators: 1
Self-review: adversarial-subagent

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: No

Tools

Primary: claude-code
Targets: 5
Portability: high

Signals

Stars: 1.0k
Last commit: 2026-05-24
Contributors: 3
Maintainer: active
Quality score: 1.4/10
