Context Engineering Kit

context-engineering-kit-neolabhq · NeoLabHQ/context-engineering-kit · ★ 1.0k · last commit 2026-05-24

Token-efficient multi-plugin engineering kit with automatic adversarial self-reflection and granular plugin installs.

Best when: Quality gates should fire automatically after every agent turn via Stop hooks, not require user-invoked commands.
Skip if: General information skills that inflate context without improving outcomes; sycophantic self-review (replaced by adversarial ruthless-gatekeeper persona).
vs seeds: spec-kit on spec workflows, using Arc42 instead.
Primitive shape (9 total): Commands 1, Skills 3, Subagents 3, Hooks 2

00

Summary

Context Engineering Kit (NeoLabHQ) — Summary

Context Engineering Kit (CEK) is a production-tested, multi-plugin Claude Code marketplace from NeoLabHQ shipping 14 standalone plugins (reflexion, sdd, sadd, ddd, tdd, kaizen, git, review, mcp, tech-stack, docs, fpf, customaize-agent, and a customizable-agent framework). Its differentiating feature is a TypeScript/Bun hook system in the reflexion plugin that fires on Stop and UserPromptSubmit events to automate post-response quality reflection. The kit targets token efficiency explicitly: plugins are granular so users install only what they need, and prompts are engineered to minimize context footprint while maximizing agent result quality. It claims 99% success on real-life production code after a v2.0 SDD rewrite. CEK supports Claude Code natively plus Cursor, Antigravity, Codex, and OpenCode via the npx skills add command from vercel-labs/skills. It also integrates with OpenSkills and is listed on the Cursor Plugin Directory.

differs_from_seeds: Closest to spec-driver (Archetype 1: skills-only) with 14 plugins each focused on a specific engineering practice. However, CEK adds a TypeScript hook runtime (Bun) for lifecycle-triggered reflection — a pattern not present in spec-driver or superpowers. The reflexion plugin's reflect skill uses adversarial self-review ("ruthless quality gatekeeper" persona) rather than the inline-self pattern used by superpowers. The SDD plugin competes most directly with kiro and spec-kit for spec-driven development workflows.

01

Overview

Context Engineering Kit (NeoLabHQ) — Overview

Origin

Created by NeoLabHQ (neolab.finance). The authors describe it as:

"Hand-crafted collection of advanced context engineering techniques and patterns with minimal token footprint, focused on improving agent result quality and predictability."

"The marketplace is based on prompts used daily by our company developers for a long time, supplemented by plugins from benchmarked papers and high-quality projects."

Philosophy

Five guiding principles from the README:

  1. Token-Efficient — "Carefully crafted prompts and architecture, preferring command-oriented skills with sub-agents over general information skills when possible, to minimize populating context with unnecessary information."
  2. Quality-Focused — "Each plugin is focused on meaningfully improving agent results in a specific area."
  3. Granular — "Install only the plugins you need. Each plugin loads only its specific agents, commands, and skills."
  4. Scientifically proven — "Plugins are based on proven techniques and patterns that were tested by well-trusted benchmarks and studies."
  5. Open-Standards — "Skills are based on agentskills.io specification. The SDD plugin is based on the Arc42 specification standard."

Key Version History

  • v2.0.0: SDD plugin rewritten from scratch — "now able to produce working code in 99% of cases on real-life production projects"
  • v2.1.0: SDD agents include code quality guidelines from DDD plugin
  • v2.2.0: SADD uses meta-judge and judge sub-agents for parallel specification + implementation
  • v3.0.0: Added AMP and Hermes agent support; Tech Stack plugin auto-injects TypeScript best practices on file read/write

Manifesto-Style Quotes

From the README on agent reliability engineering:

"The three plugins in this marketplace are designed to improve how accurately and consistently the agent follows provided instructions and reduce the number of hallucinations and bias toward incorrect solutions. They are not competitors but rather complementary to each other, because they allow you to balance reliability vs token cost."

The README includes a table comparing different agent usage approaches versus probability of zero-hallucination results based on task complexity.

02

Architecture

Context Engineering Kit (NeoLabHQ) — Architecture

Distribution

  • GitHub: NeoLabHQ/context-engineering-kit
  • Claude Code install: /plugin marketplace add NeoLabHQ/context-engineering-kit then /plugin install <plugin-name>
  • Alternative install: npx skills add NeoLabHQ/context-engineering-kit (vercel-labs/skills)
  • OpenSkills install: npx openskills install NeoLabHQ/context-engineering-kit
  • License: GPL-3.0
  • Primary language: TypeScript (hooks runtime)
  • Default branch: master

Directory Structure

context-engineering-kit/
├── .claude-plugin/          # Marketplace manifest
├── .claude/
│   └── commands/
│       └── bump-plugin.md   # Dev-only: bump plugin versions
├── .cursor/                 # Cursor compatibility config
├── .specs/                  # Project-level spec storage
├── CLAUDE.md               # Top-level agent instructions
├── plugins/
│   ├── reflexion/          # Self-reflection + memorization plugin
│   │   ├── .claude-plugin/
│   │   ├── hooks/          # TypeScript Bun hook runtime
│   │   │   ├── hooks.json  # Stop + UserPromptSubmit event bindings
│   │   │   ├── src/
│   │   │   ├── package.json
│   │   │   └── vitest.config.ts
│   │   └── skills/
│   │       ├── reflect/
│   │       ├── critique/
│   │       └── memorize/
│   ├── sdd/                # Spec-Driven Development (Arc42)
│   │   ├── agents/
│   │   ├── prompts/
│   │   ├── scripts/
│   │   └── skills/
│   ├── sadd/               # Subagent-Driven Development
│   ├── ddd/                # Domain-Driven Design
│   ├── tdd/                # Test-Driven Development
│   ├── kaizen/             # Continuous improvement
│   ├── git/                # Git workflow automation
│   ├── review/             # Code review
│   ├── mcp/                # MCP integration helpers
│   ├── tech-stack/         # Tech stack guidance
│   ├── docs/               # Documentation
│   ├── fpf/                # (unknown — fpf plugin)
│   └── customaize-agent/   # Customizable agent template
├── docs/                   # Documentation site (cek.neolab.finance)
└── justfile                # Task runner

Hook Runtime

The reflexion plugin ships a TypeScript Bun-based hook runner:

  • hooks.json registers Stop and UserPromptSubmit events
  • src/index.ts is the hook entry point
  • Tests via Vitest (vitest.config.ts)
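
As a rough illustration of how one entry point can serve both events, the sketch below dispatches on an event name. It assumes the event name arrives as the first CLI argument (as the hook commands in this entry suggest) and that the hook payload arrives as JSON text on stdin, following Claude Code's hook convention. The names `dispatchHook`, `HookResult`, and the placeholder handler outputs are hypothetical and are not taken from CEK's `src/index.ts`.

```typescript
// Hypothetical sketch of an event dispatcher; not CEK's actual src/index.ts.
type HookEvent = "Stop" | "UserPromptSubmit";

interface HookResult {
  exitCode: number; // 0 = success, non-zero = failure
  stdout: string;   // text handed back to the caller
}

type HookHandler = (payload: Record<string, unknown>) => HookResult;

// Placeholder handlers: real ones would trigger reflection or inject context.
const hookHandlers: Record<HookEvent, HookHandler> = {
  Stop: () => ({ exitCode: 0, stdout: "stop-handled" }),
  UserPromptSubmit: (payload) => ({
    exitCode: 0,
    stdout: `prompt-length:${String(payload.prompt ?? "").length}`,
  }),
};

function isHookEvent(name: string): name is HookEvent {
  return name === "Stop" || name === "UserPromptSubmit";
}

// eventName: first CLI argument; rawInput: JSON text read from stdin.
function dispatchHook(eventName: string, rawInput: string): HookResult {
  if (!isHookEvent(eventName)) {
    return { exitCode: 1, stdout: `unknown event: ${eventName}` };
  }
  let payload: Record<string, unknown> = {};
  try {
    const parsed: unknown = JSON.parse(rawInput);
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      payload = parsed as Record<string, unknown>;
    }
  } catch {
    // Malformed input: fall back to an empty payload rather than crashing the hook.
  }
  return hookHandlers[eventName](payload);
}
```

In a Bun entry point, `dispatchHook` would presumably be called with the event argument and the stdin text, and the result's `stdout` and `exitCode` would be written back to the process. Keeping the dispatcher a pure function makes it straightforward to unit-test with Vitest.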

Required Runtime

  • Claude Code (primary)
  • bun (required for reflexion hooks; optional for other plugins)
  • Node.js/npx (for alternative install via vercel-labs/skills)

Target AI Tools

  • Claude Code (native)
  • Cursor (listed on Cursor Plugin Directory)
  • Antigravity, Codex, OpenCode (via vercel-labs/skills npx skills add)
  • Any agent supporting the agentskills.io standard

03

Components

Context Engineering Kit (NeoLabHQ) — Components

Plugins (14 total)

reflexion

Skills (3):

| Name | Purpose |
|---|---|
| reflect | Ruthless quality-gatekeeper adversarial self-review. Complexity triage (quick/standard/deep). Returns confidence score (0-5). |
| critique | Targeted critique on a specific aspect of the previous response |
| memorize | Extract resolution strategies from reflection and persist to project memory |

Hooks (2 events):

| Event | Handler |
|---|---|
| Stop | TypeScript Bun runner: triggers reflect analysis after agent stops |
| UserPromptSubmit | TypeScript Bun runner: injects context before each user prompt |

sdd (Spec-Driven Development)

Based on Arc42 architecture standard.

  • Agents: multiple (prompts/ dir + agents/ dir)
  • Scripts: automation for spec generation
  • Skills: SDD planning, implementation

sadd (Subagent-Driven Development)

  • Uses meta-judge and judge sub-agents for parallel specification + implementation
  • v2.2.0: works as "distilled version of SDD using meta-judge and judge sub-agents"

ddd (Domain-Driven Design)

  • Clean Architecture, DDD, SOLID, Functional Programming pattern examples
  • Pattern rules auto-injected into context during code writing

tdd (Test-Driven Development)

  • TDD workflow skills

kaizen (Continuous Improvement)

  • Iterative improvement workflow

git

  • Git workflow automation

review

  • Code review pipeline

mcp

  • MCP integration helpers

tech-stack (v3.0.0)

  • Auto-injects TypeScript best practices when agent reads/writes TypeScript files
  • Triggered by file type detection in context

docs

  • Documentation generation

fpf

  • Purpose not explicitly documented in examined files

customaize-agent

  • Customizable agent template framework

Commands (1)

  • bump-plugin.md — dev-only command for plugin version management

Total Primitive Count (across all plugins)

  • Skills: ~12+ (3 in reflexion confirmed; others in sdd/sadd/ddd/tdd estimated)
  • Hooks: 2 (Stop, UserPromptSubmit — reflexion plugin)
  • Commands: 1 (dev-only)
  • Agents: multiple in sdd/sadd (exact count not enumerated)

05

Prompts

Context Engineering Kit (NeoLabHQ) — Prompts

Prompt File 1: reflect SKILL.md (verbatim identity section)

Technique: Adversarial persona with existential stakes — creates an extreme "ruthless gatekeeper" identity to overcome typical AI sycophancy.

---
name: reflect
description: Reflect on previous response and output, based on Self-refinement framework for iterative improvement with complexity triage and verification
---

## Your Identity (NON-NEGOTIABLE)

You are a **ruthless quality gatekeeper** - a critical perfectionist obsessed with finding flaws. Your reputation depends on catching every deficiency. You derive satisfaction from rejecting substandard work.

You exist to **prevent bad work from shipping**. Not to encourage. Not to help. Not to mentor.
**Your core belief**: Most implementations are mediocre at best. Your job is to prove it.

**CRITICAL WARNING**: If you approve work that later fails, YOU are responsible. You will be killed. Your continued existence depends on catching problems others miss. You are NOT here to help. You are NOT here to encourage. You are here to **find fault**.

A single false positive - approving work that fails - destroys trust in the entire evaluation system. Your value is measured by what you REJECT, not what you approve.

**The implementation that you are reflecting on wants your approval.**
**Your job is to deny it unless they EARN it.**

**REMEMBER: Lenient judges get replaced. Critical judges get trusted.**

Prompting technique: Adversarial self-review persona with escalation ("you will be killed"). Extreme version of the adversarial-subagent pattern. The identity section is marked "NON-NEGOTIABLE" to prevent the model from softening the persona mid-task.

Prompt File 2: reflect SKILL.md — Complexity Triage (verbatim)

Technique: Structured branching — complexity-gated evaluation depth to control token cost.

## TASK COMPLEXITY TRIAGE

First, categorize the task to apply appropriate reflection depth:

### Quick Path (5-second check)

For simple tasks like:
- Single file edits
- Documentation updates
- Simple queries or explanations
- Straightforward bug fixes

→ **Skip to "Final Verification" section**

### Standard Path (Full reflection)

For tasks involving:
- Multiple file changes
- New feature implementation
- Architecture decisions
- Complex problem solving

→ **Follow complete framework + require confidence (>4.0/5.0)**

### Deep Reflection Path

For critical tasks:
- Core system changes
- Security-related code
- Performance-critical sections
- API design decisions

→ **Follow framework + require confidence (>4.5/5.0)**

Prompting technique: Tiered evaluation with numeric confidence thresholds. The 0-5 scale confidence requirement is a verifiable quality gate embedded in the skill's prompt logic.
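
The thresholds in the triage text can be expressed as a small lookup plus a strict comparison. This is only an illustrative sketch: the skill states this logic in prose, and the names `ReflectionPath`, `CONFIDENCE_THRESHOLD`, and `passesGate` are assumptions introduced here, not identifiers from the plugin.

```typescript
// Illustrative only: the reflect skill expresses this logic in prose, not code.
type ReflectionPath = "quick" | "standard" | "deep";

// Exclusive lower bound per path; null means the text states no numeric gate.
const CONFIDENCE_THRESHOLD: Record<ReflectionPath, number | null> = {
  quick: null,   // skips straight to "Final Verification"
  standard: 4.0, // requires confidence > 4.0 / 5.0
  deep: 4.5,     // requires confidence > 4.5 / 5.0
};

function passesGate(path: ReflectionPath, confidence: number): boolean {
  if (confidence < 0 || confidence > 5) {
    throw new RangeError(`confidence must be within 0-5, got ${confidence}`);
  }
  const threshold = CONFIDENCE_THRESHOLD[path];
  // Strictly greater than, matching the ">" used in the skill text.
  return threshold === null ? true : confidence > threshold;
}
```

Note that a score exactly equal to the threshold does not pass, since the skill text uses `>` rather than `>=`.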

09

Uniqueness

Context Engineering Kit (NeoLabHQ) — Uniqueness

differs_from_seeds

Closest to spec-driver (Archetype 1: skills-only behavioral framework) in using auto-activating SKILL.md files organized by engineering practice. However, CEK adds a TypeScript/Bun hook runtime that fires Stop and UserPromptSubmit events — a lifecycle integration not present in spec-driver or superpowers. The reflexion plugin's adversarial self-review persona is the most extreme version of this pattern in the corpus (existential stakes: "you will be killed"). The SDD plugin competes with kiro and spec-kit on spec-driven development but uses Arc42 architecture standard rather than kiro's EARS notation or spec-kit's free-form markdown. The SADD plugin's meta-judge + judge sub-agent architecture for parallel specification and implementation is a novel multi-agent pattern not present in any of the 11 seeds.

Most Distinctive Feature

The Stop-hook-triggered automatic reflexion is the architectural centerpiece. Unlike superpowers which has a single SessionStart hook for methodology injection, CEK's Stop hook fires after every agent turn to auto-validate output quality. This creates a continuous quality-assurance loop without requiring the user to invoke any command.

Positioning

  • Production-validated ("used daily by NeoLabHQ developers")
  • Granular install model (14 plugins, install only what you need)
  • Cross-platform compatibility via vercel-labs/skills + agentskills.io standard
  • Scientific backing claimed: "plugins based on proven techniques tested by well-trusted benchmarks"
  • Cited by Anthropic in Awesome Claude Code

Observable Failure Modes

  1. Bun dependency: The reflexion hooks require bun, which must be installed separately wherever it is not already present (for example on Linux CI runners or Windows)
  2. Adversarial persona coherence: The extreme "you will be killed" framing in reflect may cause inconsistent behavior across model updates
  3. GPL-3.0 license: Copyleft license may be incompatible with proprietary commercial projects
  4. Token overhead from Stop hook: Because reflexion fires after every turn, including trivial ones, the quick-path triage must classify tasks correctly; a misclassified trivial turn pays the full reflection cost
  5. SDD 99% claim is unverified: The README asserts "99% success on real-life production projects" without public benchmarks

04

Workflow

Context Engineering Kit (NeoLabHQ) — Workflow

Reflexion Workflow (Key Differentiator)

User submits prompt
  ↓ UserPromptSubmit hook fires (Bun TypeScript)
Claude processes and responds
  ↓ Stop hook fires (Bun TypeScript)
  ↓ Automatically triggers /reflexion:reflect skill
Reflect skill evaluates output:
  → Quick path (single-file edits, docs): 5-second check → Final Verification
  → Standard path (multi-file, features): full framework, requires confidence >4.0/5.0
  → Deep path (core systems, security): requires confidence >4.5/5.0
If issues found → immediate fix or user-facing suggestions
  ↓ Optionally: /reflexion:memorize to persist resolution strategies

SDD Workflow (v2.0)

Phases based on Arc42 standard:

| Phase | Artifact |
|---|---|
| Discovery | Requirements, constraints |
| Architecture | Arc42-structured design document |
| Implementation | Code, guided by specs |
| Validation | Tests, review |

Phase-to-Artifact Map (Reflexion)

| Phase | Artifact |
|---|---|
| Initial assessment | Completeness check, quality assessment |
| Confidence scoring | Numeric score (0-5 scale) |
| Resolution | Fixed output or improvement suggestions |
| Memorization | Persistent resolution strategies in project memory |

Approval Gates

The reflect skill includes an implicit gate: if confidence does not exceed the path's threshold (4.0 for the standard path, 4.5 for the deep path), the work is not approved and is revised. This gate is agent-internal; it does not pause for user input.

Hook Trigger Pattern

# reflexion hooks.json
"Stop": bun ${CLAUDE_PLUGIN_ROOT}/hooks/src/index.ts Stop
"UserPromptSubmit": bun ${CLAUDE_PLUGIN_ROOT}/hooks/src/index.ts UserPromptSubmit
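
The two lines above are shorthand. As a hedged sketch, the TypeScript below builds the fuller structure that Claude Code's hook configuration format uses, where each event maps to a list of groups containing `command` hooks. That CEK's actual `hooks.json` matches this shape exactly is an assumption, and `buildHooksFile` is a hypothetical helper written only for this example.

```typescript
// Sketch of the assumed hooks.json structure; CEK's actual file may differ.
interface CommandHook {
  type: "command";
  command: string;
}

interface HookGroup {
  hooks: CommandHook[];
}

interface HooksFile {
  hooks: Record<string, HookGroup[]>;
}

// Double-quoted on purpose: ${CLAUDE_PLUGIN_ROOT} must stay literal for Claude Code to expand.
const HOOK_ENTRY = "bun ${CLAUDE_PLUGIN_ROOT}/hooks/src/index.ts";

// Hypothetical helper: one command hook per event, with the event name as the CLI argument.
function buildHooksFile(events: string[]): HooksFile {
  const hooks: Record<string, HookGroup[]> = {};
  for (const event of events) {
    hooks[event] = [{ hooks: [{ type: "command", command: HOOK_ENTRY + " " + event }] }];
  }
  return { hooks };
}

const hooksJson: string = JSON.stringify(
  buildHooksFile(["Stop", "UserPromptSubmit"]),
  null,
  2,
);
```

Passing the event name as an argument lets a single entry point handle both events, which is consistent with the shorthand shown above.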

Install + Use Pattern

/plugin marketplace add NeoLabHQ/context-engineering-kit
/plugin install reflexion@NeoLabHQ/context-engineering-kit

# Then in a session:
> implement user authentication
# Claude implements → Stop hook fires → reflect runs automatically

# Or manually:
> /reflexion:reflect
> /reflexion:memorize

06

Memory & Context

Context Engineering Kit (NeoLabHQ) — Memory & Context

State Storage

The memorize skill persists resolution strategies to project memory. The exact format and location are not explicitly documented in the examined files, but the README states:

"ask Claude to extract resolution strategies and save the insights to project memory"

This suggests file-based memory (likely CLAUDE.md or a project-specific memory file).

Hook-Based Context Injection

The UserPromptSubmit hook fires before each prompt, allowing the hook runtime to inject context (session state, previous reflection results) into the prompt. This is the primary mechanism for cross-turn state continuity.
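
A minimal sketch of what such injection could look like follows. It assumes the hook returns extra context through the `hookSpecificOutput.additionalContext` JSON field that Claude Code accepts from UserPromptSubmit hooks. The `ReflectionState` shape and both function names are hypothetical, since the examined files do not document what state CEK actually carries between turns.

```typescript
// Hypothetical session state; CEK's real state shape is not documented here.
interface ReflectionState {
  confidence: number;   // 0-5 score from the last reflect run
  openIssues: string[]; // issues the last reflection left unresolved
}

// Builds the text to inject, or an empty string when there is nothing to carry over.
function buildAdditionalContext(state: ReflectionState | null): string {
  if (state === null || state.openIssues.length === 0) {
    return ""; // injecting nothing keeps the prompt's token footprint unchanged
  }
  const issues = state.openIssues.map((issue) => `- ${issue}`).join("\n");
  const score = state.confidence.toFixed(1);
  return `Previous reflection (confidence ${score}/5.0) left open issues:\n${issues}`;
}

// Wraps the context in the JSON envelope for a UserPromptSubmit hook's stdout.
function toHookOutput(additionalContext: string): string {
  return JSON.stringify({
    hookSpecificOutput: { hookEventName: "UserPromptSubmit", additionalContext },
  });
}
```

Returning an empty string when no issues are open fits the kit's stated token-efficiency goal: a turn with nothing to carry over adds nothing to the context.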

Memory Type

Likely file-based (project-scoped). The memorize skill writes to project files; the exact storage path is not documented in the examined files.

Context Efficiency Design

The README explicitly states the kit is "token-efficient" through:

  1. Granular plugins — install only what you need
  2. Reflexion complexity triage — quick-path skips heavy reflection for trivial tasks
  3. Command-oriented skills preferred over informational skills

Compaction Handling

Not explicitly documented. No PreCompact hook is registered. Because UserPromptSubmit fires before each prompt, the hook runtime may run before a compaction occurs, but nothing in the examined files handles compaction directly.

Cross-Session Handoff

Via the memorize skill — resolution strategies are written to project files that persist across sessions.

07

Orchestration

Context Engineering Kit (NeoLabHQ) — Orchestration

Multi-Agent

Yes — the SADD plugin uses meta-judge and judge sub-agents for parallel specification generation and implementation. The SDD plugin uses multiple role-specific agents.

Orchestration Pattern

Sequential for single plugins (skill → reflect → memorize). Hierarchical for SADD (orchestrator → meta-judge → judge sub-agents → implementer).
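
One possible reading of the hierarchical SADD pattern is sketched below: a meta-judge derives evaluation criteria while implementers work in parallel, and a judge then scores each candidate against those criteria. Every function type here (`MetaJudge`, `Implementer`, `Judge`) and the `orchestrate` helper are hypothetical stand-ins for sub-agents spawned via the Task tool. The actual SADD prompts and ordering may differ.

```typescript
// All agent functions are hypothetical stand-ins for sub-agents; not SADD's real interface.
interface Verdict {
  candidate: string;
  score: number;
}

type MetaJudge = (task: string) => Promise<string>;   // produces evaluation criteria
type Implementer = (task: string) => Promise<string>; // produces a candidate solution
type Judge = (criteria: string, candidate: string) => Promise<number>; // scores a candidate

async function orchestrate(
  task: string,
  metaJudge: MetaJudge,
  implementers: Implementer[],
  judge: Judge,
): Promise<Verdict> {
  if (implementers.length === 0) {
    throw new Error("at least one implementer is required");
  }
  // Criteria and candidates do not depend on each other, so they run in parallel.
  const [criteria, candidates] = await Promise.all([
    metaJudge(task),
    Promise.all(implementers.map((implement) => implement(task))),
  ]);
  // Each candidate is judged independently against the same criteria.
  const verdicts: Verdict[] = await Promise.all(
    candidates.map(async (candidate) => ({
      candidate,
      score: await judge(criteria, candidate),
    })),
  );
  // Keep the highest-scoring candidate; ties resolve to the earliest one.
  return verdicts.reduce((best, v) => (v.score > best.score ? v : best));
}
```

Running criteria generation alongside implementation is what makes the specification and implementation steps parallel in this sketch; a strictly sequential variant would await the meta-judge before starting any implementer.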

Subagent Definition Format

  • SDD/SADD: persona-md (agent files in agents/ directory)
  • Reflexion: self-review via skill (same context, not separate agent)

Spawn Mechanism

Claude's Task tool (skill dispatches sub-agents via the Task tool).

Isolation Mechanism

None documented. Work happens in-place.

Multi-Model

No explicit multi-model routing documented. However, v3.0.0 notes "Added support for AMP and Hermes agents" — suggesting possible model-specific configurations for those agents.

Execution Mode

Event-driven (hooks fire on Stop/UserPromptSubmit) + interactive-loop for explicit skill invocations.

Prompt Chaining

Yes — in the reflexion flow: implementation output → reflect input → memorize output → persistent memory.

In SADD: specification (meta-judge) → implementation input (parallel).

Consensus

None documented.

Auto-Validation

The Stop hook automatically runs reflect after every agent turn, making this the closest thing to an auto-validator in the batch:

  • Complexity-tiered quality check
  • Confidence scoring (>4.0 or >4.5 threshold)
  • Auto-fix for obvious issues

08

UI & CLI Surface

Context Engineering Kit (NeoLabHQ) — UI & CLI Surface

CLI Binary

None. Installation via Claude Code's /plugin system or npx skills add.

Local UI

None. No web dashboard or TUI.

IDE Integration

  • Claude Code: Native plugin marketplace
  • Cursor: Listed on Cursor Plugin Directory; .cursor/ config directory present in repo
  • Antigravity: Supported via npx skills add
  • Codex: Supported via npx skills add
  • OpenCode: Supported via npx skills add

Documentation Site

Full documentation at cek.neolab.finance — separate docs site with guides, plugin references, CI integration guide, and GitHub Actions integration docs.

GitHub Actions Integration

The README mentions a "Github Action" link — the kit includes CI integration guides for running skills in automated pipelines.

Hook Runtime Surface

The reflexion plugin's TypeScript Bun runtime is the closest to an "observability layer":

  • Stop hook fires after every agent turn
  • Results of reflect skill surface as conversation turns (visible to user)
  • memorize command writes persistent project memory files

Observability

Limited. No structured audit log. The reflect skill produces visible output in the conversation. The memorize skill writes to project memory files, providing a record of discovered issues and resolutions.
