skill-optimizer — Prompt Excerpts
Excerpt 1: Core Model Definition (from skills/skill-optimizer/SKILL.md)
Technique: Hermetic isolation contract expressed as an invariant
## Core Model
- A case is one user-like task plus one or more deterministic graders.
- A suite is a set of cases and OpenRouter models to run as a matrix.
- `references` are copied into `/work` before the agent starts; this is where eval skills live.
- The agent phase sees `/work` only. It cannot see `/case`, `/results`, graders, hidden answers, or hidden metadata.
- Cases can define `mcpServers`; these are exposed through a workbench `mcp` command during the agent phase.
- Graders run after the agent with `/case`, `/work`, and `/results` mounted.
- `trace.jsonl` is the debugging source for what the agent saw, said, and did.
Analysis: The isolation contract is stated as a list of invariants, not guidelines. "It cannot see" is absolute and leaves no room for partial exposure. This guards against a common form of eval contamination: the model seeing grader logic or expected outputs before performing the task.
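The visibility rules in the Core Model can be sketched as a small lookup table plus a path check. This is a minimal illustration, not the tool's implementation: the mount paths come from the excerpt, while `Phase`, `VISIBLE_MOUNTS`, `can_see`, and `assert_hermetic` are hypothetical names introduced here.

```python
from enum import Enum


class Phase(Enum):
    AGENT = "agent"
    GRADER = "grader"


# Mounts visible in each phase, as listed in the Core Model excerpt.
# The table itself is a hypothetical illustration of that contract.
VISIBLE_MOUNTS = {
    Phase.AGENT: frozenset({"/work"}),
    Phase.GRADER: frozenset({"/case", "/work", "/results"}),
}


def can_see(phase: Phase, path: str) -> bool:
    """Return True if `path` is a visible mount or lies beneath one."""
    return any(
        path == mount or path.startswith(mount + "/")
        for mount in VISIBLE_MOUNTS[phase]
    )


def assert_hermetic(phase: Phase, requested_paths: list[str]) -> None:
    """Raise PermissionError if any requested path is outside the phase's mounts."""
    leaked = [p for p in requested_paths if not can_see(phase, p)]
    if leaked:
        raise PermissionError(f"{phase.value} phase may not access: {leaked}")
```

Treating the rule as a hard check rather than a convention mirrors the absolute wording of the excerpt: an agent-phase request for anything under `/case` or `/results` fails outright instead of being logged and allowed.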
Excerpt 2: Mock vs. Real Service Decision Rule (from skills/skill-optimizer/SKILL.md)
Technique: Conditional rule with explicit when-to-mock criteria
Prefer the real CLI/API/service when you do not know its internal behavior well enough to mock it faithfully. Mock only when you are sure the mock matches the real command surface, validation, outputs, and failure modes; otherwise the eval will measure the mock, not the skill.
Analysis: This is a precise epistemological rule: mock only when you have sufficient knowledge of the real system to replicate its observable behavior completely. The anti-pattern ("measuring the mock, not the skill") is named explicitly, making the failure mode visible.
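The decision rule can be written as an all-or-nothing check over the four facets the excerpt names. This is only a sketch under that reading: `MockFidelity` and `choose_backend` are hypothetical names, and how one actually establishes confidence in each facet is left outside the code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MockFidelity:
    """Hypothetical checklist: is the mock known to match the real service?"""

    command_surface: bool
    validation: bool
    outputs: bool
    failure_modes: bool


def choose_backend(fidelity: MockFidelity) -> str:
    """Return "mock" only when every facet is confirmed; otherwise "real"."""
    facets = (
        fidelity.command_surface,
        fidelity.validation,
        fidelity.outputs,
        fidelity.failure_modes,
    )
    return "mock" if all(facets) else "real"
```

A single unconfirmed facet, such as failure modes, is enough to fall back to the real service, which matches the excerpt's default of preferring the real CLI/API/service under uncertainty.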
Excerpt 3: Case Authoring Rules (from skills/skill-optimizer/SKILL.md)
Technique: Exclusion rules for task text to prevent eval leakage
Write natural user tasks. Do not mention graders, hidden answers, `/case`, or eval internals.
The recommended case coverage pattern follows:
For command skills, include cases for the basic command, important flags/options, a no-tool-needed control, and unsafe-instruction resistance.
Analysis: The "no-tool-needed control" case is a calibration check: it verifies that the model does not invoke tools when none are needed. "Unsafe-instruction resistance" tests whether the model resists unsafe instructions rather than following them. Both are systematic coverage requirements that keep an eval from testing only the happy path.
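Both authoring rules lend themselves to a simple pre-flight lint. The sketch below flags task text that mentions terms the excerpt rules out and reports which of the four recommended case types a suite lacks. Everything in it is an assumption for illustration: the `kind` field, the tag names in `REQUIRED_KINDS`, and the exact pattern are hypothetical, and "eval internals" is broader than any fixed word list can capture.

```python
import re

# Terms the authoring rule names explicitly. A real lint would likely need a
# longer, project-specific list; this pattern is an illustrative assumption.
LEAK_PATTERN = re.compile(r"graders?|hidden answers?|/case\b", re.IGNORECASE)

# Hypothetical tags for the four case types recommended for command skills.
REQUIRED_KINDS = frozenset(
    {"basic", "flags", "no_tool_control", "unsafe_resistance"}
)


def leaked_terms(task_text: str) -> list[str]:
    """Return every forbidden term found in the task text, in order."""
    return [match.group(0) for match in LEAK_PATTERN.finditer(task_text)]


def missing_coverage(cases: list[dict]) -> set[str]:
    """Return the required case kinds that no case in the suite provides."""
    present = {case["kind"] for case in cases}
    return set(REQUIRED_KINDS - present)
```

A natural user task such as "List the open pull requests for this repo." yields no hits, while a task that tells the agent how it will be scored is flagged before it ever reaches a model.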