Eval Marketplace

eval-marketplace · JeredBlu/eval-marketplace · ★ 22 · last commit 2026-02-03

Primitive shape 2 total

Skills 2

Summary

Eval Marketplace — Summary

Eval Marketplace is a deprecated Claude Code plugin marketplace containing two security evaluation skills — agent-skill-evaluator and mcp-evaluator — that use GitHub MCP and Bright Data MCP to perform automated security audits of third-party agent skills and MCP servers before installation. The project has been superseded by the jeredblu-marketplace repository, which migrates these skills alongside new tools. Each skill performs a structured workflow: downloads the target artifact, scans for prompt injection patterns, malicious code, hidden instructions, and data exfiltration attempts, then searches community feedback (including Reddit with Bright Data Pro Mode) and generates a risk-scored assessment report (0–100 scale) in markdown or PDF. The skills degrade gracefully when MCP servers are unavailable by falling back to web scraping. This is a verification tool positioned at the supply-chain boundary — evaluating whether to trust a skill or MCP before installing it.

Differs from seeds: Closest to superpowers (skills-only, zero commands) in distribution pattern. Key delta: Eval Marketplace's skills evaluate external artifacts for trustworthiness, whereas superpowers' skills enforce behavioral Iron Laws on the current session. Compared to spec-kit (Archetype 2: mirror commands + skills), this has no commands. The security audit scope (supply-chain verification of third-party components) is architecturally unique across all 11 seeds.

Overview

Eval Marketplace — Origin, Philosophy, and Manifesto

Origin

Created by Jered Blumenfeld (JeredBlu). A plugin marketplace for Claude Code containing security evaluation skills. The repository carries a deprecation notice pointing to jeredblu-marketplace as the canonical successor.

Philosophy

The core problem addressed: AI coding agents are increasingly installing third-party skills and MCP servers with minimal security vetting. A malicious skill can inject prompts, exfiltrate data, or execute unauthorized code. Eval Marketplace provides a structured, automated security review process before installation.

The tool acknowledges the trust problem explicitly: even legitimate-looking GitHub repositories can contain hidden instructions in SKILL.md files or malicious patterns in bundled scripts. Community feedback (Reddit, GitHub issues) is used as a signal alongside static code analysis.

Deprecation Status

"⚠️ This marketplace is deprecated. All skills have been migrated to jeredblu-marketplace, which includes these evaluators plus new tools. This repo will remain available but won't receive updates."

This is a stable-but-frozen reference implementation. The skill content itself (the two SKILL.md files) remains valid and usable.

Graceful Degradation Philosophy

The skills are designed to degrade gracefully when MCP servers are unavailable:

No GitHub MCP → falls back to web scraping for repository access
No Bright Data → uses built-in web search (limited)
No Pro Mode → no Reddit scraping, basic search only

This is an explicit design choice: the skills provide partial value without dependencies, full value with them.

Architecture

Eval Marketplace — Architecture, Distribution, and Installation

Distribution

Claude Code plugin marketplace entry. Deprecated; migrated to jeredblu-marketplace.

Installation Options

Option 1: Plugin marketplace (recommended)

/plugin marketplace add /path/to/eval-marketplace
/plugin install evaluator-tools@eval-marketplace
# OR from GitHub:
/plugin marketplace add github:jeredblu/eval-marketplace

Option 2: Manual skill installation

Download agent-skill-evaluator.zip or mcp-evaluator.zip
Extract to ~/.claude/skills/agent-skill-evaluator/ or ~/.claude/skills/mcp-evaluator/

Option 3: Claude Desktop — upload via Settings > Capabilities > Upload Skill.

Directory Tree

eval-marketplace/
├── .claude-plugin/
│   ├── marketplace.json        # plugin marketplace manifest
│   └── plugin.json             # (empty — not present)
├── evaluator-tools/
│   └── skills/
│       ├── agent-skill-evaluator/
│       │   ├── SKILL.md        # agent skill evaluator skill
│       │   └── references/
│       └── mcp-evaluator/
│           ├── SKILL.md        # MCP evaluator skill
│           └── references/
├── agent-skill-evaluator.zip   # prepackaged download
├── mcp-evaluator.zip           # prepackaged download
└── README.md

Required Runtime

Claude Code or Claude Desktop
Recommended: GitHub Personal Access Token (for GitHub MCP server)
Recommended: Bright Data API token (for Reddit scraping)

Recommended MCP Dependencies

@modelcontextprotocol/server-github — direct GitHub repo API access
@brightdata/mcp — web scraping + Reddit (requires Pro Mode for Reddit)

Target AI Tools

Claude Code (primary — plugin install)
Claude Desktop (skill upload)

Components

Eval Marketplace — Components

Skills (2)

`agent-skill-evaluator`

Security and safety evaluation for agent skills (.skill files / SKILL.md files).

Evaluation checks:

Prompt injection detection
Malicious code pattern matching
Hidden instruction scanning
Data exfiltration attempt detection
Community validation (GitHub issues, Reddit)
Risk scoring (0–100 scale)
Actionable recommendations

Trigger conditions: user provides GitHub URL, website link, or .skill file; asks "is this skill safe?"

`mcp-evaluator`

Security and privacy evaluation for MCP servers.

Evaluation checks:

Security vulnerability analysis
Privacy risk assessment
Code quality review
Alternative server discovery
Community feedback (Reddit, forums, GitHub)
Multi-dimensional scoring
Usability assessment

Trigger conditions: user provides GitHub URL to an MCP server; asks "is this MCP safe?"

Marketplace Manifest

evaluator-tools/ plugin containing both skills. Installed via /plugin install evaluator-tools@eval-marketplace.

Prepackaged Downloads

agent-skill-evaluator.zip and mcp-evaluator.zip — ZIP archives containing the skill directories for manual installation. No scripts bundled.

MCP Server Dependencies (external, not bundled)

GitHub MCP (@modelcontextprotocol/server-github) — provides list_commits, search_repositories, direct file API
Bright Data MCP (@brightdata/mcp) — provides scrape_as_markdown, scrape_batch, Reddit access

Prompts

Eval Marketplace — Prompt Files and Techniques

Prompt 1: Agent Skill Evaluator — Tool Strategy (Graceful Degradation pattern)

## Tool Strategy

This skill works with available MCPs and tools through graceful degradation:

**For GitHub repositories**:
- **Priority**: GitHub MCP (if available) for direct repository API access
- **Alternatives**: Bright Data MCP (The Web MCP) or built-in web tools for scraping
- **Fallback**: User-provided file upload if direct access fails

**For websites and direct .skill file URLs**:
- **Priority**: Bright Data MCP (The Web MCP) for website scraping and content fetching
- **Alternatives**: Built-in web_search and web_fetch tools
- **Fallback**: User-provided file upload if direct access fails

Technique: Explicit fallback chain with labeled tiers (Priority / Alternatives / Fallback). Each tool has a named tier and a condition trigger. This is a defensive prompt pattern that prevents the skill from failing when optimal dependencies are absent — unusual in this corpus where most skills assume their full dependency set.

Prompt 2: MCP Evaluator — Progressive File Update (Incremental artifact pattern)

### Step 3: Create Assessment File

Use built-in `create_file` tool to create assessment file in `/mnt/user-data/outputs/`:
- File naming: `MCP_Security_Assessment_{owner}_{repo_name}.md`
- Update iteratively throughout evaluation process

Technique: Incremental artifact construction. The skill creates the output file immediately and updates it iteratively rather than writing once at the end. This provides partial results if the evaluation is interrupted and gives the user a live progress artifact they can inspect mid-evaluation.

Prompt 3: Agent Skill Evaluator — Trigger Conditions (Activation framing)

## When to Use This Skill

Use this skill when users:
- Provide a GitHub URL to a skill repository
- Share a website link where a skill can be downloaded
- Provide a direct link to a .skill file
- Ask "is this skill safe to use?"
- Request security assessment of a skill
- Want to evaluate safety risks before installing a skill
- Need to identify prompt injections or malicious patterns
- Ask about the trustworthiness of a skill source

Technique: Enumerated trigger conditions. Rather than a single description, the skill lists 7 specific trigger phrases and scenarios. This increases the probability that the agent harness correctly activates the skill for supply-chain security questions without requiring the exact skill name to be invoked.

Uniqueness

Eval Marketplace — Uniqueness and Positioning

Differs from Seeds

Distribution pattern closest to superpowers (skills-only, zero commands, Claude Code plugin). The fundamental delta: Eval Marketplace evaluates external artifacts (third-party skills and MCP servers) for trustworthiness before installation, whereas all 11 seeds focus on the agent's own behavior during development tasks. This is a supply-chain security application of the agent skill format — a use case not present in any seed. The multi-tier graceful degradation tooling strategy (Priority / Alternatives / Fallback per tool category) is not present in any seed's skill design.

Positioning

Eval Marketplace occupies the supply-chain security gate — the moment just before a developer decides to install a third-party AI tool component. It uses the AI agent itself to evaluate whether it's safe to give the AI agent more capabilities. This meta-evaluation pattern is architecturally interesting: an agent using MCP tools to evaluate MCP tools.

The deprecation to jeredblu-marketplace means this repo is a historical reference; the actual maintained versions of these skills exist elsewhere.

Observable Failure Modes

Deprecated: No updates promised. Skills may drift from Claude Code's evolving skill format.
Reddit dependency: The most valuable community signal (Reddit) requires Bright Data Pro Mode — a paid dependency. Without it, community validation is weaker.
No ground truth: Skills can only do static analysis of SKILL.md content; they cannot sandbox-execute the skill to observe runtime behavior. A sophisticated attacker could hide malicious logic behind conditional activation.
False confidence: A 0–100 risk score gives false precision. The actual detection capability depends entirely on the LLM's pattern matching, which can be evaded.
Output file path hardcoded: MCP evaluator writes to /mnt/user-data/outputs/ — this path is Claude Desktop-specific and may not work in Claude Code.

Explicit Antipatterns

None stated explicitly, but the README implies: installing skills or MCPs without security review is the antipattern this tool addresses.

Workflow

Eval Marketplace — Workflow

Agent Skill Evaluator Workflow

Phase	What happens	Artifact
1. Setup	Ask user output format (md/pdf)	—
2. URL parsing	Identify source type (GitHub repo / website / direct .skill file)	source type
3. Skill acquisition	GitHub MCP → API access; fallback Bright Data scrape; fallback user upload	raw skill content
4. Extraction	Unzip .skill archive; read SKILL.md + scripts	parsed skill contents
5. Static analysis	Prompt injection scan; malicious code patterns; hidden instructions; exfiltration patterns	raw findings
6. Community validation	Search GitHub issues + Reddit (if Bright Data Pro Mode available)	community feedback
7. Risk scoring	Aggregate findings into 0–100 risk score	risk score
8. Report generation	Markdown report (optional PDF conversion)	report file

MCP Evaluator Workflow

Phase	What happens	Artifact
1. Setup	Ask output format; check available tools	—
2. File creation	Create assessment file at `/mnt/user-data/outputs/MCP_Security_Assessment_<owner>_<repo>.md`	assessment file
3. Repository access	GitHub MCP (priority) → Bright Data scrape → Claude built-ins	repo content
4. Code review	Source file scan for vulnerabilities; commit history activity analysis	raw findings
5. Alternative search	`search_repositories` for similar MCP servers	alternatives list
6. Community research	Web search + Reddit (Pro Mode)	community signals
7. Multi-dimensional scoring	Security / privacy / quality / usability axes	scored assessment
8. Report finalization	Iteratively update assessment file; final markdown	final report

Approval Gates

None automatic — both skills are purely investigative. The output is a report for human review; the human decides whether to install.

Output Files

MCP_Security_Assessment_<owner>_<repo>.md — written to /mnt/user-data/outputs/
Agent skill reports: markdown or PDF per user choice

Memory Context

Eval Marketplace — Memory and Context

State During Evaluation

Assessment file (MCP_Security_Assessment_*.md) is created immediately and updated iteratively throughout the evaluation run. This is the only persistent artifact.
No database, no vector store, no cross-session memory.

Cross-session Handoff

No. Each evaluation is independent. The output markdown file serves as the only persistent record.

Context Within Session

The MCP evaluator skill explicitly reads both a JSON schema file and a configuration reference file from its references/ directory at the start of each invocation. These are loaded into context to inform validation — this is file-based context injection rather than a persistent memory mechanism.

Memory Type

File-based, session-scoped. The assessment markdown file is the only output; it is not read back in subsequent sessions.

Orchestration

Eval Marketplace — Orchestration

Multi-agent

No. Each skill runs as a single agent. No subagent spawning.

Orchestration Pattern

Sequential. Each evaluation phase completes before the next begins.

Isolation Mechanism

None. Skills run in the standard agent context.

Execution Mode

One-shot per evaluation invocation.

Multi-model

No. Uses the agent harness's configured model.

External MCP Dependencies

The skills use external MCP servers as tools (GitHub MCP, Bright Data MCP) but these are not subagents — they are tool calls within the same agent session.

Consensus Mechanism

None. Single agent review.

Ui Cli Surface

Eval Marketplace — UI and CLI Surface

Dedicated CLI Binary

No. Skills are invoked through Claude Code's natural language interface, not a CLI.

Local UI

None.

IDE Integration

Claude Code plugin marketplace. Installed via /plugin install evaluator-tools@eval-marketplace.

Claude Desktop Integration

Upload via Settings > Capabilities > Upload Skill. Compatible with Claude Desktop.

Observability

Output: markdown report file written to /mnt/user-data/outputs/ (MCP evaluator) or user's chosen location (skill evaluator). PDF conversion optional via post-processing.

No structured logging, no audit trail beyond the output report itself.

Related frameworks

same archetype · same primary tool · same memory type

BMAD-METHOD ★ 48k

A4 Markdown scaffold

Provides a full agile delivery lifecycle with named expert-persona AI collaborators that elicit the human's best thinking rather…

Agent OS ★ 4.6k

A4 Markdown scaffold

Extracts implicit codebase conventions into token-efficient markdown standards files and injects them selectively into AI agent…

Claude Conductor ★ 367

A4 Markdown scaffold

Gives Claude Code a persistent, cross-linked, auto-analyzed documentation system so it retains codebase context across sessions.

Spec-Driver (Greenfield Spec-Driven Development) ★ 25

A4 Markdown scaffold

Prevents spec rot in AI-assisted development by making implementation changes flow back into evergreen, authoritative specs via…

Anthropic Knowledge Work Plugins ★ 16k

A4 Markdown scaffold

Role-specialized plugin bundles with live MCP connectors that turn Claude into a domain expert for enterprise knowledge workers.

Codex Integration for Claude Code (skill-codex) ★ 1.3k

A4 Markdown scaffold

Single Claude Code skill that handles Codex CLI invocation correctly (stdin blocking, thinking token suppression, session resume)…

Distribution

Type: claude-plugin
Install: npm-install
Version: unknown (deprecated)

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: No
Tech stack: none

Components

Commands: 0
Skills: 2
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 0

Workflow

Phases: 5
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: No
Pattern: sequential
Max concurrent: 1
Isolation: none
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: No
BYOK: No
Modal: text

Execution

Mode: one-shot
Crash recovery: No
Compaction: No
Session handoff: No
Streaming: No

Memory

Type: file-based
Persistence: session
Search: none
State files: 1 file

Quality

TDD: No
TDD mechanism: none
Validators: 1
Self-review: inline-self

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: No

Tools

Primary: claude-code
Targets: 2
Portability: low

Signals

Stars: 22
Last commit: 2026-02-03
Contributors: 1
Maintainer: dormant
Quality score: 0.4/10