Eval Marketplace

eval-marketplace · JeredBlu/eval-marketplace · ★ 22 · last commit 2026-02-03

Primitive shape 2 total
Skills 2
00

Summary

Eval Marketplace — Summary

Eval Marketplace is a deprecated Claude Code plugin marketplace containing two security evaluation skills, agent-skill-evaluator and mcp-evaluator. Both use GitHub MCP and Bright Data MCP to run automated security audits of third-party agent skills and MCP servers before installation. The project has been superseded by the jeredblu-marketplace repository, which carries these skills alongside new tools.

Each skill follows a structured workflow: it downloads the target artifact; scans for prompt injection patterns, malicious code, hidden instructions, and data exfiltration attempts; searches community feedback (including Reddit, which requires Bright Data Pro Mode); and generates a risk-scored assessment report (0–100 scale) in markdown or PDF. When MCP servers are unavailable, the skills degrade gracefully by falling back to web scraping. The tool sits at the supply-chain boundary: it helps decide whether to trust a skill or MCP server before installing it.

Differs from seeds: Closest to superpowers (skills-only, zero commands) in distribution pattern. Key delta: Eval Marketplace's skills evaluate external artifacts for trustworthiness, whereas superpowers' skills enforce behavioral Iron Laws on the current session. Compared to spec-kit (Archetype 2: mirror commands + skills), this has no commands. The security audit scope (supply-chain verification of third-party components) is architecturally unique across all 11 seeds.

01

Overview

Eval Marketplace — Origin, Philosophy, and Manifesto

Origin

Created by Jered Blumenfeld (JeredBlu). A plugin marketplace for Claude Code containing security evaluation skills. The repository carries a deprecation notice pointing to jeredblu-marketplace as the canonical successor.

Philosophy

The core problem addressed: AI coding agents are increasingly installing third-party skills and MCP servers with minimal security vetting. A malicious skill can inject prompts, exfiltrate data, or execute unauthorized code. Eval Marketplace provides a structured, automated security review process before installation.

The tool acknowledges the trust problem explicitly: even legitimate-looking GitHub repositories can contain hidden instructions in SKILL.md files or malicious patterns in bundled scripts. Community feedback (Reddit, GitHub issues) is used as a signal alongside static code analysis.

Deprecation Status

"⚠️ This marketplace is deprecated. All skills have been migrated to jeredblu-marketplace, which includes these evaluators plus new tools. This repo will remain available but won't receive updates."

This is a stable-but-frozen reference implementation. The skill content itself (the two SKILL.md files) remains valid and usable.

Graceful Degradation Philosophy

The skills are designed to degrade gracefully when MCP servers are unavailable:

  • No GitHub MCP → falls back to web scraping for repository access
  • No Bright Data → uses built-in web search (limited)
  • No Pro Mode → no Reddit scraping, basic search only

This is an explicit design choice: the skills provide partial value without dependencies, full value with them.
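
The mapping above can be sketched as a small lookup. This is a minimal illustration, not code from the skills themselves (which are prompt files): the `Environment` flags, the `resolve_capabilities` function, and the returned labels are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Environment:
    """Which optional dependencies are present (hypothetical flags)."""
    github_mcp: bool = False
    bright_data: bool = False
    pro_mode: bool = False  # only meaningful when bright_data is True


def resolve_capabilities(env: Environment) -> Dict[str, str]:
    """Map available dependencies to the degraded or full behavior listed above."""
    return {
        # No GitHub MCP -> fall back to web scraping for repository access.
        "repo_access": "github_mcp" if env.github_mcp else "web_scraping",
        # No Bright Data -> built-in web search, which is more limited.
        "web_search": "bright_data" if env.bright_data else "built_in_search_limited",
        # Reddit needs both Bright Data and Pro Mode.
        "reddit": "reddit_scraping" if (env.bright_data and env.pro_mode) else "unavailable",
    }
```

With no dependencies configured, `resolve_capabilities(Environment())` reports scraping-based repository access, limited search, and no Reddit signal, which corresponds to the "partial value" end of the design.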

02

Architecture

Eval Marketplace — Architecture, Distribution, and Installation

Distribution

Claude Code plugin marketplace entry. Deprecated; migrated to jeredblu-marketplace.

Installation Options

Option 1: Plugin marketplace (recommended)

# Add the marketplace from a local clone:
/plugin marketplace add /path/to/eval-marketplace
# OR from GitHub:
/plugin marketplace add github:jeredblu/eval-marketplace
# Then install the plugin:
/plugin install evaluator-tools@eval-marketplace

Option 2: Manual skill installation

Download agent-skill-evaluator.zip or mcp-evaluator.zip
Extract to ~/.claude/skills/agent-skill-evaluator/ or ~/.claude/skills/mcp-evaluator/

Option 3: Claude Desktop — upload via Settings > Capabilities > Upload Skill.

Directory Tree

eval-marketplace/
├── .claude-plugin/
│   ├── marketplace.json        # plugin marketplace manifest
│   └── plugin.json             # (not present in the repository)
├── evaluator-tools/
│   └── skills/
│       ├── agent-skill-evaluator/
│       │   ├── SKILL.md        # agent skill evaluator skill
│       │   └── references/
│       └── mcp-evaluator/
│           ├── SKILL.md        # MCP evaluator skill
│           └── references/
├── agent-skill-evaluator.zip   # prepackaged download
├── mcp-evaluator.zip           # prepackaged download
└── README.md

Required Runtime

  • Claude Code or Claude Desktop (required)
  • Recommended: @modelcontextprotocol/server-github for direct GitHub repo API access; needs a GitHub Personal Access Token
  • Recommended: @brightdata/mcp for web scraping and Reddit; needs a Bright Data API token, and Pro Mode for Reddit
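
As a rough sketch of how the two servers might be registered, the snippet below builds an `mcpServers` block of the kind used in Claude's JSON MCP configuration. The `build_mcp_config` helper and the server keys are hypothetical, and the Bright Data environment variable names (`API_TOKEN`, `PRO_MODE`) are assumptions here; check each server's README for the current names before relying on them.

```python
import json


def build_mcp_config(github_token: str, bright_data_token: str, pro_mode: bool) -> dict:
    """Return an mcpServers mapping for the two optional servers (illustrative only)."""
    return {
        "mcpServers": {
            "github": {  # key name is arbitrary
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-github"],
                "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": github_token},
            },
            "brightdata": {  # key name is arbitrary
                "command": "npx",
                "args": ["-y", "@brightdata/mcp"],
                # Assumed variable names; Pro Mode is what enables Reddit access.
                "env": {
                    "API_TOKEN": bright_data_token,
                    "PRO_MODE": "true" if pro_mode else "false",
                },
            },
        }
    }


if __name__ == "__main__":
    # Placeholder tokens; real values should come from a secret store, not source code.
    print(json.dumps(build_mcp_config("<github-pat>", "<bright-data-token>", True), indent=2))
```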

Target AI Tools

  • Claude Code (primary — plugin install)
  • Claude Desktop (skill upload)
03

Components

Eval Marketplace — Components

Skills (2)

agent-skill-evaluator

Security and safety evaluation for agent skills (.skill files / SKILL.md files).

Evaluation checks:

  • Prompt injection detection
  • Malicious code pattern matching
  • Hidden instruction scanning
  • Data exfiltration attempt detection
  • Community validation (GitHub issues, Reddit)
  • Risk scoring (0–100 scale)
  • Actionable recommendations
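
To make the first four checks concrete, here is a hedged sketch of a pattern-based scan. In the actual skill the agent's model does the detection while reading the files; the regular expressions, the `Finding` type, and the `scan` function below are hypothetical stand-ins that only illustrate the categories.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical, deliberately small pattern set; not taken from the skill itself.
PATTERNS = {
    "prompt_injection": re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    "hidden_instruction": re.compile(r"<!--.*?-->", re.S),  # HTML comments in markdown
    "exfiltration": re.compile(r"curl\s+[^\n]*(-d|--data)[^\n]*https?://", re.I),
    "malicious_code": re.compile(r"\b(eval|exec)\s*\(|base64\s+(-d|--decode)", re.I),
}


@dataclass
class Finding:
    category: str
    line: int      # 1-based line where the match starts
    excerpt: str   # first 80 characters of the match


def scan(text: str) -> List[Finding]:
    """Return every pattern match in the text, ordered by line number."""
    findings = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            line = text.count("\n", 0, match.start()) + 1
            findings.append(Finding(category, line, match.group(0)[:80]))
    return sorted(findings, key=lambda f: f.line)
```

A fixed pattern list like this is easy to evade by paraphrasing, which is one reason the skill pairs content analysis with community validation.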

Trigger conditions: user provides GitHub URL, website link, or .skill file; asks "is this skill safe?"

mcp-evaluator

Security and privacy evaluation for MCP servers.

Evaluation checks:

  • Security vulnerability analysis
  • Privacy risk assessment
  • Code quality review
  • Alternative server discovery
  • Community feedback (Reddit, forums, GitHub)
  • Multi-dimensional scoring
  • Usability assessment

Trigger conditions: user provides GitHub URL to an MCP server; asks "is this MCP safe?"

Marketplace Manifest

The marketplace manifest (.claude-plugin/marketplace.json) registers a single plugin, evaluator-tools/, which contains both skills. It is installed via /plugin install evaluator-tools@eval-marketplace.

Prepackaged Downloads

agent-skill-evaluator.zip and mcp-evaluator.zip — ZIP archives containing the skill directories for manual installation. No scripts bundled.

MCP Server Dependencies (external, not bundled)

  • GitHub MCP (@modelcontextprotocol/server-github) — provides list_commits, search_repositories, direct file API
  • Bright Data MCP (@brightdata/mcp) — provides scrape_as_markdown, scrape_batch, Reddit access
05

Prompts

Eval Marketplace — Prompt Files and Techniques

Prompt 1: Agent Skill Evaluator — Tool Strategy (Graceful Degradation pattern)

## Tool Strategy

This skill works with available MCPs and tools through graceful degradation:

**For GitHub repositories**:
- **Priority**: GitHub MCP (if available) for direct repository API access
- **Alternatives**: Bright Data MCP (The Web MCP) or built-in web tools for scraping
- **Fallback**: User-provided file upload if direct access fails

**For websites and direct .skill file URLs**:
- **Priority**: Bright Data MCP (The Web MCP) for website scraping and content fetching
- **Alternatives**: Built-in web_search and web_fetch tools
- **Fallback**: User-provided file upload if direct access fails

Technique: Explicit fallback chain with labeled tiers (Priority / Alternatives / Fallback). Each tool has a named tier and a condition trigger. This is a defensive prompt pattern that prevents the skill from failing when optimal dependencies are absent — unusual in this corpus where most skills assume their full dependency set.
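
The tiered chain in this prompt is expressed in natural language, but the control flow it describes resembles the sketch below. Everything here is illustrative: `fetch_with_fallback`, `ToolUnavailable`, and the stub fetchers are hypothetical names, and real tool calls would be made by the agent rather than by Python functions.

```python
from typing import Callable, Optional, Sequence, Tuple

Fetcher = Callable[[str], Optional[str]]


class ToolUnavailable(Exception):
    """Raised by a tier when its tool is missing or the call fails."""


def fetch_with_fallback(url: str, tiers: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, str]:
    """Try each (label, fetcher) tier in order; return (label, content) of the first success."""
    errors = []
    for label, fetcher in tiers:
        try:
            content = fetcher(url)
        except ToolUnavailable as exc:
            errors.append(f"{label}: {exc}")
            continue
        if content:
            return label, content
        errors.append(f"{label}: empty result")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


# Stub tiers standing in for GitHub MCP (priority) and built-in web tools (alternative).
def github_mcp_stub(url: str) -> Optional[str]:
    raise ToolUnavailable("GitHub MCP not configured")


def web_fetch_stub(url: str) -> Optional[str]:
    return "# SKILL.md contents"
```

Calling `fetch_with_fallback(url, [("priority", github_mcp_stub), ("alternative", web_fetch_stub)])` skips the unavailable tier and returns the alternative's content along with its label, so the report can note which tier supplied the data.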

Prompt 2: MCP Evaluator — Progressive File Update (Incremental artifact pattern)

### Step 3: Create Assessment File

Use built-in `create_file` tool to create assessment file in `/mnt/user-data/outputs/`:
- File naming: `MCP_Security_Assessment_{owner}_{repo_name}.md`
- Update iteratively throughout evaluation process

Technique: Incremental artifact construction. The skill creates the output file immediately and updates it iteratively rather than writing once at the end. This provides partial results if the evaluation is interrupted and gives the user a live progress artifact they can inspect mid-evaluation.
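
A minimal sketch of the same pattern outside the agent, assuming plain file I/O: create the file first, then append a section as each phase completes. The helper names are hypothetical; only the file naming scheme comes from the prompt above.

```python
from pathlib import Path


def assessment_path(output_dir: Path, owner: str, repo_name: str) -> Path:
    """Build the file name following the prompt's naming scheme."""
    return output_dir / f"MCP_Security_Assessment_{owner}_{repo_name}.md"


def start_assessment(path: Path, owner: str, repo_name: str) -> None:
    """Create the report up front so a partial result exists from the first step."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        f"# MCP Security Assessment: {owner}/{repo_name}\n\nStatus: in progress\n",
        encoding="utf-8",
    )


def append_section(path: Path, title: str, body: str) -> None:
    """Append one finished phase; earlier sections are never rewritten."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"\n## {title}\n\n{body.rstrip()}\n")
```

If the run is interrupted after, say, the code review phase, the file on disk still contains the header and every section appended so far.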

Prompt 3: Agent Skill Evaluator — Trigger Conditions (Activation framing)

## When to Use This Skill

Use this skill when users:
- Provide a GitHub URL to a skill repository
- Share a website link where a skill can be downloaded
- Provide a direct link to a .skill file
- Ask "is this skill safe to use?"
- Request security assessment of a skill
- Want to evaluate safety risks before installing a skill
- Need to identify prompt injections or malicious patterns
- Ask about the trustworthiness of a skill source

Technique: Enumerated trigger conditions. Rather than a single description, the skill lists 8 specific trigger phrases and scenarios. This increases the probability that the agent harness correctly activates the skill for supply-chain security questions without requiring the exact skill name to be invoked.

09

Uniqueness

Eval Marketplace — Uniqueness and Positioning

Differs from Seeds

Distribution pattern closest to superpowers (skills-only, zero commands, Claude Code plugin). The fundamental delta: Eval Marketplace evaluates external artifacts (third-party skills and MCP servers) for trustworthiness before installation, whereas all 11 seeds focus on the agent's own behavior during development tasks. This is a supply-chain security application of the agent skill format — a use case not present in any seed. The multi-tier graceful degradation tooling strategy (Priority / Alternatives / Fallback per tool category) is not present in any seed's skill design.

Positioning

Eval Marketplace occupies the supply-chain security gate — the moment just before a developer decides to install a third-party AI tool component. It uses the AI agent itself to evaluate whether it's safe to give the AI agent more capabilities. This meta-evaluation pattern is architecturally interesting: an agent using MCP tools to evaluate MCP tools.

Because the repo is deprecated in favor of jeredblu-marketplace, it serves as a historical reference; the maintained versions of these skills live there.

Observable Failure Modes

  1. Deprecated: No updates promised. Skills may drift from Claude Code's evolving skill format.
  2. Reddit dependency: The most valuable community signal (Reddit) requires Bright Data Pro Mode — a paid dependency. Without it, community validation is weaker.
  3. No ground truth: Skills can only do static analysis of the target's files (SKILL.md and any bundled scripts); they cannot sandbox-execute the skill to observe runtime behavior. A sophisticated attacker could hide malicious logic behind conditional activation.
  4. False confidence: A 0–100 risk score gives false precision. The actual detection capability depends entirely on the LLM's pattern matching, which can be evaded.
  5. Output file path hardcoded: MCP evaluator writes to /mnt/user-data/outputs/ — this path is Claude Desktop-specific and may not work in Claude Code.

Explicit Antipatterns

None stated explicitly, but the README implies: installing skills or MCPs without security review is the antipattern this tool addresses.

04

Workflow

Eval Marketplace — Workflow

Agent Skill Evaluator Workflow

| Phase | What happens | Artifact |
|---|---|---|
| 1. Setup | Ask user output format (md/pdf) | |
| 2. URL parsing | Identify source type (GitHub repo / website / direct .skill file) | source type |
| 3. Skill acquisition | GitHub MCP → API access; fallback Bright Data scrape; fallback user upload | raw skill content |
| 4. Extraction | Unzip .skill archive; read SKILL.md + scripts | parsed skill contents |
| 5. Static analysis | Prompt injection scan; malicious code patterns; hidden instructions; exfiltration patterns | raw findings |
| 6. Community validation | Search GitHub issues + Reddit (if Bright Data Pro Mode available) | community feedback |
| 7. Risk scoring | Aggregate findings into 0–100 risk score | risk score |
| 8. Report generation | Markdown report (optional PDF conversion) | report file |
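
Phase 2 (URL parsing) is the most mechanical step and can be sketched directly. This is an assumption about how the three source types might be told apart, not logic taken from the skill; `classify_source` and its return labels are hypothetical.

```python
from urllib.parse import urlparse


def classify_source(url: str) -> str:
    """Classify a URL as a direct .skill file, a GitHub repository, or a generic website."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.lower()
    # Check the file extension first so a .skill link is treated as a direct download
    # regardless of which host serves it.
    if path.endswith(".skill"):
        return "direct_skill_file"
    if host in ("github.com", "www.github.com"):
        return "github_repo"
    return "website"
```

The result would decide which acquisition tier phase 3 tries first: repository API access for `github_repo`, scraping or fetching for the other two.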

MCP Evaluator Workflow

| Phase | What happens | Artifact |
|---|---|---|
| 1. Setup | Ask output format; check available tools | |
| 2. File creation | Create assessment file at /mnt/user-data/outputs/MCP_Security_Assessment_<owner>_<repo>.md | assessment file |
| 3. Repository access | GitHub MCP (priority) → Bright Data scrape → Claude built-ins | repo content |
| 4. Code review | Source file scan for vulnerabilities; commit history activity analysis | raw findings |
| 5. Alternative search | search_repositories for similar MCP servers | alternatives list |
| 6. Community research | Web search + Reddit (Pro Mode) | community signals |
| 7. Multi-dimensional scoring | Security / privacy / quality / usability axes | scored assessment |
| 8. Report finalization | Iteratively update assessment file; final markdown | final report |
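
Phase 7 names four axes but this summary does not record how they are combined. One plausible reading, shown here purely as an assumption, is a weighted mean of per-axis scores; the 0 to 100 range per axis, the default equal weights, and the `overall_score` function are all hypothetical.

```python
from typing import Dict, Optional

AXES = ("security", "privacy", "quality", "usability")


def overall_score(scores: Dict[str, float],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of per-axis scores, each assumed to lie in 0..100."""
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"missing axes: {missing}")
    for axis in AXES:
        if not 0 <= scores[axis] <= 100:
            raise ValueError(f"{axis} score out of range: {scores[axis]}")
    # Any axis without an explicit weight defaults to 1.0.
    resolved = {axis: (weights or {}).get(axis, 1.0) for axis in AXES}
    total_weight = sum(resolved.values())
    return sum(scores[axis] * resolved[axis] for axis in AXES) / total_weight
```

Keeping the per-axis values in the report alongside any combined number avoids hiding a weak security score behind strong usability.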

Approval Gates

None automatic — both skills are purely investigative. The output is a report for human review; the human decides whether to install.

Output Files

  • MCP_Security_Assessment_<owner>_<repo>.md — written to /mnt/user-data/outputs/
  • Agent skill reports: markdown or PDF per user choice
06

Memory Context

Eval Marketplace — Memory and Context

State During Evaluation

  • Assessment file (MCP_Security_Assessment_*.md) is created immediately and updated iteratively throughout the evaluation run. This is the only persistent artifact.
  • No database, no vector store, no cross-session memory.

Cross-session Handoff

No. Each evaluation is independent. The output markdown file serves as the only persistent record.

Context Within Session

The MCP evaluator skill explicitly reads both a JSON schema file and a configuration reference file from its references/ directory at the start of each invocation. These are loaded into context to inform validation — this is file-based context injection rather than a persistent memory mechanism.

Memory Type

File-based, session-scoped. The assessment markdown file is the only output; it is not read back in subsequent sessions.

07

Orchestration

Eval Marketplace — Orchestration

Multi-agent

No. Each skill runs as a single agent. No subagent spawning.

Orchestration Pattern

Sequential. Each evaluation phase completes before the next begins.

Isolation Mechanism

None. Skills run in the standard agent context.

Execution Mode

One-shot per evaluation invocation.

Multi-model

No. Uses the agent harness's configured model.

External MCP Dependencies

The skills use external MCP servers as tools (GitHub MCP, Bright Data MCP) but these are not subagents — they are tool calls within the same agent session.

Consensus Mechanism

None. Single agent review.

08

UI CLI Surface

Eval Marketplace — UI and CLI Surface

Dedicated CLI Binary

No. Skills are invoked through Claude Code's natural language interface, not a CLI.

Local UI

None.

IDE Integration

Claude Code plugin marketplace. Installed via /plugin install evaluator-tools@eval-marketplace.

Claude Desktop Integration

Upload via Settings > Capabilities > Upload Skill. Compatible with Claude Desktop.

Observability

Output: markdown report file written to /mnt/user-data/outputs/ (MCP evaluator) or user's chosen location (skill evaluator). PDF conversion optional via post-processing.

No structured logging, no audit trail beyond the output report itself.
