Clearwing

clearwing · Lazarus-AI/clearwing · ★ 982 · last commit 2026-05-21

Primitive shape 87 total
Commands 20 Subagents 4 MCP tools 63
00

Summary

Clearwing — Summary

Clearwing is an autonomous offensive-security CLI tool (binary and Python package: clearwing; PyPI publication is not confirmed) that runs network penetration testing, source-code vulnerability hunting, N-day exploit development, reverse engineering, and campaign orchestration using an LLM-backed ReAct loop with 63 bound tools. It is an open-source reimplementation of Anthropic's internal Glasswing tool, built by Eric Hartford at Lazarus AI on the genai-pyo3 Rust-backed LLM runtime. The source-code hunter pipeline fans out per-file hunter agents, filters their findings through a 4-axis validator (REAL / TRIGGERABLE / IMPACTFUL / GENERAL), uses ASan/UBSan crashes as ground truth, runs PoC stability checks across fresh containers, and emits SARIF/markdown/JSON reports with explicit evidence levels. Clearwing ships both a Textual TUI and a FastAPI/WebSocket web UI in addition to its CLI. It is intended for authorized testing only and includes human-in-the-loop approval guardrails for exploits.

Differs from seeds: No seed is architecturally similar — this is a security research platform, not a development workflow harness. The closest by distribution pattern is agent-os (bash-bundle, standalone-repo), but Clearwing is substantially more complex. Among seeds, claude-flow (Archetype 3: MCP-anchored multi-agent with a hive-mind) is most analogous in terms of multi-model routing and parallel agent fan-out, but Clearwing's domain is offensive security rather than code generation, and it has no Claude Code skill layer.

01

Overview

Clearwing — Origin, Philosophy, and Manifesto

Origin

Built by Eric Hartford (Lazarus AI) as an open-source reimplementation of Anthropic's internal Glasswing security research tool. The stated challenge in the README: "Produce similar results as Glasswing - using models everyone has access to."

Licensed MIT. Version 1.0.0. Python 3.10+, backed by genai-pyo3 — a Rust-native LLM runtime that speaks Anthropic, OpenAI, OpenRouter, Ollama, LM Studio, Together, Groq, DeepSeek, MiniMax, Gemini, and any OpenAI-compatible endpoint.

Philosophy

Clearwing takes a maximalist approach to autonomous security testing. The design philosophy is:

  • Parallel fan-out over serial scanning: Files are ranked by an LLM ranker agent, then hunted in parallel by independent hunter agents — not sequentially.
  • Multi-model routing by task: Different pipeline stages use different model sizes and capabilities (ranker = cheap/fast, hunter = capable, verifier = cheap/fast, exploiter = code-specialized).
  • Ground truth from dynamic analysis: ASan/UBSan crashes, not just LLM suspicion, validate findings. PoC stability checks run across fresh Docker containers.
  • Explicit evidence levels: Findings progress through named evidence rungs: suspicion → static_corroboration → crash_reproduced → root_cause_explained → exploit_demonstrated → patch_validated.

Authorized use philosophy

From README: "Authorized use only. Clearwing is a dual-use offensive-security tool. Run it only against targets you own or have explicit written authorization to test. Operators are responsible for scope, legal authorization, and disclosure."

Human-in-the-loop gating is built into the exploit development pipeline — exploits require human approval before execution attempts.

Comparison to Glasswing

Anthropic's Glasswing is closed-source and used internally. Clearwing aims to produce similar results using publicly accessible model APIs. By comparing itself explicitly to a closed Anthropic tool, it positions itself as a democratization of AI-assisted vulnerability research.

02

Architecture

Clearwing — Architecture, Distribution, and Installation

Distribution

  • Python package installable via uv sync (preferred) or pip
  • Source install from GitHub (no PyPI publish confirmed)
  • Docker required for Kali container and sanitizer-image sandbox features

Installation

git clone https://github.com/Lazarus-AI/clearwing.git
cd clearwing
uv sync --all-extras
source .venv/bin/activate
clearwing setup     # interactive wizard
clearwing doctor    # environment check

Version analyzed: 1.0.0

Directory Tree

clearwing/
├── clearwing/           # Python package
│   ├── agent/           # ReAct agent loop
│   ├── analysis/        # static analysis
│   ├── bench/           # OSS-Fuzz benchmark
│   ├── capabilities.py  # 63-tool binding
│   ├── core/            # config + engine
│   ├── crypto/          # SHA-3 commitments
│   ├── data/            # knowledge graph
│   ├── eval/            # A/B test framework
│   ├── exploitation/    # exploit development
│   ├── findings/        # finding deduplication
│   ├── llm/             # AsyncLLMClient + native backend
│   │   └── native.py    # AsyncLLMClient (multi-stage routing)
│   ├── mcp/             # MCP integration (clearwing mcp cmd)
│   ├── observability/   # telemetry
│   ├── providers/       # provider adapters
│   ├── reporting/       # SARIF/markdown/JSON emitters
│   ├── runners/         # SourceHuntRunner, CampaignRunner
│   ├── safety/          # human-in-loop approval gating
│   ├── sandbox/         # Docker-based sandbox execution
│   ├── scanning/        # network scan
│   ├── sourcehunt/      # parallel file-hunter pipeline
│   └── ui/
│       ├── cli.py       # argparse dispatcher
│       ├── commands/    # 20 subcommand modules
│       ├── tui/         # Textual TUI
│       └── web/         # FastAPI + WebSocket
├── docs/                # mkdocs documentation
├── tests/               # pytest test suite
├── pyproject.toml       # entry: clearwing = "clearwing:main"
└── Makefile

Subcommands (20)

bench, campaign, ci, config, disclose, doctor, eval, graph, history, interactive, mcp, operate, parallel, report, scan, sessions, setup, sourcehunt, webui (+ implicit help)

Required Runtime

  • Python >= 3.10
  • Docker (for Kali container, sanitizer sandbox, PoC stability checks)
  • At least one LLM provider API key (Anthropic, OpenAI, OpenRouter, Ollama, etc.)
  • Optional: Ghidra (for reverse engineering pipeline)

Target AI Tools

Clearwing IS the agent — it does not integrate with Claude Code or other coding agents as a skill. It uses LLM APIs directly.

03

Components

Clearwing — Components

CLI Subcommands (20)

| Command | Purpose |
| --- | --- |
| scan | Network pentest agent: ReAct loop with 63 tools, targets live hosts |
| sourcehunt | Source-code vulnerability hunter: file-parallel LLM pipeline |
| campaign | Run sourcehunt across dozens/hundreds of repos from YAML config |
| interactive | Non-scripted ReAct chat with full tool set |
| ci | Non-interactive CI mode with SARIF output for GitHub Code Scanning |
| setup | Interactive wizard: provider selection, credentials, config |
| doctor | Environment check: Python, credentials, Docker, external tools |
| disclose | Responsible disclosure workflow: MITRE/HackerOne templates, timeline |
| bench | OSS-Fuzz crash severity benchmark for model comparison |
| eval | A/B test whether preprocessing helps/hurts finding quality |
| config | Configuration management |
| report | Generate reports from findings |
| graph | Knowledge graph visualization |
| history | Session history |
| sessions | Session management |
| operate | Operator commands |
| parallel | Explicit parallel execution control |
| mcp | MCP server integration (starts MCP server mode) |
| webui | Launch FastAPI web UI |
| help | Implicit help (counted as the twentieth subcommand) |

Pipeline Components

  • LLM Ranker: ranks source files by vulnerability likelihood (uses smaller/cheaper model)
  • Hunter agents: per-file LLM hunters (full-shell or constrained mode, parallel fan-out)
  • 4-axis validator: checks REAL / TRIGGERABLE / IMPACTFUL / GENERAL — adversarial verification
  • PoC stability checker: runs PoC across fresh Docker containers
  • Exploit developer: multi-turn agentic exploit development with human-in-loop approval
  • Patch validator: confirms fix works against patched version
  • Root-cause deduplicator: shared findings pool with cross-subsystem dedup

UI Components

  • Textual TUI: terminal UI (Python textual library)
  • FastAPI web UI: launched by clearwing webui, WebSocket-connected

Models JSON

docs/providers.md documents per-task routing. SourceHuntRunner accepts ranker_llm, hunter_llm, verifier_llm, and exploiter_llm as separate AsyncLLMClient instances.

Safety Component

clearwing/safety/ — human-in-the-loop exploit approval gating before execution attempts.

05

Prompts

Clearwing — Prompt Files and Techniques

Prompt 1: SourceHuntRunner Multi-Model Routing (README verbatim)

# One client per stage — routes each stage to a different model
ranker_llm    = AsyncLLMClient(model_name='gpt-5.4-mini',  **COMMON)
hunter_llm    = AsyncLLMClient(model_name='gpt-5.4',       **COMMON)
verifier_llm  = AsyncLLMClient(model_name='gpt-5.4-mini',  **COMMON)
exploiter_llm = AsyncLLMClient(model_name='gpt-5.3-codex', **COMMON)

runner = SourceHuntRunner(
    repo_url=REPO, local_path=REPO,
    depth='standard',
    budget_usd=1000.0,
    max_parallel=15,
    output_dir=RUN_DIR,
    output_formats=['json', 'markdown'],
    ranker_llm=ranker_llm,
    hunter_llm=hunter_llm,
    verifier_llm=verifier_llm,
    exploiter_llm=exploiter_llm,
)

Technique: Multi-model pipeline routing. Each pipeline stage is explicitly assigned a model class matched to its task difficulty — cheap/fast model for ranking and verification, capable model for hunting, code-specialized model for exploit development. This is a rare explicit code-level demonstration of per-stage model routing (unlike most frameworks that rely on config files or user choice).

Prompt 2: CLI Quickstart (Persona/goal framing)

# Source-code hunt a repo (standard depth — sandboxed LLM hunters,
# adversarial verifier, mechanism memory, variant loop)
clearwing sourcehunt https://github.com/example/project \
    --depth standard

# N-day exploit pipeline — build and exploit known CVEs
clearwing sourcehunt https://github.com/example/project \
    --nday --cve-list CVE-2024-1234,CVE-2024-5678

# Reverse engineering — hunt vulnerabilities in closed-source binaries
clearwing sourcehunt /path/to/binary --reveng --arch x86_64

Technique: Task decomposition via CLI flags. The --depth flag (standard/deep/etc.) encodes a preset pipeline configuration. The --nday and --reveng flags switch the entire pipeline into alternate modes. This is flag-as-mode-selector rather than prompt-as-workflow-selector.

Prompt 3: Evidence Level Taxonomy (Structured output framing)

From README:

suspicion → static_corroboration → crash_reproduced → 
root_cause_explained → exploit_demonstrated → patch_validated

Technique: Structured evidence ladder with explicit state names. This taxonomy serves both as the internal finding state machine and as the output classification in reports. The named rungs create a verifiable progression that CI and human reviewers can use to filter findings by confidence level, which makes each finding a self-documenting verification artifact.

09

Uniqueness

Clearwing — Uniqueness and Positioning

Differs from Seeds

No seed framework is architecturally similar: Clearwing is an offensive security research platform, not a development workflow harness. The closest architectural parallel is claude-flow (Archetype 3: parallel multi-agent with per-role model routing), but Clearwing's domain is vulnerability discovery rather than software development. Clearwing differs from all seeds by having: (a) domain-specific knowledge embedded in the pipeline (ASan/UBSan crash integration, CVE lookup, Ghidra decompilation), (b) Docker sandbox isolation for PoC testing, (c) an explicit evidence ladder with cryptographic commitments, and (d) legal/responsible disclosure tooling. It is the only framework in this batch that ships a FastAPI web UI, a Textual TUI, and a CLI. It is most distinct from agent-os and claude-conductor, which are purely text-file harnesses.

Positioning

Clearwing is the democratized, open-source equivalent of Anthropic's internal Glasswing tool. It targets security researchers and red teams who want to apply LLM-driven vulnerability discovery at scale. Its differentiation from other LLM security tools (Nuclei-AI, etc.) is the full pipeline: from file ranking through crash-validated PoC through responsible disclosure.

The genai-pyo3 Rust backend is unique in this corpus — all other Python-based frameworks use pure Python LLM clients.

Observable Failure Modes

  1. Docker dependency: The most interesting capabilities (sanitizer sandbox, Kali pentest) require Docker. Environments without Docker get degraded functionality.
  2. Legal liability surface: Dual-use tool with broad capabilities. The README warning is minimal; operators bear full legal risk.
  3. Budget blowup on large repos: The per-file fan-out at max_parallel=15 with capable models can be extremely expensive. budget_usd parameter is the only control.
  4. Model availability coupling: The README examples use model names like gpt-5.3-codex and gpt-5.4-mini — if these models are deprecated or renamed, pipeline configuration breaks silently.
  5. Single maintainer: 10 contributors but Eric Hartford appears to be the sole active maintainer. No governance structure visible.

Explicit Antipatterns

  • Running against unauthorized targets
  • Running the full pipeline without Docker

04

Workflow

Clearwing — Workflow

SourceHunt Pipeline (Primary Workflow)

| Phase | What happens | Artifact |
| --- | --- | --- |
| 1. File ranking | LLM ranker scores all source files by vuln likelihood | ranked file list |
| 2. Parallel fan-out | One hunter agent launched per ranked file, up to max_parallel at once | concurrent agent pool |
| 3. Per-file hunting | Each hunter LLM reads its file and identifies candidates | raw finding candidates |
| 4. 4-axis validation | Adversarial verifier checks REAL/TRIGGERABLE/IMPACTFUL/GENERAL | validated findings |
| 5. PoC development | Exploiter LLM writes proof-of-concept code | PoC code |
| 6. PoC stability | Fresh Docker container runs the PoC; ASan/UBSan monitors it | crash signal |
| 7. Human approval gate | Human reviews exploit before execution | approval/rejection |
| 8. Patch validation | Optional: run against patched version | validation signal |
| 9. Reporting | SARIF / markdown / JSON report | output files |

Network Scan Workflow

| Phase | Artifact |
| --- | --- |
| Target specification | host/port/range |
| Service detection | service list |
| ReAct loop (63 tools) | vulnerability candidates |
| Human approval gate | exploit execution decision |
| Reporting | markdown/JSON report |
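
The ReAct phase in the table above alternates between a model decision and a tool observation. The sketch below shows that loop shape only. The `decide` callback stands in for an LLM call, and the `detect_services` tool, its output, and the target address are hypothetical placeholders, not Clearwing's actual tool names.

```python
from typing import Callable, Optional

# A decision is (action_name, argument); "finish" ends the loop.
Decision = tuple[str, str]


def react_loop(
    decide: Callable[[list[str]], Decision],
    tools: dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 5,
) -> tuple[Optional[str], list[str]]:
    """Alternate model decisions and tool observations until 'finish'."""
    transcript = [f"goal: {goal}"]
    for _ in range(max_steps):
        action, arg = decide(transcript)       # "reason" step (LLM stand-in)
        if action == "finish":
            return arg, transcript
        observation = tools[action](arg)       # "act" step
        transcript.append(f"{action}({arg}) -> {observation}")
    return None, transcript                    # step budget exhausted


def scripted_decide(transcript: list[str]) -> Decision:
    """Hypothetical stand-in for the model: one tool call, then finish."""
    if len(transcript) == 1:
        return ("detect_services", "192.0.2.10")  # documentation-only address
    return ("finish", "1 service found")


# Hypothetical tool stub; a real tool would probe the target.
tools = {"detect_services": lambda host: "22/tcp ssh"}
answer, transcript = react_loop(scripted_decide, tools, "enumerate services")
```

The `max_steps` cap is one simple way to keep an agent loop bounded; whether Clearwing uses a step cap, a budget cap, or both is not stated in the source.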

Campaign Workflow

YAML config specifies multiple repos, shared budget, checkpoint/resume state. Each repo runs sourcehunt; aggregate reporting across all repos.

Evidence Ladder (Explicit states)

suspicion → static_corroboration → crash_reproduced → root_cause_explained → exploit_demonstrated → patch_validated

Findings are classified by highest evidence level reached.
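
Because the rungs are ordered, classification by highest level reached maps naturally onto an ordered enum. The following is a minimal sketch under that assumption. The `EvidenceLevel` class, the helper functions, and the sample finding IDs are illustrative and are not taken from the Clearwing codebase.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    """Evidence rungs in ladder order; a higher value means stronger evidence."""
    SUSPICION = 0
    STATIC_CORROBORATION = 1
    CRASH_REPRODUCED = 2
    ROOT_CAUSE_EXPLAINED = 3
    EXPLOIT_DEMONSTRATED = 4
    PATCH_VALIDATED = 5


def highest_level(reached: list[str]) -> EvidenceLevel:
    """Classify a finding by the highest rung it reached."""
    return max(EvidenceLevel[name.upper()] for name in reached)


def filter_findings(findings: dict[str, list[str]], minimum: EvidenceLevel) -> list[str]:
    """Return the IDs of findings whose highest rung is at least `minimum`."""
    return sorted(
        fid for fid, reached in findings.items() if highest_level(reached) >= minimum
    )


# Hypothetical findings keyed by ID, each listing the rungs it has reached.
findings = {
    "F-1": ["suspicion"],
    "F-2": ["suspicion", "static_corroboration", "crash_reproduced"],
    "F-3": ["suspicion", "crash_reproduced", "root_cause_explained", "exploit_demonstrated"],
}
confirmed = filter_findings(findings, EvidenceLevel.CRASH_REPRODUCED)
```

A CI job could use a threshold like `CRASH_REPRODUCED` to ignore findings that rest on LLM suspicion alone.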

Approval Gates

  1. Exploit execution gate: human approval required before attempting exploit against a live target
  2. Responsible disclosure review: clearwing disclose review — manual human review of generated disclosure reports

Budget System

Three-band budget promotion — findings with higher evidence levels get more budget allocated. budget_usd parameter controls total spend per run.
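
The source does not say where the three bands begin or how much each receives, so the sketch below is only one possible reading. The band thresholds, the percentage shares, and the function names are all assumptions chosen for illustration.

```python
# Hypothetical share of the run budget per band, in percent.
BAND_SHARE_PCT = {"low": 20, "mid": 30, "high": 50}


def band_for(evidence_rank: int) -> str:
    """Map an evidence rank (0 = suspicion ... 5 = patch_validated) to a band.

    The cut-offs here are assumed, not documented.
    """
    if evidence_rank >= 4:
        return "high"
    if evidence_rank >= 2:
        return "mid"
    return "low"


def allocate(budget_usd: float, ranks: dict[str, int]) -> dict[str, float]:
    """Split each band's share evenly among the findings in that band."""
    members: dict[str, list[str]] = {"low": [], "mid": [], "high": []}
    for finding_id, rank in ranks.items():
        members[band_for(rank)].append(finding_id)

    allocation: dict[str, float] = {}
    for band, ids in members.items():
        for finding_id in ids:
            allocation[finding_id] = budget_usd * BAND_SHARE_PCT[band] / 100 / len(ids)
    return allocation


# Three hypothetical findings at ranks 0, 2, and 5.
allocation = allocate(1000.0, {"F-1": 0, "F-2": 2, "F-3": 5})
```

In this reading, a finding that climbs the evidence ladder moves into a band with a larger share, which matches the stated intent that stronger evidence attracts more spend.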

06

Memory Context

Clearwing — Memory and Context

Knowledge Graph

clearwing/data/ + networkx — a persistent knowledge graph that accumulates findings across scan sessions. The network scan agent writes to this graph; clearwing graph visualizes it.

ChromaDB Vector Store

chromadb>=1.0.0 appears in the dependencies and is used for mechanism memory in the sourcehunt pipeline. The README quickstart comment references it: "standard depth — sandboxed LLM hunters, adversarial verifier, mechanism memory, variant loop". This enables finding deduplication and variant hypothesis generation across file batches.

Campaign Checkpoint/Resume

clearwing campaign run campaign.yaml supports checkpoint/resume — campaign state is persisted to disk so interrupted multi-repo hunts can be resumed without re-running completed repos.
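
Checkpoint/resume of this kind usually comes down to recording completed units and skipping them on restart. The sketch below illustrates the idea with a JSON file. The file name, the state format, and the `hunt` callback are assumptions, since the source does not describe Clearwing's on-disk campaign state.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_done(checkpoint: Path) -> set[str]:
    """Read the set of completed repos, or an empty set on a first run."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text()))
    return set()


def run_campaign(repos: list[str], checkpoint: Path, hunt: Callable[[str], None]) -> list[str]:
    """Hunt each repo not yet recorded as done; persist progress after each one."""
    done = load_done(checkpoint)
    ran = []
    for repo in repos:
        if repo in done:
            continue  # completed in an earlier session
        hunt(repo)
        done.add(repo)
        checkpoint.write_text(json.dumps(sorted(done)))
        ran.append(repo)
    return ran


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "campaign_state.json"  # hypothetical file name
    first = run_campaign(["repo-a", "repo-b"], state, hunt=lambda r: None)
    # A second session with one more repo only runs the new one.
    second = run_campaign(["repo-a", "repo-b", "repo-c"], state, hunt=lambda r: None)
```

Writing the checkpoint after every repo, rather than once at the end, is what lets an interrupted run resume without repeating finished work.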

Session History

clearwing history and clearwing sessions subcommands — session state is persisted between runs. clearwing/observability/ handles telemetry and logging.

SHA-3 Cryptographic Commitments

clearwing/crypto/ — SHA-3 hash commitments for responsible disclosure. Creates a cryptographically verifiable timestamped claim of finding priority before public disclosure. Persisted to disk.
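
A hash commitment lets a researcher publish a digest now and reveal the report later, proving the report existed when the digest was published. The sketch below shows a standard salted SHA-3 commitment using Python's standard library. It is not Clearwing's actual scheme, whose details the source does not give, and the function names are illustrative.

```python
import hashlib
import hmac
import secrets


def commit(report: bytes) -> tuple[str, bytes]:
    """Return (digest, nonce). Publish the digest; keep nonce and report private."""
    nonce = secrets.token_bytes(32)  # random salt so the report cannot be guessed
    digest = hashlib.sha3_256(nonce + report).hexdigest()
    return digest, nonce


def verify(digest: str, nonce: bytes, report: bytes) -> bool:
    """Check a revealed (nonce, report) pair against a published digest."""
    expected = hashlib.sha3_256(nonce + report).hexdigest()
    return hmac.compare_digest(expected, digest)


report = b"heap overflow in parse_header(), reproduced under ASan"
digest, nonce = commit(report)
ok = verify(digest, nonce, report)
tampered = verify(digest, nonce, report + b" (edited)")
```

The digest carries no time information by itself. The timestamp comes from wherever the digest is published, so the priority claim is only as strong as that record.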

Findings Pool

Shared findings pool with root-cause deduplication across parallel hunter agents. State is held in memory during a run and written to the output directory at completion.
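
One way to picture root-cause deduplication is a pool keyed on a root-cause signature, where duplicates collapse and the best-evidenced report wins. In the sketch below, the key of function name plus bug class is an assumption made for illustration. The source does not specify how Clearwing identifies a shared root cause.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    file: str
    function: str
    bug_class: str
    evidence_rank: int  # higher means stronger evidence


class FindingsPool:
    """In-memory pool that keeps one finding per assumed root-cause key."""

    def __init__(self) -> None:
        self._by_root_cause: dict[tuple[str, str], Finding] = {}

    def add(self, finding: Finding) -> bool:
        """Insert a finding; return True if it introduced a new root cause."""
        key = (finding.function, finding.bug_class)  # hypothetical signature
        current = self._by_root_cause.get(key)
        if current is None:
            self._by_root_cause[key] = finding
            return True
        if finding.evidence_rank > current.evidence_rank:
            self._by_root_cause[key] = finding  # keep the stronger report
        return False

    def findings(self) -> list[Finding]:
        return list(self._by_root_cause.values())


pool = FindingsPool()
# Two hunters reach the same bug from different files.
new_1 = pool.add(Finding("src/http.c", "parse_header", "heap-buffer-overflow", 1))
new_2 = pool.add(Finding("src/proxy.c", "parse_header", "heap-buffer-overflow", 2))
new_3 = pool.add(Finding("src/http.c", "read_body", "integer-overflow", 0))
```

If hunters ran in separate threads rather than in one event loop, `add` would also need a lock around the read-then-write step.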

Cross-session Handoff

Yes, via the knowledge graph and campaign checkpoint mechanism — scan state persists across sessions.

07

Orchestration

Clearwing — Orchestration

Multi-agent

Yes. The sourcehunt pipeline runs N parallel hunter agents (one per ranked source file), each operating independently. max_parallel=15 in the README example.

Orchestration Pattern

Hierarchical + parallel fan-out:

  • Ranker agent ranks files (sequential planning)
  • One hunter agent per ranked file, with up to max_parallel running at once (parallel fan-out)
  • Verifier agent validates each finding (sequential per finding)
  • Exploiter agent develops PoC (sequential per validated finding)
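
The rank-then-fan-out shape above can be sketched with asyncio and a semaphore that caps concurrency. The stage functions here are stand-ins with fake logic, not Clearwing's API. Only the `max_parallel` name comes from the source.

```python
import asyncio


async def rank_files(files: list[str]) -> list[str]:
    """Stand-in for the ranker stage: sorts by a fake score (name length)."""
    return sorted(files, key=len, reverse=True)


async def hunt_file(path: str) -> list[str]:
    """Stand-in for one hunter agent; returns candidates for a single file."""
    await asyncio.sleep(0)  # placeholder for an LLM call
    return [f"candidate in {path}"] if path.endswith(".c") else []


async def run_pipeline(files: list[str], max_parallel: int = 15) -> list[str]:
    ranked = await rank_files(files)        # sequential planning step
    gate = asyncio.Semaphore(max_parallel)  # caps concurrently running hunters

    async def bounded_hunt(path: str) -> list[str]:
        async with gate:
            return await hunt_file(path)

    # Fan-out: one task per ranked file; gather preserves the ranked order.
    per_file = await asyncio.gather(*(bounded_hunt(p) for p in ranked))
    # Verification and PoC development would then run once per candidate.
    return [candidate for batch in per_file for candidate in batch]


results = asyncio.run(run_pipeline(["parser.c", "README.md", "net.c"], max_parallel=2))
```

The semaphore bounds how many hunters run at once without limiting how many files are queued, which is the usual way to combine per-file fan-out with a parallelism cap.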

Multi-model Routing

Explicit, code-level, per-stage:

  • ranker_llm: cheap/fast (gpt-5.4-mini class)
  • hunter_llm: capable (gpt-5.4 class)
  • verifier_llm: cheap/fast (gpt-5.4-mini class)
  • exploiter_llm: code-specialized (gpt-5.3-codex class)

Isolation Mechanism

Docker containers for:

  • Kali Linux container for network pentest tools
  • Sanitizer sandbox (ASan/UBSan) for PoC stability testing
  • Fresh containers per PoC stability check run
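
A stability check of this kind needs two pieces: recognizing a sanitizer report and confirming it recurs across independent runs. The sketch below matches the standard report markers that AddressSanitizer and UndefinedBehaviorSanitizer print. The `run_poc` callback is a hypothetical hook that would start a fresh container and return its stderr, and the three-run default is an assumption.

```python
import re
from typing import Callable, Optional

# ASan reports look like "==123==ERROR: AddressSanitizer: heap-buffer-overflow ...".
ASAN_RE = re.compile(r"ERROR: AddressSanitizer: ([\w-]+)")
# UBSan reports look like "file.c:3:5: runtime error: signed integer overflow ...".
UBSAN_RE = re.compile(r"runtime error: ")


def crash_signature(stderr: str) -> Optional[str]:
    """Return a short signature for a sanitizer report, or None if absent."""
    match = ASAN_RE.search(stderr)
    if match:
        return f"asan:{match.group(1)}"
    if UBSAN_RE.search(stderr):
        return "ubsan"
    return None


def is_stable(run_poc: Callable[[], str], runs: int = 3) -> bool:
    """True if every run yields the same non-empty sanitizer signature."""
    signatures = {crash_signature(run_poc()) for _ in range(runs)}
    return len(signatures) == 1 and None not in signatures


sample = "==4242==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000000014"
stable = is_stable(lambda: sample)
silent = is_stable(lambda: "")
```

Passing the runner in as a callback keeps the check testable without Docker; in real use each call would launch a new container so that no state carries over between runs.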

Execution Mode

Interactive-loop (ReAct agent for network scan and interactive mode) and one-shot pipeline (sourcehunt, campaign). CI mode is one-shot.

Consensus Mechanism

4-axis adversarial validation (REAL / TRIGGERABLE / IMPACTFUL / GENERAL). Each axis is a boolean gate; a finding must pass all four to proceed to PoC development. This is not Raft or Byzantine consensus, but it is an explicit multi-criterion adversarial filter.
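
Treating each axis as a boolean gate means the verdict is a conjunction, and the failed axes explain a rejection. The sketch below models that directly. The `AxisVerdict` class and its method names are illustrative, not taken from Clearwing.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AxisVerdict:
    """One boolean per validation axis, as judged by the verifier."""
    real: bool
    triggerable: bool
    impactful: bool
    general: bool

    def failed_axes(self) -> list[str]:
        """Names of the axes that did not pass, in declaration order."""
        return [name.upper() for name, passed in asdict(self).items() if not passed]

    def passes(self) -> bool:
        """A finding proceeds to PoC development only if all four axes pass."""
        return not self.failed_axes()


accepted = AxisVerdict(real=True, triggerable=True, impactful=True, general=True)
rejected = AxisVerdict(real=True, triggerable=True, impactful=False, general=True)
```

Recording which axes failed, rather than a single pass/fail flag, gives reviewers a reason for each rejected finding.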

Human in the Loop

Exploit execution requires human approval (safety gate in clearwing/safety/). Disclosure review is also human-gated.

Campaign Orchestration

YAML-driven campaign runs sourcehunt across dozens/hundreds of repos with shared budget, checkpoint/resume, and aggregate reporting — effectively a scheduled batch job over a repository list.

08

UI CLI Surface

Clearwing — UI and CLI Surface

Dedicated CLI Binary

Yes. Binary name: clearwing. Entry point: clearwing:main. 20 subcommands.

CLI Design

Argparse-based dispatcher. Each subcommand lives in its own module under clearwing.ui.commands. Subcommand modules may declare an ALIASES tuple for alternate names.
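
A dispatcher of this shape can be sketched with argparse subparsers, which accept an `aliases` argument. In the sketch below, `SimpleNamespace` objects stand in for the per-command modules. The `NAME`, `configure`, and `run` attributes and the `check` alias are assumptions, and only the `ALIASES` tuple is mentioned in the source.

```python
import argparse
from types import SimpleNamespace


def _configure_doctor(parser: argparse.ArgumentParser) -> None:
    pass  # no extra arguments in this sketch


def _run_doctor(args: argparse.Namespace) -> str:
    return "doctor ok"


def _configure_sourcehunt(parser: argparse.ArgumentParser) -> None:
    parser.add_argument("target")
    parser.add_argument("--depth", default="standard")


def _run_sourcehunt(args: argparse.Namespace) -> str:
    return f"hunting {args.target} at depth {args.depth}"


# Stand-ins for modules under clearwing.ui.commands (attribute names assumed).
COMMAND_MODULES = [
    SimpleNamespace(NAME="doctor", ALIASES=("check",),
                    configure=_configure_doctor, run=_run_doctor),
    SimpleNamespace(NAME="sourcehunt", ALIASES=(),
                    configure=_configure_sourcehunt, run=_run_sourcehunt),
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="clearwing")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for module in COMMAND_MODULES:
        aliases = list(getattr(module, "ALIASES", ()))
        sub = subparsers.add_parser(module.NAME, aliases=aliases)
        module.configure(sub)                 # module adds its own arguments
        sub.set_defaults(handler=module.run)  # dispatch target for this command
    return parser


def main(argv: list[str]) -> str:
    args = build_parser().parse_args(argv)
    return args.handler(args)


via_alias = main(["check"])
hunt = main(["sourcehunt", "https://github.com/example/project", "--depth", "standard"])
```

Storing the handler with `set_defaults` means an alias and its primary name reach the same function without any extra lookup table.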

Key subcommands for agent use:

  • clearwing sourcehunt <url> --depth standard
  • clearwing ci --config .clearwing.ci.yaml --sarif results.sarif
  • clearwing campaign run campaign.yaml

Textual TUI

clearwing/ui/tui/ — Python Textual-based terminal UI with screens and components. Launched implicitly or via clearwing interactive.

FastAPI Web UI

clearwing/ui/web/ — FastAPI + WebSocket-backed web dashboard. Launched via clearwing webui. Port: unknown (not specified in README). Stack: FastAPI, uvicorn, websockets. Static files served from clearwing/ui/web/static/.

CI/CD Integration

clearwing ci --config .clearwing.ci.yaml --sarif results.sarif — SARIF output compatible with GitHub Code Scanning. Non-interactive CI mode.
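
For orientation, the sketch below builds a minimal SARIF 2.1.0 document of the kind GitHub Code Scanning ingests. The SARIF field names follow the published format, but the input dictionary keys, the fixed `error` level, and carrying the evidence level in `properties` are assumptions. Clearwing's actual emitter may differ.

```python
import json


def to_sarif(findings: list[dict]) -> dict:
    """Convert simple finding dicts into a minimal SARIF 2.1.0 log."""
    results = []
    for finding in findings:
        results.append({
            "ruleId": finding["rule_id"],
            "level": "error",  # assumed; SARIF also allows "warning" and "note"
            "message": {"text": finding["message"]},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": finding["path"]},
                    "region": {"startLine": finding["line"]},
                }
            }],
            # Property bags hold tool-specific data such as the evidence rung.
            "properties": {"evidence_level": finding["evidence_level"]},
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": "clearwing"}},
            "results": results,
        }],
    }


# Hypothetical finding used only to show the output shape.
sarif = to_sarif([{
    "rule_id": "heap-buffer-overflow",
    "message": "Out-of-bounds write in parse_header()",
    "path": "src/parser.c",
    "line": 42,
    "evidence_level": "crash_reproduced",
}])
sarif_text = json.dumps(sarif, indent=2)
```

Writing `sarif_text` to the path given by `--sarif` would produce a file that a code-scanning upload step can consume.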

Setup Wizard

clearwing setup — interactive menu-driven provider selection, credential entry, optional live test, persists to ~/.clearwing/config.yaml.

clearwing doctor — verifies Python, credentials, Docker daemon, external tools, optional extras, network reachability.

MCP Server

clearwing mcp — Clearwing can expose its capabilities as an MCP server. Module at clearwing/mcp/.

Reporting

Output formats: json, markdown, sarif. Reports written to --output-dir. clearwing report subcommand for standalone report generation from stored findings.
