Clearwing

clearwing · Lazarus-AI/clearwing · ★ 982 · last commit 2026-05-21

Primitive shape 87 total

Commands 20 Subagents 4 MCP tools 63

Summary

Clearwing — Summary

Clearwing is an autonomous offensive-security CLI tool (binary: clearwing, Python package: clearwing) that runs network penetration testing, source-code vulnerability hunting, N-day exploit development, reverse engineering, and campaign orchestration using an LLM-backed ReAct loop with 63 bound tools. It is an open-source reimplementation of Anthropic's internal Glasswing tool, built by Eric Hartford at Lazarus AI on the genai-pyo3 Rust-backed LLM runtime. The source-code hunter pipeline fans out per-file hunter agents, filters their candidates through a 4-axis validator (REAL / TRIGGERABLE / IMPACTFUL / GENERAL), uses ASan/UBSan crashes as ground truth, runs PoC stability checks across fresh containers, and emits SARIF/markdown/JSON reports with explicit evidence levels. Clearwing ships both a Textual TUI and a FastAPI/WebSocket web UI in addition to its CLI. The tool is intended for authorized testing only and includes human-in-the-loop approval guardrails before exploit execution.

Differs from seeds: No seed is architecturally similar — this is a security research platform, not a development workflow harness. The closest by distribution pattern is agent-os (bash-bundle, standalone-repo), but Clearwing is substantially more complex. Among seeds, claude-flow (Archetype 3: MCP-anchored multi-agent with a hive-mind) is most analogous in terms of multi-model routing and parallel agent fan-out, but Clearwing's domain is offensive security rather than code generation, and it has no Claude Code skill layer.

Overview

Clearwing — Origin, Philosophy, and Manifesto

Origin

Built by Eric Hartford (Lazarus AI) as an open-source reimplementation of Anthropic's internal Glasswing security research tool. The stated challenge in the README: "Produce similar results as Glasswing - using models everyone has access to."

Licensed MIT. Version 1.0.0. Python 3.10+, backed by genai-pyo3 — a Rust-native LLM runtime that speaks Anthropic, OpenAI, OpenRouter, Ollama, LM Studio, Together, Groq, DeepSeek, MiniMax, Gemini, and any OpenAI-compatible endpoint.

Philosophy

Clearwing takes a maximalist approach to autonomous security testing. The design philosophy is:

Parallel fan-out over serial scanning: Files are ranked by an LLM ranker agent, then hunted in parallel by independent hunter agents — not sequentially.
Multi-model routing by task: Different pipeline stages use different model sizes and capabilities (ranker = cheap/fast, hunter = capable, verifier = cheap/fast, exploiter = code-specialized).
Ground truth from dynamic analysis: ASan/UBSan crashes, not just LLM suspicion, validate findings. PoC stability checks run across fresh Docker containers.
Explicit evidence levels: Findings progress through named evidence rungs: suspicion → static_corroboration → crash_reproduced → root_cause_explained → exploit_demonstrated → patch_validated.

Authorized use philosophy

From README: "Authorized use only. Clearwing is a dual-use offensive-security tool. Run it only against targets you own or have explicit written authorization to test. Operators are responsible for scope, legal authorization, and disclosure."

Human-in-the-loop gating is built into the exploit development pipeline — exploits require human approval before execution attempts.

Comparison to Glasswing

Anthropic's Glasswing is closed-source and used internally. Clearwing aims to produce similar results using publicly accessible model APIs, which the README frames as a challenge. The explicit comparison to a closed Anthropic tool positions Clearwing as a democratization of AI-assisted vulnerability research.

Architecture

Clearwing — Architecture, Distribution, and Installation

Distribution

Python package installable via uv sync (preferred) or pip
Source install from GitHub (no PyPI publish confirmed)
Docker required for Kali container and sanitizer-image sandbox features

Installation

git clone https://github.com/Lazarus-AI/clearwing.git
cd clearwing
uv sync --all-extras
source .venv/bin/activate
clearwing setup     # interactive wizard
clearwing doctor    # environment check

Version analyzed: 1.0.0

Directory Tree

clearwing/
├── clearwing/           # Python package
│   ├── agent/           # ReAct agent loop
│   ├── analysis/        # static analysis
│   ├── bench/           # OSS-Fuzz benchmark
│   ├── capabilities.py  # 63-tool binding
│   ├── core/            # config + engine
│   ├── crypto/          # SHA-3 commitments
│   ├── data/            # knowledge graph
│   ├── eval/            # A/B test framework
│   ├── exploitation/    # exploit development
│   ├── findings/        # finding deduplication
│   ├── llm/             # AsyncLLMClient + native backend
│   │   └── native.py    # AsyncLLMClient (multi-stage routing)
│   ├── mcp/             # MCP integration (clearwing mcp cmd)
│   ├── observability/   # telemetry
│   ├── providers/       # provider adapters
│   ├── reporting/       # SARIF/markdown/JSON emitters
│   ├── runners/         # SourceHuntRunner, CampaignRunner
│   ├── safety/          # human-in-loop approval gating
│   ├── sandbox/         # Docker-based sandbox execution
│   ├── scanning/        # network scan
│   ├── sourcehunt/      # parallel file-hunter pipeline
│   └── ui/
│       ├── cli.py        # argparse dispatcher
│       ├── commands/     # 20 subcommand modules
│       ├── tui/          # Textual TUI
│       └── web/          # FastAPI + WebSocket
├── docs/                # mkdocs documentation
├── tests/               # pytest test suite
├── pyproject.toml       # entry: clearwing = "clearwing:main"
└── Makefile

Subcommands (20)

bench, campaign, ci, config, disclose, doctor, eval, graph, history, interactive, mcp, operate, parallel, report, scan, sessions, setup, sourcehunt, webui (+ implicit help)

Required Runtime

Python >= 3.10
Docker (for Kali container, sanitizer sandbox, PoC stability checks)
At least one LLM provider API key (Anthropic, OpenAI, OpenRouter, Ollama, etc.)
Optional: Ghidra (for reverse engineering pipeline)

Target AI Tools

Clearwing IS the agent — it does not integrate with Claude Code or other coding agents as a skill. It uses LLM APIs directly.

Components

Clearwing — Components

CLI Subcommands (20)

Command	Purpose
`scan`	Network pentest agent — ReAct loop with 63 tools, targets live hosts
`sourcehunt`	Source-code vulnerability hunter — file-parallel LLM pipeline
`campaign`	Run sourcehunt across dozens/hundreds of repos from YAML config
`interactive`	Non-scripted ReAct chat with full tool set
`ci`	Non-interactive CI mode with SARIF output for GitHub Code Scanning
`setup`	Interactive wizard — provider selection, credentials, config
`doctor`	Environment check — Python, credentials, Docker, external tools
`disclose`	Responsible disclosure workflow — MITRE/HackerOne templates, timeline
`bench`	OSS-Fuzz crash severity benchmark for model comparison
`eval`	A/B test whether preprocessing helps/hurts finding quality
`config`	Configuration management
`report`	Generate reports from findings
`graph`	Knowledge graph visualization
`history`	Session history
`sessions`	Session management
`operate`	Operator commands
`parallel`	Explicit parallel execution control
`mcp`	MCP server mode (exposes Clearwing's capabilities over MCP)
`webui`	Launch FastAPI web UI

Pipeline Components

LLM Ranker: ranks source files by vulnerability likelihood (uses smaller/cheaper model)
Hunter agents: per-file LLM hunters (full-shell or constrained mode, parallel fan-out)
4-axis validator: checks REAL / TRIGGERABLE / IMPACTFUL / GENERAL — adversarial verification
PoC stability checker: runs PoC across fresh Docker containers
Exploit developer: multi-turn agentic exploit development with human-in-loop approval
Patch validator: confirms fix works against patched version
Root-cause deduplicator: shared findings pool with cross-subsystem dedup

UI Components

Textual TUI: terminal UI (Python textual library)
FastAPI web UI: launched by clearwing webui, WebSocket-connected

Models JSON

docs/providers.md documents per-task routing. SourceHuntRunner accepts ranker_llm, hunter_llm, verifier_llm, and exploiter_llm as separate AsyncLLMClient instances.

Safety Component

clearwing/safety/ — human-in-the-loop exploit approval gating before execution attempts.

Prompts

Clearwing — Prompt Files and Techniques

Prompt 1: SourceHuntRunner Multi-Model Routing (README verbatim)

# One client per stage — routes each stage to a different model
ranker_llm    = AsyncLLMClient(model_name='gpt-5.4-mini',  **COMMON)
hunter_llm    = AsyncLLMClient(model_name='gpt-5.4',       **COMMON)
verifier_llm  = AsyncLLMClient(model_name='gpt-5.4-mini',  **COMMON)
exploiter_llm = AsyncLLMClient(model_name='gpt-5.3-codex', **COMMON)

runner = SourceHuntRunner(
    repo_url=REPO, local_path=REPO,
    depth='standard',
    budget_usd=1000.0,
    max_parallel=15,
    output_dir=RUN_DIR,
    output_formats=['json', 'markdown'],
    ranker_llm=ranker_llm,
    hunter_llm=hunter_llm,
    verifier_llm=verifier_llm,
    exploiter_llm=exploiter_llm,
)

Technique: Multi-model pipeline routing. Each pipeline stage is explicitly assigned a model class matched to its task difficulty — cheap/fast model for ranking and verification, capable model for hunting, code-specialized model for exploit development. This is a rare explicit code-level demonstration of per-stage model routing (unlike most frameworks that rely on config files or user choice).

Prompt 2: CLI Quickstart (Persona/goal framing)

# Source-code hunt a repo (standard depth — sandboxed LLM hunters,
# adversarial verifier, mechanism memory, variant loop)
clearwing sourcehunt https://github.com/example/project \
    --depth standard

# N-day exploit pipeline — build and exploit known CVEs
clearwing sourcehunt https://github.com/example/project \
    --nday --cve-list CVE-2024-1234,CVE-2024-5678

# Reverse engineering — hunt vulnerabilities in closed-source binaries
clearwing sourcehunt /path/to/binary --reveng --arch x86_64

Technique: Task decomposition via CLI flags. The --depth flag (standard/deep/etc.) encodes a preset pipeline configuration. The --nday and --reveng flags switch the entire pipeline into alternate modes. This is flag-as-mode-selector rather than prompt-as-workflow-selector.

Prompt 3: Evidence Level Taxonomy (Structured output framing)

From README:

suspicion → static_corroboration → crash_reproduced → 
root_cause_explained → exploit_demonstrated → patch_validated

Technique: Structured evidence ladder with explicit state names. This taxonomy is used both as internal finding state machine and as output classification in reports. The named rungs create a verifiable progression that CI and human reviewers can use to filter findings by confidence level — a form of self-documenting verification artifact.

Uniqueness

Clearwing — Uniqueness and Positioning

Differs from Seeds

No seed framework is architecturally similar: Clearwing is an offensive security research platform, not a development workflow harness. The closest architectural parallel is claude-flow (Archetype 3: parallel multi-agent with per-role model routing), but Clearwing's domain is vulnerability discovery rather than software development. Clearwing differs from all seeds by having: (a) domain-specific knowledge embedded in the pipeline (ASan/UBSan crash integration, CVE lookup, Ghidra decompilation), (b) Docker sandbox isolation for PoC testing, (c) an explicit evidence ladder with cryptographic commitments, and (d) legal/responsible disclosure tooling. It is the only framework in this batch that ships a FastAPI web UI, a Textual TUI, and a CLI. It is most distinct from agent-os and claude-conductor, which are purely text-file harnesses.

Positioning

Clearwing is the democratized, open-source equivalent of Anthropic's internal Glasswing tool. It targets security researchers and red teams who want to apply LLM-driven vulnerability discovery at scale. Its differentiation from other LLM security tools (Nuclei-AI, etc.) is the full pipeline: from file ranking through crash-validated PoC through responsible disclosure.

The genai-pyo3 Rust backend is unique in this corpus — all other Python-based frameworks use pure Python LLM clients.

Observable Failure Modes

Docker dependency: The most interesting capabilities (sanitizer sandbox, Kali pentest) require Docker. Environments without Docker get degraded functionality.
Legal liability surface: Dual-use tool with broad capabilities. The README warning is minimal; operators bear full legal risk.
Budget blowup on large repos: The per-file fan-out at max_parallel=15 with capable models can be extremely expensive. budget_usd parameter is the only control.
Model availability coupling: The README examples use model names like gpt-5.3-codex and gpt-5.4-mini — if these models are deprecated or renamed, pipeline configuration breaks silently.
Single maintainer: 10 contributors but Eric Hartford appears to be the sole active maintainer. No governance structure visible.

Explicit Antipatterns

Running against unauthorized targets
Using without Docker for full pipeline functionality

Workflow

Clearwing — Workflow

SourceHunt Pipeline (Primary Workflow)

Phase	What happens	Artifact
1. File ranking	LLM ranker scores all source files by vuln likelihood	ranked file list
2. Parallel fan-out	One hunter agent launched per ranked file, up to max_parallel concurrently	concurrent agent pool
3. Per-file hunting	Each hunter LLM reads file, identifies candidates	raw finding candidates
4. 4-axis validation	Adversarial verifier checks REAL/TRIGGERABLE/IMPACTFUL/GENERAL	validated findings
5. PoC development	Exploiter LLM writes proof-of-concept code	PoC code
6. PoC stability	Fresh Docker container runs, ASan/UBSan monitors	crash signal
7. Human approval gate	Human reviews exploit before execution	approval/rejection
8. Patch validation	Optional: run against patched version	validation signal
9. Reporting	SARIF / markdown / JSON report	output files

Network Scan Workflow

Phase	Artifact
Target specification	host/port/range
Service detection	service list
ReAct loop (63 tools)	vulnerability candidates
Human approval gate	exploit execution decision
Reporting	markdown/JSON report

Campaign Workflow

YAML config specifies multiple repos, shared budget, checkpoint/resume state. Each repo runs sourcehunt; aggregate reporting across all repos.

Evidence Ladder (Explicit states)

suspicion → static_corroboration → crash_reproduced → root_cause_explained → exploit_demonstrated → patch_validated

Findings are classified by highest evidence level reached.
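
Because the rungs are ordered, one natural way to model the ladder is an integer-valued enum. The sketch below is an illustration only: the rung names come from the README, while `EvidenceLevel`, `highest_level`, and `at_least` are hypothetical names, not Clearwing's actual classes or functions.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    """Hypothetical model of the evidence ladder; rung names are from the README."""
    SUSPICION = 0
    STATIC_CORROBORATION = 1
    CRASH_REPRODUCED = 2
    ROOT_CAUSE_EXPLAINED = 3
    EXPLOIT_DEMONSTRATED = 4
    PATCH_VALIDATED = 5


def highest_level(reached):
    """Classify a finding by the highest rung it has reached."""
    return max(reached, default=EvidenceLevel.SUSPICION)


def at_least(findings, floor):
    """Keep only findings whose level meets the floor, e.g. for CI filtering."""
    return [name for name, level in findings.items() if level >= floor]
```

With this shape, a CI job could call `at_least(findings, EvidenceLevel.CRASH_REPRODUCED)` to ignore anything that is still only a suspicion.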

Approval Gates

Exploit execution gate: human approval required before attempting exploit against a live target
Responsible disclosure review: clearwing disclose review — manual human review of generated disclosure reports

Budget System

Three-band budget promotion — findings with higher evidence levels get more budget allocated. budget_usd parameter controls total spend per run.
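
The source does not say where the three bands begin or how much each may spend, so the following is only a sketch of the general idea: a per-finding cap that rises with evidence level, bounded by a run-wide total. The band boundaries, the dollar caps, and the names `BANDS`, `cap_for`, and `Budget` are all invented for illustration; only the rung names and the notion of a `budget_usd` total come from the text above.

```python
LADDER = ["suspicion", "static_corroboration", "crash_reproduced",
          "root_cause_explained", "exploit_demonstrated", "patch_validated"]

# ASSUMPTION: three bands as (minimum rung index, per-grant cap in USD).
# These numbers are placeholders, not Clearwing's real thresholds.
BANDS = [(0, 5.0), (2, 25.0), (4, 100.0)]


def cap_for(level):
    """Cap of the highest band whose minimum rung the level meets."""
    rung = LADDER.index(level)
    return max(cap for floor, cap in BANDS if rung >= floor)


class Budget:
    """Hypothetical run-wide spend guard seeded from budget_usd."""

    def __init__(self, budget_usd):
        self.remaining = budget_usd

    def grant(self, level, requested_usd):
        """Grant spend limited by the band cap and by what is left in the run."""
        granted = min(requested_usd, cap_for(level), self.remaining)
        self.remaining -= granted
        return granted
```

A grant of zero signals that the run-wide budget is exhausted, which is the point at which a pipeline of this shape would stop promoting findings.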

Memory Context

Clearwing — Memory and Context

Knowledge Graph

clearwing/data/ + networkx — a persistent knowledge graph that accumulates findings across scan sessions. The network scan agent writes to this graph; clearwing graph visualizes it.

ChromaDB Vector Store

chromadb>=1.0.0 appears in the dependencies and is used for mechanism memory in the sourcehunt pipeline. The README quickstart comment references it: "standard depth — sandboxed LLM hunters, adversarial verifier, mechanism memory, variant loop". This enables finding deduplication and variant hypothesis generation across file batches.

Campaign Checkpoint/Resume

clearwing campaign run campaign.yaml supports checkpoint/resume — campaign state is persisted to disk so interrupted multi-repo hunts can be resumed without re-running completed repos.
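
The on-disk format of the campaign checkpoint is not documented here. As a rough sketch of how checkpoint/resume of this kind is commonly built, the code below records completed repos in a JSON file and skips them on the next run; the file layout and the names `load_done`, `mark_done`, and `run_campaign` are assumptions, not Clearwing's implementation.

```python
import json
import os
import tempfile


def load_done(path):
    """Return the set of repos already completed, or an empty set on first run."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as fh:
        return set(json.load(fh)["done"])


def mark_done(path, done, repo):
    """Add a repo and rewrite the checkpoint atomically via a temp file."""
    done.add(repo)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"done": sorted(done)}, fh)
    os.replace(tmp, path)


def run_campaign(repos, path, hunt):
    """Run hunt(repo) for every repo not yet recorded in the checkpoint."""
    done = load_done(path)
    for repo in repos:
        if repo not in done:
            hunt(repo)
            mark_done(path, done, repo)
```

Writing to a temporary file and then calling `os.replace` means an interruption mid-write leaves the previous checkpoint intact rather than a half-written one.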

Session History

clearwing history and clearwing sessions subcommands — session state is persisted between runs. clearwing/observability/ handles telemetry and logging.

SHA-3 Cryptographic Commitments

clearwing/crypto/ — SHA-3 hash commitments for responsible disclosure. Creates a cryptographically verifiable timestamped claim of finding priority before public disclosure. Persisted to disk.
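
A hash commitment of this kind can be sketched with the standard library alone. The example assumes a salted SHA3-256 digest over the report bytes; the exact construction Clearwing uses is not described here, and `commit` and `verify` are hypothetical names. Note that the digest only proves the report existed unchanged; the timestamp has to come from wherever the digest is published.

```python
import hashlib
import secrets


def commit(report: bytes, nonce=None):
    """Return (digest_hex, nonce). Publish the digest; keep report and nonce private."""
    nonce = secrets.token_bytes(32) if nonce is None else nonce
    digest = hashlib.sha3_256(nonce + report).hexdigest()
    return digest, nonce


def verify(digest_hex: str, report: bytes, nonce: bytes) -> bool:
    """Check a later-revealed report and nonce against the published digest."""
    expected = hashlib.sha3_256(nonce + report).hexdigest()
    return secrets.compare_digest(expected, digest_hex)
```

The random nonce keeps a short or guessable report from being recovered by brute force against the published digest.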

Findings Pool

Shared findings pool with root-cause deduplication across parallel hunter agents. State is held in memory during a run and written to the output directory at completion.
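
How Clearwing derives a root cause is not specified, so the sketch below takes the root-cause key as an input and shows only the pooling step. `FindingsPool` and `root_cause_key` are hypothetical names used for illustration.

```python
from collections import defaultdict


class FindingsPool:
    """Hypothetical shared pool that merges findings with the same root cause."""

    def __init__(self):
        self._by_root = defaultdict(list)

    def add(self, root_cause_key, finding):
        """Record a finding; return True if its root cause was not seen before."""
        is_new = root_cause_key not in self._by_root
        self._by_root[root_cause_key].append(finding)
        return is_new

    def unique(self):
        """One representative per root cause: the first finding recorded for it."""
        return [group[0] for group in self._by_root.values()]
```

In a single-threaded asyncio pipeline, `add` needs no lock because it never awaits; a thread-based design would need one.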

Cross-session Handoff

Yes, via the knowledge graph and campaign checkpoint mechanism — scan state persists across sessions.

Orchestration

Clearwing — Orchestration

Multi-agent

Yes. The sourcehunt pipeline runs N parallel hunter agents (one per ranked source file), each operating independently. max_parallel=15 in the README example.

Orchestration Pattern

Hierarchical + parallel fan-out:

Ranker agent ranks files (sequential planning)
N hunter agents run in parallel, one per ranked file (parallel fan-out)
Verifier agent validates each finding (sequential per finding)
Exploiter agent develops PoC (sequential per validated finding)
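
The four stages above can be sketched with asyncio, using a semaphore to bound the fan-out. This is a minimal illustration under assumptions: `run_pipeline` and the four stage callables are hypothetical stand-ins, and the real SourceHuntRunner may schedule work differently.

```python
import asyncio


async def run_pipeline(files, rank, hunt, verify, exploit, max_parallel=15):
    """Hypothetical stage wiring: rank, fan out hunters, then verify and exploit."""
    ranked = await rank(files)                  # sequential planning step
    gate = asyncio.Semaphore(max_parallel)      # caps concurrent hunters

    async def bounded_hunt(path):
        async with gate:
            return await hunt(path)             # each hunter returns a list of candidates

    per_file = await asyncio.gather(*(bounded_hunt(p) for p in ranked))
    candidates = [c for batch in per_file for c in batch]

    pocs = []
    for candidate in candidates:                # sequential per finding
        if await verify(candidate):
            pocs.append(await exploit(candidate))
    return pocs
```

`asyncio.gather` returns results in the order the hunters were submitted, so the candidate list follows the ranker's ordering even though hunters finish at different times.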

Multi-model Routing

Explicit, code-level, per-stage:

ranker_llm: cheap/fast (gpt-5.4-mini class)
hunter_llm: capable (gpt-5.4 class)
verifier_llm: cheap/fast (gpt-5.4-mini class)
exploiter_llm: code-specialized (gpt-5.3-codex class)

Isolation Mechanism

Docker containers for:

Kali Linux container for network pentest tools
Sanitizer sandbox (ASan/UBSan) for PoC stability testing
Fresh containers per PoC stability check run
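
A stability check over fresh containers might look like the sketch below, which starts a throwaway container per run with `docker run --rm` and counts the runs in which a sanitizer report appears on stderr. The image name, the PoC command, the marker strings, the all-runs-must-crash rule, and the names `docker_argv` and `is_stable` are assumptions made for illustration.

```python
import subprocess


def docker_argv(image, poc_cmd):
    """Command line for one throwaway container; --rm discards it after exit."""
    return ["docker", "run", "--rm", image, *poc_cmd]


def is_stable(image, poc_cmd, runs=3, execute=None):
    """True if the PoC produces a sanitizer report in every fresh container."""
    if execute is None:
        # Default executor: run the container and capture its stderr.
        def execute(argv):
            return subprocess.run(argv, capture_output=True, text=True).stderr
    # ASSUMPTION: ASan reports mention "AddressSanitizer"; UBSan prints "runtime error:".
    markers = ("AddressSanitizer", "runtime error:")
    hits = 0
    for _ in range(runs):
        stderr = execute(docker_argv(image, poc_cmd))
        hits += any(marker in stderr for marker in markers)
    return hits == runs
```

Passing `execute` explicitly makes the check testable without a Docker daemon, since the caller can supply canned stderr output.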

Execution Mode

Interactive-loop (ReAct agent for network scan and interactive mode) and one-shot pipeline (sourcehunt, campaign). CI mode is one-shot.

Consensus Mechanism

4-axis adversarial validation (REAL / TRIGGERABLE / IMPACTFUL / GENERAL). Each axis is a boolean gate; a finding must pass all four to proceed to PoC development. This is not raft/byzantine consensus but it is an explicit multi-criterion adversarial filter.
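
The all-four-must-pass rule can be written down in a few lines. The axis names are from the README; `AxisVerdict`, `passes`, and `failed_axes` are hypothetical names for this sketch, not Clearwing's actual types.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AxisVerdict:
    """Hypothetical record of the four validator axes named in the README."""
    real: bool
    triggerable: bool
    impactful: bool
    general: bool

    def passes(self) -> bool:
        # A finding proceeds to PoC development only when every axis is True.
        return all(getattr(self, f.name) for f in fields(self))

    def failed_axes(self) -> list[str]:
        # Upper-case names of the axes that blocked the finding.
        return [f.name.upper() for f in fields(self) if not getattr(self, f.name)]
```

Keeping the failed axes alongside the verdict gives a reviewer the reason a finding was filtered out, not just the fact that it was.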

Human in the Loop

Exploit execution requires human approval (safety gate in clearwing/safety/). Disclosure review is also human-gated.

Campaign Orchestration

YAML-driven campaign runs sourcehunt across dozens/hundreds of repos with shared budget, checkpoint/resume, and aggregate reporting — effectively a scheduled batch job over a repository list.

UI CLI Surface

Clearwing — UI and CLI Surface

Dedicated CLI Binary

Yes. Binary name: clearwing. Entry point: clearwing:main. 20 subcommands.

CLI Design

Argparse-based dispatcher. Each subcommand lives in its own module under clearwing.ui.commands. Subcommand modules may declare an ALIASES tuple for alternate names.
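
A dispatcher of this shape can be sketched with argparse's own `aliases` support on `add_parser`. Only the `ALIASES` tuple is taken from the description above; the `NAME`, `add_arguments`, and `run` attributes, the `s` alias, and the stand-in modules are hypothetical.

```python
import argparse
from types import SimpleNamespace

# Stand-ins for per-subcommand modules; a real dispatcher would import them instead.
scan = SimpleNamespace(
    NAME="scan", ALIASES=("s",),
    add_arguments=lambda p: p.add_argument("target"),
    run=lambda args: f"scan {args.target}",
)
doctor = SimpleNamespace(
    NAME="doctor",                      # no ALIASES attribute: aliases are optional
    add_arguments=lambda p: None,
    run=lambda args: "doctor ok",
)


def build_parser(modules):
    """Register each module as a subcommand, honoring its optional ALIASES tuple."""
    parser = argparse.ArgumentParser(prog="clearwing")
    sub = parser.add_subparsers(dest="command", required=True)
    for mod in modules:
        p = sub.add_parser(mod.NAME, aliases=getattr(mod, "ALIASES", ()))
        mod.add_arguments(p)
        p.set_defaults(handler=mod.run)
    return parser


def dispatch(argv, modules=(scan, doctor)):
    """Parse argv and call the handler of the selected subcommand."""
    args = build_parser(modules).parse_args(argv)
    return args.handler(args)
```

Storing the handler with `set_defaults` means the alias and the canonical name resolve to the same function without a separate lookup table.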

Key subcommands for agent use:

clearwing sourcehunt <url> --depth standard
clearwing ci --config .clearwing.ci.yaml --sarif results.sarif
clearwing campaign run campaign.yaml

Textual TUI

clearwing/ui/tui/ — Python Textual-based terminal UI with screens and components. Launched implicitly or via clearwing interactive.

FastAPI Web UI

clearwing/ui/web/ — FastAPI + WebSocket-backed web dashboard. Launched via clearwing webui. Port: unknown (not specified in README). Stack: FastAPI, uvicorn, websockets. Static files served from clearwing/ui/web/static/.

CI/CD Integration

clearwing ci --config .clearwing.ci.yaml --sarif results.sarif — SARIF output compatible with GitHub Code Scanning. Non-interactive CI mode.
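
For orientation, a minimal SARIF 2.1.0 log has the structure sketched below. This is not Clearwing's emitter: the mapping from findings to `ruleId`, `level`, and location fields, and the names `to_sarif` and `write_sarif`, are assumptions for illustration.

```python
import json


def to_sarif(findings, tool_name="clearwing"):
    """Build a minimal SARIF 2.1.0 log from (rule_id, message, uri, line) tuples."""
    results = [
        {
            "ruleId": rule_id,
            "level": "error",
            "message": {"text": message},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": uri},
                    "region": {"startLine": line},
                }
            }],
        }
        for rule_id, message, uri, line in findings
    ]
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }


def write_sarif(path, findings):
    """Serialize the log to disk, e.g. as results.sarif."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(to_sarif(findings), fh, indent=2)
```

Code-scanning consumers generally key annotations on the artifact URI and start line, so those two fields matter most when mapping a finding to a source location.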

Setup Wizard

clearwing setup — interactive menu-driven provider selection, credential entry, optional live test, persists to ~/.clearwing/config.yaml.

clearwing doctor — verifies Python, credentials, Docker daemon, external tools, optional extras, network reachability.

MCP Server

clearwing mcp — Clearwing can expose its capabilities as an MCP server. Module at clearwing/mcp/.

Reporting

Output formats: json, markdown, sarif. Reports written to --output-dir. clearwing report subcommand for standalone report generation from stored findings.

Distribution

Type: standalone-repo
License: MIT
Install: clone-and-configure
Version: 1.0.0

Surfaces

CLI binary: clearwing
CLI subcmds: 20
Local UI: web-dashboard
Tech stack: FastAPI + WebSocket + Textual TUI

Components

Commands: 20
Skills: 0
Subagents: 4
Hooks: 0
MCP servers: 1
MCP tools: 63
Scripts: 0
Templates: 1

Workflow

Phases: 8
Approval gates: 2
Spec format: yaml
Spec storage: flat-files
Delta or full: whole-file

Orchestration

Multi-agent: Yes
Pattern: hierarchical
Max concurrent: 15
Isolation: container
Consensus: other
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: interactive-loop
Crash recovery: Yes
Compaction: No
Session handoff: Yes
Streaming: Yes

Memory

Type: hybrid
Persistence: project
Search: vector
State files: 4 files

Quality

TDD: No
TDD mechanism: none
Validators: 2
Self-review: adversarial-subagent

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: jsonl
Replay: Yes

Tools

Primary: none
Targets: 1
Portability: low

Signals

Stars: 982
Last commit: 2026-05-21
Contributors: 10
Maintainer: active
Quality score: 7/10