CUA

cua-sandbox · trycua/cua · ★ 17k · last commit 2026-05-26

Primitive shape 1 total

MCP tools 1

Summary

CUA (trycua/cua) — Summary

CUA is a multi-component platform for building, benchmarking, and deploying agents that use computers — combining a macOS VM hypervisor (Lume), a cross-platform sandbox SDK (cua), a computer-use MCP driver (cua-driver), a benchmarking harness (cua-bench), and a multi-agent orchestration CLI (cuabot). Its defining primitive is the macOS microVM running on Apple Silicon via Apple's Virtualization.Framework, giving near-native performance without Docker's shared-kernel security posture. The Python cua SDK presents a single unified API (Sandbox.ephemeral(Image.linux()|.macos()|.windows()|.android())) regardless of whether the sandbox runs locally via QEMU or on the cua.ai cloud. The cua-driver component provides background computer-use — clicking, typing, and screenshotting native macOS apps without stealing the cursor or focus, exposed as an MCP server over stdio for Claude Code, Cursor, and custom clients. CUA targets AI agent developers who need to evaluate, train, or deploy agents that interact with full OS GUIs rather than just executing code.

Differs from seeds: No seed in the catalog touches computer-use or GUI automation. All 11 seeds (superpowers, spec-kit, claude-flow, openspec, BMAD-METHOD, taskmaster-ai, agent-os, kiro, ccmemory, claude-conductor, spec-driver) are text-only coding agent frameworks. CUA sits at a fundamentally different layer: it provides the visual I/O sandbox (screenshot, click, type, gesture) that a computer-use agent requires, rather than injecting prompts or skills into a coding session. The cua-driver MCP server is structurally similar to ccmemory's MCP pattern (bundling capabilities as MCP tools), but the capabilities are screen-capture and UI-interaction rather than memory read/write.

Overview

CUA — Overview

Origin

TryCua was founded to address the gap between LLM computer-use capabilities and the infrastructure needed to run them reliably. The project grew from the realization that running macOS VMs on Apple Silicon is achievable with near-native performance via Apple's Virtualization.Framework, enabling computer-use agents to operate on actual macOS environments rather than Linux containers with X11 emulation.

Philosophy

The README tagline:

"Build, benchmark, and deploy agents that use computers"

The project philosophy is multi-target, single API: whether you're running a Linux container, a macOS VM, a Windows VM, or an Android emulator, the same Sandbox Python API works. The underlying runtime (QEMU local, cloud VM, Apple Virt.Framework) is abstracted away.

For the CUA Driver specifically:

"Drive any native macOS app in the background — agents click, type, and verify without stealing the cursor, focus, or Space, even on non-AX surfaces like Chromium web content and canvas-based tools (Blender, Figma, DAWs, game engines)."

The emphasis on background operation (not stealing cursor focus) reflects a key insight: computer-use agents deployed in production cannot disrupt the user's active session.

Components philosophy

The project is organized as a mono-repo of distinct but composable components:

Lume (macOS VMs): infrastructure layer — create/manage macOS and Linux VMs on Apple Silicon
cua (sandbox SDK): agent layer — unified API for sandbox interaction
cua-driver (MCP server): integration layer — connects existing agents (Claude Code, Cursor) to CUA sandboxes via MCP
cuabot (CLI): orchestration layer — run agents and GUI workflows in sandboxes from the command line
cua-bench (benchmarking): evaluation layer — OSWorld, ScreenSpot, Windows Arena, custom tasks

Trajectory recording

Every CUA Driver session records a replayable trajectory — a design choice explicitly mentioned in the README. This supports RL training data collection.

Key manifesto statements

"One API for any VM or container image — cloud or local."

"Every session records as a replayable trajectory."

"Individual windows appear natively on your desktop with H.265, shared clipboard, and audio." (cuabot)

Architecture

CUA — Architecture

Sandbox primitive

Three distinct primitives:

macOS/Linux microVM via Apple Virtualization.Framework (Lume): near-native performance on Apple Silicon; not QEMU-based for macOS targets.
QEMU VM (Linux/macOS/Windows/Android via cua SDK): cross-platform, local or cloud.
Linux container (for Image.linux() on cloud): standard container.

Startup time claims: the README does not benchmark startup time explicitly, though "near-native performance" is implied for Lume VMs. The TypeScript adapter README states a cold start of ~1 second for WASM-backed tasks, dropping to ~10ms after preload.

Distribution

cua SDK: pip install cua (Python, requires 3.11+)
lume: /bin/bash -c "$(curl -fsSL .../install.sh)" (Swift binary for Apple Silicon macOS)
cuabot: npx cuabot (Node.js)
cua-driver: /bin/bash -c "$(curl -fsSL .../install.sh)" (Swift binary + optional Rust port)
cua-bench: uv tool install -e . within cua-bench/ subdirectory

Directory tree (repo)

cua/
├── libs/
│   ├── cua-driver/       # Background computer-use driver (Swift + optional Rust port)
│   │   ├── scripts/      # Install scripts
│   │   └── rust/         # Experimental Rust port (cua-driver-rs)
│   ├── cua-driver-fixtures/
│   ├── cuabot/           # Multi-agent computer-use sandbox CLI
│   ├── kasm/             # Kasm-based remote desktop integration
│   ├── lume/             # macOS/Linux VM management (Apple Virt.Framework)
│   │   └── Sources/      # Swift source
│   ├── lumier/           # Docker-compatible interface for Lume VMs
│   ├── python/           # Python SDK packages (cua-core, cua-agent, cua-computer, etc.)
│   │   ├── cua-agent/
│   │   ├── cua-computer/
│   │   ├── cua-computer-server/
│   │   └── cua-mcp-server/
│   ├── qemu-docker/      # QEMU via Docker wrapper
│   ├── typescript/       # TypeScript SDK
│   └── xfce/             # XFCE desktop environment setup
├── libs/cua-bench/       # Benchmarking and RL environments
├── skills/               # Claude Code skill files
├── notebooks/            # Jupyter notebooks for examples
├── pyproject.toml        # Mono-repo Python config
└── package.json          # Node.js config (cuabot)

Required runtime

macOS on Apple Silicon (for Lume / cua-driver native features)
Python 3.11+ (for cua SDK)
Node.js (for cuabot)
Docker (for QEMU-docker backend)
QEMU (for local VM backends)

Target AI tools

Claude Code (via cua-driver MCP server)
Cursor (via cua-driver MCP server)
Custom MCP clients
Any Python-based agent (via cua SDK directly)

Host-OS posture

Significant host access for macOS targets: the computer-use driver must interact with native apps. For Linux containers, standard container isolation applies. Apple Virtualization.Framework provides VM-level isolation for macOS guests.

Isolation mechanism

Target	Isolation
macOS VM (Lume)	Apple Virtualization.Framework (hardware VM)
Linux VM (QEMU)	QEMU full virtualization
Linux container	Container namespace
Android	QEMU
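The table above reads naturally as a lookup from target to isolation mechanism. A minimal sketch in Python follows; the `Target` enum and both helper functions are hypothetical names chosen for illustration and are not part of the cua SDK.

```python
from enum import Enum


class Target(Enum):
    """Hypothetical labels for the rows of the isolation table."""
    MACOS_VM = "macOS VM (Lume)"
    LINUX_VM = "Linux VM (QEMU)"
    LINUX_CONTAINER = "Linux container"
    ANDROID = "Android"


# Mapping copied from the table; it is not read from any cua API.
ISOLATION = {
    Target.MACOS_VM: "Apple Virtualization.Framework (hardware VM)",
    Target.LINUX_VM: "QEMU full virtualization",
    Target.LINUX_CONTAINER: "Container namespace",
    Target.ANDROID: "QEMU",
}


def isolation_for(target: Target) -> str:
    """Return the isolation mechanism listed for a target."""
    return ISOLATION[target]


def is_vm_isolated(target: Target) -> bool:
    """True for every row except the container, which shares the host kernel."""
    return target is not Target.LINUX_CONTAINER
```

The second helper captures the distinction the host-OS posture paragraph draws: only the container row relies on namespace isolation rather than a virtual machine boundary.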

Config files

pyproject.toml (workspace Python config)
package.json (Node.js config)
Package.swift (Lume Swift package)

Components

CUA — Components

Python packages (pip)

Package	Description
`cua`	Unified sandbox SDK: `Sandbox.ephemeral(...)` with `Image.linux()`, `Image.macos()`, `Image.windows()`, or `Image.android()`
`cua-agent`	AI agent framework with computer-use capabilities
`cua-computer`	Low-level computer interaction primitives (mouse, keyboard, screenshot)
`cua-computer-server`	Driver for UI interactions and code execution inside sandboxes
`cua-mcp-server`	MCP server exposing CUA capabilities to MCP-capable clients
`cua-bench` / `cb`	Benchmark CLI: run OSWorld, ScreenSpot, Windows Arena, custom tasks

CLI tools

Tool	Install	Description
`lume`	curl install script (Swift binary)	Create/manage macOS and Linux VMs on Apple Silicon via Apple Virtualization.Framework
`cua-driver`	curl install script (Swift binary)	Background computer-use driver; MCP server over stdio
`cuabot`	`npx cuabot`	Multi-agent computer-use sandbox CLI
`cb`	`uv tool install -e .`	cua-bench CLI for benchmarking

Claude Code skills (skills/)

The repo ships at least one skill file for Claude Code integration, confirmed by the skills/ directory at the repo root. The README and the top-level listing do not expose specific skill names; only the existence of the directory was checked.

MCP server (cua-driver)

The cua-driver exposes computer-use capabilities as MCP tools over stdio:

# Standard MCP registration
claude mcp add --transport stdio cua-driver -- cua-driver mcp

# Claude Code computer-use compat mode (grounding on window screenshots)
claude mcp add --transport stdio cua-computer-use -- cua-driver mcp --claude-code-computer-use-compat

MCP tools provided: screenshot (window-specific in compat mode), click, type, hover, key_press (at minimum — full tool list in libs/cua-driver/README.md)
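For readers unfamiliar with MCP, a tool invocation over stdio is a JSON-RPC 2.0 request with method `tools/call`, written as a single line of JSON. The sketch below builds such a request for the `click` tool named above. The argument keys (`pid`, `window_id`, `x`, `y`) are assumptions for illustration; the authoritative schemas are in libs/cua-driver/README.md.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids should be unique within a session


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a single line of JSON."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so a message must not contain
    # raw newlines; json.dumps without `indent` escapes them inside strings.
    return json.dumps(request)


# Hypothetical arguments: the real click schema may differ.
line = tool_call("click", {"pid": 1234, "window_id": 7, "x": 100, "y": 200})
print(line)
```

A client would write this line to the server's stdin and read the matching response, identified by the same `id`, from its stdout.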

Sandbox SDK primitives (Python)

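# Calls are awaited inside `async with Sandbox.ephemeral(...) as sb:`, as in the workflow example
await sb.shell.run("command")                  # Shell command execution
await sb.screenshot()                          # Full screenshot
await sb.mouse.click(x, y)                     # Mouse click
await sb.keyboard.type("text")                 # Keyboard input
await sb.mobile.gesture((x1, y1), (x2, y2))    # Multi-touch gesture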

cuabot CLI commands

cuabot                           # Setup onboarding
cuabot claude                    # Run Claude Code in sandbox
cuabot openclaw                  # Run OpenClaw in sandbox
cuabot chromium                  # Run Chromium in sandbox
cuabot --screenshot              # Take screenshot
cuabot --type "text"             # Type text
cuabot --click <x> <y> [button] # Click

cua-bench CLI

cb image create linux-docker     # Create base image
cb run dataset <path> --agent <name> --max-parallel 4  # Run benchmark

Supported OS targets (Sandbox.ephemeral())

Image.linux() — Linux container or VM
Image.macos() — macOS VM (via Lume on Apple Silicon)
Image.windows() — Windows VM
Image.android() — Android (QEMU)

lumier

Docker-compatible interface for Lume VMs — allows using Docker CLI semantics to manage Lume-backed macOS/Linux VMs.

Prompts

CUA — Prompts

CUA is primarily an infrastructure and SDK platform. Its prompt surface is minimal but real — the skills/ directory ships Claude Code skill files.

Excerpt 1: cua-driver MCP registration instructions (from libs/cua-driver/README.md)

````markdown
## Claude Code computer-use compatibility

Standard Claude Code MCP registration:

```bash
claude mcp add --transport stdio cua-driver -- cua-driver mcp
```

If you want Claude Code's vision/computer-use-style flow to ground on CuaDriver window screenshots, register the compatibility mode:

```bash
claude mcp add --transport stdio cua-computer-use -- cua-driver mcp --claude-code-computer-use-compat
```

This keeps CuaDriver's normal MCP tools and changes only screenshot, which requires pid and window_id and captures that window only.
````

Prompting technique: tool registration injection. There is no natural-language prompt; instead, the `--claude-code-computer-use-compat` flag changes the screenshot tool signature to match Claude Code's expected computer-use interface (`mcp__cua-computer-use__screenshot`). This is capability grounding via API contract.

Excerpt 2: cuabot README (from main README)

````markdown
## CuaBot - Co-op computer-use for any agent

`cuabot` gives any coding agent a seamless sandbox for computer-use. Individual windows appear natively on your desktop with H.265, shared clipboard, and audio.

```bash
npx cuabot                 # Setup onboarding

# Run any agent in a sandbox
cuabot claude              # Claude Code
cuabot openclaw            # OpenClaw in the sandbox

# Run any GUI workflow in a sandbox
cuabot chromium
cuabot --screenshot
cuabot --type "hello"
cuabot --click <x> <y> [button]
```

Built-in support for agent-browser and agent-device (iOS, Android) out of the box.
````

Prompting technique: capability exposure. The CLI exposes primitives (screenshot, click, type) that enable an agent to reason about and interact with GUIs. Responsibility for prompt design is delegated to the agent framework using these primitives.

Note

The full prompt content of the files in `skills/` could not be fetched in this analysis pass. The directory exists at the repo root, but individual file names were not listed. The Claude Code skill for `cua-driver` is referenced in the driver README: "Full tool reference, architecture notes, and the Claude Code skill ship with the package."

Uniqueness

CUA — Uniqueness & Positioning

Differs from seeds

CUA has no counterpart in the 11 seeds. Every seed is a text-only coding agent framework operating at the LLM prompt/hook layer. CUA operates at the visual I/O layer: it gives agents eyes (screenshots), hands (click/type/gesture), and isolated OS environments to act in. The closest structural parallel within the batch (not seeds) is CubeSandbox or SWE-ReX for the "isolated execution environment" concept, but neither provides computer-use (screen capture + GUI interaction) — they provide only shell command execution. CUA's cua-driver MCP server has the same integration pattern as ccmemory (bundled MCP tools), but the tool semantics are screen/GUI rather than memory/knowledge. No seed addresses RL training data collection via trajectory recording.

Positioning

CUA targets three distinct audiences:

Agent developers building computer-use agents that need an SDK to provision and control VMs/containers
Researchers evaluating agents on standardized benchmarks (OSWorld, ScreenSpot, Windows Arena)
AI tool users wanting to run Claude Code or other coding agents inside sandboxed macOS/Linux VMs without compromising their host system

The cua.ai cloud service is the commercial component; the OSS repo provides the SDK and CLI tooling.

Key architectural bets

macOS VM via Apple Virtualization.Framework — near-native performance on M-series chips; the only open-source framework in this batch running full macOS guests.
Background operation (no cursor theft) — critical for production deployment where the computer-use agent cannot disrupt the user's session.
Single SDK, multiple runtimes — the same Sandbox API works on Linux containers, Linux VMs, macOS VMs, Windows VMs, and Android.
Trajectory-first design — recording every session as replayable trajectories enables RL training data collection as a first-class outcome.

Observable failure modes

Apple Silicon only for native macOS VMs (Lume) — x86 hosts cannot run macOS guests
Accessibility API limitations — non-AX surfaces (Chromium, Canvas-based apps) require alternative automation paths; the README calls this out explicitly
QEMU performance on macOS — Linux/Windows VMs on macOS via QEMU are slower than native; users expecting native Mac performance get it only with Lume+macOS images
cuabot H.265 codec requirement — streaming sandbox screens to the desktop requires H.265 support
MCP compat mode limitations — --claude-code-computer-use-compat changes screenshot tool signature; tools calling the standard cua-driver screenshot name will break in compat mode
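The last failure mode can be guarded against on the client side. The sketch below assumes the server names from the two registration commands and the `mcp__<server>__<tool>` naming seen in `mcp__cua-computer-use__screenshot`. `screenshot_call` is a hypothetical helper; the rule that compat mode requires `pid` and `window_id` comes from the driver README excerpt, while the standard-mode argument schema is not shown in this document and is left empty here.

```python
from typing import Optional


def qualified_tool_name(server: str, tool: str) -> str:
    """Build the tool name in the mcp__<server>__<tool> form."""
    return f"mcp__{server}__{tool}"


def screenshot_call(compat: bool, pid: Optional[int] = None,
                    window_id: Optional[int] = None) -> tuple:
    """Return (tool name, arguments) for a screenshot in either mode."""
    if compat:
        # Compat mode captures a single window, so both ids are required.
        if pid is None or window_id is None:
            raise ValueError("compat-mode screenshot requires pid and window_id")
        name = qualified_tool_name("cua-computer-use", "screenshot")
        return name, {"pid": pid, "window_id": window_id}
    # Standard mode: schema not shown in this document, so no arguments are sent.
    return qualified_tool_name("cua-driver", "screenshot"), {}


print(screenshot_call(True, pid=1234, window_id=7))
```

Failing early with a `ValueError` surfaces the signature mismatch at the call site instead of as an opaque tool error from the server.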

Cross-references

cuabot supports Claude Code and OpenClaw
cua-driver integrates with Claude Code and Cursor via MCP
cua-bench evaluates agents on OSWorld and ScreenSpot benchmarks
libs/kasm/ integrates with Kasm Workspaces for remote desktop scenarios

Workflow

CUA — Workflow

Primary use case workflows

1. Single sandbox, programmatic control (cua SDK)

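import asyncio
from cua import Sandbox, Image  # import path assumed from the package name

async def main():
    async with Sandbox.ephemeral(Image.linux()) as sb:
        result = await sb.shell.run("echo hello")   # run a shell command
        screenshot = await sb.screenshot()          # capture the screen
        await sb.mouse.click(100, 200)              # click at (x, y)
        await sb.keyboard.type("Hello from Cua!")   # type into the focused app
asyncio.run(main())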

Phase-to-artifact map:

Phase	Artifact
`Sandbox.ephemeral(Image.X())`	VM or container provisioned
`sb.shell.run()`	Shell output
`sb.screenshot()`	PNG image
`sb.mouse.click()` / `sb.keyboard.type()`	UI interaction
`async with` exit	Sandbox destroyed, resources released
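The last row of the map is the one that matters for cleanup: leaving the `async with` block destroys the sandbox even when the body raises. The stand-in below illustrates that lifecycle with the standard library only. `FakeSandbox` and `ephemeral` are hypothetical and make no calls to the cua SDK.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSandbox:
    """Hypothetical stand-in that only tracks lifecycle events."""

    def __init__(self) -> None:
        self.events = []


@asynccontextmanager
async def ephemeral():
    sb = FakeSandbox()
    sb.events.append("provisioned")
    try:
        yield sb
    finally:
        # Runs on normal exit and on exceptions alike.
        sb.events.append("destroyed")


async def main() -> list:
    seen = None
    try:
        async with ephemeral() as sb:
            seen = sb
            sb.events.append("shell.run")
            raise RuntimeError("agent step failed")
    except RuntimeError:
        pass
    return seen.events


events = asyncio.run(main())
print(events)  # ['provisioned', 'shell.run', 'destroyed']
```

The `try`/`finally` around `yield` is what guarantees teardown, which is the property an ephemeral sandbox needs so that a crashed agent step does not leak a VM or container.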

2. Claude Code + cua-driver (MCP integration)

claude mcp add --transport stdio cua-driver -- cua-driver mcp
# Claude Code now has computer-use tools via MCP

Phase-to-artifact map:

Phase	Artifact
`claude mcp add`	MCP server registered in Claude Code
Agent call to screenshot tool	Window screenshot returned
Agent call to click/type tool	UI interaction executed
Session end	Trajectory recorded

3. cuabot multi-agent orchestration

cuabot claude    # Launches Claude Code in a sandbox
cuabot openclaw  # Launches OpenClaw in a sandbox

4. Benchmarking (cua-bench)

cb image create linux-docker
cb run dataset datasets/cua-bench-basic --agent cua-agent --max-parallel 4

Phase-to-artifact map:

Phase	Artifact
`cb image create`	Base image built
`cb run`	Benchmark results per task
Completion	Trajectory export for RL training

Approval gates

None — CUA is an SDK/infrastructure layer. No human approval gates.

Cloud vs. local

Mode	Mechanism
Local (QEMU)	QEMU manages VM lifecycle locally
Cloud (cua.ai)	API calls to cua.ai service; same SDK
macOS VM (Lume)	Apple Virtualization.Framework on Apple Silicon

Memory Context

CUA — Memory & Context

Trajectory recording (primary memory mechanism)

CUA Driver records every session as a replayable trajectory — a sequence of actions (click, type, screenshot, observation) with timestamps. This serves two purposes:

Debugging / replay: operators can review what the agent did
RL training data: trajectories can be exported for training computer-use models
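Since the README does not specify the trajectory format, the following only illustrates what a sequence of timestamped actions that can be replayed might look like. Every name here (`Step`, `Trajectory`, the JSON Lines layout) is an assumption and not CUA's on-disk format.

```python
import json
import time
from dataclasses import dataclass, asdict, field


@dataclass
class Step:
    action: str   # e.g. "click", "type", "screenshot"
    args: dict
    timestamp: float = field(default_factory=time.time)


@dataclass
class Trajectory:
    steps: list = field(default_factory=list)

    def record(self, action: str, **args) -> None:
        self.steps.append(Step(action, args))

    def dumps(self) -> str:
        """Serialize as JSON Lines, one step per line."""
        return "\n".join(json.dumps(asdict(s)) for s in self.steps)

    @classmethod
    def loads(cls, text: str) -> "Trajectory":
        return cls([Step(**json.loads(line)) for line in text.splitlines()])

    def replay(self, handlers: dict) -> None:
        """Re-issue each step in order through caller-supplied handlers."""
        for step in self.steps:
            handlers[step.action](**step.args)


traj = Trajectory()
traj.record("click", x=100, y=200)
traj.record("type", text="Hello from Cua!")

# Round-trip through the serialized form, then replay into a log.
log = []
restored = Trajectory.loads(traj.dumps())
restored.replay({
    "click": lambda x, y: log.append(("click", x, y)),
    "type": lambda text: log.append(("type", text)),
})
print(log)  # [('click', 100, 200), ('type', 'Hello from Cua!')]
```

Both stated purposes fall out of the same structure: replay walks the steps through handlers, while an export for training would read the serialized lines directly.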

Sandbox ephemeral vs. persistent

Sandbox.ephemeral(): sandbox is destroyed when the async with block exits — no persistence
Persistent sandboxes: the cua.ai cloud platform likely supports persistent VMs (not fully documented in README for the open-source SDK)

VM state (Lume)

Lume manages macOS/Linux VM disk images. VMs can be paused and resumed via lume CLI, preserving the full OS state (analogous to VM snapshots). This is filesystem + memory state persistence at the VM level, not LLM context.

Session-level context

CUA has no built-in LLM context management. The agent framework using the CUA SDK owns context window management. CUA provides the observation stream (screenshots, shell output) but does not compress or summarize it.
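Because context management is left to the caller, an agent loop usually has to bound the observation history itself. One simple approach keeps only the most recent screenshots and replaces older ones with a short text note. This is a sketch of a caller-side technique, not something CUA provides; `ObservationWindow` is a hypothetical name.

```python
from collections import deque


class ObservationWindow:
    """Keep the last `keep` screenshots; older ones collapse to a one-line note."""

    def __init__(self, keep: int = 3) -> None:
        self.recent = deque(maxlen=keep)
        self.dropped = 0

    def add(self, png: bytes) -> None:
        if len(self.recent) == self.recent.maxlen:
            self.dropped += 1  # the oldest screenshot is about to be evicted
        self.recent.append(png)

    def as_context(self) -> list:
        """Return what would be sent to the model on the next turn."""
        note = [f"[{self.dropped} earlier screenshots omitted]"] if self.dropped else []
        return note + list(self.recent)


window = ObservationWindow(keep=2)
for i in range(5):
    window.add(f"png-{i}".encode())
print(window.as_context())
# ['[3 earlier screenshots omitted]', b'png-3', b'png-4']
```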

State files

VM disk images (.qcow2, .iso) for persistent VMs
Trajectory files (format not specified in README; likely JSON or similar structured format)
No explicitly named state files in the Python SDK

Memory type classification

Primary: file-based (VM disk images, trajectory files)
No vector DB, SQLite, or graph DB built into the SDK
The cua.ai cloud service may have additional state management not visible in the OSS code

Orchestration

CUA — Orchestration

Multi-agent support

Yes — cuabot supports running multiple agents in parallel sandboxes. cua-bench explicitly supports --max-parallel N for running N benchmark instances concurrently.

Orchestration pattern

Parallel fan-out (for cua-bench: multiple agents on multiple benchmark tasks simultaneously). None for the base SDK (caller manages parallelism).
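A bounded fan-out of the kind `--max-parallel N` implies can be sketched with an `asyncio.Semaphore`. This is a generic illustration and not cua-bench's implementation; `run_task` is a hypothetical placeholder for provisioning a sandbox and running one benchmark task.

```python
import asyncio


async def run_all(tasks: list, max_parallel: int) -> tuple:
    """Run every task with at most `max_parallel` in flight.

    Returns the per-task results and the peak concurrency observed.
    """
    gate = asyncio.Semaphore(max_parallel)
    in_flight = 0
    peak = 0

    async def run_task(name: str) -> tuple:
        nonlocal in_flight, peak
        async with gate:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # placeholder for the real work
            in_flight -= 1
            return name, True

    pairs = await asyncio.gather(*(run_task(t) for t in tasks))
    return dict(pairs), peak


results, peak = asyncio.run(run_all([f"task-{i}" for i in range(10)], max_parallel=4))
print(len(results), peak)
```

The same pattern applies to the base SDK, where the caller manages parallelism: wrapping each sandbox session in the semaphore keeps the number of live VMs or containers under a fixed ceiling.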

Execution mode

Sandbox.ephemeral(): one-shot (destroyed after async with block)
cua-driver: event-driven (MCP server responds to tool calls from Claude Code/Cursor)
cuabot: interactive-loop (runs agent inside sandbox until task completion)
cua-bench: parallel fan-out (runs multiple tasks in parallel, collects results)

Isolation mechanism

Target	Isolation
macOS VM (Lume)	Apple Virtualization.Framework (hardware VM, near-native performance)
Linux VM (QEMU)	QEMU full virtualization
Linux container	Container namespace
Cloud (cua.ai)	Cloud-managed VM or container

This is the most diverse isolation portfolio in this batch: microVM (Apple Virt.Framework), full VM (QEMU), and container — all behind one SDK.

Multi-model routing

No built-in routing. The agent framework (Claude Code, custom agent) selects models. CUA provides the execution environment and tool primitives, not the model orchestration.

Cross-tool portability

High — cua-driver works as an MCP server for Claude Code, Cursor, and any MCP-capable client. The Python SDK is framework-agnostic (works with any Python agent). cuabot runs Claude Code, OpenClaw, and other agents.

Consensus mechanism

None.

Crash recovery

Not explicitly documented in README. Trajectories are recorded so replay is possible. VM restart is manual via lume CLI.

Streaming output

Yes — cuabot streams computer-use output (H.265 video, screenshots) to the native desktop. MCP tool calls return synchronous responses to the LLM client.

UI CLI Surface

CUA — UI & CLI Surface

CLI tools

Multiple CLI binaries across the mono-repo:

Binary	Install	Description
`lume`	curl install script	VM lifecycle: `lume run macos-sequoia-vanilla:latest`, `lume list`, `lume pull`, `lume stop`
`cua-driver`	curl install script	MCP server: `cua-driver mcp`, `cua-driver mcp --claude-code-computer-use-compat`
`cuabot`	`npx cuabot`	Multi-agent sandbox CLI with computer-use primitives
`cb` (cua-bench)	`uv tool install -e .`	Benchmark runner: `cb image create`, `cb run dataset`

None are thin wrappers over claude/codex — all are purpose-built for CUA's sandbox/computer-use infrastructure.

Local UI surface

cuabot provides native desktop window mirroring:

Individual VM windows appear natively on the macOS desktop via H.265 video codec
Shared clipboard between host and sandbox
Audio passthrough
Picture-in-Picture mode (iOS app integration)

This is not a traditional web dashboard but a desktop native display of the sandbox's screen — unique in this batch.

Web dashboard

None built into the open-source repo. The cua.ai cloud service has a web UI (not part of the OSS codebase).

IDE integration

Claude Code via MCP (cua-driver):

Registered as an MCP server over stdio
Provides screenshot, click, type, hover tools to Claude Code
--claude-code-computer-use-compat mode aligns tool signatures with Claude Code's computer-use expectations

Notebooks

notebooks/ directory at repo root — Jupyter notebooks for examples and tutorials.

Observability

Trajectory recording: every session recorded as replayable trajectory
Benchmark metrics: cb run outputs per-task results with success/failure
No explicit log streaming UI in the OSS repo

iOS app

cuabot has an iOS companion app (Swift, Picture-in-Picture support) for monitoring sandbox sessions.


Distribution

Type: standalone-repo
License: MIT
Install: multi-step

Surfaces

CLI binary: lume / cua-driver / cuabot / cb
Local UI: other
Tech stack: Native macOS window mirroring via H.265 codec (cuabot); iOS app (SwiftUI)

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 0
MCP servers: 1
Scripts: 4
Templates: 0

Workflow

Phases: 6
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: Yes
Pattern: parallel-fan-out
Isolation: microvm
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: No
BYOK: Yes
Modal: text+vision

Execution

Mode: one-shot
Crash recovery: No
Compaction: No
Session handoff: No
Streaming: Yes

Memory

Type: file-based
Persistence: session
Search: none
State files: 2 files

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: proprietary
Replay: Yes

Tools

Primary: Claude Code
Targets: 5
Portability: high

Signals

Stars: 17k
Last commit: 2026-05-26
Contributors: 30
Maintainer: active
Quality score: 3.2/10