
CUA

cua-sandbox · trycua/cua · ★ 17k · last commit 2026-05-26

Primitive shape 1 total
MCP tools 1
00

Summary

CUA (trycua/cua) — Summary

CUA is a multi-component platform for building, benchmarking, and deploying agents that use computers — combining a macOS VM hypervisor (Lume), a cross-platform sandbox SDK (cua), a computer-use MCP driver (cua-driver), a benchmarking harness (cua-bench), and a multi-agent orchestration CLI (cuabot). Its defining primitive is the macOS microVM running on Apple Silicon via Apple's Virtualization.Framework, giving near-native performance without Docker's shared-kernel security posture. The Python cua SDK presents a single unified API (Sandbox.ephemeral(Image.linux()|.macos()|.windows()|.android())) regardless of whether the sandbox runs locally via QEMU or on the cua.ai cloud. The cua-driver component provides background computer-use — clicking, typing, and screenshotting native macOS apps without stealing the cursor or focus, exposed as an MCP server over stdio for Claude Code, Cursor, and custom clients. CUA targets AI agent developers who need to evaluate, train, or deploy agents that interact with full OS GUIs rather than just executing code.

Differs from seeds: No seed in the catalog touches computer-use or GUI automation. All 11 seeds (superpowers, spec-kit, claude-flow, openspec, BMAD-METHOD, taskmaster-ai, agent-os, kiro, ccmemory, claude-conductor, spec-driver) are text-only coding agent frameworks. CUA sits at a fundamentally different layer: it provides the visual I/O sandbox (screenshot, click, type, gesture) that a computer-use agent requires, rather than injecting prompts or skills into a coding session. The cua-driver MCP server is structurally similar to ccmemory's MCP pattern (bundling capabilities as MCP tools), but the capabilities are screen-capture and UI-interaction rather than memory read/write.

01

Overview

CUA — Overview

Origin

TryCua was founded to address the gap between LLM computer-use capabilities and the infrastructure needed to run them reliably. The project grew from the realization that running macOS VMs on Apple Silicon is achievable with near-native performance via Apple's Virtualization.Framework, enabling computer-use agents to operate on actual macOS environments rather than Linux containers with X11 emulation.

Philosophy

The README tagline:

"Build, benchmark, and deploy agents that use computers"

The project philosophy is multi-target, single API: whether you're running a Linux container, a macOS VM, a Windows VM, or an Android emulator, the same Sandbox Python API works. The underlying runtime (QEMU local, cloud VM, Apple Virt.Framework) is abstracted away.

For the CUA Driver specifically:

"Drive any native macOS app in the background — agents click, type, and verify without stealing the cursor, focus, or Space, even on non-AX surfaces like Chromium web content and canvas-based tools (Blender, Figma, DAWs, game engines)."

The emphasis on background operation (not stealing cursor focus) reflects a key insight: computer-use agents deployed in production cannot disrupt the user's active session.

Components philosophy

The project is organized as a mono-repo of distinct but composable components:

  • Lume (macOS VMs): infrastructure layer — create/manage macOS and Linux VMs on Apple Silicon
  • cua (sandbox SDK): agent layer — unified API for sandbox interaction
  • cua-driver (MCP server): integration layer — connects existing agents (Claude Code, Cursor) to CUA sandboxes via MCP
  • cuabot (CLI): orchestration layer — run agents and GUI workflows in sandboxes from the command line
  • cua-bench (benchmarking): evaluation layer — OSWorld, ScreenSpot, Windows Arena, custom tasks

Trajectory recording

Every CUA Driver session records a replayable trajectory — a design choice explicitly mentioned in the README. This supports RL training data collection.

Key manifesto statements

"One API for any VM or container image — cloud or local."

"Every session records as a replayable trajectory."

"Individual windows appear natively on your desktop with H.265, shared clipboard, and audio." (cuabot)

02

Architecture

CUA — Architecture

Sandbox primitive

Three distinct primitives:

  1. macOS/Linux microVM via Apple Virtualization.Framework (Lume): near-native performance on Apple Silicon; not QEMU-based for macOS targets.
  2. QEMU VM (Linux/macOS/Windows/Android via cua SDK): cross-platform, local or cloud.
  3. Linux container (for Image.linux() on cloud): standard container.

Startup time claims: The README does not benchmark startup time; it only implies "near-native performance" for Lume VMs. The TypeScript adapter README states a cold start of ~1 second for WASM-backed tasks, dropping to ~10ms after preload.

Distribution

  • cua SDK: pip install cua (Python, requires 3.11+)
  • lume: /bin/bash -c "$(curl -fsSL .../install.sh)" (Swift binary for Apple Silicon macOS)
  • cuabot: npx cuabot (Node.js)
  • cua-driver: /bin/bash -c "$(curl -fsSL .../install.sh)" (Swift binary + optional Rust port)
  • cua-bench: uv tool install -e . within cua-bench/ subdirectory

Directory tree (repo)

cua/
├── libs/
│   ├── cua-driver/       # Background computer-use driver (Swift + optional Rust port)
│   │   ├── scripts/      # Install scripts
│   │   └── rust/         # Experimental Rust port (cua-driver-rs)
│   ├── cua-driver-fixtures/
│   ├── cuabot/           # Multi-agent computer-use sandbox CLI
│   ├── kasm/             # Kasm-based remote desktop integration
│   ├── lume/             # macOS/Linux VM management (Apple Virt.Framework)
│   │   └── Sources/      # Swift source
│   ├── lumier/           # Docker-compatible interface for Lume VMs
│   ├── python/           # Python SDK packages (cua-core, cua-agent, cua-computer, etc.)
│   │   ├── cua-agent/
│   │   ├── cua-computer/
│   │   ├── cua-computer-server/
│   │   └── cua-mcp-server/
│   ├── qemu-docker/      # QEMU via Docker wrapper
│   ├── typescript/       # TypeScript SDK
│   └── xfce/             # XFCE desktop environment setup
├── libs/cua-bench/       # Benchmarking and RL environments
├── skills/               # Claude Code skill files
├── notebooks/            # Jupyter notebooks for examples
├── pyproject.toml        # Mono-repo Python config
└── package.json          # Node.js config (cuabot)

Required runtime

  • macOS on Apple Silicon (for Lume / cua-driver native features)
  • Python 3.11+ (for cua SDK)
  • Node.js (for cuabot)
  • Docker (for QEMU-docker backend)
  • QEMU (for local VM backends)
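
The requirements above can be checked before installing anything. The following is a minimal sketch, not part of CUA: the function name is hypothetical, and the QEMU binary names are an assumption about how a local QEMU install is usually exposed on PATH.

```python
import platform
import shutil
import sys


def check_cua_prerequisites() -> dict[str, bool]:
    """Report which runtime requirements from the list above are present."""
    return {
        # Lume and the native cua-driver features need macOS on Apple Silicon.
        "apple_silicon_macos": platform.system() == "Darwin"
        and platform.machine() == "arm64",
        # The cua SDK requires Python 3.11 or newer.
        "python_3_11": sys.version_info >= (3, 11),
        # cuabot is launched through npx, which ships with Node.js.
        "node": shutil.which("node") is not None,
        # Docker backs the QEMU-docker backend.
        "docker": shutil.which("docker") is not None,
        # Assumed binary names for a local QEMU install.
        "qemu": any(
            shutil.which(name) is not None
            for name in ("qemu-system-aarch64", "qemu-system-x86_64")
        ),
    }


if __name__ == "__main__":
    for name, ok in check_cua_prerequisites().items():
        print(f"{'ok     ' if ok else 'missing'}  {name}")
```

Not every entry is needed for every component: a Linux host that only uses the cloud backend would report the Apple Silicon check as missing and still be usable for the Python SDK.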

Target AI tools

  • Claude Code (via cua-driver MCP server)
  • Cursor (via cua-driver MCP server)
  • Custom MCP clients
  • Any Python-based agent (via cua SDK directly)

Host-OS posture

Significant host access for macOS targets: the computer-use driver must interact with native apps. For Linux containers, standard container isolation applies. Apple Virtualization.Framework provides VM-level isolation for macOS guests.

Isolation mechanism

| Target | Isolation |
| --- | --- |
| macOS VM (Lume) | Apple Virtualization.Framework (hardware VM) |
| Linux VM (QEMU) | QEMU full virtualization |
| Linux container | Container namespace |
| Android | QEMU |

Config files

  • pyproject.toml (workspace Python config)
  • package.json (Node.js config)
  • Package.swift (Lume Swift package)
03

Components

CUA — Components

Python packages (pip)

| Package | Description |
| --- | --- |
| cua | Unified sandbox SDK: Sandbox.ephemeral(...) with Image.linux(), .macos(), .windows(), or .android() |
| cua-agent | AI agent framework with computer-use capabilities |
| cua-computer | Low-level computer interaction primitives (mouse, keyboard, screenshot) |
| cua-computer-server | Driver for UI interactions and code execution inside sandboxes |
| cua-mcp-server | MCP server exposing CUA capabilities to MCP-capable clients |
| cua-bench / cb | Benchmark CLI: run OSWorld, ScreenSpot, Windows Arena, custom tasks |

CLI tools

| Tool | Install | Description |
| --- | --- | --- |
| lume | curl install script (Swift binary) | Create/manage macOS and Linux VMs on Apple Silicon via Apple Virtualization.Framework |
| cua-driver | curl install script (Swift binary) | Background computer-use driver; MCP server over stdio |
| cuabot | npx cuabot | Multi-agent computer-use sandbox CLI |
| cb | uv tool install -e . | cua-bench CLI for benchmarking |

Claude Code skills (skills/)

The repo ships at least one skill file for Claude Code integration, as confirmed by the skills/ directory at the repo root. Specific skill names are not exposed in the README or the top-level listing; only the existence of the skills/ directory was verified.

MCP server (cua-driver)

The cua-driver exposes computer-use capabilities as MCP tools over stdio:

# Standard MCP registration
claude mcp add --transport stdio cua-driver -- cua-driver mcp

# Claude Code computer-use compat mode (grounding on window screenshots)
claude mcp add --transport stdio cua-computer-use -- cua-driver mcp --claude-code-computer-use-compat

MCP tools provided: screenshot (window-specific in compat mode), click, type, hover, key_press (at minimum — full tool list in libs/cua-driver/README.md)
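
For clients configured through a JSON file rather than `claude mcp add`, the same two registrations can be expressed as config entries. This is a sketch under an assumption: it uses the `mcpServers` layout that Claude Code's `.mcp.json` and Cursor's `mcp.json` read, and the helper name is hypothetical. Only the command, the `mcp` subcommand, the compat flag, and the two server names come from the registrations above.

```python
import json


def cua_driver_mcp_config(compat: bool = False) -> dict:
    """Build an mcpServers entry that launches cua-driver over stdio."""
    args = ["mcp"]
    name = "cua-driver"
    if compat:
        # Same flag and server name as the compat registration above.
        args.append("--claude-code-computer-use-compat")
        name = "cua-computer-use"
    return {"mcpServers": {name: {"command": "cua-driver", "args": args}}}


print(json.dumps(cua_driver_mcp_config(compat=True), indent=2))
```

Keeping the compat server under a different name than the standard one mirrors the CLI registrations, so both can coexist in one client config.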

Sandbox SDK primitives (Python)

sb.shell.run("command")          # Shell command execution
sb.screenshot()                  # Full screenshot
sb.mouse.click(x, y)            # Mouse click
sb.keyboard.type("text")         # Keyboard input
sb.mobile.gesture((x1,y1),(x2,y2))  # Multi-touch gesture

cuabot CLI commands

cuabot                           # Setup onboarding
cuabot claude                    # Run Claude Code in sandbox
cuabot openclaw                  # Run OpenClaw in sandbox
cuabot chromium                  # Run Chromium in sandbox
cuabot --screenshot              # Take screenshot
cuabot --type "text"             # Type text
cuabot --click <x> <y> [button] # Click

cua-bench CLI

cb image create linux-docker     # Create base image
cb run dataset <path> --agent <name> --max-parallel 4  # Run benchmark

Supported OS targets (Sandbox.ephemeral())

  • Image.linux() — Linux container or VM
  • Image.macos() — macOS VM (via Lume on Apple Silicon)
  • Image.windows() — Windows VM
  • Image.android() — Android (QEMU)

lumier

Docker-compatible interface for Lume VMs — allows using Docker CLI semantics to manage Lume-backed macOS/Linux VMs.

05

Prompts

CUA — Prompts

CUA is primarily an infrastructure and SDK platform. Its prompt surface is minimal but real — the skills/ directory ships Claude Code skill files.

Excerpt 1: cua-driver MCP registration instructions (from libs/cua-driver/README.md)

## Claude Code computer-use compatibility

Standard Claude Code MCP registration:

```bash
claude mcp add --transport stdio cua-driver -- cua-driver mcp
```

If you want Claude Code's vision/computer-use-style flow to ground on CuaDriver window screenshots, register the compatibility mode:

claude mcp add --transport stdio cua-computer-use -- cua-driver mcp --claude-code-computer-use-compat

This keeps CuaDriver's normal MCP tools and changes only screenshot, which requires pid and window_id and captures that window only.


**Prompting technique**: *Tool registration injection* — no natural language prompt, but the `--claude-code-computer-use-compat` flag changes the screenshot tool signature to match Claude Code's expected computer-use interface (`mcp__cua-computer-use__screenshot`). This is capability grounding via API contract.

## Excerpt 2: cuabot README (from main README)

## CuaBot - Co-op computer-use for any agent

`cuabot` gives any coding agent a seamless sandbox for computer-use. Individual windows appear natively on your desktop with H.265, shared clipboard, and audio.

```bash
npx cuabot                 # Setup onboarding
# Run any agent in a sandbox
cuabot claude              # Claude Code
cuabot openclaw            # OpenClaw in the sandbox

# Run any GUI workflow in a sandbox
cuabot chromium
cuabot --screenshot
cuabot --type "hello"
cuabot --click <x> <y> [button]
```

Built-in support for agent-browser and agent-device (iOS, Android) out of the box.


**Prompting technique**: *Capability exposure* — the CLI exposes primitives (screenshot, click, type) that enable an agent to reason about and interact with GUIs. The prompt design responsibility is delegated to the agent framework using these primitives.

## Note

Full prompt content of the `skills/` directory files could not be fetched in this analysis pass. The directory exists (`skills/` at repo root) but individual file names were not listed. The Claude Code skill for `cua-driver` is referenced in the driver README: "Full tool reference, architecture notes, and the Claude Code skill ship with the package."
09

Uniqueness

CUA — Uniqueness & Positioning

Differs from seeds

CUA has no counterpart in the 11 seeds. Every seed is a text-only coding agent framework operating at the LLM prompt/hook layer. CUA operates at the visual I/O layer: it gives agents eyes (screenshots), hands (click/type/gesture), and isolated OS environments to act in. The closest structural parallel within the batch (not seeds) is CubeSandbox or SWE-ReX for the "isolated execution environment" concept, but neither provides computer-use (screen capture + GUI interaction) — they provide only shell command execution. CUA's cua-driver MCP server has the same integration pattern as ccmemory (bundled MCP tools), but the tool semantics are screen/GUI rather than memory/knowledge. No seed addresses RL training data collection via trajectory recording.

Positioning

CUA targets three distinct audiences:

  1. Agent developers building computer-use agents that need an SDK to provision and control VMs/containers
  2. Researchers evaluating agents on standardized benchmarks (OSWorld, ScreenSpot, Windows Arena)
  3. AI tool users wanting to run Claude Code or other coding agents inside sandboxed macOS/Linux VMs without compromising their host system

The cua.ai cloud service is the commercial component; the OSS repo provides the SDK and CLI tooling.

Key architectural bets

  1. macOS VM via Apple Virtualization.Framework — near-native performance on M-series chips; the only open-source framework in this batch running full macOS guests.
  2. Background operation (no cursor theft) — critical for production deployment where the computer-use agent cannot disrupt the user's session.
  3. Single SDK, multiple runtimes — the same Sandbox API works on Linux containers, Linux VMs, macOS VMs, Windows VMs, and Android.
  4. Trajectory-first design — recording every session as replayable trajectories enables RL training data collection as a first-class outcome.

Observable failure modes

  • Apple Silicon only for native macOS VMs (Lume) — x86 hosts cannot run macOS guests
  • Accessibility API limitations — non-AX surfaces (Chromium, Canvas-based apps) require alternative automation paths; the README calls this out explicitly
  • QEMU performance on macOS — Linux/Windows VMs on macOS via QEMU are slower than native; users expecting native Mac performance get it only with Lume+macOS images
  • cuabot H.265 codec requirement — streaming sandbox screens to the desktop requires H.265 support
  • MCP compat mode limitations: --claude-code-computer-use-compat changes the screenshot tool signature; tools calling the standard cua-driver screenshot name will break in compat mode
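
The compat-mode failure in the last bullet is easier to see at the wire level. The sketch below builds MCP `tools/call` requests as JSON-RPC 2.0 messages. The `pid` and `window_id` argument names come from the driver README excerpt; the numeric values are placeholders, the helper is hypothetical, and the argument-free call is only an assumption about what a client written for the standard mode might send.

```python
import json
from itertools import count

_request_ids = count(1)


def tools_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Compat mode: screenshot requires pid and window_id (placeholder values).
compat_request = tools_call("screenshot", {"pid": 4242, "window_id": 17})

# A call without those arguments; the assumption is that compat mode
# rejects it because the required fields are missing.
standard_request = tools_call("screenshot", {})
```

A client that targets both modes would need to discover the active tool schema (for example through the MCP `tools/list` method) before choosing which argument set to send.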

Cross-references

  • cuabot supports Claude Code and OpenClaw
  • cua-driver integrates with Claude Code and Cursor via MCP
  • cua-bench evaluates agents on OSWorld and ScreenSpot benchmarks
  • libs/kasm/ integrates with Kasm Workspaces for remote desktop scenarios
04

Workflow

CUA — Workflow

Primary use case workflows

1. Single sandbox, programmatic control (cua SDK)

async with Sandbox.ephemeral(Image.linux()) as sb:
    result = await sb.shell.run("echo hello")
    screenshot = await sb.screenshot()
    await sb.mouse.click(100, 200)
    await sb.keyboard.type("Hello from Cua!")

Phase-to-artifact map:

| Phase | Artifact |
| --- | --- |
| Sandbox.ephemeral(Image.X()) | VM or container provisioned |
| sb.shell.run() | Shell output |
| sb.screenshot() | PNG image |
| sb.mouse.click() / sb.keyboard.type() | UI interaction |
| async with exit | Sandbox destroyed, resources released |
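
The last phase matters most when an agent step fails midway. The sketch below uses `FakeSandbox`, a hypothetical stand-in that only records phases (it is not the cua class and imports nothing from the SDK), to show that an async context manager still runs its teardown when the body raises.

```python
import asyncio


class FakeSandbox:
    """Hypothetical stand-in for a sandbox; records phases instead of acting."""

    def __init__(self, image: str) -> None:
        self.image = image
        self.events: list[str] = []

    async def __aenter__(self) -> "FakeSandbox":
        self.events.append(f"provisioned:{self.image}")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and on exceptions, so teardown is not skipped.
        self.events.append("destroyed")

    async def run(self, command: str) -> str:
        self.events.append(f"shell:{command}")
        return "hello"

    async def screenshot(self) -> bytes:
        self.events.append("screenshot")
        return b"\x89PNG"  # placeholder bytes, not a valid image


async def main() -> list[str]:
    sandbox = FakeSandbox("linux")
    try:
        async with sandbox as sb:
            await sb.run("echo hello")
            await sb.screenshot()
            raise RuntimeError("agent step failed")
    except RuntimeError:
        pass
    return sandbox.events


events = asyncio.run(main())
print(events)
```

The recorded phases end with the teardown entry even though the body raised, which is the property the phase map attributes to leaving the `async with` block.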

2. Claude Code + cua-driver (MCP integration)

claude mcp add --transport stdio cua-driver -- cua-driver mcp
# Claude Code now has computer-use tools via MCP

Phase-to-artifact map:

| Phase | Artifact |
| --- | --- |
| claude mcp add | MCP server registered in Claude Code |
| Agent call to screenshot tool | Window screenshot returned |
| Agent call to click/type tool | UI interaction executed |
| Session end | Trajectory recorded |

3. cuabot multi-agent orchestration

cuabot claude    # Launches Claude Code in a sandbox
cuabot openclaw  # Launches OpenClaw in a sandbox

4. Benchmarking (cua-bench)

cb image create linux-docker
cb run dataset datasets/cua-bench-basic --agent cua-agent --max-parallel 4

Phase-to-artifact map:

| Phase | Artifact |
| --- | --- |
| cb image create | Base image built |
| cb run | Benchmark results per task |
| Completion | Trajectory export for RL training |

Approval gates

None — CUA is an SDK/infrastructure layer. No human approval gates.

Cloud vs. local

| Mode | Mechanism |
| --- | --- |
| Local (QEMU) | QEMU manages VM lifecycle locally |
| Cloud (cua.ai) | API calls to cua.ai service; same SDK |
| macOS VM (Lume) | Apple Virtualization.Framework on Apple Silicon |
06

Memory Context

CUA — Memory & Context

Trajectory recording (primary memory mechanism)

CUA Driver records every session as a replayable trajectory — a sequence of actions (click, type, screenshot, observation) with timestamps. This serves two purposes:

  1. Debugging / replay: operators can review what the agent did
  2. RL training data: trajectories can be exported for training computer-use models
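
Since the README does not specify the trajectory file format, the following is only an illustration of what "a sequence of actions with timestamps" could look like on disk. The `TrajectoryStep` fields and the JSON Lines encoding are hypothetical choices, not CUA's format.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class TrajectoryStep:
    """One recorded action; the field names here are hypothetical."""
    action: str
    args: dict
    timestamp: float = field(default_factory=time.time)


def dump_trajectory(steps: list[TrajectoryStep]) -> str:
    """Serialize steps as JSON Lines, one action per line."""
    return "\n".join(json.dumps(asdict(step)) for step in steps)


def load_trajectory(text: str) -> list[TrajectoryStep]:
    """Parse JSON Lines back into steps, ordered by timestamp for replay."""
    steps = [TrajectoryStep(**json.loads(line)) for line in text.splitlines() if line]
    return sorted(steps, key=lambda step: step.timestamp)


recorded = [
    TrajectoryStep("screenshot", {}, timestamp=1.0),
    TrajectoryStep("click", {"x": 100, "y": 200}, timestamp=2.0),
    TrajectoryStep("type", {"text": "Hello from Cua!"}, timestamp=3.0),
]
replayed = load_trajectory(dump_trajectory(recorded))
assert replayed == recorded
```

An append-only, line-oriented layout suits both purposes listed above: an operator can read it step by step, and a training pipeline can stream it without loading a whole session.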

Sandbox ephemeral vs. persistent

  • Sandbox.ephemeral(): sandbox is destroyed when the async with block exits — no persistence
  • Persistent sandboxes: the cua.ai cloud platform likely supports persistent VMs (not fully documented in README for the open-source SDK)

VM state (Lume)

Lume manages macOS/Linux VM disk images. VMs can be paused and resumed via lume CLI, preserving the full OS state (analogous to VM snapshots). This is filesystem + memory state persistence at the VM level, not LLM context.

Session-level context

CUA has no built-in LLM context management. The agent framework using the CUA SDK owns context window management. CUA provides the observation stream (screenshots, shell output) but does not compress or summarize it.
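
Because the calling agent owns context management, it has to decide how many screenshots to keep. One common approach, sketched here with a hypothetical `ObservationWindow` class that is not part of CUA, is to retain the most recent images in full and replace evicted ones with a short text note.

```python
from collections import deque


class ObservationWindow:
    """Keep the latest screenshots in full and older ones as short notes."""

    def __init__(self, max_screenshots: int = 3) -> None:
        # Each entry is (step number, image bytes); the deque evicts the oldest.
        self.recent: deque[tuple[int, bytes]] = deque(maxlen=max_screenshots)
        self.notes: list[str] = []

    def add(self, step: int, screenshot: bytes) -> None:
        if len(self.recent) == self.recent.maxlen:
            # The append below will evict the oldest image; leave a text trace.
            old_step, _ = self.recent[0]
            self.notes.append(f"step {old_step}: screenshot dropped")
        self.recent.append((step, screenshot))


window = ObservationWindow(max_screenshots=2)
for step in range(4):
    window.add(step, b"png-bytes")

print([step for step, _ in window.recent])
print(window.notes)
```

With a window of two and four steps, the images for steps 2 and 3 remain and the notes record that steps 0 and 1 were dropped.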

State files

  • VM disk images (.qcow2, .iso) for persistent VMs
  • Trajectory files (format not specified in README; likely JSON or similar structured format)
  • No explicitly named state files in the Python SDK

Memory type classification

  • Primary: file-based (VM disk images, trajectory files)
  • No vector DB, SQLite, or graph DB built into the SDK
  • The cua.ai cloud service may have additional state management not visible in the OSS code
07

Orchestration

CUA — Orchestration

Multi-agent support

Yes — cuabot supports running multiple agents in parallel sandboxes. cua-bench explicitly supports --max-parallel N for running N benchmark instances concurrently.

Orchestration pattern

Parallel fan-out (for cua-bench: multiple agents on multiple benchmark tasks simultaneously). None for the base SDK (caller manages parallelism).
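
For callers of the base SDK who manage parallelism themselves, a bounded fan-out similar in spirit to `--max-parallel` can be built with a semaphore. This sketch does not reflect how cua-bench implements the flag; `run_tasks` is a hypothetical helper and the sleep stands in for one benchmark episode.

```python
import asyncio


async def run_tasks(task_ids: list[str], max_parallel: int = 4) -> dict[str, bool]:
    """Run every task with at most max_parallel in flight at once."""
    limiter = asyncio.Semaphore(max_parallel)
    in_flight = 0
    peak = 0

    async def run_one(task_id: str) -> tuple[str, bool]:
        nonlocal in_flight, peak
        async with limiter:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # placeholder for one benchmark episode
            in_flight -= 1
            return task_id, True

    results = dict(await asyncio.gather(*(run_one(t) for t in task_ids)))
    # Sanity check: the semaphore never let more than max_parallel run together.
    assert peak <= max_parallel
    return results


results = asyncio.run(run_tasks([f"task-{i}" for i in range(10)]))
print(len(results), "tasks finished")
```

Bounding concurrency matters more here than in ordinary async code, since each in-flight task would hold a whole VM or container.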

Execution mode

  • Sandbox.ephemeral(): one-shot (destroyed after async with block)
  • cua-driver: event-driven (MCP server responds to tool calls from Claude Code/Cursor)
  • cuabot: interactive-loop (runs agent inside sandbox until task completion)
  • cua-bench: parallel fan-out (runs multiple tasks in parallel, collects results)

Isolation mechanism

| Target | Isolation |
| --- | --- |
| macOS VM (Lume) | Apple Virtualization.Framework (hardware VM, near-native performance) |
| Linux VM (QEMU) | QEMU full virtualization |
| Linux container | Container namespace |
| Cloud (cua.ai) | Cloud-managed VM or container |

This is the most diverse isolation portfolio in this batch: microVM (Apple Virt.Framework), full VM (QEMU), and container — all behind one SDK.

Multi-model routing

No built-in routing. The agent framework (Claude Code, custom agent) selects models. CUA provides the execution environment and tool primitives, not the model orchestration.

Cross-tool portability

High — cua-driver works as an MCP server for Claude Code, Cursor, and any MCP-capable client. The Python SDK is framework-agnostic (works with any Python agent). cuabot runs Claude Code, OpenClaw, and other agents.

Consensus mechanism

None.

Crash recovery

Not explicitly documented in README. Trajectories are recorded so replay is possible. VM restart is manual via lume CLI.

Streaming output

Yes — cuabot streams computer-use output (H.265 video, screenshots) to the native desktop. MCP tool calls return synchronous responses to the LLM client.

08

UI CLI Surface

CUA — UI & CLI Surface

CLI tools

Multiple CLI binaries across the mono-repo:

| Binary | Install | Description |
| --- | --- | --- |
| lume | curl install script | VM lifecycle: lume run macos-sequoia-vanilla:latest, lume list, lume pull, lume stop |
| cua-driver | curl install script | MCP server: cua-driver mcp, cua-driver mcp --claude-code-computer-use-compat |
| cuabot | npx cuabot | Multi-agent sandbox CLI with computer-use primitives |
| cb (cua-bench) | uv tool install -e . | Benchmark runner: cb image create, cb run dataset |

None are thin wrappers over claude/codex — all are purpose-built for CUA's sandbox/computer-use infrastructure.

Local UI surface

cuabot provides native desktop window mirroring:

  • Individual VM windows appear natively on the macOS desktop via H.265 video codec
  • Shared clipboard between host and sandbox
  • Audio passthrough
  • Picture-in-Picture mode (iOS app integration)

This is not a traditional web dashboard but a desktop native display of the sandbox's screen — unique in this batch.

Web dashboard

None built into the open-source repo. The cua.ai cloud service has a web UI (not part of the OSS codebase).

IDE integration

Claude Code via MCP (cua-driver):

  • Registered as an MCP server over stdio
  • Provides screenshot, click, type, hover tools to Claude Code
  • --claude-code-computer-use-compat mode aligns tool signatures with Claude Code's computer-use expectations

Notebooks

notebooks/ directory at repo root — Jupyter notebooks for examples and tutorials.

Observability

  • Trajectory recording: every session recorded as replayable trajectory
  • Benchmark metrics: cb run outputs per-task results with success/failure
  • No explicit log streaming UI in the OSS repo

iOS app

cuabot has an iOS companion app (Swift, Picture-in-Picture support) for monitoring sandbox sessions.

Related frameworks

same archetype · same primary tool · same memory type

Daytona ★ 72k

Provide secure, elastic, sub-90ms sandbox compute infrastructure for running AI-generated code, accessible via multi-language…

E2B ★ 12k

Run AI-generated code safely in cloud-hosted isolated sandboxes via a 3-line SDK integration.

OpenSandbox ★ 11k

Protocol-first general-purpose sandbox platform for AI applications with multi-language SDKs and pluggable isolation backends.

Microsandbox ★ 6.3k

Spawn hardware-isolated microVMs as child processes directly from application code, with no server setup, in under 100ms.

CubeSandbox ★ 5.9k

Sub-60ms KVM microVM sandboxes for AI agents with E2B drop-in compatibility and <5MB memory overhead.

sandcastle (mattpocock) ★ 5.1k

Container-isolated TypeScript SDK for orchestrating AI coding agents with Docker/Podman/Vercel Firecracker sandboxes and…