
AgentBay SDK

agentbay-sdk · agentbay-ai/wuying-agentbay-sdk · ★ 1.1k · last commit 2026-05-19

Primitive shape
No installable primitives
00

Summary

AgentBay SDK — Summary

AgentBay SDK (Wuying AgentBay) is an Alibaba Cloud product providing on-demand cloud sandboxes for AI agents across four surface types: Browser (web automation via CDP/Playwright), Desktop (Windows/Linux GUI automation), Mobile (cloud mobile devices), and Code Space (multi-language dev environment). It ships SDKs in Python, TypeScript, Golang, and Java, backed by Alibaba Cloud's Wuying (无影) cloud desktop infrastructure. Sessions are created via API, used for agent tasks, and then destroyed: each one is a disposable cloud VM with a simple create, use, delete lifecycle.

AgentBay SDK is closest to e2b-desktop in this batch (both are disposable cloud sandbox APIs), but differs in provider (Alibaba vs. E2B), the explicit Mobile Use surface (cloud mobile device automation), and its four-language SDK set, which adds Java. It differs from the Phase B Batch 18/33 canonical sandboxes (E2B, Daytona) in its hooks/ directory, which suggests MCP or lifecycle hook integration, and in its explicit code_latest/browser_latest/desktop_latest image taxonomy.

01

Overview

AgentBay SDK — Overview

Origin

AgentBay is an Alibaba Cloud product from the Wuying (无影) cloud desktop team. It is described in an academic paper published on arXiv (https://arxiv.org/abs/2512.04367). The SDK wraps Alibaba's cloud desktop/mobile infrastructure for AI agent consumption.

Philosophy

From the README:

"AgentBay provides on-demand cloud sandboxes for AI agents — isolated environments with browser, desktop, mobile, and code execution capabilities. Create a sandbox in seconds, let your agent do its work, and tear it down when done. No infrastructure to manage."

The core value proposition is disposable, multi-surface cloud environments accessible via a unified SDK across four languages. The "bring your own LLM" model means AgentBay provides environments, not intelligence.

Four Sandbox Types

  1. Browser Use: Chromium-based browser via CDP, Playwright/Puppeteer compatible
  2. Computer Use: Cloud desktop (Windows or Linux) for enterprise application automation, legacy software
  3. Mobile Use: Cloud mobile environment for app testing and automation
  4. Code Space: Professional development environment for multi-language code execution

Key Differentiators

  • Mobile Use is unique in the sandbox space — cloud mobile device automation
  • Browser Use includes AI-powered BrowserOperator (natural language task execution)
  • Session Replay, Session Inspector, Live Mode for observability
  • Multi-language SDK parity: Python, TypeScript, Golang, Java
  • Backed by Alibaba Cloud's production Wuying infrastructure

02

Architecture

AgentBay SDK — Architecture

Distribution

  • Python: pip install wuying-agentbay-sdk
  • TypeScript: npm install wuying-agentbay-sdk
  • Golang: go get github.com/aliyun/wuying-agentbay-sdk/golang/pkg/agentbay
  • Java: Maven com.aliyun:agentbay-sdk:0.20.0

Directory Structure

python/         # Python SDK
typescript/     # TypeScript SDK
golang/         # Go SDK
java/           # Java SDK (Maven)
docs/
  guides/
    browser-use/   # Browser Use documentation
    computer-use/  # Desktop automation documentation
    mobile-use/    # Mobile automation documentation
    codespace/     # Code execution documentation
cookbook/       # Usage examples
hooks/          # SDK hooks
resource/
signatures/
scripts/

Required Runtime

  • Alibaba Cloud account
  • AGENTBAY_API_KEY env var
  • Python/Node/Go/Java runtime per chosen SDK
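
Since every SDK call depends on the AGENTBAY_API_KEY environment variable, it can help to fail early when the variable is missing. The following is a minimal sketch; `require_api_key` is a hypothetical helper name and not part of the SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the AgentBay API key, or raise if it is unset or blank.

    Hypothetical helper, not an SDK function. `env` defaults to os.environ
    and can be replaced with a plain dict in tests.
    """
    source = os.environ if env is None else env
    key = source.get("AGENTBAY_API_KEY", "").strip()
    if not key:
        raise RuntimeError("AGENTBAY_API_KEY is not set")
    return key
```

Calling this once at startup turns a missing key into an immediate, readable error rather than a failure on the first sandbox request.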

Image Types

  • code_latest — Code execution environment
  • browser_latest — Browser automation (Chromium)
  • desktop_latest — Desktop GUI environment
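
One way to keep these image IDs out of scattered string literals is a small lookup table. This is only a sketch: the surface names used as keys and the helper `image_id_for` are illustrative assumptions, and no mobile image ID is included because none is listed here.

```python
# Image IDs exactly as listed above; the dictionary keys are illustrative names.
IMAGE_IDS = {
    "code": "code_latest",
    "browser": "browser_latest",
    "desktop": "desktop_latest",
}


def image_id_for(surface: str) -> str:
    """Map a surface name to its image ID, rejecting anything not listed."""
    try:
        return IMAGE_IDS[surface.lower()]
    except KeyError:
        known = ", ".join(sorted(IMAGE_IDS))
        raise ValueError(f"unknown surface {surface!r}; known: {known}") from None
```

The result would then be passed as the image_id argument of CreateSessionParams.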

Install Complexity

Installing the SDK is a one-liner per language. Setting up an Alibaba Cloud account and registering an API key takes several additional steps.

Session Lifecycle

agent_bay.create(CreateSessionParams(image_id=...)).session → Session
  ↓
session.browser.*    (CDP/Playwright)
session.code.*       (run_code)
session.computer.*   (UI automation)
session.file.*       (filesystem)
session.mobile.*     (mobile device, on mobile images)
  ↓
agent_bay.delete(session)
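
Because the lifecycle is strictly create, use, delete, a context manager is a natural way to guarantee the delete step. The sketch below assumes only the client shape seen in the quick start (create(params) returning an object with a .session attribute, and delete(session)). `sandbox` and `FakeClient` are hypothetical names; the fake exists only so the example runs without cloud credentials.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def sandbox(client, params):
    """Create a session, yield it, and always delete it afterwards."""
    session = client.create(params).session
    try:
        yield session
    finally:
        # Runs on normal exit and on exceptions, so the VM is not leaked.
        client.delete(session)


class FakeClient:
    """Hypothetical stand-in for the real client; records deleted sessions."""

    def __init__(self):
        self.deleted = []

    def create(self, params):
        session = SimpleNamespace(image_id=params["image_id"])
        return SimpleNamespace(session=session)

    def delete(self, session):
        self.deleted.append(session)


if __name__ == "__main__":
    client = FakeClient()
    with sandbox(client, {"image_id": "code_latest"}) as session:
        print("using", session.image_id)
    print("deleted:", len(client.deleted))
```

With the real SDK, `client` would be the AgentBay instance and `params` a CreateSessionParams object; whether create() can return a result without a session is not covered in this document, so a production version may need an extra check.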

API Architecture

  • REST API at https://agentbay-api.aliyun.com
  • SDK wraps HTTP calls
  • Session-based: one session = one sandbox VM

Browser Connectivity

  • Playwright/Puppeteer via CDP endpoint URL
  • session.browser.get_endpoint_url() → CDP websocket URL

03

Components

AgentBay SDK — Components

Core Classes

AgentBay (Python)

  • AgentBay(api_key) — Initialize client
  • agent_bay.create(CreateSessionParams(image_id=...)) — Create sandbox session
  • agent_bay.delete(session) — Destroy session

Session

  • session.browser — Browser surface (CDP access)
  • session.code — Code execution surface
  • session.computer — Desktop GUI surface
  • session.file — Filesystem surface
  • session.mobile — Mobile device surface (mobile images)
  • session.delete() — Destroy this session

Browser Surface

  • session.browser.initialize(BrowserOption()) — Init browser instance
  • session.browser.get_endpoint_url() — Get CDP WebSocket URL
  • BrowserOption: stealth mode, proxy settings, fingerprinting
  • BrowserOperator — AI-powered natural language browser tasks

Code Surface

  • session.code.run_code(code, language) — Execute code snippet
  • Returns CodeRunResult with success, result, error
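
Since run_code reports failure through fields rather than exceptions, callers may want a small adapter that turns a failed result into a raised error. The sketch below relies only on the three fields named above (success, result, error). `FakeCodeRunResult`, `SandboxCodeError`, and `unwrap` are hypothetical names, and the real CodeRunResult may carry additional fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeCodeRunResult:
    """Stand-in with the three documented fields, for running this sketch locally."""
    success: bool
    result: Optional[str] = None
    error: Optional[str] = None


class SandboxCodeError(RuntimeError):
    """Hypothetical exception type for a failed run_code call."""


def unwrap(run_result) -> str:
    """Return the output of a successful run, or raise with the reported error."""
    if run_result.success:
        return run_result.result or ""
    message = run_result.error or "run_code failed without an error message"
    raise SandboxCodeError(message)
```

In agent code this would wrap the call site, for example unwrap(session.code.run_code(code, "python")), so that failures surface in one place.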

Computer (Desktop) Surface

  • Application lifecycle management
  • Window management + positioning
  • Mouse and keyboard automation (UI Automation)

File Surface

  • Read/write files in sandbox filesystem

SDK Hooks (hooks/ dir)

  • Pre/post session lifecycle hooks for SDK extensions (implementation details not documented in public README)

Observability

  • Session Replay — record and replay browser sessions
  • Session Inspector — real-time debug view
  • Live Mode — real-time monitoring
  • IP proxy, stealth/fingerprinting options

05

Prompts

AgentBay SDK — Prompts

Verbatim Excerpt 1: Python Quick Start (from README)

from agentbay import AgentBay, CreateSessionParams

agent_bay = AgentBay()

# Create a cloud sandbox (options: "code_latest", "browser_latest", "desktop_latest")
session = agent_bay.create(CreateSessionParams(image_id="code_latest")).session

# Execute code in the sandbox
result = session.code.run_code("print('Hello AgentBay')", "python")
if result.success:
    print(result.result)  # Hello AgentBay

agent_bay.delete(session)

Technique: SDK API usage pattern. No pre-authored agent behavior prompts. The SDK is infrastructure; agent logic is user-defined.


Verbatim Excerpt 2: Browser Automation via CDP (from browser-use guide)

session_result = agent_bay.create(params)
session = session_result.session

# Initialize browser (supports stealth, proxy, fingerprint, etc. via BrowserOption)
ok = session.browser.initialize(BrowserOption())
endpoint_url = session.browser.get_endpoint_url()

# Connect Playwright over CDP and automate
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(endpoint_url)
    context = browser.contexts[0]
    page = context.new_page()
    page.goto("https://www.aliyun.com")
    print("Title:", page.title())
    browser.close()

session.delete()

Technique: CDP passthrough pattern — the SDK provides the browser endpoint URL; Playwright/Puppeteer takes over for actual browser automation. This is an infrastructure bridge, not a prompting framework.


Observations

AgentBay SDK ships zero pre-authored agent behavior prompts. It is pure cloud sandbox infrastructure with multi-surface access. All agent behavior, LLM interaction, and decision logic must be implemented by the user. The BrowserOperator AI feature (natural language task execution) may contain prompts internally, but these are not in the open-source repo.

09

Uniqueness

AgentBay SDK — Uniqueness

Differs From Seeds

AgentBay SDK is most similar to e2b-desktop in this batch (both are disposable cloud sandbox APIs), and loosely similar to e2b from the canonical Phase B Batch 18/33 seeds. The critical differentiators:

  1. Mobile Use surface: cloud mobile device automation is not available in any seed or batch-30 framework.
  2. Four language SDKs including Java: the only sandbox in this corpus with Java support.
  3. Alibaba Cloud infrastructure (Wuying): enterprise-grade, but requires an Alibaba Cloud account.
  4. AI-powered BrowserOperator for natural-language browser task execution, similar in spirit to browser-harness but offered as a managed cloud service.

Unlike agent-infra-sandbox (Docker container, all surfaces in one), AgentBay separates surface types as distinct image IDs.

Positioning

Alibaba Cloud's managed answer to Computer Use infrastructure — a multi-surface cloud sandbox API for enterprise AI agents, particularly strong for mobile automation and for teams already in the Alibaba Cloud ecosystem.

Observable Failure Modes

  1. Alibaba Cloud lock-in: Requires Alibaba Cloud account; no local or multi-cloud deployment
  2. No offline mode: All sandboxes are cloud-managed — latency depends on Alibaba Cloud regions
  3. Session Replay/Inspector: Feature descriptions suggest closed-source observability components not in the OSS repo
  4. Costs scale with parallel sessions: Cloud VM per session
  5. Limited English docs: Some docs appear to be translated from Chinese with gaps

What Makes It Interesting

The Mobile Use surface — cloud mobile device automation — is unique across the entire corpus. No other framework provides cloud-managed Android/iOS device automation as a first-class primitive for AI agents.

04

Workflow

AgentBay SDK — Workflow

Typical Workflow

Phase          | Artifact                                             | Description
Account Setup  | API key from AgentBay Console                        | Register at agentbay.console.aliyun.com
Install SDK    | pip install wuying-agentbay-sdk                      | Language-specific install
Create Session | agent_bay.create(CreateSessionParams(image_id=...))  | Spin up cloud sandbox
Execute Tasks  | session.browser/code/computer.*                      | Agent task execution
Observe        | Session Replay / Inspector / Live Mode               | Debug and monitor
Destroy        | agent_bay.delete(session)                            | Release resources

Browser Automation Workflow (Playwright via CDP)

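from agentbay import AgentBay, CreateSessionParams
from agentbay import BrowserOption  # import path assumed; check the browser-use guide
from playwright.sync_api import sync_playwright

agent_bay = AgentBay()  # expects AGENTBAY_API_KEY in the environment
session = agent_bay.create(CreateSessionParams(image_id="browser_latest")).session
try:
    session.browser.initialize(BrowserOption())
    endpoint_url = session.browser.get_endpoint_url()

    # Attach Playwright to the remote Chromium over CDP
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(endpoint_url)
        page = browser.contexts[0].new_page()
        page.goto("https://example.com")
        # agent loop...
        browser.close()
finally:
    # Release the sandbox even if the automation above fails
    agent_bay.delete(session)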

Code Execution Workflow

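from agentbay import AgentBay, CreateSessionParams

agent_bay = AgentBay()  # expects AGENTBAY_API_KEY in the environment
session = agent_bay.create(CreateSessionParams(image_id="code_latest")).session
result = session.code.run_code("print('Hello AgentBay')", "python")
agent_bay.delete(session)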

Approval Gates

None — sessions are programmatic. The user's agent code controls all decisions.
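
Since the SDK imposes no gate of its own, any approval step has to live in the agent code. A minimal sketch of one such gate follows; `gated_run` and `deny_shell_escapes` are hypothetical names, and the policy shown is a toy used only for illustration.

```python
from typing import Callable


def gated_run(code: str,
              approve: Callable[[str], bool],
              run: Callable[[str], object]):
    """Run `code` only if `approve(code)` returns True.

    `run` would typically be a small wrapper around session.code.run_code;
    `approve` can prompt a human, check a policy, or consult another model.
    """
    if not approve(code):
        raise PermissionError("code execution was not approved")
    return run(code)


def deny_shell_escapes(code: str) -> bool:
    """Toy policy for illustration: reject snippets that mention os.system."""
    return "os.system" not in code
```

A call such as gated_run(snippet, deny_shell_escapes, lambda c: session.code.run_code(c, "python")) keeps the decision logic separate from the sandbox call.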

Session Concurrency

Multiple sessions can exist simultaneously. Each is an independent cloud VM.
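
Because each session is an independent VM, fan-out can be written as one session per task with a cap on how many are alive at once (which also bounds cost). The sketch below assumes the create/delete client shape from the quick start; `run_in_parallel` is a hypothetical helper. Whether a single client object is safe to share across threads is not stated in the source, so constructing one client per worker may be the safer choice.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(client, params, tasks, max_sessions=4):
    """Run each task in its own session, with at most `max_sessions` alive at once.

    Each task is a callable that takes a session. Results are returned in
    the same order as `tasks`; the first task exception is re-raised.
    """
    def run_one(task):
        session = client.create(params).session
        try:
            return task(session)
        finally:
            # Each worker cleans up its own session, even on failure.
            client.delete(session)

    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        return list(pool.map(run_one, tasks))
```

The thread pool size doubles as the concurrency limit, since a worker holds exactly one session for the duration of its task.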

06

Memory Context

AgentBay SDK — Memory & Context

State Storage

The session filesystem is the only state a sandbox VM holds, and it lasts only as long as the session. There is no built-in agent memory or context management.

Memory Type

None built in. The session filesystem is discarded with the session unless its contents are explicitly extracted first.

Cross-Session Handoff

No native session persistence across creates/deletes. Users must extract needed state before delete().
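
The ordering constraint (extract first, delete last) can be enforced in one place. The sketch below is an assumption-laden illustration: `run_then_extract` is a hypothetical helper, and `extract` is a user-supplied callable, since the file-surface method names are not given in this document.

```python
def run_then_extract(client, params, task, extract):
    """Run `task(session)`, then `extract(session)`, then delete the session.

    `extract` copies whatever must survive (files, logs, results) out of
    the sandbox. It runs even if `task` raises, so partial state can still
    be saved, and deletion always happens last.
    """
    session = client.create(params).session
    try:
        try:
            result = task(session)
        finally:
            saved = extract(session)
    finally:
        client.delete(session)
    return result, saved
```

If `task` fails, the exception still propagates after extraction and deletion have run, so the caller sees the original error.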

Context Window Management

Not handled — delegated to the user's agent/LLM implementation.

Observability State

  • Session Replay: Records and replays browser sessions — a form of audit/replay capability
  • Session Inspector: Real-time debug state of current session
  • Live Mode: Real-time monitoring — not persistent

MCP Integration

The repo has a hooks/ directory, which suggests MCP or lifecycle hook support, but the details are not documented in the public README.

Notes

Like E2B Desktop, AgentBay is pure execution infrastructure. All memory, context, and agent state management is the responsibility of the consuming agent framework.

07

Orchestration

AgentBay SDK — Orchestration

Multi-Agent Support

No built-in coordination. Multiple sessions can run concurrently as independent cloud VMs, but no orchestration protocol between them.

Orchestration Pattern

None — sessions are primitives. Orchestration is user-defined.

Isolation Mechanism

Cloud VM (Alibaba Cloud Wuying desktop/mobile infrastructure). Each session = dedicated cloud VM with isolated network, filesystem, display.

Isolation Types by Surface

Surface    | Isolation
Code Space | Container/VM with language runtime
Browser    | Chromium instance in isolated VM
Desktop    | Full virtual desktop (Windows/Linux)
Mobile     | Cloud mobile device (Android/iOS)

Subagent Definition Format

Not applicable.

Multi-Model Usage

Not applicable — SDK is model-agnostic.

Execution Mode

One-shot per session. Create → use → destroy.

Startup Time

"Seconds" — cloud-managed VM allocation.

Consensus Mechanism

None.

Crash Recovery

No built-in recovery. Session Inspector/Replay for post-mortem debugging.
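
With no recovery in the SDK, a common user-side pattern is to retry the whole task in a brand-new session. The sketch below assumes the create/delete client shape from the quick start; `run_with_fresh_sessions` and its parameters are hypothetical.

```python
import time


def run_with_fresh_sessions(client, params, task, attempts=3, delay_s=0.0):
    """Retry `task` up to `attempts` times, each time in a brand-new session.

    A failed attempt's session is deleted before the next one is created,
    so no sandbox state carries over between attempts.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        session = client.create(params).session
        try:
            return task(session)
        except Exception as exc:  # broad on purpose: any task failure triggers a retry
            last_error = exc
        finally:
            client.delete(session)
        if attempt < attempts:
            time.sleep(delay_s)
    raise RuntimeError(f"task failed after {attempts} attempts") from last_error
```

This only suits tasks that are safe to repeat from scratch; anything with external side effects needs its own idempotency handling.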

Streaming Output

Code execution returns results synchronously. Browser interaction streams over CDP, which is a real-time protocol.

08

UI / CLI Surface

AgentBay SDK — UI / CLI Surface

CLI Binary

None in the open-source SDK. Session management is programmatic only.

Local Web UI

None. AgentBay Console at agentbay.console.aliyun.com provides cloud account management (external, not open-source).

Session Observability Tools

  • Session Replay: Record + replay browser sessions for debugging/compliance
  • Session Inspector: Real-time session state inspection
  • Live Mode: Real-time monitoring of active sessions
  • These features are part of the managed cloud service, not local tools

API Access

  • REST API at https://agentbay-api.aliyun.com
  • Python/TS/Go/Java SDK wraps HTTP calls

IDE Integration

None. SDK is language library only.

Cross-Tool Portability

High — works with any agent framework in Python, TypeScript, Golang, or Java. Not tied to any AI client or IDE. Playwright/Puppeteer compatible via CDP endpoint.

Developer Console

agentbay.console.aliyun.com — Alibaba Cloud console for API key management, service management. External managed service.
