AgentBay SDK

agentbay-sdk · agentbay-ai/wuying-agentbay-sdk · ★ 1.1k · last commit 2026-05-19

Primitive shape

No installable primitives

Summary

AgentBay SDK — Summary

AgentBay SDK (Wuying AgentBay) is an Alibaba Cloud product providing on-demand cloud sandboxes for AI agents across four surface types: Browser (web automation via CDP/Playwright), Desktop (Windows/Linux GUI automation), Mobile (cloud mobile devices), and Code Space (multi-language dev environment). It ships SDKs in Python, TypeScript, Golang, and Java, backed by Alibaba Cloud's Wuying (无影) cloud desktop infrastructure. Sessions are created via API, used for agent tasks, and then destroyed: they are disposable cloud VMs with a simple create, use, delete lifecycle.

AgentBay SDK is closest to e2b-desktop in this batch (both are disposable cloud sandbox APIs) but differs in provider (Alibaba vs. E2B), in its explicit Mobile Use surface (cloud mobile device automation), and in SDK language coverage (it adds Java). It differs from the canonical Phase B Batch 18/33 sandboxes (E2B, Daytona) in its hooks/ directory, which may indicate MCP or lifecycle hook support, and in its explicit code_latest/browser_latest/desktop_latest image taxonomy.

Overview

AgentBay SDK — Overview

Origin

AgentBay is an Alibaba Cloud product from the Wuying (无影) cloud desktop team. It is described in an academic paper published on arXiv (https://arxiv.org/abs/2512.04367). The SDK wraps Alibaba's cloud desktop/mobile infrastructure for AI agent consumption.

Philosophy

From the README:

"AgentBay provides on-demand cloud sandboxes for AI agents — isolated environments with browser, desktop, mobile, and code execution capabilities. Create a sandbox in seconds, let your agent do its work, and tear it down when done. No infrastructure to manage."

The core value proposition is disposable, multi-surface cloud environments accessible via a unified SDK across four languages. The "bring your own LLM" model means AgentBay provides environments, not intelligence.

Four Sandbox Types

Browser Use: Chromium-based browser via CDP, Playwright/Puppeteer compatible
Computer Use: Cloud desktop (Windows or Linux) for enterprise application automation, legacy software
Mobile Use: Cloud mobile environment for app testing and automation
Code Space: Professional development environment for multi-language code execution

Key Differentiators

Mobile Use is unique in the sandbox space — cloud mobile device automation
Browser Use includes AI-powered BrowserOperator (natural language task execution)
Session Replay, Session Inspector, Live Mode for observability
Multi-language SDK parity: Python, TypeScript, Golang, Java
Backed by Alibaba Cloud's production Wuying infrastructure

Architecture

AgentBay SDK — Architecture

Distribution

Python: pip install wuying-agentbay-sdk
TypeScript: npm install wuying-agentbay-sdk
Golang: go get github.com/aliyun/wuying-agentbay-sdk/golang/pkg/agentbay
Java: Maven com.aliyun:agentbay-sdk:0.20.0

Directory Structure

python/         # Python SDK
typescript/     # TypeScript SDK
golang/         # Go SDK
java/           # Java SDK (Maven)
docs/
  guides/
    browser-use/   # Browser Use documentation
    computer-use/  # Desktop automation documentation
    mobile-use/    # Mobile automation documentation
    codespace/     # Code execution documentation
cookbook/       # Usage examples
hooks/          # SDK hooks
resource/
signatures/
scripts/

Required Runtime

Alibaba Cloud account
AGENTBAY_API_KEY env var
Python/Node/Go/Java runtime per chosen SDK
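
Because every SDK call depends on AGENTBAY_API_KEY, it can help to check for the key before any session is created. The following is a minimal sketch, not part of the SDK: `load_api_key` is a hypothetical helper, and it assumes only that the key lives in the environment variable named above.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the AgentBay API key, or raise with an actionable message."""
    source = os.environ if env is None else env
    key = source.get("AGENTBAY_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "AGENTBAY_API_KEY is not set; create a key in the AgentBay "
            "Console and export it before creating sessions."
        )
    return key


# Demo with an explicit mapping so the example does not depend on the real environment.
print(load_api_key({"AGENTBAY_API_KEY": "  demo-key  "}))  # demo-key
```

Calling `load_api_key()` with no arguments reads the real process environment; passing a mapping keeps the check easy to unit test.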

Image Types

code_latest — Code execution environment
browser_latest — Browser automation (Chromium)
desktop_latest — Desktop GUI environment
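
One way to keep image IDs out of agent code is a small lookup from surface name to image ID. This is an illustrative sketch: `IMAGE_IDS` and `image_id_for` are hypothetical names, and the table contains only the three IDs listed above (no mobile image ID is given in this document, so none is included).

```python
IMAGE_IDS = {
    "code": "code_latest",        # code execution environment
    "browser": "browser_latest",  # browser automation (Chromium)
    "desktop": "desktop_latest",  # desktop GUI environment
}


def image_id_for(surface: str) -> str:
    """Map a surface name to its image ID, rejecting unknown surfaces."""
    try:
        return IMAGE_IDS[surface.lower()]
    except KeyError:
        known = ", ".join(sorted(IMAGE_IDS))
        raise ValueError(
            f"no image ID listed for {surface!r}; known surfaces: {known}"
        ) from None


print(image_id_for("Browser"))  # browser_latest
```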

Install Complexity

One-liner per language. Multi-step for Alibaba Cloud account + API key registration.

Session Lifecycle

AgentBay.create(image_id) → Session
  ↓
Session.browser.*    (CDP/Playwright)
Session.code.*       (run_code)
Session.computer.*   (UI automation)
Session.file.*       (filesystem)
  ↓
AgentBay.delete(session)
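
Since each session is a billed cloud VM, the create, use, delete sequence above is a natural fit for a context manager that deletes the session even when the agent task raises. The sketch below assumes only the two calls shown in this document, `create(params)` returning an object with a `.session` attribute and `delete(session)`; `sandbox_session` and `FakeClient` are hypothetical names, and `FakeClient` is an offline stand-in so the example runs without the SDK.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def sandbox_session(client, params):
    """Create a session, yield it, and always delete it afterwards."""
    session = client.create(params).session
    try:
        yield session
    finally:
        client.delete(session)


class FakeClient:
    """Offline stand-in that mimics only create()/delete() for the demo."""

    def __init__(self):
        self.deleted = []

    def create(self, params):
        return SimpleNamespace(session=SimpleNamespace(image_id=params["image_id"]))

    def delete(self, session):
        self.deleted.append(session)


client = FakeClient()
try:
    with sandbox_session(client, {"image_id": "code_latest"}) as session:
        raise RuntimeError("agent task failed")
except RuntimeError:
    pass

print(len(client.deleted))  # 1: the session was deleted despite the error
```

With the real SDK, `client` would be an `AgentBay()` instance and `params` a `CreateSessionParams(image_id=...)` object, as in the quick start excerpt.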

API Architecture

REST API at https://agentbay-api.aliyun.com
SDK wraps HTTP calls
Session-based: one session = one sandbox VM

Browser Connectivity

Playwright/Puppeteer via CDP endpoint URL
session.browser.get_endpoint_url() → CDP websocket URL

Components

AgentBay SDK — Components

Core Classes

AgentBay (Python)

AgentBay(api_key) — Initialize client
agent_bay.create(CreateSessionParams(image_id=...)) — Create sandbox session
agent_bay.delete(session) — Destroy session

Session

session.browser — Browser surface (CDP access)
session.code — Code execution surface
session.computer — Desktop GUI surface
session.file — Filesystem surface
session.mobile — Mobile device surface (mobile images)
session.delete() — Destroy this session

Browser Surface

session.browser.initialize(BrowserOption()) — Init browser instance
session.browser.get_endpoint_url() — Get CDP WebSocket URL
BrowserOption: stealth mode, proxy settings, fingerprinting
BrowserOperator — AI-powered natural language browser tasks

Code Surface

session.code.run_code(code, language) — Execute code snippet
Returns CodeRunResult with success, result, error
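
A caller usually wants either the output or an exception, not a result object to inspect by hand. The sketch below shows one way to do that. The `CodeRunResult` dataclass here is a stand-in that carries only the three fields named above (their types are an assumption), and `unwrap` and `SandboxCodeError` are hypothetical names, not SDK API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodeRunResult:
    """Stand-in with the three fields the text lists; not the SDK class."""
    success: bool
    result: Optional[str] = None
    error: Optional[str] = None


class SandboxCodeError(RuntimeError):
    """Raised when sandboxed code reports a failure."""


def unwrap(run_result) -> str:
    """Return the output of a successful run, or raise with the reported error."""
    if run_result.success:
        return run_result.result or ""
    raise SandboxCodeError(
        run_result.error or "code run failed without an error message"
    )


print(unwrap(CodeRunResult(success=True, result="Hello AgentBay")))  # Hello AgentBay
```

Because `unwrap` only reads attributes, it should work unchanged on any object exposing `success`, `result`, and `error`.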

Computer (Desktop) Surface

Application lifecycle management
Window management + positioning
Mouse and keyboard automation (UI Automation)

File Surface

Read/write files in sandbox filesystem

SDK Hooks (`hooks/` dir)

Pre/post session lifecycle hooks for SDK extensions (implementation details not documented in public README)

Observability

Session Replay — record and replay browser sessions
Session Inspector — real-time debug view
Live Mode — real-time monitoring
IP proxy, stealth/fingerprinting options

Prompts

AgentBay SDK — Prompts

Verbatim Excerpt 1: Python Quick Start (from README)

from agentbay import AgentBay, CreateSessionParams

agent_bay = AgentBay()

# Create a cloud sandbox (options: "code_latest", "browser_latest", "desktop_latest")
session = agent_bay.create(CreateSessionParams(image_id="code_latest")).session

# Execute code in the sandbox
result = session.code.run_code("print('Hello AgentBay')", "python")
if result.success:
    print(result.result)  # Hello AgentBay

agent_bay.delete(session)

Technique: SDK API usage pattern. No pre-authored agent behavior prompts. The SDK is infrastructure; agent logic is user-defined.

Verbatim Excerpt 2: Browser Automation via CDP (from browser-use guide)

session_result = agent_bay.create(params)
session = session_result.session

# Initialize browser (supports stealth, proxy, fingerprint, etc. via BrowserOption)
ok = session.browser.initialize(BrowserOption())
endpoint_url = session.browser.get_endpoint_url()

# Connect Playwright over CDP and automate
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(endpoint_url)
    context = browser.contexts[0]
    page = context.new_page()
    page.goto("https://www.aliyun.com")
    print("Title:", page.title())
    browser.close()

session.delete()

Technique: CDP passthrough pattern — the SDK provides the browser endpoint URL; Playwright/Puppeteer takes over for actual browser automation. This is an infrastructure bridge, not a prompting framework.

Observations

AgentBay SDK ships zero pre-authored agent behavior prompts. It is pure cloud sandbox infrastructure with multi-surface access. All agent behavior, LLM interaction, and decision logic must be implemented by the user. The BrowserOperator AI feature (natural language task execution) may contain prompts internally, but these are not in the open-source repo.

Uniqueness

AgentBay SDK — Uniqueness

Differs From Seeds

AgentBay SDK is most similar to e2b-desktop in this batch (both are disposable cloud sandbox APIs) and loosely similar to e2b from the canonical Phase B Batch 18/33 seeds. The critical differentiators are: (1) the Mobile Use surface: cloud mobile device automation is not available in any seed or batch-30 framework; (2) four language SDKs including Java, making it the only sandbox in this corpus with Java support; (3) Alibaba Cloud infrastructure (Wuying), which is enterprise-grade but requires an Alibaba Cloud account; (4) the AI-powered BrowserOperator for natural-language browser task execution, similar in spirit to browser-harness but delivered as a managed cloud service. Unlike agent-infra-sandbox (a Docker container with all surfaces in one), AgentBay separates surface types into distinct image IDs.

Positioning

Alibaba Cloud's managed answer to Computer Use infrastructure — a multi-surface cloud sandbox API for enterprise AI agents, particularly strong for mobile automation and for teams already in the Alibaba Cloud ecosystem.

Observable Failure Modes

Alibaba Cloud lock-in: Requires Alibaba Cloud account; no local or multi-cloud deployment
No offline mode: All sandboxes are cloud-managed — latency depends on Alibaba Cloud regions
Session Replay/Inspector: Feature descriptions suggest closed-source observability components not in the OSS repo
Costs scale with parallel sessions: Cloud VM per session
Limited English docs: Some docs appear to be translated from Chinese with gaps

What Makes It Interesting

The Mobile Use surface — cloud mobile device automation — is unique across the entire corpus. No other framework provides cloud-managed Android/iOS device automation as a first-class primitive for AI agents.

Workflow

AgentBay SDK — Workflow

Typical Workflow

Phase	Artifact	Description
Account Setup	API key from AgentBay Console	Register at agentbay.console.aliyun.com
Install SDK	`pip install wuying-agentbay-sdk`	Language-specific install
Create Session	`agent_bay.create(CreateSessionParams(image_id=...))`	Spin up cloud sandbox
Execute Tasks	`session.browser/code/computer.*`	Agent task execution
Observe	Session Replay / Inspector / Live Mode	Debug and monitor
Destroy	`agent_bay.delete(session)`	Release resources

Browser Automation Workflow (Playwright via CDP)

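from playwright.sync_api import sync_playwright

# agent_bay and CreateSessionParams are set up as in the Python quick start;
# BrowserOption must also be imported from the SDK.
session = agent_bay.create(CreateSessionParams(image_id="browser_latest")).session
session.browser.initialize(BrowserOption())
endpoint_url = session.browser.get_endpoint_url()  # CDP WebSocket URL

# Attach Playwright to the remote Chromium over CDP
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(endpoint_url)
    page = browser.contexts[0].new_page()
    page.goto("https://example.com")
    # agent loop...
    browser.close()

# Release the sandbox
agent_bay.delete(session)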

Code Execution Workflow

session = agent_bay.create(CreateSessionParams(image_id="code_latest")).session
result = session.code.run_code("print('Hello AgentBay')", "python")
agent_bay.delete(session)

Approval Gates

None — sessions are programmatic. The user's agent code controls all decisions.

Session Concurrency

Multiple sessions can exist simultaneously. Each is an independent cloud VM.
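
Since sessions are independent, several tasks can be fanned out with one session each. The sketch below uses a thread pool whose size caps how many sandboxes exist at once. It assumes the client can be shared across threads, which this document does not confirm; `run_parallel`, `run_in_fresh_session`, and `FakeClient` are hypothetical names, and `FakeClient` is an offline stand-in for the SDK client.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def run_in_fresh_session(client, params, task):
    """Create a session, run one task in it, and always delete the session."""
    session = client.create(params).session
    try:
        return task(session)
    finally:
        client.delete(session)


def run_parallel(client, params, tasks, max_sessions=4):
    """Run each task in its own session; results keep the order of `tasks`."""
    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        futures = [pool.submit(run_in_fresh_session, client, params, t) for t in tasks]
        return [f.result() for f in futures]


class FakeClient:
    """Offline stand-in that counts create()/delete() calls thread-safely."""

    def __init__(self):
        self._lock = threading.Lock()
        self.created = 0
        self.deleted = 0

    def create(self, params):
        with self._lock:
            self.created += 1
            return SimpleNamespace(session=SimpleNamespace(id=self.created))

    def delete(self, session):
        with self._lock:
            self.deleted += 1


client = FakeClient()
tasks = [lambda session, n=n: n * n for n in range(5)]
print(run_parallel(client, {"image_id": "code_latest"}, tasks))  # [0, 1, 4, 9, 16]
print(client.created, client.deleted)  # 5 5
```

Keeping `max_sessions` small is one way to bound cost, since each concurrent session is a separate cloud VM.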

Memory & Context

AgentBay SDK — Memory & Context

State Storage

The session filesystem is the only state a sandbox VM holds, and it lasts only for the lifetime of the session. There is no built-in agent memory or context management.

Memory Type

None built-in. Session filesystem is ephemeral unless explicitly extracted.

Cross-Session Handoff

No native session persistence across creates/deletes. Users must extract needed state before delete().
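
The ordering constraint here (extract first, delete second) can be made explicit in a helper. The sketch below takes the extraction step as a caller-supplied function, because this document does not list the method names of the file surface; `run_and_extract` and `FakeClient` are hypothetical names, and the fake session's `files` dict merely imitates a sandbox filesystem that vanishes on delete.

```python
from types import SimpleNamespace


def run_and_extract(client, params, task, extract):
    """Run `task`, copy out state with `extract`, then delete the session."""
    session = client.create(params).session
    try:
        task(session)
        return extract(session)  # evaluated before the finally block runs
    finally:
        client.delete(session)


class FakeClient:
    """Offline stand-in whose delete() wipes the fake filesystem."""

    def create(self, params):
        return SimpleNamespace(session=SimpleNamespace(files={}))

    def delete(self, session):
        session.files.clear()  # mimic the sandbox filesystem disappearing


def write_report(session):
    session.files["report.txt"] = "42 rows processed"


def copy_files(session):
    return dict(session.files)  # copy, so the data outlives the session


saved = run_and_extract(FakeClient(), {"image_id": "code_latest"}, write_report, copy_files)
print(saved)  # {'report.txt': '42 rows processed'}
```

With the real SDK, `extract` would read whatever files or values the next session needs and return them as plain Python data.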

Context Window Management

Not handled — delegated to the user's agent/LLM implementation.

Observability State

Session Replay: Records and replays browser sessions — a form of audit/replay capability
Session Inspector: Real-time debug state of current session
Live Mode: Real-time monitoring — not persistent

MCP Integration

The repo has a hooks/ directory, which suggests MCP or lifecycle hook support, but the details are not documented in the public README.

Notes

Like E2B Desktop, AgentBay is pure execution infrastructure. All memory, context, and agent state management is the responsibility of the consuming agent framework.

Orchestration

AgentBay SDK — Orchestration

Multi-Agent Support

No built-in coordination. Multiple sessions can run concurrently as independent cloud VMs, but no orchestration protocol between them.

Orchestration Pattern

None — sessions are primitives. Orchestration is user-defined.

Isolation Mechanism

Cloud VM (Alibaba Cloud Wuying desktop/mobile infrastructure). Each session = dedicated cloud VM with isolated network, filesystem, display.

Isolation Types by Surface

Surface	Isolation
Code Space	Container/VM with language runtime
Browser	Chromium instance in isolated VM
Desktop	Full virtual desktop (Windows/Linux)
Mobile	Cloud mobile device (Android/iOS)

Subagent Definition Format

Not applicable.

Multi-Model Usage

Not applicable — SDK is model-agnostic.

Execution Mode

One-shot per session. Create → use → destroy.

Startup Time

"Seconds" — cloud-managed VM allocation.

Consensus Mechanism

None.

Crash Recovery

No built-in recovery. Session Inspector and Session Replay are available for post-mortem debugging.
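
Without built-in recovery, the simplest user-side strategy is to discard the failed session and rerun the task in a fresh one. The sketch below is one such retry loop under stated assumptions: it relies only on `create(params).session` and `delete(session)` as shown in this document, it treats any exception as retryable, and it is only safe for tasks that can be repeated. `run_with_retries` and `FakeClient` are hypothetical names.

```python
import time
from types import SimpleNamespace


def run_with_retries(client, params, task, attempts=3, backoff_seconds=0.0):
    """Run `task` in a fresh session per attempt; delete every session used."""
    last_error = None
    for attempt in range(1, attempts + 1):
        session = client.create(params).session
        try:
            return task(session)
        except Exception as exc:  # any failure triggers a retry in a new session
            last_error = exc
        finally:
            client.delete(session)
        if attempt < attempts:
            time.sleep(backoff_seconds * attempt)  # linear backoff between attempts
    raise RuntimeError(f"task failed after {attempts} attempts") from last_error


class FakeClient:
    """Offline stand-in that counts create()/delete() calls."""

    def __init__(self):
        self.created = 0
        self.deleted = 0

    def create(self, params):
        self.created += 1
        return SimpleNamespace(session=SimpleNamespace(id=self.created))

    def delete(self, session):
        self.deleted += 1


calls = {"n": 0}


def flaky(session):
    """Fail on the first two calls, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("sandbox went away")
    return "done"


client = FakeClient()
print(run_with_retries(client, {"image_id": "code_latest"}, flaky))  # done
print(client.created, client.deleted)  # 3 3
```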

Streaming Output

Code execution returns results synchronously. Browser events stream in real time over the CDP connection.

UI / CLI Surface

AgentBay SDK — UI / CLI Surface

CLI Binary

None in the open-source SDK. Session management is programmatic only.

Local Web UI

None. AgentBay Console at agentbay.console.aliyun.com provides cloud account management (external, not open-source).

Session Observability Tools

Session Replay: Record + replay browser sessions for debugging/compliance
Session Inspector: Real-time session state inspection
Live Mode: Real-time monitoring of active sessions
These features are part of the managed cloud service, not local tools

API Access

REST API at https://agentbay-api.aliyun.com
Python/TS/Go/Java SDK wraps HTTP calls

IDE Integration

None. The SDK is a language library only.

Cross-Tool Portability

High — works with any agent framework in Python, TypeScript, Golang, or Java. Not tied to any AI client or IDE. Playwright/Puppeteer compatible via CDP endpoint.

Developer Console

agentbay.console.aliyun.com: the Alibaba Cloud console for API key and service management. It is an external managed service.

Related frameworks

same archetype · same primary tool · same memory type

claude-mem (thedotmack) ★ 78k

A8 Cross-runtime harness

Background worker service captures every tool call as an observation, AI-compresses sessions, and auto-injects relevant past…

pi (badlogic/earendil) ★ 55k

A8 Cross-runtime harness

A minimal, hackable, multi-provider terminal coding agent that adapts to your workflows via npm-installable TypeScript Extensions…

Agent Skills (Addy Osmani) ★ 46k

A8 Cross-runtime harness

Encodes senior-engineer software development lifecycle as 23 auto-routed skills and 7 slash commands for any AI coding agent.

wshobson/agents Plugin Marketplace ★ 36k

A8 Cross-runtime harness

Single Markdown source for 83 domain-specialized plugins that auto-generates idiomatic artifacts for five AI coding harnesses.

TabbyML/Tabby ★ 34k

A8 Cross-runtime harness

Self-hosted AI coding assistant server (alternative to GitHub Copilot) with admin dashboard, RAG-based completions, and multi-IDE…

Compound Engineering ★ 17k

A8 Cross-runtime harness

Make each unit of engineering work compound into easier future work via brainstorm→plan→execute→review→learn cycles.

Distribution

Type: standalone-repo
License: Apache-2.0
Install: multi-step
Version: v0.20.0

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: No
Tech stack: null

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 0

Workflow

Phases: 6
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: No
Pattern: none
Isolation: sandbox-api
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: No
BYOK: Yes
Modal: text+vision

Execution

Mode: one-shot
Crash recovery: No
Compaction: No
Session handoff: No
Streaming: Yes

Memory

Type: none
Persistence: none
Search: none

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: proprietary
Replay: Yes

Tools

Primary: any
Targets: 6
Portability: high

Signals

Stars: 1.1k
Last commit: 2026-05-19
Maintainer: active
Quality score: 3.2/10
