
SWE-ReX

swe-rex · SWE-agent/SWE-ReX · ★ 508 · last commit 2026-05-25

Primitive shape
No installable primitives
00

Summary

SWE-ReX — Summary

SWE-ReX (SWE-agent Remote Execution Framework) is a lightweight Python library that provides a unified interface for interacting with sandboxed shell environments. It abstracts away whether commands execute locally, in Docker containers, on AWS Fargate, via Modal, or on remote machines, so agent code stays identical regardless of execution backend. Its primitive is a persistent shell session rather than a single command execution: the agent creates bash sessions, sends commands into them, and receives the output and exit code of each command, which makes interactive CLI tools like ipython, gdb, and vim usable within the same session. SWE-ReX was extracted from SWE-agent to disentangle agent logic from infrastructure concerns, and its design was shaped by the need to run 30+ SWE-bench instances in parallel during benchmark evaluation. The runtime package ships three built-in backends (local, remote, dummy); cloud backends (Modal, Fargate, Daytona) are available as optional extras.

Differs from seeds: No seed in the catalog provides a shell session abstraction. All 11 seeds (superpowers, spec-kit, claude-flow, openspec, BMAD-METHOD, taskmaster-ai, agent-os, kiro, ccmemory, claude-conductor, spec-driver) operate at the LLM instruction layer. SWE-ReX operates at the execution layer beneath the agent — it is the component that actually runs the bash commands that a coding agent issues, while remaining agnostic to which LLM or agent framework is above it. The design goal ("disentangle agent logic from infrastructure concerns") is a direct inversion of what seeds do (inject agent logic into the infrastructure).

01

Overview

SWE-ReX — Overview

Origin

SWE-ReX was extracted from SWE-agent by the same team (Princeton/Kilian Lieret, Carlos Jimenez, John Yang) when they realized that the infrastructure complexity of running agents on different backends was polluting the core agent logic. The README states:

"SWE-ReX came out of our experiences with SWE-agent and SWE-agent enigma. Using SWE-ReX, we support fast, massively parallel agent runs (which made evaluating on large benchmarks a breeze). Support a broad range of platforms, including non-Linux machines without Docker. Disentangle agent logic from infrastructure concerns, making SWE-agent more stable and easier to maintain."

Philosophy

The core insight is that a coding agent should issue commands to a shell session abstraction, not to a specific backend. Whether that session runs locally, in a Docker container, or on a cloud VM should be a configuration detail, not an architectural concern.

From the README:

"Whether commands are executed locally or remotely in Docker containers, AWS remote machines, Modal, or something else, your agent code remains the same. Running 100 agents in parallel? No problem either!"

Design choices

  1. Persistent shell sessions, not command-per-process: the agent creates a session and sends multiple commands; session state (environment variables, working directory, process state) persists across commands.

  2. Interactive tool support: supports ipython, gdb, and other interactive programs that don't exit after each command — the runtime detects command completion by watching for the shell prompt (PS1).

  3. Multiple parallel sessions: an agent can manage multiple bash sessions simultaneously (a shell, a Python REPL, and a debugger all at once).

  4. Fast parallel benchmark runs: the design was shaped by the need to run 30+ SWE-bench instances simultaneously.

Relationship to SWE-agent ecosystem

SWE-ReX is the execution layer for:

  • SWE-agent: the main autonomous software engineering agent
  • Mini-SWE-Agent: a lightweight version
  • Potentially other agents in the SWE-bench ecosystem
02

Architecture

SWE-ReX — Architecture

Sandbox primitive

Persistent bash session (not a container or VM directly) — the runtime manages a bash process (local) or a remote bash process (in a Docker container, cloud VM, etc.) and communicates with it via pexpect-based prompt detection.

The sandbox primitive is:

  1. A bash process that persists across multiple commands
  2. PS1-based prompt detection to know when a command has finished
  3. Exit code extraction after each command

The execution backend (local vs. Docker vs. cloud) is a separate concern from the session abstraction.
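
The three ingredients above can be sketched with the standard library alone. This is an analogue under stated assumptions, not SWE-ReX's implementation: it talks to `sh` over plain pipes and waits for a unique marker line instead of using pexpect and PS1, and `PersistentShell`, `run`, and `close` are hypothetical names introduced for illustration.

```python
import subprocess
import uuid


class PersistentShell:
    """One long-lived `sh` process; shell state survives between run() calls."""

    def __init__(self) -> None:
        self._proc = subprocess.Popen(
            ["sh"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,
        )
        # A random marker plays the role that the PS1 prompt plays in SWE-ReX.
        self._marker = f"__DONE_{uuid.uuid4().hex}__"

    def run(self, command: str) -> tuple[str, int]:
        """Send a command, wait for the marker, return (output, exit_code)."""
        # After the command, print the marker immediately followed by $?.
        self._proc.stdin.write(
            f"{command}\nprintf '%s%s\\n' '{self._marker}' \"$?\"\n"
        )
        self._proc.stdin.flush()
        lines = []
        for line in self._proc.stdout:
            head, found, tail = line.partition(self._marker)
            if found:
                lines.append(head)  # output that did not end in a newline
                return "".join(lines), int(tail)
            lines.append(line)
        raise RuntimeError("shell exited before the marker was seen")

    def close(self) -> None:
        self._proc.stdin.close()
        self._proc.wait()


if __name__ == "__main__":
    shell = PersistentShell()
    shell.run("GREETING=hello")          # state set in one call...
    print(shell.run('echo "$GREETING"'))  # ...is visible in the next
    print(shell.run("false"))             # non-zero exit code is captured
    shell.close()
```

The marker approach shares the weakness of prompt detection in general: a program that happens to print the marker would end the read early, which is why the sketch randomizes it per shell.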

Execution backends

Backend | Install                              | Isolation
local   | Built-in                             | None (runs directly on host)
remote  | Built-in                             | Any (connects to a remote HTTP server)
dummy   | Built-in                             | None (for testing)
modal   | pip install 'swe-rex[modal]'         | Modal cloud containers
fargate | pip install 'swe-rex[fargate]'       | AWS ECS Fargate
daytona | pip install 'swe-rex[daytona]' (WIP) | Daytona dev environments

Distribution

pip install swe-rex
pip install 'swe-rex[modal]'     # Modal support
pip install 'swe-rex[fargate]'   # Fargate support
pip install 'swe-rex[daytona]'   # Daytona (WIP)

Directory tree

SWE-ReX/
├── src/swerex/
│   ├── runtime/
│   │   ├── abstract.py     # AbstractRuntime + all request/response models
│   │   ├── config.py       # LocalRuntimeConfig, RemoteRuntimeConfig, DummyRuntimeConfig
│   │   ├── local.py        # LocalRuntime (pexpect-based bash session management)
│   │   ├── remote.py       # RemoteRuntime (HTTP client to remote swerex server)
│   │   └── dummy.py        # DummyRuntime (no-op, for testing)
│   ├── deployment/         # (may contain Modal/Fargate/Daytona deployment logic)
│   ├── server.py           # FastAPI server (the remote endpoint)
│   ├── exceptions.py       # BashIncorrectSyntaxError, CommandTimeoutError, etc.
│   └── utils/
├── tests/
├── docs/
├── pyproject.toml
└── README.md

Core API (AbstractRuntime)

Method                                        | Description
create_session(CreateBashSessionRequest)      | Start a bash session
run_in_session(BashAction) → BashObservation  | Run a command in a session
close_session(CloseSessionRequest)            | Close a bash session
execute(Command) → CommandResponse            | Run a one-off command (outside any session)
read_file(ReadFileRequest) → ReadFileResponse | Read a file
write_file(WriteFileRequest)                  | Write a file
upload(UploadRequest)                         | Upload a file to sandbox
is_alive() → IsAliveResponse                  | Health check

Remote mode architecture

Agent code → RemoteRuntime → HTTP → FastAPI server (swerex server) → LocalRuntime → bash process

The swerex server can run anywhere (Docker, cloud VM) and exposes the same API over HTTP.
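
The layering in the chain above can be illustrated offline. In this sketch the HTTP hop is replaced by a direct function call that passes JSON strings, so it runs without a network; `LocalExecutor`, `Server`, and `RemoteExecutor` are hypothetical names, and the `Command` and `CommandResponse` dataclasses are simplified local stand-ins rather than the library's Pydantic models.

```python
import json
import subprocess
import sys
from dataclasses import asdict, dataclass


@dataclass
class Command:
    command: list[str]


@dataclass
class CommandResponse:
    stdout: str
    exit_code: int


class LocalExecutor:
    """Runs the command on whatever machine this object lives on."""

    def execute(self, cmd: Command) -> CommandResponse:
        done = subprocess.run(cmd.command, capture_output=True, text=True)
        return CommandResponse(stdout=done.stdout, exit_code=done.returncode)


class Server:
    """Stands in for the HTTP endpoint: JSON request in, JSON response out."""

    def __init__(self, executor: LocalExecutor) -> None:
        self._executor = executor

    def handle(self, body: str) -> str:
        response = self._executor.execute(Command(**json.loads(body)))
        return json.dumps(asdict(response))


class RemoteExecutor:
    """Same execute() signature as LocalExecutor, but forwards to a Server."""

    def __init__(self, server: Server) -> None:
        self._server = server

    def execute(self, cmd: Command) -> CommandResponse:
        # In a real deployment this call would be an HTTP POST.
        reply = self._server.handle(json.dumps(asdict(cmd)))
        return CommandResponse(**json.loads(reply))


if __name__ == "__main__":
    remote = RemoteExecutor(Server(LocalExecutor()))
    print(remote.execute(Command([sys.executable, "-c", "print('hi')"])))
```

Because both executors expose the same method, caller code cannot tell which one it holds, which is the property the remote mode relies on.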

Required runtime

  • Python 3.10+
  • pexpect (for local bash session management)
  • bashlex (for bash command parsing)
  • fastapi + uvicorn (for remote server mode)
  • Backend-specific: modal, boto3+ECS, etc.

Host-OS posture

  • Local runtime: direct host execution (no isolation)
  • Remote runtime: isolation is whatever the environment hosting the remote server provides (for example a Docker container or cloud VM); SWE-ReX adds none of its own
03

Components

SWE-ReX — Components

Python package (pip install swe-rex)

Core runtime classes

Class           | Module              | Purpose
AbstractRuntime | runtime/abstract.py | Base class defining the shell session interface
LocalRuntime    | runtime/local.py    | Runs bash sessions directly via pexpect; manages PS1 detection, interactive tools, exit code extraction
RemoteRuntime   | runtime/remote.py   | HTTP client connecting to a remote swerex server; same API as LocalRuntime
DummyRuntime    | runtime/dummy.py    | No-op runtime for testing

Request/response models (Pydantic)

Model                                | Purpose
CreateBashSessionRequest             | Session creation parameters: startup_source, session name, startup_timeout
BashAction                           | Command to run: command, session, timeout, is_interactive_command, is_interactive_quit, check, expect
BashObservation                      | Command result: output, exit_code
BashInterruptAction                  | Send Ctrl-C to a running command
Command                              | One-off command (separate from session-based commands)
CommandResponse                      | One-off command result
ReadFileRequest / ReadFileResponse   | File read
WriteFileRequest / WriteFileResponse | File write
UploadRequest / UploadResponse       | File upload
IsAliveResponse                      | Health check result

Configuration classes

Class               | Description
LocalRuntimeConfig  | type: "local"; no additional parameters
RemoteRuntimeConfig | auth_token, host, port, timeout
DummyRuntimeConfig  | type: "dummy"; no additional parameters
RuntimeConfig       | Union type for all configs (used for deserialization/CLI)
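
How a union of configs can be deserialized by its `type` field is sketched below. The library's configs are Pydantic models; this sketch swaps in dataclasses and a lookup table so it stays dependency-free. The class names `LocalConfig`, `RemoteConfig`, `DummyConfig`, and the `parse_config` helper are hypothetical, and only the field names listed in the table are taken from the source.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class LocalConfig:
    type: str = "local"


@dataclass
class RemoteConfig:
    auth_token: str
    host: str
    port: int
    timeout: float
    type: str = "remote"


@dataclass
class DummyConfig:
    type: str = "dummy"


AnyConfig = Union[LocalConfig, RemoteConfig, DummyConfig]

# The "type" value acts as the discriminator that selects the concrete class.
_BY_TYPE = {"local": LocalConfig, "remote": RemoteConfig, "dummy": DummyConfig}


def parse_config(raw: dict[str, Any]) -> AnyConfig:
    """Build the config class named by raw["type"] from a plain dict."""
    try:
        cls = _BY_TYPE[raw["type"]]
    except KeyError:
        raise ValueError(f"unknown or missing runtime type: {raw.get('type')!r}") from None
    return cls(**raw)


if __name__ == "__main__":
    print(parse_config({"type": "local"}))
    print(parse_config({
        "type": "remote",
        "auth_token": "my-token",
        "host": "http://my-docker-container",
        "port": 8000,
        "timeout": 5.0,
    }))
```

A discriminator field like this is what lets a CLI or a YAML file pick the backend without the caller importing any backend-specific class.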

FastAPI server (swerex server)

The package ships a FastAPI server that accepts the same requests as the LocalRuntime but exposes them over HTTP. This is the server that RemoteRuntime connects to. Deployable in any container or cloud VM.

BashAction fields (detailed)

from typing import Literal

from pydantic import BaseModel


class BashAction(BaseModel):
    command: str
    session: str = "default"
    timeout: float | None = None
    is_interactive_command: bool = False  # For non-exiting interactive programs (gdb, etc.)
    is_interactive_quit: bool = False     # Disables exit code checking for interactive quit commands
    check: Literal["silent", "raise", "ignore"] = "raise"
    error_msg: str = ""
    expect: list[str] = []               # Additional prompts to expect besides PS1

Exception types

  • BashIncorrectSyntaxError
  • CommandTimeoutError
  • NoExitCodeError
  • NonZeroExitCodeError
  • SessionDoesNotExistError
  • SessionExistsError
  • SessionNotInitializedError

Optional backends (separate extras)

  • Modal: pip install 'swe-rex[modal]'
  • Fargate: pip install 'swe-rex[fargate]'
  • Daytona: pip install 'swe-rex[daytona]' (WIP)
05

Prompts

SWE-ReX — Prompts

SWE-ReX is a Python library with no LLM prompt files. It has no CLAUDE.md, no skill files, and no agent behavioral instructions. The "prompts" in SWE-ReX are bash commands sent to shell sessions — the library is the execution layer, not the instruction layer.

Excerpt 1: AbstractRuntime interface definition (verbatim)

class BashAction(BaseModel):
    command: str
    """The command to run."""

    session: str = "default"
    """The session to run the command in."""

    timeout: float | None = None
    """The timeout for the command. None means no timeout."""

    is_interactive_command: bool = False
    """For a non-exiting command to an interactive program
    (e.g., gdb), set this to True."""

    is_interactive_quit: bool = False
    """This will disable checking for exit codes, since the command won't terminate.
    If the command is something like "quit" and should terminate the
    interactive program, set this to False.
    """

    check: Literal["silent", "raise", "ignore"] = "raise"
    """Whether to check for the exit code.
    If "silent", we will extract the exit code, but not raise any errors.
    If "raise", we will raise a NonZeroExitCodeError if the command has a non-zero exit code.
    If "ignore", we will not attempt to extract the exit code.
    """

    expect: list[str] = []
    """Outputs to expect in addition to the PS1"""

Prompting technique: Not a prompt. This is a typed API definition. The docstrings are the closest thing to "instructions" in SWE-ReX, and they document machine behavior, not LLM behavior. The check field's three modes (silent, raise, ignore) are the most nuanced design choice: silent extracts the exit code without raising, raise turns a non-zero exit code into a NonZeroExitCodeError, and ignore skips exit-code extraction altogether, which suits commands whose exit status is unavailable or irrelevant.
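
The three check modes described in the docstring can be restated as a small decision function. This is a sketch of the documented behavior only, not the library's code: `apply_check` is a hypothetical helper, and `NonZeroExitCodeError` here is a local stand-in for the library exception of the same name.

```python
from typing import Literal, Optional


class NonZeroExitCodeError(RuntimeError):
    """Local stand-in for the library exception of the same name."""


def apply_check(
    exit_code: int,
    check: Literal["silent", "raise", "ignore"],
) -> Optional[int]:
    """Return the exit code to report for a finished command, or raise."""
    if check == "ignore":
        # No attempt is made to extract the exit code.
        return None
    if check == "raise" and exit_code != 0:
        raise NonZeroExitCodeError(f"command exited with code {exit_code}")
    # "silent", or "raise" with a zero exit code: report the code as-is.
    return exit_code


if __name__ == "__main__":
    print(apply_check(1, "silent"))
    print(apply_check(1, "ignore"))
    print(apply_check(0, "raise"))
```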

Excerpt 2: README — design motivation (verbatim)

SWE-ReX allows your agent to

* ✅ **Interact with running shell sessions**. SWE-ReX will recognize when commands are finished,
  extract the output and exit code and return them to your agent.
* ✅ Let your agent use **interactive command line tools** like `ipython`, `gdb` or more in the shell.
* ✅ Interact with **multiple such shell sessions in parallel**, similar to how humans can have a
  shell, ipython, gdb, etc. all running at the same time.

Prompting technique: This is requirements documentation, not a prompt. But it reveals the design rationale: interactive CLI tools (ipython, gdb) are first-class citizens, not afterthoughts. This shapes how the runtime was built — PS1-based completion detection works for both scripted and interactive sessions.

Note

SWE-ReX deliberately contains no LLM-facing instructions. Its value is in what it does (run shell commands) not in what it tells an LLM to do. The "prompts" for an agent using SWE-ReX are entirely the responsibility of the agent framework above (SWE-agent, Mini-SWE-agent, etc.).

09

Uniqueness

SWE-ReX — Uniqueness & Positioning

Differs from seeds

SWE-ReX has no counterpart among the 11 seeds and is architecturally distinct from every other framework in this batch. All seeds operate at the LLM instruction layer (skills, hooks, prompts, commands); every other batch entry (AgentTier, CUA, sandboxed.sh, OpenSandbox, CubeSandbox, Capsule) provides a sandbox environment. SWE-ReX provides neither instructions nor an environment — it provides the protocol for talking to a bash session inside an environment. It is the thinnest possible layer between an agent's command intent and actual execution: a typed request/response protocol over pexpect (local) or HTTP (remote). The "disentangle agent logic from infrastructure" philosophy is the inverse of what seeds do — seeds inject logic into infrastructure; SWE-ReX strips logic out of infrastructure.

Positioning

SWE-ReX targets AI agent developers who need to run shell-based coding agents on multiple backends without rewriting their execution logic. Its primary user is the SWE-agent team and derivative projects. It has found secondary adoption in the broader AI agent benchmarking community (SWE-bench evaluation tools).

The project sits at the boundary between "developer library" and "infrastructure component" — it's too thin to be a platform, too opinionated to be just a subprocess wrapper.

Key architectural bets

  1. PS1-based session completion detection — reusing the bash prompt as the "done" signal is a clever hack that works for both scripted and interactive programs without requiring changes to the executed programs
  2. Interactive tool support as a first-class feature — most sandbox/execution layers treat each command as a subprocess; SWE-ReX treats the bash session as the unit, enabling ipython and gdb sessions that persist across multiple agent actions
  3. Same API, any backend — the AbstractRuntime design means switching from local to Docker to Fargate requires only a config change

Observable failure modes

  • PS1 detection fragility — programs that print PS1-like strings to stdout will confuse the session completion detection; the README notes this as a known limitation
  • Local runtime = no isolation — the local backend runs commands directly on the host; a misconfigured agent could damage the host filesystem
  • pexpect latency — PS1 detection introduces a small latency overhead vs. direct subprocess execution; acceptable for SWE-bench timescales but notable for high-frequency command loops
  • Remote server not production-hardened — the FastAPI server in the package is not designed for multi-tenant production use (no rate limiting, basic auth token only)

Cross-references

  • Extracted from SWE-agent (github.com/SWE-agent/SWE-agent)
  • Used by Mini-SWE-Agent, SWE-agent enigma
  • Related to SWE-bench, SWE-smith, sb-cli in the SWE-agent ecosystem
04

Workflow

SWE-ReX — Workflow

Basic workflow

import asyncio

from swerex.runtime.abstract import BashAction, CloseSessionRequest, CreateBashSessionRequest
from swerex.runtime.config import LocalRuntimeConfig


async def main():
    # Create runtime
    runtime = LocalRuntimeConfig().get_runtime()

    # Create one bash session per purpose
    await runtime.create_session(CreateBashSessionRequest(session="main"))
    await runtime.create_session(CreateBashSessionRequest(session="repl"))

    # Run commands; session state such as the working directory persists
    obs = await runtime.run_in_session(BashAction(command="cd /tmp && echo hello", session="main"))
    print(obs.output)  # "hello"
    print(obs.exit_code)  # 0

    # Run interactive tool (ipython) in its own session
    await runtime.run_in_session(BashAction(command="ipython", session="repl", is_interactive_command=True))
    await runtime.run_in_session(BashAction(command="x = 1 + 1", session="repl", is_interactive_command=True))
    obs = await runtime.run_in_session(BashAction(command="print(x)", session="repl", is_interactive_command=True))

    # Close sessions
    await runtime.close_session(CloseSessionRequest(session="main"))
    await runtime.close_session(CloseSessionRequest(session="repl"))


asyncio.run(main())

Remote workflow

from swerex.runtime.config import RemoteRuntimeConfig

config = RemoteRuntimeConfig(
    auth_token="my-token",
    host="http://my-docker-container",
    port=8000
)
runtime = config.get_runtime()
# Same API from here — all commands execute in the remote container

Phase-to-artifact map

Phase                               | Artifact
config.get_runtime()                | Runtime instance (local or remote)
create_session()                    | Bash process started; PS1 configured
run_in_session(BashAction)          | BashObservation (output + exit code)
run_in_session(BashInterruptAction) | Ctrl-C sent to running process
close_session()                     | Bash process terminated

Parallel benchmark workflow (SWE-agent use case)

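import asyncio

from swerex.runtime.config import RemoteRuntimeConfig


# run_agent and swe_bench_tasks are placeholders supplied by the agent framework
async def run_benchmark(run_agent, swe_bench_tasks):
    # Create 30 runtimes pointing to 30 different Docker containers
    runtimes = [
        RemoteRuntimeConfig(host=f"http://container-{i}", port=8000, auth_token="...").get_runtime()
        for i in range(30)
    ]

    # Fan out tasks in parallel
    return await asyncio.gather(*[
        run_agent(runtime, task)
        for runtime, task in zip(runtimes, swe_bench_tasks)
    ])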

Approval gates

None — SWE-ReX is a library, not a workflow orchestrator. No approval gates.

06

Memory Context

SWE-ReX — Memory & Context

Shell session state (primary memory)

SWE-ReX's primary "memory" is the persistent bash session state:

  • Environment variables set in a session persist across commands
  • Working directory changes (cd) persist
  • Shell functions defined in a session are available to subsequent commands
  • Process state for interactive tools (ipython variables, gdb breakpoints) persists within the session

This is OS-level process memory, not LLM context memory.

File system

Files written via write_file() or bash commands persist in the sandbox's filesystem for the duration of the runtime's lifetime (local: until process exits; remote: until container is destroyed).

Cross-session memory

None built-in — SWE-ReX manages shell sessions within a single runtime instance. Multiple sessions within one runtime share the same filesystem (in local mode), but session state (environment variables, shell history, process state) is per-session.
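
The split between shared filesystem and per-session state follows from each session being its own shell process. A minimal standard-library illustration, assuming `sh` is available and the variable `SWEREX_DEMO_TOKEN` is not already set in the environment; `run_in_fresh_shell` and `demo` are hypothetical names, and no SWE-ReX code is involved:

```python
import subprocess
import tempfile


def run_in_fresh_shell(script: str, cwd: str) -> str:
    """Run a script in a brand-new sh process and return its stdout."""
    done = subprocess.run(
        ["sh"], input=script, capture_output=True, text=True, cwd=cwd
    )
    return done.stdout


def demo() -> str:
    with tempfile.TemporaryDirectory() as workdir:
        # "Session" A exports a variable and writes a file.
        run_in_fresh_shell(
            "export SWEREX_DEMO_TOKEN=abc\necho data > shared.txt\n", workdir
        )
        # "Session" B is a different process: the variable is gone,
        # but the file is still on the shared filesystem.
        return run_in_fresh_shell(
            'echo "token=${SWEREX_DEMO_TOKEN:-unset}"\ncat shared.txt\n', workdir
        )


if __name__ == "__main__":
    print(demo())
```

Anything one session must hand to another therefore has to travel through files (or through the agent), not through shell variables.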

LLM context management

None — SWE-ReX is deliberately not an LLM context manager. The agent framework above (SWE-agent, etc.) is responsible for managing the LLM's context window, including what observations to include, when to compress, and how to summarize past actions.

Session startup configuration

CreateBashSessionRequest(
    startup_source=["/etc/environment", "/home/user/.bashrc"],  # Files sourced before commands
    session="main",
    startup_timeout=1.0
)

The startup_source files are sourced at session creation, which is the closest thing to "initial context injection" in SWE-ReX — but for bash, not LLMs.

Memory type classification

  • Primary: none persisted (shell session state is ephemeral and is lost when the session closes)
  • File-based: yes (files written during session persist in sandbox)
  • No persistent cross-session memory, no vector DB, no SQLite
07

Orchestration

SWE-ReX — Orchestration

Multi-agent support

Yes — the library is designed for massively parallel agent execution. Multiple RemoteRuntime instances can be created pointing to different containers/VMs, and tasks fanned out via asyncio.gather().

From the README:

"Running 100 agents in parallel? No problem either!"

The GIF in the README shows 30 SWE-bench instances running in parallel.

Orchestration pattern

Parallel fan-out at the caller level. SWE-ReX itself is not an orchestrator — it provides the primitives that enable parallel execution. The orchestration logic lives in the agent framework above (SWE-agent).
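
Caller-level fan-out can be tried without any containers by substituting a stub. In the sketch below `StubRuntime`, `run_task`, and `fan_out` are hypothetical names, the stub only pretends to run commands, and the semaphore that caps concurrency is an addition for illustration that the source does not describe.

```python
import asyncio


class StubRuntime:
    """Hypothetical stand-in for a runtime; it only pretends to run commands."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def run_command(self, command: str) -> str:
        await asyncio.sleep(0.01)  # simulate network and execution latency
        return f"{self.name}: ran {command!r}"


async def run_task(runtime: StubRuntime, task: str, limit: asyncio.Semaphore) -> str:
    # The semaphore bounds how many tasks are in flight at once.
    async with limit:
        return await runtime.run_command(task)


async def fan_out(tasks: list[str], max_parallel: int = 10) -> list[str]:
    """Run one task per runtime concurrently; results keep the input order."""
    limit = asyncio.Semaphore(max_parallel)
    runtimes = [StubRuntime(f"container-{i}") for i in range(len(tasks))]
    return await asyncio.gather(
        *(run_task(r, t, limit) for r, t in zip(runtimes, tasks))
    )


if __name__ == "__main__":
    for line in asyncio.run(fan_out(["pytest", "make lint", "tox"], max_parallel=2)):
        print(line)
```

`asyncio.gather` returns results in the order the awaitables were passed, so each result can be matched back to its task by index.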

Isolation mechanism

Depends on the backend:

  • local: none (all sessions share the same host OS)
  • remote: whatever the remote server runs in (Docker container, AWS Fargate, Modal serverless, etc.)

SWE-ReX is isolation-agnostic — it doesn't prescribe a specific isolation mechanism.

Execution mode

One-shot per session (create session → run commands → close session). Sessions are ephemeral. The async design enables parallel one-shot sessions.

Multi-model routing

Not applicable — SWE-ReX is not model-aware. The agent above selects models.

Cross-tool portability

High — SWE-ReX is a pure Python library with no dependency on any specific AI tool. It works with any agent framework that can call Python functions.

Crash recovery

Partial — if a command times out, CommandTimeoutError is raised and the session can be closed and recreated. The BashInterruptAction allows sending Ctrl-C to a running command. There is no automatic session restart on failure.
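
Since restart is left to the caller, a recovery wrapper might look like the following sketch. Everything here is an assumption made for illustration: `FlakyRuntime` is a stub whose first command times out, `CommandTimeoutError` is a local stand-in for the library exception, and the stub's method signatures are simplified rather than copied from SWE-ReX.

```python
import asyncio


class CommandTimeoutError(RuntimeError):
    """Local stand-in for the library exception of the same name."""


class FlakyRuntime:
    """Hypothetical runtime stub whose first command times out."""

    def __init__(self) -> None:
        self.sessions_created = 0
        self.sessions_closed = 0
        self._calls = 0

    async def create_session(self, session: str) -> None:
        self.sessions_created += 1

    async def close_session(self, session: str) -> None:
        self.sessions_closed += 1

    async def run_command(self, session: str, command: str) -> str:
        self._calls += 1
        if self._calls == 1:
            raise CommandTimeoutError(command)
        return f"ok: {command}"


async def run_with_recovery(runtime, session: str, command: str, retries: int = 1) -> str:
    """Retry a command after a timeout, recreating the session in between."""
    for attempt in range(retries + 1):
        try:
            return await runtime.run_command(session, command)
        except CommandTimeoutError:
            if attempt == retries:
                raise  # out of retries: surface the timeout to the caller
            # Drop the stuck session and start a fresh one with the same name.
            await runtime.close_session(session)
            await runtime.create_session(session)
    raise ValueError("retries must be >= 0")


if __name__ == "__main__":
    rt = FlakyRuntime()
    asyncio.run(rt.create_session("main"))
    print(asyncio.run(run_with_recovery(rt, "main", "make test")))
```

Recreating a session discards its shell state (working directory, environment variables, running interactive tools), so a caller using this pattern would need to replay any setup commands.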

Streaming output

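No: each run_in_session(BashAction) call returns only after the command completes (or times out) and delivers the full output at once. The API is async, so the caller's event loop is free in the meantime, but output is not streamed incrementally.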

Consensus mechanism

None.

08

Ui Cli Surface

SWE-ReX — UI & CLI Surface

Dedicated CLI binary

No — SWE-ReX is a Python library. There is no user-facing CLI binary. The __main__.py file suggests python -m swerex may launch the server, but this is not a user-facing CLI tool.

Local web dashboard

None — SWE-ReX is a library, not a service. No web UI.

Server binary

The package ships a FastAPI server (swerex/server.py) that can be deployed in a container to serve as the remote execution endpoint. This is an infrastructure component, not a user-facing tool.

IDE integration

None — SWE-ReX has no IDE integration. It is a Python library imported by agent frameworks.

Documentation

Documentation site at swe-rex.com (mkdocs-material-based).

Observability

None built-in. The library raises typed exceptions (BashIncorrectSyntaxError, CommandTimeoutError, etc.) that callers can handle. No built-in logging framework, metrics, or audit trail.

Primary interface

The primary interface is the Python API:

from swerex.runtime.config import LocalRuntimeConfig, RemoteRuntimeConfig, DummyRuntimeConfig
from swerex.runtime.abstract import BashAction, CreateBashSessionRequest, ...

Cross-tool portability surface

SWE-ReX's "tool surface" is exactly its Python API — it has no other interface. This makes it maximally portable (any Python agent can use it) but requires Python on both sides.
