SWE-ReX

swe-rex · SWE-agent/SWE-ReX · ★ 508 · last commit 2026-05-25

Primitive shape

No installable primitives

Summary

SWE-ReX — Summary

SWE-ReX (SWE-agent Remote Execution Framework) is a lightweight Python library that provides a unified interface for interacting with sandboxed shell environments. It abstracts away whether commands execute locally, in Docker containers, on AWS Fargate, via Modal, or on remote machines, so agent code stays identical regardless of execution backend. Its primitive is a persistent shell session rather than one-off command execution: the agent creates bash sessions, sends commands into them, and receives output and exit codes, which lets it drive interactive CLI tools such as ipython, gdb, and vim within the same session. SWE-ReX was extracted from SWE-agent to disentangle agent logic from infrastructure concerns, and its design was shaped by the need to run 30+ SWE-bench instances in parallel during benchmark evaluation. The runtime package ships three built-in backends (local, remote, dummy) plus optional cloud backends (Modal, Fargate, Daytona) as extras.

Differs from seeds: No seed in the catalog provides a shell session abstraction. All 11 seeds (superpowers, spec-kit, claude-flow, openspec, BMAD-METHOD, taskmaster-ai, agent-os, kiro, ccmemory, claude-conductor, spec-driver) operate at the LLM instruction layer. SWE-ReX operates at the execution layer beneath the agent — it is the component that actually runs the bash commands that a coding agent issues, while remaining agnostic to which LLM or agent framework is above it. The design goal ("disentangle agent logic from infrastructure concerns") is a direct inversion of what seeds do (inject agent logic into the infrastructure).

Overview

SWE-ReX — Overview

Origin

SWE-ReX was extracted from SWE-agent by the same team (Princeton/Kilian Lieret, Carlos Jimenez, John Yang) when they realized that the infrastructure complexity of running agents on different backends was polluting the core agent logic. The README states:

"SWE-ReX came out of our experiences with SWE-agent and SWE-agent enigma. Using SWE-ReX, we support fast, massively parallel agent runs (which made evaluating on large benchmarks a breeze). Support a broad range of platforms, including non-Linux machines without Docker. Disentangle agent logic from infrastructure concerns, making SWE-agent more stable and easier to maintain."

Philosophy

The core insight is that a coding agent should issue commands to a shell session abstraction, not to a specific backend. Whether that session runs locally, in a Docker container, or on a cloud VM should be a configuration detail, not an architectural concern.

From the README:

"Whether commands are executed locally or remotely in Docker containers, AWS remote machines, Modal, or something else, your agent code remains the same. Running 100 agents in parallel? No problem either!"

Design choices

Persistent shell sessions, not command-per-process: the agent creates a session and sends multiple commands; session state (environment variables, working directory, process state) persists across commands.
Interactive tool support: supports ipython, gdb, and other interactive programs that don't exit after each command — the runtime detects command completion by watching for the shell prompt (PS1).
Multiple parallel sessions: an agent can manage multiple bash sessions simultaneously (a shell, a Python REPL, and a debugger all at once).
Fast parallel benchmark runs: the design was shaped by the need to run 30+ SWE-bench instances simultaneously.
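To make the first design choice concrete, the following is a minimal standard-library sketch, not SWE-ReX's implementation. `PersistentShell`, `run_one_off`, and the marker-line completion check are hypothetical stand-ins (the library itself is described as using pexpect with PS1 detection), and the sketch assumes a POSIX `sh` is available on PATH.

```python
import subprocess
import uuid


class PersistentShell:
    """Hypothetical stand-in for a session: one long-lived `sh` process."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            ["sh"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,
        )

    def run(self, command: str) -> tuple[str, int]:
        # A unique marker line signals completion and carries the exit code ($?).
        # Limitation: output that does not end in a newline would hide the marker.
        marker = f"__done_{uuid.uuid4().hex}__"
        self.proc.stdin.write(f"{command}\necho {marker} $?\n")
        self.proc.stdin.flush()
        lines = []
        while True:
            line = self.proc.stdout.readline()
            if not line:
                raise RuntimeError("shell exited before the command finished")
            if line.startswith(marker):
                return "".join(lines), int(line.split()[1])
            lines.append(line)

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()


def run_one_off(command: str) -> str:
    """Command-per-process: every call starts from a fresh shell."""
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout
```

With this sketch, a variable exported through `PersistentShell.run` is still visible to the next `run` call on the same object, whereas two consecutive `run_one_off` calls share nothing because each one spawns a new shell.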

Relationship to SWE-agent ecosystem

SWE-ReX is the execution layer for:

SWE-agent: the main autonomous software engineering agent
Mini-SWE-Agent: a lightweight version
Potentially other agents in the SWE-bench ecosystem

Architecture

SWE-ReX — Architecture

Sandbox primitive

Persistent bash session (not a container or VM directly) — the runtime manages a bash process (local) or a remote bash process (in a Docker container, cloud VM, etc.) and communicates with it via pexpect-based prompt detection.

The sandbox primitive is:

A bash process that persists across multiple commands
PS1-based prompt detection to know when a command has finished
Exit code extraction after each command

The execution backend (local vs. Docker vs. cloud) is a separate concern from the session abstraction.
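The prompt-detection idea can be sketched as plain string handling. This is an illustration under stated assumptions rather than the library's code: the `PROMPT` value, the `Finished` type, and both helper functions are hypothetical, and the real runtime delegates the waiting to pexpect.

```python
from __future__ import annotations

from dataclasses import dataclass

PROMPT = "<<SESSION-PROMPT>>"  # hypothetical unique PS1 value, not SWE-ReX's


@dataclass
class Finished:
    output: str    # everything printed before the prompt reappeared
    leftover: str  # text that arrived after the prompt


def split_at_prompt(buffer: str, prompt: str = PROMPT) -> Finished | None:
    """Return None while the command is still running, else its output."""
    index = buffer.find(prompt)
    if index == -1:
        return None
    return Finished(buffer[:index], buffer[index + len(prompt):])


def parse_exit_code(echo_output: str) -> int:
    """Interpret the text printed by a follow-up `echo $?`."""
    return int(echo_output.strip())
```

Because completion is inferred from the prompt string alone, any program that happens to print that string would end the match early, which is why a prompt value unlikely to occur in normal output matters.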

Execution backends

Backend	Install	Isolation
`local`	Built-in	None (runs directly on host)
`remote`	Built-in	Any (connects to a remote HTTP server)
`dummy`	Built-in	None (for testing)
`modal`	`pip install 'swe-rex[modal]'`	Modal cloud containers
`fargate`	`pip install 'swe-rex[fargate]'`	AWS ECS Fargate
`daytona`	`pip install 'swe-rex[daytona]'` (WIP)	Daytona dev environments

Distribution

pip install swe-rex
pip install 'swe-rex[modal]'     # Modal support
pip install 'swe-rex[fargate]'   # Fargate support
pip install 'swe-rex[daytona]'   # Daytona (WIP)

Directory tree

SWE-ReX/
├── src/swerex/
│   ├── runtime/
│   │   ├── abstract.py     # AbstractRuntime + all request/response models
│   │   ├── config.py       # LocalRuntimeConfig, RemoteRuntimeConfig, DummyRuntimeConfig
│   │   ├── local.py        # LocalRuntime (pexpect-based bash session management)
│   │   ├── remote.py       # RemoteRuntime (HTTP client to remote swerex server)
│   │   └── dummy.py        # DummyRuntime (no-op, for testing)
│   ├── deployment/         # (may contain Modal/Fargate/Daytona deployment logic)
│   ├── server.py           # FastAPI server (the remote endpoint)
│   ├── exceptions.py       # BashIncorrectSyntaxError, CommandTimeoutError, etc.
│   └── utils/
├── tests/
├── docs/
├── pyproject.toml
└── README.md

Core API (AbstractRuntime)

Method	Description
`create_session(CreateBashSessionRequest)`	Start a bash session
`run_in_session(BashAction)` → `BashObservation`	Run a command in a session
`close_session(CloseSessionRequest)`	Close a bash session
`execute(Command)` → `CommandResponse`	Run a one-off command
`read_file(ReadFileRequest)` → `ReadFileResponse`	Read a file
`write_file(WriteFileRequest)`	Write a file
`upload(UploadRequest)`	Upload a file to sandbox
`is_alive()` → `IsAliveResponse`	Health check

Remote mode architecture

Agent code → RemoteRuntime → HTTP → FastAPI server (swerex server) → LocalRuntime → bash process

The swerex server can run anywhere (Docker, cloud VM) and exposes the same API over HTTP.
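The request path above can be illustrated with a standard-library client and server. This is a rough sketch of the pattern only: the real server is FastAPI-based, and the `/execute` route, the `X-Auth-Token` header, the JSON shape, and every function name here are assumptions made for illustration. The server side also runs a one-off subprocess instead of a persistent session to keep the example short.

```python
import json
import subprocess
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

AUTH_TOKEN = "my-token"  # hypothetical shared secret


def run_locally(command: str) -> dict:
    """Stand-in for the local runtime behind the server."""
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return {"output": done.stdout, "exit_code": done.returncode}


class ExecHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        if self.headers.get("X-Auth-Token") != AUTH_TOKEN:
            self._reply(401, {"error": "bad token"})
            return
        self._reply(200, run_locally(payload.get("command", "")))

    def _reply(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args) -> None:
        pass  # keep the example quiet


def start_server() -> ThreadingHTTPServer:
    """Serve on an ephemeral localhost port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ExecHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_execute(base_url: str, command: str, token: str) -> dict:
    """Client side: same request/response shape, sent over HTTP."""
    request = urllib.request.Request(
        f"{base_url}/execute",
        data=json.dumps({"command": command}).encode(),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=10) as response:
        return json.loads(response.read())
```

The caller of `remote_execute` sees the same dictionary that `run_locally` would have returned directly, which is the property that lets agent code stay unchanged when the backend moves to another machine.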

Required runtime

Python 3.10+
pexpect (for local bash session management)
bashlex (for bash command parsing)
fastapi + uvicorn (for remote server mode)
Backend-specific: modal, boto3+ECS, etc.

Host-OS posture

Local runtime: direct host execution (no isolation)
Remote runtime: fully isolated in whatever environment the remote server runs in

Components

SWE-ReX — Components

Python package (`pip install swe-rex`)

Core runtime classes

Class	Module	Purpose
`AbstractRuntime`	`runtime/abstract.py`	Base class defining the shell session interface
`LocalRuntime`	`runtime/local.py`	Runs bash sessions directly via `pexpect`; manages PS1 detection, interactive tools, exit code extraction
`RemoteRuntime`	`runtime/remote.py`	HTTP client connecting to a remote swerex server; same API as LocalRuntime
`DummyRuntime`	`runtime/dummy.py`	No-op runtime for testing

Request/response models (Pydantic)

Model	Purpose
`CreateBashSessionRequest`	Session creation parameters: startup_source, session name, startup_timeout
`BashAction`	Command to run: `command`, `session`, `timeout`, `is_interactive_command`, `is_interactive_quit`, `check`, `expect`
`BashObservation`	Command result: output, exit_code
`BashInterruptAction`	Send Ctrl-C to a running command
`Command`	One-off command (separate from session-based commands)
`CommandResponse`	One-off command result
`ReadFileRequest` / `ReadFileResponse`	File read
`WriteFileRequest` / `WriteFileResponse`	File write
`UploadRequest` / `UploadResponse`	File upload
`IsAliveResponse`	Health check result

Configuration classes

Class	Description
`LocalRuntimeConfig`	`type: "local"` — no additional parameters
`RemoteRuntimeConfig`	`auth_token`, `host`, `port`, `timeout`
`DummyRuntimeConfig`	`type: "dummy"` — no additional parameters
`RuntimeConfig`	Union type for all configs (used for deserialization/CLI)
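How a `type` field can select the right config class during deserialization is sketched below with plain dataclasses. The library uses Pydantic models for this; the `*Sketch` classes, the `parse_runtime_config` helper, and the default values for `host`, `port`, and `timeout` are placeholders invented for the example, not the library's definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class LocalConfigSketch:
    type: str = "local"


@dataclass
class RemoteConfigSketch:
    auth_token: str
    host: str = "http://127.0.0.1"  # placeholder default
    port: int | None = None         # placeholder default
    timeout: float = 1.0            # placeholder default
    type: str = "remote"


@dataclass
class DummyConfigSketch:
    type: str = "dummy"


_BY_TYPE = {
    "local": LocalConfigSketch,
    "remote": RemoteConfigSketch,
    "dummy": DummyConfigSketch,
}


def parse_runtime_config(raw: dict[str, Any]):
    """Pick the config class from the `type` discriminator, then build it."""
    fields = dict(raw)
    kind = fields.pop("type", None)
    if kind not in _BY_TYPE:
        raise ValueError(f"unknown runtime type: {kind!r}")
    return _BY_TYPE[kind](**fields)
```

For instance, `parse_runtime_config({"type": "remote", "auth_token": "my-token", "port": 8000})` yields a `RemoteConfigSketch`, while an unrecognized `type` fails fast with a `ValueError`.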

FastAPI server (`swerex server`)

The package ships a FastAPI server that accepts the same requests as the LocalRuntime but exposes them over HTTP. This is the server that RemoteRuntime connects to. Deployable in any container or cloud VM.

BashAction fields (detailed)

class BashAction(BaseModel):
    command: str
    session: str = "default"
    timeout: float | None = None
    is_interactive_command: bool = False  # For non-exiting interactive programs (gdb, etc.)
    is_interactive_quit: bool = False     # Disables exit code checking for interactive quit commands
    check: Literal["silent", "raise", "ignore"] = "raise"
    error_msg: str = ""
    expect: list[str] = []               # Additional prompts to expect besides PS1

Exception types

BashIncorrectSyntaxError
CommandTimeoutError
NoExitCodeError
NonZeroExitCodeError
SessionDoesNotExistError
SessionExistsError
SessionNotInitializedError

Optional backends (separate extras)

Modal: pip install 'swe-rex[modal]'
Fargate: pip install 'swe-rex[fargate]'
Daytona: pip install 'swe-rex[daytona]' (WIP)

Prompts

SWE-ReX — Prompts

SWE-ReX is a Python library with no LLM prompt files. It has no CLAUDE.md, no skill files, and no agent behavioral instructions. The "prompts" in SWE-ReX are bash commands sent to shell sessions — the library is the execution layer, not the instruction layer.

Excerpt 1: BashAction model definition from runtime/abstract.py (verbatim)

class BashAction(BaseModel):
    command: str
    """The command to run."""

    session: str = "default"
    """The session to run the command in."""

    timeout: float | None = None
    """The timeout for the command. None means no timeout."""

    is_interactive_command: bool = False
    """For a non-exiting command to an interactive program
    (e.g., gdb), set this to True."""

    is_interactive_quit: bool = False
    """This will disable checking for exit codes, since the command won't terminate.
    If the command is something like "quit" and should terminate the
    interactive program, set this to False.
    """

    check: Literal["silent", "raise", "ignore"] = "raise"
    """Whether to check for the exit code.
    If "silent", we will extract the exit code, but not raise any errors.
    If "raise", we will raise a NonZeroExitCodeError if the command has a non-zero exit code.
    If "ignore", we will not attempt to extract the exit code.
    """

    expect: list[str] = []
    """Outputs to expect in addition to the PS1"""

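Prompting technique: Not a prompt; this is a typed API definition. The docstrings are the closest thing to "instructions" in SWE-ReX, and they document machine behavior, not LLM behavior. The check field's three modes (silent, raise, ignore) are the most nuanced design choice: raise turns a non-zero exit code into a NonZeroExitCodeError, silent records the exit code without raising, and ignore skips exit-code extraction altogether, which is useful when no meaningful exit code is available (for example, while the session is inside an interactive program).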
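The three check modes documented in the excerpt can be read as a small decision procedure. The sketch below follows the docstring wording only; `resolve_exit_code` is a hypothetical helper, and the exception class is a local stand-in that merely borrows the library's name.

```python
from __future__ import annotations

from typing import Callable, Literal


class NonZeroExitCodeError(RuntimeError):
    """Local stand-in named after the library's exception."""


def resolve_exit_code(
    check: Literal["silent", "raise", "ignore"],
    read_exit_code: Callable[[], int],
) -> int | None:
    """Apply the documented semantics of the `check` field."""
    if check == "ignore":
        return None  # do not attempt to extract the exit code at all
    code = read_exit_code()
    if check == "raise" and code != 0:
        raise NonZeroExitCodeError(f"command exited with {code}")
    return code  # "silent", or "raise" with a zero exit code
```

Passing the exit-code lookup as a callable makes the `ignore` branch visible: in that mode the lookup is never invoked.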

Excerpt 2: README — design motivation (verbatim)

SWE-ReX allows your agent to

* ✅ **Interact with running shell sessions**. SWE-ReX will recognize when commands are finished,
  extract the output and exit code and return them to your agent.
* ✅ Let your agent use **interactive command line tools** like `ipython`, `gdb` or more in the shell.
* ✅ Interact with **multiple such shell sessions in parallel**, similar to how humans can have a
  shell, ipython, gdb, etc. all running at the same time.

Prompting technique: This is requirements documentation, not a prompt. But it reveals the design rationale: interactive CLI tools (ipython, gdb) are first-class citizens, not afterthoughts. This shapes how the runtime was built — PS1-based completion detection works for both scripted and interactive sessions.

Note

SWE-ReX deliberately contains no LLM-facing instructions. Its value is in what it does (run shell commands) not in what it tells an LLM to do. The "prompts" for an agent using SWE-ReX are entirely the responsibility of the agent framework above (SWE-agent, Mini-SWE-agent, etc.).

Uniqueness

SWE-ReX — Uniqueness & Positioning

Differs from seeds

SWE-ReX has no counterpart among the 11 seeds and is architecturally distinct from every other framework in this batch. All seeds operate at the LLM instruction layer (skills, hooks, prompts, commands); every other batch entry (AgentTier, CUA, sandboxed.sh, OpenSandbox, CubeSandbox, Capsule) provides a sandbox environment. SWE-ReX provides neither instructions nor an environment — it provides the protocol for talking to a bash session inside an environment. It is the thinnest possible layer between an agent's command intent and actual execution: a typed request/response protocol over pexpect (local) or HTTP (remote). The "disentangle agent logic from infrastructure" philosophy is the inverse of what seeds do — seeds inject logic into infrastructure; SWE-ReX strips logic out of infrastructure.

Positioning

SWE-ReX targets AI agent developers who need to run shell-based coding agents on multiple backends without rewriting their execution logic. Its primary user is the SWE-agent team and derivative projects. It has found secondary adoption in the broader AI agent benchmarking community (SWE-bench evaluation tools).

The project sits at the boundary between "developer library" and "infrastructure component" — it's too thin to be a platform, too opinionated to be just a subprocess wrapper.

Key architectural bets

PS1-based session completion detection — reusing the bash prompt as the "done" signal is a clever hack that works for both scripted and interactive programs without requiring changes to the executed programs
Interactive tool support as a first-class feature — most sandbox/execution layers treat each command as a subprocess; SWE-ReX treats the bash session as the unit, enabling ipython and gdb sessions that persist across multiple agent actions
Same API, any backend — the AbstractRuntime design means switching from local to Docker to Fargate requires only a config change

Observable failure modes

PS1 detection fragility — programs that print PS1-like strings to stdout will confuse the session completion detection; the README notes this as a known limitation
Local runtime = no isolation — the local backend runs commands directly on the host; a misconfigured agent could damage the host filesystem
pexpect latency — PS1 detection introduces a small latency overhead vs. direct subprocess execution; acceptable for SWE-bench timescales but notable for high-frequency command loops
Remote server not production-hardened — the FastAPI server in the package is not designed for multi-tenant production use (no rate limiting, basic auth token only)

Cross-references

Extracted from SWE-agent (github.com/SWE-agent/SWE-agent)
Used by Mini-SWE-Agent, SWE-agent enigma
Related to SWE-bench, SWE-smith, sb-cli in the SWE-agent ecosystem

Workflow

SWE-ReX — Workflow

Basic workflow

import asyncio

from swerex.runtime.abstract import BashAction, CloseSessionRequest, CreateBashSessionRequest
from swerex.runtime.config import LocalRuntimeConfig


async def main():
    # Create runtime
    config = LocalRuntimeConfig()
    runtime = config.get_runtime()

    # Create the bash sessions used below: a plain shell and one for the REPL
    await runtime.create_session(CreateBashSessionRequest(session="main"))
    await runtime.create_session(CreateBashSessionRequest(session="repl"))

    # Run commands
    obs = await runtime.run_in_session(BashAction(command="cd /tmp && echo hello", session="main"))
    print(obs.output)  # "hello"
    print(obs.exit_code)  # 0

    # Run interactive tool (ipython) in its own session
    await runtime.run_in_session(BashAction(command="ipython", session="repl", is_interactive_command=True))
    await runtime.run_in_session(BashAction(command="x = 1 + 1", session="repl", is_interactive_command=True))
    obs = await runtime.run_in_session(BashAction(command="print(x)", session="repl", is_interactive_command=True))

    # Close sessions
    await runtime.close_session(CloseSessionRequest(session="repl"))
    await runtime.close_session(CloseSessionRequest(session="main"))


asyncio.run(main())

Remote workflow

from swerex.runtime.config import RemoteRuntimeConfig

config = RemoteRuntimeConfig(
    auth_token="my-token",
    host="http://my-docker-container",
    port=8000
)
runtime = config.get_runtime()
# Same API from here — all commands execute in the remote container

Phase-to-artifact map

Phase	Artifact
`config.get_runtime()`	Runtime instance (local or remote)
`create_session()`	Bash process started; PS1 configured
`run_in_session(BashAction)`	`BashObservation` (output + exit code)
`run_in_session(BashInterruptAction)`	Ctrl-C sent to running process
`close_session()`	Bash process terminated

Parallel benchmark workflow (SWE-agent use case)

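import asyncio

from swerex.runtime.config import RemoteRuntimeConfig


async def run_benchmark(swe_bench_tasks, run_agent):
    # run_agent(runtime, task) is the caller's own coroutine; it is not part of SWE-ReX
    # Create 30 runtimes pointing to 30 different Docker containers
    runtimes = [
        RemoteRuntimeConfig(host=f"http://container-{i}", port=8000, auth_token="...").get_runtime()
        for i in range(30)
    ]

    # Fan out tasks in parallel
    return await asyncio.gather(*[
        run_agent(runtime, task)
        for runtime, task in zip(runtimes, swe_bench_tasks)
    ])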
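The fan-out above starts every task at once. When the number of instances exceeds what the backend can host, a caller might cap concurrency; the following is one possible sketch using only asyncio. `fan_out` is a hypothetical helper, and `run_agent` here is a dummy stand-in for the caller's agent loop, not a SWE-ReX function.

```python
import asyncio


async def run_agent(runtime: str, task: str) -> str:
    """Dummy stand-in for an agent run against one runtime."""
    await asyncio.sleep(0.01)
    return f"{runtime}:{task}"


async def fan_out(pairs, max_concurrency: int = 8):
    """Run one agent per (runtime, task) pair, at most max_concurrency at a time."""
    gate = asyncio.Semaphore(max_concurrency)

    async def guarded(runtime, task):
        async with gate:
            return await run_agent(runtime, task)

    # return_exceptions=True keeps one failed instance from cancelling the rest;
    # results come back in the same order as the input pairs.
    return await asyncio.gather(
        *(guarded(runtime, task) for runtime, task in pairs),
        return_exceptions=True,
    )
```

A caller would then inspect the returned list for exception objects and retry or report those instances individually.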

Approval gates

None — SWE-ReX is a library, not a workflow orchestrator. No approval gates.

Memory Context

SWE-ReX — Memory & Context

Shell session state (primary memory)

SWE-ReX's primary "memory" is the persistent bash session state:

Environment variables set in a session persist across commands
Working directory changes (cd) persist
Shell functions defined in a session are available to subsequent commands
Process state for interactive tools (ipython variables, gdb breakpoints) persists within the session

This is OS-level process memory, not LLM context memory.

File system

Files written via write_file() or bash commands persist in the sandbox's filesystem for the duration of the runtime's lifetime (local: until process exits; remote: until container is destroyed).

Cross-session memory

None built-in — SWE-ReX manages shell sessions within a single runtime instance. Multiple sessions within one runtime share the same filesystem (in local mode), but session state (environment variables, shell history, process state) is per-session.

LLM context management

None — SWE-ReX is deliberately not an LLM context manager. The agent framework above (SWE-agent, etc.) is responsible for managing the LLM's context window, including what observations to include, when to compress, and how to summarize past actions.

Session startup configuration

CreateBashSessionRequest(
    startup_source=["/etc/environment", "/home/user/.bashrc"],  # Files sourced before commands
    session="main",
    startup_timeout=1.0
)

The startup_source files are sourced at session creation, which is the closest thing to "initial context injection" in SWE-ReX — but for bash, not LLMs.

Memory type classification

Primary: none (ephemeral shell session state)
File-based: yes (files written during session persist in sandbox)
No persistent cross-session memory, no vector DB, no SQLite

Orchestration

SWE-ReX — Orchestration

Multi-agent support

Yes — the library is designed for massively parallel agent execution. Multiple RemoteRuntime instances can be created pointing to different containers/VMs, and tasks fanned out via asyncio.gather().

From the README:

"Running 100 agents in parallel? No problem either!"

The GIF in the README shows 30 SWE-bench instances running in parallel.

Orchestration pattern

Parallel fan-out at the caller level. SWE-ReX itself is not an orchestrator — it provides the primitives that enable parallel execution. The orchestration logic lives in the agent framework above (SWE-agent).

Isolation mechanism

Depends on the backend:

local: none (all sessions share the same host OS)
remote: whatever the remote server runs in (Docker container, AWS Fargate, Modal serverless, etc.)

SWE-ReX is isolation-agnostic — it doesn't prescribe a specific isolation mechanism.

Execution mode

One-shot per session (create session → run commands → close session). Sessions are ephemeral. The async design enables parallel one-shot sessions.

Multi-model routing

Not applicable — SWE-ReX is not model-aware. The agent above selects models.

Cross-tool portability

High — SWE-ReX is a pure Python library with no dependency on any specific AI tool. It works with any agent framework that can call Python functions.

Crash recovery

Partial — if a command times out, CommandTimeoutError is raised and the session can be closed and recreated. The BashInterruptAction allows sending Ctrl-C to a running command. There is no automatic session restart on failure.
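Since there is no automatic restart, the close-and-recreate step falls to the caller. One way to structure it is sketched below; `run_with_recovery` is a hypothetical helper, the two callables are supplied by the caller (for example, thin wrappers around the runtime's session methods), and the exception class is a local stand-in that borrows the library's name.

```python
import asyncio
from typing import Awaitable, Callable


class CommandTimeoutError(RuntimeError):
    """Local stand-in named after the library's exception."""


async def run_with_recovery(
    run_command: Callable[[str], Awaitable[str]],
    reset_session: Callable[[], Awaitable[None]],
    command: str,
    retries: int = 1,
) -> str:
    """Retry a command after a timeout, resetting the session in between."""
    for attempt in range(retries + 1):
        try:
            return await run_command(command)
        except CommandTimeoutError:
            if attempt == retries:
                raise  # out of retries: let the caller decide
            await reset_session()  # e.g. close and recreate the session
    raise AssertionError("unreachable")
```

Note that recreating a session discards its shell state (environment variables, working directory), so a retried command may need its setup steps replayed first.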

Streaming output

No: output is not streamed. Awaiting run_in_session(BashAction) returns only once the command completes (or times out), and the resulting observation carries the full output.

Consensus mechanism

None.

Ui Cli Surface

SWE-ReX — UI & CLI Surface

Dedicated CLI binary

No — SWE-ReX is a Python library. There is no user-facing CLI binary. The __main__.py file suggests python -m swerex may launch the server, but this is not a user-facing CLI tool.

Local web dashboard

None — SWE-ReX is a library, not a service. No web UI.

Server binary

The package ships a FastAPI server (swerex/server.py) that can be deployed in a container to serve as the remote execution endpoint. This is an infrastructure component, not a user-facing tool.

IDE integration

None — SWE-ReX has no IDE integration. It is a Python library imported by agent frameworks.

Documentation

Documentation site at swe-rex.com (mkdocs-material-based).

Observability

None built-in. The library raises typed exceptions (BashIncorrectSyntaxError, CommandTimeoutError, etc.) that callers can handle. No built-in logging framework, metrics, or audit trail.

Primary interface

The primary interface is the Python API:

from swerex.runtime.config import LocalRuntimeConfig, RemoteRuntimeConfig, DummyRuntimeConfig
from swerex.runtime.abstract import BashAction, CreateBashSessionRequest, ...

Cross-tool portability surface

SWE-ReX's "tool surface" is exactly its Python API — it has no other interface. This makes it maximally portable (any Python agent can use it) but requires Python on both sides.

Related frameworks

same archetype · same primary tool · same memory type

claude-mem (thedotmack) ★ 78k

A8 Cross-runtime harness

Background worker service captures every tool call as an observation, AI-compresses sessions, and auto-injects relevant past…

pi (badlogic/earendil) ★ 55k

A8 Cross-runtime harness

A minimal, hackable, multi-provider terminal coding agent that adapts to your workflows via npm-installable TypeScript Extensions…

Agent Skills (Addy Osmani) ★ 46k

A8 Cross-runtime harness

Encodes senior-engineer software development lifecycle as 23 auto-routed skills and 7 slash commands for any AI coding agent.

wshobson/agents Plugin Marketplace ★ 36k

A8 Cross-runtime harness

Single Markdown source for 83 domain-specialized plugins that auto-generates idiomatic artifacts for five AI coding harnesses.

TabbyML/Tabby ★ 34k

A8 Cross-runtime harness

Self-hosted AI coding assistant server (alternative to GitHub Copilot) with admin dashboard, RAG-based completions, and multi-IDE…

Compound Engineering ★ 17k

A8 Cross-runtime harness

Make each unit of engineering work compound into easier future work via brainstorm→plan→execute→review→learn cycles.

Distribution

Type: standalone-repo
License: MIT
Install: one-liner

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: No

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 0

Workflow

Phases: 4
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: Yes
Pattern: parallel-fan-out
Isolation: none
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: No
BYOK: Yes
Modal: text

Execution

Mode: one-shot
Crash recovery: No
Compaction: No
Session handoff: No
Streaming: No

Memory

Type: none
Persistence: session
Search: none

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: No

Tools

Primary: SWE-agent
Targets: 3
Portability: high

Signals

Stars: 508
Last commit: 2026-05-25
Contributors: 23
Maintainer: active
Quality score: 0/10
