mini-swe-agent

mini-swe-agent · SWE-agent/mini-swe-agent · ★ 4.5k · last commit 2026-05-25

Primitive shape

No installable primitives

Summary

mini-swe-agent — Summary

mini-swe-agent is a deliberately minimal Python coding agent from the Princeton/Stanford SWE-bench team that answers the question: "What if our agent was 100x simpler, and still worked nearly as well?" The core DefaultAgent class is ~130 lines of Python, uses only bash as a tool (no file-specific tools), executes each action with subprocess.run (stateless — no persistent shell session), maintains a completely linear message history, and supports all LLMs via litellm. Despite this radical simplicity, it scores >74% on SWE-bench verified with Gemini 3 Pro. It ships a CLI (mini), multiple sandboxed execution environments (Docker, Singularity, bubblewrap, contree), a trajectory browser UI (Textual TUI), and batch evaluation mode. The intentional minimalism serves two purposes: (1) a clean research baseline that isolates LLM capability from agent scaffold; (2) a hackable daily-use tool.

Compared to seeds: no seed framework approaches this level of intentional minimalism. The closest is agent-os (zero-primitive scaffold), but mini-swe-agent is a fully functional agent with 74% SWE-bench performance in ~100 lines. The architectural insight — single bash tool + stateless subprocess execution + linear history — is unique in the entire corpus.

Overview

mini-swe-agent — Overview

Origin

Created by the Princeton & Stanford team behind SWE-bench and SWE-agent. Repository: SWE-agent/mini-swe-agent. MIT license. ~4526 GitHub stars. Actively maintained (last push: 2026-05-25).

Philosophy

"What if our agent was 100x simpler, and still worked nearly as well?"

Key design decisions (from README):

"Does not have any tools other than bash — it doesn't even need to use the tool-calling interface of the LMs."

"Executes actions with subprocess.run — every action is completely independent (as opposed to keeping a stateful shell session running). This makes it trivial to execute the actions in sandboxes."

"Has a completely linear history — every step of the agent just appends to the messages and that's it. So there's no difference between the trajectory and the messages that you pass on to the LM. Great for debugging & fine-tuning."

The Core Insight

"As LMs have become more capable, a lot of [the tools and special interfaces] is not needed at all to build a useful agent!"

The agent removes complexity from the scaffold and places the cognitive burden entirely on the LLM. This makes it:

Easier to debug (linear history = perfect observability)
Easier to fine-tune (trajectory = training data in the exact format needed)
Easier to sandbox (stateless subprocess = just replace subprocess.run with docker exec)
Easier to reason about (no state machine, no tool registry, no history processors)

Manifesto quote

"Some agents are overfitted research artifacts. Others are UI-heavy frontend monsters. The mini agent wants to be a hackable tool, not a black box."

LOC Count

DefaultAgent class: ~130 lines (src/minisweagent/agents/default.py)
LocalEnvironment: ~100 lines
LitellmModel: ~100 lines
hello_world.py run script: ~50 lines
Total core: ~400 lines of Python

Architecture

mini-swe-agent — Architecture

Distribution & Install

pip install mini-swe-agent
Binary: mini (CLI)
Also available for batch evaluation via Python bindings

Required Runtime

Python (version from pyproject.toml)
litellm (for universal LLM provider support)
Optional sandbox environments:
- Docker / Podman
- Singularity / Apptainer
- bubblewrap
- contree

Source Structure

src/minisweagent/
├── agents/
│   └── default.py          # DefaultAgent class (~130 lines) — THE core
├── environments/
│   └── local.py            # LocalEnvironment class
│   (+ docker, singularity, bubblewrap, contree environments)
├── models/
│   └── litellm_model.py    # LitellmModel class
├── run/
│   └── hello_world.py      # CLI entry point script
├── config/
│   ├── default.yaml        # Default config with system + instance templates
│   ├── mini.yaml           # Interactive CLI config
│   └── benchmarks/         # SWE-bench evaluation configs
└── utils/

Agent Simplicity Architecture

DefaultAgent
├── messages: list[dict]     # Linear history — only data structure
├── model: Model             # Any litellm-supported model
├── env: Environment         # Local or sandboxed
├── config: AgentConfig
│   ├── system_template      # Jinja2 template for system message
│   ├── instance_template    # Jinja2 template for first user message
│   ├── step_limit           # Hard stop
│   └── cost_limit           # Cost-based stop
└── run() → step() → query() + execute_actions()

Tool Surface: Bash Only

The agent has exactly ONE tool: bash. No file-read tool, no search tool, no diff tool — just bash. The model must figure out cat, grep, sed, etc. on its own.

Python SDK

agent = DefaultAgent(
    LitellmModel(model_name=...),
    LocalEnvironment(),
)
agent.run("Write a sudoku game")

Components

mini-swe-agent — Components

Core Classes (minimal by design)

Class	File	LOC	Purpose
`DefaultAgent`	`agents/default.py`	~130	The complete agent loop: query → execute → repeat
`LocalEnvironment`	`environments/local.py`	~100	Runs bash commands via `subprocess.run`
`LitellmModel`	`models/litellm_model.py`	~100	Universal LLM interface via litellm
`hello_world.py`	`run/hello_world.py`	~50	CLI entry point

`DefaultAgent` Methods

Method	Purpose
`run(task)`	Outer loop: initialize messages, call step() until exit
`step()`	Single agent step: query() + execute_actions()
`query()`	Call model, append message, check limits
`execute_actions()`	Extract bash commands from response, run via env
`serialize()`	Serialize full state to JSON for trajectory saving
`save()`	Write trajectory to disk

Execution Environments

Environment	How it works
`LocalEnvironment`	`subprocess.run(command, shell=True)` — stateless, isolated per call
Docker/Podman	`docker exec` replaces `subprocess.run`
Singularity/Apptainer	container exec
bubblewrap	Linux user namespace sandboxing
contree	filesystem tree isolation

Config Files (Jinja2 YAML)

File	Purpose
`config/default.yaml`	system_template + instance_template for default agent behavior
`config/mini.yaml`	Configuration for the `mini` interactive CLI
`config/benchmarks/`	Configurations for SWE-bench evaluation

CLI: `mini`

The mini CLI binary provides interactive use with the default configuration. A swebench mode exists for batch evaluation.

Trajectory Browser

A Textual (Python TUI) browser for viewing agent trajectories — shows messages, tool calls, costs, and outcomes. Mentioned as a key feature alongside the CLI.

Prompts

mini-swe-agent — Prompts

Prompt Format

System and instance templates are Jinja2 YAML strings in config files. The templates use {{variable}} interpolation from get_template_vars().

Verbatim Excerpt 1: System Template (from `config/default.yaml`)

agent:
  system_template: |
    You are a helpful assistant that can interact with a computer.

    Your response must contain exactly ONE bash code block with ONE command (or commands connected with && or ||).
    Include a THOUGHT section before your command where you explain your reasoning process.
    Format your response as shown in <format_example>.

    <format_example>
    Your reasoning and analysis here. Explain why you want to perform the action.

    ```mswea_bash_command
    your_command_here
    ```
    </format_example>

    Failure to follow these rules will cause your response to be rejected.

Prompting technique: Strict format constraint with a concrete example (<format_example>) and an explicit consequence for non-compliance ("will cause your response to be rejected"). The THOUGHT section enforces chain-of-thought before action — similar to ReAct but with a mandatory structural constraint on the output format.

Verbatim Excerpt 2: Instance Template (recommended workflow + format)

  instance_template: |
    Please solve this issue: {{task}}

    ## Recommended Workflow

    1. Analyze the codebase by finding and reading relevant files
    2. Create a script to reproduce the issue
    3. Edit the source code to resolve the issue
    4. Verify your fix works by running your script again
    5. Test edge cases to ensure your fix is robust
    6. Submit your changes and finish your work by issuing the following command: `echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT`.
       Do not combine it with any other command. <important>After this command, you cannot continue working on this task.</important>

    ## Important Rules

    1. Every response must contain exactly one action
    2. The action must be enclosed in triple backticks
    3. Directory or environment variable changes are not persistent. Every action is executed in a new subshell.
       However, you can prefix any action with `MY_ENV_VAR=MY_VALUE cd /path/to/working/dir && ...`

Prompting technique: Step-by-step recommended workflow (not a hard constraint) + critical rules that ARE constraints. The "not persistent" note is a critical correctness hint that prevents a common failure mode. The completion command (echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT) is a deliberate, non-compositional sentinel that can't be accidentally combined with other commands.

Uniqueness

mini-swe-agent — Uniqueness & Positioning

Differs from Seeds

No seed framework approaches this level of intentional minimalism. The closest is agent-os (zero primitives, markdown scaffolding) — but agent-os has no agent execution capability at all. mini-swe-agent is the only framework in the entire corpus that treats radical simplicity as the explicit engineering thesis and demonstrates it achieves near-state-of-the-art benchmark performance (>74% SWE-bench verified). The key deltas from all seeds: (1) single bash tool — no file-read, no diff, no search tools; (2) stateless subprocess execution — no persistent shell session; (3) linear message history — the trajectory IS the message history, no abstraction layer; (4) designed to be a fine-tuning data source — the trajectory format is training-ready; (5) multiple sandbox backends interchangeable via a single code change.

Positioning

The canonical "what is the minimum viable coding agent?" reference implementation. Used by Meta, NVIDIA, Essential AI, IBM, and universities as a baseline for agent research. Not primarily a production tool — it is a research artifact and a hackable starting point.

Key Differentiators

~130 lines for the agent class — intentionally minimal, documented as a design goal
Only bash — no tool registry; forces LLM to use shell creatively
Stateless subprocess — each command in a fresh subshell; makes sandboxing trivial
74% SWE-bench verified — competitive with much more complex agents
Linear trajectory = fine-tuning data — trajectory format is training-ready
Multiple sandbox environments — local, Docker, Singularity, bwrap, contree

Observable Failure Modes

No persistence: cd doesn't persist — agents must prefix each command; this confuses models occasionally
Context window growth: no compaction; long sessions may exceed context limits
Single model limitation: no expert escalation or multi-model routing
No recovery mechanism: if the agent writes bad code and tests fail, there's no rollback
Intentional scope: not designed for interactive coding sessions — best for well-defined issue-fixing tasks

Workflow

mini-swe-agent — Workflow

Agent Loop (from DefaultAgent.run)

run(task)
  → messages = [system_message, user_message]
  → while True:
      step()
        → query()             # model.query(messages) → response
                              # append response to messages
        → execute_actions()   # extract bash blocks → env.execute() × N
                              # append observation messages
      if messages[-1]["role"] == "exit":
          break
  → return last_message["extra"]

Phases & Artifacts

Phase	Action	Artifact
Init	Render system + instance templates	Messages list
Query	Send messages to LLM	Model response with bash code block
Action extraction	Parse `mswea_bash_command` fenced block from response	Command string
Execution	`subprocess.run(command)` (or docker exec)	stdout/stderr
Observation	Format output as observation message	Appended to messages
Repeat	Continue until `COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT` or limit
Save	Write trajectory JSON	`output_path.json`

Required Response Format

The agent's system prompt mandates exactly ONE bash code block per response:

THOUGHT: Your reasoning here.

```mswea_bash_command
your_command_here


## Completion Signal

The agent submits by running:
```bash
echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT

Approval Gates

None — the agent runs autonomously with no human approval prompts.

Limits

step_limit: max number of LLM calls (default: no limit if 0)
cost_limit: stop when cumulative cost exceeds threshold (default: $3.00)
wall_time_limit_seconds: time-based stop (default: no limit if 0)

SWE-bench Evaluation Mode

Batch mode: run the agent against SWE-bench instances, collect trajectories, evaluate patches.

Memory Context

mini-swe-agent — Memory & Context

Memory Architecture: Linear History Only

The agent has no external memory, no database, no vector store. The ONLY state is self.messages: list[dict] — a linear list of messages that grows with each step.

"Has a completely linear history — every step of the agent just appends to the messages and that's it. So there's no difference between the trajectory and the messages that you pass on to the LM."

What Goes in Messages

System message (rendered from system_template)
User message (rendered from instance_template with task)
Model response (with THOUGHT + bash command)
Observation message (bash output)
Repeat from step 3...
Exit message (when task complete or limits exceeded)

Trajectory Saving

The entire message list IS the trajectory. agent.save(output_path) writes a JSON file containing:

All messages
Model stats (cost, API calls)
Agent config
Exit status and submission
mini_version

This makes the trajectory both an audit log and a training dataset — the exact format needed for fine-tuning.

Context Window Management

No explicit compaction. The model sees the full linear history on each call. The cost_limit and step_limit fields prevent unbounded growth by stopping the agent when resources are exhausted.

Cross-Session Handoff

Not supported — each run() starts fresh with an empty messages list. Trajectories can be manually reviewed between runs.

Orchestration

mini-swe-agent — Orchestration

Multi-Agent

No. The DefaultAgent is a single-agent loop. No subagents, no parallelism, no orchestrator.

Orchestration Pattern

None — single sequential loop.

Execution Mode

One-shot (interactive via mini CLI) or batch (SWE-bench evaluation mode).

Isolation Mechanism

Multiple — this is a key design feature:

Environment	Isolation
`LocalEnvironment`	Process-level (subprocess.run, separate subshell per command)
Docker/Podman	Container
Singularity/Apptainer	Container
bubblewrap	Linux user namespace + filesystem namespacing
contree	Filesystem tree isolation

The stateless subprocess execution (subprocess.run) makes switching between these trivial — just change the executor. No persistent shell state to maintain across isolation boundaries.

Why Stateless Subprocess Matters

"This is a big deal, trust me."

Each command runs in a fresh subshell. Benefits:

Trivially sandboxable (just replace subprocess.run)
No shell state corruption across commands
Works in any isolation environment identically
cd and export don't persist — agents learn to prefix commands

Multi-Model

litellm supports all major LLM providers. Model selection is a config parameter. No multi-model routing per se — one model per run.

Prompt Chaining

Linear message chain — each step's output IS the input for the next step (via message history). Simple, explicit, debuggable.

Ui Cli Surface

mini-swe-agent — UI & CLI Surface

Dedicated CLI Binary

Binary name: mini
Not a thin wrapper: full agent runtime in Python
Install: pip install mini-swe-agent

Interactive CLI (`mini`)

The mini CLI provides an interactive mode for daily use. It runs the DefaultAgent with the mini.yaml configuration.

Trajectory Browser

Type: terminal-tui
Stack: Textual (Python TUI framework)
Purpose: Browse and analyze completed agent trajectories
Features: message viewer, tool call inspection, cost display, exit status

Batch Inference Mode

Binary: swebench (or similar)
Purpose: Run the agent against SWE-bench instances in batch for evaluation
Shows a visual progress display during evaluation runs

Python SDK

from minisweagent import DefaultAgent, LitellmModel, LocalEnvironment

agent = DefaultAgent(
    LitellmModel(model_name="gpt-4o"),
    LocalEnvironment(),
)
result = agent.run("Fix the failing test in src/foo.py")

Programmatic Extensibility

The DefaultAgent is explicitly designed for subclassing. Key override points:

query() — add pre/post hooks
execute_actions() — custom action handling
handle_uncaught_exception() — custom error handling
config_class parameter — swap in custom AgentConfig

ProgramBench Integration

New benchmark mentioned in README (as of May 2026): mini-swe-agent runs on ProgramBench via a dedicated evaluation mode.

Related frameworks

same archetype · same primary tool · same memory type

claude-mem (thedotmack) ★ 78k

A8 Cross-runtime harness

Background worker service captures every tool call as an observation, AI-compresses sessions, and auto-injects relevant past…

pi (badlogic/earendil) ★ 55k

A8 Cross-runtime harness

A minimal, hackable, multi-provider terminal coding agent that adapts to your workflows via npm-installable TypeScript Extensions…

Agent Skills (Addy Osmani) ★ 46k

A8 Cross-runtime harness

Encodes senior-engineer software development lifecycle as 23 auto-routed skills and 7 slash commands for any AI coding agent.

wshobson/agents Plugin Marketplace ★ 36k

A8 Cross-runtime harness

Single Markdown source for 83 domain-specialized plugins that auto-generates idiomatic artifacts for five AI coding harnesses.

TabbyML/Tabby ★ 34k

A8 Cross-runtime harness

Self-hosted AI coding assistant server (alternative to GitHub Copilot) with admin dashboard, RAG-based completions, and multi-IDE…

Compound Engineering ★ 17k

A8 Cross-runtime harness

Make each unit of engineering work compound into easier future work via brainstorm→plan→execute→review→learn cycles.

Distribution

Type: cli-tool
License: MIT
Install: one-liner
Version: v2 / main (2026-05-25)

Surfaces

CLI binary: mini
CLI subcmds: 2
Local UI: terminal-tui
Tech stack: Textual (Python TUI)

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 3

Workflow

Phases: 7
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: No
Pattern: none
Max concurrent: 1
Isolation: process
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: No
BYOK: Yes
Locked to: null (any litellm-supported model)
Modal: text+vision

Execution

Mode: one-shot
Crash recovery: No
Compaction: No
Session handoff: No
Streaming: Yes

Memory

Type: none
Persistence: session
Search: none
State files: 1 file

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: jsonl
Replay: Yes

Tools

Primary: mini-swe-agent-cli
Targets: 5
Portability: high

Signals

Stars: 4.5k
Last commit: 2026-05-25
Maintainer: active
Quality score: 3.2/10

Summary

mini-swe-agent — Summary

Overview

mini-swe-agent — Overview

Origin

Philosophy

The Core Insight

Manifesto quote

LOC Count

Architecture

mini-swe-agent — Architecture

Distribution & Install

Required Runtime

Source Structure

Agent Simplicity Architecture

Tool Surface: Bash Only

Python SDK

Components

mini-swe-agent — Components

Core Classes (minimal by design)

DefaultAgent Methods

Execution Environments

Config Files (Jinja2 YAML)

CLI: mini

Trajectory Browser

Prompts

mini-swe-agent — Prompts

Prompt Format

Verbatim Excerpt 1: System Template (from config/default.yaml)

Verbatim Excerpt 2: Instance Template (recommended workflow + format)

Uniqueness

mini-swe-agent — Uniqueness & Positioning

Differs from Seeds

Positioning

Key Differentiators

Observable Failure Modes

Workflow

mini-swe-agent — Workflow

Agent Loop (from DefaultAgent.run)

Phases & Artifacts

Required Response Format

Approval Gates

Limits

SWE-bench Evaluation Mode

Memory Context

mini-swe-agent — Memory & Context

Memory Architecture: Linear History Only

What Goes in Messages

Trajectory Saving

Context Window Management

Cross-Session Handoff

Orchestration

mini-swe-agent — Orchestration

Multi-Agent

Orchestration Pattern

Execution Mode

Isolation Mechanism

Why Stateless Subprocess Matters

Multi-Model

Prompt Chaining

Ui Cli Surface

mini-swe-agent — UI & CLI Surface

Dedicated CLI Binary

Interactive CLI (mini)

Trajectory Browser

Batch Inference Mode

Python SDK

Programmatic Extensibility

ProgramBench Integration

Related frameworks

`DefaultAgent` Methods

CLI: `mini`

Verbatim Excerpt 1: System Template (from `config/default.yaml`)

Interactive CLI (`mini`)