Skip to content
/

mini-swe-agent

mini-swe-agent · SWE-agent/mini-swe-agent · ★ 4.5k · last commit 2026-05-25

Primitive shape
No installable primitives
00

Summary

mini-swe-agent — Summary

mini-swe-agent is a deliberately minimal Python coding agent from the Princeton/Stanford SWE-bench team that answers the question: "What if our agent was 100x simpler, and still worked nearly as well?" The core DefaultAgent class is ~130 lines of Python, uses only bash as a tool (no file-specific tools), executes each action with subprocess.run (stateless — no persistent shell session), maintains a completely linear message history, and supports all LLMs via litellm. Despite this radical simplicity, it scores >74% on SWE-bench verified with Gemini 3 Pro. It ships a CLI (mini), multiple sandboxed execution environments (Docker, Singularity, bubblewrap, contree), a trajectory browser UI (Textual TUI), and batch evaluation mode. The intentional minimalism serves two purposes: (1) a clean research baseline that isolates LLM capability from agent scaffold; (2) a hackable daily-use tool.

Compared to seeds: no seed framework approaches this level of intentional minimalism. The closest is agent-os (zero-primitive scaffold), but mini-swe-agent is a fully functional agent with 74% SWE-bench performance in ~100 lines. The architectural insight — single bash tool + stateless subprocess execution + linear history — is unique in the entire corpus.

01

Overview

mini-swe-agent — Overview

Origin

Created by the Princeton & Stanford team behind SWE-bench and SWE-agent. Repository: SWE-agent/mini-swe-agent. MIT license. ~4526 GitHub stars. Actively maintained (last push: 2026-05-25).

Philosophy

"What if our agent was 100x simpler, and still worked nearly as well?"

Key design decisions (from README):

"Does not have any tools other than bash — it doesn't even need to use the tool-calling interface of the LMs."

"Executes actions with subprocess.run — every action is completely independent (as opposed to keeping a stateful shell session running). This makes it trivial to execute the actions in sandboxes."

"Has a completely linear history — every step of the agent just appends to the messages and that's it. So there's no difference between the trajectory and the messages that you pass on to the LM. Great for debugging & fine-tuning."

The Core Insight

"As LMs have become more capable, a lot of [the tools and special interfaces] is not needed at all to build a useful agent!"

The agent removes complexity from the scaffold and places the cognitive burden entirely on the LLM. This makes it:

  • Easier to debug (linear history = perfect observability)
  • Easier to fine-tune (trajectory = training data in the exact format needed)
  • Easier to sandbox (stateless subprocess = just replace subprocess.run with docker exec)
  • Easier to reason about (no state machine, no tool registry, no history processors)

Manifesto quote

"Some agents are overfitted research artifacts. Others are UI-heavy frontend monsters. The mini agent wants to be a hackable tool, not a black box."

LOC Count

  • DefaultAgent class: ~130 lines (src/minisweagent/agents/default.py)
  • LocalEnvironment: ~100 lines
  • LitellmModel: ~100 lines
  • hello_world.py run script: ~50 lines
  • Total core: ~400 lines of Python
02

Architecture

mini-swe-agent — Architecture

Distribution & Install

  • pip install mini-swe-agent
  • Binary: mini (CLI)
  • Also available for batch evaluation via Python bindings

Required Runtime

  • Python (version from pyproject.toml)
  • litellm (for universal LLM provider support)
  • Optional sandbox environments:
    • Docker / Podman
    • Singularity / Apptainer
    • bubblewrap
    • contree

Source Structure

src/minisweagent/
├── agents/
│   └── default.py          # DefaultAgent class (~130 lines) — THE core
├── environments/
│   └── local.py            # LocalEnvironment class
│   (+ docker, singularity, bubblewrap, contree environments)
├── models/
│   └── litellm_model.py    # LitellmModel class
├── run/
│   └── hello_world.py      # CLI entry point script
├── config/
│   ├── default.yaml        # Default config with system + instance templates
│   ├── mini.yaml           # Interactive CLI config
│   └── benchmarks/         # SWE-bench evaluation configs
└── utils/

Agent Simplicity Architecture

DefaultAgent
├── messages: list[dict]     # Linear history — only data structure
├── model: Model             # Any litellm-supported model
├── env: Environment         # Local or sandboxed
├── config: AgentConfig
│   ├── system_template      # Jinja2 template for system message
│   ├── instance_template    # Jinja2 template for first user message
│   ├── step_limit           # Hard stop
│   └── cost_limit           # Cost-based stop
└── run() → step() → query() + execute_actions()

Tool Surface: Bash Only

The agent has exactly ONE tool: bash. No file-read tool, no search tool, no diff tool — just bash. The model must figure out cat, grep, sed, etc. on its own.

Python SDK

agent = DefaultAgent(
    LitellmModel(model_name=...),
    LocalEnvironment(),
)
agent.run("Write a sudoku game")
03

Components

mini-swe-agent — Components

Core Classes (minimal by design)

Class File LOC Purpose
DefaultAgent agents/default.py ~130 The complete agent loop: query → execute → repeat
LocalEnvironment environments/local.py ~100 Runs bash commands via subprocess.run
LitellmModel models/litellm_model.py ~100 Universal LLM interface via litellm
hello_world.py run/hello_world.py ~50 CLI entry point

DefaultAgent Methods

Method Purpose
run(task) Outer loop: initialize messages, call step() until exit
step() Single agent step: query() + execute_actions()
query() Call model, append message, check limits
execute_actions() Extract bash commands from response, run via env
serialize() Serialize full state to JSON for trajectory saving
save() Write trajectory to disk

Execution Environments

Environment How it works
LocalEnvironment subprocess.run(command, shell=True) — stateless, isolated per call
Docker/Podman docker exec replaces subprocess.run
Singularity/Apptainer container exec
bubblewrap Linux user namespace sandboxing
contree filesystem tree isolation

Config Files (Jinja2 YAML)

File Purpose
config/default.yaml system_template + instance_template for default agent behavior
config/mini.yaml Configuration for the mini interactive CLI
config/benchmarks/ Configurations for SWE-bench evaluation

CLI: mini

The mini CLI binary provides interactive use with the default configuration. A swebench mode exists for batch evaluation.

Trajectory Browser

A Textual (Python TUI) browser for viewing agent trajectories — shows messages, tool calls, costs, and outcomes. Mentioned as a key feature alongside the CLI.

05

Prompts

mini-swe-agent — Prompts

Prompt Format

System and instance templates are Jinja2 YAML strings in config files. The templates use {{variable}} interpolation from get_template_vars().

Verbatim Excerpt 1: System Template (from config/default.yaml)

agent:
  system_template: |
    You are a helpful assistant that can interact with a computer.

    Your response must contain exactly ONE bash code block with ONE command (or commands connected with && or ||).
    Include a THOUGHT section before your command where you explain your reasoning process.
    Format your response as shown in <format_example>.

    <format_example>
    Your reasoning and analysis here. Explain why you want to perform the action.

    ```mswea_bash_command
    your_command_here
    ```
    </format_example>

    Failure to follow these rules will cause your response to be rejected.

Prompting technique: Strict format constraint with a concrete example (<format_example>) and an explicit consequence for non-compliance ("will cause your response to be rejected"). The THOUGHT section enforces chain-of-thought before action — similar to ReAct but with a mandatory structural constraint on the output format.

  instance_template: |
    Please solve this issue: {{task}}

    ## Recommended Workflow

    1. Analyze the codebase by finding and reading relevant files
    2. Create a script to reproduce the issue
    3. Edit the source code to resolve the issue
    4. Verify your fix works by running your script again
    5. Test edge cases to ensure your fix is robust
    6. Submit your changes and finish your work by issuing the following command: `echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT`.
       Do not combine it with any other command. <important>After this command, you cannot continue working on this task.</important>

    ## Important Rules

    1. Every response must contain exactly one action
    2. The action must be enclosed in triple backticks
    3. Directory or environment variable changes are not persistent. Every action is executed in a new subshell.
       However, you can prefix any action with `MY_ENV_VAR=MY_VALUE cd /path/to/working/dir && ...`

Prompting technique: Step-by-step recommended workflow (not a hard constraint) + critical rules that ARE constraints. The "not persistent" note is a critical correctness hint that prevents a common failure mode. The completion command (echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT) is a deliberate, non-compositional sentinel that can't be accidentally combined with other commands.

09

Uniqueness

mini-swe-agent — Uniqueness & Positioning

Differs from Seeds

No seed framework approaches this level of intentional minimalism. The closest is agent-os (zero primitives, markdown scaffolding) — but agent-os has no agent execution capability at all. mini-swe-agent is the only framework in the entire corpus that treats radical simplicity as the explicit engineering thesis and demonstrates it achieves near-state-of-the-art benchmark performance (>74% SWE-bench verified). The key deltas from all seeds: (1) single bash tool — no file-read, no diff, no search tools; (2) stateless subprocess execution — no persistent shell session; (3) linear message history — the trajectory IS the message history, no abstraction layer; (4) designed to be a fine-tuning data source — the trajectory format is training-ready; (5) multiple sandbox backends interchangeable via a single code change.

Positioning

The canonical "what is the minimum viable coding agent?" reference implementation. Used by Meta, NVIDIA, Essential AI, IBM, and universities as a baseline for agent research. Not primarily a production tool — it is a research artifact and a hackable starting point.

Key Differentiators

  1. ~130 lines for the agent class — intentionally minimal, documented as a design goal
  2. Only bash — no tool registry; forces LLM to use shell creatively
  3. Stateless subprocess — each command in a fresh subshell; makes sandboxing trivial
  4. 74% SWE-bench verified — competitive with much more complex agents
  5. Linear trajectory = fine-tuning data — trajectory format is training-ready
  6. Multiple sandbox environments — local, Docker, Singularity, bwrap, contree

Observable Failure Modes

  • No persistence: cd doesn't persist — agents must prefix each command; this confuses models occasionally
  • Context window growth: no compaction; long sessions may exceed context limits
  • Single model limitation: no expert escalation or multi-model routing
  • No recovery mechanism: if the agent writes bad code and tests fail, there's no rollback
  • Intentional scope: not designed for interactive coding sessions — best for well-defined issue-fixing tasks
04

Workflow

mini-swe-agent — Workflow

Agent Loop (from DefaultAgent.run)

run(task)
  → messages = [system_message, user_message]
  → while True:
      step()
        → query()             # model.query(messages) → response
                              # append response to messages
        → execute_actions()   # extract bash blocks → env.execute() × N
                              # append observation messages
      if messages[-1]["role"] == "exit":
          break
  → return last_message["extra"]

Phases & Artifacts

Phase Action Artifact
Init Render system + instance templates Messages list
Query Send messages to LLM Model response with bash code block
Action extraction Parse mswea_bash_command fenced block from response Command string
Execution subprocess.run(command) (or docker exec) stdout/stderr
Observation Format output as observation message Appended to messages
Repeat Continue until COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT or limit
Save Write trajectory JSON output_path.json

Required Response Format

The agent's system prompt mandates exactly ONE bash code block per response:

THOUGHT: Your reasoning here.

```mswea_bash_command
your_command_here

## Completion Signal

The agent submits by running:
```bash
echo COMPLETE_TASK_AND_SUBMIT_FINAL_OUTPUT

Approval Gates

None — the agent runs autonomously with no human approval prompts.

Limits

  • step_limit: max number of LLM calls (default: no limit if 0)
  • cost_limit: stop when cumulative cost exceeds threshold (default: $3.00)
  • wall_time_limit_seconds: time-based stop (default: no limit if 0)

SWE-bench Evaluation Mode

Batch mode: run the agent against SWE-bench instances, collect trajectories, evaluate patches.

06

Memory Context

mini-swe-agent — Memory & Context

Memory Architecture: Linear History Only

The agent has no external memory, no database, no vector store. The ONLY state is self.messages: list[dict] — a linear list of messages that grows with each step.

"Has a completely linear history — every step of the agent just appends to the messages and that's it. So there's no difference between the trajectory and the messages that you pass on to the LM."

What Goes in Messages

  1. System message (rendered from system_template)
  2. User message (rendered from instance_template with task)
  3. Model response (with THOUGHT + bash command)
  4. Observation message (bash output)
  5. Repeat from step 3...
  6. Exit message (when task complete or limits exceeded)

Trajectory Saving

The entire message list IS the trajectory. agent.save(output_path) writes a JSON file containing:

  • All messages
  • Model stats (cost, API calls)
  • Agent config
  • Exit status and submission
  • mini_version

This makes the trajectory both an audit log and a training dataset — the exact format needed for fine-tuning.

Context Window Management

No explicit compaction. The model sees the full linear history on each call. The cost_limit and step_limit fields prevent unbounded growth by stopping the agent when resources are exhausted.

Cross-Session Handoff

Not supported — each run() starts fresh with an empty messages list. Trajectories can be manually reviewed between runs.

07

Orchestration

mini-swe-agent — Orchestration

Multi-Agent

No. The DefaultAgent is a single-agent loop. No subagents, no parallelism, no orchestrator.

Orchestration Pattern

None — single sequential loop.

Execution Mode

One-shot (interactive via mini CLI) or batch (SWE-bench evaluation mode).

Isolation Mechanism

Multiple — this is a key design feature:

Environment Isolation
LocalEnvironment Process-level (subprocess.run, separate subshell per command)
Docker/Podman Container
Singularity/Apptainer Container
bubblewrap Linux user namespace + filesystem namespacing
contree Filesystem tree isolation

The stateless subprocess execution (subprocess.run) makes switching between these trivial — just change the executor. No persistent shell state to maintain across isolation boundaries.

Why Stateless Subprocess Matters

"This is a big deal, trust me."

Each command runs in a fresh subshell. Benefits:

  • Trivially sandboxable (just replace subprocess.run)
  • No shell state corruption across commands
  • Works in any isolation environment identically
  • cd and export don't persist — agents learn to prefix commands

Multi-Model

litellm supports all major LLM providers. Model selection is a config parameter. No multi-model routing per se — one model per run.

Prompt Chaining

Linear message chain — each step's output IS the input for the next step (via message history). Simple, explicit, debuggable.

08

Ui Cli Surface

mini-swe-agent — UI & CLI Surface

Dedicated CLI Binary

  • Binary name: mini
  • Not a thin wrapper: full agent runtime in Python
  • Install: pip install mini-swe-agent

Interactive CLI (mini)

The mini CLI provides an interactive mode for daily use. It runs the DefaultAgent with the mini.yaml configuration.

Trajectory Browser

  • Type: terminal-tui
  • Stack: Textual (Python TUI framework)
  • Purpose: Browse and analyze completed agent trajectories
  • Features: message viewer, tool call inspection, cost display, exit status

Batch Inference Mode

  • Binary: swebench (or similar)
  • Purpose: Run the agent against SWE-bench instances in batch for evaluation
  • Shows a visual progress display during evaluation runs

Python SDK

from minisweagent import DefaultAgent, LitellmModel, LocalEnvironment

agent = DefaultAgent(
    LitellmModel(model_name="gpt-4o"),
    LocalEnvironment(),
)
result = agent.run("Fix the failing test in src/foo.py")

Programmatic Extensibility

The DefaultAgent is explicitly designed for subclassing. Key override points:

  • query() — add pre/post hooks
  • execute_actions() — custom action handling
  • handle_uncaught_exception() — custom error handling
  • config_class parameter — swap in custom AgentConfig

ProgramBench Integration

New benchmark mentioned in README (as of May 2026): mini-swe-agent runs on ProgramBench via a dedicated evaluation mode.

Related frameworks

same archetype · same primary tool · same memory type

claude-mem (thedotmack) ★ 78k

Background worker service captures every tool call as an observation, AI-compresses sessions, and auto-injects relevant past…

pi (badlogic/earendil) ★ 55k

A minimal, hackable, multi-provider terminal coding agent that adapts to your workflows via npm-installable TypeScript Extensions…

Agent Skills (Addy Osmani) ★ 46k

Encodes senior-engineer software development lifecycle as 23 auto-routed skills and 7 slash commands for any AI coding agent.

wshobson/agents Plugin Marketplace ★ 36k

Single Markdown source for 83 domain-specialized plugins that auto-generates idiomatic artifacts for five AI coding harnesses.

TabbyML/Tabby ★ 34k

Self-hosted AI coding assistant server (alternative to GitHub Copilot) with admin dashboard, RAG-based completions, and multi-IDE…

Compound Engineering ★ 17k

Make each unit of engineering work compound into easier future work via brainstorm→plan→execute→review→learn cycles.