Skip to content
/

gpt-engineer

gpt-engineer · gpt-engineer-org/gpt-engineer · ★ 55k · last commit 2025-05-14

Primitive shape
No installable primitives
00

Summary

gpt-engineer — Summary

gpt-engineer is the original "write a full app from a prompt file" CLI tool, created by Anton Osika in 2023 as an experiment in LLM-driven code generation. It is now archived (last push May 2025) — the README itself says: "If you are looking for a well maintained hackable CLI for – check out aider." It pioneered the pattern of using a prompt text file as input, running LLM through multiple stages (clarify → generate → execute), and producing a complete codebase. The preprompts/ directory is the customization hook — organizations override it with custom instructions, making the AI "remember things between projects." It is the precursor to Lovable.dev (gptengineer.app).

gpt-engineer is historically significant as one of the first "write a whole app from scratch" agents, but it is explicitly obsolete — the maintainers direct users to aider. It differs from all seed frameworks in that it generates complete projects from scratch, not incremental changes; it has no concept of an existing codebase or iterative editing.

01

Overview

gpt-engineer — Overview

Origin

Created by Anton Osika (antonosika on GitHub) in 2023. The project went viral as one of the first demonstrations that GPT-4 could generate entire applications from a prose specification. It has 55,000+ stars despite being archived.

Current Status

ARCHIVED — the repo is marked archived as of 2025. From the README:

"The OG code generation experimentation platform! If you are looking for the evolution that is an opinionated, managed service – check out gptengineer.app. If you are looking for a well maintained hackable CLI for – check out aider."

The project explicitly directs users to two successors:

  1. Lovable.dev (gptengineer.app) — commercial web app generation service
  2. Aider — maintained CLI for iterative AI coding

Philosophy

gpt-engineer's original philosophy:

  1. Clarify before generating: If the spec is ambiguous, ask ONE clarifying question before proceeding
  2. Complete implementation: "No placeholders." The generated code should be fully functional.
  3. Entrypoint-first: Start with the entrypoint file, then work through imports
  4. Cross-file compatibility: All files must be compatible with each other

From the generate preprompt:

"Think step by step and reason yourself to the correct decisions to make sure we get it right." "Please note that the code should be fully functional. No placeholders."

Key Innovation

gpt-engineer introduced the preprompts/ customization pattern: a directory of plain text files that define the agent's behavior. Organizations can override this directory (via --use-custom-preprompts) to customize what the agent "remembers" about their preferences and technology stack. This pattern influenced many later frameworks.

Mission

From README:

"The gpt-engineer community mission is to maintain tools that coding agent builders can use and facilitate collaboration in the open source community."

02

Architecture

gpt-engineer — Architecture

Distribution

ARCHIVED — but was available as:

pip install gpt-engineer
# or
poetry install  # from source

CLI Binaries

From pyproject.toml:

  • gpte — main code generation command
  • bench — benchmarking interface

Directory Layout

gpt_engineer/
  core/
    ai.py            # LLM client abstraction
    base_agent.py    # Base agent class
    preprompts_holder.py  # Manages preprompts directory
    default/         # Default agent implementations
    files_dict.py    # File collection type
    chat_to_files.py # Parse LLM output to files
    linting.py       # Lint integration
    git.py           # Git integration
  applications/
    cli.py           # CLI entrypoint
  benchmark/         # APPS and MBPP benchmark support
  preprompts/        # Customizable prompt files
    clarify          # Clarification prompt
    generate         # Main generation prompt
    improve          # Improvement prompt
    philosophy       # Coding philosophy/preferences
    file_format      # File format specification
    file_format_diff # Diff format specification
    entrypoint       # Entrypoint generation prompt
    file_format_fix  # Fix format specification
projects/            # Example project directories (user's prompts go here)

Usage Pattern

# Create project directory with prompt file
mkdir my-project && echo "Build a Flask web app with user auth" > my-project/prompt

# Generate code
gpte my-project

# Improve existing code
gpte my-project -i

Required Runtime

  • Python 3.10-3.12
  • OpenAI API key (or Anthropic, or local model with extra setup)

Target AI Tools

  • OpenAI GPT-4 (primary, recommended)
  • Anthropic Claude (supported)
  • Azure OpenAI API
  • Local models (via additional setup, documented separately)

Primary: OpenAI GPT-4

03

Components

gpt-engineer — Components

CLI Commands

Command Purpose
gpte <project_dir> Generate new code from prompt file
gpte <project_dir> -i Improve existing code using prompt instructions
gpte <project_dir> --use-custom-preprompts Use custom preprompts from project dir
gpte <project_dir> --image_directory <dir> <model> Vision input for architecture diagrams
bench Benchmark agents against APPS/MBPP datasets

Preprompts (Core Customization Point)

File Purpose
clarify "Determine if anything needs to be clarified"
generate Main code generation instructions (step-by-step, FIM-style)
improve Code improvement instructions
philosophy Coding preferences (language conventions, file structure)
file_format How to format generated file output
file_format_diff How to format diff output
file_format_fix How to format fix output
entrypoint How to generate the entrypoint file

Core Classes

Class Purpose
AI LLM client abstraction (OpenAI, Anthropic, Azure)
BaseAgent Base class for agent implementations
PrepromptsHolder Manages the preprompts directory, loads custom overrides
FilesDict Dictionary of filename → content for generated files
chat_to_files.py Parses LLM markdown output into file dictionaries
Linting Optional post-generation lint step

Benchmarking

The bench CLI and benchmark/ module support:

  • APPS benchmark (competitive programming)
  • MBPP benchmark (basic programming problems)
  • Template for custom agent benchmarking

Community contributed template: gpt-engineer-org/gpte-bench-template

Custom Agents

gpt-engineer has a BaseAgent abstraction that allows building custom agent implementations. The benchmark tooling is designed around this abstraction.

05

Prompts

gpt-engineer — Prompts

Prompt 1: Clarification Prompt

Source: gpt_engineer/preprompts/clarify

Technique: Minimal clarification gate — asks exactly ONE question if ambiguous. The "Nothing to clarify" escape prevents unnecessary questions.

Given some instructions, determine if anything needs to be clarified, do not carry them out.
You can make reasonable assumptions, but if you are unsure, ask a single clarification question.
Otherwise state: "Nothing to clarify"

This is a masterclass in minimal prompting — 3 lines that implement a gating mechanism. The key constraint is "a single clarification question" — preventing the common failure of agents that ask 10 questions before starting.


Prompt 2: Main Generation Prompt

Source: gpt_engineer/preprompts/generate

Technique: Step-by-step generation with completeness mandate and explicit failure mode prevention ("No placeholders")

Think step by step and reason yourself to the correct decisions to make sure we get it right.
First lay out the names of the core classes, functions, methods that will be necessary, As well as a quick comment on their purpose.

FILE_FORMAT

You will start with the "entrypoint" file, then go to the ones that are imported by that file, and so on.
Please note that the code should be fully functional. No placeholders.

Follow a language and framework appropriate best practice file naming convention.
Make sure that files contain all imports, types etc.  The code should be fully functional. Make sure that code in different files are compatible with each other.
Ensure to implement all code, if you are unsure, write a plausible implementation.
Include module dependency or package manager dependency definition file.
Before you finish, double check that all parts of the architecture is present in the files.

When you are done, write finish with "this concludes a fully working implementation".

Note the FILE_FORMAT placeholder — this is dynamically replaced with the file format specification from gpt_engineer/preprompts/file_format.

The "this concludes a fully working implementation" ending is a completion signal that the agent must produce.


Prompt 3: Philosophy / Coding Preferences

Source: gpt_engineer/preprompts/philosophy

Technique: Persistent preferences injection — the "remember things between projects" mechanism.

Almost always put different classes in different files.
Always use the programming language the user asks for.
For Python, you always create an appropriate requirements.txt file.
For NodeJS, you always create an appropriate package.json file.
Always add a comment briefly describing the purpose of the function definition.
Add comments explaining very complex bits of logic.
Always follow the best practices for the requested languages for folder/file structure and how to package the project.

Python toolbelt preferences:
- pytest
- dataclasses

Organizations override this file with their own preferences.


Prompting Techniques Used

  1. Staged generation: clarify → plan → generate → verify — explicit phases prevent jumping to code
  2. Minimal clarification gate: "ask at most ONE question or say Nothing to clarify" — prevents paralysis
  3. Completeness mandate: "No placeholders" + "fully functional" enforced in prompt, not just documentation
  4. Entrypoint-first ordering: generation follows import graph, not arbitrary file order
  5. Completion signal: "this concludes a fully working implementation" — a termination signal the runner can detect
  6. Preferences file: Persistent coding philosophy/preferences separate from per-request instructions
09

Uniqueness

gpt-engineer — Uniqueness and Positioning

Differs from Seeds

gpt-engineer is historically unique as the original "write a whole codebase from a prompt file" tool. It differs from all seed frameworks (which work with existing codebases) in that it generates complete projects from scratch. All seed frameworks assume you have an existing codebase to modify; gpt-engineer starts from a prompt file and writes everything. The preprompts customization pattern (override a directory of plain text files to encode organizational preferences) influenced later frameworks — agent-os's standards/ directory and BMAD-method's persona files are distant descendants. However, gpt-engineer is now ARCHIVED and the maintainers direct users to aider (iterative editing) and Lovable.dev (commercial equivalent).

Historical Significance

  1. First viral LLM coding agent (55,000+ stars as an archived repo): demonstrated that GPT-4 could generate entire applications from prose
  2. Preprompts pattern: the preprompts/ directory as a customization mechanism influenced many later frameworks
  3. Clarify-then-generate: the single-clarifying-question gate is widely copied
  4. "This concludes a fully working implementation": completion signal pattern influenced later prompting techniques
  5. Precursor to Lovable.dev: the commercial evolution of gpt-engineer is now a multi-hundred-million dollar product

Observable Failure Modes (as documented)

  1. ARCHIVED: explicitly unmaintained; do not use for production
  2. Context limit failures: generating large apps exceeded GPT-4's context window
  3. Hallucinated dependencies: generated requirements.txt often included packages that didn't exist or had wrong versions
  4. No iterative editing: every re-run regenerated from scratch; no surgical edits
  5. Single clarification question not enough: complex specs needed multiple rounds of clarification
  6. Generated code quality degrades with app size: works well for simple apps, poorly for complex ones
04

Workflow

gpt-engineer — Workflow

New Project Workflow

  1. Create project directory with a prompt file
  2. Run gpte <project_dir>
  3. Clarification phase — LLM reads prompt, asks at most ONE clarifying question if ambiguous
  4. Generation phase — LLM generates complete codebase:
    • Lists core classes, functions, methods
    • Starts with entrypoint file
    • Works through imports
    • Generates all files
  5. Execution phase — optionally executes the entrypoint to verify
  6. Output — complete codebase in project directory

Improve Workflow

  1. Existing codebase in project directory
  2. Update prompt file with improvement instructions
  3. Run gpte <project_dir> -i
  4. LLM reads existing code + new instructions
  5. Generates diff — only the necessary changes
  6. Output — modified files

Vision Workflow

gpte projects/example-vision gpt-4-vision-preview \
  --prompt_file prompt/text \
  --image_directory prompt/images -i

Adds architecture diagrams or screenshots as visual context.

Benchmark Workflow

bench --config my-agent-config.yaml --dataset apps

Runs agent against APPS or MBPP benchmark datasets.

Phases + Artifacts Table

Phase Artifact
Clarification Clarifying question (optional)
Core planning Class/function/method list (in response)
Code generation Complete files in project directory
Entrypoint generation Runnable entrypoint script

Approval Gates

Gate Type Notes
Clarification answer freetext-clarify User answers the one clarifying question
Execution consent yes-no User asked before running generated code
06

Memory Context

gpt-engineer — Memory and Context

State Storage

State Storage Scope
Project prompt <project_dir>/prompt Project
Generated files <project_dir>/ Project
Conversation history In-memory during session Session only
Custom preprompts <project_dir>/preprompts/ (with --use-custom-preprompts) Project

Preprompts as Memory

The preprompts/ directory IS the memory mechanism — it's how "the agent remembers things between projects." Organizations maintain a custom preprompts/ directory that encodes:

  • Preferred languages and frameworks
  • File structure conventions
  • Tool preferences (pytest, dataclasses, etc.)
  • Code style guidelines

This is passed via --use-custom-preprompts, pointing to the custom directory.

No Cross-Session State

gpt-engineer has no database, no vector store, no persistent session history beyond the files in the project directory. Each run is independent except for the shared preprompts.

Context Window

The entire prompt + generated code is in a single LLM conversation. For large codebases, this can exceed context limits. gpt-engineer does not implement context compaction.

Improve Mode Memory

In improve mode (-i), gpt-engineer reads the existing files in the project directory as context. The existing codebase IS the memory.

07

Orchestration

gpt-engineer — Orchestration

Multi-Agent

No. gpt-engineer is a single LLM call chain (clarify → generate → execute). No subagent spawning.

Orchestration Pattern

Sequential: clarification → planning → generation → execution. Fixed pipeline, no branching.

Isolation

None. Files written directly to the project directory.

Multi-Model

No. Single model per run. Configurable via environment variable or CLI argument.

Execution Mode

One-shot: single run per gpte invocation. No interactive loop.

Crash Recovery

None. If generation fails, re-run from scratch.

Context Compaction

None. Single conversation context throughout generation.

Consensus Mechanism

None.

Prompt Chaining

Yes: the clarification response feeds into the generation prompt. The generation output (class/function list) feeds into actual code generation. These are implicit — not named stages with explicit outputs.

08

Ui Cli Surface

gpt-engineer — UI and CLI Surface

CLI Binaries

gpte — main code generation command
bench — benchmark runner

No UI

gpt-engineer has no web dashboard, no IDE extension, no TUI. Pure CLI with stdout output.

Key Flags

gpte <project_dir>               # Generate new project
gpte <project_dir> -i            # Improve existing project
gpte <project_dir> --use-custom-preprompts  # Use custom preprompts
gpte <project_dir> --image_directory <dir> <model>  # Vision input

Observability

  • stdout only
  • Generated files in project directory

GitHub Codespaces

The repo offers a "Open in GitHub Codespaces" badge — this was the intended browser-based experience for users who didn't want to install locally.

Related frameworks

same archetype · same primary tool · same memory type

Claude-Flow / Ruflo ★ 55k

Eliminates single-agent context limits and sequential bottlenecks by orchestrating fault-tolerant swarms of specialized AI agents…

Hermes Agent (NousResearch) ★ 168k

Self-improving personal AI agent with closed learning loop, 7 terminal backends, and messaging gateway — not tied to any AI…

OpenCode ★ 165k

Terminal-first AI coding agent with multi-model routing, native desktop app, and a typed .opencode/ configuration system for…

OpenHands ★ 75k

Open-source AI software development platform (open-source Devin alternative) with Docker sandbox isolation, 77.6% SWE-bench…

DeerFlow ★ 70k

Long-horizon superagent that researches, codes, and creates by orchestrating parallel sub-agents with isolated contexts in Docker…

oh-my-openagent (omo) ★ 60k

Multi-provider AI agent orchestration for OpenCode: escape vendor lock-in by routing Sisyphus (Claude/Kimi/GLM) and Hephaestus…