gpt-engineer

gpt-engineer · gpt-engineer-org/gpt-engineer · ★ 55k · last commit 2025-05-14

Primitive shape

No installable primitives

Summary

gpt-engineer — Summary

gpt-engineer is the original "write a full app from a prompt file" CLI tool, created by Anton Osika in 2023 as an experiment in LLM-driven code generation. It is now archived (last push May 2025) — the README itself says: "If you are looking for a well maintained hackable CLI for – check out aider." It pioneered the pattern of using a prompt text file as input, running LLM through multiple stages (clarify → generate → execute), and producing a complete codebase. The preprompts/ directory is the customization hook — organizations override it with custom instructions, making the AI "remember things between projects." It is the precursor to Lovable.dev (gptengineer.app).

gpt-engineer is historically significant as one of the first "write a whole app from scratch" agents, but it is explicitly obsolete — the maintainers direct users to aider. It differs from all seed frameworks in that it generates complete projects from scratch, not incremental changes; it has no concept of an existing codebase or iterative editing.

Overview

gpt-engineer — Overview

Origin

Created by Anton Osika (antonosika on GitHub) in 2023. The project went viral as one of the first demonstrations that GPT-4 could generate entire applications from a prose specification. It has 55,000+ stars despite being archived.

Current Status

ARCHIVED — the repo is marked archived as of 2025. From the README:

"The OG code generation experimentation platform! If you are looking for the evolution that is an opinionated, managed service – check out gptengineer.app. If you are looking for a well maintained hackable CLI for – check out aider."

The project explicitly directs users to two successors:

Lovable.dev (gptengineer.app) — commercial web app generation service
Aider — maintained CLI for iterative AI coding

Philosophy

gpt-engineer's original philosophy:

Clarify before generating: If the spec is ambiguous, ask ONE clarifying question before proceeding
Complete implementation: "No placeholders." The generated code should be fully functional.
Entrypoint-first: Start with the entrypoint file, then work through imports
Cross-file compatibility: All files must be compatible with each other

From the generate preprompt:

"Think step by step and reason yourself to the correct decisions to make sure we get it right." "Please note that the code should be fully functional. No placeholders."

Key Innovation

gpt-engineer introduced the preprompts/ customization pattern: a directory of plain text files that define the agent's behavior. Organizations can override this directory (via --use-custom-preprompts) to customize what the agent "remembers" about their preferences and technology stack. This pattern influenced many later frameworks.

Mission

From README:

"The gpt-engineer community mission is to maintain tools that coding agent builders can use and facilitate collaboration in the open source community."

Architecture

gpt-engineer — Architecture

Distribution

ARCHIVED — but was available as:

pip install gpt-engineer
# or
poetry install  # from source

CLI Binaries

From pyproject.toml:

gpte — main code generation command
bench — benchmarking interface

Directory Layout

gpt_engineer/
  core/
    ai.py            # LLM client abstraction
    base_agent.py    # Base agent class
    preprompts_holder.py  # Manages preprompts directory
    default/         # Default agent implementations
    files_dict.py    # File collection type
    chat_to_files.py # Parse LLM output to files
    linting.py       # Lint integration
    git.py           # Git integration
  applications/
    cli.py           # CLI entrypoint
  benchmark/         # APPS and MBPP benchmark support
  preprompts/        # Customizable prompt files
    clarify          # Clarification prompt
    generate         # Main generation prompt
    improve          # Improvement prompt
    philosophy       # Coding philosophy/preferences
    file_format      # File format specification
    file_format_diff # Diff format specification
    entrypoint       # Entrypoint generation prompt
    file_format_fix  # Fix format specification
projects/            # Example project directories (user's prompts go here)

Usage Pattern

# Create project directory with prompt file
mkdir my-project && echo "Build a Flask web app with user auth" > my-project/prompt

# Generate code
gpte my-project

# Improve existing code
gpte my-project -i

Required Runtime

Python 3.10-3.12
OpenAI API key (or Anthropic, or local model with extra setup)

Target AI Tools

OpenAI GPT-4 (primary, recommended)
Anthropic Claude (supported)
Azure OpenAI API
Local models (via additional setup, documented separately)

Primary: OpenAI GPT-4

Components

gpt-engineer — Components

CLI Commands

Command	Purpose
`gpte <project_dir>`	Generate new code from `prompt` file
`gpte <project_dir> -i`	Improve existing code using `prompt` instructions
`gpte <project_dir> --use-custom-preprompts`	Use custom preprompts from project dir
`gpte <project_dir> --image_directory <dir> <model>`	Vision input for architecture diagrams
`bench`	Benchmark agents against APPS/MBPP datasets

Preprompts (Core Customization Point)

File	Purpose
`clarify`	"Determine if anything needs to be clarified"
`generate`	Main code generation instructions (step-by-step, FIM-style)
`improve`	Code improvement instructions
`philosophy`	Coding preferences (language conventions, file structure)
`file_format`	How to format generated file output
`file_format_diff`	How to format diff output
`file_format_fix`	How to format fix output
`entrypoint`	How to generate the entrypoint file

Core Classes

Class	Purpose
`AI`	LLM client abstraction (OpenAI, Anthropic, Azure)
`BaseAgent`	Base class for agent implementations
`PrepromptsHolder`	Manages the preprompts directory, loads custom overrides
`FilesDict`	Dictionary of filename → content for generated files
`chat_to_files.py`	Parses LLM markdown output into file dictionaries
`Linting`	Optional post-generation lint step

Benchmarking

The bench CLI and benchmark/ module support:

APPS benchmark (competitive programming)
MBPP benchmark (basic programming problems)
Template for custom agent benchmarking

Community contributed template: gpt-engineer-org/gpte-bench-template

Custom Agents

gpt-engineer has a BaseAgent abstraction that allows building custom agent implementations. The benchmark tooling is designed around this abstraction.

Prompts

gpt-engineer — Prompts

Prompt 1: Clarification Prompt

Source: gpt_engineer/preprompts/clarify

Technique: Minimal clarification gate — asks exactly ONE question if ambiguous. The "Nothing to clarify" escape prevents unnecessary questions.

Given some instructions, determine if anything needs to be clarified, do not carry them out.
You can make reasonable assumptions, but if you are unsure, ask a single clarification question.
Otherwise state: "Nothing to clarify"

This is a masterclass in minimal prompting — 3 lines that implement a gating mechanism. The key constraint is "a single clarification question" — preventing the common failure of agents that ask 10 questions before starting.

Prompt 2: Main Generation Prompt

Source: gpt_engineer/preprompts/generate

Technique: Step-by-step generation with completeness mandate and explicit failure mode prevention ("No placeholders")

Think step by step and reason yourself to the correct decisions to make sure we get it right.
First lay out the names of the core classes, functions, methods that will be necessary, As well as a quick comment on their purpose.

FILE_FORMAT

You will start with the "entrypoint" file, then go to the ones that are imported by that file, and so on.
Please note that the code should be fully functional. No placeholders.

Follow a language and framework appropriate best practice file naming convention.
Make sure that files contain all imports, types etc.  The code should be fully functional. Make sure that code in different files are compatible with each other.
Ensure to implement all code, if you are unsure, write a plausible implementation.
Include module dependency or package manager dependency definition file.
Before you finish, double check that all parts of the architecture is present in the files.

When you are done, write finish with "this concludes a fully working implementation".

Note the FILE_FORMAT placeholder — this is dynamically replaced with the file format specification from gpt_engineer/preprompts/file_format.

The "this concludes a fully working implementation" ending is a completion signal that the agent must produce.

Prompt 3: Philosophy / Coding Preferences

Source: gpt_engineer/preprompts/philosophy

Technique: Persistent preferences injection — the "remember things between projects" mechanism.

Almost always put different classes in different files.
Always use the programming language the user asks for.
For Python, you always create an appropriate requirements.txt file.
For NodeJS, you always create an appropriate package.json file.
Always add a comment briefly describing the purpose of the function definition.
Add comments explaining very complex bits of logic.
Always follow the best practices for the requested languages for folder/file structure and how to package the project.

Python toolbelt preferences:
- pytest
- dataclasses

Organizations override this file with their own preferences.

Prompting Techniques Used

Staged generation: clarify → plan → generate → verify — explicit phases prevent jumping to code
Minimal clarification gate: "ask at most ONE question or say Nothing to clarify" — prevents paralysis
Completeness mandate: "No placeholders" + "fully functional" enforced in prompt, not just documentation
Entrypoint-first ordering: generation follows import graph, not arbitrary file order
Completion signal: "this concludes a fully working implementation" — a termination signal the runner can detect
Preferences file: Persistent coding philosophy/preferences separate from per-request instructions

Uniqueness

gpt-engineer — Uniqueness and Positioning

Differs from Seeds

gpt-engineer is historically unique as the original "write a whole codebase from a prompt file" tool. It differs from all seed frameworks (which work with existing codebases) in that it generates complete projects from scratch. All seed frameworks assume you have an existing codebase to modify; gpt-engineer starts from a prompt file and writes everything. The preprompts customization pattern (override a directory of plain text files to encode organizational preferences) influenced later frameworks — agent-os's standards/ directory and BMAD-method's persona files are distant descendants. However, gpt-engineer is now ARCHIVED and the maintainers direct users to aider (iterative editing) and Lovable.dev (commercial equivalent).

Historical Significance

First viral LLM coding agent (55,000+ stars as an archived repo): demonstrated that GPT-4 could generate entire applications from prose
Preprompts pattern: the preprompts/ directory as a customization mechanism influenced many later frameworks
Clarify-then-generate: the single-clarifying-question gate is widely copied
"This concludes a fully working implementation": completion signal pattern influenced later prompting techniques
Precursor to Lovable.dev: the commercial evolution of gpt-engineer is now a multi-hundred-million dollar product

Observable Failure Modes (as documented)

ARCHIVED: explicitly unmaintained; do not use for production
Context limit failures: generating large apps exceeded GPT-4's context window
Hallucinated dependencies: generated requirements.txt often included packages that didn't exist or had wrong versions
No iterative editing: every re-run regenerated from scratch; no surgical edits
Single clarification question not enough: complex specs needed multiple rounds of clarification
Generated code quality degrades with app size: works well for simple apps, poorly for complex ones

Workflow

gpt-engineer — Workflow

New Project Workflow

Create project directory with a prompt file
Run gpte <project_dir>
Clarification phase — LLM reads prompt, asks at most ONE clarifying question if ambiguous
Generation phase — LLM generates complete codebase:
- Lists core classes, functions, methods
- Starts with entrypoint file
- Works through imports
- Generates all files
Execution phase — optionally executes the entrypoint to verify
Output — complete codebase in project directory

Improve Workflow

Existing codebase in project directory
Update prompt file with improvement instructions
Run gpte <project_dir> -i
LLM reads existing code + new instructions
Generates diff — only the necessary changes
Output — modified files

Vision Workflow

gpte projects/example-vision gpt-4-vision-preview \
  --prompt_file prompt/text \
  --image_directory prompt/images -i

Adds architecture diagrams or screenshots as visual context.

Benchmark Workflow

bench --config my-agent-config.yaml --dataset apps

Runs agent against APPS or MBPP benchmark datasets.

Phases + Artifacts Table

Phase	Artifact
Clarification	Clarifying question (optional)
Core planning	Class/function/method list (in response)
Code generation	Complete files in project directory
Entrypoint generation	Runnable entrypoint script

Approval Gates

Gate	Type	Notes
Clarification answer	freetext-clarify	User answers the one clarifying question
Execution consent	yes-no	User asked before running generated code

Memory Context

gpt-engineer — Memory and Context

State Storage

State	Storage	Scope
Project prompt	`<project_dir>/prompt`	Project
Generated files	`<project_dir>/`	Project
Conversation history	In-memory during session	Session only
Custom preprompts	`<project_dir>/preprompts/` (with `--use-custom-preprompts`)	Project

Preprompts as Memory

The preprompts/ directory IS the memory mechanism — it's how "the agent remembers things between projects." Organizations maintain a custom preprompts/ directory that encodes:

Preferred languages and frameworks
File structure conventions
Tool preferences (pytest, dataclasses, etc.)
Code style guidelines

This is passed via --use-custom-preprompts, pointing to the custom directory.

No Cross-Session State

gpt-engineer has no database, no vector store, no persistent session history beyond the files in the project directory. Each run is independent except for the shared preprompts.

Context Window

The entire prompt + generated code is in a single LLM conversation. For large codebases, this can exceed context limits. gpt-engineer does not implement context compaction.

Improve Mode Memory

In improve mode (-i), gpt-engineer reads the existing files in the project directory as context. The existing codebase IS the memory.

Orchestration

gpt-engineer — Orchestration

Multi-Agent

No. gpt-engineer is a single LLM call chain (clarify → generate → execute). No subagent spawning.

Orchestration Pattern

Sequential: clarification → planning → generation → execution. Fixed pipeline, no branching.

Isolation

None. Files written directly to the project directory.

Multi-Model

No. Single model per run. Configurable via environment variable or CLI argument.

Execution Mode

One-shot: single run per gpte invocation. No interactive loop.

Crash Recovery

None. If generation fails, re-run from scratch.

Context Compaction

None. Single conversation context throughout generation.

Consensus Mechanism

None.

Prompt Chaining

Yes: the clarification response feeds into the generation prompt. The generation output (class/function list) feeds into actual code generation. These are implicit — not named stages with explicit outputs.

Ui Cli Surface

gpt-engineer — UI and CLI Surface

CLI Binaries

gpte — main code generation command
bench — benchmark runner

No UI

gpt-engineer has no web dashboard, no IDE extension, no TUI. Pure CLI with stdout output.

Key Flags

gpte <project_dir>               # Generate new project
gpte <project_dir> -i            # Improve existing project
gpte <project_dir> --use-custom-preprompts  # Use custom preprompts
gpte <project_dir> --image_directory <dir> <model>  # Vision input

Observability

stdout only
Generated files in project directory

GitHub Codespaces

The repo offers a "Open in GitHub Codespaces" badge — this was the intended browser-based experience for users who didn't want to install locally.

Related frameworks

same archetype · same primary tool · same memory type

Claude-Flow / Ruflo ★ 55k

A6 Multi-agent orchestrator

Eliminates single-agent context limits and sequential bottlenecks by orchestrating fault-tolerant swarms of specialized AI agents…

Hermes Agent (NousResearch) ★ 168k

A6 Multi-agent orchestrator

Self-improving personal AI agent with closed learning loop, 7 terminal backends, and messaging gateway — not tied to any AI…

OpenCode ★ 165k

A6 Multi-agent orchestrator

Terminal-first AI coding agent with multi-model routing, native desktop app, and a typed .opencode/ configuration system for…

OpenHands ★ 75k

A6 Multi-agent orchestrator

Open-source AI software development platform (open-source Devin alternative) with Docker sandbox isolation, 77.6% SWE-bench…

DeerFlow ★ 70k

A6 Multi-agent orchestrator

Long-horizon superagent that researches, codes, and creates by orchestrating parallel sub-agents with isolated contexts in Docker…

oh-my-openagent (omo) ★ 60k

A6 Multi-agent orchestrator

Multi-provider AI agent orchestration for OpenCode: escape vendor lock-in by routing Sisyphus (Claude/Kimi/GLM) and Hephaestus…

Distribution

Type: cli-tool
License: MIT
Install: one-liner
Version: main (archived, last update 2025-05-14)

Surfaces

CLI binary: gpte
CLI subcmds: 0
Local UI: No
Tech stack: none

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 8

Workflow

Phases: 4
Approval gates: 2
Spec format: markdown
Spec storage: flat-files
Delta or full: whole-file

Orchestration

Multi-agent: No
Pattern: sequential
Max concurrent: 1
Isolation: none
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: No
BYOK: Yes
Modal: text+vision

Execution

Mode: one-shot
Crash recovery: No
Compaction: No
Session handoff: No
Streaming: Yes

Memory

Type: file-based
Persistence: project
Search: none
State files: 3 files

Quality

TDD: No
TDD mechanism: none
Validators: 1
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: No

Tools

Primary: openai-api
Targets: 3
Portability: medium

Signals

Stars: 55k
Last commit: 2025-05-14
Contributors: 30
Maintainer: archived
Quality score: 0.9/10