agentflow (berabuddies)

agentflow-berabuddies · berabuddies/agentflow · ★ 1.3k · last commit 2026-05-23

Python DSL for building multi-agent DAG pipelines with >> operator, Jinja2 prompt chaining, fanout/merge parallelism, iterative LLM-as-judge cycles, and native EC2/ECS/SSH remote execution.

Best when: Agent orchestration should be expressed as code (DAG with >> operator), not config files or markdown, enabling version control, testing, and programmatic co…

Skip if: you prefer configuring pipelines in YAML/JSON rather than code; you run single-model pipelines (agentflow's strength is using the right model for each node)

vs seeds

Closest: taskmaster-ai (explicit task decomposition with dependencies), but agentflow is programmatic, multi-model, and remote-execution-capable…

Primitive shape: 5 total

Skills: 1 · Subagents: 4

Summary

agentflow (berabuddies) — Summary

agentflow is a Python DSL and CLI tool for building and running multi-agent pipelines as explicit directed acyclic graphs (DAGs). Nodes are codex(), claude(), kimi(), or pi() agents connected with the >> operator; fanout() expands a node into parallel copies and merge() aggregates their results in batches. The >> syntax, combined with Jinja2 template interpolation of prior node outputs ({{ nodes.plan.output }}), makes prompt chaining explicit: one node's output literally becomes part of the next node's prompt. Supported execution targets are Codex, Claude, Kimi, and Pi (a router to 10+ providers, including local Ollama/LMStudio), which gives it the widest provider coverage of any orchestration DSL in this batch. Remote execution is built in: nodes can declare target={"kind": "ec2"} or target={"kind": "ssh"} to run on remote machines without extra infrastructure (AWS credentials or SSH access are still required). agentflow also auto-installs a skill for Codex and Claude Code, so pipeline creation can itself be delegated to an agent. Compared to seeds: it is closest to taskmaster-ai in explicit DAG task decomposition, but agentflow is a general-purpose Python pipeline DSL (not task management), builds graphs programmatically rather than from AI-generated task trees, and uniquely supports cross-model, cross-machine execution.

Overview

agentflow (berabuddies) — Overview

Origin

Created by shouc (GitHub: berabuddies/agentflow, also at shouc/agentflow per install script). Python package. 1,255 stars. License: MIT.

Philosophy

From the README description:

"Orchestrate thousands of agents and harnesses as a graph programmatically"

The framework is built around explicit graph construction: you write a Python script defining a DAG, then run it. The mental model is a data pipeline (nodes = agents, edges = data flow) rather than a team hierarchy.

Key design decisions:

Pipeline-as-code — graphs are Python files, version-controllable, reusable
Multi-model explicit — codex(), claude(), kimi(), pi() are distinct node types; mixing models in a pipeline is a first-class use case
Jinja2 prompt chaining — {{ nodes.plan.output }} interpolates previous outputs into prompts, making the data flow explicit
Remote execution built-in — EC2, ECS Fargate, SSH targets without extra infrastructure
Iterative cycles — review.on_failure >> write creates cycles for LLM-as-judge loops
Auto-installs its skill — curl ... | bash installs both the CLI and a Codex/Claude skill so the agent itself can build agentflow pipelines

Usage from README

from agentflow import Graph, codex, claude

with Graph("my-pipeline", concurrency=3) as g:
    plan = codex(task_id="plan", prompt="Inspect the repo and plan the work.", tools="read_only")
    impl = claude(task_id="impl", prompt="Implement the plan:\n{{ nodes.plan.output }}", tools="read_write")
    review = codex(task_id="review", prompt="Review:\n{{ nodes.impl.output }}")
    plan >> impl >> review

This defines a three-node pipeline: Codex plans → Claude implements (with Codex's plan interpolated into its prompt) → Codex reviews.
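The >> chaining is plain Python operator overloading. The sketch below shows one way such a DSL can record edges; MiniGraph, Node, and FailureEdge are hypothetical names, not agentflow's dsl.py. __rshift__ stores a dependency and returns its right operand, so a >> b >> c (which Python evaluates as (a >> b) >> c) chains left to right, while on_failure returns a proxy whose >> stores a back-edge separately from the forward edges. It uses the standard-library graphlib module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+
from typing import Dict, List, Set, Tuple


class MiniGraph:
    """Hypothetical edge registry; illustrates the idea, not agentflow's Graph."""

    def __init__(self) -> None:
        self.deps: Dict[str, Set[str]] = {}          # task_id -> upstream task_ids
        self.back_edges: List[Tuple[str, str]] = []  # (failing node, retry target)

    def node(self, task_id: str) -> "Node":
        self.deps.setdefault(task_id, set())
        return Node(self, task_id)

    def order(self) -> List[str]:
        # Only forward edges are sorted; back-edges would otherwise form a cycle.
        return list(TopologicalSorter(self.deps).static_order())


class Node:
    def __init__(self, graph: MiniGraph, task_id: str) -> None:
        self.graph = graph
        self.task_id = task_id

    def __rshift__(self, other: "Node") -> "Node":
        # `self >> other` means other depends on self. Returning `other` lets
        # `a >> b >> c`, evaluated as `(a >> b) >> c`, chain left to right.
        self.graph.deps[other.task_id].add(self.task_id)
        return other

    @property
    def on_failure(self) -> "FailureEdge":
        return FailureEdge(self)


class FailureEdge:
    def __init__(self, source: Node) -> None:
        self.source = source

    def __rshift__(self, target: Node) -> Node:
        # Record a retry edge instead of a dependency.
        self.source.graph.back_edges.append((self.source.task_id, target.task_id))
        return target


g = MiniGraph()
plan, impl, review = g.node("plan"), g.node("impl"), g.node("review")
plan >> impl >> review
print(g.order())  # ['plan', 'impl', 'review']
```

Keeping back-edges out of the dependency map is a choice of this sketch: an ordinary topological sort can schedule the first pass, and a runner only revisits the retry target when a node fails.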

Architecture

agentflow (berabuddies) — Architecture

Distribution

Python package: agentflow (pip installable)
Install: curl -fsSL https://raw.githubusercontent.com/shouc/agentflow/master/install.sh | bash (also installs Codex and Claude Code skills)
Or manual: python3 -m venv .venv && source .venv/bin/activate && pip install -e '.[dev]'
CLI binary: agentflow
Required runtime: Python 3.x

Directory structure (package)

agentflow/
  __init__.py       # Public API exports (Graph, codex, claude, pi, fanout, merge)
  dsl.py            # Python DSL (Graph context manager, NodeBuilder, >> operator)
  agents/
    base.py         # AgentAdapter base class
    codex.py        # Codex CLI adapter
    claude.py       # Claude Code adapter
    kimi.py         # Kimi CLI adapter
    pi.py           # Pi routing adapter (10+ providers)
    registry.py     # Agent type registry
  orchestrator.py   # Pipeline execution engine
  cli.py            # agentflow CLI entry point
  graph_optimizer.py # DAG optimization
  loader.py         # Pipeline file loader
  runners/          # Execution backends
  cloud/            # EC2, ECS, SSH remote execution
  skills.py         # Auto-install skill for Codex/Claude

skills/             # Codex/Claude Code skill files (auto-installed)

Config

No global config file. Per-graph configuration via Graph() constructor arguments:

Graph("name", concurrency=4, max_iterations=10, use_worktree=False, fail_fast=False)

Target AI tools

Codex, Claude Code, Kimi, Pi (router to Anthropic/OpenAI/Groq/Cerebras/xAI/DeepSeek/Gemini/OpenRouter/Bedrock/Ollama/LMStudio). Cross-tool portability: high.

Components

agentflow (berabuddies) — Components

Python DSL primitives

Primitive	Type	Purpose
`Graph(name, ...)`	Context manager	Define a pipeline DAG
`codex(task_id, prompt, ...)`	NodeBuilder	Codex CLI agent node
`claude(task_id, prompt, ...)`	NodeBuilder	Claude Code agent node
`kimi(task_id, prompt, ...)`	NodeBuilder	Kimi CLI agent node
`pi(task_id, prompt, model=..., ...)`	NodeBuilder	Pi router node (multi-provider)
`>>` operator	Dependency edge	`plan >> impl` means impl depends on plan
`fanout(node, source)`	Expansion	Create N parallel copies of node
`merge(node, source, size=N)`	Aggregation	Batch merge N fanout results
`node.on_failure >> target`	Back-edge	Retry cycle: on failure, go back to target

Fanout types

fanout(node, int) — N identical copies
fanout(node, list) — one copy per item in list
fanout(node, dict) — Cartesian product over axes
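A minimal sketch of the three fanout expansion rules, plus the batching that merge(node, source, size=N) describes. The expand() and batch() helpers and the per-copy keys (index, item, and the dict's axis names) are hypothetical; agentflow's actual template variables for fanout copies are not documented here.

```python
import itertools
from collections.abc import Mapping, Sequence
from typing import Any, Dict, List, Union

FanoutSource = Union[int, Sequence[Any], Mapping[str, Sequence[Any]]]


def expand(source: FanoutSource) -> List[Dict[str, Any]]:
    """Return one parameter dict per fanout copy (hypothetical representation)."""
    if isinstance(source, (bool, str)):
        raise TypeError("fanout source must be an int, a list, or a dict of lists")
    if isinstance(source, int):
        return [{"index": i} for i in range(source)]          # N identical copies
    if isinstance(source, Mapping):
        axes = list(source)
        combos = itertools.product(*(source[axis] for axis in axes))
        return [dict(zip(axes, combo)) for combo in combos]   # Cartesian product
    return [{"item": item} for item in source]                # one copy per item


def batch(outputs: Sequence[str], size: int) -> List[List[str]]:
    """Group fanout outputs into chunks of `size`, as a merge(..., size=N) step might."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(outputs[i:i + size]) for i in range(0, len(outputs), size)]


print(expand({"model": ["a", "b"], "lang": ["py", "go"]}))
print(batch(["o1", "o2", "o3", "o4", "o5"], size=2))
```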

CLI commands

agentflow run pipeline.py             # Run a pipeline file
agentflow run pipeline.py --output summary  # Run + show summary output

Agent adapters (4)

CodexAdapter — generates TOML config, calls codex app-server
ClaudeAdapter — wraps Claude Code CLI
KimiAdapter — wraps Kimi CLI
PiAdapter — routes to any provider via Pi's OpenAI-compatible/Anthropic-compatible wire API

Skill (auto-installed)

The install script deploys an agentflow skill to Codex and Claude Code that allows natural-language pipeline creation:

codex "Use agentflow to fan out 10 codex agents, each telling a unique joke..."

Remote execution targets

ec2 — auto-discovers AMI, key pair, VPC; zero-config
ecs — ECS Fargate; auto-discovers VPC, builds agent image
ssh — arbitrary SSH host
shared — shared instance across multiple pipeline nodes

Prompts

agentflow (berabuddies) — Prompt Excerpts

Excerpt 1: Iterative cycle pipeline (README example)

Source: README.md

from agentflow import Graph, claude, codex

with Graph("iterative-impl", max_iterations=5) as g:
    write = codex(
        task_id="write",
        prompt="Write a Python email validator.\n{% if nodes.review.output %}Fix: {{ nodes.review.output }}{% endif %}",
        tools="read_write",
    )
    review = claude(
        task_id="review",
        prompt="Review:\n{{ nodes.write.output }}\nIf complete, say LGTM. Otherwise list issues.",
        success_criteria=[{"kind": "output_contains", "value": "LGTM"}],
    )
    write >> review
    review.on_failure >> write  # loop until LGTM or max_iterations

Prompting technique: LLM-as-judge pattern with Jinja2 conditional repair injection. The write node's prompt includes {% if nodes.review.output %}Fix: ...{% endif %} so the first run writes from scratch, but subsequent iterations inject the reviewer's criticism as repair context. The success_criteria gate and on_failure back-edge create an automated loop without human intervention.
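The control flow of this loop (write >> review plus review.on_failure >> write) can be modelled without any agents. This sketch uses stub write/review callables and reduces success_criteria to a substring check; run_judge_loop and output_contains are hypothetical names, not agentflow internals. The returned tuple reports the last draft, how many passes ran, and whether the judge passed it.

```python
from typing import Callable, Optional, Tuple


def output_contains(output: str, value: str) -> bool:
    """Stand-in for success_criteria=[{"kind": "output_contains", "value": ...}]."""
    return value in output


def run_judge_loop(
    write: Callable[[Optional[str]], str],
    review: Callable[[str], str],
    max_iterations: int = 5,
    pass_token: str = "LGTM",
) -> Tuple[str, int, bool]:
    """Return (last draft, iterations used, passed)."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be >= 1")
    feedback: Optional[str] = None  # no review yet, like the {% if %} guard on pass 1
    for iteration in range(1, max_iterations + 1):
        draft = write(feedback)
        verdict = review(draft)
        if output_contains(verdict, pass_token):
            return draft, iteration, True
        feedback = verdict          # on_failure back-edge: feed the critique back
    return draft, max_iterations, False


# Stub agents so the loop runs without any model.
def fake_write(feedback: Optional[str]) -> str:
    return "validator v2" if feedback else "validator v1"


def fake_review(draft: str) -> str:
    return "LGTM" if draft.endswith("v2") else "Missing domain check."


print(run_judge_loop(fake_write, fake_review))  # ('validator v2', 2, True)
```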

Excerpt 2: Cross-model pipeline (README example)

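from agentflow import Graph, pi

with Graph("mixed") as g:
    # Note: the review prompt interpolates nodes.impl.output, so the full graph
    # needs a node with task_id="impl" (not shown in this excerpt).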
    # External: Claude via Pi
    review = pi(
        task_id="review",
        prompt="Review {{ nodes.impl.output }}",
        model="anthropic/claude-sonnet-4-6:high",
    )

    # Local: LMStudio (add provider once in ~/.pi/agent/models.json)
    scan = pi(
        task_id="scan",
        prompt="Scan the repo for TODOs.",
        model="lmstudio/qwen/qwen3.6-27b",
        tools="read_only",
    )

Prompting technique: Provider-routing via model string format provider/model. The Pi adapter parses anthropic/claude-sonnet-4-6:high and lmstudio/qwen/... to route to different endpoints. The same Jinja2 template system ({{ nodes.impl.output }}) works across providers, making cross-model chaining transparent to the prompt author.
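A minimal parser for the provider/model strings above, assuming everything before the first "/" names the provider and an optional suffix after the last ":" (such as :high) is a separate setting. ModelRef and parse_model_ref are hypothetical, and Pi's real parsing rules may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRef:
    provider: str                # e.g. "anthropic", "lmstudio"
    model: str                   # everything after the first "/"
    level: Optional[str] = None  # assumption: suffix after the last ":" (e.g. "high")


def parse_model_ref(ref: str) -> ModelRef:
    provider, sep, rest = ref.partition("/")
    if not sep or not provider or not rest:
        raise ValueError(f"expected 'provider/model[:level]', got {ref!r}")
    model, colon, level = rest.rpartition(":")
    if not colon:
        return ModelRef(provider, rest)  # no suffix present
    return ModelRef(provider, model, level)


print(parse_model_ref("anthropic/claude-sonnet-4-6:high"))
print(parse_model_ref("lmstudio/qwen/qwen3.6-27b"))
```

This sketch would misread model names that themselves contain ":", which is common in Ollama tags such as llama3:8b; a real router needs provider-specific rules.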

Uniqueness

agentflow (berabuddies) — Uniqueness & Positioning

Differs from seeds

No direct match among the 11 seeds. agentflow is the only programmatic DAG DSL in the corpus. All seeds use markdown files, YAML configs, or natural language prompts to define workflows. agentflow uses Python code with operator overloading (>>).

Key differentiators:

Python DSL with >> operator: graph construction is code, not config, so pipelines are versioned, testable, and composable via Python modules. No other framework in the corpus takes this approach.
Fanout/merge semantics — fanout(node, list), fanout(node, int), fanout(node, dict) (Cartesian product) provide structured parallel expansion. The {% for r in fanouts.review.nodes %} Jinja2 pattern for aggregating results is unique in the corpus.
Cross-model + cross-machine in same pipeline — mixing codex(), claude(), pi() nodes targeting different providers AND different machines (EC2/ECS/SSH) in a single with Graph(): block. No other framework treats multi-model + remote execution as equally first-class.
Iterative cycle with on_failure >> — back-edges (review.on_failure >> write) create structured LLM-as-judge loops with configurable max_iterations. The success_criteria field provides automated exit conditions without human intervention.
Self-bootstrapping skill — the install script deploys a skill so the agent can build and run agentflow pipelines, making the framework meta-capable (agents building agent pipelines).

Observable failure modes

No persistent memory — pipelines are stateless; learning across runs requires external storage
No human approval gates — fully automated by default
The Python DSL requires programming knowledge; less accessible to non-developers
Remote execution requires AWS credentials or SSH access
No web dashboard for pipeline monitoring

Competitive positioning

agentflow targets developers who want to express multi-agent workflows as code, mix models in a single pipeline, and run agents on remote infrastructure. It fills the CI/CD-pipeline niche for AI agent orchestration — closer to Airflow or Prefect for AI agents than to Claude Code plugins.

Workflow

agentflow (berabuddies) — Workflow

Pipeline definition workflow

# 1. Define the graph in pipeline.py
from agentflow import Graph, claude, codex

with Graph("name", concurrency=2) as g:
    node_a = codex(task_id="a", prompt="...")
    node_b = claude(task_id="b", prompt="{{ nodes.a.output }}")
    node_a >> node_b

# 2. Run it from the shell
# $ agentflow run pipeline.py

Standard linear pipeline

Sequential nodes: plan >> impl >> review
Each node receives prior node's output via {{ nodes.<id>.output }}

Parallel fanout pipeline

scan → fanout(review, [file1, file2, file3]) → merge(summary)
All review copies run in parallel (concurrency bound by Graph.concurrency)
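The concurrency bound can be modelled with an asyncio.Semaphore: every fanout copy is scheduled at once, but only `concurrency` of them hold a slot at any moment, and asyncio.gather returns results in input order. run_fanout, demo, and fake_review are hypothetical stand-ins (no agents are called), and the final newline join stands in for merge(summary).

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, Tuple


async def run_fanout(
    items: Sequence[str],
    worker: Callable[[str], Awaitable[str]],
    concurrency: int,
) -> List[str]:
    """Run one worker per item, at most `concurrency` at once; keep input order."""
    sem = asyncio.Semaphore(concurrency)

    async def bounded(item: str) -> str:
        async with sem:
            return await worker(item)

    return list(await asyncio.gather(*(bounded(item) for item in items)))


async def demo(concurrency: int = 2) -> Tuple[str, int]:
    state = {"running": 0, "peak": 0}

    async def fake_review(path: str) -> str:  # stand-in for a review agent
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
        await asyncio.sleep(0.01)
        state["running"] -= 1
        return f"reviewed {path}"

    outputs = await run_fanout(["file1", "file2", "file3"], fake_review, concurrency)
    return "\n".join(outputs), state["peak"]  # join = stand-in for merge(summary)


summary, peak = asyncio.run(demo())
print(summary)
print("peak parallel reviews:", peak)
```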

Iterative cycle (LLM-as-judge)

write → review (success_criteria: output_contains "LGTM")
review.on_failure >> write   # loop until LGTM or max_iterations

Phase artifacts

Phase	Artifact
Graph definition	Python `.py` pipeline file
Node execution	stdout/stderr output captured per node
Fanout	N parallel node outputs
Pipeline completion	aggregated summary output

Approval gates

None by default. The success_criteria field on nodes provides automated pass/fail gating (e.g., output_contains "LGTM" for iterative loops).

Worktree support

Graph(use_worktree=True) — each node operates in a git worktree for isolation.

Memory Context

agentflow (berabuddies) — Memory & Context

Node output propagation

State flows via Jinja2 template interpolation between nodes. Each node's output is accessible as {{ nodes.<task_id>.output }} in subsequent nodes' prompts.

For fanout nodes, outputs are accessible as:

{% for r in fanouts.review.nodes %}{{ r.output }}\n{% endfor %}

Scratchboard

Graph(scratchboard=True) — enables a shared scratchboard file that agents can write to and read from during execution. Provides a lightweight shared state mechanism.

Persistence

session only — node outputs live in the running process during pipeline execution. No cross-session persistence is documented.

Memory (per node)

Each agent (Codex, Claude, Kimi, Pi) has its own context window within its execution. agentflow does not manage or compress agent contexts; that is delegated to the underlying CLI tool.

Worktree isolation

When use_worktree=True, each node operates in a separate git worktree, providing filesystem isolation between parallel nodes.
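Per-node worktrees map onto the standard `git worktree add -b <branch> <path>` command. This sketch only builds the command line (using git's `-C` option to point at the repository); worktree_add_argv, the directory layout, and the branch naming are hypothetical, not agentflow's scheme.

```python
from pathlib import Path
from typing import List


def worktree_add_argv(repo: Path, run_id: str, task_id: str) -> List[str]:
    """Build a git command that gives one node its own checkout on its own branch."""
    # Hypothetical layout: <repo>-wt/<run_id>/<task_id>, next to the main checkout.
    path = repo.parent / f"{repo.name}-wt" / run_id / task_id
    branch = f"agentflow/{run_id}/{task_id}"
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]


print(worktree_add_argv(Path("/repo/app"), "run1", "review"))
```

Each worktree can be removed afterwards with `git worktree remove <path>`.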

Remote execution state

For remote targets (EC2/ECS/SSH), the graph executor ships the node's prompt and config to the remote host, executes the agent there, and captures stdout/stderr back to the local orchestrator.
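For an SSH target, much of the work is quoting: ssh joins the arguments after the destination into one string for the remote shell, so this sketch quotes the remote command with shlex.join (Python 3.8+) and captures stdout/stderr with subprocess.run. ssh_argv, run_remote, and the agent-cli command are hypothetical placeholders; BatchMode=yes is a standard OpenSSH option that makes ssh fail instead of prompting for a password.

```python
import shlex
import subprocess
from typing import List


def ssh_argv(destination: str, remote_argv: List[str]) -> List[str]:
    """Build an ssh command that runs remote_argv on destination."""
    # Quote the remote command ourselves so each argument survives the remote shell.
    return ["ssh", "-o", "BatchMode=yes", destination, shlex.join(remote_argv)]


def run_remote(destination: str, remote_argv: List[str]) -> subprocess.CompletedProcess:
    """Run the command remotely and capture its stdout/stderr locally as text."""
    return subprocess.run(ssh_argv(destination, remote_argv), capture_output=True, text=True)


# "agent-cli" is a placeholder for whatever agent binary exists on the host.
print(ssh_argv("dev@build-box", ["agent-cli", "--prompt", "Review: it's fine"]))
```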

No persistent memory

agentflow is a pipeline runner — not a memory system. There is no vector DB, SQLite, or cross-pipeline learning. Each pipeline run is independent.

Orchestration

agentflow (berabuddies) — Orchestration

Pattern

task-decomposition-tree / parallel-fan-out — graphs can be linear (sequential), parallel (fanout), or iterative (back-edges). The >> DAG operator makes the pattern explicit and user-defined rather than framework-imposed.

Subagent definition format

code-class — NodeBuilder objects. Each codex(task_id=..., prompt=...) call creates a NodeBuilder instance that is registered in the Graph context manager.

The underlying agent adapters (CodexAdapter, ClaudeAdapter, etc.) are Python classes implementing AgentAdapter with a run() method.

Isolation mechanism

git-worktree (optional): Graph(use_worktree=True) creates a worktree per node.
process: each agent node runs as a separate subprocess; remote nodes run as separate processes on other machines.
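A minimal sketch of the adapter shape and of process isolation, assuming each adapter launches its CLI as a subprocess and returns its stdout. Only the AgentAdapter name and its run() method come from the description above; CommandAdapter, REGISTRY, register, and the echo demo are hypothetical.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from typing import Dict, List


class AgentAdapter(ABC):
    """Hypothetical base; agentflow documents a run() method, the rest is assumed."""

    @abstractmethod
    def run(self, prompt: str) -> str:
        ...


class CommandAdapter(AgentAdapter):
    """Launch a CLI in its own subprocess with the prompt as the last argument."""

    def __init__(self, argv_prefix: List[str]) -> None:
        self.argv_prefix = list(argv_prefix)

    def run(self, prompt: str) -> str:
        result = subprocess.run(
            self.argv_prefix + [prompt],
            capture_output=True,
            text=True,
            check=True,  # a non-zero exit raises CalledProcessError (a node failure)
        )
        return result.stdout


REGISTRY: Dict[str, AgentAdapter] = {}


def register(kind: str, adapter: AgentAdapter) -> None:
    REGISTRY[kind] = adapter


# Demo adapter: a Python one-liner standing in for a real agent CLI.
register("echo", CommandAdapter([sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]))
print(REGISTRY["echo"].run("plan the work"))
```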

Multi-model

Yes — first-class. codex(), claude(), kimi(), pi() are distinct node types. A single pipeline can mix them freely:

plan = codex(...)  # Codex for planning
impl = claude(...) # Claude for implementation
review = pi(..., model="anthropic/claude-sonnet-4-6:high")  # Pi routing to Claude
local = pi(..., model="lmstudio/qwen/qwen3.6-27b")  # Local LMStudio model via Pi

Execution mode

one-shot per agentflow run invocation. The pipeline runs to completion and exits. Iterative cycles (via on_failure >>) run within a single invocation up to max_iterations.

Concurrency

Controlled by Graph(concurrency=N): at most N nodes execute at the same time, and fanout copies count against the same limit.

Crash recovery

fail_fast=True/False per Graph. No automatic retry for non-cycle pipelines (cycles use on_failure >> explicitly).

Ui Cli Surface

agentflow (berabuddies) — UI & CLI Surface

CLI binary

Binary name: agentflow. Entry point: agentflow/__main__.py (via cli.py).

Commands:

agentflow run pipeline.py — execute a pipeline file
agentflow run pipeline.py --output summary — execute and print summary

No web dashboard

No local UI. Terminal output only.

Skill auto-install

The install script deploys an agentflow skill to both Codex and Claude Code:

curl -fsSL https://raw.githubusercontent.com/shouc/agentflow/master/install.sh | bash

This enables natural language pipeline creation:

codex "Use agentflow to fan out 10 codex agents, each telling a unique joke,
       then merge their outputs and pick the funniest one."

The skill allows an AI agent to write and run agentflow pipelines on behalf of the user.

To JSON

print(g.to_json())  # Print pipeline DAG as JSON (for inspection/debugging)

Remote execution observability

For EC2/ECS/SSH targets, the orchestrator streams stdout/stderr from remote agents back to the local terminal in real time.
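Real-time streaming of a node's output can be sketched with subprocess.Popen by reading a merged stdout/stderr pipe line by line. stream_node_output and the [task_id] prefix are hypothetical; for a remote node, argv would be an ssh command rather than a local one.

```python
import subprocess
import sys
from typing import Iterator, List


def stream_node_output(task_id: str, argv: List[str]) -> Iterator[str]:
    """Yield the child's stdout and stderr, merged, one prefixed line at a time."""
    with subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # send stderr into the same pipe as stdout
        text=True,
        bufsize=1,                 # line-buffered on the reading side
    ) as proc:
        if proc.stdout is None:
            raise RuntimeError("stdout pipe was not created")
        for line in proc.stdout:
            yield f"[{task_id}] {line.rstrip(chr(10))}"
    # Popen's context manager waits for the child, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, argv)


for line in stream_node_output("review", [sys.executable, "-u", "-c", "print('a'); print('b')"]):
    print(line)
```

Programs often block-buffer output when writing to a pipe, so lines may arrive in bursts unless the child flushes; the demo passes -u to Python for that reason.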

Doctor command

agentflow doctor (inferred from agentflow/doctor.py) — health check for installation and agent tool availability.
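A doctor-style availability check can be as small as resolving each agent CLI on PATH with shutil.which. The executable names below (codex, claude, kimi, pi) are assumptions, and check_tools and report are hypothetical helpers, not the contents of agentflow/doctor.py. The `which` parameter is injectable so the check can be tested without the tools installed.

```python
import shutil
from typing import Callable, Dict, Iterable, Optional

# Assumed executable names for the four agent CLIs; verify against your installs.
DEFAULT_TOOLS = ("codex", "claude", "kimi", "pi")


def check_tools(
    names: Iterable[str] = DEFAULT_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each CLI name to its resolved path, or None if it is not on PATH."""
    return {name: which(name) for name in names}


def report(found: Dict[str, Optional[str]]) -> str:
    lines = []
    for name, path in found.items():
        status = "ok" if path else "missing"
        lines.append(f"{status:<8}{name}" + (f"  {path}" if path else ""))
    return "\n".join(lines)


print(report(check_tools()))
```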

Related frameworks

same archetype · same primary tool · same memory type

claude-mem (thedotmack) ★ 78k

A8 Cross-runtime harness

Background worker service captures every tool call as an observation, AI-compresses sessions, and auto-injects relevant past…

pi (badlogic/earendil) ★ 55k

A8 Cross-runtime harness

A minimal, hackable, multi-provider terminal coding agent that adapts to your workflows via npm-installable TypeScript Extensions…

Agent Skills (Addy Osmani) ★ 46k

A8 Cross-runtime harness

Encodes senior-engineer software development lifecycle as 23 auto-routed skills and 7 slash commands for any AI coding agent.

wshobson/agents Plugin Marketplace ★ 36k

A8 Cross-runtime harness

Single Markdown source for 83 domain-specialized plugins that auto-generates idiomatic artifacts for five AI coding harnesses.

TabbyML/Tabby ★ 34k

A8 Cross-runtime harness

Self-hosted AI coding assistant server (alternative to GitHub Copilot) with admin dashboard, RAG-based completions, and multi-IDE…

Compound Engineering ★ 17k

A8 Cross-runtime harness

Make each unit of engineering work compound into easier future work via brainstorm→plan→execute→review→learn cycles.

Distribution

Type: bash-script-bundle
License: MIT
Install: one-liner
Version: unknown (master branch)

Surfaces

CLI binary: agentflow
CLI subcmds: 2
Local UI: No
Tech stack: not specified

Components

Commands: 0
Skills: 1
Subagents: 4
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 1
Templates: 0

Workflow

Phases: 5
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: Yes
Pattern: parallel-fan-out
Isolation: process
Consensus: none
Prompt chaining: Yes

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: one-shot
Crash recovery: No
Compaction: No
Session handoff: No
Streaming: Yes

Memory

Type: none
Persistence: session
Search: none

Quality

TDD: No
TDD mechanism: none
Self-review: inline-self

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: Yes
Audit log: No
Audit format: none
Replay: No

Tools

Primary: codex
Targets: 5
Portability: high

Signals

Stars: 1.3k
Last commit: 2026-05-23
Contributors: 3
Maintainer: active
Quality score: 2.5/10