Plano

plano · katanemo/plano · ★ 6.5k · last commit 2026-05-26

Primitive shape 8 total

Skills 8

Summary

Plano — Summary

Plano is an AI-native proxy server and data plane for agentic applications, built on Envoy proxy by its core contributors (Katanemo). It externalizes four concerns from agent code: orchestration (YAML-declared intent routing via a purpose-built 4B-parameter LLM), model agility (unified LLM API with automatic provider fallback), Agentic Signals (zero-code OTEL trace capture), and filter chains (guardrails, moderation, and memory hooks). Agents are plain HTTP servers implementing OpenAI-compatible chat completions — Plano adds the routing, observability, and safety layer without framework dependencies. The planoai CLI (up, down, logs, trace, init) manages a Docker-based deployment of Envoy + brightstaff (native Rust binary).

Compared to seeds: Plano is closest to AgentGateway and ContextForge (both are AI proxies) but distinguishes itself by: (1) using Envoy as the underlying proxy rather than a custom Rust binary, (2) the purpose-built "Plano-Orchestrator" 4B-parameter LLM for intent routing at fraction of GPT-4 cost, and (3) framing agents as plain HTTP services with no framework dependency — the most opinionated "your agent is just an HTTP server" stance in the corpus.

Overview

Plano — Overview

Origin

Developed by Katanemo (katanemo.com), founded by Envoy proxy core contributors. Rust + Python + TypeScript. Apache-2.0. 6,543 stars, 425 forks. Active development.

Philosophy

"Plano pulls out the rote plumbing work and decouples you from brittle framework abstractions, centralizing what shouldn't be bespoke in every codebase - like agent routing and orchestration, rich agentic signals and traces for continuous improvement, guardrail filters for safety and moderation, and smart LLM routing APIs for model agility."

The core thesis: building agentic demos is easy; production is hard because teams end up building the same "hidden middleware" (routing, guardrails, observability, model fallback) from scratch. Plano externalizes this middleware as a reusable data plane.

Key Design Principles

Agents are HTTP servers: Your agent just needs to implement POST /v1/chat/completions. No SDK, no framework lock-in.
YAML-declared intent routing: Describe your agents in natural language; the Plano-Orchestrator 4B model routes requests based on semantic intent.
Envoy as foundation: Built by Envoy core contributors — the same infrastructure battle-tested for petabyte-scale microservices at Lyft, Airbnb, etc.
Purpose-built orchestrator: A 4B-parameter LLM optimized for routing decisions (not general intelligence) at fraction of GPT-4 cost and latency.
Zero instrumentation: OTEL traces captured automatically with no code changes to agents.

Manifesto Quote

"What You Didn't Have to Build" table in README:

Infrastructure Concern Without Plano With Plano

Agent Orchestration Write intent classifier + routing logic Declare agent descriptions in YAML

Model Management Handle each provider's API quirks Unified LLM APIs with state management

Rich Tracing Instrument every service with OTEL Automatic end-to-end traces and logs

Infrastructure Concern	Without Plano	With Plano
Agent Orchestration	Write intent classifier + routing logic	Declare agent descriptions in YAML
Model Management	Handle each provider's API quirks	Unified LLM APIs with state management
Rich Tracing	Instrument every service with OTEL	Automatic end-to-end traces and logs

Architecture

Plano — Architecture

Distribution

Docker image (katanemo/plano:latest) + Python CLI (planoai).

Network Architecture

Client → Envoy (prompt_gateway.wasm → llm_gateway.wasm) → Agents/LLM Providers
                              ↕
                    brightstaff (native binary: state, routing, signals, tracing)

Components

Component	Language	Purpose
`prompt_gateway` (WASM)	Rust → WASM	Proxy-WASM filter for prompt processing, guardrails, filter chains
`llm_gateway` (WASM)	Rust → WASM	Proxy-WASM filter for LLM request/response handling and routing
`brightstaff`	Rust (native)	Core server: handlers, router, signals, state, tracing
`common`	Rust (lib)	Shared: config, HTTP, routing, rate limiting, tokenizer, PII, tracing
`hermesllm`	Rust (lib)	LLM API translation between providers
`planoai` CLI	Python	`up`, `down`, `build`, `logs`, `trace`, `init`, `cli_agent`, `generate_prompt_targets`

Config Format (YAML)

version: v0.3.0
agents:
  - id: weather_agent
    url: http://localhost:10510
  - id: flight_agent
    url: http://localhost:10520
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
  - model: anthropic/claude-3-5-sonnet
    access_key: $ANTHROPIC_API_KEY
listeners:
  - type: agent
    name: travel_assistant
    port: 8001
    router: plano_orchestrator_v1
    agents:
      - id: weather_agent
        description: "Gets real-time weather and forecasts..."
      - id: flight_agent
        description: "Searches flights between airports..."
tracing:
  random_sampling: 100

Directory Structure

crates/             # Rust crates (prompt_gateway, llm_gateway, brightstaff, common, hermesllm)
cli/                # Python CLI (planoai/)
apps/               # JS applications
packages/           # JS packages
config/             # plano_config_schema.yaml, envoy.template.yaml, supervisord.conf
demos/              # Example multi-agent demos
skills/             # Claude Code skills for development
.claude/            # Claude Code developer tools
docs/               # Documentation

Process Management

supervisord.conf manages: Envoy (network proxy) + brightstaff (control plane). Both are required for full functionality.

Required Runtime

Docker (for running Plano)
Python 3.x + uv (for CLI)
Rust + wasm32-wasip1 target (for building WASM plugins from source)

Components

Plano — Components

CLI Commands (planoai)

Command	Purpose
`planoai up [config.yaml]`	Start Plano (Docker + Envoy + brightstaff)
`planoai down`	Stop running Plano instance
`planoai build`	Build Docker image from source
`planoai logs`	Stream gateway and agent logs
`planoai trace`	View OpenTelemetry traces
`planoai init`	Initialize a new Plano config
`planoai cli_agent`	Interactive agent testing CLI
`planoai generate_prompt_targets`	Generate prompt target definitions

Core Plano Concepts

Concept	Purpose
Listener	Entry point (port + router configuration)
Agent	Downstream HTTP service implementing chat completions
Model Provider	LLM backend with API key and model ID
Router	Intent routing model (`plano_orchestrator_v1` or custom)
Filter Chain	Ordered pipeline of guardrail/moderation filters
Prompt Target	Named prompt → deterministic function mapping
Agentic Signal	Automatically captured trace data point

Plano-Orchestrator (4B LLM)

A purpose-built, 4B-parameter LLM for intent routing. From YAML agent descriptions, it classifies incoming requests and routes to the appropriate downstream agent. "Production-grade routing at a fraction of the cost and latency" compared to using GPT-4 for routing.

The orchestrator is hosted free in US-central for development; production requires self-hosted or paid API.

Filter Chains

Applied to all requests/responses through prompt_gateway.wasm:

Jailbreak protection
Content moderation policies
Memory hooks (inject/extract memory from conversation)
PII detection and redaction (via common crate)

hermesllm

LLM API translation layer that normalizes differences between providers:

ProviderId — provider identifier
ProviderRequest / ProviderResponse — unified request/response types
ProviderStreamResponse — unified streaming type

Claude Code Developer Skills

.claude/skills/:

build-brightstaff — build the native Rust binary
build-cli — build the Python CLI
build-wasm — build WASM plugins
check — run pre-commit checks
new-provider — scaffold a new LLM provider adapter
pr — create a pull request
release — create a release
test-python — run Python tests

Prompts

Plano — Prompts

Plano is an infrastructure proxy. Its "prompts" are YAML configuration and the CLAUDE.md developer instructions.

YAML Agent Description as Routing Prompt

The agent description field in config.yaml is effectively a routing prompt consumed by the Plano-Orchestrator LLM:

agents:
  - id: weather_agent
    description: |
      Gets real-time weather and forecasts for any city worldwide.
      Handles: "What's the weather in Paris?", "Will it rain in Tokyo?"

  - id: flight_agent
    description: |
      Searches flights between airports with live status and schedules.
      Handles: "Flights from NYC to LA", "Show me flights to Seattle"

Prompting technique: Few-shot intent examples embedded in the description. The Plano-Orchestrator uses these examples to classify incoming user messages and route to the correct agent. This is a novel "description as routing prompt" pattern where natural language descriptions drive deterministic routing via a specialized LLM.

CLAUDE.md (Developer Instructions)

# CLAUDE.md

Plano is an AI-native proxy server and data plane for agentic applications, built on Envoy proxy.
It centralizes agent orchestration, LLM routing, observability, and safety guardrails as an
out-of-process dataplane.

## Architecture

Client → Envoy (prompt_gateway.wasm → llm_gateway.wasm) → Agents/LLM Providers
                              ↕
                         brightstaff (native binary: state, routing, signals, tracing)

## Build & Test Commands

# Rust — WASM plugins (must target wasm32-wasip1)
cd crates && cargo build --release --target=wasm32-wasip1 -p llm_gateway -p prompt_gateway

# Rust — brightstaff binary (native target)
cd crates && cargo build --release -p brightstaff

Prompting technique: Architecture diagram as ASCII art embedded in developer instructions. The pipeline diagram (Client → Envoy → ...) gives AI coding assistants a mental model of data flow without requiring prose explanation. Critically specifies the WASM build target (wasm32-wasip1) — a concrete constraint that prevents AI assistants from using the wrong target.

Prompt Targets

planoai generate_prompt_targets generates definitions that map prompts to deterministic API calls:

# Example prompt target
- name: "get_weather"
  prompt_template: "Get weather for {city} on {date}"
  target_url: "http://weather-api/forecast"
  parameter_extraction: [city, date]

Prompting technique: Template-based prompt-to-function mapping. Deterministic (not LLM-routed) for specific known intents.

Uniqueness

Plano — Uniqueness & Positioning

Differs from Seeds

Plano is categorically different from all 11 seeds — it is an AI-native data plane, not a coding methodology. The closest seeds are the MCP-anchored archetype (taskmaster-ai, claude-flow) but those are coding workflow tools; Plano routes production AI agent traffic. Within this batch, Plano competes directly with ContextForge (IBM), AgentGateway (LF), and Archestra — all MCP/AI gateways. Plano's differentiators: (1) built on Envoy rather than custom Rust, (2) the purpose-built Plano-Orchestrator 4B LLM for routing, (3) agents-as-HTTP-services philosophy (no SDK required), and (4) YAML-declared routing via natural language descriptions.

Unique Characteristics

Envoy foundation: Only framework in corpus built directly on Envoy proxy by its core contributors. Envoy is the reference implementation for production service mesh proxy — this brings proven production infrastructure to AI routing.
Plano-Orchestrator (4B LLM): Purpose-built routing model. No other framework in corpus ships its own routing LLM. The 4B parameter size optimizes for routing decisions at low cost — not general AI capability.
WASM filter plugins: prompt_gateway.wasm and llm_gateway.wasm are compiled to WebAssembly for Envoy — sandboxed, portable, zero-copy filter execution. Only WASM-based AI proxy in corpus.
Agent = HTTP server, zero SDK: The "your agent is just a POST endpoint" philosophy is more opinionated than any other framework in corpus. No SDK installation, no framework coupling.
description = routing intent: YAML agent descriptions with embedded examples serve as the routing prompt for the Plano-Orchestrator. Novel pattern: natural language descriptions are executable routing configuration.
Agentic Signals trademark: The "Agentic Signals™" brand (zero-code structured trace capture) differentiates Plano's observability story from generic OTel instrumentation.

Positioning

"The Envoy for AI agents." For teams already running Envoy-based service meshes who want to add AI agents without bolting on a separate infrastructure layer. Also positioned for teams building multi-agent applications who want natural language routing without writing classifier code.

Observable Failure Modes

Plano-Orchestrator availability: Free hosted in US-central for dev; production requires self-hosting or paid tier. Not fully open-source (the 4B model weights are not freely distributed).
Envoy complexity: Building custom WASM plugins requires Rust + wasm32-wasip1 — steep learning curve for organizations without Rust expertise.
Stateless agents assumption: Plano assumes agents are stateless HTTP services. Stateful agents require additional state management infrastructure.
supervisord dependency: The supervisor process management adds operational complexity vs. pure Docker or Kubernetes deployments.

Cross-References

Competes with: ContextForge, AgentGateway, Archestra — all AI proxy/gateway frameworks in this batch
Built on: Envoy proxy (same team)
Industry research: "industry-leading LLM research" referenced in README (not cited)

Workflow

Plano — Workflow

Agent Developer Workflow

Phase	Artifact
Write agent	Plain HTTP server (FastAPI, Flask, Express, etc.) implementing `POST /v1/chat/completions`
Define config	`config.yaml` declaring agents, model providers, listeners, router
Start Plano	`planoai up config.yaml`
Query	Send requests to `http://localhost:{port}/v1/chat/completions`
Observe	`planoai trace` — view automatic OTEL traces
Add agents	Edit `config.yaml`, restart — no routing code changes

Multi-Agent Request Flow

User → Listener (port 8001)
     → Plano-Orchestrator evaluates intent vs. agent descriptions
     → Routes to weather_agent OR flight_agent (or both sequentially)
     → Passes through filter chain (guardrails, moderation, memory)
     → Forwards to LLM Gateway (model selection, provider normalization)
     → Response returned with traces

LLM Gateway Flow

Agent requests LLM → lightstaff/hermesllm normalizes provider API
                   → Selects model (default or specified)
                   → Handles API quirks per provider
                   → Returns normalized response
                   → Captures Agentic Signal (token usage, latency, etc.)

Filter Chain Execution

Request enters Envoy
prompt_gateway.wasm executes filter chain (pre-request):
- Jailbreak filter
- Content moderation
- Memory injection
- PII check
Request forwarded to appropriate agent
Response passes through post-response filter chain
Agentic Signal captured automatically

Config Changes / Adding Agents

Adding a new agent requires only YAML config changes — no code changes to existing agents, no routing logic updates, no framework migrations.

Approval Gates

None — Plano is an infrastructure proxy with automatic policy enforcement via filter chains. No interactive approval gates.

Memory Context

Plano — Memory & Context

State Storage

brightstaff native binary manages runtime state:

Agent routing state (active connections, session context)
LLM provider state (active models, request queues)
Signal capture state (Agentic Signal accumulators)

No external database required for basic deployments.

Memory Hooks (Filter Chain)

Plano's filter chains support memory injection/extraction:

Pre-request memory injection: Fetch relevant memories from a memory store and inject into the request context before forwarding to the agent
Post-response memory extraction: Extract new information from agent responses and store in memory

This is a hook-based memory system — the memory store implementation is plugged in via filter chain configuration.

Agentic Signals

Zero-code capture of structured data points from every interaction:

Token usage per agent per turn
Latency at each stage (routing, LLM, agent)
Model selection decisions
Routing decisions (which agent handled what)
Error rates

Exported as OTEL traces — observable in Jaeger, Zipkin, or any OTEL backend.

Trace State

planoai trace command connects to the trace listener started by brightstaff (or start_trace_listener_background). Traces are streamed for real-time observation.

Cross-Session State

Sessions are not persistent by default. The filter chain memory hooks can inject external memory for session continuity.

Config State

config.yaml is the authoritative state for the system topology. Changes require planoai down && planoai up.

PII State

The common crate handles PII detection and redaction inline during request processing. No separate PII audit log mentioned.

Orchestration

Plano — Orchestration

Multi-Agent Support

Yes — multi-agent routing is Plano's core value proposition. The Plano-Orchestrator routes requests to multiple downstream agents based on intent.

Orchestration Pattern

Hierarchical — the Plano-Orchestrator (4B LLM) acts as the orchestrator, routing to registered agents. Each agent handles its own sub-task and returns results. Can also support sequential chaining where the orchestrator routes through multiple agents in sequence.

Execution Mode

Background daemon — planoai up starts Envoy + brightstaff as background processes managed by supervisord.

Multi-Model Routing

Yes — primary value proposition of the LLM Gateway:

Unified OpenAI-compatible API abstracting provider differences
Default model per config (e.g., openai/gpt-4o)
Fallback to alternative providers on failure
Model selection by name or alias
Future: automatic routing by task complexity (like Plano's 4B orchestrator for routing decisions)

Plano-Orchestrator Details

Size: 4B parameters
Specialization: Intent classification for routing decisions (not general intelligence)
Cost: "Fraction of GPT-4" for routing (documented claim)
Hosted: Free in US-central for development; self-hosted or paid for production
v1: plano_orchestrator_v1 — the current routing model

Isolation Mechanism

None — agents are separate HTTP services (by developer design). Physical isolation is the developer's responsibility. Plano routes between already-isolated services.

Filter Chain as Sequential Processing

The prompt_gateway.wasm filter chain is a sequential pipeline applied to every request/response. This is not agent orchestration but request transformation.

Consensus Mechanism

None. Single orchestrator makes routing decisions. No distributed consensus.

Ui Cli Surface

Plano — UI/CLI Surface

CLI Binary

Name: planoai Type: Standalone Python CLI (not a thin wrapper — starts/manages Docker services) Install: pip install planoai (inside cli/) Framework: rich-click (click + rich for styled output)

CLI Commands

Command	Purpose
`planoai up [config.yaml] [--docker]`	Start Plano (Envoy + brightstaff via supervisord or Docker)
`planoai down [--docker] [--verbose]`	Stop Plano
`planoai build [--docker]`	Build Docker image
`planoai logs [--debug] [--follow] [--docker]`	Stream gateway logs
`planoai trace [--default-otel-endpoint]`	View OTEL traces (starts trace listener if needed)
`planoai init`	Initialize new config
`planoai cli_agent`	Interactive CLI for testing agents
`planoai generate_prompt_targets [--file] [--path] [--settings]`	Generate prompt target definitions

Local Web Dashboard

Exists: No dedicated web dashboard. Observability through planoai trace (CLI trace viewer) or external OTEL backends.

A ChatGPT-like interface may exist in apps/ but is not documented as a core feature.

Observability

Agentic Signals: Automatic OTEL spans for every request (no code changes)
planoai trace: CLI-based trace inspection
OTEL export: Compatible with Jaeger, Zipkin, Grafana Tempo
Default OTEL gRPC endpoint: configurable via --default-otel-grpc-endpoint

Demo

docs/source/_static/img/demo_tracing.png — screenshot of automatic trace output

Development UI (Claude Code skills)

.claude/skills/ provides 8 developer skills for the Plano repository itself:

build-brightstaff, build-cli, build-wasm, check, new-provider, pr, release, test-python

These are development workflow accelerators, not end-user features.

Integration Points

Agents connect as HTTP servers implementing POST /v1/chat/completions
LLM providers accessed via OpenAI-compatible API
OTEL backends for trace export
No dedicated MCP integration (standard HTTP services)

Related frameworks

same archetype · same primary tool · same memory type

claude-mem (thedotmack) ★ 78k

A8 Cross-runtime harness

Background worker service captures every tool call as an observation, AI-compresses sessions, and auto-injects relevant past…

pi (badlogic/earendil) ★ 55k

A8 Cross-runtime harness

A minimal, hackable, multi-provider terminal coding agent that adapts to your workflows via npm-installable TypeScript Extensions…

Agent Skills (Addy Osmani) ★ 46k

A8 Cross-runtime harness

Encodes senior-engineer software development lifecycle as 23 auto-routed skills and 7 slash commands for any AI coding agent.

wshobson/agents Plugin Marketplace ★ 36k

A8 Cross-runtime harness

Single Markdown source for 83 domain-specialized plugins that auto-generates idiomatic artifacts for five AI coding harnesses.

TabbyML/Tabby ★ 34k

A8 Cross-runtime harness

Self-hosted AI coding assistant server (alternative to GitHub Copilot) with admin dashboard, RAG-based completions, and multi-IDE…

Compound Engineering ★ 17k

A8 Cross-runtime harness

Make each unit of engineering work compound into easier future work via brainstorm→plan→execute→review→learn cycles.

Distribution

Type: docker-image
License: Apache-2.0
Install: multi-step
Version: v0.3.0 (config schema version, commit 2026-05-26)

Surfaces

CLI binary: planoai
CLI subcmds: 8
Local UI: No
Tech stack: null

Components

Commands: 0
Skills: 8
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 3
Templates: 2

Workflow

Phases: 5
Approval gates: 0
Spec format: yaml
Spec storage: flat-files
Delta or full: whole-file

Orchestration

Multi-agent: Yes
Pattern: hierarchical
Isolation: none
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text

Execution

Mode: background-daemon
Compaction: No
Session handoff: No
Streaming: Yes

Memory

Type: none
Persistence: none
Search: none
State files: 2 files

Quality

TDD: Optional
TDD mechanism: none
Validators: 3
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: proprietary
Replay: No

Tools

Primary: any-http-agent
Targets: 2
Portability: high

Signals

Stars: 6.5k
Last commit: 2026-05-26
Contributors: 30
Maintainer: active
Quality score: 3.7/10