Skip to content
/

Plano

plano · katanemo/plano · ★ 6.5k · last commit 2026-05-26

Primitive shape 8 total
Skills 8
00

Summary

Plano — Summary

Plano is an AI-native proxy server and data plane for agentic applications, built on Envoy proxy by its core contributors (Katanemo). It externalizes four concerns from agent code: orchestration (YAML-declared intent routing via a purpose-built 4B-parameter LLM), model agility (unified LLM API with automatic provider fallback), Agentic Signals (zero-code OTEL trace capture), and filter chains (guardrails, moderation, and memory hooks). Agents are plain HTTP servers implementing OpenAI-compatible chat completions — Plano adds the routing, observability, and safety layer without framework dependencies. The planoai CLI (up, down, logs, trace, init) manages a Docker-based deployment of Envoy + brightstaff (native Rust binary).

Compared to seeds: Plano is closest to AgentGateway and ContextForge (both are AI proxies) but distinguishes itself by: (1) using Envoy as the underlying proxy rather than a custom Rust binary, (2) the purpose-built "Plano-Orchestrator" 4B-parameter LLM for intent routing at fraction of GPT-4 cost, and (3) framing agents as plain HTTP services with no framework dependency — the most opinionated "your agent is just an HTTP server" stance in the corpus.

01

Overview

Plano — Overview

Origin

Developed by Katanemo (katanemo.com), founded by Envoy proxy core contributors. Rust + Python + TypeScript. Apache-2.0. 6,543 stars, 425 forks. Active development.

Philosophy

"Plano pulls out the rote plumbing work and decouples you from brittle framework abstractions, centralizing what shouldn't be bespoke in every codebase - like agent routing and orchestration, rich agentic signals and traces for continuous improvement, guardrail filters for safety and moderation, and smart LLM routing APIs for model agility."

The core thesis: building agentic demos is easy; production is hard because teams end up building the same "hidden middleware" (routing, guardrails, observability, model fallback) from scratch. Plano externalizes this middleware as a reusable data plane.

Key Design Principles

  1. Agents are HTTP servers: Your agent just needs to implement POST /v1/chat/completions. No SDK, no framework lock-in.
  2. YAML-declared intent routing: Describe your agents in natural language; the Plano-Orchestrator 4B model routes requests based on semantic intent.
  3. Envoy as foundation: Built by Envoy core contributors — the same infrastructure battle-tested for petabyte-scale microservices at Lyft, Airbnb, etc.
  4. Purpose-built orchestrator: A 4B-parameter LLM optimized for routing decisions (not general intelligence) at fraction of GPT-4 cost and latency.
  5. Zero instrumentation: OTEL traces captured automatically with no code changes to agents.

Manifesto Quote

"What You Didn't Have to Build" table in README:

Infrastructure Concern Without Plano With Plano
Agent Orchestration Write intent classifier + routing logic Declare agent descriptions in YAML
Model Management Handle each provider's API quirks Unified LLM APIs with state management
Rich Tracing Instrument every service with OTEL Automatic end-to-end traces and logs
02

Architecture

Plano — Architecture

Distribution

Docker image (katanemo/plano:latest) + Python CLI (planoai).

Network Architecture

Client → Envoy (prompt_gateway.wasm → llm_gateway.wasm) → Agents/LLM Providers
                              ↕
                    brightstaff (native binary: state, routing, signals, tracing)

Components

Component Language Purpose
prompt_gateway (WASM) Rust → WASM Proxy-WASM filter for prompt processing, guardrails, filter chains
llm_gateway (WASM) Rust → WASM Proxy-WASM filter for LLM request/response handling and routing
brightstaff Rust (native) Core server: handlers, router, signals, state, tracing
common Rust (lib) Shared: config, HTTP, routing, rate limiting, tokenizer, PII, tracing
hermesllm Rust (lib) LLM API translation between providers
planoai CLI Python up, down, build, logs, trace, init, cli_agent, generate_prompt_targets

Config Format (YAML)

version: v0.3.0
agents:
  - id: weather_agent
    url: http://localhost:10510
  - id: flight_agent
    url: http://localhost:10520
model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
  - model: anthropic/claude-3-5-sonnet
    access_key: $ANTHROPIC_API_KEY
listeners:
  - type: agent
    name: travel_assistant
    port: 8001
    router: plano_orchestrator_v1
    agents:
      - id: weather_agent
        description: "Gets real-time weather and forecasts..."
      - id: flight_agent
        description: "Searches flights between airports..."
tracing:
  random_sampling: 100

Directory Structure

crates/             # Rust crates (prompt_gateway, llm_gateway, brightstaff, common, hermesllm)
cli/                # Python CLI (planoai/)
apps/               # JS applications
packages/           # JS packages
config/             # plano_config_schema.yaml, envoy.template.yaml, supervisord.conf
demos/              # Example multi-agent demos
skills/             # Claude Code skills for development
.claude/            # Claude Code developer tools
docs/               # Documentation

Process Management

supervisord.conf manages: Envoy (network proxy) + brightstaff (control plane). Both are required for full functionality.

Required Runtime

  • Docker (for running Plano)
  • Python 3.x + uv (for CLI)
  • Rust + wasm32-wasip1 target (for building WASM plugins from source)
03

Components

Plano — Components

CLI Commands (planoai)

Command Purpose
planoai up [config.yaml] Start Plano (Docker + Envoy + brightstaff)
planoai down Stop running Plano instance
planoai build Build Docker image from source
planoai logs Stream gateway and agent logs
planoai trace View OpenTelemetry traces
planoai init Initialize a new Plano config
planoai cli_agent Interactive agent testing CLI
planoai generate_prompt_targets Generate prompt target definitions

Core Plano Concepts

Concept Purpose
Listener Entry point (port + router configuration)
Agent Downstream HTTP service implementing chat completions
Model Provider LLM backend with API key and model ID
Router Intent routing model (plano_orchestrator_v1 or custom)
Filter Chain Ordered pipeline of guardrail/moderation filters
Prompt Target Named prompt → deterministic function mapping
Agentic Signal Automatically captured trace data point

Plano-Orchestrator (4B LLM)

A purpose-built, 4B-parameter LLM for intent routing. From YAML agent descriptions, it classifies incoming requests and routes to the appropriate downstream agent. "Production-grade routing at a fraction of the cost and latency" compared to using GPT-4 for routing.

The orchestrator is hosted free in US-central for development; production requires self-hosted or paid API.

Filter Chains

Applied to all requests/responses through prompt_gateway.wasm:

  • Jailbreak protection
  • Content moderation policies
  • Memory hooks (inject/extract memory from conversation)
  • PII detection and redaction (via common crate)

hermesllm

LLM API translation layer that normalizes differences between providers:

  • ProviderId — provider identifier
  • ProviderRequest / ProviderResponse — unified request/response types
  • ProviderStreamResponse — unified streaming type

Claude Code Developer Skills

.claude/skills/:

  • build-brightstaff — build the native Rust binary
  • build-cli — build the Python CLI
  • build-wasm — build WASM plugins
  • check — run pre-commit checks
  • new-provider — scaffold a new LLM provider adapter
  • pr — create a pull request
  • release — create a release
  • test-python — run Python tests
05

Prompts

Plano — Prompts

Plano is an infrastructure proxy. Its "prompts" are YAML configuration and the CLAUDE.md developer instructions.

YAML Agent Description as Routing Prompt

The agent description field in config.yaml is effectively a routing prompt consumed by the Plano-Orchestrator LLM:

agents:
  - id: weather_agent
    description: |
      Gets real-time weather and forecasts for any city worldwide.
      Handles: "What's the weather in Paris?", "Will it rain in Tokyo?"

  - id: flight_agent
    description: |
      Searches flights between airports with live status and schedules.
      Handles: "Flights from NYC to LA", "Show me flights to Seattle"

Prompting technique: Few-shot intent examples embedded in the description. The Plano-Orchestrator uses these examples to classify incoming user messages and route to the correct agent. This is a novel "description as routing prompt" pattern where natural language descriptions drive deterministic routing via a specialized LLM.

CLAUDE.md (Developer Instructions)

# CLAUDE.md

Plano is an AI-native proxy server and data plane for agentic applications, built on Envoy proxy.
It centralizes agent orchestration, LLM routing, observability, and safety guardrails as an
out-of-process dataplane.

## Architecture

Client → Envoy (prompt_gateway.wasm → llm_gateway.wasm) → Agents/LLM Providers
                              ↕
                         brightstaff (native binary: state, routing, signals, tracing)

## Build & Test Commands

# Rust — WASM plugins (must target wasm32-wasip1)
cd crates && cargo build --release --target=wasm32-wasip1 -p llm_gateway -p prompt_gateway

# Rust — brightstaff binary (native target)
cd crates && cargo build --release -p brightstaff

Prompting technique: Architecture diagram as ASCII art embedded in developer instructions. The pipeline diagram (Client → Envoy → ...) gives AI coding assistants a mental model of data flow without requiring prose explanation. Critically specifies the WASM build target (wasm32-wasip1) — a concrete constraint that prevents AI assistants from using the wrong target.

Prompt Targets

planoai generate_prompt_targets generates definitions that map prompts to deterministic API calls:

# Example prompt target
- name: "get_weather"
  prompt_template: "Get weather for {city} on {date}"
  target_url: "http://weather-api/forecast"
  parameter_extraction: [city, date]

Prompting technique: Template-based prompt-to-function mapping. Deterministic (not LLM-routed) for specific known intents.

09

Uniqueness

Plano — Uniqueness & Positioning

Differs from Seeds

Plano is categorically different from all 11 seeds — it is an AI-native data plane, not a coding methodology. The closest seeds are the MCP-anchored archetype (taskmaster-ai, claude-flow) but those are coding workflow tools; Plano routes production AI agent traffic. Within this batch, Plano competes directly with ContextForge (IBM), AgentGateway (LF), and Archestra — all MCP/AI gateways. Plano's differentiators: (1) built on Envoy rather than custom Rust, (2) the purpose-built Plano-Orchestrator 4B LLM for routing, (3) agents-as-HTTP-services philosophy (no SDK required), and (4) YAML-declared routing via natural language descriptions.

Unique Characteristics

  1. Envoy foundation: Only framework in corpus built directly on Envoy proxy by its core contributors. Envoy is the reference implementation for production service mesh proxy — this brings proven production infrastructure to AI routing.
  2. Plano-Orchestrator (4B LLM): Purpose-built routing model. No other framework in corpus ships its own routing LLM. The 4B parameter size optimizes for routing decisions at low cost — not general AI capability.
  3. WASM filter plugins: prompt_gateway.wasm and llm_gateway.wasm are compiled to WebAssembly for Envoy — sandboxed, portable, zero-copy filter execution. Only WASM-based AI proxy in corpus.
  4. Agent = HTTP server, zero SDK: The "your agent is just a POST endpoint" philosophy is more opinionated than any other framework in corpus. No SDK installation, no framework coupling.
  5. description = routing intent: YAML agent descriptions with embedded examples serve as the routing prompt for the Plano-Orchestrator. Novel pattern: natural language descriptions are executable routing configuration.
  6. Agentic Signals trademark: The "Agentic Signals™" brand (zero-code structured trace capture) differentiates Plano's observability story from generic OTel instrumentation.

Positioning

"The Envoy for AI agents." For teams already running Envoy-based service meshes who want to add AI agents without bolting on a separate infrastructure layer. Also positioned for teams building multi-agent applications who want natural language routing without writing classifier code.

Observable Failure Modes

  • Plano-Orchestrator availability: Free hosted in US-central for dev; production requires self-hosting or paid tier. Not fully open-source (the 4B model weights are not freely distributed).
  • Envoy complexity: Building custom WASM plugins requires Rust + wasm32-wasip1 — steep learning curve for organizations without Rust expertise.
  • Stateless agents assumption: Plano assumes agents are stateless HTTP services. Stateful agents require additional state management infrastructure.
  • supervisord dependency: The supervisor process management adds operational complexity vs. pure Docker or Kubernetes deployments.

Cross-References

  • Competes with: ContextForge, AgentGateway, Archestra — all AI proxy/gateway frameworks in this batch
  • Built on: Envoy proxy (same team)
  • Industry research: "industry-leading LLM research" referenced in README (not cited)
04

Workflow

Plano — Workflow

Agent Developer Workflow

Phase Artifact
Write agent Plain HTTP server (FastAPI, Flask, Express, etc.) implementing POST /v1/chat/completions
Define config config.yaml declaring agents, model providers, listeners, router
Start Plano planoai up config.yaml
Query Send requests to http://localhost:{port}/v1/chat/completions
Observe planoai trace — view automatic OTEL traces
Add agents Edit config.yaml, restart — no routing code changes

Multi-Agent Request Flow

User → Listener (port 8001)
     → Plano-Orchestrator evaluates intent vs. agent descriptions
     → Routes to weather_agent OR flight_agent (or both sequentially)
     → Passes through filter chain (guardrails, moderation, memory)
     → Forwards to LLM Gateway (model selection, provider normalization)
     → Response returned with traces

LLM Gateway Flow

Agent requests LLM → lightstaff/hermesllm normalizes provider API
                   → Selects model (default or specified)
                   → Handles API quirks per provider
                   → Returns normalized response
                   → Captures Agentic Signal (token usage, latency, etc.)

Filter Chain Execution

  1. Request enters Envoy
  2. prompt_gateway.wasm executes filter chain (pre-request):
    • Jailbreak filter
    • Content moderation
    • Memory injection
    • PII check
  3. Request forwarded to appropriate agent
  4. Response passes through post-response filter chain
  5. Agentic Signal captured automatically

Config Changes / Adding Agents

Adding a new agent requires only YAML config changes — no code changes to existing agents, no routing logic updates, no framework migrations.

Approval Gates

None — Plano is an infrastructure proxy with automatic policy enforcement via filter chains. No interactive approval gates.

06

Memory Context

Plano — Memory & Context

State Storage

brightstaff native binary manages runtime state:

  • Agent routing state (active connections, session context)
  • LLM provider state (active models, request queues)
  • Signal capture state (Agentic Signal accumulators)

No external database required for basic deployments.

Memory Hooks (Filter Chain)

Plano's filter chains support memory injection/extraction:

  • Pre-request memory injection: Fetch relevant memories from a memory store and inject into the request context before forwarding to the agent
  • Post-response memory extraction: Extract new information from agent responses and store in memory

This is a hook-based memory system — the memory store implementation is plugged in via filter chain configuration.

Agentic Signals

Zero-code capture of structured data points from every interaction:

  • Token usage per agent per turn
  • Latency at each stage (routing, LLM, agent)
  • Model selection decisions
  • Routing decisions (which agent handled what)
  • Error rates

Exported as OTEL traces — observable in Jaeger, Zipkin, or any OTEL backend.

Trace State

planoai trace command connects to the trace listener started by brightstaff (or start_trace_listener_background). Traces are streamed for real-time observation.

Cross-Session State

Sessions are not persistent by default. The filter chain memory hooks can inject external memory for session continuity.

Config State

config.yaml is the authoritative state for the system topology. Changes require planoai down && planoai up.

PII State

The common crate handles PII detection and redaction inline during request processing. No separate PII audit log mentioned.

07

Orchestration

Plano — Orchestration

Multi-Agent Support

Yes — multi-agent routing is Plano's core value proposition. The Plano-Orchestrator routes requests to multiple downstream agents based on intent.

Orchestration Pattern

Hierarchical — the Plano-Orchestrator (4B LLM) acts as the orchestrator, routing to registered agents. Each agent handles its own sub-task and returns results. Can also support sequential chaining where the orchestrator routes through multiple agents in sequence.

Execution Mode

Background daemonplanoai up starts Envoy + brightstaff as background processes managed by supervisord.

Multi-Model Routing

Yes — primary value proposition of the LLM Gateway:

  • Unified OpenAI-compatible API abstracting provider differences
  • Default model per config (e.g., openai/gpt-4o)
  • Fallback to alternative providers on failure
  • Model selection by name or alias
  • Future: automatic routing by task complexity (like Plano's 4B orchestrator for routing decisions)

Plano-Orchestrator Details

  • Size: 4B parameters
  • Specialization: Intent classification for routing decisions (not general intelligence)
  • Cost: "Fraction of GPT-4" for routing (documented claim)
  • Hosted: Free in US-central for development; self-hosted or paid for production
  • v1: plano_orchestrator_v1 — the current routing model

Isolation Mechanism

None — agents are separate HTTP services (by developer design). Physical isolation is the developer's responsibility. Plano routes between already-isolated services.

Filter Chain as Sequential Processing

The prompt_gateway.wasm filter chain is a sequential pipeline applied to every request/response. This is not agent orchestration but request transformation.

Consensus Mechanism

None. Single orchestrator makes routing decisions. No distributed consensus.

08

Ui Cli Surface

Plano — UI/CLI Surface

CLI Binary

Name: planoai Type: Standalone Python CLI (not a thin wrapper — starts/manages Docker services) Install: pip install planoai (inside cli/) Framework: rich-click (click + rich for styled output)

CLI Commands

Command Purpose
planoai up [config.yaml] [--docker] Start Plano (Envoy + brightstaff via supervisord or Docker)
planoai down [--docker] [--verbose] Stop Plano
planoai build [--docker] Build Docker image
planoai logs [--debug] [--follow] [--docker] Stream gateway logs
planoai trace [--default-otel-endpoint] View OTEL traces (starts trace listener if needed)
planoai init Initialize new config
planoai cli_agent Interactive CLI for testing agents
planoai generate_prompt_targets [--file] [--path] [--settings] Generate prompt target definitions

Local Web Dashboard

Exists: No dedicated web dashboard. Observability through planoai trace (CLI trace viewer) or external OTEL backends.

A ChatGPT-like interface may exist in apps/ but is not documented as a core feature.

Observability

  • Agentic Signals: Automatic OTEL spans for every request (no code changes)
  • planoai trace: CLI-based trace inspection
  • OTEL export: Compatible with Jaeger, Zipkin, Grafana Tempo
  • Default OTEL gRPC endpoint: configurable via --default-otel-grpc-endpoint

Demo

docs/source/_static/img/demo_tracing.png — screenshot of automatic trace output

Development UI (Claude Code skills)

.claude/skills/ provides 8 developer skills for the Plano repository itself:

  • build-brightstaff, build-cli, build-wasm, check, new-provider, pr, release, test-python

These are development workflow accelerators, not end-user features.

Integration Points

  • Agents connect as HTTP servers implementing POST /v1/chat/completions
  • LLM providers accessed via OpenAI-compatible API
  • OTEL backends for trace export
  • No dedicated MCP integration (standard HTTP services)

Related frameworks

same archetype · same primary tool · same memory type

claude-mem (thedotmack) ★ 78k

Background worker service captures every tool call as an observation, AI-compresses sessions, and auto-injects relevant past…

pi (badlogic/earendil) ★ 55k

A minimal, hackable, multi-provider terminal coding agent that adapts to your workflows via npm-installable TypeScript Extensions…

Agent Skills (Addy Osmani) ★ 46k

Encodes senior-engineer software development lifecycle as 23 auto-routed skills and 7 slash commands for any AI coding agent.

wshobson/agents Plugin Marketplace ★ 36k

Single Markdown source for 83 domain-specialized plugins that auto-generates idiomatic artifacts for five AI coding harnesses.

TabbyML/Tabby ★ 34k

Self-hosted AI coding assistant server (alternative to GitHub Copilot) with admin dashboard, RAG-based completions, and multi-IDE…

Compound Engineering ★ 17k

Make each unit of engineering work compound into easier future work via brainstorm→plan→execute→review→learn cycles.