pydantic-ai-harness

pydantic-ai-harness · pydantic/pydantic-ai-harness · ★ 354 · last commit 2026-05-01

Primitive shape
No installable primitives
00

Summary

Pydantic AI Harness — Summary

Pydantic AI Harness (pip: pydantic-ai-harness) is the official capability library for Pydantic AI, maintained by the Pydantic team. Its stated purpose is "batteries for your Pydantic AI agent": standalone building blocks (capabilities, lifecycle hooks, tool bundles) that extend what a Pydantic AI Agent can do without framework changes. The first shipped capability is CodeMode, a run_code tool that places all agent tools behind a Python execution environment sandboxed by Monty (pydantic-monty). The model writes Python that calls multiple tools with loops, conditionals, and asyncio.gather, reducing N tool calls to one model round-trip. The capability matrix tracks 30+ planned capabilities (filesystem, shell, memory, session persistence, sub-agents, skills, input/output guardrails, secret masking, approval workflows, stuck loop detection, etc.), most of which are in active PRs as of v0.0.x.

Compared to seeds, Pydantic AI Harness is closest to spec-kit in being an extension pack for an existing framework, but differs: where spec-kit extends Claude Code (an AI tool), pydantic-ai-harness extends Pydantic AI (a Python agent library) — making it a pip extension pack rather than an IDE plugin.

01

Overview

Pydantic AI Harness — Overview

Origin

Pydantic AI Harness is developed and maintained by the Pydantic AI team (Pydantic, Inc). Douwe Maan, David SF, and Aditya Vardhan are listed as authors. It is the official companion library for the Pydantic AI framework.

Philosophy

From the README:

"Pydantic AI's capabilities and hooks API is how you give an agent its harness — bundles of tools, lifecycle hooks, instructions, and model settings that extend what the agent can do without any framework changes."

"Pydantic AI Harness is the official capability library for Pydantic AI, maintained by the Pydantic AI team. Pydantic AI core ships capabilities that require model or framework support, and capabilities fundamental to every agent — web search, tool search, thinking. Everything else lives here: standalone building blocks you pick and choose to turn your agent into a coding agent, a research assistant, or anything else."

Graduated Capabilities Model

From README: "This is also where new capabilities start — as they stabilize and prove themselves broadly essential, they can graduate into core."

This is an explicit capability graduation pipeline: pydantic-ai-harness → pydantic-ai core.

CodeMode: Flagship Feature

CodeMode is the only shipped capability as of v0.0.x. From the README:

"Wraps every tool into a single run_code tool, sandboxed by Monty... The model writes Python that calls multiple tools with loops, conditionals, asyncio.gather, and local filtering — one model round-trip for N tool calls."

Community Package Endorsement

The README explicitly endorses community packages by vstorm-co (pydantic-ai-backend, pydantic-ai-shields, pydantic-ai-skills, etc.) as interim solutions while official capabilities are in development.

02

Architecture

Pydantic AI Harness — Architecture

Package Structure

pydantic-ai-harness/
├── pydantic_ai_harness/
│   ├── __init__.py            # Public API surface
│   └── code_mode/
│       ├── _capability.py     # CodeMode dataclass (AbstractCapability)
│       ├── _toolset.py        # CodeModeToolset wrapping logic
│       └── README.md
├── .agents/                   # Agent definition files
├── .claude/                   # Claude Code integration files
├── AGENTS.md                  # Agent instructions
├── CLAUDE.md                  # Claude Code instructions
├── Makefile
├── tests/
└── pyproject.toml

Dependency Model

  • Required: pydantic-ai-slim>=1.95.1 — the minimal Pydantic AI core without bundled providers
  • Optional extras:
    • [code-mode] / [codemode] — adds pydantic-monty (the Monty sandbox engine)
    • [temporal] — temporal execution support
    • [dbos] — DBOS durable execution integration
  • Versioning: uv-dynamic-versioning (no pinned version in pyproject.toml)
  • Python requirement: >=3.10

AbstractCapability Contract

All capabilities implement AbstractCapability[AgentDepsT]:

class AbstractCapability(Generic[AgentDepsT]):
    def get_ordering(self) -> CapabilityOrdering: ...
    def get_wrapper_toolset(self, toolset) -> ...: ...

CapabilityOrdering controls composition position: position='outermost' means CodeMode wraps all other tools. The wraps field lists capabilities that CodeMode can wrap (e.g., _ToolSearch).

CodeMode Architecture

CodeMode wraps a toolset behind a single run_code tool:

  1. Model receives run_code(code: str) as its only tool
  2. Submitted code runs in a Monty sandbox (pydantic-monty)
  3. Sandbox exposes all original tools as Python-callable functions
  4. Model writes Python that calls tools with loops, conditionals, asyncio.gather
  5. Result returned to model in a single round-trip

This reduces N sequential tool calls to 1 model round-trip.
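The five steps above can be illustrated with a toy, standard-library-only sketch. It is not the harness's implementation and it is not a sandbox: plain exec stands in for Monty, and make_run_code, fetch_title, and word_count are hypothetical names invented for this example.

```python
import asyncio
import textwrap


async def fetch_title(url: str) -> str:
    """Hypothetical tool: pretend to fetch a page title."""
    await asyncio.sleep(0)
    return f"title of {url}"


async def word_count(text: str) -> int:
    """Hypothetical tool: count whitespace-separated words."""
    return len(text.split())


def make_run_code(tools):
    """Return one run_code coroutine function that exposes `tools` by name."""

    async def run_code(code: str):
        # Wrap the submitted source in an async function so it can use await/return.
        source = "async def __submitted__():\n" + textwrap.indent(code, "    ")
        namespace = {"asyncio": asyncio, **tools}
        exec(source, namespace)  # NOT a sandbox; illustration only
        return await namespace["__submitted__"]()

    return run_code


run_code = make_run_code({"fetch_title": fetch_title, "word_count": word_count})

# What a model might submit: several tool calls, one round-trip.
SCRIPT = """
titles = await asyncio.gather(*[fetch_title(u) for u in ["a", "b", "c"]])
counts = [await word_count(t) for t in titles]
return sum(counts)
"""

total = asyncio.run(run_code(SCRIPT))
print(total)
```

Four tool invocations (three fetches run concurrently, then the counts) happen inside a single run_code call, which is the shape of the N-to-1 reduction described above.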

Capability Graduation Pipeline

pydantic-ai-harness (incubator)
       ↓ (stabilize + prove essential)
pydantic-ai core (shipped)

Examples already in core (from README): web_search, tool_search, thinking. Everything else starts in harness.

Agent Integration Point

from pydantic_ai import Agent
from pydantic_ai_harness import CodeMode

agent = Agent(
    "openai:gpt-4o",
    capabilities=[CodeMode()],
    tools=[file_read, shell_run, web_fetch],
)

Capabilities are passed at Agent() construction time — no framework changes required.

03

Components

Pydantic AI Harness — Components

Shipped Capabilities (v0.0.x)

CodeMode

  • Class: pydantic_ai_harness.code_mode.CodeMode
  • Dataclass fields:
    • tools: ToolSelector[AgentDepsT] = 'all' — which tools to expose inside the sandbox
    • max_retries: int = 3 — retry limit for the code execution loop
  • Mechanism: wraps toolset via CodeModeToolset, which replaces the agent's tool surface with a single run_code tool backed by pydantic-monty
  • Ordering constraint: position='outermost' — always the outermost capability wrapper
  • Wraps: _ToolSearch capability (when present)

Planned Capabilities (Active PRs, v0.0.x → v0.1+)

From the capability matrix in README:

Category             Capabilities
Filesystem           read, write, edit, search
Shell                exec, background, pipe
Memory               short-term, episodic, semantic
Session              persistence, handoff, restore
Sub-agents           spawn, delegate, fan-out
Skills               definition, discovery, composition
Guardrails           input/output, content policy
Secret masking       PII, credentials, env var scrubbing
Approval             human-in-the-loop gates
Stuck loop           detection, recovery
Tool budget          rate limiting, cost tracking
Adaptive reasoning   chain-of-thought, reflection

30+ capabilities total planned.

ToolSelector

ToolSelector[AgentDepsT] is a parameterized type for selecting which tools CodeMode exposes inside the sandbox. Default is 'all'.
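The definition of ToolSelector is not reproduced here, so the following is only a guess at how such a selector might behave: 'all' keeps every tool, while a collection of names or a predicate narrows the set. The function select_tools and both non-default selector forms are assumptions made for illustration, not the library's API.

```python
from typing import Callable, Dict, Iterable, Union

# Hypothetical selector shapes; only the 'all' default is stated in the text above.
Selector = Union[str, Iterable[str], Callable[[str], bool]]


def select_tools(tools: Dict[str, Callable], selector: Selector = "all") -> Dict[str, Callable]:
    """Return the subset of `tools` that the selector admits."""
    if isinstance(selector, str):
        if selector != "all":
            raise ValueError(f"unknown selector string: {selector!r}")
        return dict(tools)
    if callable(selector):
        return {name: fn for name, fn in tools.items() if selector(name)}
    wanted = set(selector)
    return {name: fn for name, fn in tools.items() if name in wanted}


# Stand-in tools; the callables themselves are irrelevant to selection.
tools = {"file_read": len, "shell_run": str, "web_fetch": repr}
everything = select_tools(tools)
by_name = select_tools(tools, ["file_read", "web_fetch"])
by_rule = select_tools(tools, lambda name: name.startswith("shell"))
```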

Community Packages (Endorsed in README)

The README explicitly endorses vstorm-co packages as interim solutions:

  • pydantic-ai-backend — model backend adapters
  • pydantic-ai-shields — guardrails/content moderation
  • pydantic-ai-skills — skill definitions

These are not part of pydantic-ai-harness but are listed as recommended third-party extensions while official capabilities are in development.

Hooks / Lifecycle

No lifecycle hooks shipped yet in v0.0.x. The .claude/ and .agents/ directories in the repo itself (for development purposes) use hooks, but these are not part of the distributed package API.

pydantic-monty (Dependency)

The Monty sandbox is a separate pip package (pydantic-monty) that provides the code execution environment for CodeMode. It is pulled in only when [code-mode] extra is installed.
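Because the sandbox arrives only through an optional extra, calling code may want to check for it before enabling CodeMode. The sketch below uses the standard library's importlib for that check. The import name pydantic_monty is an assumption inferred from the pip package name, and extra_available is a hypothetical helper.

```python
import importlib.util


def extra_available(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# "pydantic_monty" is an assumed import name, not confirmed by the source.
code_mode_ready = extra_available("pydantic_monty")
print("CodeMode sandbox installed:", code_mode_ready)
```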

05

Prompts

Pydantic AI Harness — Prompts

Prompt Injection Pattern

Capabilities can inject instructions into the agent's system prompt via the AbstractCapability interface. CodeMode injects a description of the run_code tool and usage guidelines.

No prompt templates are shipped as separate files in v0.0.x. Prompt content lives inside capability implementations.

CodeMode Prompt Injection (Inferred from Architecture)

When CodeMode is active, the model is told:

  • It has access to a single run_code(code: str) tool
  • The code runs in a Python sandbox with all original tools available as async functions
  • It should use loops, conditionals, and asyncio.gather for parallel execution
  • max_retries governs how many times failed code can be resubmitted
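One way to picture the max_retries bullet is a loop that feeds each failure back to the model until the code succeeds or the budget runs out. This is a sketch under assumptions: run_with_retries, submit, and execute are hypothetical names, eval stands in for the sandbox, and counting the first attempt separately from the retries is a guess about the semantics.

```python
from typing import Callable, Optional, Tuple


def run_with_retries(
    submit: Callable[[Optional[str]], str],
    execute: Callable[[str], object],
    max_retries: int = 3,
) -> Tuple[object, int]:
    """Run submitted code, feeding each error back, for up to max_retries retries."""
    error: Optional[str] = None
    for attempt in range(max_retries + 1):  # first attempt plus the retries
        code = submit(error)  # the "model" sees the previous error, if any
        try:
            return execute(code), attempt
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"gave up after {max_retries} retries: {error}")


# Scripted stand-in for a model that fixes its code on the third try.
scripted = iter(["1 / 0", "int('x')", "40 + 2"])
seen_errors = []


def submit(error):
    seen_errors.append(error)
    return next(scripted)


value, retries_used = run_with_retries(submit, eval)
```

Here the first two submissions fail, their error messages are passed to the next submit call, and the third succeeds after two retries.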

AGENTS.md / CLAUDE.md (Development-Time, Not Runtime)

The repo ships AGENTS.md and CLAUDE.md at the root. These are for contributors using Claude Code or other AI coding assistants to work on the harness itself. They are NOT injected at runtime into agents built with pydantic-ai-harness.

Capability Instructions Interface

From the graduated capability model, each capability contributes:

  1. Tool definitions — added to the agent's tool surface (or replaced by CodeMode)
  2. System prompt fragments — capability-specific instructions
  3. Lifecycle hooks — pre/post tool call handlers
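The three contributions listed above can be pictured as fields on a bundle that an agent merges at construction time. The sketch below is a toy model only: ToyBundle, assemble, and the before_tool_call hook name are hypothetical and do not reflect the AbstractCapability interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyBundle:
    """Hypothetical stand-in for a capability: tools, instructions, hooks."""
    tools: dict[str, Callable] = field(default_factory=dict)
    instructions: str = ""
    before_tool_call: list[Callable[[str], None]] = field(default_factory=list)


def assemble(bundles):
    """Merge bundles into one tool surface, one prompt, and one hook list."""
    tools, prompt_parts, hooks = {}, [], []
    for bundle in bundles:
        tools.update(bundle.tools)
        if bundle.instructions:
            prompt_parts.append(bundle.instructions)
        hooks.extend(bundle.before_tool_call)
    return tools, "\n".join(prompt_parts), hooks


calls = []
fs = ToyBundle(
    tools={"read": lambda path: f"<{path}>"},
    instructions="Read files with read().",
    before_tool_call=[calls.append],  # record each tool name before it runs
)
audit = ToyBundle(instructions="Every tool call is logged.")

tools, prompt, hooks = assemble([fs, audit])
for hook in hooks:
    hook("read")
output = tools["read"]("a.txt")
```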

The exact prompt format for each planned capability is not yet defined in v0.0.x source.

No External Prompt Files

Unlike many frameworks in this batch (chorus, oh-my-agent, ai-toolkit-uniswap), pydantic-ai-harness does NOT use .md files in a .agents/ or .claude/ folder as runtime agent configuration. The .agents/ directory in the repo is for development-time AI tooling only.

09

Uniqueness

Pydantic AI Harness — Uniqueness

Differs from Seeds

pydantic-ai-harness is closest to spec-kit in being an extension pack for an existing framework — but it extends Pydantic AI (a Python agent library), not Claude Code (an AI coding tool). This makes it a pip extension pack rather than an IDE plugin. None of the 11 seeds use a capability graduation pipeline (harness → core). None of the seeds implement a "CodeMode" sandbox where all tools collapse to a single run_code tool that executes Python inside a sandboxed environment. The closest seed is superpowers (Claude Code tool bundles), but superpowers extends a CLI, not a Python library API.

Distinctive Position

  1. Only official capability extension library for a Python agent framework — maintained by the framework team itself (Pydantic, Inc), not a third-party wrapper
  2. Capability graduation pipeline — explicit mechanism for features to move from harness to core as they prove essential; no other framework in this batch has this two-tier model
  3. CodeMode / run_code sandbox — collapsing N tools to 1 tool backed by sandboxed Python execution (via pydantic-monty) is a unique primitive; no other batch-28 framework does this
  4. 30+ planned capabilities in active PRs — the most ambitious capability roadmap in this batch; ships nearly empty but with a full public backlog
  5. Community package endorsement — the README explicitly names and endorses vstorm-co packages as interim solutions; unusual transparency about what's not yet built
  6. [temporal] and [dbos] extras — signals a durable execution roadmap (process-restart-safe agents) that no other pip-package harness in this batch addresses

Explicit Anti-Patterns / Design Constraints

  • Not a standalone agent framework — requires Pydantic AI as the runtime
  • No CLI, no UI, no scaffold: pure library, no project ceremony
  • No bundled prompts or templates — capabilities inject their own instructions
  • Won't add anything that belongs in pydantic-ai core (web_search, tool_search, and thinking already ship there)

Observable Failure Modes

  • v0.0.x with nearly empty feature set — risk of abandonment before roadmap executes
  • Stars (354) are modest for an "official" library — may be pre-announcement
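  • Dynamic versioning means pyproject.toml carries no static version; versions are derived from git tags, and no v0.0.x release comes with stability guarantees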
  • pydantic-monty is a separate dependency with its own stability risks
  • Deep dependency on pydantic-ai-slim>=1.95.1 — tightly coupled to a rapidly evolving upstream

Inspired By

Pydantic AI framework ecosystem. The "batteries for your agent" positioning mirrors Python's "batteries included" philosophy applied to agent capabilities.

04

Workflow

Pydantic AI Harness — Workflow

Installation

pip install "pydantic-ai-harness[code-mode]"  # quotes stop shells such as zsh from globbing the brackets

No CLI binary. No init command. No project scaffold. Pure Python library.

Basic Usage (Quick-Start from README)

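import asyncio

from pydantic_ai import Agent
from pydantic_ai_harness import CodeMode

# file_read, shell_run, and web_fetch stand for your own tool functions, defined elsewhere.
agent = Agent(
    "openai:gpt-4o",
    capabilities=[CodeMode()],
    tools=[file_read, shell_run, web_fetch],
)

async def main() -> None:
    # await is only valid inside a coroutine, so the run call lives in main().
    result = await agent.run("summarise all markdown files in the repo")
    print(result.output)

asyncio.run(main())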

CodeMode Execution Flow

  1. Agent initializes with capabilities=[CodeMode()]
  2. CodeMode replaces the agent's tool surface with a single run_code tool
  3. Model receives: "you have one tool: run_code(code: str)"
  4. Model writes Python:
    files = await file_read("*.md")
    summaries = await asyncio.gather(*[summarise(f) for f in files])
    return "\n".join(summaries)
    
  5. Monty sandbox executes the Python, calling original tools as async functions
  6. Results returned to model in one round-trip

"Ecosystem Agent" Pattern (from README)

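from pydantic_ai import Agent
# Filesystem, Shell, and Memory are planned capabilities; in v0.0.x only CodeMode is shipped.
from pydantic_ai_harness import CodeMode, Filesystem, Shell, Memory

coding_agent = Agent(
    "anthropic:claude-opus-4",
    capabilities=[
        CodeMode(),
        Filesystem(root="."),
        Shell(allow=["git", "pytest", "ruff"]),
        Memory(backend="sqlite"),
    ],
)

This pattern (multiple capabilities stacked) is the intended production usage, with CodeMode wrapping everything else. Because Filesystem, Shell, and Memory are still planned, the example shows the target shape rather than code that runs against v0.0.x.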

Capability Ordering Rules

  • CodeMode must be position='outermost'
  • Other capabilities compose inside CodeMode's sandbox
  • Ordering is enforced at Agent construction time via CapabilityOrdering

Development Workflow (Repo-Level)

The repo itself uses .claude/ and .agents/ for development, meaning the harness is developed using the same Pydantic AI + Claude Code toolchain it extends. The Makefile has targets for test, lint, and type-check.

Versioning and Stability

  • Uses uv-dynamic-versioning — version derived from git tags
  • Currently at v0.0.x — pre-1.0, breaking changes expected
  • Capability graduation pipeline: capabilities move to pydantic-ai core when proven essential

06

Memory Context

Pydantic AI Harness — Memory & Context

Current State (v0.0.x)

No memory capabilities are shipped in the initial release. The capability matrix lists planned memory types:

  • Short-term memory — in-context conversation history (planned)
  • Episodic memory — session-based recall (planned)
  • Semantic memory — vector/embedding-based retrieval (planned)

Context Window Management

Context management is handled by pydantic-ai core, not by the harness. The harness focuses on what tools the agent can call, not how the model's context is managed.

CodeMode reduces context consumption indirectly: by batching N tool calls into one Python script execution, fewer round-trips occur, which means fewer tool-call/tool-result pairs accumulate in the context window.
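A back-of-the-envelope count makes the saving concrete. The model below is deliberately simplified and rests on stated assumptions: each sequential tool call adds one call message and one result message to the context, and a CodeMode script achieves the same work in a single run_code call. The function name context_messages is hypothetical.

```python
def context_messages(n_tool_calls: int, code_mode: bool) -> int:
    """Count tool-call and tool-result messages accumulated in the context."""
    # With CodeMode, all calls are assumed to collapse into one run_code pair.
    pairs = min(n_tool_calls, 1) if code_mode else n_tool_calls
    return 2 * pairs  # one call message plus one result message per pair


sequential = context_messages(10, code_mode=False)
batched = context_messages(10, code_mode=True)
print(sequential, batched)
```

The count ignores message size: the single run_code result may be larger than any individual tool result, so the saving is in message pairs and round-trips rather than a guaranteed token reduction.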

Planned Memory Architecture

From the capability matrix:

  • Memory(backend="sqlite") — SQLite-backed episodic memory (shown in ecosystem agent example)
  • Memory(backend="redis") — Redis-backed for persistent/shared memory
  • Semantic search over past tool results

Session Persistence

Planned capability: SessionPersistence — allows agent state to survive across process restarts. Architecture not yet defined in v0.0.x source.

State in CodeMode Sandbox

Within a single CodeMode invocation, the Python code executed in Monty has access to:

  • All tool return values (as Python objects)
  • Local variables defined in the script

State does not persist across separate run_code calls, even within the same turn. Cross-turn memory requires an explicit memory capability (planned).
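The lack of persistence between run_code calls can be mimicked by giving every call a fresh namespace. This is a toy illustration, not Monty: exec offers no sandboxing, and this run_code helper and its result convention are hypothetical.

```python
def run_code(code: str, tools: dict):
    """Execute `code` in a brand-new namespace and return its `result` variable."""
    namespace = dict(tools)  # fresh per call, so nothing carries over
    exec(code, namespace)    # NOT a sandbox; illustration only
    return namespace.get("result")


first = run_code("x = 41\nresult = x + 1", {})

try:
    run_code("result = x", {})  # x was defined in the previous call only
    leaked = True
except NameError:
    leaked = False
```

The second call fails with NameError because x lived only in the first call's namespace, which is why anything that must survive between calls has to be returned to the model or stored by a memory capability.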

Extras for Durable Execution

The [temporal] and [dbos] extras suggest that durable, long-horizon execution with state checkpointing is planned, so that agent runs can survive process restarts. This would be analogous to hankweave's codon checkpointing, but at the Python package level.

07

Orchestration

Pydantic AI Harness — Orchestration

Single-Agent, Capability-Augmented

pydantic-ai-harness is not an orchestration framework. It augments a single Agent instance with capability bundles. There is no built-in multi-agent coordination in v0.0.x.

Planned Sub-Agent Capabilities

The capability matrix lists:

  • SubAgent(spawn=...) — spin up child agents
  • Delegate(to=...) — hand tasks to specialized agents
  • FanOut(agents=[...]) — parallel sub-agent execution

These are planned, not shipped. When shipped, they will integrate with Pydantic AI's native sub-agent support.

CodeMode as Implicit Orchestration

CodeMode provides a form of tool orchestration without multiple agents: the model writes Python that calls N tools in parallel (asyncio.gather) within a single sandbox execution. This is "orchestration within a round-trip" rather than "orchestration across agents."

Example:

# Model writes this inside run_code — N parallel tool calls, 1 round-trip
results = await asyncio.gather(
    web_fetch("https://example.com"),
    file_read("README.md"),
    shell_run("git log --oneline -10"),
)
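To see that asyncio.gather really overlaps the calls, the following self-contained sketch logs when each stand-in tool starts and finishes. The tool coroutine is hypothetical and replaces web_fetch, file_read, and shell_run with simple sleeps.

```python
import asyncio

events = []


async def tool(name: str, delay: float) -> str:
    """Hypothetical stand-in for a tool call that takes `delay` seconds."""
    events.append(f"start {name}")
    await asyncio.sleep(delay)
    events.append(f"end {name}")
    return name


async def main():
    # All three coroutines are scheduled together and run concurrently.
    return await asyncio.gather(tool("a", 0.02), tool("b", 0.01), tool("c", 0))


results = asyncio.run(main())
print(events)
print(results)
```

All three start events are logged before any end event, and gather returns the results in argument order regardless of which call finishes first.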

Capability Composition Model

Capabilities compose via the CapabilityOrdering system:

  • position='outermost' → wraps everything else (CodeMode)
  • position='inner' → wrapped by outer capabilities
  • wraps=[...] → explicit dependency declaration

The agent constructor resolves ordering at initialization time. This is the harness's answer to orchestration: compositional capability stacking, not agent spawning.
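A toy resolver shows how position and wraps could determine nesting. Everything here is hypothetical: ToyOrdering and ToyCapability only imitate the fields described above, and the rule that at most one capability may be outermost is an assumption, not documented behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOrdering:
    position: str = "inner"  # "inner" or "outermost"
    wraps: list[str] = field(default_factory=list)


@dataclass
class ToyCapability:
    name: str
    ordering: ToyOrdering = field(default_factory=ToyOrdering)


def resolve(capabilities):
    """Return capability names from innermost to outermost."""
    outer = [c for c in capabilities if c.ordering.position == "outermost"]
    if len(outer) > 1:  # assumed rule for this sketch
        raise ValueError("only one capability may be outermost")
    inner = [c for c in capabilities if c.ordering.position != "outermost"]
    return [c.name for c in inner + outer]


def wrap(toolset: str, capabilities) -> str:
    """Apply wrappers in resolved order, so the outermost is applied last."""
    for name in resolve(capabilities):
        toolset = f"{name}({toolset})"
    return toolset


stack = [
    ToyCapability("code_mode", ToyOrdering("outermost", wraps=["tool_search"])),
    ToyCapability("tool_search"),
]
wrapped = wrap("tools", stack)
print(wrapped)
```

Even though code_mode is listed first, it ends up wrapping tool_search, which matches the idea that declaration order does not override the declared position.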

Lifecycle Hooks (Planned)

The README mentions hooks as a core part of the harness API ("lifecycle hooks" listed alongside tools and instructions). No hook interface is shipped in v0.0.x. When added, hooks will fire pre/post tool call and at agent turn boundaries.

Integration with pydantic-ai Core

pydantic-ai core ships: web_search, tool_search, thinking. The harness adds everything else. The two-layer model means orchestration primitives that prove essential will graduate to core, keeping the harness lean.

08

Ui Cli Surface

Pydantic AI Harness — UI/CLI Surface

CLI Binary

None. pydantic-ai-harness is a pure Python library: it installs no CLI binary and provides no harness or pydantic-ai-harness command.

Installation:

pip install pydantic-ai-harness
pip install "pydantic-ai-harness[code-mode]"  # with CodeMode sandbox; quotes guard against shell globbing

No Init/Scaffold Command

Unlike deepagents-langchain (deepagents init), oh-my-agent (oma init), or water (water init), there is no project scaffolding command. Usage starts with import in Python.

No Local UI

No web dashboard, no terminal TUI, no port exposed. Observability relies on Pydantic AI's built-in logging/tracing or external integrations.

IDE Integration

The repo ships .claude/ and .agents/ directories for contributors using Claude Code or AI coding assistants to work on the harness source code itself. These are development conveniences, not runtime IDE integrations for users of the library.

Makefile Targets (Development)

make test
make lint
make typecheck

Standard Python project development commands. Not relevant to end-users of the library.

API Surface (User-Facing)

The public interface is Python import:

from pydantic_ai_harness import CodeMode
from pydantic_ai_harness.code_mode import CodeModeToolset

The __init__.py exports the top-level capability classes. Individual capability subpackages (e.g., pydantic_ai_harness.filesystem) will be added as capabilities are developed.

Community Extras (vstorm-co)

For users who need more UI/tooling now, the README endorses:

  • pydantic-ai-backend — provider backends
  • pydantic-ai-shields — guardrails
  • pydantic-ai-skills — skill collections

These are third-party packages with their own UI/surface areas.
