pydantic-ai-harness

pydantic-ai-harness · pydantic/pydantic-ai-harness · ★ 354 · last commit 2026-05-01

Primitive shape

No installable primitives

Summary

Pydantic AI Harness (pip: pydantic-ai-harness) is the official capability library for Pydantic AI, maintained by the Pydantic team. Its stated purpose is "batteries for your Pydantic AI agent": standalone building blocks (capabilities, lifecycle hooks, tool bundles) that extend what a Pydantic AI Agent can do without framework changes. The first shipped capability is CodeMode, a run_code tool that exposes all of the agent's tools as functions inside a Python sandbox powered by Monty (pydantic-monty). The model writes Python that calls multiple tools with loops, conditionals, and asyncio.gather, reducing N tool calls to one model round-trip. The capability matrix tracks 30+ planned capabilities (filesystem, shell, memory, session persistence, sub-agents, skills, input/output guardrails, secret masking, approval workflows, stuck loop detection, etc.), most of which are in active PRs as of v0.0.x.

Among the seed frameworks, Pydantic AI Harness is closest to spec-kit, since both are extension packs for an existing tool. The difference is the host: spec-kit extends Claude Code (an AI coding tool), while pydantic-ai-harness extends Pydantic AI (a Python agent library), which makes it a pip extension pack rather than an IDE plugin.

Overview

Origin

Developed and maintained by the Pydantic AI team (Pydantic, Inc). Douwe Maan, David SF, and Aditya Vardhan are listed as authors. It is the official companion library for the Pydantic AI framework.

Philosophy

From the README:

"Pydantic AI's capabilities and hooks API is how you give an agent its harness — bundles of tools, lifecycle hooks, instructions, and model settings that extend what the agent can do without any framework changes."

"Pydantic AI Harness is the official capability library for Pydantic AI, maintained by the Pydantic AI team. Pydantic AI core ships capabilities that require model or framework support, and capabilities fundamental to every agent — web search, tool search, thinking. Everything else lives here: standalone building blocks you pick and choose to turn your agent into a coding agent, a research assistant, or anything else."

Graduated Capabilities Model

From README: "This is also where new capabilities start — as they stabilize and prove themselves broadly essential, they can graduate into core."

This is an explicit capability graduation pipeline: pydantic-ai-harness → pydantic-ai core.

CodeMode: Flagship Feature

The only shipped capability as of v0.0.x. From README:

"Wraps every tool into a single run_code tool, sandboxed by Monty... The model writes Python that calls multiple tools with loops, conditionals, asyncio.gather, and local filtering — one model round-trip for N tool calls."

Community Package Endorsement

The README explicitly endorses community packages by vstorm-co (pydantic-ai-backend, pydantic-ai-shields, pydantic-ai-skills, etc.) as interim solutions while official capabilities are in development.

Architecture

Package Structure

pydantic-ai-harness/
├── pydantic_ai_harness/
│   ├── __init__.py            # Public API surface
│   └── code_mode/
│       ├── _capability.py     # CodeMode dataclass (AbstractCapability)
│       ├── _toolset.py        # CodeModeToolset wrapping logic
│       └── README.md
├── .agents/                   # Agent definition files
├── .claude/                   # Claude Code integration files
├── AGENTS.md                  # Agent instructions
├── CLAUDE.md                  # Claude Code instructions
├── Makefile
├── tests/
└── pyproject.toml

Dependency Model

Required: pydantic-ai-slim>=1.95.1, the minimal Pydantic AI core without optional provider dependencies
Optional extras:
- [code-mode] / [codemode]: adds pydantic-monty (the Monty sandbox engine)
- [temporal]: Temporal durable execution integration
- [dbos]: DBOS durable execution integration
Versioning: uv-dynamic-versioning (no pinned version in pyproject.toml)
Python requirement: >=3.10

AbstractCapability Contract

All capabilities implement AbstractCapability[AgentDepsT]:

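from typing import Any, Generic, TypeVar
AgentDepsT = TypeVar("AgentDepsT")
class AbstractCapability(Generic[AgentDepsT]):  # simplified shape, not the full class
    def get_ordering(self) -> "CapabilityOrdering": ...  # where this capability sits in the stack
    def get_wrapper_toolset(self, toolset: Any) -> Any: ...  # returns a toolset wrapping `toolset`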

CapabilityOrdering controls composition position: position='outermost' means CodeMode wraps all other tools. The wraps field lists capabilities that CodeMode can wrap (e.g., _ToolSearch).
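The ordering contract can be pictured with a small stand-in. This is a hypothetical sketch: `Ordering`, `Capability`, and `resolve_order` mirror the description above (a `position` plus a `wraps` list) rather than the real `CapabilityOrdering` API, and the single-outermost check is an assumed rule.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass
class Ordering:
    """Hypothetical stand-in for CapabilityOrdering (not the real class)."""
    position: Literal["outermost", "inner"] = "inner"
    wraps: list[str] = field(default_factory=list)  # recorded only; not enforced here


@dataclass
class Capability:
    name: str
    ordering: Ordering


def resolve_order(caps: list[Capability]) -> list[str]:
    """Return capability names from innermost to outermost wrapper.

    Inner capabilities keep their declared order; the 'outermost' one goes last,
    so it ends up wrapping everything applied before it.
    """
    outer = [c for c in caps if c.ordering.position == "outermost"]
    if len(outer) > 1:  # assumed rule for this sketch
        raise ValueError("only one capability may claim position='outermost'")
    inner = [c for c in caps if c.ordering.position != "outermost"]
    return [c.name for c in inner + outer]


caps = [
    Capability("CodeMode", Ordering("outermost", wraps=["_ToolSearch"])),
    Capability("_ToolSearch", Ordering()),
]
print(resolve_order(caps))  # ['_ToolSearch', 'CodeMode']
```

Declaring CodeMode first does not matter: its position, not list order, puts it on the outside.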

CodeMode Architecture

CodeMode wraps a toolset behind a single run_code tool:

Model receives run_code(code: str) as its only tool
Submitted code runs in a Monty sandbox (pydantic-monty)
Sandbox exposes all original tools as Python-callable functions
Model writes Python that calls tools with loops, conditionals, asyncio.gather
Result returned to model in a single round-trip

This reduces N sequential tool calls to 1 model round-trip.
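The flow above can be simulated in a few lines of plain Python to show why one script replaces several round-trips. This sketch only illustrates the control flow: it uses `exec()`, which provides no isolation at all, whereas CodeMode delegates execution to Monty. The `run_code` signature here and the `get_price`/`get_stock` tools are hypothetical.

```python
import asyncio
import textwrap
from typing import Any, Awaitable, Callable

Tool = Callable[..., Awaitable[Any]]


async def run_code(code: str, tools: dict[str, Tool]) -> Any:
    """Toy stand-in for run_code.

    WARNING: plain exec() is NOT a sandbox; Monty's isolation is not reproduced.
    """
    # Wrap the submitted script in an async function so it may use await/return.
    src = "async def __script__():\n" + textwrap.indent(code, "    ")
    namespace: dict[str, Any] = {"asyncio": asyncio, **tools}
    exec(src, namespace)
    return await namespace["__script__"]()


# Two hypothetical tools standing in for the agent's real tools.
async def get_price(item: str) -> float:
    await asyncio.sleep(0)  # pretend I/O
    return {"apple": 1.5, "pear": 2.0}.get(item, 0.0)


async def get_stock(item: str) -> int:
    await asyncio.sleep(0)
    return {"apple": 10, "pear": 0}.get(item, 0)


# The "model" submits one script: four tool calls plus local filtering.
script = """
items = ["apple", "pear"]
prices = await asyncio.gather(*[get_price(i) for i in items])
stock = await asyncio.gather(*[get_stock(i) for i in items])
return [i for i, p, s in zip(items, prices, stock) if s > 0 and p < 2]
"""

result = asyncio.run(run_code(script, {"get_price": get_price, "get_stock": get_stock}))
print(result)  # ['apple']
```

Without this pattern, the same task would cost the model one round-trip per tool call before it could filter the results itself.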

Capability Graduation Pipeline

pydantic-ai-harness (incubator)
       ↓ (stabilize + prove essential)
pydantic-ai core (shipped)

Examples already in core (from README): web_search, tool_search, thinking. Everything else starts in harness.

Agent Integration Point

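from pydantic_ai import Agent
from pydantic_ai_harness import CodeMode

# file_read, shell_run and web_fetch stand for your own tool functions (illustrative names)
agent = Agent(
    "openai:gpt-4o",
    capabilities=[CodeMode()],  # CodeMode collapses the tools below into run_code
    tools=[file_read, shell_run, web_fetch],
)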

Capabilities are passed at Agent() construction time — no framework changes required.

Components

Shipped Capabilities (v0.0.x)

CodeMode

Class: pydantic_ai_harness.code_mode.CodeMode
Dataclass fields:
- tools: ToolSelector[AgentDepsT] = 'all' — which tools to expose inside the sandbox
- max_retries: int = 3 — retry limit for the code execution loop
Mechanism: wraps toolset via CodeModeToolset, which replaces the agent's tool surface with a single run_code tool backed by pydantic-monty
Ordering constraint: position='outermost' — always the outermost capability wrapper
Wraps: _ToolSearch capability (when present)

Planned Capabilities (Active PRs, v0.0.x → v0.1+)

From the capability matrix in README:

Category	Capability
Filesystem	read, write, edit, search
Shell	exec, background, pipe
Memory	short-term, episodic, semantic
Session	persistence, handoff, restore
Sub-agents	spawn, delegate, fan-out
Skills	definition, discovery, composition
Guardrails	input/output, content policy
Secret masking	PII, credentials, env var scrubbing
Approval	human-in-the-loop gates
Stuck loop	detection, recovery
Tool budget	rate limiting, cost tracking
Adaptive reasoning	chain-of-thought, reflection

30+ capabilities total planned.

ToolSelector

ToolSelector[AgentDepsT] is a parameterized type for selecting which tools CodeMode exposes inside the sandbox. Default is 'all'.
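The members of `ToolSelector` are not spelled out here beyond the `'all'` default. As an assumption, the sketch below treats a selector as `'all'`, a collection of tool names, or a predicate over names, and shows how a CodeMode-like wrapper might filter its tools with it; `select_tools` is a hypothetical helper.

```python
from typing import Callable, Iterable, Literal, Sequence, Union

# Assumed shape: the real ToolSelector[AgentDepsT] may differ (e.g. predicates may see deps).
ToolSelector = Union[Literal["all"], Sequence[str], Callable[[str], bool]]


def select_tools(selector: ToolSelector, tool_names: Iterable[str]) -> list[str]:
    """Return the tool names a CodeMode-like wrapper would expose in the sandbox."""
    names = list(tool_names)
    if selector == "all":
        return names
    if callable(selector):
        return [n for n in names if selector(n)]
    wanted = set(selector)
    return [n for n in names if n in wanted]


tools = ["file_read", "shell_run", "web_fetch"]
print(select_tools("all", tools))                                 # ['file_read', 'shell_run', 'web_fetch']
print(select_tools(["file_read", "web_fetch"], tools))            # ['file_read', 'web_fetch']
print(select_tools(lambda n: not n.startswith("shell"), tools))   # ['file_read', 'web_fetch']
```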

Community Packages (Endorsed in README)

The README explicitly endorses vstorm-co packages as interim solutions:

pydantic-ai-backend — model backend adapters
pydantic-ai-shields — guardrails/content moderation
pydantic-ai-skills — skill definitions

These are not part of pydantic-ai-harness but are listed as recommended third-party extensions while official capabilities are in development.

Hooks / Lifecycle

No lifecycle hooks shipped yet in v0.0.x. The .claude/ and .agents/ directories in the repo itself (for development purposes) use hooks, but these are not part of the distributed package API.

pydantic-monty (Dependency)

The Monty sandbox is a separate pip package (pydantic-monty) that provides the code execution environment for CodeMode. It is installed only with the [code-mode] (alias [codemode]) extra.
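Because the sandbox only arrives with the extra, an application might check for it before enabling CodeMode. This is a generic stdlib guard pattern, not something the harness is documented to provide, and it assumes pydantic-monty installs an importable module named `pydantic_monty`.

```python
import importlib.util


def extra_available(module_name: str) -> bool:
    """True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: pydantic-monty's import name is `pydantic_monty`.
if not extra_available("pydantic_monty"):
    print('CodeMode needs the sandbox: pip install "pydantic-ai-harness[code-mode]"')
```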

Prompts

Prompt Injection Pattern

Capabilities can inject instructions into the agent's system prompt via the AbstractCapability interface. CodeMode injects a description of the run_code tool and usage guidelines.

No prompt templates are shipped as separate files in v0.0.x. Prompt content lives inside capability implementations.
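Since prompt content lives inside capability code, one plausible approach is to render the wrapped tools' signatures into the `run_code` tool description. The helper `build_run_code_description` and its wording are hypothetical, not CodeMode's actual prompt text.

```python
import inspect
from typing import Any, Callable


def build_run_code_description(tools: dict[str, Callable[..., Any]]) -> str:
    """Render a run_code description that lists each wrapped tool as an async function."""
    lines = [
        "Run Python code. The following async functions are available;",
        "use loops, conditionals and asyncio.gather to combine them:",
    ]
    for name, fn in tools.items():
        sig = inspect.signature(fn)
        doc = inspect.getdoc(fn) or ""
        summary = doc.splitlines()[0] if doc else "(no description)"
        lines.append(f"- async def {name}{sig}: {summary}")
    return "\n".join(lines)


async def file_read(path: str) -> str:
    """Read a text file."""
    ...


print(build_run_code_description({"file_read": file_read}))
```

Generating the text from signatures keeps the prompt in sync with whatever tools the selector exposes.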

CodeMode Prompt Injection (Inferred from Architecture)

When CodeMode is active, the model is told:

It has access to a single run_code(code: str) tool
The code runs in a Python sandbox with all original tools available as async functions
It should use loops, conditionals, and asyncio.gather for parallel execution
max_retries governs how many times failed code can be resubmitted
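The `max_retries` behaviour can be modelled as a loop that feeds each error back to the model and asks for corrected code. In this hypothetical sketch `ask_model` and `execute` are stand-in callables, `eval` plays the sandbox, and `max_retries` is assumed to count resubmissions after the first attempt; CodeMode's exact semantics may differ.

```python
from typing import Any, Callable, Optional


def run_with_retries(
    ask_model: Callable[[Optional[str]], str],  # takes the last error (or None), returns code
    execute: Callable[[str], Any],              # runs code, raises on failure
    max_retries: int = 3,
) -> Any:
    """Run the model's code; on failure, resubmit up to max_retries more times."""
    error: Optional[str] = None
    for attempt in range(max_retries + 1):
        code = ask_model(error)
        try:
            return execute(code)
        except Exception as exc:  # any failure is reported back to the model
            error = f"attempt {attempt + 1} failed: {exc!r}"
    raise RuntimeError(f"gave up after {max_retries} retries; last error: {error}")


# Fake model: the first answer has a bug, the corrected answer works.
answers = iter(["1 / 0", "6 * 7"])
print(run_with_retries(lambda err: next(answers), lambda code: eval(code)))  # 42
```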

AGENTS.md / CLAUDE.md (Development-Time, Not Runtime)

The repo ships AGENTS.md and CLAUDE.md at the root. These are for contributors using Claude Code or other AI coding assistants to work on the harness itself. They are NOT injected at runtime into agents built with pydantic-ai-harness.

Capability Instructions Interface

From the graduated capability model, each capability contributes:

Tool definitions — added to the agent's tool surface (or replaced by CodeMode)
System prompt fragments — capability-specific instructions
Lifecycle hooks — pre/post tool call handlers

The exact prompt format for each planned capability is not yet defined in v0.0.x source.
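The three contributions can be modelled as a plain bundle that is merged when the agent is built. Everything in this sketch (`CapabilityBundle`, `merge_bundles`, the field names, and the last-one-wins rule for duplicate tool names) is hypothetical and only illustrates the tools/instructions/hooks split.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Hook = Callable[[str, dict], None]  # (tool_name, args) -> None


@dataclass
class CapabilityBundle:
    tools: dict[str, Callable[..., Any]] = field(default_factory=dict)
    instructions: list[str] = field(default_factory=list)
    before_tool: list[Hook] = field(default_factory=list)


def merge_bundles(bundles: list[CapabilityBundle]) -> CapabilityBundle:
    """Combine bundles in order; a later tool with the same name replaces an earlier one."""
    merged = CapabilityBundle()
    for b in bundles:
        merged.tools.update(b.tools)
        merged.instructions.extend(b.instructions)
        merged.before_tool.extend(b.before_tool)
    return merged


fs = CapabilityBundle(
    tools={"file_read": lambda p: f"<contents of {p}>"},
    instructions=["Paths are relative to the project root."],
)
audit = CapabilityBundle(before_tool=[lambda name, args: print("calling", name, args)])
combined = merge_bundles([fs, audit])
print(sorted(combined.tools), len(combined.instructions), len(combined.before_tool))
# ['file_read'] 1 1
```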

No External Prompt Files

Unlike many frameworks in this batch (chorus, oh-my-agent, ai-toolkit-uniswap), pydantic-ai-harness does NOT use .md files in a .agents/ or .claude/ folder as runtime agent configuration. The .agents/ directory in the repo is for development-time AI tooling only.

Uniqueness

Differs from Seeds

pydantic-ai-harness is closest to spec-kit in being an extension pack for an existing framework, but it extends Pydantic AI (a Python agent library) rather than Claude Code (an AI coding tool), which makes it a pip extension pack rather than an IDE plugin. None of the 11 seeds use a capability graduation pipeline (harness → core), and none implement a "CodeMode" sandbox in which all tools collapse into a single run_code tool that executes Python in a sandboxed environment. On tool bundling specifically, the closest seed is superpowers (Claude Code tool bundles), but superpowers extends a CLI rather than a Python library API.

Distinctive Position

Only official capability extension library for a Python agent framework — maintained by the framework team itself (Pydantic, Inc), not a third-party wrapper
Capability graduation pipeline — explicit mechanism for features to move from harness to core as they prove essential; no other framework in this batch has this two-tier model
CodeMode / run_code sandbox — collapsing N tools to 1 tool backed by sandboxed Python execution (via pydantic-monty) is a unique primitive; no other batch-28 framework does this
30+ planned capabilities in active PRs — the most ambitious capability roadmap in this batch; ships nearly empty but with a full public backlog
Community package endorsement — the README explicitly names and endorses vstorm-co packages as interim solutions; unusual transparency about what's not yet built
[temporal] and [dbos] extras — signals a durable execution roadmap (process-restart-safe agents) that no other pip-package harness in this batch addresses

Explicit Anti-Patterns / Design Constraints

Not a standalone agent framework — requires Pydantic AI as the runtime
No CLI, no UI, no scaffold: pure library, no project ceremony
No bundled prompts or templates — capabilities inject their own instructions
Won't add anything that belongs in pydantic-ai core (web_search, tool_search, thinking already graduated)

Observable Failure Modes

v0.0.x with nearly empty feature set — risk of abandonment before roadmap executes
Stars (354) are modest for an "official" library — may be pre-announcement
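Dynamic versioning derives the version from git tags, so pyproject.toml carries no version string; published releases can still be pinned, but v0.0.x offers no API stability guarantee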
pydantic-monty is a separate dependency with its own stability risks
Deep dependency on pydantic-ai-slim>=1.95.1 — tightly coupled to a rapidly evolving upstream

Inspired By

Pydantic AI framework ecosystem. The "batteries for your agent" positioning mirrors Python's "batteries included" philosophy applied to agent capabilities.

Workflow

Installation

pip install "pydantic-ai-harness[code-mode]"

No CLI binary. No init command. No project scaffold. Pure Python library.

Basic Usage (Quick-Start from README)

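import asyncio
from pydantic_ai import Agent
from pydantic_ai_harness import CodeMode

# file_read, shell_run and web_fetch stand for your own tool functions (illustrative names)
agent = Agent(
    "openai:gpt-4o",
    capabilities=[CodeMode()],
    tools=[file_read, shell_run, web_fetch],
)

async def main() -> None:
    # agent.run is a coroutine, so it must be awaited inside an async function
    result = await agent.run("summarise all markdown files in the repo")
    print(result.output)

asyncio.run(main())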

CodeMode Execution Flow

Agent initializes with capabilities=[CodeMode()]
CodeMode replaces the agent's tool surface with a single run_code tool
Model receives: "you have one tool: run_code(code: str)"

Model writes Python:

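import asyncio  # needed for asyncio.gather; summarise is a hypothetical second tool
files = await file_read("*.md")
summaries = await asyncio.gather(*[summarise(f) for f in files])
return "\n".join(summaries)  # assumes the script body runs like an async function body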

Monty sandbox executes the Python, calling original tools as async functions
Results returned to model in one round-trip

"Ecosystem Agent" Pattern (from README)

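from pydantic_ai import Agent
# Filesystem, Shell and Memory are planned capabilities, not importable in v0.0.x
from pydantic_ai_harness import CodeMode, Filesystem, Shell, Memory

coding_agent = Agent(
    "anthropic:claude-opus-4",
    capabilities=[
        CodeMode(),  # outermost: wraps the capabilities below
        Filesystem(root="."),
        Shell(allow=["git", "pytest", "ruff"]),
        Memory(backend="sqlite"),
    ],
)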

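This pattern (multiple capabilities stacked, with CodeMode wrapping everything else) is the intended production usage. Only CodeMode ships in v0.0.x; Filesystem, Shell, and Memory are planned capabilities, so this example shows the target API rather than code that runs today.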

Capability Ordering Rules

CodeMode must be position='outermost'
Other capabilities compose inside CodeMode's sandbox
Ordering is enforced at Agent construction time via CapabilityOrdering

Development Workflow (Repo-Level)

The repo itself ships .claude/ and .agents/ directories, so contributors develop the harness with AI coding assistants such as Claude Code (the harness itself extends Pydantic AI, not Claude Code). The Makefile has targets for testing, linting, and type-checking.

Versioning and Stability

Uses uv-dynamic-versioning — version derived from git tags
Currently at v0.0.x — pre-1.0, breaking changes expected
Capability graduation pipeline: capabilities move to pydantic-ai core when proven essential

Memory & Context

Current State (v0.0.x)

No memory capabilities are shipped in the initial release. The capability matrix lists planned memory types:

Short-term memory — in-context conversation history (planned)
Episodic memory — session-based recall (planned)
Semantic memory — vector/embedding-based retrieval (planned)

Context Window Management

Context management is handled by pydantic-ai core, not by the harness. The harness focuses on what tools the agent can call, not how the model's context is managed.

CodeMode reduces context consumption indirectly: by batching N tool calls into one Python script execution, fewer round-trips occur, which means fewer tool-call/tool-result pairs accumulate in the context window.
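A rough count makes the saving concrete. The simplified model below assumes one tool call per model round-trip without CodeMode, and counts one tool-call and one tool-result message per round-trip, ignoring message sizes and the final answer; models that emit parallel tool calls would narrow the gap.

```python
def history_growth(n_tool_calls: int, code_mode: bool) -> dict[str, int]:
    """Messages added to history for one task (simplified model, sizes ignored)."""
    # With CodeMode all N calls happen inside a single run_code round-trip.
    round_trips = 1 if code_mode else n_tool_calls
    # Each round-trip contributes one tool-call message and one tool-result message.
    return {"round_trips": round_trips, "messages": 2 * round_trips}


print(history_growth(5, code_mode=False))  # {'round_trips': 5, 'messages': 10}
print(history_growth(5, code_mode=True))   # {'round_trips': 1, 'messages': 2}
```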

Planned Memory Architecture

From the capability matrix:

Memory(backend="sqlite") — SQLite-backed episodic memory (shown in ecosystem agent example)
Memory(backend="redis") — Redis-backed for persistent/shared memory
Semantic search over past tool results
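To make `Memory(backend="sqlite")` concrete, here is a stdlib-only sketch of SQLite-backed episodic memory. The `EpisodicMemory` class, its schema, and its keyword search are hypothetical; the planned capability's API and retrieval method are not yet defined.

```python
import sqlite3


class EpisodicMemory:
    """Hypothetical SQLite episode store: remember() writes, recall() does keyword search."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS episodes ("
            "id INTEGER PRIMARY KEY, session TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def remember(self, session: str, content: str) -> None:
        with self.db:  # the connection context manager commits the insert
            self.db.execute(
                "INSERT INTO episodes (session, content) VALUES (?, ?)", (session, content)
            )

    def recall(self, keyword: str, limit: int = 5) -> list[str]:
        # Plain substring match, newest first; semantic memory would use embeddings instead.
        rows = self.db.execute(
            "SELECT content FROM episodes WHERE content LIKE ? ORDER BY id DESC LIMIT ?",
            (f"%{keyword}%", limit),
        )
        return [r[0] for r in rows]


mem = EpisodicMemory()
mem.remember("s1", "User prefers ruff over flake8")
mem.remember("s2", "Tests live in tests/")
print(mem.recall("ruff"))  # ['User prefers ruff over flake8']
```

Passing a file path instead of `":memory:"` would let episodes survive process restarts.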

Session Persistence

Planned capability: SessionPersistence — allows agent state to survive across process restarts. Architecture not yet defined in v0.0.x source.

State in CodeMode Sandbox

Within a single CodeMode invocation, the Python code executed in Monty has access to:

All tool return values (as Python objects)
Local variables defined in the script
No persistence across separate run_code calls within the same turn

Cross-turn memory requires an explicit memory capability (planned).

Extras for Durable Execution

The [temporal] and [dbos] extras suggest that long-horizon, durable execution (with state checkpointing) is planned for capabilities that survive process restarts — analogous to hankweave's codon checkpointing but at the Python package level.
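The core idea of durable execution can be shown without either integration: record each completed step's result, and on restart return the recorded result instead of re-running the step. This sketch uses a JSON file and a hypothetical `durable_step` helper; it is not the Temporal or DBOS API, it requires JSON-serializable results, and it does not write atomically as a production engine would.

```python
import json
import os
import tempfile
from typing import Any, Callable


def durable_step(log_path: str, step_id: str, fn: Callable[[], Any]) -> Any:
    """Run fn once per step_id; later calls (e.g. after a restart) reuse the stored result."""
    log: dict[str, Any] = {}
    if os.path.exists(log_path):
        with open(log_path) as f:
            log = json.load(f)
    if step_id in log:
        return log[step_id]  # completed before the "crash": skip re-execution
    result = fn()
    log[step_id] = result
    with open(log_path, "w") as f:
        json.dump(log, f)  # checkpoint after every completed step
    return result


calls: list[str] = []


def fetch_page() -> str:
    calls.append("fetch")  # counts real executions
    return "page contents"


path = os.path.join(tempfile.mkdtemp(), "checkpoints.json")
durable_step(path, "fetch", fetch_page)
durable_step(path, "fetch", fetch_page)  # simulated restart: result comes from the log
print(calls)  # ['fetch']
```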

Orchestration

Single-Agent, Capability-Augmented

pydantic-ai-harness is not an orchestration framework. It augments a single Agent instance with capability bundles. There is no built-in multi-agent coordination in v0.0.x.

Planned Sub-Agent Capabilities

The capability matrix lists:

SubAgent(spawn=...) — spin up child agents
Delegate(to=...) — hand tasks to specialized agents
FanOut(agents=[...]) — parallel sub-agent execution

These are planned, not shipped. When shipped, they will integrate with Pydantic AI's native sub-agent support.
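Until `FanOut` ships, the parallel sub-agent pattern can be approximated with plain asyncio by treating each sub-agent as an async callable. `fan_out` and the two stand-in agents are hypothetical; with Pydantic AI, each callable could wrap a call such as `await sub_agent.run(task)`.

```python
import asyncio
from typing import Awaitable, Callable

SubAgent = Callable[[str], Awaitable[str]]


async def fan_out(task: str, agents: dict[str, SubAgent]) -> dict[str, str]:
    """Send the same task to every sub-agent concurrently and collect answers by name."""
    names = list(agents)
    answers = await asyncio.gather(*(agents[n](task) for n in names))
    return dict(zip(names, answers))  # gather preserves input order


async def researcher(task: str) -> str:
    await asyncio.sleep(0.01)  # pretend model latency
    return f"notes on {task}"


async def critic(task: str) -> str:
    await asyncio.sleep(0.01)
    return f"risks of {task}"


print(asyncio.run(fan_out("CodeMode", {"researcher": researcher, "critic": critic})))
# {'researcher': 'notes on CodeMode', 'critic': 'risks of CodeMode'}
```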

CodeMode as Implicit Orchestration

CodeMode provides a form of tool orchestration without multiple agents: the model writes Python that calls N tools in parallel (asyncio.gather) within a single sandbox execution. This is "orchestration within a round-trip" rather than "orchestration across agents."

Example:

import asyncio

# Model writes this inside run_code: N parallel tool calls, 1 round-trip
results = await asyncio.gather(
    web_fetch("https://example.com"),
    file_read("README.md"),
    shell_run("git log --oneline -10"),
)

Capability Composition Model

Capabilities compose via the CapabilityOrdering system:

position='outermost' → wraps everything else (CodeMode)
position='inner' → wrapped by outer capabilities
wraps=[...] → explicit dependency declaration

The agent constructor resolves ordering at initialization time. This is the harness's answer to orchestration: compositional capability stacking, not agent spawning.
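Outermost-wraps-inner composition behaves like middleware: each capability receives the toolset built so far and returns a wrapped one, so the layer applied last sees every call first. The sketch models a toolset as a single call function; `compose`, `logging_layer`, and `code_mode_layer` are hypothetical and are not the harness's `get_wrapper_toolset` API (a real CodeMode layer would expose `run_code` instead of passing calls through).

```python
from typing import Any, Callable

Toolset = Callable[[str, dict], Any]  # (tool_name, args) -> result
trace: list[str] = []


def base_toolset(name: str, args: dict) -> Any:
    return f"{name}({args})"


def logging_layer(inner: Toolset) -> Toolset:
    def call(name: str, args: dict) -> Any:
        trace.append(f"log:{name}")
        return inner(name, args)
    return call


def code_mode_layer(inner: Toolset) -> Toolset:
    def call(name: str, args: dict) -> Any:
        trace.append("code_mode")  # outermost: sees every call before other layers
        return inner(name, args)
    return call


def compose(base: Toolset, layers_inner_to_outer: list[Callable[[Toolset], Toolset]]) -> Toolset:
    """Apply wrappers in order, so the last one in the list ends up outermost."""
    toolset = base
    for wrap in layers_inner_to_outer:
        toolset = wrap(toolset)
    return toolset


toolset = compose(base_toolset, [logging_layer, code_mode_layer])
print(toolset("file_read", {"path": "README.md"}), trace)
# file_read({'path': 'README.md'}) ['code_mode', 'log:file_read']
```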

Lifecycle Hooks (Planned)

The README mentions hooks as a core part of the harness API ("lifecycle hooks" listed alongside tools and instructions). No hook interface is shipped in v0.0.x. When added, hooks will fire pre/post tool call and at agent turn boundaries.
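In the meantime, pre/post tool-call hooks can be approximated by wrapping an async tool function. `with_hooks` and the `mask_secrets` post-hook below are hypothetical names, not the planned hook interface; the post-hook simply echoes the secret-masking capability from the matrix.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable


def with_hooks(
    tool: Callable[..., Awaitable[Any]],
    before: Callable[[str, tuple, dict], None],
    after: Callable[[str, Any], Any],
) -> Callable[..., Awaitable[Any]]:
    """Wrap an async tool so `before` sees each call and `after` may rewrite the result."""
    @functools.wraps(tool)
    async def wrapped(*args: Any, **kwargs: Any) -> Any:
        before(tool.__name__, args, kwargs)
        result = await tool(*args, **kwargs)
        return after(tool.__name__, result)
    return wrapped


async def shell_run(cmd: str) -> str:
    return f"ran {cmd} with TOKEN=abc123"


def mask_secrets(name: str, result: Any) -> Any:
    # Post-call hook: scrub a known secret from string results.
    return result.replace("abc123", "***") if isinstance(result, str) else result


guarded = with_hooks(shell_run, lambda n, a, k: print("->", n, a), mask_secrets)
print(asyncio.run(guarded("git status")))  # ran git status with TOKEN=***
```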

Integration with pydantic-ai Core

pydantic-ai core ships: web_search, tool_search, thinking. The harness adds everything else. The two-layer model means orchestration primitives that prove essential will graduate to core, keeping the harness lean.

UI/CLI Surface

CLI Binary

None. pydantic-ai-harness is a pure Python library with no command-line entry point of any kind.

Installation:

pip install pydantic-ai-harness
pip install "pydantic-ai-harness[code-mode]"  # with CodeMode sandbox; quotes stop shells like zsh from globbing the brackets

No Init/Scaffold Command

Unlike deepagents-langchain (deepagents init), oh-my-agent (oma init), or water (water init), there is no project scaffolding command. Usage starts with import in Python.

No Local UI

No web dashboard, no terminal TUI, no port exposed. Observability relies on Pydantic AI's built-in logging/tracing or external integrations.

IDE Integration

The repo ships .claude/ and .agents/ directories for contributors using Claude Code or AI coding assistants to work on the harness source code itself. These are development conveniences, not runtime IDE integrations for users of the library.

Makefile Targets (Development)

make test
make lint
make typecheck

Standard Python project development commands. Not relevant to end-users of the library.

API Surface (User-Facing)

The public interface is Python import:

from pydantic_ai_harness import CodeMode
from pydantic_ai_harness.code_mode import CodeModeToolset

The __init__.py exports the top-level capability classes. Individual capability subpackages (e.g., pydantic_ai_harness.filesystem) will be added as capabilities are developed.

Community Extras (vstorm-co)

For users who need more UI/tooling now, the README endorses:

pydantic-ai-backend — provider backends
pydantic-ai-shields — guardrails
pydantic-ai-skills — skill collections

These are third-party packages with their own UI/surface areas.

Related frameworks

same archetype · same primary tool · same memory type

Superpowers Marketplace ★ 1.0k

A99 Unclassified

Single-endpoint Claude Code plugin marketplace for the superpowers plugin ecosystem.

codex-spec (shenli) ★ 45

A99 Unclassified

Automates the spec-to-code pipeline for OpenAI Codex by generating specifications, requirements, plans, and tasks from natural…

backgrounder.dev

A99 Unclassified

Hosted background coding agent interface (closed SaaS — insufficient public material for full analysis).

oh-my-claudecode (mazenyassergithub) ★ 5

A99 Unclassified

Claims multi-agent orchestration with 28 agents and 28 skills but README content is SEO-style with ZIP download instructions…

Anthropic Claude Plugins Official ★ 0

A99 Unclassified

Repo not found — no public content to analyze.

Distribution

Type: pip-package
License: MIT
Install: one-liner
Version: dynamic (uv-dynamic-versioning, v0.0.x)

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: No

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 0
MCP servers: 0
MCP tools: 0
Scripts: 0
Templates: 0

Workflow

Phases: 6
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: No
Pattern: single-agent
Max concurrent: 1
Isolation: none
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: No
BYOK: Yes
Modal: text

Execution

Mode: in-process
Crash recovery: No
Compaction: No
Session handoff: No
Streaming: Yes

Memory

Type: none
Persistence: none
Search: none

Quality

TDD: No
TDD mechanism: none
Self-review: none

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: No
Audit format: none
Replay: No

Tools

Primary: Pydantic AI
Targets: 1
Portability: low

Signals

Stars: 354
Last commit: 2026-05-01
Contributors: 3
Maintainer: active
Quality score: 0/10