NeMo Agent Toolkit

nemo-agent-toolkit · NVIDIA/NeMo-Agent-Toolkit · ★ 2.3k · last commit 2026-05-26

Primitive shape 11 total

Skills 10 MCP tools 1

Summary

NVIDIA NeMo Agent Toolkit — Summary

NVIDIA NeMo Agent Toolkit (nvidia-nat, 2.3k stars, Apache-2.0) is a Python SDK and CLI (nat) for adding enterprise-grade instrumentation, observability, profiling, optimization, and fine-tuning to AI agents. It is positioned as a framework-agnostic "intelligence layer" that works alongside LangChain, LlamaIndex, CrewAI, Microsoft Semantic Kernel, Google ADK, and custom Python agents. The core workflow is YAML-defined: a workflow.yml specifies functions (tools), llms (model config), and a workflow (an agent type such as react_agent), and nat run --config_file workflow.yml --input "..." executes it. The toolkit ships 10 Claude Code skills for agent-assisted development (workflow creation, installation, evaluation, optimization, MCP/serving, tools, telemetry, path checks, user rules, skill evolution), a nat CLI with telemetry management and component discovery commands, a built-in Chat UI, a profiling, evaluation, optimization, and fine-tuning pipeline, Agent Performance Primitives (APP) for LangChain/CrewAI/Agno acceleration, FastMCP server publishing, LangSmith native integration, and NVIDIA Dynamo runtime integration. The monorepo has 30+ optional package extras. None of the seeds provides a comparable observability layer: NeMo Agent Toolkit is the only framework in this batch whose primary value is measuring, profiling, and improving agents rather than defining or running them.

Overview

NVIDIA NeMo Agent Toolkit — Overview

Origin

Developed by NVIDIA Corporation and maintained by NVIDIA's AI research team. The version analyzed is v1.5.0 (a migration guide is available for v1.5.0+). Available on PyPI as nvidia-nat. The package is an evolution of NVIDIA's earlier NeMo microservices work, now focused on agent instrumentation.

Philosophy

From the README:

"NVIDIA NeMo Agent Toolkit adds intelligence to AI agents across any framework—enhancing speed, accuracy, and decision-making through enterprise-grade instrumentation, observability, and continuous learning."

The toolkit does not want to be your agent framework. It wants to be the observability and optimization layer on top of your existing agent framework. This is a fundamentally different position from every other framework in this batch.

Design pillars

Framework agnostic: works alongside LangChain, LlamaIndex, CrewAI, Semantic Kernel, ADK, custom Python
YAML-first workflow definition: workflow.yml → nat run — no code for simple agents
Profiling: token-level execution profiling to find bottlenecks
Evaluation: offline eval harness with LangSmith native integration
Optimization: hyper-parameter + prompt optimizer
Fine-tuning: RL-based fine-tuning of LLMs for specific agent behaviors
Agent Performance Primitives (APP): parallel execution, speculative branching, node-level priority routing for graph-based frameworks
NVIDIA Dynamo: runtime inference hints (cache control, load-aware routing, priority-aware serving)

10 Claude Code skills

The toolkit ships 10 skills for agent-assisted development of NeMo workflows — one of the largest skills collections in this batch.

Key quote

"With NeMo Agent Toolkit, you can move quickly, experiment freely, and ensure reliability across all your agent-driven projects."

Architecture

NVIDIA NeMo Agent Toolkit — Architecture

Distribution

Type: pip-package
PyPI name: nvidia-nat
Version: analyzed from develop branch (v1.5.x range)
Install: pip install nvidia-nat
Required runtime: Python 3.11–3.13
License: Apache-2.0

Monorepo packages (selected from `packages/`)

Package	Purpose
`nvidia_nat_core`	Core framework, CLI (`nat`)
`nvidia_nat_langchain`	LangChain/LangGraph integration
`nvidia_nat_crewai`	CrewAI integration
`nvidia_nat_agno`	Agno integration
`nvidia_nat_adk`	Google ADK integration
`nvidia_nat_autogen`	AutoGen integration
`nvidia_nat_eval`	Evaluation harness
`nvidia_nat_fastmcp`	FastMCP server publishing
`nvidia_nat_mcp`	MCP client integration
`nvidia_nat_app`	Agent Performance Primitives
`nvidia_nat_a2a`	A2A protocol
`nvidia_nat_config_optimizer`	Hyper-parameter optimizer
`nvidia_nat_data_flywheel`	Data flywheel for fine-tuning
`nvidia_nat_langsmith`	LangSmith native integration

CLI binary

nat — entry point: nat.cli.main:run_cli.

Key configuration artifact

workflow.yml:

functions:
  wikipedia_search:
    _type: wiki_search
    max_results: 2

llms:
  nim_llm:
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    temperature: 0.0

workflow:
  _type: react_agent
  tool_names: [wikipedia_search]
  llm_name: nim_llm
  verbose: true
  parse_agent_response_max_retries: 3
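The names in this file cross-reference each other: tool_names points into functions, and llm_name points into llms. A minimal sketch of a pre-flight check for dangling names, assuming the YAML has already been parsed into a plain dict. `check_references` is a hypothetical helper written for illustration and is not part of the nat CLI or SDK.

```python
# Hypothetical pre-flight check; not part of nvidia-nat.
# CONFIG mirrors the workflow.yml above as an already-parsed dict.
CONFIG = {
    "functions": {"wikipedia_search": {"_type": "wiki_search", "max_results": 2}},
    "llms": {
        "nim_llm": {
            "_type": "nim",
            "model_name": "nvidia/nemotron-3-nano-30b-a3b",
            "temperature": 0.0,
        }
    },
    "workflow": {
        "_type": "react_agent",
        "tool_names": ["wikipedia_search"],
        "llm_name": "nim_llm",
    },
}


def check_references(config: dict) -> list[str]:
    """Return one message per name the workflow block uses but nothing defines."""
    problems = []
    workflow = config.get("workflow", {})
    functions = config.get("functions", {})
    llms = config.get("llms", {})
    for tool in workflow.get("tool_names", []):
        if tool not in functions:
            problems.append(f"unknown tool: {tool}")
    llm = workflow.get("llm_name")
    if llm is not None and llm not in llms:
        problems.append(f"unknown llm: {llm}")
    return problems


print(check_references(CONFIG))  # prints [] because every name resolves
```

An empty list means every name used by the workflow block is defined elsewhere in the file. It says nothing about whether the `_type` values themselves are valid.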

Skills directory

skills/
  nat-agent-configuration/
  nat-evaluation/
  nat-installation/
  nat-mcp-and-serving/
  nat-optimization/
  nat-path-checks/
  nat-telemetry/
  nat-tools-and-functions/
  nat-user-rules/
  nat-workflow-creation/
  skill-evolution/

NVIDIA NIM integration

LLMs configured with _type: nim use NVIDIA's NIM (NVIDIA Inference Microservices) for hosted model inference on NVIDIA GPUs.

Components

NVIDIA NeMo Agent Toolkit — Components

CLI subcommands (`nat`)

Command	Purpose
`nat run`	Run a workflow from config file
`nat info components`	Discover available component types
`nat configure telemetry`	Enable/disable usage telemetry
(inferred) `nat eval`	Run evaluation harness
(inferred) `nat optimize`	Run hyper-parameter optimizer

Claude Code Skills (10 total)

Skill name	Purpose
`nat-workflow-creation`	Create, edit, validate, run `workflow.yml`; `nat` CLI commands
`nat-agent-configuration`	Configure agent settings and parameters
`nat-evaluation`	Run evaluation experiments, compare outcomes
`nat-installation`	Install the toolkit and dependencies
`nat-mcp-and-serving`	Publish workflows as MCP servers via FastMCP
`nat-optimization`	Hyper-parameter and prompt optimization
`nat-path-checks`	Verify file paths and configurations
`nat-telemetry`	Configure telemetry opt-in/out
`nat-tools-and-functions`	Define and configure tools/functions
`nat-user-rules`	Configure user-specific rules
`skill-evolution`	Manage skill updates

Workflow component types (from `nat info components`)

react_agent — ReAct reasoning + acting
plan_and_execute_agent — plan then execute
llm_chain — simple LLM chain
Custom workflow types via plugins

Function types (tools)

wiki_search — Wikipedia search
web_search — Web search
MCP tools (via nvidia_nat_mcp)
Custom Python functions

Framework integrations (optional packages)

LangChain, LlamaIndex, CrewAI, Microsoft Semantic Kernel, Google ADK, AutoGen, Agno

Agent Performance Primitives (APP)

Accelerates graph-based agent frameworks with:

Parallel execution
Speculative branching
Node-level priority routing

Prompts

NVIDIA NeMo Agent Toolkit — Prompts

Verbatim: `nat-workflow-creation` SKILL.md

---
name: nat-workflow-creation
description: Use when creating, editing, validating, running, or troubleshooting 
NeMo Agent Toolkit workflow YAML, component discovery, LLM configuration, 
and common `nat` CLI commands.
author: NVIDIA Corporation and Affiliates
license: Apache-2.0
---

# NeMo Agent Toolkit Workflow Creation

Use this skill for workflow YAML and command-line execution.

## Workflow

1. Run component discovery before editing `_type` values:

```bash
uv run nat info components -t function
uv run nat info components -t llm_provider
```

2. Read the reference that matches the task.
3. Keep YAML examples runnable from the repository root.
4. Validate with the smallest useful command, usually:

```bash
uv run nat run --config_file path/to/workflow.yml --input "Test request"
```

## References

- references/workflow-creation.md
- references/cli-reference.md
- references/llm-config.md


Technique: Skill-as-workflow-reference. The skill prescribes a specific validation command (nat run) rather than asking the agent to figure it out. The "run component discovery before editing" instruction prevents hallucinated _type values, a common failure mode in YAML-heavy frameworks.

Verbatim: workflow.yml example

```yaml
functions:
  wikipedia_search:
    _type: wiki_search
    max_results: 2

llms:
  nim_llm:
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    temperature: 0.0
    chat_template_kwargs:
      enable_thinking: false

workflow:
  _type: react_agent
  tool_names: [wikipedia_search]
  llm_name: nim_llm
  verbose: true
  parse_agent_response_max_retries: 3
```

Technique: Declarative YAML as agent configuration. The _type keys drive component discovery; parse_agent_response_max_retries: 3 is explicit retry logic. enable_thinking: false shows model-level thinking mode control.
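Because every component is selected by a `_type` string, a typo is easy to introduce and hard to spot by eye. The sketch below shows one way to flag `_type` values that are missing from a known set. It assumes the known types were gathered beforehand (for example by reading the output of nat info components, whose format is not modelled here). `find_unknown_types` is a hypothetical helper, not a toolkit API.

```python
# Hypothetical helper; the set of known types is assumed to have been
# collected separately, e.g. from `nat info components`.
def find_unknown_types(config: dict, known_types: set[str]) -> list[str]:
    """Return 'location: _type' entries whose _type is not in known_types."""
    unknown = []
    for section in ("functions", "llms"):
        for name, spec in config.get(section, {}).items():
            if spec.get("_type") not in known_types:
                unknown.append(f"{section}.{name}: {spec.get('_type')}")
    workflow_type = config.get("workflow", {}).get("_type")
    if workflow_type not in known_types:
        unknown.append(f"workflow: {workflow_type}")
    return unknown


config = {
    "functions": {"wikipedia_search": {"_type": "wiki_serach"}},  # deliberate typo
    "llms": {"nim_llm": {"_type": "nim"}},
    "workflow": {"_type": "react_agent"},
}
known = {"wiki_search", "nim", "react_agent"}
print(find_unknown_types(config, known))
```

Here the misspelled wiki_serach is reported, while nim and react_agent pass.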

Uniqueness

NVIDIA NeMo Agent Toolkit — Uniqueness

Differs from seeds

No seed provides NeMo's primary value. The closest seed is spec-kit (both care about quality validation), but spec-kit validates specification documents against code while NeMo Agent Toolkit validates, profiles, and optimizes agent execution — a completely different layer. Unlike every other framework in this batch (which defines how agents are built or run), NeMo Agent Toolkit's primary purpose is measuring and improving agents that already exist. The 10 Claude Code skills are the second-largest skills collection in the batch (after spec-driver's 24 skills) and the only set explicitly authored by an enterprise AI hardware company.

Positioning

NeMo Agent Toolkit is the enterprise observability and optimization toolkit for AI agents. It targets AI engineering teams at large organizations who have existing agents (built with LangChain, CrewAI, etc.) and want to measure performance, run evals, optimize prompts, and fine-tune models without migrating to a new framework.

Distinctive features

Token-level profiling: profile from workflow to individual tokens — unique in the catalog
RL fine-tuning pipeline: train LLMs specifically for agent behavior using ATIF trajectory data — unique in the catalog
NVIDIA Dynamo integration: runtime inference hints (cache control, load-aware routing) — unique in the catalog
10 Claude Code skills: largest enterprise-authored coding agent skills collection in the batch
Agent Performance Primitives: parallel execution, speculative branching, priority routing for LangChain/CrewAI/Agno
NVIDIA NIM: first-class integration with NVIDIA's hosted inference microservices

Observable failure modes

Framework layer, not agent layer: cannot build agents with NeMo alone; requires another framework underneath
NVIDIA ecosystem coupling: NIM, Dynamo, and fine-tuning deeply tie to NVIDIA's infrastructure
YAML _type fragility: undiscoverable component types lead to silent failures; the skill's "run component discovery first" instruction is a workaround for this
Develop branch only: default branch is develop, not main — implies some instability
Telemetry concerns: usage telemetry must be explicitly disabled in CI environments (for example by setting NAT_TELEMETRY_ENABLED=false)

Workflow

NVIDIA NeMo Agent Toolkit — Workflow

Basic execution workflow

# 1. Create workflow.yml
# 2. Run
nat run --config_file workflow.yml --input "List five subspecies of Aardvarks"

Phase	Artifact
1. Write workflow.yml	YAML config (functions + llm + workflow)
2. `nat run`	Workflow execution
3. Workflow result	Console output
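When the run step is driven from a script rather than typed by hand, the command can be assembled as an argument list. This is a small sketch that uses only the two flags shown above (--config_file and --input). `build_nat_run_command` is a hypothetical name, and the code only builds the command; it does not execute it.

```python
import shlex


def build_nat_run_command(config_file: str, input_text: str) -> list[str]:
    """Assemble the argument list for `nat run` from the two flags shown above."""
    return ["nat", "run", "--config_file", config_file, "--input", input_text]


cmd = build_nat_run_command("workflow.yml", "List five subspecies of Aardvarks")
# shlex.join quotes the input so the printed line is safe to paste into a shell.
print(shlex.join(cmd))
```

On a machine where nat is installed, the list can be passed to subprocess.run(cmd, check=True). Passing a list avoids shell quoting problems when the input contains spaces or quotes.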

Development workflow with skills

# 1. Discover components
uv run nat info components -t function
uv run nat info components -t llm_provider

# 2. Create workflow
# (AI coding agent uses nat-workflow-creation skill)

# 3. Validate
uv run nat run --config_file workflow.yml --input "Test request"

Evaluation workflow

Phase	Artifact
1. Define eval dataset	Evaluation set
2. Run eval harness	LangSmith experiment or local eval
3. Compare runs	Performance comparison report
4. Iterate	Updated workflow.yml

Optimization workflow

Phase	Artifact
1. Define parameter space	Config variations
2. Run optimizer	Sweep experiments
3. Best config identified	Optimized workflow.yml
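The three phases above amount to a search over configuration variants. The source does not describe the optimizer's search strategy, so the sketch below substitutes an exhaustive grid search as the simplest possible instance. The parameter space, `toy_score`, and `grid_search` are all hypothetical; a real run would score each candidate with the evaluation harness.

```python
from itertools import product

# Hypothetical parameter space; the names echo keys from workflow.yml.
SPACE = {"temperature": [0.0, 0.5], "max_results": [1, 2, 4]}


def toy_score(params: dict) -> float:
    """Stand-in for an evaluation run: favors low temperature and max_results near 2."""
    return 1.0 - params["temperature"] - 0.25 * abs(params["max_results"] - 2)


def grid_search(space: dict, score) -> tuple[dict, float]:
    """Score every combination in the space and return the best one."""
    keys = list(space)
    best, best_score = None, float("-inf")
    for values in product(*(space[k] for k in keys)):
        candidate = dict(zip(keys, values))
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


best_config, best_value = grid_search(SPACE, toy_score)
print(best_config, best_value)
```

The winning dict corresponds to the "optimized workflow.yml" artifact: its values would be written back into the config file.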

Fine-tuning workflow

Phase	Artifact
1. Collect trajectories (ATIF format)	Training data
2. RL fine-tuning via NVIDIA NIM	Fine-tuned model
3. Deploy fine-tuned model	Updated llm config

Approval gates

None built-in. The toolkit defers to the underlying agent framework for human-in-the-loop.

Telemetry consent

First-run interactive prompt for telemetry consent. Persisted to ~/.config/nat/telemetry.toml.

Memory & Context

NVIDIA NeMo Agent Toolkit — Memory & Context

Memory architecture

NeMo Agent Toolkit defers memory management to the underlying agent framework (LangChain, CrewAI, etc.). The toolkit's core value is in instrumentation, not memory.

Optional integration:

nvidia_nat_mem0ai — mem0.ai integration for persistent memory
nvidia_nat_memmachine — MemMachine integration

Workflow state

workflow.yml parameters are stateless configuration — no per-run state is persisted to files by default.

ATIF trajectory format

For fine-tuning, agent trajectories are captured in ATIF (Agent Trajectory Interchange Format) — a structured format for storing agent run data for RL training. This is effectively a write-once audit log.

LangSmith integration

Run history, eval experiments, and prompt versions are stored in LangSmith. This is the primary "memory" for evaluation and improvement workflows.

Telemetry persistence

~/.config/nat/telemetry.toml — stores the user's telemetry opt-in decision.

Cross-session handoff

Not built-in at the toolkit level; depends on the underlying framework's memory adapter.

Context management

nat-workflow-creation skill instructs agents to run nat info components before editing YAML — this is a form of live context injection (query the system for valid component types before generating YAML).

Orchestration

NVIDIA NeMo Agent Toolkit — Orchestration

Multi-agent

Indirectly — through integrations with LangChain, CrewAI, AutoGen, Agno (all support multi-agent). The toolkit adds profiling and observability on top of those frameworks' multi-agent capabilities.

Orchestration pattern

Depends on underlying framework:

LangChain: sequential or parallel-fan-out (LangGraph)
CrewAI: hierarchical
AutoGen: swarm
Agno: hierarchical (Team modes)

Agent Performance Primitives (APP)

Accelerates graph-based frameworks with:

Parallel execution: run independent graph nodes concurrently
Speculative branching: start multiple branches, prune losers
Node-level priority routing: high-priority nodes get more compute
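The first two primitives can be illustrated with plain asyncio. This is a generic sketch of the ideas, not the APP API: the `node` coroutine stands in for a graph node, and "prune losers" is interpreted here as "keep whichever branch finishes first", which is an assumption.

```python
import asyncio


async def node(name: str, delay: float) -> str:
    """Stand-in for a graph node; the delay simulates an LLM or tool call."""
    await asyncio.sleep(delay)
    return name


async def run_parallel() -> list[str]:
    # Parallel execution: independent nodes are awaited together.
    # gather returns results in argument order, not completion order.
    return await asyncio.gather(node("search", 0.02), node("summarize", 0.01))


async def run_speculative() -> str:
    # Speculative branching: start both branches, keep the first to finish,
    # then cancel whatever is still running.
    tasks = [
        asyncio.create_task(node("fast_branch", 0.01)),
        asyncio.create_task(node("slow_branch", 0.5)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Let the cancellations settle before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    return done.pop().result()


print(asyncio.run(run_parallel()))
print(asyncio.run(run_speculative()))
```

Cancelling the slower branch is what makes speculation affordable: the extra work is bounded by the duration of the fastest branch.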

Max concurrent agents

Depends on underlying framework.

Isolation mechanism

None at the toolkit level. Depends on underlying framework.

Multi-model

Yes. workflow.yml supports multiple llms: block entries with different models. Each workflow component references a named LLM. NVIDIA NIM enables routing to different NIM endpoints.

Execution mode

One-shot (CLI invocation) and event-driven (FastMCP server mode).

NVIDIA Dynamo integration

Runtime inference optimization:

Latency sensitivity inference from agent profiles
Cache control hints
Load-aware routing
Priority-aware serving

Crash recovery

parse_agent_response_max_retries: N — retry failed LLM response parsing. Framework-level crash recovery depends on underlying framework.
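The retry behavior can be pictured as a loop that asks the model again whenever its output cannot be parsed. This is a sketch under two assumptions: the response is parsed as JSON (the toolkit's own agent parser may use a different format), and N counts retries after the first attempt. `parse_with_retries` is a hypothetical function.

```python
import json


def parse_with_retries(generate, max_retries: int) -> dict:
    """Call generate() until its output parses as JSON, allowing max_retries retries."""
    last_error = None
    for _ in range(max_retries + 1):  # one initial attempt plus the retries
        raw = generate()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err
    raise ValueError(f"unparseable after {max_retries} retries") from last_error


# Fake model that produces malformed output twice, then valid JSON.
responses = iter(["not json", "{broken", '{"answer": "aardvark"}'])
print(parse_with_retries(lambda: next(responses), max_retries=3))
```

This only covers malformed responses. A crashed process or a failed tool call is outside the loop, which matches the note that broader recovery is left to the underlying framework.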

Context compaction

Not built-in; depends on underlying framework.

Streaming

Yes — Chat UI provides streaming display.

UI & CLI Surface

NVIDIA NeMo Agent Toolkit — UI & CLI Surface

CLI binary

nat — entry point: nat.cli.main:run_cli.

Key subcommands:

nat run --config_file <file> --input <text> — execute a workflow
nat info components -t <type> — discover available component types
nat configure telemetry --enable|--disable|--status — manage telemetry
(inferred) nat eval — run evaluation
(inferred) nat optimize — run optimizer

Telemetry env override: NAT_TELEMETRY_ENABLED=false/true
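How the override interacts with the persisted choice can be sketched as a small resolution function. The precedence is an assumption inferred from the word "override": the environment variable wins, then the persisted setting, and telemetry stays off when neither is present. `telemetry_enabled` is a hypothetical function, not toolkit code.

```python
from typing import Optional


def telemetry_enabled(env: dict, persisted: Optional[bool]) -> bool:
    """Resolve the effective telemetry setting.

    Assumed precedence: NAT_TELEMETRY_ENABLED, then the persisted choice,
    then off when neither is set.
    """
    raw = env.get("NAT_TELEMETRY_ENABLED")
    if raw is not None:
        return raw.strip().lower() == "true"
    return bool(persisted)


# A CI job sets the variable, so the persisted opt-in is ignored.
print(telemetry_enabled({"NAT_TELEMETRY_ENABLED": "false"}, persisted=True))
```

In a real script the first argument would be os.environ. Taking it as a parameter keeps the function easy to test.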

Built-in Chat UI

From README:

"Built-In User Interface: Use the NeMo Agent Toolkit UI chat interface to interact with your agents, visualize output, and debug workflows."

Located in packages/nvidia_nat_core/src/nat/front_ends/. Technology stack: unknown (Python-based, likely Gradio or similar).

IDE integration (Claude Code skills)

10 Claude Code skills in skills/ directory — the most comprehensive suite of coding agent skills in this batch. Skills cover every aspect of toolkit usage from installation to fine-tuning.

MCP server mode

nvidia_nat_fastmcp — publish workflows as MCP servers via FastMCP. Enables consuming NeMo workflows from any MCP client (Claude Code, Cursor, etc.).

A2A support

nvidia_nat_a2a — Agent-to-Agent protocol support for distributed agent teams.

Observability

LangSmith native integration: full tracing, eval experiments, prompt versioning
OpenTelemetry: standard OTEL traces
Profiler: token-level profiling from workflow to individual tokens
NVIDIA Dynamo: runtime telemetry for inference optimization

Colab support

All examples runnable in Google Colab with zero setup (Colab badge in README).

Distribution

Type: pip-package
License: Apache-2.0
Install: one-liner

Surfaces

CLI binary: nat
CLI subcmds: 4
Local UI: web-dashboard
Tech stack: Python-based chat UI (likely Gradio)

Components

Commands: 0
Skills: 10
Subagents: 0
Hooks: 0
MCP servers: 1
Scripts: 5
Templates: 20

Workflow

Phases: 6
Approval gates: 0
Spec format: yaml
Spec storage: flat-files
Delta or full: whole-file

Orchestration

Multi-agent: No
Pattern: sequential
Max concurrent: 1
Isolation: none
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text+vision

Execution

Mode: one-shot
Crash recovery: No
Compaction: No
Session handoff: No
Streaming: Yes

Memory

Type: none
Persistence: none
Search: none
State files: 1 file

Quality

TDD: No
TDD mechanism: none
Validators: 1
Self-review: adversarial-subagent

Git / Observability

Auto commit: No
Auto PR: No
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: jsonl
Replay: Yes

Tools

Primary: any
Targets: 8
Portability: high

Signals

Stars: 2.3k
Last commit: 2026-05-26
Maintainer: active
Quality score: 3.6/10
