
NeMo Agent Toolkit

nemo-agent-toolkit · NVIDIA/NeMo-Agent-Toolkit · ★ 2.3k · last commit 2026-05-26

Primitive shape 11 total
Skills 10 MCP tools 1
00

Summary

NVIDIA NeMo Agent Toolkit — Summary

NVIDIA NeMo Agent Toolkit (nvidia-nat, 2.3k stars, Apache-2.0) is a Python SDK and CLI (nat) for adding enterprise-grade instrumentation, observability, profiling, optimization, and fine-tuning to AI agents. It is positioned as a framework-agnostic "intelligence layer" that works alongside LangChain, LlamaIndex, CrewAI, Microsoft Semantic Kernel, Google ADK, and custom Python agents.

The core workflow is YAML-defined: a workflow.yml specifies functions (tools), llms (model config), and a workflow (an agent type such as react_agent), and nat run --config_file workflow.yml --input "..." executes it.

The toolkit ships 10 Claude Code skills for agent-assisted development (workflow creation, installation, evaluation, optimization, MCP/serving, tools, telemetry, path checks, user rules, skill evolution), a nat CLI with telemetry management and component discovery commands, a built-in Chat UI for interaction, an extensive profiling/evaluation/optimization/fine-tuning pipeline, Agent Performance Primitives (APP) for LangChain/CrewAI/Agno acceleration, FastMCP server publishing, LangSmith native integration, and NVIDIA Dynamo runtime integration. The monorepo has 30+ optional package extras.

Compared to the seeds, NeMo Agent Toolkit fills a gap rather than resembling any one of them: it supplies the observability layer that none of the seeds provide. It is the only framework in this batch whose primary value is measuring, profiling, and improving agents rather than defining or running them.

01

Overview

NVIDIA NeMo Agent Toolkit — Overview

Origin

Developed by NVIDIA Corporation and maintained by NVIDIA's AI research team. Version analyzed: v1.5.0 (a migration guide is available for v1.5.0+). Available on PyPI as nvidia-nat. The package is an evolution of NVIDIA's earlier NeMo microservices work, now focused on agent instrumentation.

Philosophy

From the README:

"NVIDIA NeMo Agent Toolkit adds intelligence to AI agents across any framework—enhancing speed, accuracy, and decision-making through enterprise-grade instrumentation, observability, and continuous learning."

"Adds intelligence to AI agents across any framework."

The toolkit does not want to be your agent framework. It wants to be the observability and optimization layer on top of your existing agent framework. This is a fundamentally different position from every other framework in this batch.

Design pillars

  1. Framework agnostic: works alongside LangChain, LlamaIndex, CrewAI, Semantic Kernel, ADK, custom Python
  2. YAML-first workflow definition: write workflow.yml, then nat run; no code for simple agents
  3. Profiling: token-level execution profiling to find bottlenecks
  4. Evaluation: offline eval harness with LangSmith native integration
  5. Optimization: hyper-parameter + prompt optimizer
  6. Fine-tuning: RL-based fine-tuning of LLMs for specific agent behaviors
  7. Agent Performance Primitives (APP): parallel execution, speculative branching, node-level priority routing for graph-based frameworks
  8. NVIDIA Dynamo: runtime inference hints (cache control, load-aware routing, priority-aware serving)

10 Claude Code skills

The toolkit ships 10 skills for agent-assisted development of NeMo workflows — one of the largest skills collections in this batch.

Key quote

"With NeMo Agent Toolkit, you can move quickly, experiment freely, and ensure reliability across all your agent-driven projects."

02

Architecture

NVIDIA NeMo Agent Toolkit — Architecture

Distribution

  • Type: pip-package
  • PyPI name: nvidia-nat
  • Version: analyzed from develop branch (v1.5.x range)
  • Install: pip install nvidia-nat
  • Required runtime: Python 3.11–3.13
  • License: Apache-2.0

Monorepo packages (selected from packages/)

| Package | Purpose |
| --- | --- |
| nvidia_nat_core | Core framework, CLI (nat) |
| nvidia_nat_langchain | LangChain/LangGraph integration |
| nvidia_nat_crewai | CrewAI integration |
| nvidia_nat_agno | Agno integration |
| nvidia_nat_adk | Google ADK integration |
| nvidia_nat_autogen | AutoGen integration |
| nvidia_nat_eval | Evaluation harness |
| nvidia_nat_fastmcp | FastMCP server publishing |
| nvidia_nat_mcp | MCP client integration |
| nvidia_nat_app | Agent Performance Primitives |
| nvidia_nat_a2a | A2A protocol |
| nvidia_nat_config_optimizer | Hyper-parameter optimizer |
| nvidia_nat_data_flywheel | Data flywheel for fine-tuning |
| nvidia_nat_langsmith | LangSmith native integration |

CLI binary

nat — entry point: nat.cli.main:run_cli.

Key configuration artifact

workflow.yml:

```yaml
functions:
  wikipedia_search:
    _type: wiki_search
    max_results: 2

llms:
  nim_llm:
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    temperature: 0.0

workflow:
  _type: react_agent
  tool_names: [wikipedia_search]
  llm_name: nim_llm
  verbose: true
  parse_agent_response_max_retries: 3
```
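
The workflow block refers to tools and models by name, so a typo in tool_names or llm_name breaks the link between sections. The sketch below mirrors the workflow.yml above as a plain Python dict and checks those cross-references. The helper check_references is hypothetical and written for illustration; it is not the toolkit's own validator, and the toolkit may report such errors differently.

```python
# Plain-dict mirror of the workflow.yml above (no YAML parser needed for the sketch).
config = {
    "functions": {"wikipedia_search": {"_type": "wiki_search", "max_results": 2}},
    "llms": {
        "nim_llm": {
            "_type": "nim",
            "model_name": "nvidia/nemotron-3-nano-30b-a3b",
            "temperature": 0.0,
        }
    },
    "workflow": {
        "_type": "react_agent",
        "tool_names": ["wikipedia_search"],
        "llm_name": "nim_llm",
        "verbose": True,
        "parse_agent_response_max_retries": 3,
    },
}


def check_references(cfg: dict) -> list:
    """Hypothetical helper: report names the workflow uses but never defines."""
    problems = []
    workflow = cfg.get("workflow", {})
    for tool in workflow.get("tool_names", []):
        if tool not in cfg.get("functions", {}):
            problems.append(f"unknown tool: {tool}")
    llm_name = workflow.get("llm_name")
    if llm_name not in cfg.get("llms", {}):
        problems.append(f"unknown llm: {llm_name}")
    return problems


print(check_references(config))  # → []
```

An empty list means every name the workflow references is defined in the functions and llms sections.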

Skills directory

skills/
  nat-agent-configuration/
  nat-evaluation/
  nat-installation/
  nat-mcp-and-serving/
  nat-optimization/
  nat-path-checks/
  nat-telemetry/
  nat-tools-and-functions/
  nat-user-rules/
  nat-workflow-creation/
  skill-evolution/

NVIDIA NIM integration

LLMs configured with _type: nim use NVIDIA's NIM (NVIDIA Inference Microservices) for hosted model inference on NVIDIA GPUs.

03

Components

NVIDIA NeMo Agent Toolkit — Components

CLI subcommands (nat)

| Command | Purpose |
| --- | --- |
| nat run | Run a workflow from config file |
| nat info components | Discover available component types |
| nat configure telemetry | Enable/disable usage telemetry |
| nat eval (inferred) | Run evaluation harness |
| nat optimize (inferred) | Run hyper-parameter optimizer |

Claude Code Skills (10 total)

| Skill name | Purpose |
| --- | --- |
| nat-workflow-creation | Create, edit, validate, run workflow.yml; nat CLI commands |
| nat-agent-configuration | Configure agent settings and parameters |
| nat-evaluation | Run evaluation experiments, compare outcomes |
| nat-installation | Install the toolkit and dependencies |
| nat-mcp-and-serving | Publish workflows as MCP servers via FastMCP |
| nat-optimization | Hyper-parameter and prompt optimization |
| nat-path-checks | Verify file paths and configurations |
| nat-telemetry | Configure telemetry opt-in/out |
| nat-tools-and-functions | Define and configure tools/functions |
| nat-user-rules | Configure user-specific rules |
| skill-evolution | Manage skill updates |

Workflow component types (from nat info components)

  • react_agent — ReAct reasoning + acting
  • plan_and_execute_agent — plan then execute
  • llm_chain — simple LLM chain
  • Custom workflow types via plugins

Function types (tools)

  • wiki_search — Wikipedia search
  • web_search — Web search
  • MCP tools (via nvidia_nat_mcp)
  • Custom Python functions

Framework integrations (optional packages)

LangChain, LlamaIndex, CrewAI, Microsoft Semantic Kernel, Google ADK, AutoGen, Agno

Agent Performance Primitives (APP)

Accelerates graph-based agent frameworks with:

  • Parallel execution
  • Speculative branching
  • Node-level priority routing

05

Prompts

NVIDIA NeMo Agent Toolkit — Prompts

Verbatim: nat-workflow-creation SKILL.md

---
name: nat-workflow-creation
description: Use when creating, editing, validating, running, or troubleshooting 
NeMo Agent Toolkit workflow YAML, component discovery, LLM configuration, 
and common `nat` CLI commands.
author: NVIDIA Corporation and Affiliates
license: Apache-2.0
---

# NeMo Agent Toolkit Workflow Creation

Use this skill for workflow YAML and command-line execution.

## Workflow

1. Run component discovery before editing `_type` values:

```bash
uv run nat info components -t function
uv run nat info components -t llm_provider
```

2. Read the reference that matches the task.
3. Keep YAML examples runnable from the repository root.
4. Validate with the smallest useful command, usually:

```bash
uv run nat run --config_file path/to/workflow.yml --input "Test request"
```

## References

- references/workflow-creation.md
- references/cli-reference.md
- references/llm-config.md

**Technique**: Skill-as-workflow-reference. The skill prescribes a specific validation command (`nat run`) rather than asking the agent to figure it out. The "run component discovery before editing" instruction prevents hallucinated `_type` values, a common failure mode in YAML-heavy frameworks.

## Verbatim: workflow.yml example

```yaml
functions:
  wikipedia_search:
    _type: wiki_search
    max_results: 2

llms:
  nim_llm:
    _type: nim
    model_name: nvidia/nemotron-3-nano-30b-a3b
    temperature: 0.0
    chat_template_kwargs:
      enable_thinking: false

workflow:
  _type: react_agent
  tool_names: [wikipedia_search]
  llm_name: nim_llm
  verbose: true
  parse_agent_response_max_retries: 3
```

**Technique**: Declarative YAML as agent configuration. The `_type` keys drive component discovery; `parse_agent_response_max_retries: 3` is explicit retry logic. `enable_thinking: false` shows model-level thinking mode control.

09

Uniqueness

NVIDIA NeMo Agent Toolkit — Uniqueness

Differs from seeds

No seed provides NeMo's primary value. The closest seed is spec-kit (both care about quality validation), but spec-kit validates specification documents against code while NeMo Agent Toolkit validates, profiles, and optimizes agent execution — a completely different layer. Unlike every other framework in this batch (which defines how agents are built or run), NeMo Agent Toolkit's primary purpose is measuring and improving agents that already exist. The 10 Claude Code skills are the second-largest skills collection in the batch (after spec-driver's 24 skills) and the only set explicitly authored by an enterprise AI hardware company.

Positioning

NeMo Agent Toolkit is the enterprise observability and optimization toolkit for AI agents. It targets AI engineering teams at large organizations who have existing agents (built with LangChain, CrewAI, etc.) and want to measure performance, run evals, optimize prompts, and fine-tune models without migrating to a new framework.

Distinctive features

  1. Token-level profiling: profile from workflow to individual tokens — unique in the catalog
  2. RL fine-tuning pipeline: train LLMs specifically for agent behavior using ATIF trajectory data — unique in the catalog
  3. NVIDIA Dynamo integration: runtime inference hints (cache control, load-aware routing) — unique in the catalog
  4. 10 Claude Code skills: largest enterprise-authored coding agent skills collection in the batch
  5. Agent Performance Primitives: parallel execution, speculative branching, priority routing for LangChain/CrewAI/Agno
  6. NVIDIA NIM: first-class integration with NVIDIA's hosted inference microservices

Observable failure modes

  1. Framework layer, not agent layer: cannot build agents with NeMo alone; requires another framework underneath
  2. NVIDIA ecosystem coupling: NIM, Dynamo, and fine-tuning deeply tie to NVIDIA's infrastructure
  3. YAML _type fragility: undiscoverable component types lead to silent failures; the skill's "run component discovery first" instruction is a workaround for this
  4. Develop branch only: default branch is develop, not main — implies some instability
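  5. Telemetry concerns: telemetry consent is requested through a first-run interactive prompt, so non-interactive CI environments need an explicit opt-out (for example NAT_TELEMETRY_ENABLED=false)

04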

Workflow

NVIDIA NeMo Agent Toolkit — Workflow

Basic execution workflow

```bash
# 1. Create workflow.yml
# 2. Run
nat run --config_file workflow.yml --input "List five subspecies of Aardvarks"
```

| Phase | Artifact |
| --- | --- |
| 1. Write workflow.yml | YAML config (functions + llm + workflow) |
| 2. nat run | Workflow execution |
| 3. Workflow result | Console output |
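
When the run step is driven from a script rather than typed by hand, building the argument list explicitly avoids shell-quoting mistakes in the input text. The sketch below assembles the nat run invocation using only the two flags shown above. The helper name build_nat_run is an assumption made for illustration.

```python
import shlex


def build_nat_run(config_file: str, user_input: str) -> list:
    """Assemble the argv for `nat run` using only the two flags shown above."""
    return ["nat", "run", "--config_file", config_file, "--input", user_input]


argv = build_nat_run("workflow.yml", "List five subspecies of Aardvarks")

# shlex.join renders the argv as a copy-pasteable shell command.
print(shlex.join(argv))
# → nat run --config_file workflow.yml --input 'List five subspecies of Aardvarks'
```

With the toolkit installed, the same list could be handed to subprocess.run(argv, check=True); that call is left out here so the sketch runs without nat on the PATH.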

Development workflow with skills

```bash
# 1. Discover components
uv run nat info components -t function
uv run nat info components -t llm_provider

# 2. Create workflow
# (AI coding agent uses nat-workflow-creation skill)

# 3. Validate
uv run nat run --config_file workflow.yml --input "Test request"
```

Evaluation workflow

| Phase | Artifact |
| --- | --- |
| 1. Define eval dataset | Evaluation set |
| 2. Run eval harness | LangSmith experiment or local eval |
| 3. Compare runs | Performance comparison report |
| 4. Iterate | Updated workflow.yml |

Optimization workflow

| Phase | Artifact |
| --- | --- |
| 1. Define parameter space | Config variations |
| 2. Run optimizer | Sweep experiments |
| 3. Best config identified | Optimized workflow.yml |
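
The three phases above (define a space, sweep it, keep the best configuration) can be illustrated with a small grid search. This is a sketch under stated assumptions: grid search is swapped in for illustration and the toolkit's optimizer may use a different strategy, the search space values are hypothetical, and score is a stand-in for a real evaluation run.

```python
from itertools import product

# Hypothetical search space; the keys mirror fields from the workflow.yml example.
search_space = {
    "temperature": [0.0, 0.3, 0.7],
    "parse_agent_response_max_retries": [1, 3],
}


def score(params: dict) -> float:
    """Stand-in for an evaluation run; a real sweep would score eval results."""
    return 1.0 - params["temperature"] + 0.1 * params["parse_agent_response_max_retries"]


def grid_sweep(space: dict, scorer) -> tuple:
    """Score every combination in the space and return the best one."""
    keys = list(space)
    best_params, best_score = None, float("-inf")
    for values in product(*(space[k] for k in keys)):
        params = dict(zip(keys, values))
        current = scorer(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


best, best_score = grid_sweep(search_space, score)
print(best)  # → {'temperature': 0.0, 'parse_agent_response_max_retries': 3}
```

The winning parameter set would then be written back into workflow.yml, which is the "Optimized workflow.yml" artifact in the table.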

Fine-tuning workflow

| Phase | Artifact |
| --- | --- |
| 1. Collect trajectories (ATIF format) | Training data |
| 2. RL fine-tuning via NVIDIA NIM | Fine-tuned model |
| 3. Deploy fine-tuned model | Updated llm config |

Approval gates

None built in for agent actions. The toolkit defers to the underlying agent framework for human-in-the-loop.

The one interactive prompt the toolkit itself raises is a first-run request for telemetry consent; the answer is persisted to ~/.config/nat/telemetry.toml.

06

Memory Context

NVIDIA NeMo Agent Toolkit — Memory & Context

Memory architecture

NeMo Agent Toolkit defers memory management to the underlying agent framework (LangChain, CrewAI, etc.). The toolkit's core value is in instrumentation, not memory.

Optional integration:

  • nvidia_nat_mem0ai — mem0.ai integration for persistent memory
  • nvidia_nat_memmachine — MemMachine integration

Workflow state

workflow.yml parameters are stateless configuration — no per-run state is persisted to files by default.

ATIF trajectory format

For fine-tuning, agent trajectories are captured in ATIF (Agent Trajectory Interchange Format) — a structured format for storing agent run data for RL training. This is effectively a write-once audit log.

LangSmith integration

Run history, eval experiments, and prompt versions are stored in LangSmith. This is the primary "memory" for evaluation and improvement workflows.

Telemetry persistence

~/.config/nat/telemetry.toml — stores the user's telemetry opt-in decision.

Cross-session handoff

Not built-in at the toolkit level; depends on the underlying framework's memory adapter.

Context management

The nat-workflow-creation skill instructs agents to run nat info components before editing YAML. This is a form of live context injection: the agent queries the system for valid component types before generating YAML, rather than relying on what it remembers.
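
The "discover first, then edit" pattern amounts to checking every _type in a draft against the set the system reports. The sketch below shows that check on plain dicts. It assumes the valid types were copied by hand from nat info components output (the output format is not covered here), and find_unknown_types is a hypothetical helper, not part of the toolkit.

```python
def find_unknown_types(config: dict, known: dict) -> list:
    """Return 'section.name: _type' entries whose _type is not in the known set."""
    unknown = []
    for section in ("functions", "llms"):
        for name, entry in config.get(section, {}).items():
            if entry.get("_type") not in known.get(section, set()):
                unknown.append(f"{section}.{name}: {entry.get('_type')}")
    return unknown


# Assumed to have been copied by hand from `nat info components` output.
known_types = {"functions": {"wiki_search"}, "llms": {"nim"}}

draft = {
    "functions": {"wikipedia_search": {"_type": "wikipedia"}},  # wrong on purpose
    "llms": {"nim_llm": {"_type": "nim"}},
}

print(find_unknown_types(draft, known_types))
# → ['functions.wikipedia_search: wikipedia']
```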

07

Orchestration

NVIDIA NeMo Agent Toolkit — Orchestration

Multi-agent

Indirectly — through integrations with LangChain, CrewAI, AutoGen, Agno (all support multi-agent). The toolkit adds profiling and observability on top of those frameworks' multi-agent capabilities.

Orchestration pattern

Depends on underlying framework:

  • LangChain: sequential or parallel-fan-out (LangGraph)
  • CrewAI: hierarchical
  • AutoGen: swarm
  • Agno: hierarchical (Team modes)

Agent Performance Primitives (APP)

Accelerates graph-based frameworks with:

  • Parallel execution: run independent graph nodes concurrently
  • Speculative branching: start multiple branches, prune losers
  • Node-level priority routing: high-priority nodes get more compute
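
The first two primitives can be illustrated with plain asyncio. This is a generic sketch of the ideas, not the APP API: the node coroutine, the node names, and the delays are all hypothetical, and "first branch to finish wins" is an assumed pruning rule since the criterion APP uses for picking a winner is not described here.

```python
import asyncio


async def node(name: str, delay: float) -> str:
    """Stand-in for a graph node; the delay simulates an LLM or tool call."""
    await asyncio.sleep(delay)
    return name


async def run_parallel() -> list:
    """Parallel execution: independent nodes run concurrently."""
    return await asyncio.gather(node("search", 0.02), node("summarize", 0.01))


async def run_speculative() -> str:
    """Speculative branching: start several branches, keep the first, cancel the rest."""
    branches = [
        asyncio.create_task(node("fast_branch", 0.01)),
        asyncio.create_task(node("slow_branch", 0.2)),
    ]
    done, pending = await asyncio.wait(branches, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # prune the losing branches
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations settle
    return done.pop().result()


parallel_result = asyncio.run(run_parallel())
speculative_result = asyncio.run(run_speculative())
print(parallel_result)     # → ['search', 'summarize']
print(speculative_result)  # → fast_branch
```

asyncio.gather returns results in argument order regardless of which node finishes first, which is why the parallel result is stable.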

Max concurrent agents

Depends on underlying framework.

Isolation mechanism

None at the toolkit level. Depends on underlying framework.

Multi-model

Yes. workflow.yml supports multiple entries under the llms: block, each configuring a different model. Each workflow component references an LLM by name through llm_name. NVIDIA NIM enables routing to different NIM endpoints.
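
A minimal sketch of name-based LLM lookup, written as a plain dict rather than YAML. The entry names agent_llm and summary_llm, the second temperature, and the summarizer function with its my_summarizer type are all illustrative assumptions; only the llms block, the nim type, the model name, and the llm_name key come from the example config.

```python
# Hypothetical config with two named LLM entries.
config = {
    "llms": {
        "agent_llm": {
            "_type": "nim",
            "model_name": "nvidia/nemotron-3-nano-30b-a3b",
            "temperature": 0.0,
        },
        "summary_llm": {
            "_type": "nim",
            "model_name": "nvidia/nemotron-3-nano-30b-a3b",
            "temperature": 0.7,
        },
    },
    # "my_summarizer" is a made-up component type used only for this sketch.
    "functions": {"summarizer": {"_type": "my_summarizer", "llm_name": "summary_llm"}},
    "workflow": {"_type": "react_agent", "llm_name": "agent_llm"},
}


def llm_for(cfg: dict, component: dict) -> dict:
    """Look up the LLM entry a component points at through its llm_name."""
    return cfg["llms"][component["llm_name"]]


print(llm_for(config, config["workflow"])["temperature"])                 # → 0.0
print(llm_for(config, config["functions"]["summarizer"])["temperature"])  # → 0.7
```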

Execution mode

one-shot (CLI invocation) and event-driven (FastMCP server mode).

NVIDIA Dynamo integration

Runtime inference optimization:

  • Latency sensitivity inference from agent profiles
  • Cache control hints
  • Load-aware routing
  • Priority-aware serving

Crash recovery

parse_agent_response_max_retries: N — retry failed LLM response parsing. Framework-level crash recovery depends on underlying framework.
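
The retry setting can be pictured as a bounded loop around "generate, then parse". The sketch below is an illustration of that idea, not the toolkit's implementation: JSON stands in for the agent's actual response format, parse_with_retries is a hypothetical helper, and counting one initial attempt plus max_retries further attempts is an assumption about how the limit is applied.

```python
import json


def parse_with_retries(generate, max_retries: int) -> dict:
    """Call generate() until its output parses as JSON, retrying up to max_retries times."""
    last_error = None
    for _ in range(max_retries + 1):  # one initial attempt plus the retries
        raw = generate()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc
    raise ValueError(f"unparseable after {max_retries} retries") from last_error


# First reply is malformed, second one parses.
replies = iter(["not json", '{"action": "wikipedia_search"}'])
result = parse_with_retries(lambda: next(replies), max_retries=3)
print(result)  # → {'action': 'wikipedia_search'}
```

Once the retry budget is spent the error surfaces to the caller, which is where the underlying framework's crash recovery would take over.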

Context compaction

Not built-in; depends on underlying framework.

Streaming

Yes — Chat UI provides streaming display.

08

UI & CLI Surface

NVIDIA NeMo Agent Toolkit — UI & CLI Surface

CLI binary

nat — entry point: nat.cli.main:run_cli.

Key subcommands:

  • nat run --config_file <file> --input <text> — execute a workflow
  • nat info components -t <type> — discover available component types
  • nat configure telemetry --enable|--disable|--status — manage telemetry
  • (inferred) nat eval — run evaluation
  • (inferred) nat optimize — run optimizer

Telemetry env override: NAT_TELEMETRY_ENABLED=false/true
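
The override implies a precedence order: the environment variable, when set, wins over the persisted choice. The sketch below models that resolution. It is an assumption-laden illustration: telemetry_enabled is a hypothetical helper, treating anything other than "true" as disabled is an assumed parsing rule, and defaulting to disabled when nothing is set is also an assumption.

```python
from typing import Optional


def telemetry_enabled(persisted_choice: Optional[bool], env: dict) -> bool:
    """Environment override wins; otherwise fall back to the persisted choice."""
    override = env.get("NAT_TELEMETRY_ENABLED")
    if override is not None:
        return override.strip().lower() == "true"
    return bool(persisted_choice)  # assumed default: disabled when nothing is set


# CI case: the persisted choice says "on", but the env var turns it off.
print(telemetry_enabled(True, {"NAT_TELEMETRY_ENABLED": "false"}))  # → False
# No override: the persisted choice is used.
print(telemetry_enabled(True, {}))  # → True
```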

Built-in Chat UI

From README:

"Built-In User Interface: Use the NeMo Agent Toolkit UI chat interface to interact with your agents, visualize output, and debug workflows."

Located in packages/nvidia_nat_core/src/nat/front_ends/. Technology stack: not verified in this analysis; the guess that it is Python-based (Gradio or similar) is unconfirmed.

IDE integration (Claude Code skills)

10 Claude Code skills in the skills/ directory, one of the most comprehensive suites of coding agent skills in this batch (spec-driver ships 24). The skills cover toolkit usage from installation through workflow creation, evaluation, optimization, and MCP serving.

MCP server mode

nvidia_nat_fastmcp — publish workflows as MCP servers via FastMCP. Enables consuming NeMo workflows from any MCP client (Claude Code, Cursor, etc.).

A2A support

nvidia_nat_a2a — Agent-to-Agent protocol support for distributed agent teams.

Observability

  • LangSmith native integration: full tracing, eval experiments, prompt versioning
  • OpenTelemetry: standard OTEL traces
  • Profiler: token-level profiling from workflow to individual tokens
  • NVIDIA Dynamo: runtime telemetry for inference optimization

Colab support

All examples runnable in Google Colab with zero setup (Colab badge in README).
