TraceRoot AI

traceroot-ai · traceroot-ai/traceroot · ★ 591 · last commit 2026-05-26

Primitive shape: 2 total (Hooks: 1, MCP tools: 1)
00

Summary

TraceRoot AI — Summary

TraceRoot is an open-source observability and self-healing platform for AI agents (YC S25) that captures OpenTelemetry-compatible traces from LLM calls, agent actions, and tool usage, then uses AI to perform root-cause analysis connected to production source code and GitHub history. The platform ships a Python SDK (pip install traceroot), a TypeScript SDK, a FastAPI backend (Python/ClickHouse/PostgreSQL/Redis), a Next.js frontend dashboard, and a Celery worker for async trace processing. "Detectors" run LLM-as-judge evaluation on incoming traces to flag hallucinations, tool failures, logic errors, and safety violations automatically, triggering alerts via email/Slack. Agentic debugging connects to a sandbox with production source code, identifies the failing line, correlates with GitHub commits/PRs/issues, and can generate a fix PR. BYOK support allows any model provider (OpenAI, Anthropic, Gemini, xAI, DeepSeek, Kimi, GLM).

Differs from seeds: TraceRoot occupies a distinct quadrant from all 11 seeds — it is an observability and self-healing infrastructure layer for agent systems rather than a methodology to run agent sessions. No seed addresses post-hoc trace analysis, LLM-as-judge monitoring detectors, or cross-correlating failures with git history. The closest seed is ccmemory in that both persist cross-session agent state, but TraceRoot's purpose is operational monitoring and root-cause analysis rather than context compaction.

01

Overview

TraceRoot AI — Overview

Origin

TraceRoot (traceroot-ai/traceroot) is an open-source observability platform for AI agents from the YC S25 batch, released under the MIT license. It offers both a hosted cloud option (app.traceroot.ai) and full self-hosting. The codebase is a monorepo with a Python FastAPI backend, a Next.js frontend, Celery workers, and SDKs for Python and TypeScript. Version v0.2.0 was released on 2026-03-21; the repository has 591 stars and 30 contributors.

Philosophy

TraceRoot's conviction is that traces alone don't scale as AI agent systems grow complex. Manual trace inspection is unsustainable; automatic detection and AI-powered root cause analysis are required for production AI systems.

Three-pillar philosophy:

  1. Tracing — Capture everything via OpenTelemetry-compatible SDK with intelligent noise filtering
  2. Agentic Debugging — AI connects to a sandbox with your production source code, finds the exact failing line, correlates with GitHub history
  3. Detectors — LLM-as-judge monitors incoming traces automatically; triggers RCA and alerts

Manifesto-Style Quotes

"As AI agent systems grow more complex, manually sifting through every trace is unsustainable. TraceRoot's Detectors selectively screen incoming traces — flagging hallucinations, tool failures, logic errors, and safety issues automatically."

"Root-causing failures across agent hallucinations, tool call instabilities, and version changes is hard. TraceRoot's AI connects to a sandbox running your production source code, identifies the exact failing line, and cross-references your GitHub history."

"Both the observability platform and the AI debugging layer are open source. BYOK support for any model provider — no vendor lock-in."

Integration Coverage

15+ agent frameworks supported: LangChain/LangGraph, Claude Agent SDK, OpenAI Agents SDK, Mastra, Vercel AI SDK, AutoGen, LlamaIndex, CrewAI, Agno, DSPy, Google ADK, Pydantic AI, and more.

5 model provider integrations: OpenAI, Anthropic, Google Gemini, Mistral, plus generic OpenTelemetry.

02

Architecture

TraceRoot AI — Architecture

Distribution

  • Self-hosted: Docker Compose (dev/prod) or Kubernetes (Helm + Terraform on AWS, experimental)
  • Cloud: app.traceroot.ai (hosted, no credit card)
  • SDKs: Python (pip install traceroot) and TypeScript (traceroot-ts package)

Install (Developer Mode)

git clone https://github.com/traceroot-ai/traceroot.git
cd traceroot
make dev          # infra in Docker, app runs locally
# OR
make prod         # everything in Docker

Directory Tree

backend/
  rest/
    main.py                # FastAPI entry point
    routers/               # traces, sessions, live, users, internal, public
    services/              # business logic
    db/                    # ClickHouse + PostgreSQL connectors
    worker/                # Celery workers (async trace processing)
  shared/
frontend/
  ui/                      # Next.js app
  worker/                  # Frontend worker processes
ee/                        # Enterprise edition features
deploy/
  terraform/               # AWS Terraform (experimental)
  helm/                    # Helm chart (experimental)
scripts/
examples/
.claude/
  settings.json            # PostToolUse hook (lint.sh on Edit|Write)
  hooks/
    lint.sh
.mcp.json                  # Chrome DevTools MCP server config

Tech Stack

Layer                 Technology
Backend API           Python FastAPI + uvicorn
Trace storage         ClickHouse (time-series traces)
Relational DB         PostgreSQL (metadata, users)
Cache/Queue           Redis + Celery
SDK instrumentation   OpenTelemetry protobuf
Frontend              Next.js + React
Tokenization          tiktoken
Deployment            Docker Compose, Kubernetes/Helm

Required Runtime (self-hosted)

  • Docker + Docker Compose
  • Python >= 3.11 (dev mode)
  • Node.js (frontend dev mode)
  • PostgreSQL, Redis, ClickHouse (via Docker)
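The prerequisites above can be checked before running make dev. The following is a minimal preflight sketch, not part of the TraceRoot repository: the function name check_runtime and the choice of executables to look for (docker, node) are assumptions for illustration, and it does not verify the Docker Compose plugin or the containerized databases.

```python
import shutil
import sys
from typing import Callable, Optional, Sequence

# Executables assumed to be needed on PATH for dev mode (illustrative choice).
REQUIRED_TOOLS = ("docker", "node")
MIN_PYTHON = (3, 11)


def check_runtime(
    which: Callable[[str], Optional[str]] = shutil.which,
    python_version: Sequence[int] = sys.version_info,
) -> dict[str, bool]:
    """Report which prerequisites look satisfied on this machine."""
    report = {tool: which(tool) is not None for tool in REQUIRED_TOOLS}
    report["python>=3.11"] = tuple(python_version[:2]) >= MIN_PYTHON
    return report
```

Calling check_runtime() with no arguments inspects the current machine; the injectable parameters exist only so the check can be exercised without Docker or Node.js installed.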

Target AI Tools

TraceRoot does not run inside an AI coding tool; it observes agent systems by instrumenting their frameworks:

  • LangChain, LangGraph, CrewAI, AutoGen
  • Claude Agent SDK, OpenAI Agents SDK
  • Mastra, Vercel AI SDK, LlamaIndex
  • Any OpenTelemetry-compatible system

The .claude/settings.json in the repo configures Claude Code for local development (PostToolUse lint hook).

03

Components

TraceRoot AI — Components

Python SDK

traceroot package — instrumentation layer:

  • traceroot.initialize(integrations=[Integration.OPENAI]) — auto-instruments model calls
  • @observe(name="my_agent", type="agent") — manual span decorator
  • OpenTelemetry-compatible OTLP exporter
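To make the manual span decorator concrete, here is a minimal stand-in that records a span around a function call. It is a sketch of the general technique, not TraceRoot's implementation: observe_sketch and RECORDED_SPANS are hypothetical names, and a real SDK would export spans over OTLP rather than append to a list.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory sink; a real SDK would export spans over OTLP instead.
RECORDED_SPANS: list[dict[str, Any]] = []


def observe_sketch(name: str, type: str = "span") -> Callable:
    """Stand-in for a span decorator: records name, type, duration, and errors."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            span = {"name": name, "type": type, "error": None}
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                span["error"] = repr(exc)
                raise
            finally:
                # Runs on success and failure, so every call yields one span.
                span["duration_s"] = time.perf_counter() - start
                RECORDED_SPANS.append(span)

        return wrapper

    return decorator


@observe_sketch(name="my_agent", type="agent")
def my_agent(query: str) -> str:
    return query.upper()
```

The parameter is called type to mirror the README usage, even though it shadows the Python builtin inside the decorator.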

TypeScript SDK

traceroot-ts — Node.js instrumentation layer for same integrations.

Backend API (FastAPI Routers)

Router        Purpose
traces.py     Ingest and retrieve trace data
sessions.py   Session management
live.py       Real-time streaming
users.py      Authentication
internal.py   Internal API
public/       Public-facing endpoints

Detectors

LLM-as-judge evaluators that monitor incoming traces for:

  • Hallucinations — cross-reference agent claims against source
  • Tool/logic failures — detect incorrect tool call patterns
  • Safety violations — flag harmful or policy-violating outputs
  • Intent drift — detect when agent deviates from original goal

Triggers: automatic root cause analysis + email/Slack alerts

Agentic Debugging

AI debugging component that:

  1. Connects to a sandbox with production source code
  2. Identifies the exact failing line
  3. Correlates failure with GitHub commits, PRs, and open issues
  4. Can generate a fix PR
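Step 3 above, correlating a failure with commit history, can be sketched as a filter over commits that touched the failing file. How TraceRoot performs this correlation is not documented here; the Commit type, its fields, and suspect_commits are hypothetical, and a real system would read this data from the GitHub API or git log.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record used only for this illustration."""
    sha: str
    committed_at: datetime
    files: tuple[str, ...]


def suspect_commits(
    commits: list[Commit],
    failing_file: str,
    failed_at: datetime,
    limit: int = 3,
) -> list[Commit]:
    """Commits that touched the failing file before the failure, newest first."""
    candidates = [
        c for c in commits
        if failing_file in c.files and c.committed_at <= failed_at
    ]
    candidates.sort(key=lambda c: c.committed_at, reverse=True)
    return candidates[:limit]
```

Ranking by recency is a deliberately simple heuristic; it narrows the search to a few commits for a human or model to inspect rather than identifying the cause on its own.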

BYOK: any model provider (OpenAI, Anthropic, Gemini, xAI, DeepSeek, Kimi, GLM, OpenRouter)

Storage Components

  • ClickHouse: Time-series trace storage (optimized for OLAP)
  • PostgreSQL: Relational metadata (users, sessions, configs)
  • Redis: Task queue (Celery), pub/sub, caching

Claude Code Integration (for development)

.claude/settings.json — PostToolUse hook:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{"type": "command", "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/lint.sh", "timeout": 30}]
    }]
  }
}

This is a dev-environment configuration, not a user-facing product feature.
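The contents of lint.sh are not reproduced here, so the following is only a sketch of what such a hook could do, written in Python for illustration. It assumes the hook receives its event as JSON on stdin with tool_name and tool_input.file_path fields; the extension-to-linter mapping (ruff, eslint) is an assumption, not taken from the repository.

```python
import json
import os
from typing import Optional

# Assumed mapping from file extension to a lint command; the real lint.sh
# is not shown in this document and may do something different.
LINTERS = {
    ".py": ["ruff", "check"],
    ".ts": ["npx", "eslint"],
    ".tsx": ["npx", "eslint"],
}


def lint_command(hook_payload: str) -> Optional[list[str]]:
    """Return the lint command for the edited file, or None if nothing applies."""
    event = json.loads(hook_payload)
    if event.get("tool_name") not in {"Edit", "Write"}:
        return None
    file_path = event.get("tool_input", {}).get("file_path", "")
    _, ext = os.path.splitext(file_path)
    base = LINTERS.get(ext)
    return base + [file_path] if base else None
```

A hook script built on this would read sys.stdin, call lint_command, and run the result with subprocess.run, returning the linter's exit status.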

MCP Integration

.mcp.json — Chrome DevTools MCP server for UI debugging during development.

05

Prompts

TraceRoot AI — Prompts

Note: TraceRoot is an observability platform, not a prompt engineering framework. It does not ship prompt templates for coding agents to use. The "prompts" in this context are the Detector evaluation prompts used internally by TraceRoot's LLM-as-judge system, which are not publicly exposed in the GitHub repository. The following captures what is available.

Prompt 1: Python SDK Instrumentation Interface

Technique: Decorator-based instrumentation with type tagging

From README quickstart:

import traceroot
from traceroot import Integration, observe
from openai import OpenAI

traceroot.initialize(integrations=[Integration.OPENAI])
client = OpenAI()

@observe(name="my_agent", type="agent")
def my_agent(query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content

Analysis: The @observe decorator takes name and type parameters. The type="agent" tag enables TraceRoot to identify agent-level spans for health scoring and cost attribution. Auto-instrumentation via Integration.OPENAI patches the OpenAI client at the module level.
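Patching a client at the module level generally means replacing a method on the class so that every instance is instrumented. The sketch below shows that technique on a fake class; FakeCompletions, patch_method, and CALL_LOG are hypothetical names, and this is not how TraceRoot or the OpenAI SDK is known to be implemented.

```python
import functools
from typing import Any

CALL_LOG: list[dict[str, Any]] = []


class FakeCompletions:
    """Stand-in for a client class; not the OpenAI SDK."""

    def create(self, model: str, messages: list[dict[str, str]]) -> str:
        return f"echo:{messages[-1]['content']}"


def patch_method(cls: type, method_name: str) -> None:
    """Replace cls.method_name with a wrapper that logs each call, once."""
    original = getattr(cls, method_name)
    if getattr(original, "_is_patched", False):
        return  # avoid double-wrapping if initialization runs twice

    @functools.wraps(original)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        result = original(self, *args, **kwargs)
        CALL_LOG.append({"method": method_name, "model": kwargs.get("model")})
        return result

    wrapper._is_patched = True
    setattr(cls, method_name, wrapper)


patch_method(FakeCompletions, "create")
```

Because the class itself is patched, instances created before or after the call are both instrumented, which is why no change to calling code is needed.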

Prompt 2: Claude Code Development Hook

Technique: PostToolUse hook for automated lint enforcement

From .claude/settings.json:

{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1",
    "CLAUDE_CODE_FORK_SUBAGENT": "1"
  },
  "permissions": {
    "defaultMode": "bypassPermissions"
  },
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{"type": "command", "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/lint.sh", "timeout": 30}]
    }]
  },
  "enabledPlugins": {
    "superpowers@claude-plugins-official": true,
    "code-simplifier@claude-plugins-official": true
  }
}

Analysis: The env block enables the experimental Claude Code agent teams feature and the fork-subagent capability. The bypassPermissions default mode skips permission prompts, which suits autonomous development. The lint hook runs automatically after every Edit or Write operation. The superpowers and code-simplifier plugins are enabled for the dev team's use.

Prompt 3: Detector Configuration (Conceptual)

Technique: LLM-as-judge with 4 failure mode categories

TraceRoot Detectors evaluate traces against four categories (documented in README, implementation private):

  • Hallucinations — agent claims not grounded in evidence
  • Tool/logic failures — incorrect or failed tool calls
  • Safety violations — harmful outputs
  • Intent drift — deviation from original goal

The judge model is configurable (BYOK) and runs asynchronously after trace ingestion.
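Since the actual detector prompts are not public, the following is only a sketch of the general LLM-as-judge shape with a pluggable judge. The prompt wording, the category identifiers, the JSON reply format, and the names build_judge_prompt and run_detector are all assumptions; the judge parameter stands in for whichever provider a BYOK setup would call.

```python
import json
from typing import Callable

# Identifiers chosen for this sketch; they mirror the four categories above.
CATEGORIES = ("hallucination", "tool_failure", "safety_violation", "intent_drift")


def build_judge_prompt(trace_text: str) -> str:
    """Hypothetical judge prompt; TraceRoot's real detector prompts are not public."""
    return (
        "You are reviewing an AI agent trace.\n"
        f"Flag any of these categories that apply: {', '.join(CATEGORIES)}.\n"
        'Reply with JSON: {"flags": [...]}.\n\n'
        f"Trace:\n{trace_text}"
    )


def run_detector(trace_text: str, judge: Callable[[str], str]) -> list[str]:
    """Call the judge and keep only recognized category names."""
    reply = judge(build_judge_prompt(trace_text))
    try:
        flags = json.loads(reply).get("flags", [])
    except (json.JSONDecodeError, AttributeError):
        return []  # unparseable reply: treat as no verdict rather than guessing
    if not isinstance(flags, list):
        return []
    return [f for f in flags if f in CATEGORIES]
```

Passing the judge as a plain callable keeps the detector independent of any one provider, and filtering the reply against known categories guards against a model inventing new labels.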

09

Uniqueness

TraceRoot AI — Uniqueness & Positioning

Differs From Seeds

TraceRoot occupies a different category from all 11 seeds. It is an observability and self-healing infrastructure layer for AI agent systems, not a methodology or harness to run agent sessions. No seed (spec-kit, openspec, superpowers, bmad-method, agent-os, claude-flow, taskmaster-ai, kiro, ccmemory, claude-conductor, spec-driver) addresses post-hoc trace analysis, LLM-as-judge monitoring, or root-cause analysis correlating failures with git history. The closest seed is ccmemory in that both persist cross-session agent state, but ccmemory persists context for the agent to use; TraceRoot persists operational telemetry about agents for humans to analyze.

Distinctive Opinion

"Traces alone don't scale. Manually sifting through every trace is unsustainable."

TraceRoot bets that the operational maturity of AI agent systems requires the same tooling as production software: continuous monitoring with automatic anomaly detection and AI-powered root cause analysis. The market comparison is Datadog/New Relic for LLM systems — but open-source and self-hostable.

The BYOK stance on the debugging AI model is a deliberate anti-lock-in position.

Positioning

  • Primary market: teams running multi-agent systems in production who cannot afford to manually review every session
  • Regulated industry play: self-hosted deployment means traces (which may contain sensitive data) never leave the customer's infrastructure
  • YC S25 backing signals VC-grade ambitions for the observability-for-AI space

Observable Failure Modes

  1. SDK instrumentation gap: TraceRoot only sees what is instrumented. Agents using un-integrated frameworks or custom tool implementations produce partial traces. Agentic debugging on partial traces may miss root causes.
  2. Detector false positives: LLM-as-judge detectors on high-volume trace streams may produce alert fatigue if precision is not tuned.
  3. Sandbox-source drift: The agentic debugger connects to production source code; if the sandbox version differs from the version that produced the failing trace, root cause analysis may be incorrect.
  4. ClickHouse storage cost: High-volume LLM traces (billions of tokens) accumulate rapidly. Storage costs at scale are a deployment concern.
  5. Early stage: v0.2.0 released 2026-03-21, Kubernetes deployment is experimental — production readiness gaps expected.

Cross-References

  • TraceRoot is complementary to all other frameworks in this batch: AgentOps, AgentTrace, Liza, and Optio all produce agent telemetry that TraceRoot could ingest
  • The Claude Code dev config (bypassPermissions, experimental agent teams) mirrors advanced usage patterns also seen in agentops-boshu and liza

04

Workflow

TraceRoot AI — Workflow

Production Monitoring Workflow

  1. Instrument: Add SDK (traceroot.initialize(integrations=[...])) to agent code
  2. Ingest: Agent runs send traces via OTLP to TraceRoot backend
  3. Process: Celery workers store traces in ClickHouse; metadata in PostgreSQL
  4. Detect: Detectors run LLM-as-judge evaluation on incoming traces (automatic)
  5. Alert: Email/Slack notifications on detected anomalies
  6. Investigate: Dashboard shows ranked sessions by cost, health, anomaly type
  7. Debug: Agentic debugger connects to sandbox, identifies failing line, correlates with GitHub
  8. Fix: Optional PR generation for detected failures
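Steps 2 and 3 above describe a common split between accepting data and processing it in a worker. The sketch below imitates that split with the standard library only: queue.Queue stands in for the Redis-backed Celery queue and a list stands in for ClickHouse. The function names are hypothetical and none of this is TraceRoot's code.

```python
import queue
from typing import Any

# Stand-ins: queue.Queue plays the role of the Redis-backed task queue,
# and a plain list plays the role of the ClickHouse trace table.
task_queue: "queue.Queue[dict[str, Any]]" = queue.Queue()
trace_table: list[dict[str, Any]] = []


def ingest(span: dict[str, Any]) -> None:
    """Step 2: accept a span and enqueue it without doing heavy work inline."""
    task_queue.put(span)


def process_pending() -> int:
    """Step 3: drain the queue and persist spans; returns how many were stored."""
    stored = 0
    while True:
        try:
            span = task_queue.get_nowait()
        except queue.Empty:
            return stored
        trace_table.append(span)
        task_queue.task_done()
        stored += 1
```

Keeping ingestion to a single enqueue means the instrumented agent is not slowed down by storage or detection work, which happens later in the worker.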

Developer Workflow (SDK Integration)

import traceroot
from traceroot import Integration, observe
from openai import OpenAI

traceroot.initialize(integrations=[Integration.OPENAI])
client = OpenAI()

@observe(name="my_agent", type="agent")
def my_agent(query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content

Zero config for auto-instrumented frameworks; @observe decorator for manual instrumentation.

Approval Gates

None — TraceRoot is passive observability infrastructure. The Detectors trigger alerts automatically; humans receive notifications and choose whether to act.

Phase-to-Artifact Map

Phase             Artifact
Instrumentation   SDK code added to agent
Trace ingestion   OTLP spans → ClickHouse
Detection         Detector verdict (hallucination/failure/safety/drift)
Alert             Email/Slack notification
Investigation     Dashboard session detail view
Debugging         Root cause report + failing line ID
Fix               GitHub PR (optional)

Deployment Workflow (Self-hosted)

make dev   # development mode
make prod  # production mode (all Docker)

Terraform + Helm for Kubernetes production deployment (experimental).

06

Memory Context

TraceRoot AI — Memory & Context

State Storage

TraceRoot persists extensive cross-session and cross-agent state:

Store        Data                                              Technology
ClickHouse   All trace spans (OLAP time-series)                ClickHouse
PostgreSQL   Sessions, users, detector configs, metadata       PostgreSQL
Redis        Celery task queue, pub/sub, real-time streaming   Redis

What is Stored

From trace ingestion:

  • LLM call spans: model, tokens (input/output/cache), cost, latency
  • Agent action spans: tool calls, tool results, errors
  • Session metadata: start/end time, total cost, health score
  • Tool failure events: which tool, error type, stack trace
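The stored fields listed above can be pictured as a small data model. This is an illustrative sketch only: the class and field names are assumptions derived from the list, not TraceRoot's actual ClickHouse or PostgreSQL schema.

```python
from dataclasses import dataclass, field


@dataclass
class LLMCallSpan:
    """One LLM call, with the fields named in the list above."""
    model: str
    input_tokens: int
    output_tokens: int
    cache_tokens: int
    cost_usd: float
    latency_ms: float


@dataclass
class SessionSummary:
    """Session-level rollup computed from its spans."""
    session_id: str
    spans: list[LLMCallSpan] = field(default_factory=list)

    @property
    def total_cost_usd(self) -> float:
        return sum(s.cost_usd for s in self.spans)

    @property
    def total_tokens(self) -> int:
        return sum(s.input_tokens + s.output_tokens for s in self.spans)
```

Deriving session totals from spans, rather than storing them separately, keeps the rollup consistent with the underlying trace data.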

Persistence

Cross-session by design: traces are persisted until a TTL expires or they are deleted manually. This enables:

  • Historical cost/usage analysis
  • Regression detection (baseline comparison)
  • Root cause analysis that correlates failures across multiple sessions

Cross-Agent Correlation

A key capability: TraceRoot correlates traces from multiple agent sessions to identify systemic failure patterns (e.g., all sessions using a specific prompt template failing at the same tool call). This cross-agent correlation is not possible in session-scoped tools.
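The example in the paragraph above, several sessions failing at the same tool call under the same prompt template, amounts to grouping failures by a shared key. A minimal sketch follows; the Failure record, its fields, and the threshold are hypothetical, and TraceRoot's actual correlation logic is not documented here.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Failure(NamedTuple):
    """Hypothetical failure event extracted from a trace."""
    session_id: str
    prompt_template: str
    tool: str


def systemic_patterns(
    failures: Iterable[Failure], min_sessions: int = 2
) -> list[tuple[tuple[str, str], int]]:
    """Return (prompt_template, tool) pairs that failed in several distinct sessions."""
    sessions_by_pattern: dict[tuple[str, str], set[str]] = {}
    for f in failures:
        key = (f.prompt_template, f.tool)
        sessions_by_pattern.setdefault(key, set()).add(f.session_id)
    counts = Counter({k: len(v) for k, v in sessions_by_pattern.items()})
    return [(k, n) for k, n in counts.most_common() if n >= min_sessions]
```

Counting distinct sessions rather than raw failure events prevents one noisy session with many retries from looking like a systemic problem.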

Memory for Debugging

The Agentic Debugging component accesses:

  • Current trace spans
  • Production source code (via sandbox)
  • GitHub commit history (via GitHub integration)
  • Open issues and PRs

This gives the AI debugger durable cross-session memory of the codebase's evolution relative to failures.

Compaction

Not applicable: TraceRoot stores raw traces, so compaction is a query/aggregation concern, not a context window concern. The stored traces are read through an observability UI rather than fed wholesale into a context window.

SDK State

The Python/TypeScript SDKs maintain no local state beyond the current session's span buffer before flushing to the backend.
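A span buffer of this kind is usually a small in-memory batch that is handed to an exporter when it fills up or when the session ends. The sketch below shows that pattern under assumptions: SpanBuffer, its max_size default, and the export callback are hypothetical and do not describe the SDKs' real internals.

```python
from typing import Any, Callable


class SpanBuffer:
    """Holds spans in memory and hands them to `export` in batches."""

    def __init__(
        self,
        export: Callable[[list[dict[str, Any]]], None],
        max_size: int = 100,
    ) -> None:
        self._export = export
        self._max_size = max_size
        self._spans: list[dict[str, Any]] = []

    def add(self, span: dict[str, Any]) -> None:
        self._spans.append(span)
        if len(self._spans) >= self._max_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered; a no-op when the buffer is empty."""
        if not self._spans:
            return
        # Swap in a fresh list first so a slow export never sees later spans.
        batch, self._spans = self._spans, []
        self._export(batch)
```

Calling flush() at shutdown matters in this design, since spans still in the buffer would otherwise be lost when the process exits.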

07

Orchestration

TraceRoot AI — Orchestration

Multi-Agent Pattern

TraceRoot is an observability layer, not an orchestrator. It does not coordinate agents; it monitors them.

Internally, TraceRoot uses Celery for worker orchestration (trace ingestion pipeline), but this is infrastructure-level parallelism, not AI agent orchestration.

Multi-Agent Monitoring

TraceRoot can monitor multi-agent systems: all agents instrumented with the SDK send traces that TraceRoot correlates. This gives a cross-agent view of:

  • Which agent spent what
  • Where cross-agent handoffs failed
  • Health scores across the fleet

Execution Mode

Background daemon — TraceRoot runs continuously as a server-side service, receiving traces in real time from instrumented agents. The Celery workers run asynchronously.

Detector Execution

Detectors run event-driven: triggered when new traces arrive. Each detector is an LLM-as-judge call that evaluates the trace and produces a verdict. Alerts fire on WARN/FAIL verdicts.
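The event-driven flow described above can be sketched as a handler that runs each detector when a trace arrives and notifies only on WARN or FAIL. The PASS level, the handler name on_trace_arrived, and the callback signatures are assumptions for illustration; detectors here are plain functions standing in for LLM-as-judge calls.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    PASS = "PASS"  # assumed non-alerting level; the source names only WARN and FAIL
    WARN = "WARN"
    FAIL = "FAIL"


ALERTING = {Verdict.WARN, Verdict.FAIL}


def on_trace_arrived(
    trace_id: str,
    detectors: dict[str, Callable[[str], Verdict]],
    notify: Callable[[str], None],
) -> list[tuple[str, Verdict]]:
    """Run every detector on the new trace and notify for WARN/FAIL verdicts."""
    alerts = []
    for name, detect in detectors.items():
        verdict = detect(trace_id)
        if verdict in ALERTING:
            notify(f"[{verdict.value}] {name} flagged trace {trace_id}")
            alerts.append((name, verdict))
    return alerts
```

The notify callback is where an email or Slack integration would plug in; keeping it as a parameter lets the dispatch logic be tested without sending anything.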

Agentic Debugging Execution

On-demand — triggered when a user or detector escalation initiates a debug session. The AI connects to a sandbox with source code, runs a root cause analysis, and optionally generates a PR.

Isolation Mechanism

TraceRoot's backend is containerized (Docker Compose / Kubernetes). The agentic debugging sandbox is an isolated code execution environment separate from production.

Multi-Model (BYOK)

Yes — the agentic debugging and detector judge model are user-configurable (BYOK). Supported: OpenAI, Anthropic, Gemini, xAI, DeepSeek, Kimi, GLM, OpenRouter.

Claude Code Usage (Dev)

The repo's .claude/settings.json uses Claude Code with experimental agent teams and fork-subagent enabled for developing TraceRoot itself. Not a user-facing orchestration feature.

08

UI / CLI Surface

TraceRoot AI — UI / CLI Surface

CLI Binary

None. TraceRoot does not ship a CLI binary for end users. The traceroot Python package is an SDK, not a CLI.

Development lifecycle uses make dev and make prod (Makefile shortcuts for Docker Compose).

Local Web Dashboard

  • Exists: Yes
  • Type: Web dashboard
  • Port: Not documented in public README (default Docker Compose port)
  • Tech stack: Next.js + React (frontend/ui/)
  • Features:
    • Session list with cost, tokens, health score, anomaly flags
    • Session detail view with trace timeline
    • Detector configuration
    • Cost analytics and usage breakdown
    • Real-time log streaming (WebSocket)
    • Agentic debugging interface
    • GitHub integration for commit correlation

Cloud Dashboard

app.traceroot.ai — hosted version with same features, no self-hosting required.

SDK Surface

Python SDK (pip install traceroot):

import traceroot
from traceroot import Integration, observe

traceroot.initialize(integrations=[Integration.OPENAI, Integration.ANTHROPIC])

# type is one of "agent", "tool", or "llm"
@observe(name="span_name", type="agent")
def my_function():
    ...

TypeScript SDK (traceroot-ts): Same API surface in Node.js.

Integrations

Instrumentation integrations (zero-code in most cases):

  • OpenAI, Anthropic, Google Gemini, Mistral
  • LangChain, LangGraph, CrewAI, AutoGen
  • Claude Agent SDK, OpenAI Agents SDK
  • Mastra, Vercel AI SDK, LlamaIndex
  • DSPy, Google ADK, Pydantic AI

Alerting

  • Email notifications on Detector triggers
  • Slack notifications on Detector triggers

Deployment

# Development
make dev

# Production (all Docker)
make prod

Kubernetes: deploy/ directory with Helm chart and Terraform (experimental, AWS-focused).
