TraceRoot AI

traceroot-ai · traceroot-ai/traceroot · ★ 591 · last commit 2026-05-26

Primitive shape: 2 total (Hooks: 1, MCP tools: 1)

Summary

TraceRoot AI — Summary

TraceRoot (YC S25) is an open-source observability and self-healing platform for AI agents. It captures OpenTelemetry-compatible traces from LLM calls, agent actions, and tool usage, then uses AI to perform root-cause analysis connected to production source code and GitHub history. The platform ships a Python SDK (pip install traceroot), a TypeScript SDK, a FastAPI backend (Python/ClickHouse/PostgreSQL/Redis), a Next.js frontend dashboard, and Celery workers for async trace processing. "Detectors" run LLM-as-judge evaluation on incoming traces to flag hallucinations, tool failures, logic errors, and safety violations automatically, triggering alerts via email/Slack. Agentic debugging connects to a sandbox with production source code, identifies the failing line, correlates it with GitHub commits/PRs/issues, and can generate a fix PR. BYOK support allows any model provider (OpenAI, Anthropic, Gemini, xAI, DeepSeek, Kimi, GLM, OpenRouter).

Differs from seeds: TraceRoot occupies a distinct quadrant from all 11 seeds — it is an observability and self-healing infrastructure layer for agent systems rather than a methodology to run agent sessions. No seed addresses post-hoc trace analysis, LLM-as-judge monitoring detectors, or cross-correlating failures with git history. The closest seed is ccmemory in that both persist cross-session agent state, but TraceRoot's purpose is operational monitoring and root-cause analysis rather than context compaction.

Overview

TraceRoot AI — Overview

Origin

TraceRoot (traceroot-ai/traceroot) is an open-source observability platform for AI agents from YC batch S25, released under the MIT license. It offers both a hosted cloud option (app.traceroot.ai) and full self-hosting. The codebase is a monorepo containing a Python FastAPI backend, a Next.js frontend, Celery workers, and SDKs for Python and TypeScript. Version v0.2.0 was released on 2026-03-21; the repo has 591 stars and 30 contributors.

Philosophy

TraceRoot's conviction is that traces alone don't scale as AI agent systems grow more complex: manual trace inspection is unsustainable, so production AI systems need automatic detection and AI-powered root-cause analysis.

Three-pillar philosophy:

Tracing — Capture everything via OpenTelemetry-compatible SDK with intelligent noise filtering
Agentic Debugging — AI connects to a sandbox with your production source code, finds the exact failing line, correlates with GitHub history
Detectors — LLM-as-judge monitors incoming traces automatically; triggers RCA and alerts

Manifesto-Style Quotes

"As AI agent systems grow more complex, manually sifting through every trace is unsustainable. TraceRoot's Detectors selectively screen incoming traces — flagging hallucinations, tool failures, logic errors, and safety issues automatically."

"Root-causing failures across agent hallucinations, tool call instabilities, and version changes is hard. TraceRoot's AI connects to a sandbox running your production source code, identifies the exact failing line, and cross-references your GitHub history."

"Both the observability platform and the AI debugging layer are open source. BYOK support for any model provider — no vendor lock-in."

Integration Coverage

15+ agent frameworks supported: LangChain/LangGraph, Claude Agent SDK, OpenAI Agents SDK, Mastra, Vercel AI SDK, AutoGen, LlamaIndex, CrewAI, Agno, DSPy, Google ADK, Pydantic AI, and more.

5 model provider integrations: OpenAI, Anthropic, Google Gemini, Mistral, plus generic OpenTelemetry.

Architecture

TraceRoot AI — Architecture

Distribution

Self-hosted: Docker Compose (dev/prod) or Kubernetes (Helm + Terraform on AWS, experimental)
Cloud: app.traceroot.ai (hosted, no credit card)
SDKs: Python (pip install traceroot) and TypeScript (traceroot-ts package)

Install (Developer Mode)

git clone https://github.com/traceroot-ai/traceroot.git
cd traceroot
make dev          # infra in Docker, app runs locally
# OR
make prod         # everything in Docker

Directory Tree

backend/
  rest/
    main.py                # FastAPI entry point
    routers/               # traces, sessions, live, users, internal, public
    services/              # business logic
    db/                    # ClickHouse + PostgreSQL connectors
    worker/                # Celery workers (async trace processing)
  shared/
frontend/
  ui/                      # Next.js app
  worker/                  # Frontend worker processes
ee/                        # Enterprise edition features
deploy/
  terraform/               # AWS Terraform (experimental)
  helm/                    # Helm chart (experimental)
scripts/
examples/
.claude/
  settings.json            # PostToolUse hook (lint.sh on Edit|Write)
  hooks/
    lint.sh
.mcp.json                  # Chrome DevTools MCP server config

Tech Stack

Layer	Technology
Backend API	Python FastAPI + uvicorn
Trace storage	ClickHouse (time-series traces)
Relational DB	PostgreSQL (metadata, users)
Cache/Queue	Redis + Celery
SDK instrumentation	OpenTelemetry protobuf
Frontend	Next.js + React
Tokenization	tiktoken
Deployment	Docker Compose, Kubernetes/Helm

Required Runtime (self-hosted)

Docker + Docker Compose
Python >= 3.11 (dev mode)
Node.js (frontend dev mode)
PostgreSQL, Redis, ClickHouse (via Docker)

Target AI Tools

TraceRoot does not run inside an agent session; it instruments the agent frameworks below and observes them from outside:

LangChain, LangGraph, CrewAI, AutoGen
Claude Agent SDK, OpenAI Agents SDK
Mastra, Vercel AI SDK, LlamaIndex
Any OpenTelemetry-compatible system

The .claude/settings.json in the repo configures Claude Code for local development (PostToolUse lint hook).

Components

TraceRoot AI — Components

Python SDK

traceroot package — instrumentation layer:

traceroot.initialize(integrations=[Integration.OPENAI]) — auto-instruments model calls
@observe(name="my_agent", type="agent") — manual span decorator
OpenTelemetry-compatible OTLP exporter
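To make the `@observe(name=..., type=...)` bullet concrete, here is a minimal, self-contained sketch of how a span decorator of this kind can work: it times the call, records any exception, and uses `contextvars` so nested calls pick up their parent span. The `Span` class, the in-memory `FINISHED` list, and this `observe` function are hypothetical stand-ins written for illustration; they are not TraceRoot's implementation, which exports OpenTelemetry spans instead.

```python
import contextvars
import functools
import time
import uuid
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class Span:
    """Hypothetical in-memory span; real SDKs export OpenTelemetry spans."""
    name: str
    type: str
    span_id: str
    parent_id: Optional[str]
    start: float
    end: float = 0.0
    error: Optional[str] = None


FINISHED: List[Span] = []  # spans appended in completion order (innermost first)
_current_span = contextvars.ContextVar("current_span", default=None)


def observe(name: str, type: str = "span") -> Callable:
    """Hypothetical span decorator mirroring the @observe(name=..., type=...) shape."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            span = Span(name=name, type=type, span_id=uuid.uuid4().hex,
                        parent_id=_current_span.get(), start=time.perf_counter())
            token = _current_span.set(span.span_id)  # children see this span as parent
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                span.error = repr(exc)  # record the failure, then re-raise it
                raise
            finally:
                span.end = time.perf_counter()
                _current_span.reset(token)
                FINISHED.append(span)
        return wrapper
    return decorator


@observe(name="lookup", type="tool")
def lookup(query: str) -> str:
    return query.upper()


@observe(name="my_agent", type="agent")
def my_agent(query: str) -> str:
    return lookup(query)


result = my_agent("hi")
```

Because the tool span finishes first, it is appended before the agent span, and its `parent_id` points at the agent span's `span_id`; that parent link is what lets a dashboard draw a trace tree.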

TypeScript SDK

traceroot-ts: a Node.js instrumentation layer covering the same integrations.

Backend API (FastAPI Routers)

Router	Purpose
`traces.py`	Ingest and retrieve trace data
`sessions.py`	Session management
`live.py`	Real-time streaming
`users.py`	Authentication
`internal.py`	Internal API
`public/`	Public-facing endpoints

Detectors

LLM-as-judge evaluators that monitor incoming traces for:

Hallucinations — cross-reference agent claims against source
Tool/logic failures — detect incorrect tool call patterns
Safety violations — flag harmful or policy-violating outputs
Intent drift — detect when agent deviates from original goal

Triggers: automatic root cause analysis + email/Slack alerts
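A minimal sketch of the LLM-as-judge pattern the Detectors are described as using, with one judge call per failure category. The category names come from the list above and the PASS/WARN/FAIL scale echoes the verdicts mentioned under Detector Execution; the prompt wording, the JSON reply format, and `fake_judge` (a stand-in for a BYOK model call) are assumptions. TraceRoot's actual detector prompts are not public.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

# Category names follow the Detectors list; the rest of this module is hypothetical.
CATEGORIES = ("hallucination", "tool_failure", "safety", "intent_drift")
STATUSES = ("PASS", "WARN", "FAIL")


@dataclass
class Verdict:
    category: str
    status: str  # one of STATUSES
    reason: str


def build_judge_prompt(trace_summary: str, category: str) -> str:
    # Illustrative wording only; TraceRoot's detector prompts are not public.
    return (
        f"Evaluate this AI agent trace for {category}.\n"
        'Answer with JSON: {"status": "PASS|WARN|FAIL", "reason": "..."}\n\n'
        f"Trace:\n{trace_summary}"
    )


def run_detectors(trace_summary: str, judge: Callable[[str], str]) -> List[Verdict]:
    verdicts = []
    for category in CATEGORIES:
        raw = judge(build_judge_prompt(trace_summary, category))
        try:
            data = json.loads(raw)
            status = data["status"] if data["status"] in STATUSES else "WARN"
            reason = str(data.get("reason", ""))
        except (ValueError, KeyError, TypeError, AttributeError):
            # Unparseable judge output is surfaced as WARN instead of passing silently.
            status, reason = "WARN", f"unparseable judge output: {raw!r}"
        verdicts.append(Verdict(category, status, reason))
    return verdicts


def fake_judge(prompt: str) -> str:
    """Stand-in for a BYOK model call: flags only the tool-failure check."""
    if "for tool_failure" in prompt:
        return '{"status": "FAIL", "reason": "search tool returned HTTP 500"}'
    return '{"status": "PASS", "reason": "no issue found"}'


results = run_detectors("agent called search(); tool responded with HTTP 500", fake_judge)
alerts = [v for v in results if v.status in ("WARN", "FAIL")]
```

Treating malformed judge output as WARN rather than PASS is a deliberate choice in this sketch: a broken judge should be visible, not silently green.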

Agentic Debugging

AI debugging component that:

Connects to a sandbox with production source code
Identifies the exact failing line
Correlates failure with GitHub commits, PRs, and open issues
Can generate a fix PR

BYOK: any model provider (OpenAI, Anthropic, Gemini, xAI, DeepSeek, Kimi, GLM, OpenRouter)
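The "identifies the exact failing line" step can be illustrated, in a much simpler setting than a sandbox, by walking a Python traceback and keeping the innermost frame that belongs to project code rather than the standard library or installed packages. This is a hypothetical sketch using only the standard `traceback` and `sysconfig` modules, with a deliberately crude `is_project_file` heuristic; it is not how TraceRoot's agentic debugger works internally, which the source does not document.

```python
import os
import sysconfig
import traceback
from typing import Callable, Optional

_STDLIB_DIR = os.path.realpath(sysconfig.get_paths()["stdlib"])


def is_project_file(path: str) -> bool:
    """Heuristic: anything outside the stdlib and site-packages counts as project code."""
    real = os.path.realpath(path)
    return "site-packages" not in real and not real.startswith(_STDLIB_DIR + os.sep)


def failing_frame(exc: BaseException,
                  keep: Callable[[str], bool] = is_project_file
                  ) -> Optional[traceback.FrameSummary]:
    """Return the innermost traceback frame whose file passes `keep`, or None."""
    for frame in reversed(traceback.extract_tb(exc.__traceback__)):
        if keep(frame.filename):
            return frame
    return None


def parse_amount(text: str) -> int:
    return int(text)  # raises ValueError for input such as "12x"


try:
    parse_amount("12x")
except ValueError as err:
    caught = err  # keep a reference; `err` is unbound after the except block

frame = failing_frame(caught)
print(f"{frame.filename}:{frame.lineno} in {frame.name}")
```

A file and line number like this is the natural key for the next step, looking up which commits and PRs last touched that line.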

Storage Components

ClickHouse: Time-series trace storage (optimized for OLAP)
PostgreSQL: Relational metadata (users, sessions, configs)
Redis: Task queue (Celery), pub/sub, caching

Claude Code Integration (for development)

.claude/settings.json — PostToolUse hook:

{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{"type": "command", "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/lint.sh", "timeout": 30}]
    }]
  }
}

This is a dev-environment configuration, not a user-facing product feature.
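For readers unfamiliar with PostToolUse hooks, the sketch below shows the kind of decision a lint hook like `lint.sh` makes, written in Python. It assumes Claude Code passes the hook a JSON payload on stdin containing `tool_name` and `tool_input.file_path`, and it treats the `Edit|Write` matcher as a regular expression that must match the whole tool name. The extension-to-linter table (`ruff`, `eslint`) is hypothetical; the repo's `lint.sh` may run different tools.

```python
import json
import re
from typing import List, Optional

MATCHER = "Edit|Write"  # matcher value from .claude/settings.json

# Hypothetical extension-to-linter table; the repo's lint.sh may run other tools.
LINTERS = {
    ".py": ["ruff", "check"],
    ".ts": ["npx", "eslint"],
    ".tsx": ["npx", "eslint"],
}


def lint_command(hook_input: str) -> Optional[List[str]]:
    """Map a PostToolUse payload to a lint command, or None if nothing should run."""
    payload = json.loads(hook_input)
    if not re.fullmatch(MATCHER, payload.get("tool_name", "")):
        return None  # the hook fired for a tool the matcher does not cover
    path = payload.get("tool_input", {}).get("file_path", "")
    for suffix, cmd in LINTERS.items():
        if path.endswith(suffix):
            return cmd + [path]
    return None


event = json.dumps({"tool_name": "Edit", "tool_input": {"file_path": "backend/rest/main.py"}})
print(lint_command(event))  # ['ruff', 'check', 'backend/rest/main.py']
```

A real hook script would read the payload from `sys.stdin` and run the returned command with `subprocess`; the `timeout: 30` in the config bounds how long that run may take.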

MCP Integration

.mcp.json — Chrome DevTools MCP server for UI debugging during development.

Prompts

TraceRoot AI — Prompts

Note: TraceRoot is an observability platform, not a prompt engineering framework. It does not ship prompt templates for coding agents to use. The "prompts" in this context are the Detector evaluation prompts used internally by TraceRoot's LLM-as-judge system, which are not publicly exposed in the GitHub repository. The following captures what is available.

Prompt 1: Python SDK Instrumentation Interface

Technique: Decorator-based instrumentation with type tagging

From README quickstart:

import traceroot
from traceroot import Integration, observe
from openai import OpenAI

traceroot.initialize(integrations=[Integration.OPENAI])
client = OpenAI()

@observe(name="my_agent", type="agent")
def my_agent(query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content

Analysis: The @observe decorator takes name and type parameters. The type="agent" tag enables TraceRoot to identify agent-level spans for health scoring and cost attribution. Auto-instrumentation via Integration.OPENAI patches the OpenAI client at the module level.
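The statement that `Integration.OPENAI` patches the client can be illustrated with a generic monkey-patching sketch: replace a method on a class with a wrapper that times and records every call, and guard against wrapping twice. `FakeCompletions`, `instrument`, and `CALLS` are hypothetical names standing in for a provider SDK, an instrumentor, and a span exporter; this shows the general technique, not TraceRoot's code.

```python
import functools
import time
from typing import Any, Dict, List

CALLS: List[Dict[str, Any]] = []  # one record per intercepted call


class FakeCompletions:
    """Hypothetical stand-in for a provider SDK's chat-completions resource."""

    def create(self, model: str, messages: list) -> dict:
        return {"model": model, "content": "echo: " + messages[-1]["content"]}


def instrument(cls: type, method_name: str) -> None:
    """Replace cls.<method_name> with a wrapper that records the model and latency."""
    original = getattr(cls, method_name)
    if getattr(original, "_instrumented", False):
        return  # already patched; keep instrumentation idempotent

    @functools.wraps(original)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            return original(self, *args, **kwargs)
        finally:
            CALLS.append({"model": kwargs.get("model"),
                          "latency_s": time.perf_counter() - start})

    wrapper._instrumented = True  # marker checked on the next instrument() call
    setattr(cls, method_name, wrapper)


instrument(FakeCompletions, "create")
instrument(FakeCompletions, "create")  # no-op: the method is already wrapped
reply = FakeCompletions().create(model="gpt-4o", messages=[{"role": "user", "content": "hi"}])
```

Patching the class rather than an instance means every client created afterwards is instrumented, which is why a single `initialize()` call can cover an entire application.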

Prompt 2: Claude Code Development Hook

Technique: PostToolUse hook for automated lint enforcement

From .claude/settings.json:

{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1",
    "CLAUDE_CODE_FORK_SUBAGENT": "1"
  },
  "permissions": {
    "defaultMode": "bypassPermissions"
  },
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": [{"type": "command", "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/lint.sh", "timeout": 30}]
    }]
  },
  "enabledPlugins": {
    "superpowers@claude-plugins-official": true,
    "code-simplifier@claude-plugins-official": true
  }
}

Analysis: The config enables Claude Code's experimental agent-teams feature and the fork-subagent capability. bypassPermissions mode lets Claude Code work autonomously without permission prompts. The lint hook runs automatically after every Edit or Write operation, and the Superpowers and code-simplifier plugins are enabled for the dev team's use.

Prompt 3: Detector Configuration (Conceptual)

Technique: LLM-as-judge with 4 failure mode categories

TraceRoot Detectors evaluate traces against four categories (documented in README, implementation private):

Hallucinations — agent claims not grounded in evidence
Tool/logic failures — incorrect or failed tool calls
Safety violations — harmful outputs
Intent drift — deviation from original goal

The judge model is configurable (BYOK) and runs asynchronously after trace ingestion.

Uniqueness

TraceRoot AI — Uniqueness & Positioning

Differs From Seeds

TraceRoot occupies a different category from all 11 seeds. It is an observability and self-healing infrastructure layer for AI agent systems, not a methodology or harness to run agent sessions. No seed (spec-kit, openspec, superpowers, bmad-method, agent-os, claude-flow, taskmaster-ai, kiro, ccmemory, claude-conductor, spec-driver) addresses post-hoc trace analysis, LLM-as-judge monitoring, or root-cause analysis correlating failures with git history. The closest seed is ccmemory in that both persist cross-session agent state, but ccmemory persists context for the agent to use; TraceRoot persists operational telemetry about agents for humans to analyze.

Distinctive Opinion

"Traces alone don't scale. Manually sifting through every trace is unsustainable."

TraceRoot bets that production AI agent systems need the same operational tooling as production software: continuous monitoring with automatic anomaly detection and AI-powered root-cause analysis. The natural market comparison is Datadog or New Relic for LLM systems, except open-source and self-hostable.

The BYOK stance on the debugging AI model is a deliberate anti-lock-in position.

Positioning

Primary market: teams running multi-agent systems in production who cannot afford to manually review every session
Regulated industry play: self-hosted deployment means traces (which may contain sensitive data) never leave the customer's infrastructure
YC S25 backing signals VC-grade ambitions for the observability-for-AI space

Observable Failure Modes

SDK instrumentation gap: TraceRoot only sees what is instrumented. Agents built on unsupported frameworks or custom tool implementations produce partial traces, and agentic debugging on partial traces may miss root causes.
Detector false positives: LLM-as-judge detectors on high-volume trace streams may produce alert fatigue if precision is not tuned.
Sandbox-source drift: The agentic debugger connects to production source code; if the sandbox version differs from the version that produced the failing trace, root cause analysis may be incorrect.
ClickHouse storage cost: High-volume LLM traces (billions of tokens) accumulate rapidly. Storage costs at scale are a deployment concern.
Early stage: v0.2.0 released 2026-03-21, Kubernetes deployment is experimental — production readiness gaps expected.
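One mitigation for the sandbox-source drift failure mode is to compare the commit recorded with the failing trace against the commit checked out in the sandbox before trusting any root-cause result. The sketch assumes a trace carries a commit SHA (the `DebugContext` fields are hypothetical; the source does not say TraceRoot records one) and accepts abbreviated SHAs of at least seven characters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DebugContext:
    trace_commit: str    # commit SHA attached to the failing trace (hypothetical field)
    sandbox_commit: str  # commit SHA checked out in the debugging sandbox


def same_commit(a: str, b: str, min_len: int = 7) -> bool:
    """Compare possibly abbreviated SHAs on their common prefix."""
    a, b = a.strip().lower(), b.strip().lower()
    n = min(len(a), len(b))
    return n >= min_len and a[:n] == b[:n]


def drift_warning(ctx: DebugContext) -> Optional[str]:
    """Return a warning if root-cause analysis would run against different code."""
    if same_commit(ctx.trace_commit, ctx.sandbox_commit):
        return None
    return (f"sandbox is at {ctx.sandbox_commit} but the trace came from "
            f"{ctx.trace_commit}; check out the trace's commit before debugging")
```

The minimum prefix length guards against very short SHAs matching by accident.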

Cross-References

TraceRoot is complementary to all other frameworks in this batch: AgentOps, AgentTrace, Liza, and Optio all produce agent telemetry that TraceRoot could ingest
The Claude Code dev config (bypassPermissions, experimental agent teams) mirrors advanced usage patterns also seen in agentops-boshu and liza

Workflow

TraceRoot AI — Workflow

Production Monitoring Workflow

Instrument: Add SDK (traceroot.initialize(integrations=[...])) to agent code
Ingest: Agent runs send traces via OTLP to TraceRoot backend
Process: Celery workers store traces in ClickHouse; metadata in PostgreSQL
Detect: Detectors run LLM-as-judge evaluation on incoming traces (automatic)
Alert: Email/Slack notifications on detected anomalies
Investigate: Dashboard shows ranked sessions by cost, health, anomaly type
Debug: Agentic debugger connects to sandbox, identifies failing line, correlates with GitHub
Fix: Optional PR generation for detected failures

Developer Workflow (SDK Integration)

import traceroot
from traceroot import Integration, observe
from openai import OpenAI

traceroot.initialize(integrations=[Integration.OPENAI])
client = OpenAI()

@observe(name="my_agent", type="agent")
def my_agent(query: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content

Auto-instrumented frameworks need no configuration beyond the traceroot.initialize() call; use the @observe decorator for manual instrumentation.

Approval Gates

None — TraceRoot is passive observability infrastructure. The Detectors trigger alerts automatically; humans receive notifications and choose whether to act.

Phase-to-Artifact Map

Phase	Artifact
Instrumentation	SDK code added to agent
Trace ingestion	OTLP spans → ClickHouse
Detection	Detector verdict (hallucination/failure/safety/drift)
Alert	Email/Slack notification
Investigation	Dashboard session detail view
Debugging	Root cause report + failing line ID
Fix	GitHub PR (optional)

Deployment Workflow (Self-hosted)

make dev   # development mode
make prod  # production mode (all Docker)

Terraform + Helm for Kubernetes production deployment (experimental).

Memory & Context

TraceRoot AI — Memory & Context

State Storage

TraceRoot persists extensive cross-session and cross-agent state:

Store	Data	Technology
ClickHouse	All trace spans (OLAP time-series)	ClickHouse
PostgreSQL	Sessions, users, detector configs, metadata	PostgreSQL
Redis	Celery task queue, pub/sub, real-time streaming	Redis

What is Stored

From trace ingestion:

LLM call spans: model, tokens (input/output/cache), cost, latency
Agent action spans: tool calls, tool results, errors
Session metadata: start/end time, total cost, health score
Tool failure events: which tool, error type, stack trace
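A sketch of an LLM call span carrying the fields listed above, plus a cost calculation from token counts. The model name and per-million-token prices are hypothetical placeholders, and the convention that `input_tokens` already includes cached tokens is an assumption (providers differ on this).

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical USD prices per million tokens; real prices vary by provider and date.
PRICES: Dict[str, Dict[str, float]] = {
    "example-model": {"input": 2.50, "cached_input": 1.25, "output": 10.00},
}


@dataclass
class LLMSpan:
    model: str
    input_tokens: int   # assumed to include cached tokens (provider conventions differ)
    cached_tokens: int
    output_tokens: int
    latency_ms: float

    def cost_usd(self) -> float:
        price = PRICES[self.model]
        uncached = self.input_tokens - self.cached_tokens
        return (uncached * price["input"]
                + self.cached_tokens * price["cached_input"]
                + self.output_tokens * price["output"]) / 1_000_000


span = LLMSpan("example-model", input_tokens=12_000, cached_tokens=8_000,
               output_tokens=1_000, latency_ms=840.0)
print(round(span.cost_usd(), 6))  # 0.03
```

With these placeholder prices, 4,000 uncached input tokens, 8,000 cached tokens, and 1,000 output tokens each contribute $0.01, for $0.03 in total.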

Persistence

Cross-session by design: all traces are retained until a TTL expires or they are manually deleted. This enables:

Historical cost/usage analysis
Regression detection (baseline comparison)
Root cause analysis that correlates failures across multiple sessions
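Regression detection against a baseline can be as simple as comparing failure rates between a baseline window and the current window. The thresholds (`max_ratio=1.5`, `min_sessions=20`) are arbitrary illustrative defaults, not TraceRoot settings.

```python
from typing import Sequence


def failure_rate(failed: Sequence[bool]) -> float:
    """Fraction of sessions that failed (True means the session failed)."""
    return sum(failed) / len(failed) if failed else 0.0


def is_regression(baseline: Sequence[bool], current: Sequence[bool],
                  max_ratio: float = 1.5, min_sessions: int = 20) -> bool:
    """Flag when the current failure rate exceeds the baseline rate by more than max_ratio."""
    if len(current) < min_sessions:
        return False  # too few sessions to judge
    base, cur = failure_rate(baseline), failure_rate(current)
    if base == 0.0:
        return cur > 0.0  # any failure is new relative to a clean baseline
    return cur / base > max_ratio


baseline = [True] * 5 + [False] * 95   # 5% of baseline sessions failed
current = [True] * 10 + [False] * 40   # 20% of current sessions failed
print(is_regression(baseline, current))  # True: 0.20 / 0.05 = 4.0 > 1.5
```

A production system would more likely use a statistical test or per-detector baselines; the ratio keeps the idea visible.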

Cross-Agent Correlation

A key capability: TraceRoot correlates traces from multiple agent sessions to identify systemic failure patterns (e.g., all sessions using a specific prompt template failing at the same tool call). This cross-agent correlation is not possible in session-scoped tools.
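The example in the paragraph above (many sessions using one prompt template failing at the same tool call) amounts to grouping failure events by (template, tool) and counting distinct sessions per group. The event tuple layout, the template and tool names, and the `min_sessions` threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Key = Tuple[str, str]  # (prompt_template, tool_name)


def systemic_failures(events: Iterable[Tuple[str, str, str]],
                      min_sessions: int = 3) -> List[Tuple[Key, int]]:
    """Group (session_id, template, tool) failure events; keep keys seen in many sessions."""
    sessions: Dict[Key, Set[str]] = {}
    for session_id, template, tool in events:
        sessions.setdefault((template, tool), set()).add(session_id)
    counts = Counter({key: len(ids) for key, ids in sessions.items()})
    return [(key, n) for key, n in counts.most_common() if n >= min_sessions]


failures = [
    ("s1", "refund_v2", "lookup_order"),
    ("s2", "refund_v2", "lookup_order"),
    ("s3", "refund_v2", "lookup_order"),
    ("s3", "refund_v2", "lookup_order"),  # a repeat within one session counts once
    ("s4", "greeting_v1", "web_search"),
]
print(systemic_failures(failures))
# [(('refund_v2', 'lookup_order'), 3)]
```

Counting distinct sessions rather than raw events keeps one noisy session from looking like a systemic pattern.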

Memory for Debugging

The Agentic Debugging component accesses:

Current trace spans
Production source code (via sandbox)
GitHub commit history (via GitHub integration)
Open issues and PRs

This gives the AI debugger durable cross-session memory of the codebase's evolution relative to failures.

Compaction

Not applicable — TraceRoot stores raw traces; compaction is a query/aggregation concern, not a context window concern. The platform is read by an observability UI, not fed into a context window.

SDK State

The Python/TypeScript SDKs maintain no local state beyond the current session's span buffer before flushing to the backend.
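The span buffer described above follows the usual batch-export pattern: collect finished spans, hand them to an exporter once a batch fills, and flush explicitly at shutdown. OpenTelemetry's `BatchSpanProcessor` implements this with a background thread and a time-based schedule; the single-threaded `SpanBuffer` below is a hypothetical simplification and is not thread-safe.

```python
from typing import Any, Callable, List


class SpanBuffer:
    """Collect finished spans and export them in batches (single-threaded sketch)."""

    def __init__(self, export: Callable[[List[Any]], None], max_batch: int = 512) -> None:
        self._export = export
        self._max_batch = max_batch
        self._pending: List[Any] = []

    def add(self, span: Any) -> None:
        self._pending.append(span)
        if len(self._pending) >= self._max_batch:
            self.flush()  # batch is full: ship it now

    def flush(self) -> None:
        """Export whatever is pending; call this at shutdown so no spans are lost."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._export(batch)


exported: List[List[Any]] = []
buffer = SpanBuffer(exported.append, max_batch=3)
for span_id in range(7):
    buffer.add(span_id)
buffer.flush()  # final partial batch
print(exported)  # [[0, 1, 2], [3, 4, 5], [6]]
```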

Orchestration

TraceRoot AI — Orchestration

Multi-Agent Pattern

TraceRoot is an observability target, not an orchestrator. It does not coordinate agents; it monitors them.

Internally, TraceRoot uses Celery for worker orchestration (trace ingestion pipeline), but this is infrastructure-level parallelism, not AI agent orchestration.

Multi-Agent Monitoring

TraceRoot can monitor multi-agent systems: all agents instrumented with the SDK send traces that TraceRoot correlates. This gives a cross-agent view of:

Which agent spent what
Where cross-agent handoffs failed
Health scores across the fleet
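The fleet-level view in the list above reduces to a group-by over spans keyed by agent. The span dictionaries and the health score (share of error-free spans) are illustrative assumptions; TraceRoot's health-score definition is not documented in the source.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical flattened spans, as a trace query might return them.
spans: List[Dict[str, Any]] = [
    {"agent": "planner", "cost_usd": 0.012, "error": None},
    {"agent": "researcher", "cost_usd": 0.030, "error": None},
    {"agent": "researcher", "cost_usd": 0.008, "error": "handoff missing 'query' field"},
    {"agent": "writer", "cost_usd": 0.020, "error": None},
]


def rollup(spans: List[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Per-agent cost, span count, error count, and a naive health score."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"cost_usd": 0.0, "spans": 0, "errors": 0})
    for span in spans:
        t = totals[span["agent"]]
        t["cost_usd"] += span["cost_usd"]
        t["spans"] += 1
        t["errors"] += span["error"] is not None
    # Health = share of error-free spans (an illustrative metric, not TraceRoot's).
    return {agent: {**t, "health": 1 - t["errors"] / t["spans"]}
            for agent, t in totals.items()}


report = rollup(spans)
```

Here the failed handoff shows up as the researcher agent's only error, halving its health score while the other agents stay at 1.0.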

Execution Mode

Background daemon — TraceRoot runs continuously as a server-side service, receiving traces in real time from instrumented agents. The Celery workers run asynchronously.

Detector Execution

Detectors are event-driven: they run when new traces arrive. Each detector is an LLM-as-judge call that evaluates the trace and produces a verdict. Alerts fire on WARN/FAIL verdicts.
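Event-driven detection decouples ingestion from judging: ingestion only enqueues the trace, and a worker evaluates it later and raises an alert on a WARN or FAIL verdict. TraceRoot uses Celery for this; the sketch below substitutes an `asyncio.Queue` so it runs self-contained, and its judge is a trivial hypothetical rule rather than an LLM call.

```python
import asyncio
import contextlib
from typing import Awaitable, Callable, Dict, List, Tuple

Judge = Callable[[Dict], Awaitable[str]]  # returns "PASS", "WARN" or "FAIL"


async def detector_worker(queue: asyncio.Queue, judge: Judge,
                          alerts: List[Tuple[str, str]]) -> None:
    """Consume traces as they arrive and record an alert for WARN/FAIL verdicts."""
    while True:
        trace = await queue.get()
        try:
            verdict = await judge(trace)
        except Exception:
            verdict = "WARN"  # a failed judge call is surfaced, not dropped (assumption)
        if verdict in ("WARN", "FAIL"):
            alerts.append((trace["id"], verdict))  # stand-in for email/Slack
        queue.task_done()


async def main() -> List[Tuple[str, str]]:
    async def judge(trace: Dict) -> str:
        # Hypothetical rule standing in for an LLM-as-judge call.
        return "FAIL" if trace.get("tool_error") else "PASS"

    queue: asyncio.Queue = asyncio.Queue()
    alerts: List[Tuple[str, str]] = []
    worker = asyncio.create_task(detector_worker(queue, judge, alerts))
    for trace in ({"id": "t1"}, {"id": "t2", "tool_error": True}):
        queue.put_nowait(trace)  # ingestion only enqueues; judging happens later
    await queue.join()  # wait until every queued trace has been evaluated
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return alerts


print(asyncio.run(main()))  # [('t2', 'FAIL')]
```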

Agentic Debugging Execution

On-demand — triggered when a user or detector escalation initiates a debug session. The AI connects to a sandbox with source code, runs a root cause analysis, and optionally generates a PR.

Isolation Mechanism

TraceRoot's backend is containerized (Docker Compose / Kubernetes). The agentic debugging sandbox is an isolated code execution environment separate from production.

Multi-Model (BYOK)

Yes — the agentic debugging and detector judge model are user-configurable (BYOK). Supported: OpenAI, Anthropic, Gemini, xAI, DeepSeek, Kimi, GLM, OpenRouter.

Claude Code Usage (Dev)

The repo's .claude/settings.json uses Claude Code with experimental agent teams and fork-subagent enabled for developing TraceRoot itself. Not a user-facing orchestration feature.

UI / CLI Surface

TraceRoot AI — UI / CLI Surface

CLI Binary

None. TraceRoot does not ship a CLI binary for end users. The traceroot Python package is an SDK, not a CLI.

Development lifecycle uses make dev and make prod (Makefile shortcuts for Docker Compose).

Local Web Dashboard

Exists: Yes
Type: Web dashboard
Port: Not documented in public README (default Docker Compose port)
Tech stack: Next.js + React (frontend/ui/)
Features:
- Session list with cost, tokens, health score, anomaly flags
- Session detail view with trace timeline
- Detector configuration
- Cost analytics and usage breakdown
- Real-time log streaming (WebSocket)
- Agentic debugging interface
- GitHub integration for commit correlation

Cloud Dashboard

app.traceroot.ai — hosted version with same features, no self-hosting required.

SDK Surface

Python SDK (pip install traceroot):

import traceroot
from traceroot import Integration, observe

traceroot.initialize(integrations=[Integration.OPENAI, Integration.ANTHROPIC])

@observe(name="span_name", type="agent")  # type is one of "agent", "tool", "llm"
def my_function():
    ...

TypeScript SDK (traceroot-ts): Same API surface in Node.js.

Integrations

Instrumentation integrations (zero-code in most cases):

OpenAI, Anthropic, Google Gemini, Mistral
LangChain, LangGraph, CrewAI, AutoGen
Claude Agent SDK, OpenAI Agents SDK
Mastra, Vercel AI SDK, LlamaIndex
DSPy, Google ADK, Pydantic AI

Alerting

Email notifications on Detector triggers
Slack notifications on Detector triggers

Deployment

# Development
make dev

# Production (all Docker)
make prod

Kubernetes: deploy/ directory with Helm chart and Terraform (experimental, AWS-focused).


Distribution

Type: docker-image
License: MIT
Install: multi-step
Version: v0.2.0

Surfaces

CLI binary: No
CLI subcmds: 0
Local UI: web-dashboard
Tech stack: Next.js + React

Components

Commands: 0
Skills: 0
Subagents: 0
Hooks: 1
MCP servers: 1
MCP tools: 1
Scripts: 2
Templates: 0

Workflow

Phases: 7
Approval gates: 0
Spec format: none
Spec storage: none
Delta or full: none

Orchestration

Multi-agent: No
Pattern: none
Max concurrent: 0
Isolation: container
Consensus: none
Prompt chaining: No

Multi-model

Multi-model: Yes
BYOK: Yes
Modal: text+vision

Execution

Mode: background-daemon
Crash recovery: No
Compaction: No
Session handoff: Yes
Streaming: Yes

Memory

Type: hybrid
Persistence: global
Search: full-text
State files: 3 files

Quality

TDD: No
TDD mechanism: none
Validators: 1
Self-review: external-llm

Git / Observability

Auto commit: No
Auto PR: Yes
Auto merge: No
Worktree/feat: No
Audit log: Yes
Audit format: jsonl
Replay: Yes

Tools

Primary: standalone-repo
Targets: 11
Portability: high

Signals

Stars: 591
Last commit: 2026-05-26
Contributors: 30
Maintainer: active
Quality score: 5.1/10