AI Agent Observability Platform

Debug your AI agents before your users do.

OopsTrace gives you a live trace tree for every agent run. Error spans light up the moment they fail — click one to open an AI chat pre-loaded with full context and fix the bug instantly.


Built for production AI teams

Everything you need to observe, debug, and improve your agents — in one place.

Real-time trace streaming

Watch spans arrive as your agent runs. Live streaming shows you exactly what's happening the moment it happens — no refresh needed.
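Live streaming means sending span events as they happen rather than uploading a finished trace. The sketch below shows the idea in-process with an `asyncio.Queue`. The event shape (`span`, `event`) and the choice to send start and end as separate events are assumptions for illustration. This is not the OopsTrace wire protocol.

```python
import asyncio


async def agent(events: asyncio.Queue) -> None:
    """Emit a start and an end event for each step as it happens (hypothetical event shape)."""
    for step in ("retrieve_context", "llm_call", "generate_response"):
        await events.put({"span": step, "event": "start"})
        await asyncio.sleep(0.01)  # simulated work
        await events.put({"span": step, "event": "end"})
    await events.put(None)  # sentinel: the run has finished


async def dashboard(events: asyncio.Queue, seen: list) -> None:
    """Consume events the moment they arrive, as a live view would."""
    while (event := await events.get()) is not None:
        seen.append(f"{event['span']}:{event['event']}")


async def main() -> list:
    events: asyncio.Queue = asyncio.Queue()
    seen: list = []
    # Producer and consumer run concurrently, so the view updates mid-run.
    await asyncio.gather(agent(events), dashboard(events, seen))
    return seen


print(asyncio.run(main()))
```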

Instant error detection

Error spans light up the moment they fail. Stack traces, status codes, and messages — surfaced automatically in the trace tree.
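One common way a tracer surfaces these details is to wrap each unit of work in a span. The span records its status and, on failure, the exception type, message, and formatted stack trace. The sketch below is a minimal version of that idea. It uses a hypothetical in-memory `Span` record and `span()` context manager and is not the OopsTrace SDK.

```python
import time
import traceback
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical span record; field names are illustrative only.
    name: str
    status: str = "running"
    duration_ms: float = 0.0
    error_type: Optional[str] = None
    error_message: Optional[str] = None
    stack_trace: Optional[str] = None


@contextmanager
def span(name: str, sink: list):
    """Time the wrapped block, record its status, capture any error, then re-raise it."""
    s = Span(name=name)
    start = time.perf_counter()
    try:
        yield s
        s.status = "ok"
    except Exception as exc:
        s.status = "error"
        s.error_type = type(exc).__name__
        s.error_message = str(exc)
        s.stack_trace = traceback.format_exc()
        raise  # the caller still sees the original exception
    finally:
        s.duration_ms = (time.perf_counter() - start) * 1000
        sink.append(s)


spans: list = []
try:
    with span("llm_call", spans):
        raise RuntimeError("quota exceeded")
except RuntimeError:
    pass

failed = spans[0]
print(failed.status, failed.error_type, failed.error_message)  # error RuntimeError quota exceeded
```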

One-click LLM fix

Click a failing span. An AI assistant opens with full trace context pre-loaded — inputs, outputs, parent chain, token usage. Fix in seconds.
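One plausible way to assemble this context is to walk from the failing span up through its ancestors and serialize the result into text the assistant can read. The sketch below assumes a hypothetical `SpanNode` with a `parent` pointer and fields that mirror the list above. OopsTrace's actual context format may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanNode:
    # Hypothetical span shape; fields mirror what the page says gets pre-loaded.
    name: str
    inputs: str = ""
    outputs: str = ""
    error: Optional[str] = None
    tokens: int = 0
    parent: Optional["SpanNode"] = None


def parent_chain(node: SpanNode) -> list:
    """Ancestor names, root first, excluding the node itself."""
    chain = []
    current = node.parent
    while current is not None:
        chain.append(current.name)
        current = current.parent
    return list(reversed(chain))


def build_context(node: SpanNode) -> str:
    """Serialize a span and its ancestry into a prompt preamble for the assistant."""
    lines = [
        f"Span: {node.name}",
        f"Error: {node.error or 'none'}",
        f"Input: {node.inputs}",
        f"Output: {node.outputs}",
        f"Tokens: {node.tokens}",
        "Parent chain: " + (" → ".join(parent_chain(node)) or "(root)"),
    ]
    return "\n".join(lines)


root = SpanNode(name="my_agent")
failing = SpanNode(name="llm_call", error="RateLimitError: quota exceeded", parent=root)
print(build_context(failing))
```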

Multi-agent tracing

Trace across agent boundaries, tool calls, and sub-agents in a single unified tree. See the full execution graph of complex pipelines.

Session & user analytics

Group traces by user session, correlate failures across runs, and drill into cohort-level behaviour — no separate BI tool needed.
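Grouping by session comes down to keying each trace on a session identifier and aggregating per key. The sketch below counts failed and total runs per session. It works over hypothetical trace summaries; the `session_id` and `ok` fields are assumptions, not OopsTrace's schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TraceRecord:
    # Hypothetical flattened trace summary.
    trace_id: str
    session_id: str
    ok: bool


def failures_by_session(traces: list) -> dict:
    """Map session_id -> (failed runs, total runs)."""
    counts = defaultdict(lambda: [0, 0])
    for t in traces:
        bucket = counts[t.session_id]
        bucket[1] += 1
        if not t.ok:
            bucket[0] += 1
    return {sid: (failed, total) for sid, (failed, total) in counts.items()}


traces = [
    TraceRecord("t1", "s1", True),
    TraceRecord("t2", "s1", False),
    TraceRecord("t3", "s2", True),
]
print(failures_by_session(traces))  # {'s1': (1, 2), 's2': (0, 1)}
```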

Prompt version tracking

Track which prompt version caused a failure. Compare outputs across model versions and prompt iterations directly from the trace view.
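Attributing failures to a prompt version requires each LLM span to carry a version tag. Once it does, comparing versions is a grouped failure rate. The sketch below assumes hypothetical `(prompt_version, model, ok)` tuples rather than any OopsTrace export format.

```python
from collections import Counter


def failure_rate_by_version(spans: list) -> dict:
    """Return {(prompt_version, model): failure_rate} from (version, model, ok) tuples."""
    totals = Counter()
    failures = Counter()
    for version, model, ok in spans:
        key = (version, model)
        totals[key] += 1
        if not ok:
            failures[key] += 1
    # Counter returns 0 for keys that never failed, giving a 0.0 rate.
    return {key: failures[key] / totals[key] for key in totals}


spans = [
    ("v1", "gpt-4o", True),
    ("v1", "gpt-4o", True),
    ("v2", "gpt-4o", False),
    ("v2", "gpt-4o", True),
]
print(failure_rate_by_version(spans))  # {('v1', 'gpt-4o'): 0.0, ('v2', 'gpt-4o'): 0.5}
```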

From deploy to fix in minutes

No complex setup. Add two lines, open the dashboard, and start debugging.

Instrument in 2 lines

Add the OopsTrace SDK to your agent. One decorator automatically captures the full execution tree, including tool calls, LLM requests, and sub-agents.

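from oopstrace import OopsTrace

OopsTrace.init(api_key="ot_...")  # your OopsTrace API key

@OopsTrace.trace  # records this run and its nested calls as a trace tree
async def my_agent(query: str):
    result = await llm_call(query)  # llm_call: your existing LLM or tool code
    return result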

Observe in real time

Open the dashboard and watch spans arrive as your agent executes. Each span shows timing, token usage, cost, and status as it runs.

# Spans appear as your agent runs:
#
# ● my_agent                     2.4s   ✓
#   ● retrieve_context           320ms  ✓
#   ● llm_call                   1.1s   ✗  ← Oops!
#     ● tool_execution           240ms  ✗
#   ● generate_response          680ms  ✓
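A tree view like the one above is a depth-first walk over nested spans that indents each level. The sketch below renders hypothetical `(name, duration, ok, children)` tuples in the same layout. The tuple shape is an assumption for illustration. Names are padded by a width that shrinks with depth, so the timing column lines up at every level.

```python
def render(span: tuple, depth: int = 0) -> list:
    """Depth-first render of (name, duration, ok, children) into indented lines."""
    name, duration, ok, children = span
    mark = "✓" if ok else "✗"
    indent = "  " * depth
    # Shrink the name column as indentation grows so durations stay aligned.
    lines = [f"{indent}● {name:<{30 - 2 * depth}}{duration:<7}{mark}"]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


tree = ("my_agent", "2.4s", True, [
    ("retrieve_context", "320ms", True, []),
    ("llm_call", "1.1s", False, [
        ("tool_execution", "240ms", False, []),
    ]),
    ("generate_response", "680ms", True, []),
])
print("\n".join(render(tree)))
```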

Fix with AI

Click the red span. An AI assistant opens with the full trace context — error message, inputs, outputs, parent chain. Diagnose and fix instantly.

# AI assistant context (auto-injected):
# Span:  llm_call
# Error: RateLimitError — quota exceeded
# Input: { model: "gpt-4o", messages: [...] }
# Parent: my_agent
#
> Why did this span fail?
> Suggest a retry strategy with backoff.
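For a rate-limit failure like this one, a typical answer is exponential backoff with jitter. The sketch below is minimal and provider-agnostic. It uses "full jitter", which sleeps a random time up to a capped exponential bound. `RateLimitError` is a stand-in class, not a specific SDK's exception, and the retry and delay defaults are illustrative.

```python
import asyncio
import random


class RateLimitError(Exception):
    """Stand-in for your LLM client's rate-limit exception."""


async def with_backoff(call, *, retries: int = 5, base: float = 0.5, cap: float = 30.0):
    """Await call(), retrying on RateLimitError with capped exponential backoff and full jitter."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except RateLimitError:
            if attempt == retries:
                raise  # out of retries: let the error (and the failing span) surface
            # Full jitter: sleep a random time in [0, min(cap, base * 2**attempt)].
            await asyncio.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))


attempts = 0


async def flaky_llm_call():
    """Fails twice with a rate-limit error, then succeeds."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise RateLimitError("quota exceeded")
    return "ok"


print(asyncio.run(with_backoff(flaky_llm_call, base=0.01)))  # ok
```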

Deploy on your infrastructure

Run OopsTrace on your own servers. Your traces stay in your environment — no data leaves your infrastructure. Full control, no vendor lock-in.


$ git clone https://github.com/oopstrace/oopstrace
$ cd oopstrace
$ docker compose up -d
$ # Dashboard ready at http://localhost:3000

Stop guessing. Start tracing.