
Workflow:Confident AI DeepEval LLM Tracing and Observability

From Leeroopedia
Knowledge Sources
Domains LLM_Observability, Tracing, Production_Monitoring
Last Updated 2026-02-14 09:00 GMT

Overview

End-to-end process for instrumenting LLM applications with tracing, configuring observability features, and monitoring production behavior through the Confident AI platform.

Description

This workflow covers the setup and configuration of DeepEval's tracing and observability system for production LLM applications. Using the @observe() decorator, application functions are instrumented to capture execution traces with typed spans (LLM, retriever, tool, agent, custom). Traces can be enriched with metadata, user IDs, thread IDs, tags, and custom names. The system supports data masking for sensitive information, sampling rate configuration, and environment tagging. Traces are sent to the Confident AI platform for visualization, debugging, and online evaluation. This workflow focuses on the observability setup rather than the evaluation-specific aspects covered in other workflows.

Usage

Execute this workflow when deploying an LLM application to production that requires visibility into its runtime behavior. This applies when you want to monitor LLM call patterns, track conversation threads, debug production issues via trace inspection, apply data masking for PII protection, or set up online evaluation of production responses.

Execution Steps

Step 1: Authenticate with Confident AI

Log in to the Confident AI platform using the DeepEval CLI. This stores the API key locally and enables trace upload. Optionally configure the region (US or EU) for data residency requirements.

Key considerations:

  • Run deepeval login and follow the CLI prompts
  • API key is stored in ~/.deepeval
  • EU region is auto-detected from confident_eu_ key prefix
  • Environment variables can be used instead of CLI login
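The login flow can be sketched as follows. The `CONFIDENT_API_KEY` variable name is an assumption for the environment-variable alternative; the region check mirrors the `confident_eu_` prefix detection described above.

```shell
# Interactive login (follow the CLI prompts; key is stored in ~/.deepeval):
#   deepeval login

# Non-interactive alternative: export the key directly.
# CONFIDENT_API_KEY is an assumed variable name, shown for illustration.
export CONFIDENT_API_KEY="confident_us_xxxxxxxx"

# Region is inferred from the key prefix:
case "$CONFIDENT_API_KEY" in
  confident_eu_*) echo "region: EU" ;;
  *)              echo "region: US" ;;
esac
```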

Step 2: Instrument Application Functions

Apply the @observe() decorator to application functions that should be traced. Each decorated function becomes a span in the trace tree. Set appropriate span types to enable framework-specific attribute capture.

Span types and their attributes:

  • @observe(type="llm"): Captures model name, token counts, costs, prompt object
  • @observe(type="retriever"): Captures embedder model, top-K, and chunk size
  • @observe(type="tool"): Captures tool description and parameters
  • @observe(type="agent"): Captures available tools and agent handoffs
  • @observe(): Default custom span with input/output capture
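A minimal instrumentation sketch of the span types above. The import path and decorator arguments follow DeepEval's tracing API as described in this step; a no-op fallback decorator lets the sketch run even without the library installed, and the retriever/LLM bodies are placeholders rather than real vector-store or model calls.

```python
# Sketch: nested spans via @observe(). Assumes deepeval.tracing.observe;
# falls back to a pass-through decorator if deepeval is not installed.
try:
    from deepeval.tracing import observe
except ImportError:
    def observe(*args, **kwargs):
        def wrap(fn):
            return fn
        return wrap

@observe(type="retriever")
def retrieve(query: str) -> list[str]:
    # Placeholder: a real app would query a vector store here.
    return [f"chunk about {query}"]

@observe(type="llm")
def generate(query: str, chunks: list[str]) -> str:
    # Placeholder: a real app would call a model provider here.
    return f"Answer to {query!r} using {len(chunks)} chunk(s)"

@observe()  # default custom span; becomes the parent of the two spans above
def rag_pipeline(query: str) -> str:
    return generate(query, retrieve(query))

print(rag_pipeline("tracing"))
```

Each decorated call becomes one span in the trace tree, so `rag_pipeline` appears as the root with `retrieve` and `generate` as children.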

Step 3: Enrich Traces with Metadata

Within observed functions, use update_current_trace() and update_current_span() to attach contextual metadata that aids debugging and analysis. This includes user identification, conversation threading, custom tags, and arbitrary metadata.

Enrichment APIs:

  • update_current_trace(name=..., user_id=..., thread_id=..., tags=..., metadata=...)
  • update_current_span(name=..., input=..., output=..., metadata=...)
  • update_llm_span(model=..., input_token_count=..., output_token_count=...)
  • update_retriever_span(embedder=..., top_k=..., chunk_size=...)
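The enrichment calls above can be sketched inside an observed function. Function names and keyword arguments follow the APIs listed in this step; no-op fallbacks keep the sketch runnable without deepeval installed, and the tag/metadata values are illustrative.

```python
# Sketch: enriching the current trace and span from inside an observed
# function. Falls back to no-op stubs if deepeval is not installed.
try:
    from deepeval.tracing import observe, update_current_trace, update_current_span
except ImportError:
    def observe(*args, **kwargs):
        def wrap(fn):
            return fn
        return wrap
    def update_current_trace(**kwargs):
        pass
    def update_current_span(**kwargs):
        pass

@observe()
def answer(user_id: str, thread_id: str, question: str) -> str:
    # Attach identity and threading info so traces can be filtered later.
    update_current_trace(
        name="support-chat",          # illustrative trace name
        user_id=user_id,
        thread_id=thread_id,
        tags=["support", "v2"],       # illustrative tags
        metadata={"channel": "web"},  # arbitrary metadata
    )
    reply = f"Echo: {question}"
    # Record this span's input/output explicitly.
    update_current_span(input=question, output=reply)
    return reply

print(answer("user-123", "thread-9", "How do I reset my key?"))
```

Calling `update_current_trace` with a `thread_id` groups multiple traces into one conversation thread on the platform, which is what enables thread-level filtering in Step 5.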

Step 4: Configure Tracing Options

Set up tracing configuration options including sampling rate, environment tagging, and data masking. These are configured via the TraceManager or environment variables and apply globally to all traces.

Configuration options:

  • Sampling rate: Control what percentage of traces are captured (0.0 to 1.0)
  • Environment: Tag traces with environment name (production, staging, development)
  • Data masking: Apply custom mask functions to redact sensitive data (PII, credentials) from trace inputs and outputs
  • API key: Override the default Confident AI API key

Step 5: Monitor and Analyze in Production

View traces on the Confident AI platform to monitor application behavior. Use the trace viewer to inspect execution flows, identify bottlenecks, and debug quality issues. Optionally set up online evaluation with metric collections to automatically score production responses.

Platform capabilities:

  • Visual trace tree with span details and timing
  • Filter traces by user, thread, tags, environment
  • Online evaluation via metric collections on traced responses
  • Annotation support for human review of traces
  • Export traced data to improve evaluation datasets

Execution Diagram

GitHub URL

Workflow Repository