Implementation:Confident ai Deepeval Evaluate Trace
Overview
Evaluate Trace covers the functions for offline evaluation of previously collected traces, spans, and threads against named metric collections on the Confident AI platform. It comprises three functions (evaluate_trace, evaluate_span, and evaluate_thread), each of which submits an evaluation request at a different granularity without re-running the original application.
API Documentation
Function: evaluate_trace
Source: deepeval/tracing/offline_evals/trace.py
Import:
from deepeval.tracing import evaluate_trace
Signature:
evaluate_trace(trace_uuid: str, metric_collection: str)
| Parameter | Type | Description |
|---|---|---|
| trace_uuid | str | The unique identifier of the trace to evaluate. |
| metric_collection | str | The name of the metric collection on Confident AI to apply. |
Function: evaluate_span
Source: deepeval/tracing/offline_evals/span.py
Import:
from deepeval.tracing import evaluate_span
Signature:
evaluate_span(span_uuid: str, metric_collection: str)
| Parameter | Type | Description |
|---|---|---|
| span_uuid | str | The unique identifier of the span to evaluate. |
| metric_collection | str | The name of the metric collection on Confident AI to apply. |
Function: evaluate_thread
Source: deepeval/tracing/offline_evals/thread.py
Import:
from deepeval.tracing import evaluate_thread
Signature:
evaluate_thread(thread_id: str, metric_collection: str, overwrite_metrics: bool = False)
| Parameter | Type | Description |
|---|---|---|
| thread_id | str | The identifier of the conversation thread to evaluate. |
| metric_collection | str | The name of the metric collection on Confident AI to apply. |
| overwrite_metrics | bool | When True, overwrites any existing metric results for this thread. Defaults to False. |
Input / Output (All Functions)
- Inputs: A target identifier (trace UUID, span UUID, or thread ID) and the name of a metric collection defined on the Confident AI platform.
- Outputs: An evaluation request is submitted to Confident AI. The evaluation results appear in the Confident AI dashboard associated with the specified trace, span, or thread.
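Because the three functions differ only in the name of their identifier argument (plus overwrite_metrics for threads), the keyword arguments for any of them can be assembled by one small helper. The following is a minimal sketch, not part of DeepEval: build_eval_kwargs and ID_KWARG are hypothetical names introduced here, and the argument names are taken from the signatures documented above.

```python
from typing import Any, Dict

# Name of the identifier keyword for each granularity, as documented
# in the signatures above. ID_KWARG is a hypothetical helper table.
ID_KWARG = {
    "trace": "trace_uuid",
    "span": "span_uuid",
    "thread": "thread_id",
}


def build_eval_kwargs(
    kind: str,
    target_id: str,
    metric_collection: str,
    overwrite_metrics: bool = False,
) -> Dict[str, Any]:
    """Build keyword arguments for evaluate_trace / evaluate_span / evaluate_thread.

    Hypothetical helper: it only assembles arguments and submits nothing.
    """
    if kind not in ID_KWARG:
        raise ValueError(f"unknown granularity: {kind!r}")
    if not target_id or not metric_collection:
        raise ValueError("target_id and metric_collection must be non-empty")

    kwargs: Dict[str, Any] = {
        ID_KWARG[kind]: target_id,
        "metric_collection": metric_collection,
    }
    # Only evaluate_thread is documented as accepting overwrite_metrics.
    if kind == "thread":
        kwargs["overwrite_metrics"] = overwrite_metrics
    return kwargs
```

With DeepEval installed, the result could then be passed along as, for example, evaluate_thread(**build_eval_kwargs("thread", "thread-789", "conversation-quality", True)).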
Usage Examples
Evaluating a Trace
from deepeval.tracing import evaluate_trace
evaluate_trace(trace_uuid="abc-123", metric_collection="quality-checks")
Evaluating a Span
from deepeval.tracing import evaluate_span
evaluate_span(span_uuid="def-456", metric_collection="retrieval-quality")
Evaluating a Thread
from deepeval.tracing import evaluate_thread
evaluate_thread(
    thread_id="thread-789",
    metric_collection="conversation-quality",
    overwrite_metrics=True,
)
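Each call above submits a single request, so evaluating many stored traces means looping over their identifiers. The sketch below shows one way to do that while collecting failures instead of stopping at the first one. submit_all is a hypothetical helper, not a DeepEval function. The return values and error behavior of the evaluate_* functions are not documented here, so the sketch assumes that a failed submission surfaces as an exception. The submit function is passed in as an argument, which keeps the helper usable with any of the three functions.

```python
from typing import Callable, Iterable, List, Tuple


def submit_all(
    submit: Callable[..., object],
    id_kwarg: str,
    ids: Iterable[str],
    metric_collection: str,
) -> List[Tuple[str, Exception]]:
    """Submit one evaluation request per identifier.

    Hypothetical helper. `submit` is expected to be evaluate_trace,
    evaluate_span, or evaluate_thread, and `id_kwarg` the matching
    identifier keyword ("trace_uuid", "span_uuid", or "thread_id").
    Returns (identifier, exception) pairs for submissions that raised.
    """
    failures: List[Tuple[str, Exception]] = []
    for target_id in ids:
        try:
            submit(**{id_kwarg: target_id}, metric_collection=metric_collection)
        except Exception as exc:  # assumption: failures surface as exceptions
            failures.append((target_id, exc))
    return failures


# Usage with DeepEval installed (left as comments so the sketch runs without it):
# from deepeval.tracing import evaluate_trace
# failed = submit_all(evaluate_trace, "trace_uuid", ["abc-123"], "quality-checks")
```

Collecting failures rather than raising lets the remaining identifiers still be submitted; the caller can inspect the returned list and retry or log as needed.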
Relationships
Principle:Confident_ai_Deepeval_Offline_Trace_Evaluation
Metadata
DeepEval Tracing Observability LLM_Evaluation 2026-02-14 09:00 GMT