

Implementation:Confident ai Deepeval Evaluate Trace

From Leeroopedia
Revision as of 12:18, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Confident_ai_Deepeval_Evaluate_Trace.md)

Overview

Evaluate Trace covers three functions, evaluate_trace, evaluate_span, and evaluate_thread, for offline evaluation of previously collected traces, spans, and threads against named metric collections on the Confident AI platform. Each function submits an evaluation request at a different granularity, and none of them re-runs the original application.

API Documentation

Function: evaluate_trace

Source: deepeval/tracing/offline_evals/trace.py

Import:

from deepeval.tracing import evaluate_trace

Signature:

evaluate_trace(trace_uuid: str, metric_collection: str)

Parameter          Type  Description
trace_uuid         str   The unique identifier of the trace to evaluate.
metric_collection  str   The name of the metric collection on Confident AI to apply.

Function: evaluate_span

Source: deepeval/tracing/offline_evals/span.py

Import:

from deepeval.tracing import evaluate_span

Signature:

evaluate_span(span_uuid: str, metric_collection: str)

Parameter          Type  Description
span_uuid          str   The unique identifier of the span to evaluate.
metric_collection  str   The name of the metric collection on Confident AI to apply.

Function: evaluate_thread

Source: deepeval/tracing/offline_evals/thread.py

Import:

from deepeval.tracing import evaluate_thread

Signature:

evaluate_thread(thread_id: str, metric_collection: str, overwrite_metrics: bool = False)

Parameter          Type  Description
thread_id          str   The identifier of the conversation thread to evaluate.
metric_collection  str   The name of the metric collection on Confident AI to apply.
overwrite_metrics  bool  When True, overwrites any existing metric results for this thread. Defaults to False.

Input / Output (All Functions)

  • Inputs: A target identifier (trace UUID, span UUID, or thread ID) and the name of a metric collection defined on the Confident AI platform.
  • Outputs: An evaluation request is submitted to Confident AI. The evaluation results appear in the Confident AI dashboard associated with the specified trace, span, or thread.
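
Because the three functions share the same shape (one identifier plus a metric collection name), a caller can route requests by granularity through one entry point. The following is a minimal sketch, not part of DeepEval: submit_offline_eval, ID_KEYWORD, and the evaluators mapping are hypothetical names introduced for illustration. The callables are injected so that the routing logic can be exercised without contacting Confident AI.

```python
from typing import Callable, Dict

# Keyword name each evaluate_* function expects for its identifier,
# taken from the signatures documented above.
ID_KEYWORD = {"trace": "trace_uuid", "span": "span_uuid", "thread": "thread_id"}


def submit_offline_eval(
    kind: str,
    target_id: str,
    metric_collection: str,
    evaluators: Dict[str, Callable[..., object]],
    **extra,
) -> None:
    """Hypothetical helper: route one request to the matching evaluate_* callable."""
    if kind not in ID_KEYWORD:
        raise ValueError(f"unknown kind: {kind!r}")
    kwargs = {ID_KEYWORD[kind]: target_id, "metric_collection": metric_collection}
    # Extra keywords (for example overwrite_metrics for threads) pass through unchanged.
    kwargs.update(extra)
    evaluators[kind](**kwargs)
```

With DeepEval installed, the evaluators mapping would presumably be {"trace": evaluate_trace, "span": evaluate_span, "thread": evaluate_thread}, using the imports shown in the API Documentation section.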

Usage Examples

Evaluating a Trace

from deepeval.tracing import evaluate_trace

evaluate_trace(trace_uuid="abc-123", metric_collection="quality-checks")

Evaluating a Span

from deepeval.tracing import evaluate_span

evaluate_span(span_uuid="def-456", metric_collection="retrieval-quality")

Evaluating a Thread

from deepeval.tracing import evaluate_thread

evaluate_thread(
    thread_id="thread-789",
    metric_collection="conversation-quality",
    overwrite_metrics=True,
)
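
The calls above each target a single identifier. When many previously collected traces or spans need the same metric collection, a small loop is one way to do it. The sketch below is an assumption-laden illustration: evaluate_many is a hypothetical helper, not a DeepEval function, and it assumes that a failed submission surfaces as a raised exception, which this page does not confirm.

```python
from typing import Callable, Iterable, List, Tuple


def evaluate_many(
    submit: Callable[..., object],
    id_keyword: str,
    ids: Iterable[str],
    metric_collection: str,
) -> List[Tuple[str, Exception]]:
    """Hypothetical helper: submit one request per ID and collect failures."""
    failures: List[Tuple[str, Exception]] = []
    for target_id in ids:
        try:
            submit(**{id_keyword: target_id, "metric_collection": metric_collection})
        except Exception as exc:
            # Assumption: failures raise. Record and continue so that
            # one bad ID does not stop the rest of the batch.
            failures.append((target_id, exc))
    return failures
```

For traces, the call would look like evaluate_many(evaluate_trace, "trace_uuid", trace_ids, "quality-checks"), where trace_ids is a list of UUIDs the caller has gathered elsewhere.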

Relationships

Principle:Confident_ai_Deepeval_Offline_Trace_Evaluation

Metadata

DeepEval, Tracing, Observability, LLM_Evaluation
2026-02-14 09:00 GMT
