Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Confident ai Deepeval Observe Decorator

From Leeroopedia
Metadata
Knowledge Sources
Domains
Last Updated 2026-02-14 09:00 GMT

Overview

The @observe decorator is the primary instrumentation mechanism in DeepEval for wrapping individual application functions as traced spans with optional per-span metric evaluation. It supports synchronous functions, asynchronous coroutines, and generator functions, automatically capturing inputs and outputs to form a hierarchical trace tree.

Description

The @observe decorator transforms a Python function into an instrumented component that creates a traced span on each invocation. The span records the function's inputs and outputs, associates a span type (agent, llm, retriever, tool), and optionally evaluates a list of per-span metrics against the captured data. When nested, decorated functions produce a parent-child span hierarchy that mirrors the application's call graph.

Key behaviors:

  • Automatic I/O capture -- Function arguments are recorded as span inputs; the return value is recorded as span output. No manual wiring is required.
  • Span typing -- The type parameter classifies the span, enabling type-specific analysis (e.g., filtering all retriever spans across traces).
  • Per-span metrics -- The metrics parameter attaches BaseMetric instances that are evaluated against the span's test case data after execution completes.
  • Sync/async/generator support -- The decorator detects the wrapped function's type and applies the appropriate wrapper (synchronous, async, or generator-aware).

Usage

Import and apply the decorator to any function that represents a meaningful processing step:

from deepeval.tracing import observe

Apply to functions that should appear as spans in the trace tree. Use the type parameter to classify the span and metrics to attach evaluation logic.

Code Reference

Source Location

  • Repository: confident-ai/deepeval
  • File: deepeval/tracing/tracing.py (lines 1063--1248)

Signature

@observe(
    _func=None,
    *,
    metrics=None,
    metric_collection=None,
    type=None,
    **observe_kwargs
)

Where **observe_kwargs may include:

# Named keyword arguments accepted via **observe_kwargs:
name: Optional[str]            # custom span name (defaults to function name)
model: Optional[str]           # model identifier for LLM spans
prompt: Optional[str]          # prompt template used
available_tools: Optional[List[str]]  # tools available to the agent
embedder: Optional[str]        # embedding model for retriever spans
description: Optional[str]     # human-readable span description
top_k: Optional[int]           # retriever top-k parameter
chunk_size: Optional[int]      # chunking configuration

Import

from deepeval.tracing import observe

I/O Contract

Inputs

Input Contract
Name Type Description
_func Optional[Callable] The function to decorate. When None, the decorator is used with parentheses and keyword arguments.
metrics Optional[List[BaseMetric]] List of metric instances to evaluate against this span's captured test case data after execution.
metric_collection Optional[str] Name of a predefined metric collection to apply to this span.
type Optional[Literal["agent", "llm", "retriever", "tool"]] Classifies the span type for filtering and analysis. Determines how the span appears in the trace tree.
**observe_kwargs keyword args Additional span metadata: name, model, prompt, available_tools, embedder, description, top_k, chunk_size.

Outputs

Output Contract
Name Type Description
Decorated function Callable The original function wrapped with tracing instrumentation. The return value of the function is preserved unchanged.
Traced span Span object (side effect) A span is created in the current trace context capturing inputs, outputs, timing, type, and metric scores.

Usage Examples

Example 1: LLM Component with Answer Relevancy Metric

Instrumenting a generation function with a per-span metric that evaluates answer relevancy.

from deepeval.tracing import observe
from deepeval.metrics import AnswerRelevancyMetric

@observe(metrics=[AnswerRelevancyMetric()], type="llm")
def generate_response(query: str) -> str:
    return llm.invoke(query)
  • The type="llm" parameter classifies this span as a language model invocation.
  • The metrics parameter attaches an AnswerRelevancyMetric that is evaluated against the span's captured input/output after the function returns.

Example 2: Retriever Component

Instrumenting a retrieval function to capture retrieved documents as a traced span.

from deepeval.tracing import observe

@observe(type="retriever", top_k=5, embedder="text-embedding-3-small")
def retrieve_documents(query: str) -> list:
    return vector_store.similarity_search(query, k=5)
  • The type="retriever" classifies the span for retriever-specific analysis.
  • The top_k and embedder keyword arguments are recorded as span metadata via **observe_kwargs.

Example 3: Tool Component

Instrumenting a tool call within an agent pipeline.

from deepeval.tracing import observe

@observe(type="tool", name="web_search")
def search_web(query: str) -> str:
    return search_api.query(query)
  • The name="web_search" overrides the default span name (which would be the function name search_web).

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment