Implementation:Confident ai Deepeval Observe Decorator

**Metadata**
Knowledge Sources	DeepEval
Domains	LLM_Evaluation Observability Tracing
Last Updated	2026-02-14 09:00 GMT

Overview

The @observe decorator is the primary instrumentation mechanism in DeepEval for wrapping individual application functions as traced spans with optional per-span metric evaluation. It supports synchronous functions, asynchronous coroutines, and generator functions, automatically capturing inputs and outputs to form a hierarchical trace tree.

Description

The @observe decorator transforms a Python function into an instrumented component that creates a traced span on each invocation. The span records the function's inputs and outputs, associates a span type (agent, llm, retriever, tool), and optionally evaluates a list of per-span metrics against the captured data. When nested, decorated functions produce a parent-child span hierarchy that mirrors the application's call graph.

Key behaviors:

Automatic I/O capture -- Function arguments are recorded as span inputs; the return value is recorded as span output. No manual wiring is required.
Span typing -- The type parameter classifies the span, enabling type-specific analysis (e.g., filtering all retriever spans across traces).
Per-span metrics -- The metrics parameter attaches BaseMetric instances that are evaluated against the span's test case data after execution completes.
Sync/async/generator support -- The decorator detects the wrapped function's type and applies the appropriate wrapper (synchronous, async, or generator-aware).

Usage

Import and apply the decorator to any function that represents a meaningful processing step:

from deepeval.tracing import observe

Apply to functions that should appear as spans in the trace tree. Use the type parameter to classify the span and metrics to attach evaluation logic.

Code Reference

Source Location

Repository: confident-ai/deepeval
File: deepeval/tracing/tracing.py (lines 1063--1248)

Signature

@observe(
    _func=None,
    *,
    metrics=None,
    metric_collection=None,
    type=None,
    **observe_kwargs
)

Where **observe_kwargs may include:

# Named keyword arguments accepted via **observe_kwargs:
name: Optional[str]            # custom span name (defaults to function name)
model: Optional[str]           # model identifier for LLM spans
prompt: Optional[str]          # prompt template used
available_tools: Optional[List[str]]  # tools available to the agent
embedder: Optional[str]        # embedding model for retriever spans
description: Optional[str]     # human-readable span description
top_k: Optional[int]           # retriever top-k parameter
chunk_size: Optional[int]      # chunking configuration

Import

from deepeval.tracing import observe

I/O Contract

Inputs

**Input Contract**
Name	Type	Description
`_func`	Optional[Callable]	The function to decorate. When `None`, the decorator is used with parentheses and keyword arguments.
`metrics`	Optional[List[BaseMetric]]	List of metric instances to evaluate against this span's captured test case data after execution.
`metric_collection`	Optional[str]	Name of a predefined metric collection to apply to this span.
`type`	Optional[Literal["agent", "llm", "retriever", "tool"]]	Classifies the span type for filtering and analysis. Determines how the span appears in the trace tree.
`**observe_kwargs`	keyword args	Additional span metadata: `name`, `model`, `prompt`, `available_tools`, `embedder`, `description`, `top_k`, `chunk_size`.

Outputs

**Output Contract**
Name	Type	Description
Decorated function	Callable	The original function wrapped with tracing instrumentation. The return value of the function is preserved unchanged.
Traced span	Span object (side effect)	A span is created in the current trace context capturing inputs, outputs, timing, type, and metric scores.

Usage Examples

Example 1: LLM Component with Answer Relevancy Metric

Instrumenting a generation function with a per-span metric that evaluates answer relevancy.

from deepeval.tracing import observe
from deepeval.metrics import AnswerRelevancyMetric

@observe(metrics=[AnswerRelevancyMetric()], type="llm")
def generate_response(query: str) -> str:
    return llm.invoke(query)

The type="llm" parameter classifies this span as a language model invocation.
The metrics parameter attaches an AnswerRelevancyMetric that is evaluated against the span's captured input/output after the function returns.

Example 2: Retriever Component

Instrumenting a retrieval function to capture retrieved documents as a traced span.

from deepeval.tracing import observe

@observe(type="retriever", top_k=5, embedder="text-embedding-3-small")
def retrieve_documents(query: str) -> list:
    return vector_store.similarity_search(query, k=5)

The type="retriever" classifies the span for retriever-specific analysis.
The top_k and embedder keyword arguments are recorded as span metadata via **observe_kwargs.

Example 3: Tool Component

Instrumenting a tool call within an agent pipeline.

from deepeval.tracing import observe

@observe(type="tool", name="web_search")
def search_web(query: str) -> str:
    return search_api.query(query)

The name="web_search" overrides the default span name (which would be the function name search_web).

Related Pages

Principle:Confident_ai_Deepeval_Component_Instrumentation

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment