Implementation:Confident ai Deepeval Observe Decorator
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-14 09:00 GMT |
Overview
The @observe decorator is the primary instrumentation mechanism in DeepEval for wrapping individual application functions as traced spans with optional per-span metric evaluation. It supports synchronous functions, asynchronous coroutines, and generator functions, automatically capturing inputs and outputs to form a hierarchical trace tree.
Description
The @observe decorator transforms a Python function into an instrumented component that creates a traced span on each invocation. The span records the function's inputs and outputs, associates a span type (agent, llm, retriever, tool), and optionally evaluates a list of per-span metrics against the captured data. When nested, decorated functions produce a parent-child span hierarchy that mirrors the application's call graph.
Key behaviors:
- Automatic I/O capture -- Function arguments are recorded as span inputs; the return value is recorded as span output. No manual wiring is required.
- Span typing -- The type parameter classifies the span, enabling type-specific analysis (e.g., filtering all retriever spans across traces).
- Per-span metrics -- The metrics parameter attaches BaseMetric instances that are evaluated against the span's test case data after execution completes.
- Sync/async/generator support -- The decorator detects the wrapped function's type and applies the appropriate wrapper (synchronous, async, or generator-aware).
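As a rough illustration of the last point, a decorator can use the standard inspect module to choose a wrapper that matches the kind of callable it receives. The sketch below is an assumption about how such dispatch might look, not DeepEval's source; toy_trace and EVENTS are hypothetical names, and async generators are left out for brevity.

```python
import asyncio
import functools
import inspect
from typing import Callable, List

EVENTS: List[str] = []  # hypothetical log standing in for span records


def toy_trace(func: Callable) -> Callable:
    """Pick a wrapper that matches the kind of callable being decorated."""
    name = func.__name__

    if inspect.iscoroutinefunction(func):

        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            EVENTS.append(f"start {name}")
            try:
                return await func(*args, **kwargs)
            finally:
                EVENTS.append(f"end {name}")

        return async_wrapper

    if inspect.isgeneratorfunction(func):

        @functools.wraps(func)
        def generator_wrapper(*args, **kwargs):
            EVENTS.append(f"start {name}")
            try:
                # Keep the "span" open until the generator finishes.
                yield from func(*args, **kwargs)
            finally:
                EVENTS.append(f"end {name}")

        return generator_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        EVENTS.append(f"start {name}")
        try:
            return func(*args, **kwargs)
        finally:
            EVENTS.append(f"end {name}")

    return sync_wrapper


@toy_trace
def add(a: int, b: int) -> int:
    return a + b


@toy_trace
async def fetch(x: int) -> int:
    await asyncio.sleep(0)
    return x * 2


@toy_trace
def count(n: int):
    for i in range(n):
        yield i


sync_result = add(1, 2)
async_result = asyncio.run(fetch(21))
gen_result = list(count(3))
```

Each wrapper keeps the calling convention of the function it wraps: the coroutine is still awaited, and the generator is still iterated lazily, with its end event recorded only once iteration finishes.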
Usage
Import and apply the decorator to any function that represents a meaningful processing step:
from deepeval.tracing import observe
Each decorated function appears as a span in the trace tree. Use the type parameter to classify the span and the metrics parameter to attach evaluation logic.
Code Reference
Source Location
- Repository: confident-ai/deepeval
- File: deepeval/tracing/tracing.py (lines 1063--1248)
Signature
@observe(
    _func=None,
    *,
    metrics=None,
    metric_collection=None,
    type=None,
    **observe_kwargs
)
Where **observe_kwargs may include:
# Named keyword arguments accepted via **observe_kwargs:
name: Optional[str] # custom span name (defaults to function name)
model: Optional[str] # model identifier for LLM spans
prompt: Optional[str] # prompt template used
available_tools: Optional[List[str]] # tools available to the agent
embedder: Optional[str] # embedding model for retriever spans
description: Optional[str] # human-readable span description
top_k: Optional[int] # retriever top-k parameter
chunk_size: Optional[int] # chunking configuration
Import
from deepeval.tracing import observe
I/O Contract
Inputs
| Name | Type | Description |
|---|---|---|
| _func | Optional[Callable] | The function to decorate. When None, the decorator is used with parentheses and keyword arguments. |
| metrics | Optional[List[BaseMetric]] | List of metric instances to evaluate against this span's captured test case data after execution. |
| metric_collection | Optional[str] | Name of a predefined metric collection to apply to this span. |
| type | Optional[Literal["agent", "llm", "retriever", "tool"]] | Classifies the span type for filtering and analysis. Determines how the span appears in the trace tree. |
| **observe_kwargs | keyword args | Additional span metadata: name, model, prompt, available_tools, embedder, description, top_k, chunk_size. |
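The role of _func is easiest to see in a stripped-down decorator that accepts both the bare form and the parenthesised form. The following is a minimal sketch of that general Python pattern, assuming nothing about DeepEval's internals; toy_observe and the span_config attribute are hypothetical and exist only to make the behaviour inspectable.

```python
import functools
from typing import Any, Callable, Optional


def toy_observe(
    _func: Optional[Callable] = None,
    *,
    type: Optional[str] = None,
    **observe_kwargs: Any,
):
    """Usable as @toy_observe or as @toy_observe(type=..., name=...)."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The wrapped function's return value passes through unchanged.
            return func(*args, **kwargs)

        # Stash the configuration for inspection; purely illustrative.
        wrapper.span_config = {
            "name": observe_kwargs.get("name", func.__name__),
            "type": type,
        }
        return wrapper

    if _func is None:
        # Called with parentheses: return the decorator for Python to apply.
        return decorator
    # Bare usage: Python passed the function as the first positional argument.
    return decorator(_func)


@toy_observe
def double(x: int) -> int:
    return x * 2


@toy_observe(type="tool", name="web_search")
def search_web(query: str) -> str:
    return f"results for {query}"
```

Making every option keyword-only (the bare * in the signature) is what keeps the two forms unambiguous: the only positional argument the decorator can ever receive is the function itself.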
Outputs
| Name | Type | Description |
|---|---|---|
| Decorated function | Callable | The original function wrapped with tracing instrumentation. The return value of the function is preserved unchanged. |
| Traced span | Span object (side effect) | A span is created in the current trace context capturing inputs, outputs, timing, type, and metric scores. |
Usage Examples
Example 1: LLM Component with Answer Relevancy Metric
Instrumenting a generation function with a per-span metric that evaluates answer relevancy.
from deepeval.tracing import observe
from deepeval.metrics import AnswerRelevancyMetric

# `llm` is assumed to be an LLM client defined elsewhere in the application.
@observe(metrics=[AnswerRelevancyMetric()], type="llm")
def generate_response(query: str) -> str:
    return llm.invoke(query)
- The type="llm" parameter classifies this span as a language model invocation.
- The metrics parameter attaches an AnswerRelevancyMetric that is evaluated against the span's captured input/output after the function returns.
Example 2: Retriever Component
Instrumenting a retrieval function to capture retrieved documents as a traced span.
from deepeval.tracing import observe

# `vector_store` is assumed to be a vector store client defined elsewhere.
@observe(type="retriever", top_k=5, embedder="text-embedding-3-small")
def retrieve_documents(query: str) -> list:
    return vector_store.similarity_search(query, k=5)
- The type="retriever" parameter classifies the span for retriever-specific analysis.
- The top_k and embedder keyword arguments are recorded as span metadata via **observe_kwargs.
Example 3: Tool Component
Instrumenting a tool call within an agent pipeline.
from deepeval.tracing import observe

# `search_api` is assumed to be a search client defined elsewhere.
@observe(type="tool", name="web_search")
def search_web(query: str) -> str:
    return search_api.query(query)
- The name="web_search" parameter overrides the default span name (which would be the function name search_web).
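The three examples above can be combined into one nested pipeline, where an agent function calls a retriever and a tool. The sketch below uses only the decorator arguments documented on this page. CORPUS, count_words, and answer are hypothetical stand-ins for real components, and the ImportError fallback is an addition for this example so that it still runs, untraced, when DeepEval is not installed.

```python
try:
    from deepeval.tracing import observe
except ImportError:
    # Fallback for environments without DeepEval: a no-op decorator that
    # accepts the same call shapes, so the pipeline runs without tracing.
    def observe(_func=None, **_kwargs):
        def decorator(func):
            return func

        return decorator if _func is None else decorator(_func)


# Hypothetical in-memory corpus standing in for a vector store.
CORPUS = [
    "a span records inputs and outputs",
    "nested spans form a trace tree",
    "metrics score each span",
]


@observe(type="retriever", top_k=2)
def retrieve_documents(query: str) -> list:
    # Naive substring match, truncated to the two first hits.
    return [doc for doc in CORPUS if query in doc][:2]


@observe(type="tool", name="word_count")
def count_words(text: str) -> int:
    return len(text.split())


@observe(type="agent")
def answer(query: str) -> str:
    # Calls made from inside a decorated function become child spans.
    docs = retrieve_documents(query)
    total = count_words(" ".join(docs))
    return f"Found {len(docs)} document(s) containing {total} words"


result = answer("span")
```

With DeepEval installed, the call to answer is expected to yield an agent span with a retriever span and a tool span beneath it, following the nesting behaviour described in the Description section.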