Principle:Confident ai Deepeval Framework Instrumentation
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-14 09:00 GMT |
Overview
A design principle for instrumenting third-party agent frameworks to enable automatic trace collection during agent execution. Because each agent framework exposes different callback, hook, or middleware mechanisms, framework-specific adapters are required to capture execution traces in a unified format suitable for evaluation.
Description
Modern AI agent frameworks such as LangChain, LangGraph, PydanticAI, and the OpenAI Agents SDK each provide their own extensibility interfaces for observing agent behavior at runtime. These interfaces differ significantly:
- LangChain/LangGraph use a callback handler pattern -- classes that inherit from BaseCallbackHandler and receive lifecycle events (LLM start, tool call, chain completion, errors) as method invocations.
- PydanticAI leverages OpenTelemetry instrumentation settings -- span processors that capture execution traces as OTEL spans.
- OpenAI Agents SDK uses a tracing processor interface -- classes implementing TracingProcessor that receive span start/end events.
The framework instrumentation principle recognizes that a single universal adapter is insufficient. Instead, each integration must implement the adapter pattern to translate framework-specific events into DeepEval's internal trace representation (consisting of LLM calls, tool invocations, agent steps, and nested spans). This enables downstream evaluation metrics to operate on a consistent data model regardless of which agent framework produced the trace.
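As a minimal sketch of this translation step, the example below adapts two differently shaped event sources into one span model. Everything in it is hypothetical and chosen for illustration: the Span dataclass, the two adapter classes, their method names, and the event dictionary keys are assumptions, not DeepEval's internal classes or any framework's real interface.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Span:
    """Hypothetical unified span; a stand-in for an internal trace model."""
    kind: str                      # "llm", "tool", or "agent"
    name: str
    input: Any = None
    output: Any = None
    children: List["Span"] = field(default_factory=list)


class CallbackStyleAdapter:
    """Adapter for a framework that calls one method per lifecycle event."""

    def __init__(self) -> None:
        self.spans: List[Span] = []

    def on_tool_start(self, tool_name: str, tool_input: str) -> None:
        self.spans.append(Span(kind="tool", name=tool_name, input=tool_input))

    def on_tool_end(self, output: str) -> None:
        # Assumes the most recently started tool is the one that finished.
        self.spans[-1].output = output


class ProcessorStyleAdapter:
    """Adapter for a framework that emits generic span start/end events."""

    def __init__(self) -> None:
        self.spans: List[Span] = []
        self._open: Dict[str, Span] = {}   # spans started but not yet ended

    def on_span_start(self, event: Dict[str, Any]) -> None:
        span = Span(kind=event["type"], name=event["name"], input=event.get("input"))
        self._open[event["id"]] = span
        self.spans.append(span)

    def on_span_end(self, event: Dict[str, Any]) -> None:
        self._open.pop(event["id"]).output = event.get("output")


def tool_names(spans: List[Span]) -> List[str]:
    """Framework-agnostic consumer: works on spans from either adapter."""
    return [s.name for s in spans if s.kind == "tool"]
```

Both adapters produce the same Span objects for the same logical tool call, so a consumer such as tool_names never needs to know which framework was used.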
Usage
Framework instrumentation is used when:
- An agent built with a supported framework needs to be automatically evaluated without manually constructing test cases.
- Developers want to capture production traces for offline evaluation or monitoring.
- Conversation-level metrics (task completion, tool use correctness, step efficiency) require full execution traces rather than simple input/output pairs.
The general pattern is:
FRAMEWORK_INSTRUMENTATION(framework F):
    1. IDENTIFY the callback/hook interface provided by F
    2. IMPLEMENT an adapter class conforming to F's interface
    3. On each lifecycle event (LLM call, tool use, agent step):
        a. TRANSLATE the event into DeepEval's internal trace format
        b. ACCUMULATE trace spans in a hierarchical structure
    4. On execution completion:
        a. FINALIZE the trace
        b. OPTIONALLY run evaluation metrics against the collected trace
        c. OPTIONALLY push traces to Confident AI platform
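Steps 3 and 4 of the pattern above can be sketched with a stack that tracks the currently open span. This is an illustrative sketch only: TraceCollector, Span, the on_complete hook, and simulate_run are hypothetical names, and the completion hook stands in for whatever evaluation or upload step an integration chooses to run.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Span:
    """Hypothetical span node in a trace tree."""
    kind: str
    name: str
    output: Any = None
    children: List["Span"] = field(default_factory=list)


class TraceCollector:
    """Accumulates spans hierarchically and finalizes when the root closes."""

    def __init__(self, on_complete: Optional[Callable[[Span], None]] = None) -> None:
        self._stack: List[Span] = []          # open spans, innermost last
        self.root: Optional[Span] = None
        self._on_complete = on_complete       # optional evaluate/upload hook

    def start(self, kind: str, name: str) -> None:
        span = Span(kind, name)
        if self._stack:
            self._stack[-1].children.append(span)   # nest under the open parent
        else:
            self.root = span                        # first span is the trace root
        self._stack.append(span)

    def end(self, output: Any = None) -> None:
        span = self._stack.pop()
        span.output = output
        # The trace is complete once the root span has closed.
        if not self._stack and self._on_complete is not None:
            self._on_complete(span)


def simulate_run(collector: TraceCollector) -> None:
    """Replays the events an adapter might forward during one agent run."""
    collector.start("agent", "planner")
    collector.start("llm", "chat-model")
    collector.end(output="call search")
    collector.start("tool", "search")
    collector.end(output="3 results")
    collector.end(output="done")
```

Passing a list's append method as on_complete is enough to see the finalized tree: after simulate_run, the root agent span holds the LLM and tool spans as children in call order.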
Theoretical Basis
This principle draws from several established software engineering patterns:
- Adapter pattern -- each framework integration adapts a framework-specific interface to DeepEval's internal trace model, allowing the evaluation engine to remain framework-agnostic.
- Framework integration -- the instrumentation hooks into existing extension points rather than requiring source code modification, following the open-closed principle.
- Callback-based instrumentation -- by subscribing to lifecycle callbacks, the instrumentation layer observes agent behavior passively without altering execution semantics. This is analogous to aspect-oriented programming where cross-cutting concerns (tracing, evaluation) are separated from core logic.
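The passive-observation property in the last bullet can be illustrated from the host side of the hook. In this sketch, run_agent is a hypothetical toy agent that announces lifecycle events to optional observers, and make_recorder is a hypothetical helper; neither corresponds to a real framework API.

```python
from typing import Callable, List, Sequence, Tuple

Observer = Callable[[str, str], None]   # receives (event_name, payload)


def run_agent(question: str, observers: Sequence[Observer] = ()) -> str:
    """Toy agent: the core logic never depends on who is observing."""

    def emit(event: str, payload: str) -> None:
        for observer in observers:
            observer(event, payload)

    emit("agent_start", question)
    tool_result = question.upper()       # stand-in for a real tool call
    emit("tool_end", tool_result)
    answer = f"answer: {tool_result}"
    emit("agent_end", answer)
    return answer


def make_recorder() -> Tuple[List[Tuple[str, str]], Observer]:
    """Returns an event log and an observer that appends to it."""
    log: List[Tuple[str, str]] = []

    def record(event: str, payload: str) -> None:
        log.append((event, payload))

    return log, record
```

Running the agent with and without the recorder attached returns the same answer; the only difference is that the recorder's log fills with the observed events, which is the separation of tracing from core logic that the bullet describes.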
The key insight is that evaluation should be decoupled from the agent framework. By standardizing on a common trace format and providing per-framework adapters, DeepEval achieves broad framework coverage while maintaining a single evaluation pipeline.