Implementation:Confident ai Deepeval CallbackHandler LangChain

**Metadata**
Knowledge Sources	DeepEval LangChain Callbacks
Domains	LLM_Evaluation AI_Agents
Last Updated	2026-02-14 09:00 GMT

Overview

CallbackHandler is a concrete callback handler class that integrates DeepEval with the LangChain and LangGraph agent frameworks. It inherits from LangChain's BaseCallbackHandler and automatically captures execution traces (LLM calls, tool invocations, retriever queries, and chain orchestration events), translating them into DeepEval's internal trace format for evaluation.

Description

A CallbackHandler instance is passed to a LangChain invocation through the config argument, as config={"callbacks": [handler]}. Once attached, it receives lifecycle events from the LangChain runtime and builds a hierarchical trace structure. At the end of an agent invocation, the handler can optionally run evaluation metrics against the collected trace and push results to the Confident AI platform.

Key capabilities:

Automatic trace capture -- intercepts LLM start/end, tool start/end, chain start/end, and retriever start/end events.
Metric evaluation -- when metrics or metric_collection are provided, evaluation runs automatically after each invocation.
Thread tracking -- the thread_id parameter enables grouping multiple invocations into a single conversation thread.
Metadata and tagging -- supports arbitrary metadata and tags for trace organization and filtering.
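
To make the trace-capture capability concrete, the following is a minimal, self-contained sketch of how start/end events can be folded into a hierarchical trace. It does not reproduce DeepEval's implementation: Span, TraceBuilder, on_start, and on_end are hypothetical names chosen for illustration. The only detail borrowed from LangChain is that callback events carry a run_id and an optional parent_run_id, which is what allows the nesting to be recovered.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
from uuid import UUID, uuid4


@dataclass
class Span:
    """One node in the trace tree (hypothetical structure, for illustration)."""
    run_id: UUID
    kind: str  # e.g. "chain", "llm", "tool", "retriever"
    name: str
    inputs: Any = None
    output: Any = None
    children: List["Span"] = field(default_factory=list)


class TraceBuilder:
    """Folds start/end events into a tree keyed by run_id (illustrative only)."""

    def __init__(self) -> None:
        self.roots: List[Span] = []
        self._open: Dict[UUID, Span] = {}  # spans that have started but not ended

    def on_start(self, kind: str, name: str, inputs: Any,
                 run_id: UUID, parent_run_id: Optional[UUID] = None) -> None:
        span = Span(run_id=run_id, kind=kind, name=name, inputs=inputs)
        parent = self._open.get(parent_run_id) if parent_run_id else None
        if parent is not None:
            parent.children.append(span)  # nest under the still-open parent
        else:
            self.roots.append(span)       # no open parent: top-level span
        self._open[run_id] = span

    def on_end(self, output: Any, run_id: UUID) -> None:
        span = self._open.pop(run_id)     # close the span and record its output
        span.output = output


# Simulate one agent run: a chain that calls an LLM, then a tool.
chain_id, llm_id, tool_id = uuid4(), uuid4(), uuid4()
builder = TraceBuilder()
builder.on_start("chain", "agent", {"input": "Hello"}, chain_id)
builder.on_start("llm", "chat-model", "Hello", llm_id, parent_run_id=chain_id)
builder.on_end("call tool", llm_id)
builder.on_start("tool", "search", "query", tool_id, parent_run_id=chain_id)
builder.on_end("result", tool_id)
builder.on_end({"output": "Hi"}, chain_id)
```

After the simulated run, builder.roots holds a single chain span whose children are the LLM span and the tool span, in the order they started.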

Usage

Import and attach to any LangChain agent or chain invocation:

from deepeval.integrations.langchain import CallbackHandler

Code Reference

Source Location

Repository: confident-ai/deepeval
File: deepeval/integrations/langchain/callback.py (lines 70--849)

Signature

class CallbackHandler(BaseCallbackHandler):
    def __init__(
        self,
        name: Optional[str] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        thread_id: Optional[str] = None,
        user_id: Optional[str] = None,
        metrics: Optional[List[BaseMetric]] = None,
        metric_collection: Optional[str] = None,
        test_case_id: Optional[str] = None,
    ):
        ...

Import

from deepeval.integrations.langchain import CallbackHandler

Parent Class

BaseCallbackHandler from langchain_core.callbacks.base

I/O Contract

Inputs (Constructor Parameters)

**Input Contract**
Name	Type	Default	Description
`name`	Optional[str]	`None`	Human-readable name for the traced agent or chain.
`tags`	Optional[List[str]]	`None`	Tags for categorizing and filtering traces.
`metadata`	Optional[Dict[str, Any]]	`None`	Arbitrary key-value metadata attached to the trace.
`thread_id`	Optional[str]	`None`	Conversation thread identifier for grouping related invocations.
`user_id`	Optional[str]	`None`	Identifier for the end user associated with this trace.
`metrics`	Optional[List[BaseMetric]]	`None`	List of evaluation metrics to run automatically after each invocation.
`metric_collection`	Optional[str]	`None`	Name of a pre-defined metric collection on the Confident AI platform.
`test_case_id`	Optional[str]	`None`	Identifier for linking traces to specific test cases.
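
When the same settings are reused across several handlers, it can help to keep them in one typed object. The sketch below is an assumption-laden convenience, not part of DeepEval: HandlerOptions and to_kwargs are hypothetical names, the fields simply mirror the constructor parameters listed above, and metrics is typed as List[Any] so the block runs without deepeval installed.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class HandlerOptions:
    """Hypothetical container mirroring the CallbackHandler constructor parameters."""
    name: Optional[str] = None
    tags: Optional[List[str]] = None
    metadata: Optional[Dict[str, Any]] = None
    thread_id: Optional[str] = None
    user_id: Optional[str] = None
    metrics: Optional[List[Any]] = None  # BaseMetric instances in real use
    metric_collection: Optional[str] = None
    test_case_id: Optional[str] = None

    def to_kwargs(self) -> Dict[str, Any]:
        """Return only the options that were set, ready for ** unpacking."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


options = HandlerOptions(name="support-agent", thread_id="conv-12345")
kwargs = options.to_kwargs()
```

With deepeval installed, the result could be passed as CallbackHandler(**kwargs); unset parameters fall back to their None defaults.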

Outputs

**Output Contract**
Name	Type	Description
Trace	Internal trace object	Hierarchical trace of the agent execution, including all LLM calls, tool invocations, and chain steps.
Metric results	Evaluation scores	When metrics are configured, evaluation results are computed and optionally pushed to the Confident AI platform.

Usage Examples

Example 1: Basic Agent Instrumentation

Attach the callback handler to a LangChain agent with automatic task completion evaluation.

from deepeval.integrations.langchain import CallbackHandler
from deepeval.metrics import TaskCompletionMetric

handler = CallbackHandler(
    metrics=[TaskCompletionMetric()],
    name="my-agent",
)
# `agent` is assumed to be an already-constructed LangChain agent or runnable.
agent.invoke({"input": "Hello"}, config={"callbacks": [handler]})

The handler is passed via the callbacks key in the LangChain config dictionary.
After the agent completes, the TaskCompletionMetric is automatically evaluated against the captured trace.

Example 2: Threaded Conversation Tracking

Track multiple invocations as part of the same conversation thread.

from deepeval.integrations.langchain import CallbackHandler

handler = CallbackHandler(
    name="support-agent",
    thread_id="conv-12345",
    user_id="user-abc",
    tags=["production", "support"],
    metadata={"department": "engineering"},
)

# First turn (`agent` is assumed to be an already-constructed LangChain agent or runnable)
agent.invoke({"input": "What is my account status?"}, config={"callbacks": [handler]})

# Second turn (same thread)
agent.invoke({"input": "Can you update my email?"}, config={"callbacks": [handler]})

Both invocations are grouped under the same thread_id, enabling conversation-level evaluation.
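
The grouping idea can be illustrated without any DeepEval or LangChain dependency. The sketch below assumes each captured trace is reduced to a small record carrying its thread_id and input; TraceRecord and group_by_thread are hypothetical names, and the grouping performed on the Confident AI platform may work differently.

```python
from collections import defaultdict
from typing import Dict, List, Optional, TypedDict


class TraceRecord(TypedDict):
    """Hypothetical, simplified view of one captured invocation."""
    thread_id: Optional[str]
    input: str


def group_by_thread(records: List[TraceRecord]) -> Dict[str, List[str]]:
    """Collect inputs per thread_id, preserving invocation order."""
    threads: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        if record["thread_id"] is not None:  # untagged traces stay ungrouped
            threads[record["thread_id"]].append(record["input"])
    return dict(threads)


records: List[TraceRecord] = [
    {"thread_id": "conv-12345", "input": "What is my account status?"},
    {"thread_id": None, "input": "Unrelated one-off request"},
    {"thread_id": "conv-12345", "input": "Can you update my email?"},
]
conversations = group_by_thread(records)
```

Here conversations contains a single entry for conv-12345 holding both turns in order, which is the unit a conversation-level evaluation would operate on.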

Related Pages

Principle:Confident_ai_Deepeval_Framework_Instrumentation
