Implementation:Explodinggradients Ragas EvaluatorChain Class

Metadata	Value
Source	`src/ragas/integrations/langchain.py` (Lines 32-206)
Domains	Integration, LangChain
Last Updated	2026-02-10

Overview

Wraps any Ragas SingleTurnMetric as both a LangChain Chain and a LangSmith RunEvaluator, so that Ragas metrics can run inside LangChain workflows and serve as evaluators in LangSmith evaluation runs.

Description

EvaluatorChain inherits from both langchain.chains.base.Chain and langsmith.evaluation.RunEvaluator. On initialization, it:

Accepts a Ragas Metric instance and validates that it is a SingleTurnMetric.
Wires up LLM and embedding dependencies automatically: if the metric is a MetricWithLLM, it uses the llm passed to the constructor (or a default ChatOpenAI) wrapped in LangchainLLMWrapper; likewise, if the metric is a MetricWithEmbeddings, it uses the provided embeddings (or a default OpenAIEmbeddings) wrapped in LangchainEmbeddingsWrapper.
Calls metric.init(run_config) to prepare the metric for execution, using the provided run_config or a default RunConfig().
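
The dependency handling in the second step follows a "use what the caller passed, otherwise build a default" pattern. The plain-Python sketch below illustrates that pattern for the LLM case only. `DefaultLLM`, `WrappedLLM`, `ToyMetric`, `ToyLLMMetric`, `resolve_dependency`, and `wire_llm` are hypothetical stand-ins (for ChatOpenAI, LangchainLLMWrapper, and MetricWithLLM), so this is an illustration of the idea, not the Ragas source. Embeddings would be resolved and wrapped the same way.

```python
from typing import Any, Callable, Dict, Optional


class DefaultLLM:
    """Hypothetical stand-in for the default client (ChatOpenAI in the real class)."""


class WrappedLLM:
    """Hypothetical stand-in for the Ragas wrapper (LangchainLLMWrapper in the real class)."""

    def __init__(self, inner: Any) -> None:
        self.inner = inner


class ToyMetric:
    """Hypothetical metric with no model dependencies."""


class ToyLLMMetric(ToyMetric):
    """Hypothetical metric that needs an LLM, playing the role of MetricWithLLM."""

    llm: Optional[WrappedLLM] = None


def resolve_dependency(kwargs: Dict[str, Any], key: str, factory: Callable[[], Any]) -> Any:
    """Return the caller-supplied object for `key`, or build a default with `factory`."""
    value = kwargs.get(key)
    return value if value is not None else factory()


def wire_llm(metric: ToyMetric, **kwargs: Any) -> ToyMetric:
    """Attach a wrapped LLM only to metrics that declare they need one."""
    if isinstance(metric, ToyLLMMetric):
        metric.llm = WrappedLLM(resolve_dependency(kwargs, "llm", DefaultLLM))
    return metric


default_metric = wire_llm(ToyLLMMetric())             # no llm given: DefaultLLM is built
my_llm = object()
custom_metric = wire_llm(ToyLLMMetric(), llm=my_llm)  # caller's llm is wrapped instead
plain_metric = wire_llm(ToyMetric())                  # no LLM needed: left untouched
```

Wrapping happens after resolution in both cases, so downstream code always sees the same wrapper type regardless of where the model came from.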

The class provides:

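_call and _acall: Synchronous and asynchronous chain execution. Inputs can be a SingleTurnSample or a dictionary; a dictionary using the v1 column names (question, answer, contexts, ground_truth) is converted to the v2 names (user_input, response, retrieved_contexts, reference) and then to a SingleTurnSample. LangChain Document objects in retrieved_contexts are converted to strings via their page_content. Both methods return a dictionary mapping the metric name to its score.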

_validate: Checks that the input sample contains all columns required by the metric.

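evaluate_run: Implements the RunEvaluator interface for LangSmith. It checks that the Example provides question (in its inputs) and ground_truth (in its outputs), and that the Run's outputs provide the metric's remaining required keys, such as answer and contexts. It then invokes the chain on the combined data and returns an EvaluationResult containing the metric name and score.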

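input_keys and output_keys: Properties derived from the wrapped metric. input_keys lists the metric's required columns (in the v1 names used by dictionary inputs), and output_keys contains a single entry, the metric's name.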
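
Since input_keys, output_keys, and _validate all read from the metric's declared columns, a small model of that relationship can make it concrete. In this sketch, `ToyMetric`, `ToyEvaluatorChain`, the `V2_TO_V1` table, and the `"SINGLE_TURN"` layout of `required_columns` are assumptions made for illustration; the column list for the faithfulness-style metric is likewise assumed.

```python
from typing import Any, Dict, List

# Assumed v2 -> v1 column names (the inverse of the conversion applied to dict inputs)
V2_TO_V1 = {
    "user_input": "question",
    "response": "answer",
    "retrieved_contexts": "contexts",
    "reference": "ground_truth",
}


class ToyMetric:
    """Hypothetical metric that declares its single-turn columns in v2 naming."""

    def __init__(self, name: str, single_turn_columns: List[str]) -> None:
        self.name = name
        self.required_columns: Dict[str, List[str]] = {"SINGLE_TURN": single_turn_columns}


class ToyEvaluatorChain:
    """Hypothetical chain showing how keys and validation derive from the metric."""

    def __init__(self, metric: ToyMetric) -> None:
        self.metric = metric

    @property
    def input_keys(self) -> List[str]:
        # Required columns, translated back to the v1 names a dict input uses
        return [V2_TO_V1.get(c, c) for c in self.metric.required_columns["SINGLE_TURN"]]

    @property
    def output_keys(self) -> List[str]:
        # One output: the score, keyed by the metric's name
        return [self.metric.name]

    def _validate(self, fields: Dict[str, Any]) -> None:
        # Reject samples where a required v2 column is absent or None
        for col in self.metric.required_columns["SINGLE_TURN"]:
            if fields.get(col) is None:
                raise ValueError(f'"{col}" is required for metric [{self.metric.name}]')


chain = ToyEvaluatorChain(ToyMetric("faithfulness", ["user_input", "response", "retrieved_contexts"]))
print(chain.input_keys)   # ['question', 'answer', 'contexts']
print(chain.output_keys)  # ['faithfulness']
```

With this setup, `chain._validate({"user_input": "q", "response": "a"})` raises a ValueError naming `retrieved_contexts`, because that column is declared but missing.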

Usage

Use EvaluatorChain when you want to:

Run Ragas metrics as part of a LangChain pipeline.
Use Ragas metrics as custom evaluators in LangSmith evaluation runs.
Integrate Ragas scoring into existing LangChain-based applications.

Code Reference

Source Location

Item	Detail
File	`src/ragas/integrations/langchain.py`
Lines	32-206
Module	`ragas.integrations.langchain`

Class Signature

```python
class EvaluatorChain(Chain, RunEvaluator):
    metric: Metric

    def __init__(self, metric: Metric, **kwargs: Any) -> None: ...
    def _call(self, inputs: Union[Dict[str, Any], SingleTurnSample], run_manager=None) -> Dict[str, Any]: ...
    async def _acall(self, inputs: Union[Dict[str, Any], SingleTurnSample], run_manager=None) -> Dict[str, Any]: ...
    def evaluate_run(self, run: Run, example: Optional[Example] = None) -> EvaluationResult: ...
```

Import

```python
from ragas.integrations.langchain import EvaluatorChain
```

I/O Contract

`_call` / `_acall`

Direction	Name	Type	Description
Input	`inputs`	`Union[Dict[str, Any], SingleTurnSample]`	Evaluation data; dicts are auto-converted to `SingleTurnSample`
Input	`run_manager`	`Optional[CallbackManagerForChainRun]` (`Optional[AsyncCallbackManagerForChainRun]` for `_acall`)	LangChain callback manager (optional)
Output	(return)	`Dict[str, Any]`	Dictionary with metric name as key and score as value (e.g., `{"faithfulness": 0.85}`)
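
For dictionary inputs, the conversion amounts to renaming v1 keys to v2 field names and flattening Document contexts to strings. This sketch assumes the mapping question to user_input, answer to response, contexts to retrieved_contexts, and ground_truth to reference. `normalize_row` and the `Document` dataclass are hypothetical stand-ins (the real class works with LangChain's Document and builds a SingleTurnSample from the result), and dropping unmapped keys is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Document:
    """Stand-in for LangChain's Document; only page_content matters here."""

    page_content: str


# Assumed v1 -> v2 column mapping
V1_TO_V2 = {
    "question": "user_input",
    "answer": "response",
    "contexts": "retrieved_contexts",
    "ground_truth": "reference",
}


def normalize_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Rename v1 keys to v2 names and convert Document contexts to plain strings."""
    # Keys outside the mapping are dropped in this sketch
    fields = {V1_TO_V2[k]: v for k, v in row.items() if k in V1_TO_V2}
    if "retrieved_contexts" in fields:
        fields["retrieved_contexts"] = [
            c.page_content if isinstance(c, Document) else str(c)
            for c in fields["retrieved_contexts"]
        ]
    return fields


fields = normalize_row({
    "question": "What is Ragas?",
    "answer": "Ragas is an evaluation toolkit.",
    "contexts": [Document("Ragas provides metrics."), "A plain-string context."],
})
# In the real library, fields like these would then populate a SingleTurnSample
```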

`evaluate_run`

Direction	Name	Type	Description
Input	`run`	`Run`	LangSmith run containing chain outputs (`answer`, `contexts`)
Input	`example`	`Optional[Example]`	LangSmith example containing inputs (`question`) and outputs (`ground_truth`)
Output	(return)	`EvaluationResult`	LangSmith evaluation result with metric name and score
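
The checks and merging that evaluate_run performs before scoring can be sketched with plain dictionaries standing in for the Run and Example objects. `build_eval_payload` and its error messages are hypothetical; in the real method, the combined dictionary is passed to the chain and the score is returned as an EvaluationResult with the metric name.

```python
from typing import Any, Dict, List, Optional


def build_eval_payload(
    run_outputs: Optional[Dict[str, Any]],
    example_inputs: Optional[Dict[str, Any]],
    example_outputs: Optional[Dict[str, Any]],
    required_v1_keys: List[str],
) -> Dict[str, Any]:
    """Combine a run's outputs with the example's question and ground_truth."""
    if not example_inputs or "question" not in example_inputs:
        raise ValueError("expected 'question' in the example inputs")
    if not example_outputs or "ground_truth" not in example_outputs:
        raise ValueError("expected 'ground_truth' in the example outputs")
    if run_outputs is None:
        raise ValueError("the run has no outputs")
    # Every required key other than question/ground_truth must come from the run
    missing = [
        k for k in required_v1_keys
        if k not in ("question", "ground_truth") and k not in run_outputs
    ]
    if missing:
        raise ValueError(f"run outputs are missing: {missing}")
    payload = dict(run_outputs)
    payload["question"] = example_inputs["question"]
    if "ground_truth" in required_v1_keys:
        payload["ground_truth"] = example_outputs["ground_truth"]
    return payload


payload = build_eval_payload(
    run_outputs={"answer": "Paris", "contexts": ["Paris is the capital of France."]},
    example_inputs={"question": "What is the capital of France?"},
    example_outputs={"ground_truth": "Paris"},
    required_v1_keys=["question", "contexts", "ground_truth"],
)
```

Splitting the sources this way keeps the dataset (question, ground_truth) and the system under test (answer, contexts) separate until scoring time.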

Constructor Parameters

Name	Type	Required	Description
`metric`	`Metric`	Yes	A Ragas `SingleTurnMetric` instance
`llm`	`ChatOpenAI`	No	LLM for metrics requiring one (defaults to `ChatOpenAI()`)
`embeddings`	`OpenAIEmbeddings`	No	Embeddings for metrics requiring them (defaults to `OpenAIEmbeddings()`)
`run_config`	`RunConfig`	No	Execution configuration (defaults to `RunConfig()`)

Usage Examples

Using as a LangChain Chain

```python
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import faithfulness

# faithfulness is LLM-based, so without an llm argument a default ChatOpenAI
# is created; this requires OPENAI_API_KEY to be set in the environment.
evaluator = EvaluatorChain(metric=faithfulness)

# Run evaluation with a dictionary input (v1 keys, converted internally)
result = evaluator.invoke({
    "question": "What is retrieval augmented generation?",
    "answer": "RAG combines retrieval with generation.",
    "contexts": ["RAG is a technique that combines information retrieval with text generation."],
})

print(result["faithfulness"])  # e.g., 0.95
```
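
The score can be read from `result` by key because LangChain's `Chain.invoke`, with its default `return_only_outputs=False`, returns the chain's inputs merged with the dictionary `_call` produced. The plain-Python sketch below imitates that merge; `merge_outputs` is a hypothetical helper rather than LangChain code, and the score is an illustrative value.

```python
from typing import Any, Dict


def merge_outputs(
    inputs: Dict[str, Any], outputs: Dict[str, Any], return_only_outputs: bool = False
) -> Dict[str, Any]:
    """Mimic a Chain's default output handling: outputs layered over the inputs."""
    if return_only_outputs:
        return dict(outputs)
    return {**inputs, **outputs}


inputs = {
    "question": "What is RAG?",
    "answer": "RAG combines retrieval with generation.",
    "contexts": ["..."],
}
score_only = {"faithfulness": 0.95}  # illustrative value, shaped like what _call returns
result = merge_outputs(inputs, score_only)
```

Passing `return_only_outputs=True` to `invoke` should therefore leave just the score dictionary.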

Using as a LangSmith RunEvaluator

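```python
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import context_precision

# Create an evaluator for use with LangSmith
evaluator = EvaluatorChain(metric=context_precision)

# Pass `evaluator` wherever LangSmith accepts a RunEvaluator. LangSmith's
# evaluation framework then calls evaluator.evaluate_run(run, example) per run:
# the example must supply "question" and "ground_truth", and the run's outputs
# must supply the metric's remaining inputs (here, "contexts").
```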

Async Evaluation

```python
import asyncio

from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import faithfulness

evaluator = EvaluatorChain(metric=faithfulness)

# ainvoke runs the chain's asynchronous path (_acall)
result = asyncio.run(evaluator.ainvoke({
    "question": "What is Ragas?",
    "answer": "Ragas is an evaluation toolkit.",
    "contexts": ["Ragas provides metrics for evaluating LLM applications."],
}))

print(result["faithfulness"])
```
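
Because `ainvoke` returns a coroutine, several rows can be evaluated concurrently with `asyncio.gather`. To run without an API key, this sketch uses a hypothetical `ToyAsyncEvaluator` with a toy word-overlap score. `score_rows` relies only on `ainvoke` and `output_keys`, which EvaluatorChain also exposes, so passing a real chain should work the same way; that substitution is an assumption not exercised here.

```python
import asyncio
from typing import Any, Dict, List


class ToyAsyncEvaluator:
    """Hypothetical stand-in exposing the same ainvoke/output_keys shape as the chain."""

    output_keys = ["toy_overlap"]

    async def ainvoke(self, row: Dict[str, Any]) -> Dict[str, Any]:
        await asyncio.sleep(0)  # yield to the event loop, as a real LLM call would
        # Toy score: share of answer words that also appear as words in the contexts
        answer_words = row["answer"].lower().split()
        context_words = set(" ".join(row["contexts"]).lower().split())
        hits = sum(w in context_words for w in answer_words)
        score = hits / len(answer_words) if answer_words else 0.0
        return {**row, self.output_keys[0]: score}


async def score_rows(evaluator: Any, rows: List[Dict[str, Any]]) -> List[float]:
    """Evaluate all rows concurrently; asyncio.gather keeps results in input order."""
    results = await asyncio.gather(*(evaluator.ainvoke(r) for r in rows))
    key = evaluator.output_keys[0]
    return [r[key] for r in results]


rows = [
    {"question": "q1", "answer": "Ragas evaluates", "contexts": ["Ragas evaluates LLM apps"]},
    {"question": "q2", "answer": "unrelated text", "contexts": ["Ragas evaluates LLM apps"]},
]
scores = asyncio.run(score_rows(ToyAsyncEvaluator(), rows))
```

With a real LLM-backed metric, unbounded concurrency can hit provider rate limits, so large batches may need to be chunked.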

Related Pages

LangSmith Integration - Uses EvaluatorChain internally for running evaluations on LangSmith datasets
LlamaIndex Integration - Alternative integration for LlamaIndex query engines
Messages Module - Core message types used in evaluation samples
