Implementation:Explodinggradients Ragas EvaluatorChain Class

From Leeroopedia


Metadata | Value
Source | src/ragas/integrations/langchain.py (Lines 32-206)
Domains | Integration, LangChain
Last Updated | 2026-02-10

Overview

Wraps any Ragas SingleTurnMetric as both a LangChain Chain and a LangSmith RunEvaluator, enabling seamless use of Ragas metrics within LangChain workflows and LangSmith evaluation runs.

Description

EvaluatorChain inherits from both langchain.chains.base.Chain and langsmith.evaluation.RunEvaluator. On initialization, it:

  1. Accepts a Ragas Metric instance and validates it is a SingleTurnMetric.
  2. Automatically wraps LLM and embedding dependencies: if the metric is a MetricWithLLM, it initializes a ChatOpenAI (or uses a provided one) and wraps it in LangchainLLMWrapper. If the metric is a MetricWithEmbeddings, it does the same with OpenAIEmbeddings.
  3. Calls metric.init(run_config) to prepare the metric for execution.
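
The "use the provided dependency or fall back to a default" behavior in step 2 can be sketched roughly as follows. This is a minimal illustration, not the Ragas source: DefaultLLM, LLMWrapper, get_or_default, and wrap_llm are hypothetical stand-ins for ChatOpenAI, LangchainLLMWrapper, and whatever helper the class actually uses.

```python
from typing import Any, Callable, Dict


class DefaultLLM:
    """Hypothetical stand-in for a default chat model such as ChatOpenAI."""


class LLMWrapper:
    """Hypothetical stand-in for an adapter such as LangchainLLMWrapper."""

    def __init__(self, inner: Any) -> None:
        self.inner = inner


def get_or_default(kwargs: Dict[str, Any], key: str, factory: Callable[[], Any]) -> Any:
    """Return kwargs[key] when the caller supplied one, else build a default."""
    value = kwargs.get(key)
    return value if value is not None else factory()


def wrap_llm(**kwargs: Any) -> LLMWrapper:
    # Use the caller's model if present; otherwise construct the default lazily,
    # so no default object is created when it is not needed.
    return LLMWrapper(get_or_default(kwargs, "llm", DefaultLLM))


custom_model = object()
print(wrap_llm(llm=custom_model).inner is custom_model)  # caller's model is kept
print(type(wrap_llm().inner).__name__)                   # default is constructed
```

Passing a factory rather than an instance means the default model is only built when the caller did not supply one, which matters when construction requires credentials.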

The class provides:

  • _call and _acall: Synchronous and asynchronous chain execution. Inputs can be a dictionary (automatically converted from v1 to v2 format) or a SingleTurnSample. LangChain Document objects in retrieved_contexts are automatically converted to strings via page_content.
  • _validate: Checks that the input sample contains all columns required by the metric.
  • evaluate_run: Implements the RunEvaluator interface for LangSmith. It validates that the Run and Example contain the expected keys (question, ground_truth, answer, contexts), then invokes the chain and returns an EvaluationResult.
  • input_keys and output_keys: Properties derived dynamically from the wrapped metric (its required columns and its name, respectively).
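
The input handling described above (key conversion, Document flattening, and column validation) can be sketched as plain Python. This is an illustrative approximation under stated assumptions: the V1_TO_V2 mapping is an assumed set of column names that should be checked against the installed Ragas version, and Document, normalize, and validate are hypothetical stand-ins rather than the library's own functions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Assumed v1 -> v2 column names; verify against your Ragas version.
V1_TO_V2 = {
    "question": "user_input",
    "answer": "response",
    "contexts": "retrieved_contexts",
    "ground_truth": "reference",
}


@dataclass
class Document:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str


def normalize(row: Dict[str, Any]) -> Dict[str, Any]:
    """Rename v1 keys and flatten Document objects to plain strings."""
    out = {V1_TO_V2.get(key, key): value for key, value in row.items()}
    if "retrieved_contexts" in out:
        out["retrieved_contexts"] = [
            ctx.page_content if isinstance(ctx, Document) else str(ctx)
            for ctx in out["retrieved_contexts"]
        ]
    return out


def validate(sample: Dict[str, Any], required: List[str]) -> None:
    """Raise if any column the metric needs is missing from the sample."""
    missing = [col for col in required if sample.get(col) is None]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")


sample = normalize({"question": "q", "answer": "a", "contexts": [Document("c1"), "c2"]})
validate(sample, ["user_input", "response", "retrieved_contexts"])
print(sample)
```

Normalizing before validating means a caller can mix Document objects and strings in the same contexts list without tripping the column check.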

Usage

Use EvaluatorChain when you want to:

  • Run Ragas metrics as part of a LangChain pipeline.
  • Use Ragas metrics as custom evaluators in LangSmith evaluation runs.
  • Integrate Ragas scoring into existing LangChain-based applications.

Code Reference

Source Location

Item | Detail
File | src/ragas/integrations/langchain.py
Lines | 32-206
Module | ragas.integrations.langchain

Class Signature

class EvaluatorChain(Chain, RunEvaluator):
    metric: Metric

    def __init__(self, metric: Metric, **kwargs: Any) -> None: ...
    def _call(self, inputs: Union[Dict[str, Any], SingleTurnSample], run_manager=None) -> Dict[str, Any]: ...
    async def _acall(self, inputs: Union[Dict[str, Any], SingleTurnSample], run_manager=None) -> Dict[str, Any]: ...
    def evaluate_run(self, run: Run, example: Optional[Example] = None) -> EvaluationResult: ...

Import

from ragas.integrations.langchain import EvaluatorChain

I/O Contract

_call / _acall

Direction | Name | Type | Description
Input | inputs | Union[Dict[str, Any], SingleTurnSample] | Evaluation data; dicts are auto-converted to SingleTurnSample
Input | run_manager | Optional[CallbackManagerForChainRun] | LangChain callback manager (optional)
Output | (return) | Dict[str, Any] | Dictionary with metric name as key and score as value (e.g., {"faithfulness": 0.85})

evaluate_run

Direction | Name | Type | Description
Input | run | Run | LangSmith run containing chain outputs (answer, contexts)
Input | example | Optional[Example] | LangSmith example containing inputs (question) and outputs (ground_truth)
Output | (return) | EvaluationResult | LangSmith evaluation result with metric name and score
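
How evaluate_run might assemble a chain input from a run and an example can be sketched with plain dictionaries. This is a hedged approximation, not the actual implementation: build_chain_inputs is a hypothetical helper, the dict arguments stand in for the outputs of a Run and the inputs and outputs of an Example, and the exact checks and error messages are assumptions.

```python
from typing import Any, Dict, Optional, Set


def build_chain_inputs(
    run_outputs: Optional[Dict[str, Any]],
    example_inputs: Optional[Dict[str, Any]],
    example_outputs: Optional[Dict[str, Any]],
    required: Set[str],
) -> Dict[str, Any]:
    """Merge run outputs with example fields into one chain input dict."""
    if example_inputs is None or "question" not in example_inputs:
        raise ValueError("expected 'question' in example inputs")
    if run_outputs is None:
        raise ValueError("run has no outputs to evaluate")
    for key in ("answer", "contexts"):
        if key in required and key not in run_outputs:
            raise ValueError(f"expected '{key}' in run outputs")

    chain_inputs = dict(run_outputs)  # copy so the caller's dict is not mutated
    chain_inputs["question"] = example_inputs["question"]
    # ground_truth is only needed by metrics that compare against a reference.
    if "ground_truth" in required:
        if example_outputs is None or "ground_truth" not in example_outputs:
            raise ValueError("expected 'ground_truth' in example outputs")
        chain_inputs["ground_truth"] = example_outputs["ground_truth"]
    return chain_inputs


merged = build_chain_inputs(
    run_outputs={"answer": "a", "contexts": ["c"]},
    example_inputs={"question": "q"},
    example_outputs={"ground_truth": "gt"},
    required={"question", "answer", "contexts", "ground_truth"},
)
print(merged)
```

The merged dictionary uses the same v1-style keys as the chain examples on this page, so it could be handed to the chain's invoke method unchanged.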

Constructor Parameters

Name | Type | Required | Description
metric | Metric | Yes | A Ragas SingleTurnMetric instance
llm | ChatOpenAI | No | LLM for metrics requiring one (defaults to ChatOpenAI())
embeddings | OpenAIEmbeddings | No | Embeddings for metrics requiring them (defaults to OpenAIEmbeddings())
run_config | RunConfig | No | Execution configuration (defaults to RunConfig())

Usage Examples

Using as a LangChain Chain

from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import faithfulness

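# Create an evaluator chain. With no llm argument, the chain defaults to
# ChatOpenAI(), so an OpenAI API key (OPENAI_API_KEY) must be configured.
evaluator = EvaluatorChain(metric=faithfulness)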

# Run evaluation with a dictionary input
result = evaluator.invoke({
    "question": "What is retrieval augmented generation?",
    "answer": "RAG combines retrieval with generation.",
    "contexts": ["RAG is a technique that combines information retrieval with text generation."],
})

print(result["faithfulness"])  # e.g., 0.95

Using as a LangSmith RunEvaluator

from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import context_precision

# Create evaluator for use with LangSmith
evaluator = EvaluatorChain(metric=context_precision)

# Pass this evaluator to a LangSmith evaluation run; LangSmith's evaluation
# framework then calls evaluate_run for each run/example pair.

Async Evaluation

import asyncio
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import faithfulness

evaluator = EvaluatorChain(metric=faithfulness)

result = asyncio.run(evaluator.ainvoke({
    "question": "What is Ragas?",
    "answer": "Ragas is an evaluation toolkit.",
    "contexts": ["Ragas provides metrics for evaluating LLM applications."],
}))

print(result["faithfulness"])  # score keyed by the metric name
