Implementation:Vibrantlabsai Ragas LangChainEvaluatorChain

From Leeroopedia
Knowledge Sources
Domains: LangChain Integration, LLM Evaluation, LangSmith
Last Updated: 2026-02-12 00:00 GMT

Overview

EvaluatorChain wraps Ragas metrics as LangChain Chain objects and LangSmith RunEvaluator instances, enabling Ragas evaluation within LangChain pipelines and LangSmith experiment tracking.

Description

The EvaluatorChain class inherits from both LangChain's Chain and LangSmith's RunEvaluator, providing dual integration points for Ragas metrics in the LangChain ecosystem.

At initialization, the class:

  • Auto-provisions LLM and embeddings - If the wrapped metric requires an LLM (MetricWithLLM) and no llm argument is supplied, it initializes a ChatOpenAI instance; the LLM is then wrapped in LangchainLLMWrapper. Likewise, if the metric requires embeddings (MetricWithEmbeddings) and no embeddings argument is supplied, it initializes OpenAIEmbeddings; the embeddings are wrapped in LangchainEmbeddingsWrapper.
  • Validates metric type - Asserts that the provided metric is a SingleTurnMetric, as the chain interface processes one sample at a time.
  • Initializes the metric - Calls metric.init() with the provided or default RunConfig.
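
The three initialization steps above can be sketched without any Ragas or LangChain dependency. This is a minimal illustration, not the library's code: StubMetric, StubRunConfig, provided_or_default, and set_up are hypothetical names, the string "default-llm" stands in for a ChatOpenAI instance, and the timeout values are arbitrary.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class StubRunConfig:
    """Hypothetical stand-in for RunConfig; the field and value are illustrative."""
    timeout: int = 60


class StubMetric:
    """Hypothetical single-turn metric that needs an LLM."""
    requires_llm = True

    def __init__(self) -> None:
        self.llm: Any = None
        self.run_config: Any = None

    def init(self, run_config: StubRunConfig) -> None:
        self.run_config = run_config


def provided_or_default(kwargs: dict, key: str, factory: Callable[[], Any]) -> Any:
    """Use the caller's value when present, otherwise build a default."""
    return kwargs[key] if key in kwargs else factory()


def set_up(metric: Any, **kwargs: Any) -> Any:
    # 1. Auto-provision: only build a default LLM if none was passed in.
    if getattr(metric, "requires_llm", False):
        metric.llm = provided_or_default(kwargs, "llm", lambda: "default-llm")
    # 2. Validate: the chain scores one sample at a time.
    assert isinstance(metric, StubMetric), "metric must be single-turn"
    # 3. Initialize with the provided or default run configuration.
    metric.init(provided_or_default(kwargs, "run_config", StubRunConfig))
    return metric


default_setup = set_up(StubMetric())
custom_setup = set_up(StubMetric(), llm="my-llm", run_config=StubRunConfig(timeout=10))
print(default_setup.llm, default_setup.run_config.timeout)
print(custom_setup.llm, custom_setup.run_config.timeout)
```

The point of the pattern is that caller-supplied objects always win, so a default client is only constructed when nothing was passed in.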

The class supports both synchronous (_call) and asynchronous (_acall) evaluation. Input dictionaries are automatically converted from v1 to v2 format using convert_row_v1_to_v2, and LangChain Document objects in retrieved_contexts are converted to plain text strings.
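
As a rough illustration of that input normalization, the sketch below renames v1-style keys and flattens document objects to text. It does not call Ragas: the key mapping in V1_TO_V2 is an assumption about what convert_row_v1_to_v2 does, and Doc and normalize_inputs are hypothetical names, with Doc standing in for a LangChain Document.

```python
from dataclasses import dataclass
from typing import Any

# Assumed v1 -> v2 column names; check convert_row_v1_to_v2 for the real mapping.
V1_TO_V2 = {
    "question": "user_input",
    "answer": "response",
    "contexts": "retrieved_contexts",
    "ground_truth": "reference",
}


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def normalize_inputs(row: dict[str, Any]) -> dict[str, Any]:
    # Rename known v1 keys; leave any other key untouched.
    converted = {V1_TO_V2.get(key, key): value for key, value in row.items()}
    # Replace document objects with their text so contexts are plain strings.
    if "retrieved_contexts" in converted:
        converted["retrieved_contexts"] = [
            ctx.page_content if isinstance(ctx, Doc) else str(ctx)
            for ctx in converted["retrieved_contexts"]
        ]
    return converted


row = {
    "question": "What is machine learning?",
    "answer": "A subset of AI that learns from data.",
    "contexts": [Doc("Machine learning is a branch of AI."), "Plain string context."],
}
normalized = normalize_inputs(row)
print(normalized)
```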

For LangSmith integration, the evaluate_run method accepts a LangSmith Run and Example, validates that required fields (question, ground_truth) are present, and returns an EvaluationResult with the metric name and computed score.
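
The validate-then-score flow can be sketched as follows. This is a dependency-free illustration under stated assumptions: StubExample and StubResult are hypothetical stand-ins for the LangSmith Example and EvaluationResult types, the sketch assumes question lives in the example inputs and ground_truth in the example outputs, and exact_match is a placeholder scorer rather than a Ragas metric.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class StubExample:
    """Hypothetical stand-in for a LangSmith Example."""
    inputs: dict
    outputs: dict


@dataclass
class StubResult:
    """Hypothetical stand-in for EvaluationResult."""
    key: str
    score: float


def check_example(example: Optional[StubExample]) -> None:
    if example is None:
        raise ValueError("an example is required")
    # Assumption: the question is an example input, the ground truth an output.
    if "question" not in example.inputs:
        raise ValueError("expected 'question' in example inputs")
    if "ground_truth" not in example.outputs:
        raise ValueError("expected 'ground_truth' in example outputs")


def evaluate(
    run_outputs: dict[str, Any],
    example: Optional[StubExample],
    metric_name: str,
    score_fn: Callable[[dict[str, Any]], float],
) -> StubResult:
    check_example(example)
    # Merge the run's outputs with the dataset fields the metric needs.
    row = {
        **run_outputs,
        "question": example.inputs["question"],
        "ground_truth": example.outputs["ground_truth"],
    }
    return StubResult(key=metric_name, score=score_fn(row))


def exact_match(row: dict[str, Any]) -> float:
    """Placeholder scorer: 1.0 when the answer equals the ground truth."""
    return float(row["answer"] == row["ground_truth"])


example = StubExample(
    inputs={"question": "What is ML?"},
    outputs={"ground_truth": "A branch of AI."},
)
result = evaluate({"answer": "A branch of AI."}, example, "exact_match", exact_match)
print(result.key, result.score)
```

Failing fast on a missing field gives a clear error before any LLM call is made.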

Usage

Use this class when you want to evaluate LLM applications built with LangChain using Ragas metrics, either inline within a LangChain pipeline or as an evaluator in LangSmith experiments. It bridges the gap between Ragas' metric system and LangChain's chain/callback infrastructure.

Code Reference

Source Location

Signature

class EvaluatorChain(Chain, RunEvaluator):
    metric: Metric

    def __init__(self, metric: Metric, **kwargs: Any): ...

    @property
    def input_keys(self) -> list[str]: ...

    @property
    def output_keys(self) -> list[str]: ...

    def _call(
        self,
        inputs: Union[dict[str, Any], SingleTurnSample],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> dict[str, Any]: ...

    async def _acall(
        self,
        inputs: Union[dict[str, Any], SingleTurnSample],
        run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
    ) -> dict[str, Any]: ...

    def evaluate_run(
        self, run: Run, example: Optional[Example] = None
    ) -> EvaluationResult: ...

Import

from ragas.integrations.langchain import EvaluatorChain

I/O Contract

Inputs (Constructor)

Name | Type | Required | Description
metric | Metric (SingleTurnMetric) | Yes | A Ragas metric instance to wrap; must implement SingleTurnMetric
llm | ChatOpenAI | No | LangChain LLM to use; auto-created if the metric requires an LLM and none is provided
embeddings | OpenAIEmbeddings | No | LangChain embeddings to use; auto-created if the metric requires embeddings and none are provided
run_config | RunConfig | No | Execution configuration for retry and timeout settings; defaults to a new RunConfig

Inputs (_call / _acall)

Name | Type | Required | Description
inputs | dict or SingleTurnSample | Yes | Evaluation data containing the columns the metric requires (e.g., question, answer, contexts, ground_truth)
run_manager | CallbackManagerForChainRun / AsyncCallbackManagerForChainRun | No | LangChain callback manager for tracking execution

Outputs

Name | Type | Description
return | dict[str, Any] | Dictionary mapping the metric name to its computed score (e.g., {"faithfulness": 0.85})

Outputs (evaluate_run)

Name | Type | Description
return | EvaluationResult | LangSmith EvaluationResult containing the metric name as key and the computed score

Usage Examples

Basic Chain Usage

from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import Faithfulness

# Create an evaluator chain with a Ragas metric
evaluator = EvaluatorChain(metric=Faithfulness())

# Evaluate a sample
result = evaluator.invoke({
    "question": "What is machine learning?",
    "answer": "Machine learning is a subset of AI that enables systems to learn from data.",
    "contexts": ["Machine learning is a branch of artificial intelligence."],
})
print(result["faithfulness"])  # a float score; the exact value depends on the LLM's judgment

Async Evaluation

import asyncio
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import Faithfulness

evaluator = EvaluatorChain(metric=Faithfulness())

async def evaluate():
    result = await evaluator.ainvoke({
        "question": "What is deep learning?",
        "answer": "Deep learning uses neural networks with multiple layers.",
        "contexts": ["Deep learning is a subset of machine learning using neural networks."],
    })
    return result

result = asyncio.run(evaluate())
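
When several samples need scoring, the same ainvoke call shape can be fanned out with asyncio.gather. The sketch below is an assumption-laden illustration: StubEvaluator and evaluate_all are hypothetical names, and the stub returns a fixed placeholder score instead of calling a Ragas metric, so it runs without an API key.

```python
import asyncio
from typing import Any


class StubEvaluator:
    """Hypothetical stand-in exposing the same ainvoke call shape."""

    async def ainvoke(self, inputs: dict[str, Any]) -> dict[str, Any]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        # Fixed placeholder score, not a real metric computation.
        return {**inputs, "faithfulness": 1.0}


async def evaluate_all(evaluator: Any, samples: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Schedule every sample at once; results come back in input order.
    return await asyncio.gather(*(evaluator.ainvoke(s) for s in samples))


samples = [
    {"question": "What is deep learning?", "answer": "Layered neural networks.", "contexts": ["..."]},
    {"question": "What is machine learning?", "answer": "Learning from data.", "contexts": ["..."]},
]
results = asyncio.run(evaluate_all(StubEvaluator(), samples))
scores = [r["faithfulness"] for r in results]
print(scores)
```

With a real LLM-backed metric, unbounded concurrency may run into provider rate limits, so a semaphore or batching is worth considering.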

LangSmith RunEvaluator Usage

from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import Faithfulness

evaluator = EvaluatorChain(metric=Faithfulness())

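# LangSmith normally calls evaluate_run for you when it runs an
# experiment against a dataset. To call it directly, `run` must be a
# LangSmith Run and `example` the matching dataset Example, both
# obtained elsewhere (they are not defined in this snippet).
evaluation_result = evaluator.evaluate_run(run=run, example=example)
print(evaluation_result.score)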
