Implementation:Vibrantlabsai Ragas LangChainEvaluatorChain

Knowledge Sources	Vibrantlabsai_Ragas
Domains	LangChain Integration, LLM Evaluation, LangSmith
Last Updated	2026-02-12 00:00 GMT

Overview

EvaluatorChain wraps Ragas metrics as LangChain Chain objects and LangSmith RunEvaluator instances, enabling Ragas evaluation within LangChain pipelines and LangSmith experiment tracking.

Description

The EvaluatorChain class inherits from both LangChain's Chain and LangSmith's RunEvaluator, providing dual integration points for Ragas metrics in the LangChain ecosystem.

At initialization, the class:

Auto-provisions LLM and embeddings - If the wrapped metric requires an LLM (MetricWithLLM), it automatically initializes a ChatOpenAI instance wrapped in LangchainLLMWrapper. If the metric requires embeddings (MetricWithEmbeddings), it initializes OpenAIEmbeddings wrapped in LangchainEmbeddingsWrapper.
Validates metric type - Asserts that the provided metric is a SingleTurnMetric, as the chain interface processes one sample at a time.
Initializes the metric - Calls metric.init() with the provided or default RunConfig.
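
The three initialization steps above can be sketched without the real libraries. This is a minimal illustration, not the Ragas source: every class here is a hypothetical stand-in for the Ragas type of the same name, prepare_metric is an invented helper, and the default factories return placeholder strings where the real class would build ChatOpenAI and OpenAIEmbeddings wrappers.

```python
from typing import Any, Callable, Optional


class RunConfig:
    """Hypothetical stand-in for ragas' RunConfig."""


class SingleTurnMetric:
    """Hypothetical stand-in: a metric that scores one sample at a time."""

    def init(self, run_config: RunConfig) -> None:
        self.run_config = run_config


class MetricWithLLM:
    """Hypothetical stand-in: marks a metric that needs an LLM."""

    llm: Optional[Any] = None


class MetricWithEmbeddings:
    """Hypothetical stand-in: marks a metric that needs embeddings."""

    embeddings: Optional[Any] = None


class DemoMetric(MetricWithLLM, MetricWithEmbeddings, SingleTurnMetric):
    """Illustrative metric that needs both an LLM and embeddings."""


def prepare_metric(
    metric: Any,
    llm: Optional[Any] = None,
    embeddings: Optional[Any] = None,
    run_config: Optional[RunConfig] = None,
    default_llm: Callable[[], Any] = lambda: "default-llm",
    default_embeddings: Callable[[], Any] = lambda: "default-embeddings",
) -> Any:
    # Step 1: provision an LLM and embeddings only when the metric needs them,
    # preferring caller-supplied objects over the defaults.
    if isinstance(metric, MetricWithLLM):
        metric.llm = llm if llm is not None else default_llm()
    if isinstance(metric, MetricWithEmbeddings):
        metric.embeddings = (
            embeddings if embeddings is not None else default_embeddings()
        )
    # Step 2: the chain handles one sample per call, so reject other metric types.
    assert isinstance(metric, SingleTurnMetric), "metric must be a SingleTurnMetric"
    # Step 3: initialize with the supplied run config, or a fresh default.
    metric.init(run_config if run_config is not None else RunConfig())
    return metric
```

Calling prepare_metric(DemoMetric(), llm=my_llm) would keep the caller's LLM and fall back to the default only for embeddings, which mirrors the "auto-created if not provided" behavior described for the constructor.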

The class supports both synchronous (_call) and asynchronous (_acall) evaluation. Input dictionaries are automatically converted from v1 to v2 format using convert_row_v1_to_v2, and LangChain Document objects in retrieved_contexts are converted to plain text strings.
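
To make the input normalization concrete, here is a small sketch of the two conversions described above. It does not call convert_row_v1_to_v2; the column mapping in V1_TO_V2 is an assumption about what that function does, Document is a hypothetical stand-in for LangChain's class (only page_content is used), and normalize_inputs is an invented name.

```python
from dataclasses import dataclass
from typing import Any

# Assumed v1 -> v2 column names; the authoritative mapping lives in
# ragas' convert_row_v1_to_v2.
V1_TO_V2 = {
    "question": "user_input",
    "answer": "response",
    "contexts": "retrieved_contexts",
    "ground_truth": "reference",
}


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document."""

    page_content: str


def normalize_inputs(inputs: dict[str, Any]) -> dict[str, Any]:
    # Rename v1 keys to their v2 equivalents; unknown keys pass through unchanged.
    row = {V1_TO_V2.get(key, key): value for key, value in inputs.items()}
    # Replace Document objects in retrieved_contexts with their plain text.
    contexts = row.get("retrieved_contexts")
    if contexts is not None:
        row["retrieved_contexts"] = [
            c.page_content if isinstance(c, Document) else c for c in contexts
        ]
    return row
```

Under these assumptions, a row with contexts given as a mix of Document objects and strings comes out with a retrieved_contexts list of plain strings, so the metric never has to know about LangChain types.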

For LangSmith integration, the evaluate_run method accepts a LangSmith Run and Example, validates that required fields (question, ground_truth) are present, and returns an EvaluationResult with the metric name and computed score.
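
The evaluate_run flow can be outlined in the same spirit. The sketch below is an illustration under stated assumptions rather than the actual method: Run, Example, and EvaluationResult are hypothetical stand-ins for the LangSmith types, evaluate_run_sketch and score_fn are invented names, and it assumes question is read from the example's inputs and ground_truth from the example's outputs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Run:
    """Hypothetical stand-in for a LangSmith Run (only outputs is used)."""

    outputs: dict = field(default_factory=dict)


@dataclass
class Example:
    """Hypothetical stand-in for a LangSmith Example."""

    inputs: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)


@dataclass
class EvaluationResult:
    """Hypothetical stand-in carrying the metric name and score."""

    key: str
    score: float


def evaluate_run_sketch(
    metric_name: str,
    score_fn: Callable[[dict], float],
    run: Run,
    example: Optional[Example] = None,
) -> EvaluationResult:
    if example is None:
        raise ValueError("an Example is required to read question and ground_truth")
    # Assumption: question is an example input, ground_truth an example output.
    if "question" not in example.inputs:
        raise ValueError("example.inputs is missing 'question'")
    if "ground_truth" not in example.outputs:
        raise ValueError("example.outputs is missing 'ground_truth'")
    # Merge the reference data with whatever the traced run produced.
    sample: dict[str, Any] = {
        "question": example.inputs["question"],
        "ground_truth": example.outputs["ground_truth"],
        **run.outputs,
    }
    return EvaluationResult(key=metric_name, score=score_fn(sample))
```

Here score_fn plays the role of the wrapped metric; passing an exact-match function, for instance, yields a result whose key is the chosen metric name and whose score is 1.0 or 0.0.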

Usage

Use this class when you want to evaluate LLM applications built with LangChain using Ragas metrics, either inline within a LangChain pipeline or as an evaluator in LangSmith experiments. It bridges the gap between Ragas' metric system and LangChain's chain/callback infrastructure.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/integrations/langchain.py

Signature

class EvaluatorChain(Chain, RunEvaluator):
    metric: Metric

    def __init__(self, metric: Metric, **kwargs: Any): ...

    @property
    def input_keys(self) -> list[str]: ...

    @property
    def output_keys(self) -> list[str]: ...

    def _call(
        self,
        inputs: Union[dict[str, Any], SingleTurnSample],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> dict[str, Any]: ...

    async def _acall(
        self,
        inputs: Union[dict[str, Any], SingleTurnSample],
        run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
    ) -> dict[str, Any]: ...

    def evaluate_run(
        self, run: Run, example: Optional[Example] = None
    ) -> EvaluationResult: ...

Import

from ragas.integrations.langchain import EvaluatorChain

I/O Contract

Inputs (Constructor)

Name	Type	Required	Description
metric	Metric (SingleTurnMetric)	Yes	A Ragas metric instance to wrap; must implement SingleTurnMetric
llm	ChatOpenAI	No	LangChain LLM to use; auto-created if metric requires LLM and not provided
embeddings	OpenAIEmbeddings	No	LangChain embeddings to use; auto-created if metric requires embeddings and not provided
run_config	RunConfig	No	Execution configuration for retry and timeout settings; defaults to a new RunConfig

Inputs (_call / _acall)

Name	Type	Required	Description
inputs	dict or SingleTurnSample	Yes	Evaluation data containing required columns for the metric (e.g., question, answer, contexts, ground_truth)
run_manager	CallbackManagerForChainRun / AsyncCallbackManagerForChainRun	No	LangChain callback manager for tracking execution

Outputs (_call / _acall)

Name	Type	Description
return	dict[str, Any]	Dictionary mapping the metric name to its computed score (e.g., {"faithfulness": 0.85})

Outputs (evaluate_run)

Name	Type	Description
return	EvaluationResult	LangSmith EvaluationResult containing the metric name as key and the computed score

Usage Examples

Basic Chain Usage

from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import Faithfulness

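# Create an evaluator chain with a Ragas metric.
# Faithfulness needs an LLM; since none is passed, a ChatOpenAI instance is
# auto-created, which expects OPENAI_API_KEY to be set in the environment.
evaluator = EvaluatorChain(metric=Faithfulness())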

# Evaluate a sample
result = evaluator.invoke({
    "question": "What is machine learning?",
    "answer": "Machine learning is a subset of AI that enables systems to learn from data.",
    "contexts": ["Machine learning is a branch of artificial intelligence."],
})
# The score is stored under the metric name; its value depends on the LLM's judgment.
print(result["faithfulness"])

Async Evaluation

import asyncio
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import Faithfulness

evaluator = EvaluatorChain(metric=Faithfulness())

async def evaluate():
    result = await evaluator.ainvoke({
        "question": "What is deep learning?",
        "answer": "Deep learning uses neural networks with multiple layers.",
        "contexts": ["Deep learning is a subset of machine learning using neural networks."],
    })
    return result

result = asyncio.run(evaluate())

LangSmith RunEvaluator Usage

from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import Faithfulness

evaluator = EvaluatorChain(metric=Faithfulness())

# Use as a LangSmith evaluator.
# LangSmith calls evaluate_run automatically when running experiments
# against a dataset. `run` and `example` are the Run and Example objects
# it supplies; they are not defined in this snippet.
evaluation_result = evaluator.evaluate_run(run=run, example=example)
print(evaluation_result.score)

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
