Implementation:Vibrantlabsai Ragas LangChainEvaluatorChain
| Knowledge Sources | |
|---|---|
| Domains | LangChain Integration, LLM Evaluation, LangSmith |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
EvaluatorChain wraps Ragas metrics as LangChain Chain objects and LangSmith RunEvaluator instances, enabling Ragas evaluation within LangChain pipelines and LangSmith experiment tracking.
Description
The EvaluatorChain class inherits from both LangChain's Chain and LangSmith's RunEvaluator, providing dual integration points for Ragas metrics in the LangChain ecosystem.
At initialization, the class:
- Auto-provisions LLM and embeddings - If the wrapped metric requires an LLM (MetricWithLLM) and none is passed via llm, it creates a ChatOpenAI instance and wraps it in LangchainLLMWrapper. If the metric requires embeddings (MetricWithEmbeddings) and none are passed via embeddings, it creates OpenAIEmbeddings and wraps them in LangchainEmbeddingsWrapper.
- Validates metric type - Asserts that the provided metric is a SingleTurnMetric, as the chain interface processes one sample at a time.
- Initializes the metric - Calls metric.init() with the provided or default RunConfig.
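The initialization steps above can be sketched with the standard library alone. Every class below is a hypothetical stand-in (FakeChatModel plays the role of ChatOpenAI, LLMWrapper that of LangchainLLMWrapper), and setup_metric is an illustrative helper, not a Ragas function. The embeddings branch is omitted because it would follow the same pattern as the LLM branch.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-ins for the Ragas base classes; the real ones live in ragas.
class SingleTurnMetric:
    def init(self, run_config: "RunConfig") -> None:
        self.run_config = run_config


class MetricWithLLM(SingleTurnMetric):
    llm: Optional[Any] = None


@dataclass
class RunConfig:
    timeout: int = 180  # illustrative default, not taken from the source


class FakeChatModel:
    """Plays the role of ChatOpenAI in this sketch."""


@dataclass
class LLMWrapper:
    """Plays the role of LangchainLLMWrapper in this sketch."""
    inner: Any


def setup_metric(metric: Any, **kwargs: Any) -> Any:
    run_config = kwargs.get("run_config") or RunConfig()
    # Provision an LLM only when the metric needs one; prefer a caller-supplied one.
    if isinstance(metric, MetricWithLLM):
        metric.llm = LLMWrapper(kwargs.get("llm") or FakeChatModel())
    # The chain scores one sample per call, so other metric types are rejected.
    assert isinstance(metric, SingleTurnMetric), "Metric must be SingleTurnMetric"
    metric.init(run_config)
    return metric


metric = setup_metric(MetricWithLLM())
```

Passing llm=... to setup_metric skips the default model, which mirrors how the optional constructor arguments are described in the I/O contract.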
The class supports both synchronous (_call) and asynchronous (_acall) evaluation. Input dictionaries are automatically converted from v1 to v2 format using convert_row_v1_to_v2, and LangChain Document objects in retrieved_contexts are converted to plain text strings.
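As a rough illustration of that input normalization, the sketch below renames v1 keys and unwraps document objects. The key mapping in V1_TO_V2 is an assumption about what convert_row_v1_to_v2 produces, Document is a stand-in for the LangChain class, and normalize_inputs is a hypothetical helper. Passing plain strings through unchanged is also a choice made for this sketch, not confirmed behavior.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Document:
    """Stand-in for the LangChain Document class."""
    page_content: str


# Assumed v1 -> v2 column mapping; check convert_row_v1_to_v2 for the real one.
V1_TO_V2 = {
    "question": "user_input",
    "answer": "response",
    "contexts": "retrieved_contexts",
    "ground_truth": "reference",
}


def normalize_inputs(row: dict[str, Any]) -> dict[str, Any]:
    # Rename v1 keys, leaving keys that are already in v2 form untouched.
    converted = {V1_TO_V2.get(key, key): value for key, value in row.items()}
    if "retrieved_contexts" in converted:
        # Unwrap Document objects so the metric only sees plain strings.
        converted["retrieved_contexts"] = [
            doc.page_content if isinstance(doc, Document) else doc
            for doc in converted["retrieved_contexts"]
        ]
    return converted


row = normalize_inputs({
    "question": "What is machine learning?",
    "contexts": [Document("ML is a branch of AI."), "A plain string context."],
})
```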
For LangSmith integration, the evaluate_run method accepts a LangSmith Run and Example, validates that required fields (question, ground_truth) are present, and returns an EvaluationResult with the metric name and computed score.
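The evaluate_run flow can be approximated as follows. Run, Example, and EvaluationResult here are simplified stand-ins for the LangSmith types, and this free function takes a hypothetical score_fn callable in place of the wrapped metric. Where each field lives (question in the example inputs, ground_truth in the example outputs) is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


# Hypothetical stand-ins for LangSmith's Run, Example and EvaluationResult.
@dataclass
class Run:
    outputs: Optional[dict[str, Any]]


@dataclass
class Example:
    inputs: Optional[dict[str, Any]]
    outputs: Optional[dict[str, Any]]


@dataclass
class EvaluationResult:
    key: str
    score: float


def evaluate_run(
    metric_name: str,
    score_fn: Callable[[dict[str, Any]], float],
    run: Run,
    example: Optional[Example] = None,
) -> EvaluationResult:
    # Reject runs that cannot be scored before doing any work.
    if example is None or example.inputs is None or example.outputs is None:
        raise ValueError("an example with inputs and outputs is required")
    if "question" not in example.inputs or "ground_truth" not in example.outputs:
        raise ValueError("expected 'question' and 'ground_truth' in example")
    if run.outputs is None:
        raise ValueError("the run has no outputs")
    # Merge the run's outputs with the dataset fields to form one sample.
    sample = {
        **run.outputs,
        "question": example.inputs["question"],
        "ground_truth": example.outputs["ground_truth"],
    }
    return EvaluationResult(key=metric_name, score=score_fn(sample))


result = evaluate_run(
    "exact_match",
    lambda s: float(s["answer"] == s["ground_truth"]),
    Run(outputs={"answer": "Paris"}),
    Example(inputs={"question": "Capital of France?"}, outputs={"ground_truth": "Paris"}),
)
```

The exact_match scorer is a toy replacement for a Ragas metric, chosen so the sketch runs without an LLM.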
Usage
Use this class when you want to evaluate LLM applications built with LangChain using Ragas metrics, either inline within a LangChain pipeline or as an evaluator in LangSmith experiments. It bridges the gap between Ragas' metric system and LangChain's chain/callback infrastructure.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/integrations/langchain.py
Signature
class EvaluatorChain(Chain, RunEvaluator):
    metric: Metric
    def __init__(self, metric: Metric, **kwargs: Any): ...
    @property
    def input_keys(self) -> list[str]: ...
    @property
    def output_keys(self) -> list[str]: ...
    def _call(
        self,
        inputs: Union[dict[str, Any], SingleTurnSample],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> dict[str, Any]: ...
    async def _acall(
        self,
        inputs: Union[Dict[str, Any], SingleTurnSample],
        run_manager: Optional[AsyncCallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]: ...
    def evaluate_run(
        self, run: Run, example: Optional[Example] = None
    ) -> EvaluationResult: ...
Import
from ragas.integrations.langchain import EvaluatorChain
I/O Contract
Inputs (Constructor)
| Name | Type | Required | Description |
|---|---|---|---|
| metric | Metric (SingleTurnMetric) | Yes | A Ragas metric instance to wrap; must implement SingleTurnMetric |
| llm | ChatOpenAI | No | LangChain LLM to use; auto-created if metric requires LLM and not provided |
| embeddings | OpenAIEmbeddings | No | LangChain embeddings to use; auto-created if metric requires embeddings and not provided |
| run_config | RunConfig | No | Execution configuration for retry and timeout settings; defaults to a new RunConfig |
Inputs (_call / _acall)
| Name | Type | Required | Description |
|---|---|---|---|
| inputs | dict or SingleTurnSample | Yes | Evaluation data containing required columns for the metric (e.g., question, answer, contexts, ground_truth) |
| run_manager | CallbackManagerForChainRun / AsyncCallbackManagerForChainRun | No | LangChain callback manager for tracking execution |
Outputs (_call / _acall)
| Name | Type | Description |
|---|---|---|
| return | dict[str, Any] | Dictionary mapping the metric name to its computed score (e.g., {"faithfulness": 0.85}) |
Outputs (evaluate_run)
| Name | Type | Description |
|---|---|---|
| return | EvaluationResult | LangSmith EvaluationResult containing the metric name as key and the computed score |
Usage Examples
Basic Chain Usage
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import Faithfulness
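# Create an evaluator chain with a Ragas metric.
# Faithfulness needs an LLM; since none is passed, a ChatOpenAI instance
# is created automatically, which requires OpenAI credentials.
evaluator = EvaluatorChain(metric=Faithfulness())
# Evaluate a sample
result = evaluator.invoke({
    "question": "What is machine learning?",
    "answer": "Machine learning is a subset of AI that enables systems to learn from data.",
    "contexts": ["Machine learning is a branch of artificial intelligence."],
})
# The score is stored under the metric name; its value varies with the LLM (e.g., 0.95)
print(result["faithfulness"])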
Async Evaluation
import asyncio
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import Faithfulness
evaluator = EvaluatorChain(metric=Faithfulness())
async def evaluate():
    result = await evaluator.ainvoke({
        "question": "What is deep learning?",
        "answer": "Deep learning uses neural networks with multiple layers.",
        "contexts": ["Deep learning is a subset of machine learning using neural networks."],
    })
    return result
result = asyncio.run(evaluate())
LangSmith RunEvaluator Usage
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import Faithfulness
evaluator = EvaluatorChain(metric=Faithfulness())
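# Use as a LangSmith evaluator.
# LangSmith calls evaluate_run automatically when running experiments
# against a dataset. The direct call below is for illustration only:
# `run` and `example` are the LangSmith Run and Example objects that
# LangSmith would supply, and they are not defined in this snippet.
evaluation_result = evaluator.evaluate_run(run=run, example=example)
print(evaluation_result.score)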