Implementation:Explodinggradients Ragas EvaluatorChain Class
| Metadata | Value |
|---|---|
| Source | src/ragas/integrations/langchain.py (Lines 32-206) |
| Domains | Integration, LangChain |
| Last Updated | 2026-02-10 |
Overview
Wraps any Ragas SingleTurnMetric as both a LangChain Chain and a LangSmith RunEvaluator, so that Ragas metrics can be used within LangChain workflows and LangSmith evaluation runs.
Description
EvaluatorChain inherits from both langchain.chains.base.Chain and langsmith.evaluation.RunEvaluator. On initialization, it:
- Accepts a Ragas Metric instance and validates it is a SingleTurnMetric.
- Automatically wraps LLM and embedding dependencies: if the metric is a MetricWithLLM, it initializes a ChatOpenAI (or uses a provided one) wrapped in LangchainLLMWrapper. Similarly for MetricWithEmbeddings with OpenAIEmbeddings.
- Calls metric.init(run_config) to prepare the metric for execution.
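The three initialization steps can be sketched with stand-in classes. This is an illustration under stated assumptions, not the library's code: `RunConfigStub`, `WrapperStub`, `MetricStub`, and `prepare_metric` are hypothetical names, the default LLM is a placeholder string where the real class would build a `ChatOpenAI()`, and the `timeout` field is invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RunConfigStub:
    """Hypothetical stand-in for the run configuration object."""
    timeout: int = 60  # illustrative field, not taken from the library


@dataclass
class WrapperStub:
    """Hypothetical stand-in for LangchainLLMWrapper: holds the wrapped client."""
    client: Any


@dataclass
class MetricStub:
    """Hypothetical stand-in for a metric that needs an LLM."""
    name: str
    single_turn: bool = True
    llm: Optional[WrapperStub] = None
    initialized_with: Optional[RunConfigStub] = None

    def init(self, run_config: RunConfigStub) -> None:
        # Record the configuration so the effect of init() is observable.
        self.initialized_with = run_config


def prepare_metric(metric: MetricStub, **kwargs: Any) -> MetricStub:
    # Step 1: reject anything that is not a single-turn metric.
    if not metric.single_turn:
        raise ValueError("metric must be a single-turn metric")
    # Step 2: use the caller's LLM if provided, otherwise a default, and wrap it.
    llm = kwargs.get("llm", "default-llm")  # placeholder for ChatOpenAI()
    metric.llm = WrapperStub(client=llm)
    # Step 3: hand the run configuration to the metric.
    metric.init(kwargs.get("run_config", RunConfigStub()))
    return metric
```

Passing `llm=` or `run_config=` as keyword arguments overrides the defaults, which mirrors how the constructor accepts these through `**kwargs`.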
The class provides:
- _call and _acall: Synchronous and asynchronous chain execution. Inputs can be a dictionary (automatically converted from v1 to v2 format) or a SingleTurnSample. LangChain Document objects in retrieved_contexts are automatically converted to strings via page_content.
- _validate: Checks that the input sample contains all columns required by the metric.
- evaluate_run: Implements the RunEvaluator interface for LangSmith. It validates that the Run and Example contain the expected keys (question, ground_truth, answer, contexts), then invokes the chain and returns an EvaluationResult.
- input_keys and output_keys properties dynamically derive required columns from the wrapped metric.
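The column check performed by `_validate` amounts to a set difference between the columns a metric requires and the columns a sample actually populates. The sketch below assumes a sample modelled as a plain dict; `missing_columns` and `validate` are hypothetical helpers, and the v2-style column names in the usage note are an assumption not confirmed by this page.

```python
from typing import Any, Dict, Set


def missing_columns(required: Set[str], sample: Dict[str, Any]) -> Set[str]:
    """Return the required columns that are absent or set to None."""
    present = {key for key, value in sample.items() if value is not None}
    return required - present


def validate(required: Set[str], sample: Dict[str, Any]) -> None:
    """Raise ValueError naming every missing column; return None otherwise."""
    missing = missing_columns(required, sample)
    if missing:
        raise ValueError(f"Sample is missing required columns: {sorted(missing)}")
```

For instance, calling `validate({"user_input", "response"}, {"user_input": "q"})` raises a `ValueError` that names `response`.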
Usage
Use EvaluatorChain when you want to:
- Run Ragas metrics as part of a LangChain pipeline.
- Use Ragas metrics as custom evaluators in LangSmith evaluation runs.
- Integrate Ragas scoring into existing LangChain-based applications.
Code Reference
Source Location
| Item | Detail |
|---|---|
| File | src/ragas/integrations/langchain.py |
| Lines | 32-206 |
| Module | ragas.integrations.langchain |
Class Signature
class EvaluatorChain(Chain, RunEvaluator):
    metric: Metric
    def __init__(self, metric: Metric, **kwargs: Any) -> None: ...
    def _call(self, inputs: Union[Dict[str, Any], SingleTurnSample], run_manager=None) -> Dict[str, Any]: ...
    async def _acall(self, inputs: Union[Dict[str, Any], SingleTurnSample], run_manager=None) -> Dict[str, Any]: ...
    def evaluate_run(self, run: Run, example: Optional[Example] = None) -> EvaluationResult: ...
Import
from ragas.integrations.langchain import EvaluatorChain
I/O Contract
_call / _acall
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | inputs | Union[Dict[str, Any], SingleTurnSample] | Evaluation data; dicts are auto-converted to SingleTurnSample |
| Input | run_manager | Optional[CallbackManagerForChainRun] | LangChain callback manager (optional) |
| Output | (return) | Dict[str, Any] | Dictionary with metric name as key and score as value (e.g., {"faithfulness": 0.85}) |
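The dictionary handling in `_call` / `_acall` can be pictured as two passes: rename v1 keys to their v2 equivalents, then flatten any document objects to text. The sketch below is an assumption-laden illustration: the `V1_TO_V2` mapping is assumed rather than taken from this page, `Doc` is a stand-in for LangChain's `Document`, and `normalize_inputs` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Assumed v1 -> v2 column renames; not confirmed by this page.
V1_TO_V2 = {
    "question": "user_input",
    "answer": "response",
    "contexts": "retrieved_contexts",
    "ground_truth": "reference",
}


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content matters here."""
    page_content: str


def normalize_inputs(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Rename v1 keys; keys already in v2 form pass through unchanged.
    row = {V1_TO_V2.get(key, key): value for key, value in inputs.items()}
    # Flatten Document-like objects to their text; leave strings as strings.
    if "retrieved_contexts" in row:
        row["retrieved_contexts"] = [
            ctx.page_content if isinstance(ctx, Doc) else str(ctx)
            for ctx in row["retrieved_contexts"]
        ]
    return row
```

In the real class, the normalized dictionary would then be used to build a `SingleTurnSample` before validation and scoring.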
evaluate_run
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | run | Run | LangSmith run containing chain outputs (answer, contexts) |
| Input | example | Optional[Example] | LangSmith example containing inputs (question) and outputs (ground_truth) |
| Output | (return) | EvaluationResult | LangSmith evaluation result with metric name and score |
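The flow of `evaluate_run` described in this table can be sketched as: collect the four expected keys from the run and the example, fail early if one is missing, score the assembled row, and wrap the score in a result object. The code below is a simplified sketch: `RunStub`, `ExampleStub`, `ResultStub`, and `evaluate_run_sketch` are hypothetical stand-ins for the LangSmith types and the real method, and it always requires all four keys, which may be stricter than the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class RunStub:
    """Stand-in for a LangSmith Run: carries the chain outputs."""
    outputs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ExampleStub:
    """Stand-in for a LangSmith Example: dataset inputs and reference outputs."""
    inputs: Dict[str, Any] = field(default_factory=dict)
    outputs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ResultStub:
    """Stand-in for EvaluationResult: metric name plus score."""
    key: str
    score: float


def evaluate_run_sketch(
    metric_name: str,
    score_fn: Callable[[Dict[str, Any]], Dict[str, float]],
    run: RunStub,
    example: Optional[ExampleStub] = None,
) -> ResultStub:
    if example is None:
        raise ValueError("an example is required")
    # Where each expected key is looked up, following the table above.
    sources = {
        "question": example.inputs,
        "ground_truth": example.outputs,
        "answer": run.outputs,
        "contexts": run.outputs,
    }
    missing = [key for key, source in sources.items() if key not in source]
    if missing:
        raise ValueError(f"missing expected keys: {missing}")
    row = {key: source[key] for key, source in sources.items()}
    # score_fn plays the role of invoking the chain on the assembled row.
    scores = score_fn(row)
    return ResultStub(key=metric_name, score=scores[metric_name])
```

Here `score_fn` is any callable that returns a dictionary keyed by metric name, matching the output shape of `_call`.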
Constructor Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| metric | Metric | Yes | A Ragas SingleTurnMetric instance |
| llm | ChatOpenAI | No | LLM for metrics requiring one (defaults to ChatOpenAI()) |
| embeddings | OpenAIEmbeddings | No | Embeddings for metrics requiring them (defaults to OpenAIEmbeddings()) |
| run_config | RunConfig | No | Execution configuration (defaults to RunConfig()) |
Usage Examples
Using as a LangChain Chain
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import faithfulness
# Create an evaluator chain
evaluator = EvaluatorChain(metric=faithfulness)
# Run evaluation with a dictionary input
result = evaluator.invoke({
    "question": "What is retrieval augmented generation?",
    "answer": "RAG combines retrieval with generation.",
    "contexts": ["RAG is a technique that combines information retrieval with text generation."],
})
print(result["faithfulness"]) # e.g., 0.95
Using as a LangSmith RunEvaluator
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import context_precision
# Create evaluator for use with LangSmith
evaluator = EvaluatorChain(metric=context_precision)
# Used automatically by LangSmith during evaluation runs
# The evaluate_run method is called by LangSmith's evaluation framework
Async Evaluation
import asyncio
from ragas.integrations.langchain import EvaluatorChain
from ragas.metrics import faithfulness
evaluator = EvaluatorChain(metric=faithfulness)
result = asyncio.run(evaluator.ainvoke({
    "question": "What is Ragas?",
    "answer": "Ragas is an evaluation toolkit.",
    "contexts": ["Ragas provides metrics for evaluating LLM applications."],
}))
Related Pages
- LangSmith Integration - Uses EvaluatorChain internally for running evaluations on LangSmith datasets
- LlamaIndex Integration - Alternative integration for LlamaIndex query engines
- Messages Module - Core message types used in evaluation samples