Implementation:Explodinggradients Ragas LlamaIndex Integration
| Metadata | Value |
|---|---|
| Source | src/ragas/integrations/llama_index.py (Lines 31-204) |
| Domains | Integration, LlamaIndex |
| Last Updated | 2026-02-10 |
Overview
Provides functions to evaluate LlamaIndex query engines using Ragas metrics and to convert LlamaIndex agent workflow events into Ragas message objects for multi-turn evaluation.
Description
This module contains two functions:
- `evaluate` runs a full Ragas evaluation pipeline against a LlamaIndex query engine. It:
  - Wraps the LlamaIndex LLM and embedding instances in Ragas-compatible wrappers (`LlamaIndexLLMWrapper`, `LlamaIndexEmbeddingsWrapper`).
  - Validates that the provided dataset is an `EvaluationDataset` and that it is not multi-turn (multi-turn evaluation is not yet supported).
  - Asynchronously queries the engine for each sample using an `Executor`, collecting each response and the texts of its source nodes as the retrieved contexts.
  - Handles failed queries gracefully by logging a warning and inserting `None` values.
  - Delegates final scoring to `ragas.evaluation.evaluate` with the enriched dataset.
- `convert_to_ragas_messages` converts a sequence of LlamaIndex agent workflow `Event` objects (`AgentInput`, `AgentOutput`, `ToolCallResult`) into Ragas `Message` objects:
  - `AgentInput` events with role `USER` become `HumanMessage` (unless the preceding message is a `ToolMessage`).
  - `AgentOutput` events become `AIMessage` with optional `ToolCall` objects (de-duplicated by tool ID).
  - `ToolCallResult` events become either `ToolMessage` or, when `return_direct` is `True`, `AIMessage`.
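The query-and-collect step of `evaluate` (query every sample, keep the response and source-node texts, record `None` on failure) can be illustrated with a small standard-library sketch. It is not the module's code: `FakeQueryEngine`, the `Fake*` response classes, and `collect_responses` are hypothetical stand-ins, and batching with `asyncio.gather` replaces the Ragas `Executor` that the real module uses. It assumes each response exposes `.response` and `.source_nodes` (each with `.node.text`), in the shape of a LlamaIndex `Response`.

```python
import asyncio
import logging
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

logger = logging.getLogger(__name__)


# Hypothetical stand-ins shaped like a LlamaIndex Response (illustration only)
@dataclass
class FakeNode:
    text: str


@dataclass
class FakeNodeWithScore:
    node: FakeNode


@dataclass
class FakeResponse:
    response: str
    source_nodes: List[FakeNodeWithScore] = field(default_factory=list)


class FakeQueryEngine:
    """Hypothetical engine exposing an async `aquery`; empty queries fail."""

    async def aquery(self, query: str) -> FakeResponse:
        if not query:
            raise ValueError("empty query")
        node = FakeNodeWithScore(FakeNode(f"context for {query}"))
        return FakeResponse(response=f"answer to {query}", source_nodes=[node])


async def collect_responses(
    engine, queries: List[str], batch_size: Optional[int] = None
) -> Tuple[List[Optional[str]], List[Optional[List[str]]]]:
    """Query `engine` for every input and return (responses, retrieved_contexts)."""
    size = batch_size or len(queries) or 1
    results = []
    for start in range(0, len(queries), size):
        batch = queries[start : start + size]
        # return_exceptions=True stores a failure in its slot instead of raising
        results += await asyncio.gather(
            *(engine.aquery(q) for q in batch), return_exceptions=True
        )

    responses: List[Optional[str]] = []
    contexts: List[Optional[List[str]]] = []
    for i, r in enumerate(results):
        if isinstance(r, Exception):
            # Log and keep going: the failed sample gets None for both fields
            logger.warning("Query engine failed for query %d: %r", i, queries[i])
            responses.append(None)
            contexts.append(None)
            continue
        responses.append(r.response)
        contexts.append([n.node.text for n in r.source_nodes])
    return responses, contexts


responses, contexts = asyncio.run(
    collect_responses(FakeQueryEngine(), ["What is RAG?", ""], batch_size=1)
)
print(responses)  # ['answer to What is RAG?', None]
print(contexts)   # [['context for What is RAG?'], None]
```

Using `return_exceptions=True` means one failing query yields an exception object in its own slot rather than aborting the batch, which is what lets the sketch record `None` for that sample while keeping the others.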
Usage
Use this integration when:
- You have a LlamaIndex query engine and want to evaluate its RAG performance with Ragas metrics.
- You are building agentic workflows with LlamaIndex and need to convert agent events to Ragas message format for multi-turn evaluation.
Code Reference
Source Location
| Item | Detail |
|---|---|
| File | src/ragas/integrations/llama_index.py |
| Lines | 31-204 |
| Module | ragas.integrations.llama_index |
Signatures
```python
def evaluate(
    query_engine,
    dataset: EvaluationDataset,
    metrics: list[Metric],
    llm: Optional[LlamaindexLLM] = None,
    embeddings: Optional[LlamaIndexEmbeddings] = None,
    callbacks: Optional[Callbacks] = None,
    in_ci: bool = False,
    run_config: Optional[RunConfig] = None,
    batch_size: Optional[int] = None,
    token_usage_parser: Optional[TokenUsageParser] = None,
    raise_exceptions: bool = False,
    column_map: Optional[Dict[str, str]] = None,
    show_progress: bool = True,
) -> EvaluationResult: ...

def convert_to_ragas_messages(events: List[Event]) -> List[Message]: ...
```
Import
```python
from ragas.integrations.llama_index import evaluate
from ragas.integrations.llama_index import convert_to_ragas_messages
```
I/O Contract
evaluate
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | query_engine | LlamaIndex query engine | The query engine to evaluate (must support `aquery`) |
| Input | dataset | EvaluationDataset | Ragas evaluation dataset whose samples provide `user_input` |
| Input | metrics | list[Metric] | Ragas metrics to compute |
| Input | llm | Optional[BaseLLM] | LlamaIndex LLM used for metric computation (optional) |
| Input | embeddings | Optional[BaseEmbedding] | LlamaIndex embeddings for metrics that require them (optional) |
| Input | callbacks | Optional[Callbacks] | LangChain-compatible callbacks (optional) |
| Input | run_config | Optional[RunConfig] | Execution configuration (optional) |
| Input | batch_size | Optional[int] | Batch size for query execution (optional) |
| Input | token_usage_parser | Optional[TokenUsageParser] | Parser for tracking token usage (optional) |
| Input | raise_exceptions | bool | Whether to raise on query failures (default: `False`) |
| Input | show_progress | bool | Whether to show a progress bar (default: `True`) |
| Output | (return) | EvaluationResult | Ragas evaluation result with per-metric scores |
convert_to_ragas_messages
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | events | List[Event] | LlamaIndex agent workflow events (`AgentInput`, `AgentOutput`, `ToolCallResult`) |
| Output | (return) | List[Message] | Ragas message objects (`HumanMessage`, `AIMessage`, `ToolMessage`) |
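To make the conversion rules concrete, here is a minimal sketch of the same mapping over plain dataclasses. Every class in it (`ChatMsg`, `ToolSelection`, the event classes, the message classes) is a simplified, hypothetical stand-in loosely modeled on the LlamaIndex and Ragas types, not an import from either library, and `convert` is an illustrative name. It also assumes that de-duplication by tool ID happens within a single `AgentOutput`; the real function may track IDs differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


# Simplified, hypothetical stand-ins loosely modeled on LlamaIndex workflow events
@dataclass
class ChatMsg:
    role: str  # e.g. "user" or "assistant"
    content: str

@dataclass
class ToolSelection:
    tool_id: str
    tool_name: str
    tool_kwargs: dict

@dataclass
class AgentInput:
    input: List[ChatMsg]

@dataclass
class AgentOutput:
    content: str
    tool_calls: List[ToolSelection] = field(default_factory=list)

@dataclass
class ToolCallResult:
    tool_output: str
    return_direct: bool = False

# Hypothetical stand-ins for the Ragas message types
@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class HumanMessage:
    content: str

@dataclass
class AIMessage:
    content: str
    tool_calls: Optional[List[ToolCall]] = None

@dataclass
class ToolMessage:
    content: str

Message = Union[HumanMessage, AIMessage, ToolMessage]


def convert(events) -> List[Message]:
    messages: List[Message] = []
    for ev in events:
        if isinstance(ev, AgentInput):
            last = ev.input[-1] if ev.input else None
            if last is not None and last.role == "user":
                # Rule: skip a user input that directly follows a ToolMessage
                if messages and isinstance(messages[-1], ToolMessage):
                    continue
                messages.append(HumanMessage(last.content))
        elif isinstance(ev, AgentOutput):
            seen, calls = set(), []
            for tc in ev.tool_calls:
                if tc.tool_id not in seen:  # de-duplicate by tool ID
                    seen.add(tc.tool_id)
                    calls.append(ToolCall(tc.tool_name, tc.tool_kwargs))
            messages.append(AIMessage(ev.content, calls or None))
        elif isinstance(ev, ToolCallResult):
            # return_direct means the tool output is the agent's final answer
            cls = AIMessage if ev.return_direct else ToolMessage
            messages.append(cls(ev.tool_output))
    return messages


add = ToolSelection("call-1", "add", {"a": 2, "b": 3})
msgs = convert([
    AgentInput([ChatMsg("user", "What is 2 + 3?")]),
    AgentOutput("", [add, add]),  # duplicate tool call
    ToolCallResult("5"),
    AgentInput([ChatMsg("user", "What is 2 + 3?")]),  # skipped: follows a ToolMessage
    AgentOutput("2 + 3 = 5"),
])
print([type(m).__name__ for m in msgs])
# ['HumanMessage', 'AIMessage', 'ToolMessage', 'AIMessage']
```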
Exceptions
| Exception | Condition |
|---|---|
| ValueError | Dataset is `None` or not an `EvaluationDataset` |
| NotImplementedError | Dataset contains multi-turn samples |
| ImportError | The `llama_index` package is not installed (for `convert_to_ragas_messages`) |
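The conditions in this table amount to a short guard plus an optional-dependency import. The sketch below shows both under stated assumptions: `SingleTurnSample`, `MultiTurnSample`, and `EvaluationDataset` are hypothetical stand-ins (this `is_multi_turn` simply checks every sample), `validate_dataset` and `require` are illustrative helper names, and the error messages are not the library's. The `require` helper shows a common lazy-import pattern, not necessarily how the module raises its `ImportError`.

```python
import importlib
from dataclasses import dataclass
from typing import List, Union


# Hypothetical stand-ins for the Ragas sample and dataset classes
@dataclass
class SingleTurnSample:
    user_input: str


@dataclass
class MultiTurnSample:
    user_input: list


@dataclass
class EvaluationDataset:
    samples: List[Union[SingleTurnSample, MultiTurnSample]]

    def is_multi_turn(self) -> bool:
        return any(isinstance(s, MultiTurnSample) for s in self.samples)


def validate_dataset(dataset) -> EvaluationDataset:
    """Mirror the ValueError / NotImplementedError rows of the table."""
    if dataset is None or not isinstance(dataset, EvaluationDataset):
        raise ValueError("dataset must be an EvaluationDataset")
    if dataset.is_multi_turn():
        raise NotImplementedError("multi-turn evaluation is not supported yet")
    return dataset


def require(module_name: str):
    """Import an optional dependency or raise an actionable ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(f"'{module_name}' is not installed; install it first") from exc


for bad in (None, {"user_input": "x"}, EvaluationDataset([MultiTurnSample([])])):
    try:
        validate_dataset(bad)
    except (ValueError, NotImplementedError) as e:
        print(f"{type(e).__name__}: {e}")
```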
Usage Examples
Evaluating a LlamaIndex Query Engine
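```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

from ragas.dataset_schema import EvaluationDataset, SingleTurnSample
from ragas.integrations.llama_index import evaluate
from ragas.metrics import context_precision, faithfulness

# Build your LlamaIndex query engine ("data" is a placeholder folder of your documents)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Create the evaluation dataset; evaluate() fills in responses and retrieved contexts
dataset = EvaluationDataset(samples=[
    SingleTurnSample(
        user_input="What is retrieval augmented generation?",
        reference="RAG combines retrieval and generation.",
    ),
])

# Run evaluation; the LlamaIndex LLM and embeddings are wrapped for Ragas internally
results = evaluate(
    query_engine=query_engine,
    dataset=dataset,
    metrics=[faithfulness, context_precision],
    llm=OpenAI(model="gpt-4o-mini"),
    embeddings=OpenAIEmbedding(),
)
print(results)
```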
Converting Agent Events to Ragas Messages
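```python
from llama_index.core.agent.workflow import AgentInput, AgentOutput, ToolCallResult
from ragas.integrations.llama_index import convert_to_ragas_messages

async def run_and_convert(agent, user_msg: str):
    # Stream events from an existing LlamaIndex agent workflow, keeping the convertible types
    handler = agent.run(user_msg=user_msg)
    events = [ev async for ev in handler.stream_events()
              if isinstance(ev, (AgentInput, AgentOutput, ToolCallResult))]
    await handler  # wait for the run to finish
    return convert_to_ragas_messages(events)

# In an async context: for msg in await run_and_convert(agent, "What is 2 + 3?"):
#     print(msg.pretty_repr())
```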
Related Pages
- Messages Module - Defines `HumanMessage`, `AIMessage`, `ToolMessage`, and `ToolCall` used in message conversion
- Amazon Bedrock Integration - Similar trace-to-message conversion for Bedrock agents
- EvaluatorChain Class - LangChain-based alternative for running Ragas metrics
- HaystackLLMWrapper Class - Another LLM wrapper pattern for a different framework