Implementation:Explodinggradients Ragas LlamaIndex Integration

From Leeroopedia


Metadata | Value
Source | src/ragas/integrations/llama_index.py (lines 31-204)
Domains | Integration, LlamaIndex
Last Updated | 2026-02-10

Overview

Provides functions to evaluate LlamaIndex query engines using Ragas metrics and to convert LlamaIndex agent workflow events into Ragas message objects for multi-turn evaluation.

Description

This module contains two functions:

  • evaluate runs a full Ragas evaluation pipeline against a LlamaIndex query engine. It:
    • Wraps LlamaIndex LLM and embedding instances into Ragas-compatible wrappers (LlamaIndexLLMWrapper, LlamaIndexEmbeddingsWrapper).
    • Validates that the dataset is an EvaluationDataset and rejects multi-turn datasets, which are not yet supported.
    • Queries the engine asynchronously (via aquery) with each sample's user_input using an Executor, collecting each response and the texts of its source nodes as retrieved contexts.
    • Handles failed queries by logging a warning and recording None as that sample's response and retrieved contexts.
    • Delegates final scoring to ragas.evaluation.evaluate with the enriched dataset.
  • convert_to_ragas_messages converts a sequence of LlamaIndex agent workflow Event objects (AgentInput, AgentOutput, ToolCallResult) into Ragas Message objects:
    • AgentInput events with role USER become HumanMessage, unless the previously converted message is a ToolMessage, in which case the event is skipped.
    • AgentOutput events become AIMessage with optional ToolCall objects (de-duplicated by tool ID).
    • ToolCallResult events become either ToolMessage or AIMessage (when return_direct is True).
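The query-and-collect step of evaluate can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the Ragas implementation: asyncio.gather(..., return_exceptions=True) is swapped in for the Ragas Executor, and FakeQueryEngine, FakeResponse, FakeNodeWithScore and FakeNode are hypothetical stand-ins shaped like a LlamaIndex query engine and its response (a response string plus source_nodes whose .node.text holds each retrieved chunk).

```python
import asyncio
import logging
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

logger = logging.getLogger(__name__)


# Hypothetical stand-ins shaped like a LlamaIndex Response and its source nodes
@dataclass
class FakeNode:
    text: str


@dataclass
class FakeNodeWithScore:
    node: FakeNode


@dataclass
class FakeResponse:
    response: str
    source_nodes: List[FakeNodeWithScore] = field(default_factory=list)


class FakeQueryEngine:
    """Hypothetical engine: answers every query except those containing 'fail'."""

    async def aquery(self, query: str) -> FakeResponse:
        if "fail" in query:
            raise RuntimeError("backend unavailable")
        nodes = [FakeNodeWithScore(FakeNode(f"doc about {query}"))]
        return FakeResponse(response=f"answer to {query}", source_nodes=nodes)


async def collect(
    engine: FakeQueryEngine, queries: List[str]
) -> Tuple[List[Optional[str]], List[Optional[List[str]]]]:
    # asyncio.gather stands in for the Ragas Executor; return_exceptions=True
    # returns a failed query's exception as a result instead of raising it
    results = await asyncio.gather(
        *(engine.aquery(q) for q in queries), return_exceptions=True
    )
    responses: List[Optional[str]] = []
    contexts: List[Optional[List[str]]] = []
    for i, result in enumerate(results):
        if isinstance(result, Exception):
            logger.warning("Query engine failed for query %d: %r", i, queries[i])
            responses.append(None)
            contexts.append(None)
            continue
        responses.append(result.response)
        # Source node texts become the sample's retrieved contexts
        contexts.append([n.node.text for n in result.source_nodes])
    return responses, contexts


responses, contexts = asyncio.run(
    collect(FakeQueryEngine(), ["what is RAG", "please fail"])
)
print(responses)  # ['answer to what is RAG', None]
print(contexts)   # [['doc about what is RAG'], None]
```

Treating failures as values is what lets a single bad query degrade to a None row instead of aborting the whole evaluation run.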

Usage

Use this integration when:

  • You have a LlamaIndex query engine and want to evaluate its RAG performance with Ragas metrics.
  • You are building agentic workflows with LlamaIndex and need to convert agent events to Ragas message format for multi-turn evaluation.

Code Reference

Source Location

Item | Detail
File | src/ragas/integrations/llama_index.py
Lines | 31-204
Module | ragas.integrations.llama_index

Signatures

def evaluate(
    query_engine,
    dataset: EvaluationDataset,
    metrics: list[Metric],
    llm: Optional[LlamaindexLLM] = None,
    embeddings: Optional[LlamaIndexEmbeddings] = None,
    callbacks: Optional[Callbacks] = None,
    in_ci: bool = False,
    run_config: Optional[RunConfig] = None,
    batch_size: Optional[int] = None,
    token_usage_parser: Optional[TokenUsageParser] = None,
    raise_exceptions: bool = False,
    column_map: Optional[Dict[str, str]] = None,
    show_progress: bool = True,
) -> EvaluationResult

def convert_to_ragas_messages(events: List[Event]) -> List[Message]

Import

from ragas.integrations.llama_index import evaluate
from ragas.integrations.llama_index import convert_to_ragas_messages

I/O Contract

evaluate

Direction | Name | Type | Description
Input | query_engine | LlamaIndex query engine | The query engine to evaluate (must support aquery)
Input | dataset | EvaluationDataset | Ragas evaluation dataset of single-turn samples with user_input fields
Input | metrics | list[Metric] | List of Ragas metrics to compute
Input | llm | Optional[BaseLLM] | LlamaIndex LLM for metric computation, wrapped in LlamaIndexLLMWrapper (optional)
Input | embeddings | Optional[BaseEmbedding] | LlamaIndex embeddings for metrics that require them, wrapped in LlamaIndexEmbeddingsWrapper (optional)
Input | callbacks | Optional[Callbacks] | LangChain-compatible callbacks (optional)
Input | in_ci | bool | CI mode flag, forwarded to ragas.evaluation.evaluate (default: False)
Input | run_config | Optional[RunConfig] | Execution configuration (optional)
Input | batch_size | Optional[int] | Batch size for query execution (optional)
Input | token_usage_parser | Optional[TokenUsageParser] | Parser for tracking token usage (optional)
Input | raise_exceptions | bool | Whether to raise on query failures instead of recording None (default: False)
Input | column_map | Optional[Dict[str, str]] | Column-name mapping, forwarded to ragas.evaluation.evaluate (optional)
Input | show_progress | bool | Show progress bar (default: True)
Output | (return) | EvaluationResult | Ragas evaluation result with per-metric scores

convert_to_ragas_messages

Direction | Name | Type | Description
Input | events | List[Event] | LlamaIndex agent workflow events (AgentInput, AgentOutput, ToolCallResult)
Output | (return) | List[Message] | Ragas message objects (HumanMessage, AIMessage, ToolMessage)
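The conversion rules listed under Description can be illustrated with a small, self-contained sketch. Every class here is a hypothetical, simplified stand-in: the real LlamaIndex events carry richer payloads (for example, AgentInput holds a list of chat messages rather than a bare role and text), and the real Ragas message classes live in ragas.messages. The convert function is illustrative only; in practice, call convert_to_ragas_messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


# --- Hypothetical, simplified stand-ins for the LlamaIndex workflow events ---
@dataclass
class AgentInput:
    role: str  # the real event carries chat messages, not a bare role/text pair
    text: str

@dataclass
class LlamaToolCall:
    tool_id: str
    tool_name: str
    tool_kwargs: Dict[str, object]

@dataclass
class AgentOutput:
    text: str
    tool_calls: List[LlamaToolCall] = field(default_factory=list)

@dataclass
class ToolCallResult:
    output: str
    return_direct: bool = False

# --- Hypothetical, simplified stand-ins for the Ragas message classes ---
@dataclass
class HumanMessage:
    content: str

@dataclass
class ToolCall:
    name: str
    args: Dict[str, object]

@dataclass
class AIMessage:
    content: str
    tool_calls: Optional[List[ToolCall]] = None

@dataclass
class ToolMessage:
    content: str

Message = Union[HumanMessage, AIMessage, ToolMessage]


def convert(events: List[object]) -> List[Message]:
    messages: List[Message] = []
    seen_tool_ids = set()
    for event in events:
        if isinstance(event, AgentInput):
            follows_tool = bool(messages) and isinstance(messages[-1], ToolMessage)
            # User input becomes a HumanMessage unless it directly follows a tool result
            if event.role == "user" and not follows_tool:
                messages.append(HumanMessage(event.text))
        elif isinstance(event, AgentOutput):
            calls = []
            for tc in event.tool_calls:
                if tc.tool_id not in seen_tool_ids:  # de-duplicate by tool ID
                    seen_tool_ids.add(tc.tool_id)
                    calls.append(ToolCall(tc.tool_name, tc.tool_kwargs))
            messages.append(AIMessage(event.text, calls or None))
        elif isinstance(event, ToolCallResult):
            # With return_direct, the tool output is returned as the agent's answer
            cls = AIMessage if event.return_direct else ToolMessage
            messages.append(cls(event.output))
    return messages


events = [
    AgentInput("user", "Weather in Paris?"),
    AgentOutput("", [LlamaToolCall("t1", "get_weather", {"city": "Paris"})]),
    ToolCallResult("18C and sunny"),
    AgentInput("user", "Weather in Paris?"),  # skipped: follows a ToolMessage
    AgentOutput("It is 18C and sunny in Paris."),
]
msgs = convert(events)
print([type(m).__name__ for m in msgs])
# ['HumanMessage', 'AIMessage', 'ToolMessage', 'AIMessage']
```

Because seen tool IDs are tracked across the whole event stream, a tool call reported in more than one AgentOutput event is recorded only once.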

Exceptions

Exception | Condition
ValueError | Dataset is None or not an EvaluationDataset
NotImplementedError | Dataset contains multi-turn samples
ImportError | The llama_index package is not installed (for convert_to_ragas_messages)
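The two dataset checks in the table can be mirrored in a few lines. This sketch assumes a stand-in EvaluationDataset exposing an is_multi_turn() method (Ragas' own EvaluationDataset provides a method of that name); check_dataset is a hypothetical helper, not part of the integration's public API.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-in for ragas.dataset_schema.EvaluationDataset
@dataclass
class EvaluationDataset:
    samples: List[dict] = field(default_factory=list)
    multi_turn: bool = False

    def is_multi_turn(self) -> bool:
        return self.multi_turn


def check_dataset(dataset) -> None:
    """Hypothetical helper mirroring the input checks in the exceptions table."""
    if dataset is None or not isinstance(dataset, EvaluationDataset):
        raise ValueError("Provide an EvaluationDataset")
    if dataset.is_multi_turn():
        raise NotImplementedError("Multi-turn evaluation is not supported yet")


outcomes = []
candidates = (None, [{"user_input": "hi"}], EvaluationDataset(multi_turn=True), EvaluationDataset())
for candidate in candidates:
    try:
        check_dataset(candidate)
        outcomes.append("ok")
    except (ValueError, NotImplementedError) as exc:
        outcomes.append(type(exc).__name__)
print(outcomes)  # ['ValueError', 'ValueError', 'NotImplementedError', 'ok']
```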

Usage Examples

Evaluating a LlamaIndex Query Engine

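from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI  # needs the llama-index-llms-openai package

from ragas.dataset_schema import EvaluationDataset, SingleTurnSample
from ragas.integrations.llama_index import evaluate
from ragas.metrics import context_precision, faithfulness

# Build a LlamaIndex query engine over local documents
# ("./data" is a placeholder; LlamaIndex's default models need OPENAI_API_KEY)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Create the evaluation dataset; `reference` is needed by context_precision,
# while `response` and `retrieved_contexts` are filled in from the query engine
dataset = EvaluationDataset(samples=[
    SingleTurnSample(
        user_input="What is retrieval augmented generation?",
        reference="RAG combines retrieval and generation.",
    ),
])

# Run evaluation; the LlamaIndex LLM used for scoring is wrapped for Ragas
# internally (the model name is only an example)
results = evaluate(
    query_engine=query_engine,
    dataset=dataset,
    metrics=[faithfulness, context_precision],
    llm=OpenAI(model="gpt-4o-mini"),
)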

print(results)

Converting Agent Events to Ragas Messages

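import asyncio
from llama_index.core.agent.workflow import AgentInput, AgentOutput, ToolCallResult
from ragas.integrations.llama_index import convert_to_ragas_messages

async def collect_events(agent, user_msg):
    handler = agent.run(user_msg=user_msg)
    events = []
    async for event in handler.stream_events():
        # Keep only the event types that convert_to_ragas_messages handles
        if isinstance(event, (AgentInput, AgentOutput, ToolCallResult)):
            events.append(event)
    await handler  # wait for the workflow run to finish
    return events

# `agent` is your LlamaIndex agent (e.g. a FunctionAgent); it is not defined here
events = asyncio.run(collect_events(agent, "What is the weather in Paris?"))
ragas_messages = convert_to_ragas_messages(events)
for msg in ragas_messages:
    print(msg.pretty_repr())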
