Implementation:Run llama Llama index BatchEvalRunner Init

Overview

BatchEvalRunner_Init documents how to initialize the BatchEvalRunner class, which orchestrates parallel execution of multiple evaluators across sets of queries. The constructor accepts a dictionary of named evaluators, a worker count that caps concurrency, and an optional flag that turns on a progress bar.

Principle:Run_llama_Llama_index_Batch_Evaluation_Setup

RAG Evaluation Batch Processing LlamaIndex API

Source File

llama-index-core/llama_index/core/evaluation/batch_runner.py, Lines 75–96

Import Statement

from llama_index.core.evaluation import BatchEvalRunner

Constructor: __init__

Parameter	Type	Default	Description
evaluators	`Dict[str, BaseEvaluator]`	required	Dictionary mapping evaluator names to initialized evaluator instances
workers	`int`	`2`	Number of concurrent workers for parallel evaluation execution
show_progress	`bool`	`False`	Whether to display a progress bar during batch evaluation
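
The parameter constraints in the table above can be checked before a runner is constructed. The following is a minimal sketch only: `check_runner_config` is a hypothetical helper, not part of LlamaIndex, and nothing here assumes that the BatchEvalRunner constructor performs such validation itself.

```python
from typing import Any, Dict


def check_runner_config(evaluators: Dict[str, Any], workers: int = 2) -> None:
    """Hypothetical pre-flight check for BatchEvalRunner arguments.

    Raises ValueError if the evaluators mapping is empty, if a key is not a
    non-empty string, or if workers is not a positive integer.
    """
    if not evaluators:
        raise ValueError("evaluators must contain at least one entry")
    for name in evaluators:
        if not isinstance(name, str) or not name:
            raise ValueError(
                f"evaluator name must be a non-empty string, got {name!r}"
            )
    if not isinstance(workers, int) or workers < 1:
        raise ValueError(f"workers must be a positive integer, got {workers!r}")


if __name__ == "__main__":
    # object() stands in for an initialized evaluator instance.
    check_runner_config({"faithfulness": object()}, workers=2)
    print("configuration looks valid")
```

Running a check like this first surfaces configuration mistakes before any judge LLM calls are made.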

Basic Initialization Example

from llama_index.core.evaluation import (
    BatchEvalRunner,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
    CorrectnessEvaluator,
)
from llama_index.llms.openai import OpenAI

# Configure judge LLM
judge_llm = OpenAI(model="gpt-4", temperature=0.0)

# Initialize evaluators
faithfulness_eval = FaithfulnessEvaluator(llm=judge_llm)
relevancy_eval = RelevancyEvaluator(llm=judge_llm)
correctness_eval = CorrectnessEvaluator(llm=judge_llm, score_threshold=4.0)

# Create batch runner with named evaluators
runner = BatchEvalRunner(
    evaluators={
        "faithfulness": faithfulness_eval,
        "relevancy": relevancy_eval,
        "correctness": correctness_eval,
    },
    workers=2,
    show_progress=True,
)

Minimal Setup: Single Evaluator

from llama_index.core.evaluation import BatchEvalRunner, FaithfulnessEvaluator
from llama_index.llms.openai import OpenAI

# Single-evaluator batch runner for focused evaluation
runner = BatchEvalRunner(
    evaluators={
        "faithfulness": FaithfulnessEvaluator(
            llm=OpenAI(model="gpt-4", temperature=0.0)
        ),
    },
    workers=4,  # Higher parallelism for single evaluator
    show_progress=True,
)

Worker Count Tuning

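from llama_index.core.evaluation import BatchEvalRunner

# `evaluators` is assumed to be a Dict[str, BaseEvaluator] built as in the
# Basic Initialization Example above.

# Conservative setup for rate-limited APIs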
conservative_runner = BatchEvalRunner(
    evaluators=evaluators,
    workers=1,  # Serial execution, safest for strict rate limits
    show_progress=True,
)

# Balanced setup for standard API tiers
balanced_runner = BatchEvalRunner(
    evaluators=evaluators,
    workers=2,  # Default, good balance of speed and safety
    show_progress=True,
)

# Aggressive setup for high-throughput API keys
fast_runner = BatchEvalRunner(
    evaluators=evaluators,
    workers=8,  # High parallelism, requires generous rate limits
    show_progress=True,
)
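
To make the effect of the worker count concrete, here is a generic asyncio sketch of semaphore-bounded concurrency. It illustrates the general pattern only and is not taken from the runner's source; `run_bounded` and the sleep-based dummy job are assumptions made for the illustration.

```python
import asyncio


async def run_bounded(num_jobs: int, workers: int) -> int:
    """Run num_jobs dummy jobs, at most `workers` at a time.

    Returns the peak number of jobs that were in flight simultaneously.
    """
    semaphore = asyncio.Semaphore(workers)
    active = 0
    peak = 0

    async def job() -> None:
        nonlocal active, peak
        async with semaphore:  # blocks while `workers` jobs are already running
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for a judge LLM call
            active -= 1

    await asyncio.gather(*(job() for _ in range(num_jobs)))
    return peak


if __name__ == "__main__":
    for workers in (1, 2, 8):
        peak = asyncio.run(run_bounded(num_jobs=16, workers=workers))
        print(f"workers={workers} peak concurrency={peak}")
```

In this pattern the peak number of in-flight calls never exceeds the worker count, which is why a lower value is the safer choice under strict API rate limits.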

Evaluator Dictionary Structure

The evaluators dictionary is the core configuration surface. Its string keys serve two purposes:

Execution identification: the runner uses the keys to track which evaluator produced which results
Result access: the output dictionary uses the same keys, so results can be looked up as results["faithfulness"]

# The evaluator names become keys in the results dictionary
evaluators = {
    "faithfulness": faithfulness_eval,   # results["faithfulness"]
    "relevancy": relevancy_eval,         # results["relevancy"]
    "correctness": correctness_eval,     # results["correctness"]
}

runner = BatchEvalRunner(evaluators=evaluators, workers=2)

# After evaluation:
# results = await runner.aevaluate_queries(...)
# results["faithfulness"]  -> List[EvaluationResult]
# results["relevancy"]     -> List[EvaluationResult]
# results["correctness"]   -> List[EvaluationResult]
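
Because results are keyed by evaluator name, per-evaluator summaries reduce to a loop over the dictionary. The sketch below uses a stand-in `FakeResult` dataclass so that it runs without LlamaIndex or an API key; it assumes that real result objects expose a `passing` attribute that may be `None`, and `pass_rates` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakeResult:
    """Stand-in for an evaluation result; only the `passing` flag is modeled."""

    passing: Optional[bool]


def pass_rates(results: Dict[str, List[FakeResult]]) -> Dict[str, float]:
    """Compute the fraction of passing results for each evaluator name.

    Results whose `passing` flag is None are skipped; an evaluator with no
    judged results gets a rate of 0.0.
    """
    rates: Dict[str, float] = {}
    for name, items in results.items():
        judged = [r for r in items if r.passing is not None]
        rates[name] = sum(r.passing for r in judged) / len(judged) if judged else 0.0
    return rates


if __name__ == "__main__":
    sample = {
        "faithfulness": [FakeResult(True), FakeResult(False)],
        "relevancy": [FakeResult(True), FakeResult(None)],
    }
    for name, rate in pass_rates(sample).items():
        print(f"{name}: {rate:.2f}")
```

The same keys passed to the constructor come back unchanged, so reporting code does not need a separate mapping from evaluators to names.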

Full Pipeline Setup Example

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.evaluation import (
    BatchEvalRunner,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
    CorrectnessEvaluator,
)
from llama_index.llms.openai import OpenAI

# Step 1: Build the RAG pipeline
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Step 2: Configure evaluation
judge_llm = OpenAI(model="gpt-4", temperature=0.0)

runner = BatchEvalRunner(
    evaluators={
        "faithfulness": FaithfulnessEvaluator(llm=judge_llm),
        "relevancy": RelevancyEvaluator(llm=judge_llm),
        "correctness": CorrectnessEvaluator(
            llm=judge_llm, score_threshold=4.0
        ),
    },
    workers=2,
    show_progress=True,
)

# Step 3: Runner is now ready for evaluate_queries() or evaluate_responses()

Knowledge Sources

LlamaIndex Evaluation LlamaIndex BatchEvalRunner

Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
Heuristic:Run_llama_Llama_index_Worker_Count_Configuration
Heuristic:Run_llama_Llama_index_Batch_Eval_Retry_Strategy

2026-02-11 00:00 GMT
