Implementation:Run llama Llama index BatchEvalRunner Init
Overview
BatchEvalRunner_Init documents the constructor of the BatchEvalRunner class, which runs multiple evaluators concurrently across sets of queries. The constructor accepts a dictionary of named evaluators, a worker count that bounds concurrency, and an optional flag that enables a progress bar.
Principle:Run_llama_Llama_index_Batch_Evaluation_Setup
RAG Evaluation Batch Processing LlamaIndex API
Source File
llama-index-core/llama_index/core/evaluation/batch_runner.py, Lines 75–96
Import Statement
from llama_index.core.evaluation import BatchEvalRunner
Constructor: __init__
| Parameter | Type | Default | Description |
|---|---|---|---|
| evaluators | Dict[str, BaseEvaluator] | required | Dictionary mapping evaluator names to initialized evaluator instances |
| workers | int | 2 | Number of concurrent workers for parallel evaluation execution |
| show_progress | bool | False | Whether to display a progress bar during batch evaluation |
Basic Initialization Example
from llama_index.core.evaluation import (
    BatchEvalRunner,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
    CorrectnessEvaluator,
)
from llama_index.llms.openai import OpenAI
# Configure judge LLM
judge_llm = OpenAI(model="gpt-4", temperature=0.0)
# Initialize evaluators
faithfulness_eval = FaithfulnessEvaluator(llm=judge_llm)
relevancy_eval = RelevancyEvaluator(llm=judge_llm)
correctness_eval = CorrectnessEvaluator(llm=judge_llm, score_threshold=4.0)
# Create batch runner with named evaluators
runner = BatchEvalRunner(
    evaluators={
        "faithfulness": faithfulness_eval,
        "relevancy": relevancy_eval,
        "correctness": correctness_eval,
    },
    workers=2,
    show_progress=True,
)
Minimal Setup: Single Evaluator
from llama_index.core.evaluation import BatchEvalRunner, FaithfulnessEvaluator
from llama_index.llms.openai import OpenAI
# Single-evaluator batch runner for focused evaluation
runner = BatchEvalRunner(
    evaluators={
        "faithfulness": FaithfulnessEvaluator(
            llm=OpenAI(model="gpt-4", temperature=0.0)
        ),
    },
    workers=4,  # Higher parallelism for single evaluator
    show_progress=True,
)
Worker Count Tuning
from llama_index.core.evaluation import BatchEvalRunner
# `evaluators` is a Dict[str, BaseEvaluator], such as the dictionary
# built in the Basic Initialization Example
# Conservative setup for rate-limited APIs
conservative_runner = BatchEvalRunner(
    evaluators=evaluators,
    workers=1,  # Serial execution, safest for strict rate limits
    show_progress=True,
)
# Balanced setup for standard API tiers
balanced_runner = BatchEvalRunner(
    evaluators=evaluators,
    workers=2,  # Default, good balance of speed and safety
    show_progress=True,
)
# Aggressive setup for high-throughput API keys
fast_runner = BatchEvalRunner(
    evaluators=evaluators,
    workers=8,  # High parallelism, requires generous rate limits
    show_progress=True,
)
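To make the effect of the worker count concrete, the following is a minimal standard-library sketch of bounded concurrency. It assumes a semaphore-style limit of the kind a worker count typically implies; it is not the library's implementation, and `run_bounded`, `demo`, and `fake_eval` are hypothetical names introduced only for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def run_bounded(
    jobs: List[Callable[[], Awaitable[str]]], workers: int
) -> List[str]:
    """Run every job, allowing at most `workers` of them in flight at once."""
    semaphore = asyncio.Semaphore(workers)

    async def guarded(job: Callable[[], Awaitable[str]]) -> str:
        # Each job waits here until one of the `workers` slots is free.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo(workers: int, n_jobs: int = 6) -> Dict[str, int]:
    """Run fake evaluation jobs and record the peak number in flight."""
    in_flight = 0
    peak = 0

    async def fake_eval() -> str:
        # Hypothetical stand-in for one evaluator call against a judge LLM.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # simulated network latency
        in_flight -= 1
        return "ok"

    results = await run_bounded([fake_eval] * n_jobs, workers)
    return {"completed": len(results), "peak_in_flight": peak}


stats = asyncio.run(demo(workers=2))
print(stats)
```

In this sketch all six jobs complete, but no more than `workers` of them overlap, which is why a lower value is the safer choice under strict rate limits.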
Evaluator Dictionary Structure
The evaluators dictionary is the core configuration surface. The string keys serve dual purposes:
- Execution identification: the runner uses keys to track which evaluator produced which results
- Result access: output dictionaries use the same keys, enabling results["faithfulness"] access patterns
# The evaluator names become keys in the results dictionary
evaluators = {
    "faithfulness": faithfulness_eval,  # results["faithfulness"]
    "relevancy": relevancy_eval,  # results["relevancy"]
    "correctness": correctness_eval,  # results["correctness"]
}
runner = BatchEvalRunner(evaluators=evaluators, workers=2)
# After evaluation:
# results = await runner.aevaluate_queries(...)
# results["faithfulness"] -> List[EvaluationResult]
# results["relevancy"] -> List[EvaluationResult]
# results["correctness"] -> List[EvaluationResult]
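Because the results are keyed by evaluator name, per-evaluator summaries reduce to a loop over the dictionary. The sketch below computes a pass rate for each key. To stay runnable without an API key it uses `FakeResult`, a hypothetical stand-in that mirrors only the `passing` and `score` fields of `EvaluationResult`; `pass_rates` is likewise an illustrative helper, not part of the library.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakeResult:
    """Hypothetical stand-in exposing only `passing` and `score`."""

    passing: Optional[bool]
    score: Optional[float] = None


def pass_rates(results: Dict[str, List[FakeResult]]) -> Dict[str, float]:
    """Return the fraction of passing results for each evaluator name."""
    rates: Dict[str, float] = {}
    for name, items in results.items():
        # Skip results where the evaluator produced no pass/fail verdict.
        judged = [r for r in items if r.passing is not None]
        passed = sum(1 for r in judged if r.passing)
        rates[name] = passed / len(judged) if judged else 0.0
    return rates


# Same shape as the runner output: evaluator name -> list of results.
results = {
    "faithfulness": [
        FakeResult(True),
        FakeResult(True),
        FakeResult(False),
        FakeResult(True),
    ],
    "relevancy": [FakeResult(True), FakeResult(False), FakeResult(None)],
}
rates = pass_rates(results)
print(rates)
```

Results without a verdict are excluded from the denominator here; treating them as failures instead is an equally reasonable choice, depending on how strict the report should be.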
Full Pipeline Setup Example
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.evaluation import (
    BatchEvalRunner,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
    CorrectnessEvaluator,
)
from llama_index.llms.openai import OpenAI
# Step 1: Build the RAG pipeline
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
# Step 2: Configure evaluation
judge_llm = OpenAI(model="gpt-4", temperature=0.0)
runner = BatchEvalRunner(
    evaluators={
        "faithfulness": FaithfulnessEvaluator(llm=judge_llm),
        "relevancy": RelevancyEvaluator(llm=judge_llm),
        "correctness": CorrectnessEvaluator(
            llm=judge_llm, score_threshold=4.0
        ),
    },
    workers=2,
    show_progress=True,
)
# Step 3: Runner is now ready for evaluate_queries() or evaluate_responses()
Knowledge Sources
LlamaIndex Evaluation LlamaIndex BatchEvalRunner
Environment:Run_llama_Llama_index_Python_LlamaIndex_Core Heuristic:Run_llama_Llama_index_Worker_Count_Configuration Heuristic:Run_llama_Llama_index_Batch_Eval_Retry_Strategy