Implementation:Explodinggradients_Ragas_Experiment_Decorator
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| explodinggradients/ragas | LLM Evaluation, Experiment Management | 2026-02-10 |
Overview
The Experiment Decorator wraps async evaluation functions with dataset iteration, progress tracking, and result persistence to orchestrate full evaluation experiments in the Ragas toolkit.
Description
The @experiment decorator (lines 201-232) creates a factory that returns a decorator wrapping the user function in an ExperimentWrapper (lines 116-198). The wrapper preserves the original function's callable behavior while adding an arun() method that executes the function across all rows of a Dataset. The arun() method creates async tasks for every dataset entry, processes them concurrently using asyncio.as_completed(), tracks progress with tqdm, catches per-row exceptions, and saves the collected results into an Experiment instance (a subclass of DataTable).
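The decorator/wrapper pattern described above can be sketched as follows. This is a simplified stand-in, not the ragas implementation: the real ExperimentWrapper also handles tqdm progress bars, name generation, and backend persistence, all of which are stubbed or omitted here, and the Dataset is replaced by a plain list of dicts.

```python
# Simplified sketch of the experiment decorator pattern (NOT the ragas code).
import asyncio
import functools
from typing import Any, Callable, Dict, List


class ExperimentWrapper:
    """Wraps an async row-level function while keeping it directly callable."""

    def __init__(self, func: Callable, name_prefix: str = ""):
        self.func = func
        self.name_prefix = name_prefix
        functools.update_wrapper(self, func)

    async def __call__(self, *args, **kwargs):
        # Preserve the original function's behavior for single-row calls.
        return await self.func(*args, **kwargs)

    async def arun(self, dataset: List[Dict[str, Any]], **kwargs) -> List[Any]:
        # One task per row; exceptions are caught per row so a single
        # failure does not abort the whole experiment (error isolation).
        async def run_row(row):
            try:
                return await self.func(row, **kwargs)
            except Exception as exc:
                return {"error": str(exc), "row": row}

        tasks = [asyncio.create_task(run_row(row)) for row in dataset]
        results = []
        for done in asyncio.as_completed(tasks):  # process as each finishes
            results.append(await done)
        return results  # ragas would persist these into an Experiment


def experiment(name_prefix: str = ""):
    def decorator(func: Callable) -> ExperimentWrapper:
        return ExperimentWrapper(func, name_prefix=name_prefix)

    return decorator
```

Used like the real decorator, a wrapped function still runs on a single row, while arun fans it out across a dataset.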
Usage
Use the Experiment Decorator when:
- Running an async evaluation function across an entire evaluation dataset
- Needing automatic progress bars and error isolation during batch evaluation
- Persisting experiment results to the same or different backend as the source dataset
- Creating reproducibly named experiments for later comparison
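Because a backend can be supplied at three points (the @experiment decorator, the arun call, and the source dataset itself), some precedence order must apply. A minimal sketch of one plausible resolution order, which is an assumption rather than ragas's confirmed logic:

```python
from typing import Optional


def resolve_backend(
    run_backend: Optional[str],
    decorator_backend: Optional[str],
    dataset_backend: Optional[str],
) -> Optional[str]:
    # Assumed precedence: the arun(backend=...) override wins, then the
    # @experiment(backend=...) default, then the dataset's own backend.
    return run_backend or decorator_backend or dataset_backend
```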
Code Reference
Source Location: src/ragas/experiment.py, lines 201-232 (@experiment decorator), lines 116-198 (ExperimentWrapper.arun)
Signature:
    def experiment(
        experiment_model: Optional[Type[BaseModel]] = None,
        backend: Optional[Union[BaseBackend, str]] = None,
        name_prefix: str = "",
    ) -> Callable[[Callable], ExperimentProtocol]
The decorated function gains an arun method:
    async def arun(
        self,
        dataset: Dataset,
        name: Optional[str] = None,
        backend: Optional[Union[BaseBackend, str]] = None,
        *args,
        **kwargs,
    ) -> Experiment
Import:
    from ragas import experiment
I/O Contract
Inputs (decorator parameters):
| Parameter | Type | Required | Description |
|---|---|---|---|
| experiment_model | Optional[Type[BaseModel]] | No | Pydantic model type for experiment result rows |
| backend | Optional[Union[BaseBackend, str]] | No | Default backend for storing experiment results |
| name_prefix | str | No | Prefix prepended to auto-generated experiment names |
Inputs (arun parameters):
| Parameter | Type | Required | Description |
|---|---|---|---|
| dataset | Dataset | Yes | The evaluation dataset to iterate over |
| name | Optional[str] | No | Custom experiment name (auto-generated if omitted) |
| backend | Optional[Union[BaseBackend, str]] | No | Override backend for this specific run |
| *args, **kwargs | Any | No | Additional arguments forwarded to the decorated function |
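When name is omitted, an experiment name is auto-generated with name_prefix prepended. A hypothetical timestamp-based generator illustrating the idea (the actual ragas naming scheme may differ, e.g. by producing memorable names instead):

```python
from datetime import datetime


def generate_experiment_name(name_prefix: str = "") -> str:
    # Hypothetical scheme: prefix + run timestamp. This only illustrates
    # how name_prefix would be prepended to an auto-generated name.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    return f"{name_prefix}-{stamp}" if name_prefix else stamp
```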
Outputs:
| Output | Type | Description |
|---|---|---|
| Experiment result | Experiment | A DataTable subclass containing all evaluation results, automatically saved to the backend |
Usage Examples
Basic experiment with auto-generated name:
    import asyncio
    from pydantic import BaseModel
    from ragas import Dataset, experiment

    class EvalResult(BaseModel):
        user_input: str
        response: str
        score: float

    @experiment(experiment_model=EvalResult)
    async def evaluate_responses(row):
        # Your evaluation logic here
        score = 1.0 if "correct" in row["response"].lower() else 0.0
        return EvalResult(
            user_input=row["user_input"],
            response=row["response"],
            score=score,
        )

    # Load dataset and run experiment
    dataset = Dataset.load("my_eval_data", "local/csv", root_dir="./data")
    results = asyncio.run(evaluate_responses.arun(dataset))
    print(f"Experiment: {results.name}")
    print(f"Results: {len(results)} rows")
Experiment with custom name and backend:
    import asyncio
    from pydantic import BaseModel
    from ragas import Dataset, experiment

    class ScoreModel(BaseModel):
        # Structured output model for the LLM judge
        score: float

    @experiment(name_prefix="qa-eval")
    async def evaluate_qa(row, llm=None):
        # Evaluation logic using an LLM
        response = await llm.agenerate(
            f"Is this answer correct? Q: {row['question']} A: {row['answer']}",
            response_model=ScoreModel,
        )
        return {"question": row["question"], "score": response.score}

    dataset = Dataset.load("qa_data", "local/csv", root_dir="./data")
    results = asyncio.run(
        evaluate_qa.arun(
            dataset,
            name="qa-eval-v2",
            backend="local/jsonl",
            llm=my_llm,  # my_llm: any ragas-compatible LLM wrapper
        )
    )
Calling the decorated function on a single row (for testing):
    import asyncio

    # The decorated function is still directly callable
    single_result = asyncio.run(
        evaluate_responses({"user_input": "test", "response": "correct"})
    )
    print(single_result)
Related Pages
- Principle:Explodinggradients_Ragas_Experiment_Orchestration
- Environment:Explodinggradients_Ragas_Python_Runtime_Environment
- Heuristic:Explodinggradients_Ragas_Retry_And_Backoff_Configuration
- Heuristic:Explodinggradients_Ragas_Concurrency_And_Rate_Limiting
- Heuristic:Explodinggradients_Ragas_Failed_Metrics_Return_NaN