
Implementation:Explodinggradients Ragas Experiment Decorator



Knowledge Sources: explodinggradients/ragas
Domains: LLM Evaluation, Experiment Management
Last Updated: 2026-02-10

Overview

The Experiment Decorator wraps async evaluation functions with dataset iteration, progress tracking, and result persistence to orchestrate full evaluation experiments in the Ragas toolkit.

Description

The @experiment decorator (lines 201-232) is a decorator factory: called with configuration, it returns a decorator that wraps the user function in an ExperimentWrapper (lines 116-198). The wrapper preserves the original function's callable behavior while adding an arun() method that executes the function across all rows of a Dataset. arun() creates an async task for every dataset entry, processes the tasks concurrently via asyncio.as_completed(), tracks progress with tqdm, catches per-row exceptions so one failure does not abort the run, and saves the collected results into an Experiment instance (a subclass of DataTable).
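For orientation, that control flow can be sketched in a few lines of standalone Python. This is an illustration of the pattern only, not the library's actual code; the helper name run_rows is hypothetical:

import asyncio
from tqdm import tqdm

async def run_rows(func, dataset, **kwargs):
    # One task per dataset row, executed concurrently
    tasks = [asyncio.create_task(func(row, **kwargs)) for row in dataset]
    results = []
    # as_completed yields futures as they finish; tqdm renders progress
    for future in tqdm(asyncio.as_completed(tasks), total=len(tasks)):
        try:
            results.append(await future)
        except Exception as exc:
            # A failing row is recorded and skipped so the rest still complete
            print(f"row failed: {exc}")
    return results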

Usage

Use the Experiment Decorator when:

  • Running an async evaluation function across an entire evaluation dataset
  • Needing automatic progress bars and error isolation during batch evaluation
  • Persisting experiment results to the same or different backend as the source dataset
  • Creating reproducibly named experiments for later comparison

Code Reference

Source Location: src/ragas/experiment.py, lines 201-232 (@experiment decorator), lines 116-198 (ExperimentWrapper.arun)

Signature:

def experiment(
    experiment_model: Optional[Type[BaseModel]] = None,
    backend: Optional[Union[BaseBackend, str]] = None,
    name_prefix: str = "",
) -> Callable[[Callable], ExperimentProtocol]

The decorated function gains an arun method:

async def arun(
    self,
    dataset: Dataset,
    name: Optional[str] = None,
    backend: Optional[Union[BaseBackend, str]] = None,
    *args,
    **kwargs,
) -> Experiment

Import:

from ragas import experiment

I/O Contract

Inputs (decorator parameters):

Parameter | Type | Required | Description
experiment_model | Optional[Type[BaseModel]] | No | Pydantic model type for experiment result rows
backend | Optional[Union[BaseBackend, str]] | No | Default backend for storing experiment results
name_prefix | str | No | Prefix prepended to auto-generated experiment names

Inputs (arun parameters):

Parameter | Type | Required | Description
dataset | Dataset | Yes | The evaluation dataset to iterate over
name | Optional[str] | No | Custom experiment name (auto-generated if omitted)
backend | Optional[Union[BaseBackend, str]] | No | Override backend for this specific run
*args, **kwargs | Any | No | Additional arguments forwarded to the decorated function

Outputs:

Output | Type | Description
Experiment result | Experiment | A DataTable subclass containing all evaluation results, automatically saved to the backend

Usage Examples

Basic experiment with auto-generated name:

import asyncio
from pydantic import BaseModel
from ragas import Dataset, experiment

class EvalResult(BaseModel):
    user_input: str
    response: str
    score: float

@experiment(experiment_model=EvalResult)
async def evaluate_responses(row):
    # Your evaluation logic here
    score = 1.0 if "correct" in row["response"].lower() else 0.0
    return EvalResult(
        user_input=row["user_input"],
        response=row["response"],
        score=score,
    )

# Load dataset and run experiment
dataset = Dataset.load("my_eval_data", "local/csv", root_dir="./data")
results = asyncio.run(evaluate_responses.arun(dataset))

print(f"Experiment: {results.name}")
print(f"Results: {len(results)} rows")

Experiment with custom name and backend:

import asyncio
from pydantic import BaseModel
from ragas import Dataset, experiment

# Response schema for the LLM judge (defined here so the example runs)
class ScoreModel(BaseModel):
    score: float

@experiment(name_prefix="qa-eval")
async def evaluate_qa(row, llm=None):
    # Evaluation logic using an LLM
    response = await llm.agenerate(
        f"Is this answer correct? Q: {row['question']} A: {row['answer']}",
        response_model=ScoreModel,
    )
    return {"question": row["question"], "score": response.score}

dataset = Dataset.load("qa_data", "local/csv", root_dir="./data")
results = asyncio.run(
    evaluate_qa.arun(
        dataset,
        name="qa-eval-v2",
        backend="local/jsonl",
        llm=my_llm,
        root_dir="./experiments",
    )
)
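Saved runs can later be compared offline. The sketch below is hypothetical: it assumes the local/jsonl backend writes one <name>.jsonl file per named experiment under root_dir, which may not match the backend's actual layout:

import pandas as pd

# Hypothetical layout: one JSONL file per named experiment run
v1 = pd.read_json("./experiments/qa-eval-v1.jsonl", lines=True)
v2 = pd.read_json("./experiments/qa-eval-v2.jsonl", lines=True)
print("mean score v1:", v1["score"].mean())
print("mean score v2:", v2["score"].mean())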

Calling the decorated function on a single row (for testing):

import asyncio

# The decorated function is still directly callable
single_result = asyncio.run(evaluate_responses({"user_input": "test", "response": "correct"}))
print(single_result)
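Note that a direct call bypasses arun() entirely: there is no progress bar, no per-row error capture, and nothing is persisted to a backend; only the wrapped function itself runs.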
