
Implementation:Explodinggradients Ragas Experiment Decorator



Knowledge Sources: explodinggradients/ragas
Domains: LLM Evaluation, Experiment Management
Last Updated: 2026-02-10

Overview

The Experiment Decorator wraps async evaluation functions with dataset iteration, progress tracking, and result persistence to orchestrate full evaluation experiments in the Ragas toolkit.

Description

The @experiment decorator (lines 201-232) is a decorator factory: called with configuration, it returns a decorator that wraps the user function in an ExperimentWrapper (lines 116-198). The wrapper preserves the original function's callable behavior while adding an arun() method that executes the function across all rows of a Dataset. arun() creates an async task for every dataset entry, processes the tasks concurrently via asyncio.as_completed(), tracks progress with tqdm, catches per-row exceptions so one failure does not abort the run, and saves the collected results into an Experiment instance (a subclass of DataTable).
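For orientation, that control flow can be sketched in a few lines of standalone Python. This is an illustration of the pattern only, not the library's actual code; the helper name run_rows is hypothetical:

import asyncio
from tqdm import tqdm

async def run_rows(func, dataset, **kwargs):
    # One task per dataset row, executed concurrently
    tasks = [asyncio.create_task(func(row, **kwargs)) for row in dataset]
    results = []
    # as_completed yields futures as they finish; tqdm renders progress
    for future in tqdm(asyncio.as_completed(tasks), total=len(tasks)):
        try:
            results.append(await future)
        except Exception as exc:
            # A failing row is recorded and skipped so the rest still complete
            print(f"row failed: {exc}")
    return results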

Usage

Use the Experiment Decorator when:

  • Running an async evaluation function across an entire evaluation dataset
  • Needing automatic progress bars and error isolation during batch evaluation
  • Persisting experiment results to the same or different backend as the source dataset
  • Creating reproducibly named experiments for later comparison

Code Reference

Source Location: src/ragas/experiment.py, lines 201-232 (@experiment decorator), lines 116-198 (ExperimentWrapper.arun)

Signature:

def experiment(
    experiment_model: Optional[Type[BaseModel]] = None,
    backend: Optional[Union[BaseBackend, str]] = None,
    name_prefix: str = "",
) -> Callable[[Callable], ExperimentProtocol]

The decorated function gains an arun method:

async def arun(
    self,
    dataset: Dataset,
    name: Optional[str] = None,
    backend: Optional[Union[BaseBackend, str]] = None,
    *args,
    **kwargs,
) -> Experiment

Import:

from ragas import experiment

I/O Contract

Inputs (decorator parameters):

Parameter | Type | Required | Description
experiment_model | Optional[Type[BaseModel]] | No | Pydantic model type for experiment result rows
backend | Optional[Union[BaseBackend, str]] | No | Default backend for storing experiment results
name_prefix | str | No | Prefix prepended to auto-generated experiment names

Inputs (arun parameters):

Parameter | Type | Required | Description
dataset | Dataset | Yes | The evaluation dataset to iterate over
name | Optional[str] | No | Custom experiment name (auto-generated if omitted)
backend | Optional[Union[BaseBackend, str]] | No | Override backend for this specific run
*args, **kwargs | Any | No | Additional arguments forwarded to the decorated function

Outputs:

Output | Type | Description
Experiment result | Experiment | A DataTable subclass containing all evaluation results, automatically saved to the backend

Usage Examples

Basic experiment with auto-generated name:

import asyncio
from pydantic import BaseModel
from ragas import Dataset, experiment

class EvalResult(BaseModel):
    user_input: str
    response: str
    score: float

@experiment(experiment_model=EvalResult)
async def evaluate_responses(row):
    # Your evaluation logic here
    score = 1.0 if "correct" in row["response"].lower() else 0.0
    return EvalResult(
        user_input=row["user_input"],
        response=row["response"],
        score=score,
    )

# Load dataset and run experiment
dataset = Dataset.load("my_eval_data", "local/csv", root_dir="./data")
results = asyncio.run(evaluate_responses.arun(dataset))

print(f"Experiment: {results.name}")
print(f"Results: {len(results)} rows")

Experiment with custom name and backend:

import asyncio
from pydantic import BaseModel
from ragas import Dataset, experiment

# Response schema for the LLM judge (defined here so the example runs)
class ScoreModel(BaseModel):
    score: float

@experiment(name_prefix="qa-eval")
async def evaluate_qa(row, llm=None):
    # Evaluation logic using an LLM
    response = await llm.agenerate(
        f"Is this answer correct? Q: {row['question']} A: {row['answer']}",
        response_model=ScoreModel,
    )
    return {"question": row["question"], "score": response.score}

dataset = Dataset.load("qa_data", "local/csv", root_dir="./data")
results = asyncio.run(
    evaluate_qa.arun(
        dataset,
        name="qa-eval-v2",
        backend="local/jsonl",
        llm=my_llm,
        root_dir="./experiments",
    )
)
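Saved runs can later be compared offline. The sketch below is hypothetical: it assumes the local/jsonl backend writes one <name>.jsonl file per named experiment under root_dir, which may not match the backend's actual layout:

import pandas as pd

# Hypothetical layout: one JSONL file per named experiment run
v1 = pd.read_json("./experiments/qa-eval-v1.jsonl", lines=True)
v2 = pd.read_json("./experiments/qa-eval-v2.jsonl", lines=True)
print("mean score v1:", v1["score"].mean())
print("mean score v2:", v2["score"].mean())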

Calling the decorated function on a single row (for testing):

import asyncio

# The decorated function is still directly callable
single_result = asyncio.run(evaluate_responses({"user_input": "test", "response": "correct"}))
print(single_result)
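Note that a direct call bypasses arun() entirely: there is no progress bar, no per-row error capture, and nothing is persisted to a backend; only the wrapped function itself runs.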
