Implementation:Explodinggradients Ragas DataTable To Pandas
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| explodinggradients/ragas | Data Analysis, Experiment Management | 2026-02-10 |
Overview
The DataTable To Pandas method converts DataTable, Dataset, and Experiment instances into pandas DataFrames for tabular analysis, aggregation, and visualization.
Description
The to_pandas() method (lines 316-335 of src/ragas/dataset.py) is defined on the DataTable base class and inherited by both Dataset and Experiment. It iterates over the internal data list, converting Pydantic BaseModel instances to dictionaries via model_dump() and passing plain dictionaries through unchanged; an entry of any other type raises a TypeError. The resulting list of dictionaries is passed to pd.DataFrame() to produce a standard pandas DataFrame. pandas is imported lazily inside the method, so a missing installation surfaces as a clear ImportError at call time rather than at import time. A complementary from_pandas() class method (lines 201-252) performs the reverse conversion, creating a DataTable from a pandas DataFrame.
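The conversion described above can be illustrated with a minimal sketch. This is not the ragas source: the names rows_to_records, records_to_dataframe, and FakeRow are hypothetical, and the sketch detects models by the presence of a model_dump() method rather than by an isinstance check against BaseModel, which is an assumption made to keep the example free of third-party imports at module level.

```python
from typing import Any, Dict, List


def rows_to_records(rows: List[Any]) -> List[Dict[str, Any]]:
    """Turn a mixed list of Pydantic-style models and dicts into plain dicts."""
    records: List[Dict[str, Any]] = []
    for row in rows:
        if isinstance(row, dict):
            # Plain dictionaries pass through unchanged.
            records.append(row)
        elif hasattr(row, "model_dump"):
            # Assumption: anything exposing model_dump() is treated as a model.
            records.append(row.model_dump())
        else:
            raise TypeError(f"Unsupported row type: {type(row).__name__}")
    return records


def records_to_dataframe(records: List[Dict[str, Any]]):
    """Build a DataFrame, importing pandas only when it is actually needed."""
    try:
        import pandas as pd  # lazy import keeps pandas an optional dependency
    except ImportError as exc:
        raise ImportError("pandas is required for this conversion") from exc
    return pd.DataFrame(records)


class FakeRow:
    """Hypothetical stand-in for a Pydantic model, used only in this sketch."""

    def __init__(self, **fields: Any) -> None:
        self._fields = fields

    def model_dump(self) -> Dict[str, Any]:
        return dict(self._fields)


records = rows_to_records(
    [FakeRow(user_input="q1", score=0.9), {"user_input": "q2", "score": 0.4}]
)
```

Passing records to records_to_dataframe() would then yield one row per entry and one column per field, provided pandas is installed.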
Usage
Use the DataTable To Pandas method when:
- Analyzing experiment results with pandas operations (describe, groupby, merge)
- Visualizing evaluation scores with matplotlib or seaborn
- Exporting results to CSV or Excel
- Computing aggregate statistics across evaluation runs
- Comparing results from multiple experiments by joining DataFrames
Code Reference
Source Location: src/ragas/dataset.py, lines 316-335 (to_pandas), lines 201-252 (from_pandas)
Signature:
def to_pandas(self) -> "PandasDataFrame":
    """Convert the dataset to a pandas DataFrame."""
Complementary class method:
@classmethod
def from_pandas(
    cls,
    dataframe: "PandasDataFrame",
    name: str,
    backend: Union[BaseBackend, str],
    data_model: Optional[Type[T]] = None,
    **kwargs,
) -> Self:
    """Create a DataTable from a pandas DataFrame."""
Import:
from ragas import Dataset
# or
from ragas import Experiment
# Both inherit to_pandas() from DataTable
I/O Contract
Inputs (to_pandas):
| Parameter | Type | Required | Description |
|---|---|---|---|
| self | DataTable[T] (or Dataset / Experiment) | Yes | The DataTable instance containing evaluation data (Pydantic models or dictionaries) |
Outputs (to_pandas):
| Output | Type | Description |
|---|---|---|
| DataFrame | pandas.DataFrame | Tabular representation where each row is a dataset entry and each column is a data field |
Raises:
| Exception | Condition |
|---|---|
| ImportError | pandas is not installed |
| TypeError | Dataset contains entries that are neither BaseModel instances nor dictionaries |
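Callers that treat pandas as an optional dependency may want to handle the two failure modes differently. The sketch below is one possible pattern, not part of ragas: to_pandas_or_none and the two stub classes are hypothetical names used only for illustration. It swallows the ImportError and lets the TypeError propagate, on the assumption that bad row types indicate a data problem worth surfacing.

```python
from typing import Any, Optional


def to_pandas_or_none(table: Any) -> Optional[Any]:
    """Return table.to_pandas(), or None when pandas is not installed."""
    try:
        return table.to_pandas()
    except ImportError:
        # Documented failure mode: pandas is missing from the environment.
        return None
    # TypeError is deliberately not caught: it signals unsupported row types.


class NoPandasTable:
    """Hypothetical stub that behaves like a table in a pandas-free environment."""

    def to_pandas(self) -> Any:
        raise ImportError("pandas is not installed")


class BadRowsTable:
    """Hypothetical stub holding entries that are neither models nor dicts."""

    def to_pandas(self) -> Any:
        raise TypeError("entries must be BaseModel instances or dictionaries")


result = to_pandas_or_none(NoPandasTable())
```

With a real Dataset or Experiment, the same helper would return the DataFrame whenever pandas is available.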
Inputs (from_pandas):
| Parameter | Type | Required | Description |
|---|---|---|---|
| dataframe | pandas.DataFrame | Yes | The DataFrame to convert |
| name | str | Yes | Name for the resulting DataTable |
| backend | Union[BaseBackend, str] | Yes | Backend for storage |
| data_model | Optional[Type[T]] | No | Optional Pydantic model for validation |
| **kwargs | Any | No | Additional backend configuration |
Usage Examples
Basic conversion to DataFrame:
from ragas import Dataset
# Load evaluation results
dataset = Dataset.load("eval_results", "local/csv", root_dir="./data")
# Convert to pandas DataFrame
df = dataset.to_pandas()
print(df.head())
print(df.describe())
Analyzing experiment results:
from ragas import Experiment
# Load experiment results (Experiment inherits to_pandas from DataTable)
experiment = Experiment.load("qa-eval-v1", "local/jsonl", root_dir="./experiments")
# Convert and analyze
df = experiment.to_pandas()
# Summary statistics (assumes the experiment recorded "score", "verdict", and "category" fields)
print(f"Mean score: {df['score'].mean():.3f}")
print(f"Pass rate: {(df['verdict'] == 'pass').mean():.1%}")
# Group by category
print(df.groupby("category")["score"].mean())
Comparing two experiments:
from ragas import Experiment
exp_v1 = Experiment.load("eval-v1", "local/csv", root_dir="./experiments")
exp_v2 = Experiment.load("eval-v2", "local/csv", root_dir="./experiments")
df_v1 = exp_v1.to_pandas()
df_v2 = exp_v2.to_pandas()
# Compare mean scores
print(f"V1 mean: {df_v1['score'].mean():.3f}")
print(f"V2 mean: {df_v2['score'].mean():.3f}")
# Merge for side-by-side comparison
comparison = df_v1[["user_input", "score"]].merge(
    df_v2[["user_input", "score"]],
    on="user_input",
    suffixes=("_v1", "_v2"),
)
print(comparison.head())
Round-trip: DataFrame to Dataset and back:
import pandas as pd
from ragas import Dataset
# Create a DataFrame
df = pd.DataFrame({
    "user_input": ["What is AI?", "What is ML?"],
    "response": ["Artificial Intelligence", "Machine Learning"],
    "reference": ["AI is a broad field", "ML is a subset of AI"],
})
# Convert to Dataset
dataset = Dataset.from_pandas(df, "from_df", "local/csv", root_dir="./data")
dataset.save()
# Convert back to DataFrame
df_roundtrip = dataset.to_pandas()
print(df_roundtrip)
Exporting to CSV and Excel:
from ragas import Experiment
experiment = Experiment.load("final-eval", "local/jsonl", root_dir="./experiments")
df = experiment.to_pandas()
# Export for reporting
df.to_csv("evaluation_report.csv", index=False)
# Writing .xlsx files requires an Excel engine such as openpyxl to be installed
df.to_excel("evaluation_report.xlsx", index=False)