
Implementation:Explodinggradients Ragas Testset Export Methods

From Leeroopedia


Knowledge Sources: explodinggradients/ragas
Domains: LLM Evaluation, Test Data Generation, Data Export
Last Updated: 2026-02-10

Overview

Description

The Testset Export Methods are the collection of methods on the Testset class (and its base class RagasDataset) that convert generated test data into different output formats. The primary methods are to_evaluation_dataset(), to_list(), and to_pandas(). Additionally, the KnowledgeGraph.save() method enables persisting the underlying graph for reuse.

Usage

These methods are called after test generation completes. The Testset object returned by TestsetGenerator.generate() exposes all export methods. The choice of method depends on the downstream consumer.

Code Reference

Source Location

Component File Lines
Testset (dataclass) src/ragas/testset/synthesizers/testset_schema.py L46-152
Testset.to_evaluation_dataset src/ragas/testset/synthesizers/testset_schema.py L61-67
Testset.to_list src/ragas/testset/synthesizers/testset_schema.py L69-78
Testset.from_list src/ragas/testset/synthesizers/testset_schema.py L80-108
RagasDataset.to_pandas src/ragas/dataset_schema.py L238-248
RagasDataset.to_hf_dataset src/ragas/dataset_schema.py L222-231
KnowledgeGraph.save src/ragas/testset/graph.py L183-204

Signature

@dataclass
class Testset(RagasDataset[TestsetSample]):
    samples: List[TestsetSample]
    run_id: str = field(default_factory=lambda: str(uuid4()))
    cost_cb: Optional[CostCallbackHandler] = field(default=None)

    def to_evaluation_dataset(self) -> EvaluationDataset:
        ...

    def to_list(self) -> List[Dict]:
        ...

    def to_pandas(self) -> DataFrame:  # inherited from RagasDataset
        ...

    def to_hf_dataset(self) -> HFDataset:  # inherited from RagasDataset
        ...

    @classmethod
    def from_list(cls, data: List[Dict]) -> Testset:
        ...

    @classmethod
    def from_annotated(cls, path: str) -> Testset:
        ...

    def total_tokens(self) -> Union[List[TokenUsage], TokenUsage]:
        ...

    def total_cost(
        self,
        cost_per_input_token: Optional[float] = None,
        cost_per_output_token: Optional[float] = None,
    ) -> float:
        ...

Import

from ragas.testset import Testset
# or
from ragas.testset.synthesizers.testset_schema import Testset, TestsetSample

I/O Contract

to_evaluation_dataset()

Direction Type Description
Input (none) Operates on the Testset instance's samples list
Output EvaluationDataset Contains only the eval_sample field from each TestsetSample; strips synthesizer_name

to_list()

Direction Type Description
Input (none) Operates on the Testset instance's samples list
Output List[Dict] Each dictionary contains all non-None fields from eval_sample plus synthesizer_name

to_pandas()

Direction Type Description
Input (none) Calls to_list() internally
Output pandas.DataFrame Tabular representation with one row per sample; requires pandas to be installed
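Since to_pandas() simply tabulates the output of to_list(), the resulting frame can be sketched with plain pandas. The records below are invented for illustration; the field names follow the columns shown in the pandas example later on this page:

```python
import pandas as pd

# Illustrative records in the shape produced by Testset.to_list()
# (field names follow the I/O contract above; values are made up)
records = [
    {
        "user_input": "What is Ragas?",
        "retrieved_contexts": ["Ragas is an evaluation framework."],
        "response": "Ragas evaluates LLM pipelines.",
        "reference": "Ragas is an LLM evaluation framework.",
        "synthesizer_name": "single_hop_specific_query_synthesizer",
    },
    {
        "user_input": "How are test sets generated?",
        "retrieved_contexts": ["Test sets are synthesized from a knowledge graph."],
        "response": "From a knowledge graph built over the documents.",
        "reference": "They are synthesized from a knowledge graph.",
        "synthesizer_name": "multi_hop_abstract_query_synthesizer",
    },
]

# One row per sample, as to_pandas() produces
df = pd.DataFrame(records)
print(df.shape)
print(sorted(df.columns))
```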

from_list()

Direction Type Description
Input List[Dict] Each dict must contain synthesizer_name and fields for either SingleTurnSample or MultiTurnSample
Output Testset Reconstructed testset with proper sample typing based on user_input format
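Per the contract above, a valid input record needs synthesizer_name plus the sample fields, and the sample type is inferred from the shape of user_input. The sketch below builds one such record by hand (field names from the contract, values invented) and checks that it survives the JSON round trip that makes to_list()/from_list() useful for persistence:

```python
import json

# A minimal record in the shape from_list() expects for a single-turn
# sample (field names taken from the contract above; values invented)
record = {
    "user_input": "What does Testset.from_list expect?",
    "reference": "A list of dicts with synthesizer_name and sample fields.",
    "synthesizer_name": "single_hop_specific_query_synthesizer",
}

# Per the contract, sample typing is inferred from the user_input
# format; a plain string corresponds to a single-turn sample.
assert isinstance(record["user_input"], str)
assert "synthesizer_name" in record

# The format is plain JSON, so it round-trips through a file cleanly
restored = json.loads(json.dumps([record]))
print(restored[0]["synthesizer_name"])
```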

KnowledgeGraph.save()

Direction Type Description
Input Union[str, Path] File system path for the output JSON file
Output JSON file UTF-8 encoded JSON with nodes and relationships arrays
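The contract says save() writes a UTF-8 JSON file with nodes and relationships arrays. The sketch below writes and reloads a file of that top-level shape using only the standard library; the inner field names are invented for illustration and are not the actual Ragas graph schema:

```python
import json
import os
import tempfile

# Hypothetical graph content in the documented top-level shape:
# a JSON object with "nodes" and "relationships" arrays
graph = {
    "nodes": [{"id": "n1", "type": "document"}, {"id": "n2", "type": "chunk"}],
    "relationships": [{"source": "n1", "target": "n2", "type": "contains"}],
}

path = os.path.join(tempfile.mkdtemp(), "enriched_kg.json")
with open(path, "w", encoding="utf-8") as f:  # UTF-8, per the contract
    json.dump(graph, f)

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(len(loaded["nodes"]), len(loaded["relationships"]))
```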

Usage Examples

Converting to EvaluationDataset

from ragas.testset import TestsetGenerator
from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness

# Assume testset has been generated
# testset = generator.generate_with_langchain_docs(documents, testset_size=20)

eval_dataset = testset.to_evaluation_dataset()
print(f"Evaluation dataset with {len(eval_dataset.samples)} samples")

# Use directly with Ragas evaluate
result = evaluate(
    dataset=eval_dataset,
    metrics=[faithfulness, answer_correctness],
)

Converting to Pandas DataFrame

df = testset.to_pandas()
print(df.columns.tolist())
# ['user_input', 'retrieved_contexts', 'response', 'reference', 'synthesizer_name', ...]

# Filter by synthesizer type
single_hop = df[df["synthesizer_name"] == "single_hop_specific_query_synthesizer"]
print(f"Single-hop samples: {len(single_hop)}")

# Inspect a sample
print(df.iloc[0]["user_input"])

Round-Trip Serialization

from ragas.testset.synthesizers.testset_schema import Testset

# Export to list of dicts
data = testset.to_list()

# Save as JSON
import json
with open("testset.json", "w") as f:
    json.dump(data, f, indent=2)

# Reload
with open("testset.json", "r") as f:
    loaded_data = json.load(f)

testset_restored = Testset.from_list(loaded_data)
print(f"Restored {len(testset_restored.samples)} samples")

Converting to Hugging Face Dataset

hf_dataset = testset.to_hf_dataset()
print(hf_dataset)
# Dataset({
#     features: ['user_input', 'response', 'reference', 'synthesizer_name', ...],
#     num_rows: 20
# })

# Push to Hub
hf_dataset.push_to_hub("my-org/my-testset")

Saving the Knowledge Graph for Reuse

# After generation, save the enriched graph
generator.knowledge_graph.save("enriched_kg.json")

# Later, reload and generate a new test set without reprocessing documents
from ragas.testset.graph import KnowledgeGraph
from ragas.testset import TestsetGenerator
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

kg = KnowledgeGraph.load("enriched_kg.json")
generator = TestsetGenerator.from_langchain(
    llm=ChatOpenAI(model="gpt-4o"),
    embedding_model=OpenAIEmbeddings(),
    knowledge_graph=kg,
)
new_testset = generator.generate(testset_size=30)

Cost Tracking

# Generate with token usage tracking
from ragas.cost import TokenUsage

def my_parser(llm_result):
    # Custom parser for your LLM's token usage format
    return TokenUsage(
        input_tokens=llm_result.llm_output["usage"]["prompt_tokens"],
        output_tokens=llm_result.llm_output["usage"]["completion_tokens"],
    )

testset = generator.generate(
    testset_size=20,
    token_usage_parser=my_parser,
)

# Check cost
tokens = testset.total_tokens()
cost = testset.total_cost(
    cost_per_input_token=0.00001,
    cost_per_output_token=0.00003,
)
print(f"Total cost: ${cost:.4f}")
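Given the signature above, total_cost() presumably multiplies the summed input and output token counts by the per-token rates. The arithmetic can be checked by hand; the token counts below are made up, and the rates match the example above:

```python
# Invented token usage for illustration
input_tokens, output_tokens = 120_000, 35_000
cost_per_input_token = 0.00001   # $ per input token (rates from the example above)
cost_per_output_token = 0.00003  # $ per output token

# Hand-computed analogue of total_cost():
# inputs * input rate + outputs * output rate
cost = input_tokens * cost_per_input_token + output_tokens * cost_per_output_token
print(f"Total cost: ${cost:.4f}")  # Total cost: $2.2500
```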
