
Implementation:Explodinggradients Ragas Testset Export Methods

From Leeroopedia


Knowledge Sources: explodinggradients/ragas
Domains: LLM Evaluation, Test Data Generation, Data Export
Last Updated: 2026-02-10

Overview

Description

The Testset Export Methods are the collection of methods on the Testset class (and its base class RagasDataset) that convert generated test data into different output formats. The primary methods are to_evaluation_dataset(), to_list(), and to_pandas(). Additionally, the KnowledgeGraph.save() method enables persisting the underlying graph for reuse.

Usage

These methods are called after test generation completes. The Testset object returned by TestsetGenerator.generate() exposes all export methods. The choice of method depends on the downstream consumer.

Code Reference

Source Location

Component File Lines
Testset (dataclass) src/ragas/testset/synthesizers/testset_schema.py L46-152
Testset.to_evaluation_dataset src/ragas/testset/synthesizers/testset_schema.py L61-67
Testset.to_list src/ragas/testset/synthesizers/testset_schema.py L69-78
Testset.from_list src/ragas/testset/synthesizers/testset_schema.py L80-108
RagasDataset.to_pandas src/ragas/dataset_schema.py L238-248
RagasDataset.to_hf_dataset src/ragas/dataset_schema.py L222-231
KnowledgeGraph.save src/ragas/testset/graph.py L183-204

Signature

@dataclass
class Testset(RagasDataset[TestsetSample]):
    samples: List[TestsetSample]
    run_id: str = field(default_factory=lambda: str(uuid4()))
    cost_cb: Optional[CostCallbackHandler] = field(default=None)

    def to_evaluation_dataset(self) -> EvaluationDataset:
        ...

    def to_list(self) -> List[Dict]:
        ...

    def to_pandas(self) -> DataFrame:  # inherited from RagasDataset
        ...

    def to_hf_dataset(self) -> HFDataset:  # inherited from RagasDataset
        ...

    @classmethod
    def from_list(cls, data: List[Dict]) -> Testset:
        ...

    @classmethod
    def from_annotated(cls, path: str) -> Testset:
        ...

    def total_tokens(self) -> Union[List[TokenUsage], TokenUsage]:
        ...

    def total_cost(
        self,
        cost_per_input_token: Optional[float] = None,
        cost_per_output_token: Optional[float] = None,
    ) -> float:
        ...

Import

from ragas.testset import Testset
# or
from ragas.testset.synthesizers.testset_schema import Testset, TestsetSample

I/O Contract

to_evaluation_dataset()

Direction Type Description
Input (none) Operates on the Testset instance's samples list
Output EvaluationDataset Contains only the eval_sample field from each TestsetSample; strips synthesizer_name

to_list()

Direction Type Description
Input (none) Operates on the Testset instance's samples list
Output List[Dict] Each dictionary contains all non-None fields from eval_sample plus synthesizer_name

to_pandas()

Direction Type Description
Input (none) Calls to_list() internally
Output pandas.DataFrame Tabular representation with one row per sample; requires pandas to be installed
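Since to_pandas() simply tabulates the output of to_list(), the resulting frame can be sketched with plain pandas. The records below are invented for illustration; the field names follow the columns shown in the pandas example later on this page:

```python
import pandas as pd

# Illustrative records in the shape produced by Testset.to_list()
# (field names follow the I/O contract above; values are made up)
records = [
    {
        "user_input": "What is Ragas?",
        "retrieved_contexts": ["Ragas is an evaluation framework."],
        "response": "Ragas evaluates LLM pipelines.",
        "reference": "Ragas is an LLM evaluation framework.",
        "synthesizer_name": "single_hop_specific_query_synthesizer",
    },
    {
        "user_input": "How are test sets generated?",
        "retrieved_contexts": ["Test sets are synthesized from a knowledge graph."],
        "response": "From a knowledge graph built over the documents.",
        "reference": "They are synthesized from a knowledge graph.",
        "synthesizer_name": "multi_hop_abstract_query_synthesizer",
    },
]

# One row per sample, as to_pandas() produces
df = pd.DataFrame(records)
print(df.shape)
print(sorted(df.columns))
```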

from_list()

Direction Type Description
Input List[Dict] Each dict must contain synthesizer_name and fields for either SingleTurnSample or MultiTurnSample
Output Testset Reconstructed testset with proper sample typing based on user_input format
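Per the contract above, a valid input record needs synthesizer_name plus the sample fields, and the sample type is inferred from the shape of user_input. The sketch below builds one such record by hand (field names from the contract, values invented) and checks that it survives the JSON round trip that makes to_list()/from_list() useful for persistence:

```python
import json

# A minimal record in the shape from_list() expects for a single-turn
# sample (field names taken from the contract above; values invented)
record = {
    "user_input": "What does Testset.from_list expect?",
    "reference": "A list of dicts with synthesizer_name and sample fields.",
    "synthesizer_name": "single_hop_specific_query_synthesizer",
}

# Per the contract, sample typing is inferred from the user_input
# format; a plain string corresponds to a single-turn sample.
assert isinstance(record["user_input"], str)
assert "synthesizer_name" in record

# The format is plain JSON, so it round-trips through a file cleanly
restored = json.loads(json.dumps([record]))
print(restored[0]["synthesizer_name"])
```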

KnowledgeGraph.save()

Direction Type Description
Input Union[str, Path] File system path for the output JSON file
Output JSON file UTF-8 encoded JSON with nodes and relationships arrays
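The contract says save() writes a UTF-8 JSON file with nodes and relationships arrays. The sketch below writes and reloads a file of that top-level shape using only the standard library; the inner field names are invented for illustration and are not the actual Ragas graph schema:

```python
import json
import os
import tempfile

# Hypothetical graph content in the documented top-level shape:
# a JSON object with "nodes" and "relationships" arrays
graph = {
    "nodes": [{"id": "n1", "type": "document"}, {"id": "n2", "type": "chunk"}],
    "relationships": [{"source": "n1", "target": "n2", "type": "contains"}],
}

path = os.path.join(tempfile.mkdtemp(), "enriched_kg.json")
with open(path, "w", encoding="utf-8") as f:  # UTF-8, per the contract
    json.dump(graph, f)

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(len(loaded["nodes"]), len(loaded["relationships"]))
```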

Usage Examples

Converting to EvaluationDataset

from ragas.testset import TestsetGenerator
from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness

# Assume testset has been generated
# testset = generator.generate_with_langchain_docs(documents, testset_size=20)

eval_dataset = testset.to_evaluation_dataset()
print(f"Evaluation dataset with {len(eval_dataset.samples)} samples")

# Use directly with Ragas evaluate
result = evaluate(
    dataset=eval_dataset,
    metrics=[faithfulness, answer_correctness],
)

Converting to Pandas DataFrame

df = testset.to_pandas()
print(df.columns.tolist())
# ['user_input', 'retrieved_contexts', 'response', 'reference', 'synthesizer_name', ...]

# Filter by synthesizer type
single_hop = df[df["synthesizer_name"] == "single_hop_specific_query_synthesizer"]
print(f"Single-hop samples: {len(single_hop)}")

# Inspect a sample
print(df.iloc[0]["user_input"])

Round-Trip Serialization

from ragas.testset.synthesizers.testset_schema import Testset

# Export to list of dicts
data = testset.to_list()

# Save as JSON
import json
with open("testset.json", "w") as f:
    json.dump(data, f, indent=2)

# Reload
with open("testset.json", "r") as f:
    loaded_data = json.load(f)

testset_restored = Testset.from_list(loaded_data)
print(f"Restored {len(testset_restored.samples)} samples")

Converting to Hugging Face Dataset

hf_dataset = testset.to_hf_dataset()
print(hf_dataset)
# Dataset({
#     features: ['user_input', 'response', 'reference', 'synthesizer_name', ...],
#     num_rows: 20
# })

# Push to Hub
hf_dataset.push_to_hub("my-org/my-testset")

Saving the Knowledge Graph for Reuse

# After generation, save the enriched graph
generator.knowledge_graph.save("enriched_kg.json")

# Later, reload and generate a new test set without reprocessing documents
from ragas.testset.graph import KnowledgeGraph
from ragas.testset import TestsetGenerator
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

kg = KnowledgeGraph.load("enriched_kg.json")
generator = TestsetGenerator.from_langchain(
    llm=ChatOpenAI(model="gpt-4o"),
    embedding_model=OpenAIEmbeddings(),
    knowledge_graph=kg,
)
new_testset = generator.generate(testset_size=30)

Cost Tracking

# Generate with token usage tracking
from ragas.cost import TokenUsage

def my_parser(llm_result):
    # Custom parser for your LLM's token usage format
    return TokenUsage(
        input_tokens=llm_result.llm_output["usage"]["prompt_tokens"],
        output_tokens=llm_result.llm_output["usage"]["completion_tokens"],
    )

testset = generator.generate(
    testset_size=20,
    token_usage_parser=my_parser,
)

# Check cost
tokens = testset.total_tokens()
cost = testset.total_cost(
    cost_per_input_token=0.00001,
    cost_per_output_token=0.00003,
)
print(f"Total cost: ${cost:.4f}")
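Given the signature above, total_cost() presumably multiplies the summed input and output token counts by the per-token rates. The arithmetic can be checked by hand; the token counts below are made up, and the rates match the example above:

```python
# Invented token usage for illustration
input_tokens, output_tokens = 120_000, 35_000
cost_per_input_token = 0.00001   # $ per input token (rates from the example above)
cost_per_output_token = 0.00003  # $ per output token

# Hand-computed analogue of total_cost():
# inputs * input rate + outputs * output rate
cost = input_tokens * cost_per_input_token + output_tokens * cost_per_output_token
print(f"Total cost: ${cost:.4f}")  # Total cost: $2.2500
```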
