Implementation:Explodinggradients_Ragas_Testset_Export_Methods
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| explodinggradients/ragas | LLM Evaluation, Test Data Generation, Data Export | 2026-02-10 |
Overview
Description
The testset export methods are the methods on the Testset class (and its base class RagasDataset) that convert generated test data into different output formats. The primary methods are to_evaluation_dataset(), to_list(), and to_pandas(). Additionally, KnowledgeGraph.save() persists the underlying graph for reuse.
Usage
These methods are called after test generation completes. The Testset object returned by TestsetGenerator.generate() exposes all export methods. The choice of method depends on the downstream consumer.
Code Reference
Source Location
| Component | File | Lines |
|---|---|---|
| Testset (dataclass) | src/ragas/testset/synthesizers/testset_schema.py | L46-152 |
| Testset.to_evaluation_dataset | src/ragas/testset/synthesizers/testset_schema.py | L61-67 |
| Testset.to_list | src/ragas/testset/synthesizers/testset_schema.py | L69-78 |
| Testset.from_list | src/ragas/testset/synthesizers/testset_schema.py | L80-108 |
| RagasDataset.to_pandas | src/ragas/dataset_schema.py | L238-248 |
| RagasDataset.to_hf_dataset | src/ragas/dataset_schema.py | L222-231 |
| KnowledgeGraph.save | src/ragas/testset/graph.py | L183-204 |
Signature
```python
@dataclass
class Testset(RagasDataset[TestsetSample]):
    samples: List[TestsetSample]
    run_id: str = field(default_factory=lambda: str(uuid4()))
    cost_cb: Optional[CostCallbackHandler] = field(default=None)

    def to_evaluation_dataset(self) -> EvaluationDataset:
        ...

    def to_list(self) -> List[Dict]:
        ...

    def to_pandas(self) -> DataFrame:  # inherited from RagasDataset
        ...

    def to_hf_dataset(self) -> HFDataset:  # inherited from RagasDataset
        ...

    @classmethod
    def from_list(cls, data: List[Dict]) -> Testset:
        ...

    @classmethod
    def from_annotated(cls, path: str) -> Testset:
        ...

    def total_tokens(self) -> Union[List[TokenUsage], TokenUsage]:
        ...

    def total_cost(
        self,
        cost_per_input_token: Optional[float] = None,
        cost_per_output_token: Optional[float] = None,
    ) -> float:
        ...
```
Import
```python
from ragas.testset import Testset
# or
from ragas.testset.synthesizers.testset_schema import Testset, TestsetSample
```
I/O Contract
to_evaluation_dataset()
| Direction | Type | Description |
|---|---|---|
| Input | (none) | Operates on the Testset instance's samples list |
| Output | EvaluationDataset | Contains only the eval_sample field from each TestsetSample; strips synthesizer_name |
to_list()
| Direction | Type | Description |
|---|---|---|
| Input | (none) | Operates on the Testset instance's samples list |
| Output | List[Dict] | Each dictionary contains all non-None fields from eval_sample plus synthesizer_name |
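To make the to_list() contract concrete, here is a sketch of the dict shape described above, using illustrative records rather than output from a real run. The field names follow the Ragas single-turn schema; the specific values and synthesizer names are assumptions for illustration.

```python
# Hypothetical to_list()-style output: flattened eval_sample fields plus
# synthesizer_name. Values here are made up for illustration.
rows = [
    {
        "user_input": "What does the report say about Q3 revenue?",
        "reference": "Q3 revenue grew 12% year over year.",
        "reference_contexts": ["...Q3 revenue grew 12%..."],
        "synthesizer_name": "single_hop_specific_query_synthesizer",
    },
    {
        "user_input": "How do the Q3 and Q4 outlooks compare?",
        "reference": "Q4 guidance is more conservative than Q3.",
        "reference_contexts": ["...Q3 outlook...", "...Q4 guidance..."],
        "synthesizer_name": "multi_hop_abstract_query_synthesizer",
    },
]

# Because the output is plain dicts, grouping by synthesizer needs no pandas:
by_synth = {}
for row in rows:
    by_synth.setdefault(row["synthesizer_name"], []).append(row)
print(sorted(by_synth))
```

Since every row is JSON-serializable, this shape is also what round-trips cleanly through json.dump / json.load.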
to_pandas()
| Direction | Type | Description |
|---|---|---|
| Input | (none) | Calls to_list() internally |
| Output | pandas.DataFrame | Tabular representation with one row per sample; requires pandas to be installed |
from_list()
| Direction | Type | Description |
|---|---|---|
| Input | List[Dict] | Each dict must contain synthesizer_name and fields for either SingleTurnSample or MultiTurnSample |
| Output | Testset | Reconstructed testset with proper sample typing based on user_input format |
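The sample-typing rule above can be sketched in plain Python. This is a simplified stand-in, not the actual from_list implementation: the assumption is that a string user_input indicates a single-turn sample while a list of messages indicates a multi-turn one.

```python
def sample_kind(record: dict) -> str:
    # Simplified stand-in for from_list's dispatch: real Ragas inspects the
    # record more carefully; here we only check the shape of user_input.
    user_input = record.get("user_input")
    if isinstance(user_input, list):
        return "MultiTurnSample"
    return "SingleTurnSample"

print(sample_kind({"user_input": "What is Ragas?"}))                     # SingleTurnSample
print(sample_kind({"user_input": [{"role": "user", "content": "Hi"}]}))  # MultiTurnSample
```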
KnowledgeGraph.save()
| Direction | Type | Description |
|---|---|---|
| Input | Union[str, Path] | File system path for the output JSON file |
| Output | JSON file | UTF-8 encoded JSON with nodes and relationships arrays |
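The on-disk layout can be sketched with the standard library. The top-level nodes and relationships arrays follow the contract above; the per-entry fields (id, type, properties, source, target) are assumptions for illustration, not the exact Ragas schema.

```python
import json
from pathlib import Path

# Assumed shape: top-level "nodes" and "relationships" arrays, per the
# I/O contract. Per-entry fields are illustrative.
graph = {
    "nodes": [
        {"id": "n1", "type": "DOCUMENT", "properties": {"summary": "..."}},
        {"id": "n2", "type": "CHUNK", "properties": {"page_content": "..."}},
    ],
    "relationships": [
        {"source": "n1", "target": "n2", "type": "child"},
    ],
}

path = Path("kg_sketch.json")
path.write_text(json.dumps(graph, ensure_ascii=False), encoding="utf-8")

reloaded = json.loads(path.read_text(encoding="utf-8"))
print(len(reloaded["nodes"]), len(reloaded["relationships"]))  # 2 1
```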
Usage Examples
Converting to EvaluationDataset
```python
from ragas.testset import TestsetGenerator
from ragas import evaluate
from ragas.metrics import faithfulness, answer_correctness

# Assume testset has been generated
# testset = generator.generate_with_langchain_docs(documents, testset_size=20)
eval_dataset = testset.to_evaluation_dataset()
print(f"Evaluation dataset with {len(eval_dataset.samples)} samples")

# Use directly with Ragas evaluate
result = evaluate(
    dataset=eval_dataset,
    metrics=[faithfulness, answer_correctness],
)
```
Converting to Pandas DataFrame
```python
df = testset.to_pandas()
print(df.columns.tolist())
# ['user_input', 'retrieved_contexts', 'response', 'reference', 'synthesizer_name', ...]

# Filter by synthesizer type
single_hop = df[df["synthesizer_name"] == "single_hop_specific_query_synthesizer"]
print(f"Single-hop samples: {len(single_hop)}")

# Inspect a sample
print(df.iloc[0]["user_input"])
```
Round-Trip Serialization
```python
import json

from ragas.testset.synthesizers.testset_schema import Testset

# Export to list of dicts
data = testset.to_list()

# Save as JSON
with open("testset.json", "w") as f:
    json.dump(data, f, indent=2)

# Reload
with open("testset.json", "r") as f:
    loaded_data = json.load(f)

testset_restored = Testset.from_list(loaded_data)
print(f"Restored {len(testset_restored.samples)} samples")
```
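For larger testsets, JSON Lines is a convenient alternative to a single JSON array, since rows can be written and read one at a time. A sketch with stand-in dicts, using only the standard library (a real testset.to_list() result would slot in where rows is defined):

```python
import json

# Stand-in for testset.to_list(); replace with the real export.
rows = [
    {"user_input": "q1", "synthesizer_name": "single_hop_specific_query_synthesizer"},
    {"user_input": "q2", "synthesizer_name": "multi_hop_abstract_query_synthesizer"},
]

# One JSON object per line.
with open("testset.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

# Stream back line by line.
with open("testset.jsonl", "r", encoding="utf-8") as f:
    restored = [json.loads(line) for line in f]

print(len(restored))  # 2
```

The restored list of dicts can then be passed to Testset.from_list() exactly as in the JSON round-trip above.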
Converting to Hugging Face Dataset
```python
hf_dataset = testset.to_hf_dataset()
print(hf_dataset)
# Dataset({
#     features: ['user_input', 'response', 'reference', 'synthesizer_name', ...],
#     num_rows: 20
# })

# Push to Hub
hf_dataset.push_to_hub("my-org/my-testset")
```
Saving the Knowledge Graph for Reuse
```python
# After generation, save the enriched graph
generator.knowledge_graph.save("enriched_kg.json")

# Later, reload and generate a new test set without reprocessing documents
from ragas.testset.graph import KnowledgeGraph
from ragas.testset import TestsetGenerator
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

kg = KnowledgeGraph.load("enriched_kg.json")
generator = TestsetGenerator.from_langchain(
    llm=ChatOpenAI(model="gpt-4o"),
    embedding_model=OpenAIEmbeddings(),
    knowledge_graph=kg,
)
new_testset = generator.generate(testset_size=30)
```
Cost Tracking
```python
# Generate with token usage tracking
from ragas.cost import TokenUsage

def my_parser(llm_result):
    # Custom parser for your LLM's token usage format
    return TokenUsage(
        input_tokens=llm_result.llm_output["usage"]["prompt_tokens"],
        output_tokens=llm_result.llm_output["usage"]["completion_tokens"],
    )

testset = generator.generate(
    testset_size=20,
    token_usage_parser=my_parser,
)

# Check cost
tokens = testset.total_tokens()
cost = testset.total_cost(
    cost_per_input_token=0.00001,
    cost_per_output_token=0.00003,
)
print(f"Total cost: ${cost:.4f}")
```
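The cost calculation itself is just token counts times per-token rates. A worked example under the rates used above, with hypothetical aggregate token counts:

```python
# Hypothetical aggregate usage for a run; real numbers come from
# testset.total_tokens().
input_tokens = 50_000
output_tokens = 20_000

# cost = input_tokens * cost_per_input_token + output_tokens * cost_per_output_token
cost = input_tokens * 0.00001 + output_tokens * 0.00003
print(f"${cost:.2f}")  # $1.10
```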
Related Pages
- Principle:Explodinggradients_Ragas_Testset_Export
- Implementation:Explodinggradients_Ragas_TestsetGenerator_Generate -- the generator that produces the Testset
- Implementation:Explodinggradients_Ragas_KnowledgeGraph_Class -- KnowledgeGraph.save() for graph persistence