Principle: Explodinggradients Ragas Testset Export
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| explodinggradients/ragas | LLM Evaluation, Test Data Generation, Data Export | 2026-02-10 |
Overview
Description
Testset Export is the principle of converting generated test datasets into various output formats to bridge the gap between test data generation and evaluation consumption. A generated Testset is the raw output of the synthesis pipeline, but downstream evaluation tools, analytics workflows, and human reviewers each require data in different formats. Ragas addresses this by providing multiple export paths from a single Testset object.
Usage
After generating a test set using TestsetGenerator.generate() or any of the generate_with_* methods, users call one of the export methods on the resulting Testset object:
- to_evaluation_dataset() -- Converts to an EvaluationDataset, the native Ragas format for running evaluation metrics. This strips away testset-specific metadata (such as the synthesizer name) and retains only the evaluation-relevant fields.
- to_pandas() -- Converts to a pandas DataFrame for tabular inspection, filtering, and analysis. Inherited from the base RagasDataset class.
- to_list() -- Converts to a list of dictionaries, useful for JSON serialization, API payloads, or custom processing pipelines.
- to_hf_dataset() -- Converts to a Hugging Face Dataset for integration with the HF ecosystem. Inherited from the base RagasDataset class.
Additionally, the knowledge graph itself can be persisted independently via KnowledgeGraph.save(), enabling reuse across multiple test generation runs.
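The to_list() export path can be sketched with a minimal stand-in for the Testset object. MiniSample and MiniTestset below are illustrative inventions for this sketch, not the real ragas classes; only the method name and the "list of JSON-serializable dicts" contract come from the description above:

```python
import json
from dataclasses import dataclass, field


@dataclass
class MiniSample:
    """Stand-in for a single evaluation sample (question/reference pair)."""
    user_input: str
    reference: str


@dataclass
class MiniTestset:
    """Illustrative stand-in for the Testset export surface."""
    samples: list = field(default_factory=list)

    def to_list(self):
        # Mirrors the to_list() contract: a JSON-serializable list of dicts.
        return [{"user_input": s.user_input, "reference": s.reference}
                for s in self.samples]


testset = MiniTestset([MiniSample("What is Ragas?", "An LLM evaluation library.")])
rows = testset.to_list()
payload = json.dumps(rows)  # ready for an API payload or a file on disk
print(len(rows))  # -> 1
```

The same list-of-dicts shape is what a pandas DataFrame or a Hugging Face Dataset constructor can consume, which is why a single canonical export format can back several conversion targets.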
Theoretical Basis
Separation of Generation and Evaluation Concerns: The Testset contains generation-specific metadata such as synthesizer_name that is useful for understanding how the test set was created but irrelevant to evaluation. The to_evaluation_dataset() method projects away this metadata, yielding a clean EvaluationDataset with only eval_sample objects (either SingleTurnSample or MultiTurnSample). This separation ensures that evaluation metrics operate on a consistent, minimal data contract regardless of how the test data was generated.
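The projection described above can be shown schematically. The field names (eval_sample, synthesizer_name) follow this article's description of the Testset, but the dataclasses and the standalone function below are simplified stand-ins, not the real ragas types:

```python
from dataclasses import dataclass


@dataclass
class SingleTurnSample:  # simplified stand-in for the eval-side sample type
    user_input: str
    reference: str


@dataclass
class TestsetSample:  # simplified stand-in for a generated testset row
    eval_sample: SingleTurnSample
    synthesizer_name: str  # generation metadata, dropped on export


def to_evaluation_dataset(samples):
    """Project away generation metadata, keeping only the eval samples."""
    return [s.eval_sample for s in samples]


rows = [TestsetSample(SingleTurnSample("Q1", "A1"), "single_hop_specific")]
eval_ds = to_evaluation_dataset(rows)
# The evaluation side sees only the minimal data contract:
assert all(not hasattr(s, "synthesizer_name") for s in eval_ds)
```

The design benefit is that metrics never need to branch on how a sample was produced; every sample arriving at evaluation has the same minimal shape.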
Multi-Format Interoperability: Different parts of a typical ML workflow require different data formats:
- Evaluation pipelines consume EvaluationDataset objects directly.
- Data scientists inspect and filter data in pandas DataFrames.
- APIs and storage systems consume JSON-serializable lists of dictionaries.
- Hugging Face Hub integration requires datasets.Dataset objects.
By providing native conversion to all these formats, Ragas eliminates manual data wrangling and reduces the risk of format conversion errors.
Bidirectional Serialization: The Testset supports both export (to_list()) and import (from_list()) as well as annotated file loading (from_annotated()). This enables round-trip workflows where test sets are generated, exported, annotated by humans (approving or rejecting samples), and re-imported for evaluation.
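The round-trip workflow above can be sketched with plain dataclasses. The Sample class, the module-level to_list/from_list helpers, and the approved flag are all illustrative assumptions standing in for the real Testset methods and whatever annotation scheme a reviewer uses:

```python
from dataclasses import dataclass


@dataclass
class Sample:
    user_input: str
    reference: str
    approved: bool = True  # hypothetical human-annotation flag


def to_list(samples):
    # Export: dataclass instances -> plain dicts (JSON-serializable)
    return [vars(s).copy() for s in samples]


def from_list(rows):
    # Import: plain dicts -> dataclass instances
    return [Sample(**row) for row in rows]


original = [Sample("Q1", "A1"), Sample("Q2", "A2")]
rows = to_list(original)
rows[1]["approved"] = False  # a human reviewer rejects the second sample
reloaded = [s for s in from_list(rows) if s.approved]
print(len(reloaded))  # -> 1
```

Because export and import share one schema, the annotation step can happen in any tool that reads and writes JSON, with no loss on re-import.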
Cost Tracking Through Export: The Testset optionally carries a CostCallbackHandler that tracks token usage during generation. The total_tokens() and total_cost() methods provide post-generation cost analysis, enabling users to understand and optimize the cost of their test generation workflows.
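The accounting behind total_tokens() and total_cost() can be sketched as follows. MiniCostTracker and its on_llm_call hook are invented for this sketch (the real handler is a callback wired into generation), and the per-token prices passed in are arbitrary example values:

```python
class MiniCostTracker:
    """Illustrative stand-in for a cost callback handler."""

    def __init__(self):
        self.input_tokens = 0
        self.output_tokens = 0

    def on_llm_call(self, input_tokens, output_tokens):
        # Called once per LLM invocation during generation.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    def total_tokens(self):
        return self.input_tokens + self.output_tokens

    def total_cost(self, cost_per_input_token, cost_per_output_token):
        # Input and output tokens are usually priced differently.
        return (self.input_tokens * cost_per_input_token
                + self.output_tokens * cost_per_output_token)


tracker = MiniCostTracker()
tracker.on_llm_call(1000, 200)
tracker.on_llm_call(500, 100)
print(tracker.total_tokens())  # -> 1800
print(tracker.total_cost(1e-6, 2e-6))  # roughly 0.0021
```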
Related Pages
- Implementation:Explodinggradients_Ragas_Testset_Export_Methods
- Principle:Explodinggradients_Ragas_Test_Query_Synthesis -- the synthesis pipeline that produces the Testset
- Principle:Explodinggradients_Ragas_Knowledge_Graph_Construction -- the KnowledgeGraph.save() method for graph persistence