
Principle:Explodinggradients Ragas Testset Export

From Leeroopedia


Knowledge Sources: explodinggradients/ragas
Domains: LLM Evaluation, Test Data Generation, Data Export
Last Updated: 2026-02-10

Overview

Description

Testset Export is the principle of converting generated test datasets into various output formats to bridge the gap between test data generation and evaluation consumption. A generated Testset is the raw output of the synthesis pipeline, but downstream evaluation tools, analytics workflows, and human reviewers each require data in different formats. Ragas addresses this by providing multiple export paths from a single Testset object.

Usage

After generating a test set using TestsetGenerator.generate() or any of the generate_with_* methods, users call one of the export methods on the resulting Testset object:

  • to_evaluation_dataset() -- Converts to an EvaluationDataset, the native Ragas format for running evaluation metrics. This strips away testset-specific metadata (like synthesizer name) and retains only the evaluation-relevant fields.
  • to_pandas() -- Converts to a pandas DataFrame for tabular inspection, filtering, and analysis. Inherited from the base RagasDataset class.
  • to_list() -- Converts to a list of dictionaries, useful for JSON serialization, API payloads, or custom processing pipelines.
  • to_hf_dataset() -- Converts to a Hugging Face Dataset for integration with the HF ecosystem. Inherited from the base RagasDataset class.
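The export surface can be sketched with simplified stand-ins for the Ragas classes. The method names mirror those above, but the classes here are minimal mocks for illustration, not the library's own implementations (real usage would call the same methods on the `Testset` returned by `TestsetGenerator.generate()`):

```python
from dataclasses import dataclass

# Simplified mocks of ragas' sample and testset types, to show the
# export pattern: one Testset object, several output formats.

@dataclass
class SingleTurnSample:
    user_input: str
    reference: str

@dataclass
class TestsetSample:
    eval_sample: SingleTurnSample
    synthesizer_name: str  # generation metadata, dropped on eval export

@dataclass
class Testset:
    samples: list

    def to_evaluation_dataset(self):
        # Project away generation metadata; keep only the eval samples.
        return [s.eval_sample for s in self.samples]

    def to_list(self):
        # JSON-serializable dicts, retaining generation metadata.
        return [
            {
                "user_input": s.eval_sample.user_input,
                "reference": s.eval_sample.reference,
                "synthesizer_name": s.synthesizer_name,
            }
            for s in self.samples
        ]

ts = Testset(samples=[
    TestsetSample(
        SingleTurnSample("What is RAG?", "Retrieval-augmented generation."),
        "single_hop_specific",
    ),
])
print(len(ts.to_evaluation_dataset()))         # 1
print(ts.to_list()[0]["synthesizer_name"])     # single_hop_specific
```

Note how `to_evaluation_dataset()` drops `synthesizer_name` while `to_list()` keeps it: the same object serves both the evaluation pipeline and full-fidelity serialization.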

Additionally, the knowledge graph itself can be persisted independently via KnowledgeGraph.save(), enabling reuse across multiple test generation runs.
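Persisting the graph amounts to serializing its nodes and relationships once and reloading them for later runs. A stand-in sketch, assuming a matching load path exists alongside `save()` (the class below is a hypothetical mock, not the Ragas `KnowledgeGraph`):

```python
import json
import os
import tempfile

# Hypothetical minimal knowledge graph, standing in for ragas'
# KnowledgeGraph to illustrate the save/reload round trip.
class KnowledgeGraph:
    def __init__(self, nodes=None, relationships=None):
        self.nodes = nodes or []
        self.relationships = relationships or []

    def save(self, path):
        # Persist the full graph as JSON for reuse across runs.
        with open(path, "w") as f:
            json.dump(
                {"nodes": self.nodes, "relationships": self.relationships}, f
            )

    @classmethod
    def load(cls, path):
        with open(path) as f:
            data = json.load(f)
        return cls(data["nodes"], data["relationships"])

kg = KnowledgeGraph(nodes=[{"id": "doc-1", "type": "document"}])
path = os.path.join(tempfile.mkdtemp(), "kg.json")
kg.save(path)
reloaded = KnowledgeGraph.load(path)
print(len(reloaded.nodes))  # 1
```

Because graph construction is the expensive step (it involves LLM-driven transformations over source documents), saving it once and reloading it for each subsequent generation run amortizes that cost.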

Theoretical Basis

Separation of Generation and Evaluation Concerns: The Testset contains generation-specific metadata such as synthesizer_name that is useful for understanding how the test set was created but irrelevant to evaluation. The to_evaluation_dataset() method projects away this metadata, yielding a clean EvaluationDataset with only eval_sample objects (either SingleTurnSample or MultiTurnSample). This separation ensures that evaluation metrics operate on a consistent, minimal data contract regardless of how the test data was generated.

Multi-Format Interoperability: Different parts of a typical ML workflow require different data formats:

  • Evaluation pipelines consume EvaluationDataset objects directly.
  • Data scientists inspect and filter data in pandas DataFrames.
  • APIs and storage systems consume JSON-serializable lists of dictionaries.
  • Hugging Face Hub integration requires datasets.Dataset objects.

By providing native conversion to all these formats, Ragas eliminates manual data wrangling and reduces the risk of format conversion errors.

Bidirectional Serialization: The Testset supports both export (to_list()) and import (from_list()) as well as annotated file loading (from_annotated()). This enables round-trip workflows where test sets are generated, exported, annotated by humans (approving or rejecting samples), and re-imported for evaluation.
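The round-trip workflow can be sketched as follows. The dict schema and the `approved` field are illustrative assumptions for the annotation step; the real `from_list()` and `from_annotated()` consume Ragas' own sample schema:

```python
# Mock round trip: generate -> export -> human review -> re-import.
class Testset:
    def __init__(self, samples):
        self.samples = samples

    def to_list(self):
        # Export as plain dicts (copies, so edits don't mutate the original).
        return [dict(s) for s in self.samples]

    @classmethod
    def from_list(cls, records):
        # Re-import previously exported (and possibly edited) records.
        return cls(records)

ts = Testset([
    {"user_input": "What is RAG?", "approved": None},
    {"user_input": "Define BLEU.", "approved": None},
])

records = ts.to_list()
records[0]["approved"] = True    # human reviewer approves this sample
records[1]["approved"] = False   # ...and rejects this one

reviewed = Testset.from_list([r for r in records if r["approved"]])
print(len(reviewed.samples))  # 1
```

The import path is what makes human-in-the-loop curation cheap: rejected samples are filtered out between export and re-import, and evaluation then runs only on the approved subset.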

Cost Tracking Through Export: The Testset optionally carries a CostCallbackHandler that tracks token usage during generation. The total_tokens() and total_cost() methods provide post-generation cost analysis, enabling users to understand and optimize the cost of their test generation workflows.
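The accounting itself is simple aggregation over recorded token usage. A minimal sketch, where `TokenUsage`, the `record()` method, and the per-token prices are illustrative assumptions (the real `CostCallbackHandler` records usage automatically during the generator's LLM calls):

```python
from dataclasses import dataclass

@dataclass
class TokenUsage:
    input_tokens: int
    output_tokens: int

class CostCallbackHandler:
    """Mock handler that accumulates per-call token usage."""
    def __init__(self):
        self.usages = []

    def record(self, usage):
        self.usages.append(usage)

class Testset:
    def __init__(self, cost_cb):
        self.cost_cb = cost_cb

    def total_tokens(self):
        return sum(u.input_tokens + u.output_tokens for u in self.cost_cb.usages)

    def total_cost(self, cost_per_input_token, cost_per_output_token):
        return sum(
            u.input_tokens * cost_per_input_token
            + u.output_tokens * cost_per_output_token
            for u in self.cost_cb.usages
        )

cb = CostCallbackHandler()
cb.record(TokenUsage(input_tokens=1200, output_tokens=300))
cb.record(TokenUsage(input_tokens=800, output_tokens=200))

ts = Testset(cost_cb=cb)
print(ts.total_tokens())              # 2500
print(ts.total_cost(5e-7, 1.5e-6))    # dollar cost at the assumed prices
```

Separating token counting (done once, during generation) from pricing (supplied at query time) lets the same recorded usage be re-priced as model rates change.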
