Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Vibrantlabsai Ragas Text2SQL Validate Dataset

From Leeroopedia
Knowledge Sources
Domains Text-to-SQL, Dataset Validation, Quality Assurance
Last Updated 2026-02-12 00:00 GMT

Overview

This script validates a text-to-SQL evaluation dataset by executing each SQL query against the database, classifying result types, and generating detailed validation reports with summary statistics.

Description

The validate_sql_dataset.py module provides a complete validation pipeline for text-to-SQL datasets. It loads a CSV dataset, executes each SQL query against the BookSQL SQLite database, and produces both detailed per-query results and aggregate summary statistics.

Dataset Loading -- The load_dataset function reads a CSV file (defaulting to datasets/booksql_sample.csv) and parses each row into a dictionary with index, query (natural language), sql (the SQL statement), level (difficulty), and split (dataset split).

Query Execution and Validation -- The execute_and_validate_query function is the core validation routine. For each query, it:

  • Calls execute_sql from db_utils with case-insensitive string matching enabled.
  • Measures execution time.
  • Classifies the result into one of four types: has_data (non-empty, non-null results), null_values (all values in the first row are null/None), empty (zero rows returned), or failed (execution error).
  • Captures result metadata including shape, column names, and up to 10 sample data rows (with a truncation flag for larger results).

Summary Statistics -- The generate_summary_statistics function computes:

  • Overall success rate and counts.
  • Result type distribution (has_data, null_values, empty, failed).
  • Per-difficulty-level breakdown with success rates and result type counts.
  • Common error types extracted from error messages.
  • Average execution time across all queries.
  • Sample indices of successful and failed queries.

Output -- The main function runs the full validation pipeline and writes two JSON files:

  • validation_results.json -- Detailed per-query results including execution status, timing, result data, and classification.
  • validation_summary.json -- Aggregate statistics and distributions.

This module is also imported by data_utils.py, which uses execute_and_validate_query to validate samples during dataset creation.

Usage

Import this module when you need to validate that SQL queries in your text-to-SQL evaluation dataset execute correctly against the target database. Use execute_and_validate_query for individual query validation in custom pipelines, or run the script directly for full dataset validation with JSON reports.

Code Reference

Source Location

Signature

def load_dataset(csv_path: str = "datasets/booksql_sample.csv") -> List[Dict[str, Any]]

def execute_and_validate_query(query_data: Dict[str, Any]) -> Dict[str, Any]

def generate_summary_statistics(results: List[Dict[str, Any]]) -> Dict[str, Any]

def main() -> None

Import

from ragas_examples.text2sql.validate_sql_dataset import (
    load_dataset,
    execute_and_validate_query,
    generate_summary_statistics,
)

I/O Contract

Inputs

load_dataset

Name Type Required Description
csv_path str No Path to the CSV file containing queries (default: "datasets/booksql_sample.csv")

execute_and_validate_query

Name Type Required Description
query_data Dict[str, Any] Yes Dictionary with keys: "index" (int), "query" (str, natural language), "sql" (str, SQL statement), "level" (str, difficulty), "split" (str, dataset split)

generate_summary_statistics

Name Type Required Description
results List[Dict[str, Any]] Yes List of validation result dictionaries from execute_and_validate_query

Outputs

load_dataset

Name Type Description
return List[Dict[str, Any]] List of dictionaries with keys: index, query, sql, level, split

execute_and_validate_query

Name Type Description
return Dict[str, Any] Dictionary containing: index, natural_language_query, sql_query, difficulty_level, dataset_split, execution_success (bool), execution_time (float), error_message (str or None), result_data (list of dicts or None), result_shape ([rows, cols] or None), result_columns (list or None), result_type ("has_data", "null_values", "empty", or "failed"), result_truncated (bool), total_rows (int)

generate_summary_statistics

Name Type Description
return Dict[str, Any] Dictionary containing: validation_timestamp, total_queries, successful_queries, failed_queries, overall_success_rate, average_execution_time_seconds, result_type_counts, statistics_by_difficulty, common_error_types, sample_successful_queries, sample_failed_queries

Usage Examples

Running the Full Validation Pipeline

# Run the validation script from the text2sql directory
python validate_sql_dataset.py
# Outputs: validation_results.json and validation_summary.json

Validating a Single Query Programmatically

from validate_sql_dataset import execute_and_validate_query

query_data = {
    "index": 0,
    "query": "How many transactions are there?",
    "sql": "SELECT COUNT(*) FROM master_txn_table",
    "level": "easy",
    "split": "train",
}

result = execute_and_validate_query(query_data)

if result["execution_success"]:
    print(f"Result type: {result['result_type']}")
    print(f"Shape: {result['result_shape']}")
    print(f"Execution time: {result['execution_time']:.3f}s")
    if result["result_data"]:
        print(f"First row: {result['result_data'][0]}")
else:
    print(f"Query failed: {result['error_message']}")

Generating Summary Statistics

from validate_sql_dataset import load_dataset, execute_and_validate_query, generate_summary_statistics

dataset = load_dataset("datasets/booksql_sample.csv")
results = [execute_and_validate_query(q) for q in dataset]
summary = generate_summary_statistics(results)

print(f"Success rate: {summary['overall_success_rate']:.1%}")
print(f"Has data: {summary['result_type_counts']['has_data']}")
print(f"Failed: {summary['result_type_counts']['failed']}")

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment