Implementation:Vibrantlabsai Ragas Text2SQL Validate Dataset
| Knowledge Sources | |
|---|---|
| Domains | Text-to-SQL, Dataset Validation, Quality Assurance |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
This script validates a text-to-SQL evaluation dataset by executing each SQL query against the database, classifying result types, and generating detailed validation reports with summary statistics.
Description
The validate_sql_dataset.py module provides a complete validation pipeline for text-to-SQL datasets. It loads a CSV dataset, executes each SQL query against the BookSQL SQLite database, and produces both detailed per-query results and aggregate summary statistics.
Dataset Loading -- The load_dataset function reads a CSV file (defaulting to datasets/booksql_sample.csv) and parses each row into a dictionary with index, query (natural language), sql (the SQL statement), level (difficulty), and split (dataset split).
Query Execution and Validation -- The execute_and_validate_query function is the core validation routine. For each query, it:
- Calls execute_sql from db_utils with case-insensitive string matching enabled.
- Measures execution time.
- Classifies the result into one of four types: has_data (non-empty, non-null results), null_values (all values in the first row are null/None), empty (zero rows returned), or failed (execution error).
- Captures result metadata including shape, column names, and up to 10 sample data rows (with a truncation flag for larger results).
Summary Statistics -- The generate_summary_statistics function computes:
- Overall success rate and counts.
- Result type distribution (has_data, null_values, empty, failed).
- Per-difficulty-level breakdown with success rates and result type counts.
- Common error types extracted from error messages.
- Average execution time across all queries.
- Sample indices of successful and failed queries.
Output -- The main function runs the full validation pipeline and writes two JSON files:
- validation_results.json -- Detailed per-query results including execution status, timing, result data, and classification.
- validation_summary.json -- Aggregate statistics and distributions.
This module is also imported by data_utils.py, which uses execute_and_validate_query to validate samples during dataset creation.
Usage
Import this module when you need to validate that SQL queries in your text-to-SQL evaluation dataset execute correctly against the target database. Use execute_and_validate_query for individual query validation in custom pipelines, or run the script directly for full dataset validation with JSON reports.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: examples/ragas_examples/text2sql/validate_sql_dataset.py
Signature
def load_dataset(csv_path: str = "datasets/booksql_sample.csv") -> List[Dict[str, Any]]
def execute_and_validate_query(query_data: Dict[str, Any]) -> Dict[str, Any]
def generate_summary_statistics(results: List[Dict[str, Any]]) -> Dict[str, Any]
def main() -> None
Import
from ragas_examples.text2sql.validate_sql_dataset import (
load_dataset,
execute_and_validate_query,
generate_summary_statistics,
)
I/O Contract
Inputs
load_dataset
| Name | Type | Required | Description |
|---|---|---|---|
| csv_path | str | No | Path to the CSV file containing queries (default: "datasets/booksql_sample.csv") |
execute_and_validate_query
| Name | Type | Required | Description |
|---|---|---|---|
| query_data | Dict[str, Any] | Yes | Dictionary with keys: "index" (int), "query" (str, natural language), "sql" (str, SQL statement), "level" (str, difficulty), "split" (str, dataset split) |
generate_summary_statistics
| Name | Type | Required | Description |
|---|---|---|---|
| results | List[Dict[str, Any]] | Yes | List of validation result dictionaries from execute_and_validate_query |
Outputs
load_dataset
| Name | Type | Description |
|---|---|---|
| return | List[Dict[str, Any]] | List of dictionaries with keys: index, query, sql, level, split |
execute_and_validate_query
| Name | Type | Description |
|---|---|---|
| return | Dict[str, Any] | Dictionary containing: index, natural_language_query, sql_query, difficulty_level, dataset_split, execution_success (bool), execution_time (float), error_message (str or None), result_data (list of dicts or None), result_shape ([rows, cols] or None), result_columns (list or None), result_type ("has_data", "null_values", "empty", or "failed"), result_truncated (bool), total_rows (int) |
generate_summary_statistics
| Name | Type | Description |
|---|---|---|
| return | Dict[str, Any] | Dictionary containing: validation_timestamp, total_queries, successful_queries, failed_queries, overall_success_rate, average_execution_time_seconds, result_type_counts, statistics_by_difficulty, common_error_types, sample_successful_queries, sample_failed_queries |
Usage Examples
Running the Full Validation Pipeline
# Run the validation script from the text2sql directory
python validate_sql_dataset.py
# Outputs: validation_results.json and validation_summary.json
Validating a Single Query Programmatically
from validate_sql_dataset import execute_and_validate_query
query_data = {
"index": 0,
"query": "How many transactions are there?",
"sql": "SELECT COUNT(*) FROM master_txn_table",
"level": "easy",
"split": "train",
}
result = execute_and_validate_query(query_data)
if result["execution_success"]:
print(f"Result type: {result['result_type']}")
print(f"Shape: {result['result_shape']}")
print(f"Execution time: {result['execution_time']:.3f}s")
if result["result_data"]:
print(f"First row: {result['result_data'][0]}")
else:
print(f"Query failed: {result['error_message']}")
Generating Summary Statistics
from validate_sql_dataset import load_dataset, execute_and_validate_query, generate_summary_statistics
dataset = load_dataset("datasets/booksql_sample.csv")
results = [execute_and_validate_query(q) for q in dataset]
summary = generate_summary_statistics(results)
print(f"Success rate: {summary['overall_success_rate']:.1%}")
print(f"Has data: {summary['result_type_counts']['has_data']}")
print(f"Failed: {summary['result_type_counts']['failed']}")