Principle:Sdv dev SDV Demo Data Loading
| Knowledge Sources | |
|---|---|
| Domains | Data_Science, Synthetic_Data |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A data loading mechanism that provides ready-to-use demo datasets for prototyping synthetic data generation pipelines.
Description
Demo data loading lets users quickly acquire pre-curated datasets, together with their accompanying metadata definitions, from a remote repository (an S3 bucket). This removes the need for manual data preparation during initial experimentation with synthetic data tools. The loading function (`download_demo` in `sdv.datasets.demo`) supports three data modalities: single-table (a flat DataFrame), multi-table (a dictionary of related DataFrames), and sequential (a time-series DataFrame with sequence keys).
The returned data and metadata can be passed directly to the SDV synthesizer classes for the matching modality, which makes demo loading a convenient entry point for an SDV workflow.
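A minimal sketch of the call, assuming SDV is installed and the machine has network access. The wrapper `load_demo` and the constant `VALID_MODALITIES` are hypothetical names introduced here for illustration; only `download_demo` and its `modality` and `dataset_name` arguments come from SDV itself.

```python
# Modality names accepted by SDV's demo loader.
VALID_MODALITIES = ("single_table", "multi_table", "sequential")


def load_demo(modality, dataset_name):
    """Hypothetical wrapper: validate the modality, then fetch the demo."""
    if modality not in VALID_MODALITIES:
        raise ValueError(
            f"Unknown modality {modality!r}; expected one of {VALID_MODALITIES}"
        )
    # Imported lazily so the validation above works even without SDV installed.
    from sdv.datasets.demo import download_demo

    # Returns a (data, metadata) tuple; this call downloads from the remote store.
    return download_demo(modality=modality, dataset_name=dataset_name)
```

For example, `data, metadata = load_demo("single_table", "fake_hotel_guests")` should return a single DataFrame and its metadata object, while a multi-table dataset returns a dictionary of DataFrames keyed by table name.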
Usage
Use this principle when you are beginning an SDV workflow and need sample data to experiment with. It is the recommended starting point for tutorials, prototyping, and testing before working with proprietary datasets. Choose the modality that matches your target synthesizer type.
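As a rough guide to matching modality and synthesizer, the lookup below pairs each modality with some of the SDV synthesizer classes that accept it. The table and the helper `modality_for` are hypothetical conveniences, not part of SDV, and the class lists are not exhaustive.

```python
# Illustrative (non-exhaustive) pairing of modalities with SDV synthesizer classes.
SYNTHESIZERS_BY_MODALITY = {
    "single_table": [
        "GaussianCopulaSynthesizer",
        "CTGANSynthesizer",
        "TVAESynthesizer",
    ],
    "multi_table": ["HMASynthesizer"],
    "sequential": ["PARSynthesizer"],
}


def modality_for(synthesizer_name):
    """Return the modality whose demo data suits the named synthesizer."""
    for modality, names in SYNTHESIZERS_BY_MODALITY.items():
        if synthesizer_name in names:
            return modality
    raise KeyError(f"No modality registered for {synthesizer_name!r}")
```

Looking up the modality first avoids downloading, say, a multi-table demo and then handing it to a single-table synthesizer.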
Theoretical Basis
Demo data loading follows the factory pattern for data provisioning:
- The user specifies a modality and a dataset name
- The system fetches compressed data and metadata from a remote store
- Data is deserialized into pandas DataFrames
- Metadata is parsed into a structured schema object
- Both are returned as a tuple for immediate pipeline use
This pattern decouples data acquisition from data processing, allowing synthesizer workflows to begin from a consistent, validated starting point.
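The steps above can be sketched without any network access by standing in for the remote store with an in-memory zip archive. Everything here is an assumption made for illustration: the archive layout, the file name `metadata.json`, the metadata keys, and the helpers `build_fake_archive` and `load_archive` are hypothetical, and plain lists of dictionaries take the place of pandas DataFrames so that the sketch needs only the standard library. It does not reproduce SDV's actual storage format.

```python
import csv
import io
import json
import zipfile


def build_fake_archive():
    """Create zipped bytes that stand in for a remote demo archive."""
    metadata = {
        "tables": {
            "guests": {
                "primary_key": "guest_id",
                "columns": {
                    "guest_id": {"sdtype": "id"},
                    "room_type": {"sdtype": "categorical"},
                },
            }
        }
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("guests.csv", "guest_id,room_type\n1,BASIC\n2,SUITE\n")
        archive.writestr("metadata.json", json.dumps(metadata))
    return buffer.getvalue()


def load_archive(raw_bytes):
    """Deserialize tables and parse metadata, returning (data, metadata)."""
    data = {}
    metadata = None
    with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
        for name in archive.namelist():
            text = archive.read(name).decode("utf-8")
            if name.endswith(".csv"):
                # One table per CSV file, keyed by the file name without extension.
                table_name = name[: -len(".csv")]
                data[table_name] = list(csv.DictReader(io.StringIO(text)))
            elif name == "metadata.json":
                metadata = json.loads(text)
    if metadata is None:
        raise ValueError("archive has no metadata.json")
    return data, metadata


# Acquisition (building or fetching bytes) is separate from processing (parsing).
data, metadata = load_archive(build_fake_archive())
```

Because `load_archive` only sees bytes, the same parsing step works whether those bytes come from a remote bucket, a local cache, or a test fixture, which is the decoupling the pattern is meant to provide.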