Principle:Sdv dev SDV Demo Data Loading

Knowledge Sources	SDV Documentation, SDV
Domains	Data_Science, Synthetic_Data
Last Updated	2026-02-14 00:00 GMT

Overview

A data loading mechanism that provides ready-to-use demo datasets for prototyping synthetic data generation pipelines.

Description

Demo data loading lets users quickly fetch pre-curated datasets from a remote repository (an S3 bucket) together with their accompanying metadata definitions. This removes the need for manual data preparation during initial experimentation with synthetic data tools. The loader supports three data modalities: single-table (flat DataFrames), multi-table (relational dictionaries of DataFrames), and sequential (time-series DataFrames with sequence keys).

The returned data and metadata can be passed directly to the SDV synthesizer classes for the matching modality, which makes demo loading a standard entry point for SDV workflows.
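
A minimal sketch of the call, assuming the loader is exposed as download_demo in sdv.datasets.demo (the name suggested by the linked implementation page) and that fake_hotel_guests is an available single-table dataset name. The wrapper load_demo and its modality check are illustrative and not part of SDV. Actually fetching data requires the sdv package and network access.

```python
# Modalities named in the description above.
VALID_MODALITIES = ("single_table", "multi_table", "sequential")


def load_demo(modality, dataset_name):
    """Hypothetical wrapper: validate the modality, then fetch the demo dataset."""
    if modality not in VALID_MODALITIES:
        raise ValueError(
            f"modality must be one of {VALID_MODALITIES}, got {modality!r}"
        )
    # Imported lazily so the check above works even without sdv installed.
    from sdv.datasets.demo import download_demo

    # Returns a (data, metadata) tuple.
    return download_demo(modality=modality, dataset_name=dataset_name)


# Example call (assumed dataset name; needs sdv and network access):
# data, metadata = load_demo("single_table", "fake_hotel_guests")
```

Validating the modality before the import keeps typos from turning into a slow or confusing remote-fetch error.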

Usage

Use this principle when you are starting an SDV workflow and need sample data to experiment with. It is the recommended starting point for tutorials, prototyping, and testing before working with proprietary datasets. Choose the modality parameter that matches your target synthesizer type.
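
One way to keep the modality and the synthesizer in step is a small lookup table. The sketch below is an assumption, not an SDV feature: the helper synthesizer_class is hypothetical, and the pairing picks one commonly used synthesizer per modality (other synthesizers exist for each).

```python
import importlib

# Assumed pairing of each modality with one SDV synthesizer class.
SYNTHESIZER_FOR_MODALITY = {
    "single_table": ("sdv.single_table", "GaussianCopulaSynthesizer"),
    "multi_table": ("sdv.multi_table", "HMASynthesizer"),
    "sequential": ("sdv.sequential", "PARSynthesizer"),
}


def synthesizer_class(modality):
    """Hypothetical helper: resolve the synthesizer class for a modality."""
    module_name, class_name = SYNTHESIZER_FOR_MODALITY[modality]
    module = importlib.import_module(module_name)  # requires the sdv package
    return getattr(module, class_name)
```

The resolved class would then be constructed with the loaded metadata and fitted on the loaded data.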

Theoretical Basis

Demo data loading follows the factory pattern for data provisioning:

1. The user specifies a modality and dataset name.
2. The system fetches compressed data and metadata from a remote store.
3. Data is deserialized into pandas DataFrames.
4. Metadata is parsed into a structured schema object.
5. Both are returned as a tuple for immediate pipeline use.

This pattern decouples data acquisition from data processing, allowing synthesizer workflows to begin from a consistent, validated starting point.
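
The fetch, deserialize, parse, and return steps can be sketched offline using only the Python standard library. This is an illustration under stated assumptions, not SDV's implementation: an in-memory zip archive stands in for the remote store, a list of row dictionaries stands in for a pandas DataFrame, and the file names and metadata layout are hypothetical.

```python
import csv
import io
import json
import zipfile


def build_fake_archive():
    """Stand-in for the remote store: a zip holding one CSV table and metadata."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("guests.csv", "guest_id,room_type\n1,BASIC\n2,SUITE\n")
        archive.writestr(
            "metadata.json",
            json.dumps({
                "columns": {
                    "guest_id": {"sdtype": "id"},
                    "room_type": {"sdtype": "categorical"},
                }
            }),
        )
    return buffer.getvalue()


def load_from_archive(raw_bytes):
    """Deserialize the table, parse the metadata, and return both as a tuple."""
    with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
        table_text = archive.read("guests.csv").decode("utf-8")
        rows = list(csv.DictReader(io.StringIO(table_text)))
        metadata = json.loads(archive.read("metadata.json"))
    return rows, metadata


data, metadata = load_from_archive(build_fake_archive())
```

Because load_from_archive only sees raw bytes, the acquisition step can be swapped (remote bucket, local cache, test fixture) without touching the code that consumes the returned tuple.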

Related Pages

Implemented By

Implementation:Sdv_dev_SDV_Download_Demo
