Implementation:Sdv dev SDV Download Demo
| Knowledge Sources | |
|---|---|
| Domains | Data_Science, Synthetic_Data |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Concrete tool, provided by the SDV library, for downloading demo datasets from the SDV public S3 bucket.
Description
The download_demo function fetches pre-curated datasets from the SDV public S3 bucket. It downloads both the data files and the metadata definition and returns them as a (data, metadata) tuple. The function supports three modalities: single_table (data is a single DataFrame), multi_table (data is a dictionary mapping table names to DataFrames), and sequential (data is a single DataFrame that includes sequence key columns).
Usage
Import this function when you need sample data for testing or prototyping an SDV synthesis pipeline. It is the standard entry point for all SDV demo workflows and tutorials.
Code Reference
Source Location
- Repository: SDV
- File: sdv/datasets/demo.py
- Lines: L428-478
Signature
def download_demo(
    modality,
    dataset_name,
    output_folder_name=None,
    s3_bucket_name='sdv-datasets-public',
    credentials=None
):
    """Download a demo dataset.
    Args:
        modality (str):
            The modality of the dataset: 'single_table', 'multi_table', 'sequential'.
        dataset_name (str):
            Name of the dataset to be downloaded from the S3 bucket.
        output_folder_name (str or None):
            The name of the local folder where the metadata and data should be stored.
            If None the data is not saved locally and is loaded as a Python object.
            Defaults to None.
        s3_bucket_name (str):
            The name of the bucket to download from.
        credentials (dict):
            Dictionary containing DataCebo license key and username.
    Returns:
        tuple (data, metadata):
            If data is single table or sequential, it is a DataFrame.
            If data is multi table, it is a dictionary mapping table name to DataFrame.
            metadata is of class Metadata.
    """
Import
from sdv.datasets.demo import download_demo
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| modality | str | Yes | One of 'single_table', 'multi_table', 'sequential' |
| dataset_name | str | Yes | Name of dataset in the S3 bucket |
| output_folder_name | str or None | No | Local folder to save data; None = in-memory only |
| s3_bucket_name | str | No | S3 bucket name (default: 'sdv-datasets-public') |
| credentials | dict or None | No | License credentials for enterprise buckets |
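The input contract above can be checked locally before any request is sent to the bucket. The following is a minimal sketch, not part of SDV: `build_demo_kwargs` and `VALID_MODALITIES` are hypothetical names introduced here for illustration. The helper validates the modality against the three values listed in the table and assembles keyword arguments that match the documented `download_demo` signature.

```python
# Hypothetical helper (not part of SDV): validates inputs locally and builds
# keyword arguments matching the documented download_demo signature.
VALID_MODALITIES = ("single_table", "multi_table", "sequential")


def build_demo_kwargs(modality, dataset_name, output_folder_name=None):
    """Return keyword arguments for download_demo, failing fast on bad input."""
    if modality not in VALID_MODALITIES:
        raise ValueError(
            f"modality must be one of {VALID_MODALITIES}, got {modality!r}"
        )
    if not isinstance(dataset_name, str) or not dataset_name:
        raise ValueError("dataset_name must be a non-empty string")

    kwargs = {"modality": modality, "dataset_name": dataset_name}
    # Only pass output_folder_name when a local copy is wanted; leaving it out
    # keeps the documented default (None, in-memory only).
    if output_folder_name is not None:
        kwargs["output_folder_name"] = output_folder_name
    return kwargs
```

With SDV installed and network access available, the result can be passed straight through, for example `download_demo(**build_demo_kwargs('single_table', 'fake_hotel_guests'))`. Validating first means a mistyped modality fails immediately with a clear message rather than depending on how the download itself reports the problem.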
Outputs
| Name | Type | Description |
|---|---|---|
| data | pd.DataFrame or dict[str, pd.DataFrame] | Single DataFrame for single_table/sequential; dict for multi_table |
| metadata | Metadata | Metadata object describing the dataset schema |
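Because `data` is either one DataFrame or a dictionary of DataFrames depending on the modality, downstream code often has to branch on its type. The sketch below shows one way to handle that uniformly. `as_table_dict` and `row_counts` are hypothetical helpers, not SDV functions, and the placeholder name `"table"` used for the single-DataFrame case is an assumption made for illustration. The helpers rely only on `isinstance` and `len`, so they work on DataFrames as well as on any other sized object.

```python
# Hypothetical helpers (not part of SDV) for handling both output shapes.


def as_table_dict(data, default_name="table"):
    """Return a dict of tables regardless of modality.

    multi_table output is already a dict and is shallow-copied;
    single_table and sequential output is wrapped under default_name.
    """
    if isinstance(data, dict):
        return dict(data)
    return {default_name: data}


def row_counts(data):
    """Map each table name to its number of rows."""
    return {name: len(table) for name, table in as_table_dict(data).items()}
```

After `data, metadata = download_demo(...)`, calling `row_counts(data)` gives a per-table row count for any of the three modalities without a separate code path for the multi-table case.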
Usage Examples
Single Table Demo
from sdv.datasets.demo import download_demo
# Download a single-table demo dataset
data, metadata = download_demo(
    modality='single_table',
    dataset_name='fake_hotel_guests'
)
print(data.head())
print(metadata)
Multi Table Demo
from sdv.datasets.demo import download_demo
# Download a multi-table demo dataset
data, metadata = download_demo(
    modality='multi_table',
    dataset_name='fake_hotels'
)
# data is a dict of DataFrames
for table_name, df in data.items():
    print(f"{table_name}: {len(df)} rows")
Sequential Demo
from sdv.datasets.demo import download_demo
# Download a sequential demo dataset
data, metadata = download_demo(
    modality='sequential',
    dataset_name='nasdaq100_2019'
)
print(data.head())