Principle:Sdv dev SDV Multi Table Data Sampling

Knowledge Sources	SDV Documentation SDV
Domains	Synthetic_Data, Relational_Data, Data_Generation
Last Updated	2026-02-14 00:00 GMT

Overview

A hierarchical sampling process that generates synthetic relational data while preserving referential integrity and inter-table statistical relationships.

Description

Multi-table data sampling generates synthetic data for all tables in a relational dataset in a single call. The hierarchical sampler samples parent tables first, then generates each child table conditioned on the rows of its parent. The scale parameter controls the output size relative to the original data. Referential integrity is maintained by construction: every foreign key in a child table is copied from the primary key of the parent row it was generated for.
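The referential integrity property can be checked independently of the sampler. The following is a minimal sketch, not part of SDV: orphaned_foreign_keys is a hypothetical helper that lists child foreign key values with no matching parent primary key.

```python
def orphaned_foreign_keys(parent_keys, child_foreign_keys):
    """Return the child foreign key values that match no parent primary key.

    Hypothetical helper for illustration. Both arguments can be any iterable
    of hashable key values, for example two key columns from sampled tables.
    """
    valid = set(parent_keys)
    return [fk for fk in child_foreign_keys if fk not in valid]


# Toy key columns: every child row points at an existing parent row.
parent_ids = [1, 2, 3]
child_parent_ids = [1, 1, 3]
assert orphaned_foreign_keys(parent_ids, child_parent_ids) == []
```

For sampled output, an empty result for every parent/child relationship is what "maintained by construction" implies. The table and column names to pass in depend on the dataset's metadata.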

Usage

Call sample on a fitted HMASynthesizer. The call returns a dictionary that maps each table name to its synthetic table. The scale parameter controls the output size: 1.0 produces row counts similar to the original data, values > 1.0 produce more rows, and values < 1.0 produce fewer rows.
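A short sketch of how the scale parameter might be explored. The only SDV call used is sample(scale=...), as described above. row_counts_by_scale is a hypothetical helper, and the commented setup lines assume that metadata and data (a dictionary of tables) have already been prepared.

```python
def row_counts_by_scale(synthesizer, scales=(0.5, 1.0, 2.0)):
    """Sample at several scales and report the row count of each table.

    Hypothetical helper: `synthesizer` is assumed to be an already fitted
    multi-table synthesizer whose sample(scale=...) returns a dictionary
    mapping table names to tables.
    """
    report = {}
    for scale in scales:
        synthetic = synthesizer.sample(scale=scale)
        report[scale] = {name: len(table) for name, table in synthetic.items()}
    return report


# Assumed setup, left as comments because it needs SDV and a real dataset:
#   from sdv.multi_table import HMASynthesizer
#   synthesizer = HMASynthesizer(metadata)
#   synthesizer.fit(data)
#   print(row_counts_by_scale(synthesizer))
```

Child table sizes are driven by per-parent row counts, so the reported counts should be read as approximate rather than exact multiples of the original sizes.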

Theoretical Basis

1. Root sampling: Sample rows for root tables (tables with no parents).
2. Hierarchical descent: For each parent row, determine the child row count from the extension columns.
3. Child generation: Sample child rows conditioned on the parent row's extension values.
4. Foreign key assignment: Assign the parent's primary key as the child's foreign key.
5. Cleanup: Remove extension columns from all tables.
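The five steps above can be sketched with a toy two-table example. This is an illustration under stated assumptions, not SDV's implementation: the users/orders tables, the "__orders__" extension column prefix, and the simple random distributions are all hypothetical stand-ins for the learned models.

```python
import random

EXT_PREFIX = "__orders__"  # hypothetical prefix marking extension columns


def sample_hierarchy(num_parents, seed=0):
    """Toy hierarchical sampler for one parent table and one child table."""
    rng = random.Random(seed)

    # 1. Root sampling: draw parent rows, including extension columns that
    #    describe each parent's children (row count and a child parameter).
    users = []
    for user_id in range(num_parents):
        users.append({
            "user_id": user_id,
            "age": rng.randint(18, 80),
            EXT_PREFIX + "num_rows": rng.randint(0, 3),
            EXT_PREFIX + "amount_mean": rng.uniform(10.0, 100.0),
        })

    orders = []
    for user in users:
        # 2. Hierarchical descent: read the child row count for this parent.
        child_count = user[EXT_PREFIX + "num_rows"]
        for _ in range(child_count):
            # 3. Child generation: condition on the parent's extension value.
            amount = rng.gauss(user[EXT_PREFIX + "amount_mean"], 5.0)
            orders.append({
                "order_id": len(orders),
                # 4. Foreign key assignment: copy the parent's primary key.
                "user_id": user["user_id"],
                "amount": round(amount, 2),
            })

    # 5. Cleanup: drop extension columns before returning the tables.
    users = [
        {key: value for key, value in user.items() if not key.startswith(EXT_PREFIX)}
        for user in users
    ]
    return {"users": users, "orders": orders}


tables = sample_hierarchy(num_parents=5, seed=42)
parent_keys = {user["user_id"] for user in tables["users"]}
# Every child row references an existing parent, by construction.
assert all(order["user_id"] in parent_keys for order in tables["orders"])
```

Because each foreign key is copied from the parent row being expanded, no separate repair pass is needed to restore referential integrity.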

Related Pages

Implemented By

Implementation:Sdv_dev_SDV_BaseMultiTableSynthesizer_Sample

Uses Heuristic

Heuristic:Sdv_dev_SDV_Sampling_Retry_Tuning
