Principle:Sdv dev SDV Multi Table Data Sampling
| Knowledge Sources | |
|---|---|
| Domains | Synthetic_Data, Relational_Data, Data_Generation |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A hierarchical sampling process that generates synthetic relational data while preserving referential integrity and inter-table statistical relationships.
Description
Multi-table data sampling generates synthetic data for all tables in a relational dataset in a single pass. The hierarchical sampler samples parent tables first, then generates each child table's rows conditioned on the parent rows they belong to. The scale parameter controls the output size relative to the original data. Referential integrity (foreign key relationships) is maintained by construction, because every child row receives the primary key of the parent row it was generated for.
Usage
Call sample on a fitted HMASynthesizer. The scale parameter controls the output size: 1.0 produces row counts similar to the original data, values above 1.0 produce more rows, and values below 1.0 produce fewer rows.
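A minimal sketch of how the effect of scale might be inspected. It assumes a synthesizer that has already been fitted and whose sample method returns a mapping from table name to table, as HMASynthesizer does. The helper name row_counts_by_scale and the chosen scale values are illustrative, not part of SDV.

```python
# Typical setup (not executed here; requires the sdv package plus your own
# `metadata` and `data` objects):
#   from sdv.multi_table import HMASynthesizer
#   synthesizer = HMASynthesizer(metadata)
#   synthesizer.fit(data)


def row_counts_by_scale(synthesizer, scales=(0.5, 1.0, 2.0)):
    """Sample once per scale and record the row count of every table.

    Hypothetical helper: `synthesizer` is any fitted object exposing
    `sample(scale=...)` that returns {table_name: table}.
    """
    counts = {}
    for scale in scales:
        synthetic = synthesizer.sample(scale=scale)
        # len() works on pandas DataFrames as well as plain sequences.
        counts[scale] = {name: len(table) for name, table in synthetic.items()}
    return counts
```

Because child row counts are sampled rather than fixed, the counts reported for child tables should be read as approximate multiples of the original sizes.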
Theoretical Basis
- Root sampling: Sample rows for root tables (tables with no parents)
- Hierarchical descent: For each parent row, determine the child row count from the parent's extension columns (columns added to the parent table during fitting that summarize its child rows)
- Child generation: Sample child rows conditioned on parent row extension values
- Foreign key assignment: Assign the parent's primary key as the child's foreign key
- Cleanup: Remove extension columns from all tables
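The steps above can be sketched with a toy two-table example in plain Python. This is an illustration under stated assumptions, not SDV's implementation: the table names (users, orders), the "__" prefix for extension columns, and the fixed random distributions are all hypothetical stand-ins for what a fitted model would supply.

```python
import random

EXT_PREFIX = "__"  # hypothetical marker for extension columns


def sample_root(num_rows, rng):
    """Root sampling: draw root rows, including their extension columns."""
    rows = []
    for user_id in range(num_rows):
        rows.append({
            "user_id": user_id,                         # primary key
            "age": rng.randint(18, 80),                 # ordinary column
            "__num_orders": rng.randint(0, 3),          # extension: child row count
            "__amount_mean": rng.uniform(10.0, 100.0),  # extension: child parameter
        })
    return rows


def sample_children(parent, next_id, rng):
    """Descent, child generation and foreign key assignment for one parent row."""
    children = []
    for offset in range(parent["__num_orders"]):
        children.append({
            "order_id": next_id + offset,
            "user_id": parent["user_id"],  # foreign key = parent's primary key
            # Child value conditioned on the parent's extension value.
            "amount": round(rng.gauss(parent["__amount_mean"], 5.0), 2),
        })
    return children


def drop_extensions(rows):
    """Cleanup: remove extension columns from every row."""
    return [
        {key: value for key, value in row.items() if not key.startswith(EXT_PREFIX)}
        for row in rows
    ]


def sample_dataset(num_root_rows, seed=0):
    """Run all five steps and return the cleaned tables."""
    rng = random.Random(seed)
    users = sample_root(num_root_rows, rng)
    orders = []
    for user in users:
        orders.extend(sample_children(user, len(orders), rng))
    return {"users": drop_extensions(users), "orders": drop_extensions(orders)}
```

Since each order copies user_id from the row that produced it, no order can reference a missing user, which is the sense in which referential integrity holds by construction.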