Principle:Sdv dev SDV HMA Synthesis
| Knowledge Sources | |
|---|---|
| Domains | Synthetic_Data, Relational_Data, Hierarchical_Modeling |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A hierarchical modeling algorithm that synthesizes multi-table relational data by augmenting parent tables with statistical summaries of their child tables.
Description
HMA (Hierarchical Modeling Algorithm) addresses the core challenge of multi-table synthesis: preserving referential integrity and the statistical relationships between tables. It augments each parent table with extension columns that summarize, for every parent row, the distribution of its related child rows (for example child counts, means, and standard deviations). A single-table synthesizer (GaussianCopulaSynthesizer) is then fitted to each augmented table. During sampling, parent rows are generated first, and their extension columns parameterize the generation of child rows. Because every child row is created for a parent row that has already been sampled, each foreign key points to an existing parent, which preserves referential integrity.
Usage
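Use HMA synthesis for multi-table relational datasets where preserving inter-table relationships is important. In SDV it is exposed as HMASynthesizer, the primary multi-table synthesizer. Because extension columns accumulate at every level of the hierarchy, modeling cost grows quickly with schema depth and the number of tables.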
Theoretical Basis
HMA operates in two phases:
Fitting Phase:
- For each parent-child relationship, working from the leaf tables upward so that grandchild information reaches the root, compute extension columns on the parent:
  - Count of child rows per parent
  - Mean and standard deviation of each numerical child column
  - Frequency distributions of categorical child columns
- Fit a GaussianCopulaSynthesizer on each augmented parent table
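The fitting-phase steps above can be illustrated with a minimal pure-Python sketch. It is not SDV's implementation: the function `augment_parent`, the `__<child>__...` column naming, and the `users`/`orders` tables are hypothetical, and the population standard deviation is an arbitrary choice. The last step, fitting a GaussianCopulaSynthesizer on the augmented table, is left out.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def augment_parent(parents, children, parent_key, foreign_key,
                   numeric_cols, categorical_cols, child_name):
    """Return copies of the parent rows with extension columns added.

    Hypothetical helper for illustration only; rows are plain dicts.
    """
    # Group child rows by the parent they reference.
    groups = defaultdict(list)
    for child in children:
        groups[child[foreign_key]].append(child)

    # Use one fixed category list per column so every parent row
    # receives the same set of extension columns.
    categories = {
        col: sorted({child[col] for child in children})
        for col in categorical_cols
    }

    prefix = f"__{child_name}__"
    augmented = []
    for parent in parents:
        rows = groups.get(parent[parent_key], [])
        ext = dict(parent)
        # Count of child rows for this parent.
        ext[prefix + "num_rows"] = len(rows)
        # Mean and standard deviation of each numerical child column.
        for col in numeric_cols:
            values = [row[col] for row in rows]
            ext[f"{prefix}{col}__mean"] = mean(values) if values else None
            ext[f"{prefix}{col}__std"] = pstdev(values) if values else None
        # Relative frequency of each category in categorical child columns.
        for col in categorical_cols:
            counts = Counter(row[col] for row in rows)
            for category in categories[col]:
                freq = counts[category] / len(rows) if rows else 0.0
                ext[f"{prefix}{col}__freq__{category}"] = freq
        augmented.append(ext)
    return augmented


users = [{"user_id": 1, "country": "US"}, {"user_id": 2, "country": "FR"}]
orders = [
    {"order_id": 10, "user_id": 1, "amount": 20.0, "status": "paid"},
    {"order_id": 11, "user_id": 1, "amount": 40.0, "status": "refunded"},
    {"order_id": 12, "user_id": 2, "amount": 5.0, "status": "paid"},
]
augmented = augment_parent(users, orders, "user_id", "user_id",
                           ["amount"], ["status"], "orders")
```

Each row of `augmented` keeps the original parent columns and gains one extension column per summary statistic, so a single-table model fitted on it also learns how child behavior varies with parent attributes.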
Sampling Phase:
- Sample root table rows (including extension columns)
- For each parent row, use extension column values to parameterize child generation
- Recursively sample children, then grandchildren
- Drop extension columns from final output
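The sampling phase can be sketched in the same spirit. This is a simplified, single-level illustration rather than SDV's code: the helpers `sample_children` and `drop_extension_columns`, the `__orders__...` column names, and the example rows are hypothetical, and each numerical column is drawn from an independent normal distribution instead of a Gaussian copula. A deeper schema would repeat the same step on the sampled children to produce grandchildren.

```python
import random


def sample_children(parent_rows, parent_key, foreign_key,
                    child_name, numeric_cols, rng):
    """Generate child rows from the extension columns of sampled parents."""
    prefix = f"__{child_name}__"
    children = []
    for parent in parent_rows:
        # Sampled counts are continuous, so round and clip at zero.
        n_children = max(0, round(parent[prefix + "num_rows"]))
        for _ in range(n_children):
            # The foreign key always refers to an existing parent row.
            child = {foreign_key: parent[parent_key]}
            for col in numeric_cols:
                mu = parent[f"{prefix}{col}__mean"]
                sigma = max(0.0, parent[f"{prefix}{col}__std"])
                child[col] = rng.gauss(mu, sigma)
            children.append(child)
    return children


def drop_extension_columns(rows):
    """Remove the helper columns (prefixed with '__') from the output."""
    return [
        {key: value for key, value in row.items() if not key.startswith("__")}
        for row in rows
    ]


# Parent rows as they might come out of the fitted parent model,
# extension columns included (values invented for the example).
sampled_users = [
    {"user_id": 1, "country": "US", "__orders__num_rows": 2.2,
     "__orders__amount__mean": 30.0, "__orders__amount__std": 10.0},
    {"user_id": 2, "country": "FR", "__orders__num_rows": 0.8,
     "__orders__amount__mean": 5.0, "__orders__amount__std": 0.0},
    {"user_id": 3, "country": "DE", "__orders__num_rows": -0.3,
     "__orders__amount__mean": 12.0, "__orders__amount__std": 1.0},
]

rng = random.Random(0)  # seeded for reproducibility
orders = sample_children(sampled_users, "user_id", "user_id",
                         "orders", ["amount"], rng)
users = drop_extension_columns(sampled_users)
```

Generating parents before children is what guarantees referential integrity here: a child row can only be created inside the loop over parents, so it can never reference a key that does not exist.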