Principle:Sdv dev SDV HMA Synthesis

Knowledge Sources	SDV Documentation SDV HMA Algorithm
Domains	Synthetic_Data, Relational_Data, Hierarchical_Modeling
Last Updated	2026-02-14 00:00 GMT

Overview

A hierarchical modeling algorithm that synthesizes multi-table relational data by augmenting parent tables with statistical summaries of their child tables.

Description

HMA (Hierarchical Modeling Algorithm) addresses the core challenge of multi-table synthesis: preserving referential integrity and inter-table statistical relationships. It augments each parent table with extension columns that summarize the distributions of its child table columns (means, standard deviations, and counts). A single-table synthesizer (GaussianCopulaSynthesizer) is then fitted to each augmented table. During sampling, parent rows are generated first, and their extension columns parameterize the generation of child rows. Because every child row is generated from an existing parent row and takes that parent's key, referential integrity holds by construction.

Usage

Use HMA synthesis for any multi-table relational dataset where preserving inter-table relationships is important. It is the default and primary multi-table synthesizer in SDV.

Theoretical Basis

HMA operates in two phases:

Fitting Phase:

1. For each parent-child relationship, compute extension columns on the parent:
   - Count of child rows per parent
   - Mean and standard deviation of each numerical child column
   - Frequency distributions of categorical child columns
2. Fit a GaussianCopulaSynthesizer on each augmented parent table
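
The augmentation step of the fitting phase can be illustrated with a small, standard-library-only sketch. It is not SDV's implementation: the function `augment_parent`, the toy `users` and `sessions` tables, and the `__`-prefixed extension column names are all hypothetical, chosen only to show how per-parent counts, means, standard deviations, and category frequencies could be attached to parent rows. The sketch stops before the step where a GaussianCopulaSynthesizer would be fitted on the augmented table.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def augment_parent(parents, children, pk, fk, numeric_cols, categorical_cols):
    """Return copies of the parent rows with extension columns that
    summarize each parent's child rows (illustrative naming only)."""
    # Group child rows by the parent key they reference.
    groups = defaultdict(list)
    for child in children:
        groups[child[fk]].append(child)

    # Fix the category order so every parent gets the same columns.
    categories = {col: sorted({c[col] for c in children}) for col in categorical_cols}

    augmented = []
    for parent in parents:
        kids = groups.get(parent[pk], [])
        row = dict(parent)
        row["__num_rows"] = len(kids)
        for col in numeric_cols:
            values = [k[col] for k in kids]
            # Parents without children have no statistics to summarize.
            row[f"__{col}__mean"] = mean(values) if values else None
            row[f"__{col}__std"] = pstdev(values) if values else None
        for col in categorical_cols:
            counts = Counter(k[col] for k in kids)
            for cat in categories[col]:
                row[f"__{col}__freq__{cat}"] = counts[cat] / len(kids) if kids else None
        augmented.append(row)
    return augmented


# Hypothetical toy data: one parent table and one child table.
users = [{"id": 1, "country": "US"}, {"id": 2, "country": "FR"}]
sessions = [
    {"user_id": 1, "minutes": 10.0, "device": "mobile"},
    {"user_id": 1, "minutes": 20.0, "device": "desktop"},
    {"user_id": 1, "minutes": 30.0, "device": "mobile"},
    {"user_id": 2, "minutes": 5.0, "device": "desktop"},
]

augmented = augment_parent(users, sessions, "id", "user_id", ["minutes"], ["device"])
```

Each row of `augmented` keeps the original parent columns and gains one extension column per summary statistic, so a single-table model fitted on it can learn how child-level behavior varies with parent attributes.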

Sampling Phase:

1. Sample root table rows (including extension columns)
2. For each parent row, use extension column values to parameterize child generation
3. Recursively sample children, then grandchildren
4. Drop extension columns from final output
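
The sampling steps can be sketched in the same spirit. This is a simplified illustration under stated assumptions, not SDV's code: `sample_children`, `drop_extension_columns`, and the `__`-prefixed column names are hypothetical, the parent rows are hard-coded stand-ins for rows that a fitted synthesizer would produce, and child values are drawn from an independent normal distribution rather than a Gaussian copula. Only one parent-child level is shown; deeper levels would repeat the same step.

```python
import random


def sample_children(parent_rows, pk, fk, numeric_col, rng):
    """Generate child rows for each parent from its extension columns."""
    children = []
    for parent in parent_rows:
        # The sampled child count may be fractional, so round and clamp it.
        n_children = max(0, round(parent["__num_rows"]))
        for _ in range(n_children):
            value = rng.gauss(
                parent[f"__{numeric_col}__mean"],
                parent[f"__{numeric_col}__std"],
            )
            # Copying the parent's key makes the foreign key valid by construction.
            children.append({fk: parent[pk], numeric_col: value})
    return children


def drop_extension_columns(rows):
    """Remove the helper columns so the output matches the real schema."""
    return [{k: v for k, v in row.items() if not k.startswith("__")} for row in rows]


rng = random.Random(0)  # seeded for repeatability

# Stand-ins for parent rows sampled from a fitted model (hypothetical values).
sampled_parents = [
    {"id": 101, "country": "US", "__num_rows": 3, "__minutes__mean": 20.0, "__minutes__std": 8.0},
    {"id": 102, "country": "FR", "__num_rows": 1, "__minutes__mean": 5.0, "__minutes__std": 0.0},
    {"id": 103, "country": "DE", "__num_rows": 0, "__minutes__mean": 0.0, "__minutes__std": 0.0},
]

synthetic_sessions = sample_children(sampled_parents, "id", "user_id", "minutes", rng)
synthetic_users = drop_extension_columns(sampled_parents)
```

Because child rows are only ever created inside the loop over parents, no child can reference a missing parent, which is the referential-integrity property that parent-first sampling provides.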

Related Pages

Implemented By

Implementation:Sdv_dev_SDV_HMASynthesizer_Init

Uses Heuristic