Heuristic:Sdv dev SDV Gaussian KDE Incompatibility
| Knowledge Sources | |
|---|---|
| Domains | Debugging, Synthetic_Data, Multi_Table |
| Last Updated | 2026-02-14 19:00 GMT |
Overview
The `gaussian_kde` distribution is non-parametric and incompatible with HMASynthesizer; use `beta`, `truncnorm`, or other parametric distributions instead.
Description
The GaussianCopulaSynthesizer supports `gaussian_kde` (Gaussian Kernel Density Estimation) as a univariate distribution option. However, because `gaussian_kde` is non-parametric, it cannot produce the statistical parameters that HMASynthesizer needs to propagate child table distributions into parent table extended columns. Additionally, using `gaussian_kde` makes the `get_parameters()` method unusable even in single-table contexts.
Usage
Apply this heuristic when configuring distribution choices for GaussianCopulaSynthesizer. Avoid `gaussian_kde` if you plan to use `get_parameters()` or if the synthesizer will be used as a child synthesizer within HMASynthesizer. On an HMA model, `set_table_parameters()` raises a `SynthesizerInputError` immediately when it is called with `gaussian_kde`, whether as the `default_distribution` or as any entry in `numerical_distributions`.
The Insight (Rule of Thumb)
- Action: Do not use `gaussian_kde` as a distribution in HMASynthesizer. Use `beta` (default), `truncnorm`, `norm`, `gamma`, or `uniform` instead.
- Value: Default distribution is `beta` for standalone GaussianCopula; HMA child tables also default to `beta`.
- Trade-off: Parametric distributions may not fit all data shapes as flexibly as KDE, but they are required for multi-table hierarchical modeling and parameter extraction.
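One way to apply this rule is a small pre-flight pass over the `table_parameters` dictionary before it reaches an HMA model. The sketch below is a hypothetical helper, not part of SDV: `uses_gaussian_kde`, `replace_gaussian_kde`, and the `fallback` argument are names assumed for illustration, and the column names are made up. It only looks at the two keys that the HMA check inspects, `default_distribution` and `numerical_distributions`.

```python
from copy import deepcopy

KDE = 'gaussian_kde'


def uses_gaussian_kde(table_parameters):
    """Return True if the settings request gaussian_kde anywhere."""
    per_column = table_parameters.get('numerical_distributions', {})
    return (
        table_parameters.get('default_distribution') == KDE
        or KDE in per_column.values()
    )


def replace_gaussian_kde(table_parameters, fallback='beta'):
    """Return a copy of the settings with gaussian_kde swapped for `fallback`.

    Hypothetical helper (not an SDV function); the input dict is not modified.
    """
    cleaned = deepcopy(table_parameters)
    if cleaned.get('default_distribution') == KDE:
        cleaned['default_distribution'] = fallback
    per_column = cleaned.get('numerical_distributions', {})
    # Iterate over a snapshot so the dict can be updated safely in the loop.
    for column, distribution in list(per_column.items()):
        if distribution == KDE:
            per_column[column] = fallback
    return cleaned


# Example settings; 'amount' and 'age' are illustrative column names.
params = {
    'default_distribution': 'gaussian_kde',
    'numerical_distributions': {'amount': 'gaussian_kde', 'age': 'truncnorm'},
}
safe_params = replace_gaussian_kde(params, fallback='beta')
print(uses_gaussian_kde(params), uses_gaussian_kde(safe_params))
```

Silently swapping distributions changes the fitted model, so in practice it may be preferable to log or review each replacement rather than apply it blindly.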
Reasoning
HMASynthesizer works by fitting GaussianCopula models to child tables, then extracting their learned distribution parameters and appending them as extended columns to the parent table. This requires each distribution to produce a fixed-size parameter vector. Because `gaussian_kde` is non-parametric, its fitted state is the kernel density built from the training samples rather than a short, fixed list of numbers. It therefore cannot produce a fixed-size parameter vector, which breaks the HMA pipeline.
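The difference in parameter size can be illustrated without SDV. The toy sketch below uses only the Python standard library to contrast a parametric summary (mean and standard deviation of a normal fit) with a hand-rolled Gaussian kernel density estimate that has to keep every training sample. `fit_normal` and `ToyGaussianKDE` are illustrative names, and the bandwidth formula is a common normal-reference rule of thumb chosen as an assumption; none of this reflects how SDV or its underlying distribution code is implemented.

```python
import math
import statistics


def fit_normal(samples):
    """Parametric fit: always two numbers, however many samples there are."""
    return {'loc': statistics.fmean(samples), 'scale': statistics.stdev(samples)}


class ToyGaussianKDE:
    """Minimal Gaussian KDE; its 'parameters' are the samples themselves."""

    def __init__(self, samples):
        self.samples = list(samples)
        n = len(self.samples)
        # Normal-reference rule of thumb for the bandwidth (an assumption here).
        self.bandwidth = 1.06 * statistics.stdev(self.samples) * n ** (-1 / 5)

    def pdf(self, x):
        # Average of Gaussian bumps centred on each stored sample.
        norm = self.bandwidth * math.sqrt(2 * math.pi)
        total = sum(
            math.exp(-0.5 * ((x - s) / self.bandwidth) ** 2) for s in self.samples
        )
        return total / (len(self.samples) * norm)

    def state_size(self):
        # Every sample plus the bandwidth is needed to evaluate the density.
        return len(self.samples) + 1


small = [1.0, 2.0, 2.5, 4.0]
large = small * 50  # 200 samples

for data in (small, large):
    # The parametric fit stays at two values; the KDE state grows with the data.
    print(len(fit_normal(data)), ToyGaussianKDE(data).state_size())
```

A parent table needs the same set of extended columns for every child group, which a two-number summary can supply and a sample-sized state cannot.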
Code Evidence
Incompatibility check from `sdv/multi_table/hma.py:218-227`:
has_gaussian_kde = any(
    dist == 'gaussian_kde'
    for dist in table_parameters.get('numerical_distributions', {}).values()
)
if table_parameters.get('default_distribution') == 'gaussian_kde' or has_gaussian_kde:
    raise SynthesizerInputError(
        "The 'gaussian_kde' is not compatible with the HMA algorithm. Please choose a "
        "different distribution such as 'beta' or 'truncnorm'. Or try a different "
        'algorithm such as HSA.'
    )
Documentation warning from `sdv/single_table/copulas.py:54-56`:
* ``gaussian_kde``: Use a GaussianKDE distribution. This model is non-parametric,
  so using this will make ``get_parameters`` unusable.