Heuristic:Sdv dev SDV Gaussian KDE Incompatibility

From Leeroopedia
Knowledge Sources
Domains: Debugging, Synthetic_Data, Multi_Table
Last Updated: 2026-02-14 19:00 GMT

Overview

The `gaussian_kde` distribution is non-parametric and incompatible with HMASynthesizer; use `beta`, `truncnorm`, or other parametric distributions instead.

Description

The GaussianCopulaSynthesizer supports `gaussian_kde` (Gaussian Kernel Density Estimation) as a univariate distribution option. However, because `gaussian_kde` is non-parametric, it cannot produce the statistical parameters that HMASynthesizer needs to propagate child table distributions into parent table extended columns. Additionally, using `gaussian_kde` makes the `get_parameters()` method unusable even in single-table contexts.

Usage

Apply this heuristic when choosing distributions for GaussianCopulaSynthesizer. Avoid `gaussian_kde` if you plan to call `get_parameters()` or if the table will be modeled inside HMASynthesizer. On an HMA model, a `SynthesizerInputError` is raised as soon as `set_table_parameters()` is called with `gaussian_kde`, whether it is given as the `default_distribution` or for any column in `numerical_distributions`.

The Insight (Rule of Thumb)

  • Action: Do not use `gaussian_kde` as a distribution in HMASynthesizer. Use `beta` (default), `truncnorm`, `norm`, `gamma`, or `uniform` instead.
  • Value: The default distribution is `beta` for a standalone GaussianCopulaSynthesizer; HMA child tables also default to `beta`.
  • Trade-off: Parametric distributions may not fit all data shapes as flexibly as KDE, but they are required for multi-table hierarchical modeling and parameter extraction.
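One way to apply this rule before calling `set_table_parameters()` is to rewrite the parameter dictionary so that any `gaussian_kde` entry falls back to a parametric choice. The sketch below is illustrative only: `replace_gaussian_kde` and its `fallback` argument are hypothetical names, not part of SDV. The dictionary keys `default_distribution` and `numerical_distributions` are the ones read by the check quoted under Code Evidence.

```python
from copy import deepcopy


# Hypothetical helper for illustration; SDV does not ship this function.
def replace_gaussian_kde(table_parameters, fallback='truncnorm'):
    """Return a copy of table_parameters with 'gaussian_kde' swapped for a parametric fallback."""
    if fallback == 'gaussian_kde':
        raise ValueError('fallback must be a parametric distribution')

    # Work on a copy so the caller's dictionary is left unchanged.
    cleaned = deepcopy(table_parameters)

    if cleaned.get('default_distribution') == 'gaussian_kde':
        cleaned['default_distribution'] = fallback

    # Per-column overrides; an absent key means there is nothing to rewrite.
    columns = cleaned.get('numerical_distributions', {})
    for column, dist in columns.items():
        if dist == 'gaussian_kde':
            columns[column] = fallback

    return cleaned


params = {
    'default_distribution': 'gaussian_kde',
    'numerical_distributions': {'amount': 'gaussian_kde', 'age': 'beta'},
}
safe_params = replace_gaussian_kde(params)
```

The cleaned dictionary can then be passed as `table_parameters` on the HMA model, while the original stays available for a standalone GaussianCopulaSynthesizer where `gaussian_kde` is still allowed.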

Reasoning

HMASynthesizer works by fitting GaussianCopula models to child tables, then extracting their learned distribution parameters and appending them as extended columns to the parent table. This requires each distribution to produce a fixed-size parameter vector. Because `gaussian_kde` is non-parametric, its fitted state is the kernel density estimate itself rather than a small set of named parameters, so it cannot be reduced to a fixed-size vector and the HMA pipeline breaks.
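The fixed-size requirement can be illustrated with a toy comparison that uses only the Python standard library. This is not SDV code: `parametric_state` and `kde_state` are hypothetical stand-ins. A normal fit summarized by its mean and standard deviation stands in for a parametric distribution, and the KDE stand-in assumes, as kernel density estimators generally do, that the fitted model keeps the sample points plus a bandwidth.

```python
import statistics


def parametric_state(sample):
    """Hypothetical stand-in for a parametric fit: always two numbers."""
    return [statistics.fmean(sample), statistics.stdev(sample)]


def kde_state(sample):
    """Hypothetical stand-in for a KDE fit: every sample point plus one bandwidth."""
    # Normal-reference rule of thumb for the bandwidth (an illustrative choice).
    bandwidth = 1.06 * statistics.stdev(sample) * len(sample) ** (-1 / 5)
    return list(sample) + [bandwidth]


small = [1.0, 2.0, 4.0, 7.0]
large = [float(i % 13) for i in range(500)]

# Length of the fitted state for a small and a large child table.
sizes = {
    'parametric': (len(parametric_state(small)), len(parametric_state(large))),
    'kde': (len(kde_state(small)), len(kde_state(large))),
}
```

The parametric state has the same length for any sample, so it can be flattened into a fixed set of parent-table columns. The KDE state grows with the number of child rows, so no fixed column layout exists for it.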

Code Evidence

Incompatibility check from `sdv/multi_table/hma.py:218-227`:

has_gaussian_kde = any(
    dist == 'gaussian_kde'
    for dist in table_parameters.get('numerical_distributions', {}).values()
)
if table_parameters.get('default_distribution') == 'gaussian_kde' or has_gaussian_kde:
    raise SynthesizerInputError(
        "The 'gaussian_kde' is not compatible with the HMA algorithm. Please choose a "
        "different distribution such as 'beta' or 'truncnorm'. Or try a different "
        'algorithm such as HSA.'
    )

Documentation warning from `sdv/single_table/copulas.py:54-56`:

* ``gaussian_kde``: Use a GaussianKDE distribution. This model is non-parametric,
  so using this will make ``get_parameters`` unusable.
