Principle:Sdv dev SDV Gaussian Copula Synthesis

Knowledge Sources	Gaussian Copula SDV Documentation SDV
Domains	Statistics, Synthetic_Data, Probabilistic_Modeling
Last Updated	2026-02-14 00:00 GMT

Overview

A statistical modeling technique that captures multivariate dependencies between columns using a Gaussian copula to generate synthetic tabular data.

Description

Gaussian copula synthesis separates the modeling of individual column distributions (marginals) from the modeling of inter-column dependencies (the copula). Each column is first transformed to follow a standard normal distribution using its fitted univariate distribution (e.g., beta, truncated normal, gamma). The correlations between these transformed columns are then captured by a multivariate Gaussian distribution. During sampling, correlated normal samples are drawn and then inverse-transformed through each column's marginal distribution to produce realistic synthetic data.
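The separation described above can also be written as a single formula. As a sketch, assume continuous marginal CDFs $F_{i}$, let $\Phi$ be the standard normal CDF, and let $\Phi_{\Sigma}$ denote the joint CDF of a zero-mean multivariate normal with correlation matrix $\Sigma$ (the symbols $\Phi_{\Sigma}$ and $d$, the number of columns, are notation introduced here for illustration). The joint distribution implied by a Gaussian copula is then:

$$F(x_{1}, \dots, x_{d}) = \Phi_{\Sigma}\left(\Phi^{-1}(F_{1}(x_{1})), \dots, \Phi^{-1}(F_{d}(x_{d}))\right)$$

The marginals $F_{i}$ appear only inside the arguments, and the dependency structure appears only through $\Sigma$, which is why the two can be fitted separately.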

This approach is computationally efficient, interpretable, and works well for datasets with moderate complexity. It is the default and recommended synthesizer in SDV for most single-table use cases.

Usage

Use Gaussian copula synthesis when generating single-table synthetic data where statistical fidelity of column distributions and correlations is important. It is preferred over deep learning approaches such as CTGAN when the dataset is small to medium-sized, when training speed matters, or when interpretable learned distributions are needed (e.g., via get_parameters or get_learned_distributions).

Theoretical Basis

The Gaussian copula model works in three stages:

1. Marginal Fitting: Each column $X_{i}$ is fitted with a univariate distribution $F_{i}$ (e.g., Beta, Gaussian KDE, Truncated Normal).

2. Copula Estimation: Transform each column to uniform via the probability integral transform: $U_{i} = F_{i}(X_{i})$

Then transform to standard normal: $Z_{i} = \Phi^{-1}(U_{i})$

Estimate the correlation matrix $\Sigma$ of the $Z_{i}$ vectors.

3. Sampling: Draw samples $Z^{\text{synth}}$ from $\mathcal{N}(0, \Sigma)$, map each component to a uniform value with $\Phi$, then apply the inverse marginal CDF $F_{i}^{-1}$: $X_{i}^{\text{synth}} = F_{i}^{-1}(\Phi(Z_{i}^{\text{synth}}))$
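The three stages can be sketched end to end with only the Python standard library. This is an illustrative simplification, not SDV's implementation: it handles two columns only, and it swaps the parametric marginals for a rank-based empirical CDF. All function names (`to_normal_scores`, `empirical_quantile`, `pearson`, `fit_and_sample`) are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Φ, PHI.inv_cdf is Φ^{-1}


def to_normal_scores(column):
    """Stages 1-2: rank-based empirical F_i, then Z_i = Φ^{-1}(U_i)."""
    n = len(column)
    order = sorted(range(n), key=lambda k: column[k])
    scores = [0.0] * n
    for rank, k in enumerate(order):
        u = (rank + 0.5) / n  # stays strictly inside (0, 1)
        scores[k] = PHI.inv_cdf(u)
    return scores


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def empirical_quantile(sorted_column, u):
    """F_i^{-1}(u): return the training value at quantile u."""
    idx = min(int(u * len(sorted_column)), len(sorted_column) - 1)
    return sorted_column[idx]


def fit_and_sample(x, y, num_rows, seed=0):
    """Fit a two-column Gaussian copula and draw synthetic rows."""
    rng = random.Random(seed)
    # Stage 2: for two columns, Σ reduces to a single correlation rho.
    rho = pearson(to_normal_scores(x), to_normal_scores(y))
    xs, ys = sorted(x), sorted(y)
    resid = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(num_rows):
        # Stage 3: draw (z1, z2) from N(0, Σ), then apply Φ and F_i^{-1}.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + resid * rng.gauss(0.0, 1.0)
        rows.append((empirical_quantile(xs, PHI.cdf(z1)),
                     empirical_quantile(ys, PHI.cdf(z2))))
    return rho, rows


# Made-up training data: a skewed column and a column that depends on it.
data_rng = random.Random(42)
x = [math.exp(data_rng.gauss(0.0, 0.5)) for _ in range(500)]
y = [2.0 * v + data_rng.gauss(0.0, 0.3) for v in x]

rho, synthetic = fit_and_sample(x, y, num_rows=1000)
print(f"estimated copula correlation: {rho:.3f}")
```

Because the marginals here are empirical, every synthetic value is one of the training values. A parametric marginal such as a fitted beta or gamma would instead produce new values on the same scale.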

Related Pages

Implemented By

Implementation:Sdv_dev_SDV_GaussianCopulaSynthesizer_Init

Uses Heuristic

Heuristic:Sdv_dev_SDV_Gaussian_KDE_Incompatibility
