Principle:Sdv dev SDV Gaussian Copula Synthesis
| Knowledge Sources | |
|---|---|
| Domains | Statistics, Synthetic_Data, Probabilistic_Modeling |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A statistical modeling technique that captures multivariate dependencies between columns using a Gaussian copula to generate synthetic tabular data.
Description
Gaussian copula synthesis separates the modeling of individual column distributions (marginals) from the modeling of inter-column dependencies (the copula). Each column is first mapped to a standard normal scale by passing it through the CDF of its fitted univariate distribution (e.g., beta, truncated normal, gamma) and then through the inverse standard normal CDF. The correlations between these transformed columns are then captured by a multivariate Gaussian distribution. During sampling, correlated normal samples are drawn and then inverse-transformed through each column's marginal distribution to produce synthetic data with realistic marginals and correlations.
This approach is computationally efficient, interpretable, and works well for datasets with moderate complexity. It is the default and recommended synthesizer in SDV for most single-table use cases.
Usage
Use Gaussian copula synthesis when generating single-table synthetic data where the statistical fidelity of column distributions and correlations is important. It is preferred over deep learning approaches such as CTGAN when the dataset is small to medium-sized, when training speed matters, or when interpretable learned distributions are needed (e.g., via `get_parameters` / `get_learned_distributions`).
Theoretical Basis
The Gaussian copula model works in three stages:
1. Marginal Fitting: Each column is fitted with a univariate distribution (e.g., Beta, Gaussian KDE, Truncated Normal).
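2. Copula Estimation: Transform each column to uniform via the probability integral transform, where $F_i$ is the fitted CDF of column $i$:
$$u_i = F_i(x_i)$$
Then transform to standard normal, where $\Phi^{-1}$ is the inverse standard normal CDF:
$$z_i = \Phi^{-1}(u_i)$$
Estimate the correlation matrix $\Sigma$ of the $z$ vectors.
3. Sampling: Draw samples $z \sim \mathcal{N}(0, \Sigma)$, then invert through $\Phi$ and $F_i^{-1}$:
$$x_i = F_i^{-1}(\Phi(z_i))$$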
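The three stages above can be sketched with NumPy and SciPy alone. This is an illustration under stated assumptions, not SDV's internal implementation: the toy data is generated for the example, the marginal families (gamma and beta) are chosen by hand rather than selected automatically, and the helper names `to_normal` and `sample` are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical toy data: two dependent, non-normal columns.
n = 2000
latent = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=n)
real = np.column_stack([
    stats.gamma.ppf(stats.norm.cdf(latent[:, 0]), a=2.0, scale=3.0),
    stats.beta.ppf(stats.norm.cdf(latent[:, 1]), a=2.0, b=5.0),
])

# Stage 1: fit a univariate distribution to each column.
# Assumption: the families are fixed by hand for this sketch.
gamma_params = stats.gamma.fit(real[:, 0], floc=0)
beta_params = stats.beta.fit(real[:, 1], floc=0, fscale=1)
marginals = [stats.gamma(*gamma_params), stats.beta(*beta_params)]


def to_normal(data, marginals):
    """Stage 2: map each column to uniform via its CDF, then to standard normal."""
    columns = []
    for j, dist in enumerate(marginals):
        u = dist.cdf(data[:, j])
        u = np.clip(u, 1e-6, 1 - 1e-6)  # keep the inverse normal CDF finite
        columns.append(stats.norm.ppf(u))
    return np.column_stack(columns)


z = to_normal(real, marginals)
corr = np.corrcoef(z, rowvar=False)  # correlation matrix of the normal scores


def sample(num_rows, marginals, corr, rng):
    """Stage 3: draw correlated normals, then invert through each marginal."""
    mean = np.zeros(len(marginals))
    z_new = rng.multivariate_normal(mean, corr, size=num_rows)
    u_new = stats.norm.cdf(z_new)
    return np.column_stack(
        [dist.ppf(u_new[:, j]) for j, dist in enumerate(marginals)]
    )


synthetic = sample(n, marginals, corr, rng)
```

Comparing a rank correlation (for example `stats.spearmanr`) between the columns of `real` and of `synthetic` is one simple way to check that the dependence structure survived the round trip.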