Principle:Sdv dev SDV CTGAN Synthesis
| Field | Value |
|---|---|
| Knowledge Sources | |
| Domains | Deep_Learning, Synthetic_Data, Generative_Adversarial_Networks |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A deep learning-based technique that uses conditional generative adversarial networks to synthesize realistic tabular data with mixed column types.
Description
CTGAN (Conditional Tabular GAN) addresses the challenges of applying GANs to tabular data, where columns have mixed types (continuous and categorical), imbalanced categories, and multi-modal continuous distributions. It introduces mode-specific normalization for continuous columns and a conditional generator with training-by-sampling to handle imbalanced categorical columns.
Unlike copula-based approaches, CTGAN can capture complex non-linear relationships between columns. The trade-off is that it typically needs more training data and substantially longer training time than a copula-based model.
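The following minimal, stdlib-only Python sketch makes mode-specific normalization concrete. It assumes the mixture model has already been fitted. The `Mode` class, the `normalize`/`denormalize` helpers and the example mode parameters are hypothetical illustrations, not part of SDV or the ctgan library. Following the CTGAN paper's formulation, a value is encoded as a one-hot indicator of its mode plus a scalar scaled by four standard deviations of that mode. To keep the output deterministic, the sketch picks the most probable mode, whereas CTGAN samples the mode from these probabilities.

```python
import math
from dataclasses import dataclass


@dataclass
class Mode:
    """One fitted Gaussian component (hypothetical, assumed already fitted)."""
    weight: float  # mixture weight of the component
    mean: float    # component mean (eta_k in the CTGAN paper)
    std: float     # component standard deviation (phi_k in the CTGAN paper)


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def normalize(value: float, modes: list[Mode]) -> tuple[float, list[int]]:
    """Encode a scalar as (alpha, one-hot mode indicator beta)."""
    # Weighted density of the value under each mixture component.
    densities = [m.weight * gaussian_pdf(value, m.mean, m.std) for m in modes]
    # Simplification: take the most probable mode (CTGAN samples it instead).
    k = max(range(len(modes)), key=lambda i: densities[i])
    # Scalar normalized within the chosen mode: (c - eta_k) / (4 * phi_k).
    alpha = (value - modes[k].mean) / (4 * modes[k].std)
    beta = [1 if i == k else 0 for i in range(len(modes))]
    return alpha, beta


def denormalize(alpha: float, beta: list[int], modes: list[Mode]) -> float:
    """Invert the encoding using the mode marked in beta."""
    k = beta.index(1)
    return alpha * 4 * modes[k].std + modes[k].mean
```

With modes centred at 0 and 100, the value `103.0` falls in the second mode and is encoded as `beta = [0, 1]`, `alpha = 0.15`. `denormalize` recovers the original value.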
Usage
Use CTGAN synthesis when the dataset has complex non-linear inter-column relationships that a Gaussian copula cannot capture. It is particularly useful for larger datasets where the additional training cost is justified by improved fidelity.
Theoretical Basis
CTGAN trains a generator and a discriminator adversarially and relies on four techniques:
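1. Mode-Specific Normalization: Each continuous column is fitted with a variational Gaussian mixture model to handle multi-modal distributions. Each value is then represented as a one-hot indicator of its mode plus a scalar normalized within that mode.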
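2. Conditional Generator: During training, a discrete column and one of its categories are sampled and encoded as a conditional vector. The generator receives this vector as input, and a cross-entropy penalty is added to its loss when the generated row does not match the condition.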
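3. Training-by-Sampling: A discrete column is chosen uniformly at random, and a category within it is chosen with probability proportional to the logarithm of its frequency. A real row with that category is then drawn for the discriminator. This oversamples rare categories relative to their raw frequency without making all categories equally likely, which mitigates class imbalance.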
4. PacGAN Discriminator: Several samples are concatenated ("packed") into a single discriminator input. This lets the discriminator judge the diversity of a group of samples, which helps reduce mode collapse.
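The conditional sampling steps and the packing step above can be sketched in stdlib-only Python. This is an illustration under stated assumptions, not SDV's implementation. Discrete values are assumed to be pre-encoded as integer category indices, and every helper name is hypothetical. The `log(count + 1)` smoothing is also an assumption, chosen so that a category seen only once keeps nonzero mass. The networks themselves and the generator's cross-entropy penalty are omitted.

```python
import math
import random


def category_probs(counts: list[int]) -> list[float]:
    """Log-frequency sampling weights for one discrete column."""
    # log(count + 1) smoothing is an assumption; unseen categories get 0 mass.
    logs = [math.log(c + 1) for c in counts]
    total = sum(logs)
    return [x / total for x in logs]


def sample_condition(
    column_counts: list[list[int]], rng: random.Random
) -> tuple[int, int, list[int]]:
    """Choose a discrete column uniformly, then a category by log-frequency.

    Returns (column index, category index, conditional vector). The vector
    concatenates one one-hot block per discrete column and holds a single 1
    at the chosen (column, category) slot; all other blocks are zero.
    """
    col = rng.randrange(len(column_counts))
    probs = category_probs(column_counts[col])
    cat = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    cond: list[int] = []
    for j, counts in enumerate(column_counts):
        block = [0] * len(counts)
        if j == col:
            block[cat] = 1
        cond.extend(block)
    return col, cat, cond


def sample_real_row(
    rows: list[tuple[int, ...]], col: int, cat: int, rng: random.Random
) -> tuple[int, ...]:
    """Training-by-sampling: draw a real row that satisfies the condition."""
    matching = [r for r in rows if r[col] == cat]
    return rng.choice(matching)


def pack(batch: list[list[float]], pac: int) -> list[list[float]]:
    """PacGAN-style packing: concatenate `pac` consecutive rows into one input."""
    if len(batch) % pac != 0:
        raise ValueError("batch size must be divisible by pac")
    return [sum(batch[i:i + pac], []) for i in range(0, len(batch), pac)]
```

With counts `[900, 90, 10]`, the rarest category makes up 1% of rows but receives a sampling weight of about 17.5%. This illustrates the partial rebalancing that log-frequency sampling aims for.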