Principle: Gretel Synthetics GAN Adversarial Training
| Knowledge Sources | |
|---|---|
| Domains | Synthetic_Data, GAN, Tabular_Data |
| Last Updated | 2026-02-14 19:00 GMT |
Overview
GAN adversarial training is the iterative optimization process in which a generator network learns to produce realistic synthetic data by competing against a discriminator network that learns to distinguish real from generated samples.
Description
In a Generative Adversarial Network, two neural networks are trained simultaneously in a minimax game. The generator receives random noise (and optionally a conditional vector) and outputs synthetic data samples. The discriminator receives both real data samples and generated samples and outputs a scalar score indicating how "real" the input appears. Training alternates between updating the discriminator to better distinguish real from fake, and updating the generator to better fool the discriminator.
ACTGAN extends the standard GAN training with several important design choices:
- WGAN-GP loss: Instead of the original GAN binary cross-entropy loss, ACTGAN uses the Wasserstein distance with gradient penalty. The discriminator loss is `-(mean(y_real) - mean(y_fake))` plus a gradient penalty term. This provides more stable training gradients and mitigates mode collapse.
- Packing (PAC): Multiple samples (controlled by the `pac` parameter) are grouped together as a single input to the discriminator. This helps the discriminator detect mode collapse by seeing multiple samples simultaneously.
- Residual generator: The generator uses residual layers where each layer's output is concatenated with its input, allowing information to flow directly through the network and enabling the generator to build increasingly complex representations.
- Conditional training: A conditional vector is concatenated with the noise vector for the generator and with the data for the discriminator, allowing the model to learn column-specific distributions and enabling conditional generation at inference time.
- Reconstruction loss: In addition to the adversarial loss, the generator incurs a reconstruction loss that penalizes mismatches between the generated data and the conditional vector. This focuses the generator on accurately reproducing the conditioned columns.
Usage
Adversarial training is automatically invoked by ACTGAN.fit(data) after data transformation is complete. Key hyperparameters that affect training behavior include:
- epochs: Number of full passes over the training data
- batch_size: Number of samples per training step
- discriminator_steps: Number of discriminator updates per generator update (default 1, WGAN paper suggests 5)
- generator_lr / discriminator_lr: Learning rates for the Adam optimizers
- reconstruction_loss_coef: Weight of the reconstruction loss relative to the adversarial loss
- conditional_vector_type: SINGLE_DISCRETE (one discrete column per step) or ANYWAY (any combination of columns)
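The interaction between these knobs can be seen in the update schedule: the discriminator (critic) is updated `discriminator_steps` times for every generator update. A minimal pure-Python sketch (illustrative names, not library code):

```python
def training_schedule(epochs, steps_per_epoch, discriminator_steps):
    """Record the order of updates produced by the alternating schedule."""
    log = []
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            for _ in range(discriminator_steps):
                log.append("D")  # one discriminator (critic) update
            log.append("G")      # one generator update
    return log

# With discriminator_steps=5 (the WGAN paper's suggested ratio),
# each generator update is preceded by five critic updates.
log = training_schedule(epochs=2, steps_per_epoch=3, discriminator_steps=5)
```

Raising `discriminator_steps` keeps the critic's Wasserstein estimate accurate at the cost of proportionally more compute per generator step.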
Theoretical Basis
The training follows the WGAN-GP (Wasserstein GAN with Gradient Penalty) formulation:
Discriminator Update
For each discriminator step:
1. Sample noise z ~ N(0, I) of shape [batch_size, embedding_dim]
2. Sample conditional vectors (fake_cond, real_cond, column_mask)
3. Generate fake data: fake = G(z, fake_cond)
4. Apply activation functions to get fakeact
5. Score fake: y_fake = D(fakeact, fake_cond)
6. Score real: y_real = D(real_encoded, real_cond)
7. Gradient penalty (averaged over the batch):
alpha ~ U(0,1)
interpolated = alpha * real_encoded + (1 - alpha) * fakeact
GP = lambda * mean((||grad(D(interpolated))||_2 - 1)^2)
8. Loss_D = -(mean(y_real) - mean(y_fake)) + GP
9. Update D with Adam optimizer
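Steps 7 and 8 can be sketched numerically. The following numpy walkthrough is illustrative only: it omits the conditional vectors and uses a toy *linear* critic so the input gradient is analytic (no autograd needed); `gp_lambda = 10` is the usual WGAN-GP default, and all data here is random stand-in values.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim, gp_lambda = 8, 4, 10.0

# Toy linear discriminator D(x) = x @ w, so grad_x D(x) = w for every x.
w = rng.normal(size=dim)
D = lambda x: x @ w

real = rng.normal(loc=1.0, size=(batch, dim))     # stand-in for real_encoded
fake = rng.normal(loc=-1.0, size=(batch, dim))    # stand-in for fakeact

# Step 7: gradient penalty on random interpolates between real and fake.
alpha = rng.uniform(size=(batch, 1))
interpolated = alpha * real + (1 - alpha) * fake
grad_norm = np.linalg.norm(w)                     # exact for the linear D
gp = gp_lambda * (grad_norm - 1.0) ** 2

# Step 8: critic loss = negative Wasserstein estimate plus the penalty.
loss_d = -(D(real).mean() - D(fake).mean()) + gp
```

The penalty pushes the critic's gradient norm toward 1, enforcing the 1-Lipschitz constraint required by the Wasserstein formulation.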
Generator Update
1. Sample new noise z ~ N(0, I)
2. Sample new conditional vectors
3. Generate fake data: fake, fakeact = G(z, fake_cond)
4. Score fake: y_fake = D(fakeact, fake_cond)
For SINGLE_DISCRETE conditional vectors:
Loss_R = CrossEntropy(fake[:, discrete_cols], argmax(fake_cond[:, discrete_cols]))
For ANYWAY conditional vectors:
Loss_R = sum of per-column losses (MSE for continuous, BCE for binary, CE for one-hot)
weighted by column_mask
5. Loss_G = -mean(y_fake) + reconstruction_loss_coef * Loss_R
6. Update G with Adam optimizer
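The generator's combined loss (step 5, SINGLE_DISCRETE case) can be sketched with numpy. All values below are illustrative stand-ins: `y_fake` plays the role of the critic scores, `fake_logits` the generator's pre-activation output for one conditioned discrete column, and `coef` stands in for `reconstruction_loss_coef`.

```python
import numpy as np

rng = np.random.default_rng(1)
batch, n_classes, coef = 4, 3, 1.0

# Critic scores for the newly generated batch (step 4).
y_fake = rng.normal(size=batch)

# Reconstruction loss: cross-entropy between the generator's logits for
# the conditioned column and the condition itself (its argmax index).
fake_logits = rng.normal(size=(batch, n_classes))
fake_cond = np.eye(n_classes)[rng.integers(n_classes, size=batch)]
target = fake_cond.argmax(axis=1)

log_probs = fake_logits - np.log(np.exp(fake_logits).sum(axis=1, keepdims=True))
loss_r = -log_probs[np.arange(batch), target].mean()

# Step 5: adversarial term plus weighted reconstruction term.
loss_g = -y_fake.mean() + coef * loss_r
```

Minimizing `-mean(y_fake)` pushes generated samples toward regions the critic scores as real, while `loss_r` anchors the conditioned column to the requested category.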
Network Architectures
Generator (with Residual layers):
Input: [noise, cond_vec] of dim (embedding_dim + cond_vec_dim)
For each d in generator_dim:
h = concat(ReLU(BatchNorm(Linear(input, d))), input)
input = h # dimension grows by d at each layer
Output: Linear(h, data_dim)
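Because each residual layer concatenates its output with its input, the generator's width grows by `d` at every layer. A dimension-bookkeeping sketch (the `embedding_dim`, `cond_vec_dim`, and `generator_dim` values are illustrative):

```python
# Width after each residual layer: concat(layer_output_of_size_d, input).
embedding_dim, cond_vec_dim = 128, 10
generator_dim = (256, 256)

width = embedding_dim + cond_vec_dim  # [noise, cond_vec] input
widths = [width]
for d in generator_dim:
    width = width + d                 # concatenation grows the width by d
    widths.append(width)
# The final Linear then projects `width` down to data_dim.
```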
Discriminator (with PAC packing):
Input: [data, cond_vec] reshaped to (batch_size/pac, pac * input_dim)
For each d in discriminator_dim:
h = Dropout(0.5, LeakyReLU(0.2, Linear(input, d)))
input = h
Output: Linear(h, 1)
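The PAC input reshape above can be sketched directly: `pac` consecutive rows are flattened into one packed input, so the critic scores groups of samples rather than single rows (the shapes here are illustrative toy values).

```python
import numpy as np

batch_size, input_dim, pac = 8, 5, 4
x = np.arange(batch_size * input_dim, dtype=float).reshape(batch_size, input_dim)

# (batch_size, input_dim) -> (batch_size / pac, pac * input_dim):
# each packed row is the concatenation of pac consecutive samples.
packed = x.reshape(batch_size // pac, pac * input_dim)
```

A mode-collapsed generator produces near-identical rows, which become conspicuous when the critic sees `pac` of them side by side; this is why packing helps detect mode collapse. Note that `batch_size` must be divisible by `pac`.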
Activation Functions
Different column types use different activation functions on the generator output:
| Column Type | Activation | Loss Function |
|---|---|---|
| Continuous (normalized value) | tanh | MSE |
| Continuous (component vector) | gumbel_softmax(tau=0.2) | Cross-entropy |
| Discrete (one-hot encoded) | gumbel_softmax(tau=0.2) | Cross-entropy |
| Discrete (binary encoded) | sigmoid | Binary cross-entropy |
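The gumbel_softmax activation used for one-hot outputs can be sketched in numpy: Gumbel noise is added to the logits and a low-temperature softmax (tau = 0.2, as in the table) pushes the result toward a one-hot vector while staying differentiable. This is a standalone illustration, not the library's torch implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def gumbel_softmax(logits, tau=0.2, rng=rng):
    """Differentiable approximation of one-hot sampling: perturb logits
    with Gumbel noise, then apply a softmax at temperature tau."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1)
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)                 # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

probs = gumbel_softmax(np.array([[2.0, 0.5, -1.0]]), tau=0.2)
```

A small tau sharpens the output toward a hard one-hot choice; larger values yield softer, more uniform distributions.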