

Principle:Gretelai Gretel synthetics GAN Adversarial Training

From Leeroopedia
Knowledge Sources
Domains Synthetic_Data, GAN, Tabular_Data
Last Updated 2026-02-14 19:00 GMT

Overview

GAN adversarial training is the iterative optimization process in which a generator network learns to produce realistic synthetic data by competing against a discriminator network that learns to distinguish real from generated samples.

Description

In a Generative Adversarial Network, two neural networks are trained simultaneously in a minimax game. The generator receives random noise (and optionally a conditional vector) and outputs synthetic data samples. The discriminator receives both real data samples and generated samples and outputs a scalar score indicating how "real" the input appears. Training alternates between updating the discriminator to better distinguish real from fake, and updating the generator to better fool the discriminator.

ACTGAN extends the standard GAN training with several important design choices:

  • WGAN-GP loss: Instead of the original GAN binary cross-entropy loss, ACTGAN uses the Wasserstein distance with gradient penalty. The discriminator loss is -(mean(y_real) - mean(y_fake)) plus a gradient penalty term. This yields more stable training gradients and mitigates mode collapse.
  • Packing (PAC): Multiple samples (controlled by the pac parameter) are grouped together as a single input to the discriminator. This helps the discriminator detect mode collapse by seeing multiple samples simultaneously.
  • Residual generator: The generator uses residual layers where each layer's output is concatenated with its input, allowing information to flow directly through the network and enabling the generator to build increasingly complex representations.
  • Conditional training: A conditional vector is concatenated with the noise vector for the generator and with the data for the discriminator, allowing the model to learn column-specific distributions and enabling conditional generation at inference time.
  • Reconstruction loss: In addition to the adversarial loss, the generator incurs a reconstruction loss that penalizes mismatches between the generated data and the conditional vector. This focuses the generator on accurately reproducing the conditioned columns.
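The packing idea can be illustrated as a plain reshape: pac consecutive samples are flattened into a single discriminator input row, so the discriminator sees several samples at once. A minimal NumPy sketch (the function name and shapes are illustrative, not the library's internals):

```python
import numpy as np

def pack_samples(batch: np.ndarray, pac: int) -> np.ndarray:
    """Group `pac` consecutive samples into one packed discriminator input.

    (batch_size, input_dim) -> (batch_size // pac, pac * input_dim).
    batch_size must be divisible by pac.
    """
    batch_size, input_dim = batch.shape
    assert batch_size % pac == 0, "batch_size must be divisible by pac"
    return batch.reshape(batch_size // pac, pac * input_dim)

# Example: 8 samples of dimension 5, packed in groups of 2
packed = pack_samples(np.arange(40, dtype=float).reshape(8, 5), pac=2)
print(packed.shape)  # (4, 10)
```

A collapsed generator that always emits the same sample produces packed rows of near-identical halves, which the discriminator can detect far more easily than from single samples.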

Usage

Adversarial training is automatically invoked by ACTGAN.fit(data) after data transformation is complete. Key hyperparameters that affect training behavior include:

  • epochs: Number of full passes over the training data
  • batch_size: Number of samples per training step
  • discriminator_steps: Number of discriminator updates per generator update (default 1, WGAN paper suggests 5)
  • generator_lr / discriminator_lr: Learning rates for the Adam optimizers
  • reconstruction_loss_coef: Weight of the reconstruction loss relative to the adversarial loss
  • conditional_vector_type: SINGLE_DISCRETE (one discrete column per step) or ANYWAY (any combination of columns)
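For reference, the hyperparameters above can be collected into one configuration. The values below mirror commonly cited CTGAN-family defaults and are illustrative assumptions, not authoritative ACTGAN defaults:

```python
# Illustrative hyperparameter set for ACTGAN-style adversarial training.
# Values mirror common CTGAN-family defaults and are assumptions here,
# not verified library defaults.
actgan_params = {
    "epochs": 300,                    # full passes over the training data
    "batch_size": 500,                # samples per training step
    "discriminator_steps": 1,         # D updates per G update (WGAN paper uses 5)
    "generator_lr": 2e-4,             # Adam learning rate for G
    "discriminator_lr": 2e-4,         # Adam learning rate for D
    "reconstruction_loss_coef": 1.0,  # weight of Loss_R vs. adversarial loss
    "conditional_vector_type": "SINGLE_DISCRETE",
}
```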

Theoretical Basis

The training follows the WGAN-GP (Wasserstein GAN with Gradient Penalty) formulation:

Discriminator Update

For each discriminator step:
    1. Sample noise z ~ N(0, I) of shape [batch_size, embedding_dim]
    2. Sample conditional vectors (fake_cond, real_cond, column_mask)
    3. Generate fake data: fake = G(z, fake_cond)
    4. Apply activation functions to get fakeact
    5. Score fake: y_fake = D(fakeact, fake_cond)
    6. Score real: y_real = D(real_encoded, real_cond)
    7. Gradient penalty:
       alpha ~ U(0,1)
       interpolated = alpha * real + (1-alpha) * fake
       GP = lambda * (||grad(D(interpolated))||_2 - 1)^2
    8. Loss_D = -(mean(y_real) - mean(y_fake)) + GP
    9. Update D with Adam optimizer
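The discriminator step above can be sketched with PyTorch autograd. Conditional vectors are omitted for brevity, and the stand-in discriminator and data shapes are illustrative:

```python
import torch

def discriminator_loss(D, real, fake, gp_lambda=10.0):
    """WGAN-GP discriminator loss: -(E[D(real)] - E[D(fake)]) + gradient penalty.

    D maps (batch, dim) -> (batch, 1). Conditional vectors are omitted.
    """
    y_real, y_fake = D(real), D(fake)
    # Gradient penalty on random interpolates between real and fake samples
    alpha = torch.rand(real.size(0), 1)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    grad = torch.autograd.grad(D(interp).sum(), interp, create_graph=True)[0]
    gp = gp_lambda * ((grad.norm(2, dim=1) - 1) ** 2).mean()
    return -(y_real.mean() - y_fake.mean()) + gp

torch.manual_seed(0)
D = torch.nn.Linear(4, 1)  # stand-in discriminator
loss = discriminator_loss(D, torch.randn(8, 4), torch.randn(8, 4))
```

Note that `create_graph=True` is required so the penalty's gradient norm is itself differentiable when the discriminator is updated.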

Generator Update

1. Sample new noise z ~ N(0, I)
2. Sample new conditional vectors
3. Generate fake data: fake, fakeact = G(z, fake_cond)
4. Score fake: y_fake = D(fakeact, fake_cond)

For SINGLE_DISCRETE conditional vectors:
    Loss_R = CrossEntropy(fake[:, discrete_cols], argmax(fake_cond[:, discrete_cols]))
For ANYWAY conditional vectors:
    Loss_R = sum of per-column losses (MSE for continuous, BCE for binary, CE for one-hot)
             weighted by column_mask

5. Loss_G = -mean(y_fake) + reconstruction_loss_coef * Loss_R
6. Update G with Adam optimizer
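The generator objective for the SINGLE_DISCRETE case can be sketched numerically. The toy scores and probability rows below are illustrative inputs, not real model outputs:

```python
import numpy as np

def generator_loss(y_fake, fake_probs, cond_onehot, recon_coef=1.0):
    """Generator loss: -mean(D scores) + coef * cross-entropy reconstruction.

    y_fake: (batch,) discriminator scores on generated samples.
    fake_probs: (batch, k) generated probabilities for the conditioned
    discrete column; cond_onehot: (batch, k) one-hot conditional vector.
    """
    adv = -np.mean(y_fake)
    # Cross-entropy between the generated distribution and the conditioned category
    eps = 1e-12
    ce = -np.mean(np.sum(cond_onehot * np.log(fake_probs + eps), axis=1))
    return adv + recon_coef * ce

y_fake = np.array([0.3, -0.1, 0.5])
fake = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
cond = np.eye(3)  # each row conditions on a different category
loss = generator_loss(y_fake, fake, cond)
```

The adversarial term pushes scores up (fooling D), while the reconstruction term pushes probability mass onto the conditioned category.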

Network Architectures

Generator (with Residual layers):

Input: [noise, cond_vec] of dim (embedding_dim + cond_vec_dim)
For each d in generator_dim:
    h = concat(ReLU(BatchNorm(Linear(input, d))), input)
    input = h   # dimension grows by d at each layer
Output: Linear(h, data_dim)
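A minimal PyTorch sketch of this residual generator, following the dimension bookkeeping above (class names are illustrative, not the library's own):

```python
import torch
from torch import nn

class Residual(nn.Module):
    """One generator layer: the output of Linear -> BatchNorm -> ReLU is
    concatenated with the layer input, so the feature width grows by `d`."""
    def __init__(self, in_dim, d):
        super().__init__()
        self.fc = nn.Linear(in_dim, d)
        self.bn = nn.BatchNorm1d(d)
        self.relu = nn.ReLU()

    def forward(self, x):
        return torch.cat([self.relu(self.bn(self.fc(x))), x], dim=1)

class Generator(nn.Module):
    def __init__(self, embedding_dim, cond_dim, generator_dim, data_dim):
        super().__init__()
        dim = embedding_dim + cond_dim
        layers = []
        for d in generator_dim:
            layers.append(Residual(dim, d))
            dim += d  # concatenation grows the width at each layer
        layers.append(nn.Linear(dim, data_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, noise, cond):
        return self.net(torch.cat([noise, cond], dim=1))

g = Generator(embedding_dim=128, cond_dim=10, generator_dim=(256, 256), data_dim=50)
out = g(torch.randn(4, 128), torch.randn(4, 10))
print(out.shape)  # torch.Size([4, 50])
```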

Discriminator (with PAC packing):

Input: [data, cond_vec] reshaped to (batch_size/pac, pac * input_dim)
For each d in discriminator_dim:
    h = Dropout(0.5, LeakyReLU(0.2, Linear(input, d)))
    input = h
Output: Linear(h, 1)
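A corresponding PyTorch sketch of the packed discriminator (again a sketch under the shapes above, not the library's implementation):

```python
import torch
from torch import nn

class Discriminator(nn.Module):
    """Packed discriminator: `pac` consecutive samples are flattened
    into a single input row before the MLP."""
    def __init__(self, input_dim, discriminator_dim, pac=10):
        super().__init__()
        self.pac = pac
        dim = input_dim * pac
        layers = []
        for d in discriminator_dim:
            layers += [nn.Linear(dim, d), nn.LeakyReLU(0.2), nn.Dropout(0.5)]
            dim = d
        layers.append(nn.Linear(dim, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x):  # x: (batch_size, input_dim)
        assert x.size(0) % self.pac == 0, "batch_size must be divisible by pac"
        return self.net(x.view(x.size(0) // self.pac, -1))

D = Discriminator(input_dim=60, discriminator_dim=(256, 256), pac=10)
scores = D(torch.randn(20, 60))
print(scores.shape)  # torch.Size([2, 1])
```

Because of packing, the discriminator emits one score per group of `pac` samples, not per sample.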

Activation Functions

Different column types use different activation functions on the generator output:

Column Type                      Activation                 Loss Function
Continuous (normalized value)    tanh                       MSE
Continuous (component vector)    gumbel_softmax(tau=0.2)    Cross-entropy
Discrete (one-hot encoded)       gumbel_softmax(tau=0.2)    Cross-entropy
Discrete (binary encoded)        sigmoid                    Binary cross-entropy
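Applying these activations amounts to dispatching over spans of the raw generator output. A sketch (the spans list and function name are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def apply_activations(raw, spans):
    """Apply per-column activations to raw generator output.

    spans: list of (width, kind) where kind is "tanh" (normalized continuous),
    "softmax" (one-hot / component vector, via Gumbel-softmax), or
    "sigmoid" (binary-encoded discrete).
    """
    out, start = [], 0
    for width, kind in spans:
        chunk = raw[:, start:start + width]
        if kind == "tanh":
            out.append(torch.tanh(chunk))
        elif kind == "softmax":
            out.append(F.gumbel_softmax(chunk, tau=0.2))
        elif kind == "sigmoid":
            out.append(torch.sigmoid(chunk))
        start += width
    return torch.cat(out, dim=1)

torch.manual_seed(0)
raw = torch.randn(3, 6)
act = apply_activations(raw, [(1, "tanh"), (4, "softmax"), (1, "sigmoid")])
# The Gumbel-softmax span forms a probability vector per row
print(act[:, 1:5].sum(dim=1))
```

The low temperature (tau=0.2) pushes the Gumbel-softmax output toward a near-one-hot vector while keeping it differentiable for backpropagation.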
