Principle: Gretel Synthetics GAN Adversarial Training
| Knowledge Sources | |
|---|---|
| Domains | Synthetic_Data, GAN, Tabular_Data |
| Last Updated | 2026-02-14 19:00 GMT |
Overview
GAN adversarial training is the iterative optimization process in which a generator network learns to produce realistic synthetic data by competing against a discriminator network that learns to distinguish real from generated samples.
Description
In a Generative Adversarial Network, two neural networks are trained simultaneously in a minimax game. The generator receives random noise (and optionally a conditional vector) and outputs synthetic data samples. The discriminator receives both real data samples and generated samples and outputs a scalar score indicating how "real" the input appears. Training alternates between updating the discriminator to better distinguish real from fake, and updating the generator to better fool the discriminator.
ACTGAN extends the standard GAN training with several important design choices:
- WGAN-GP loss: Instead of the original GAN binary cross-entropy loss, ACTGAN uses the Wasserstein distance with gradient penalty. The discriminator loss is `-(mean(y_real) - mean(y_fake))` plus a gradient penalty term. This provides more stable training gradients and mitigates mode collapse.
- Packing (PAC): Multiple samples (controlled by the `pac` parameter) are grouped together as a single input to the discriminator. This helps the discriminator detect mode collapse by seeing multiple samples simultaneously.
- Residual generator: The generator uses residual layers where each layer's output is concatenated with its input, allowing information to flow directly through the network and enabling the generator to build increasingly complex representations.
- Conditional training: A conditional vector is concatenated with the noise vector for the generator and with the data for the discriminator, allowing the model to learn column-specific distributions and enabling conditional generation at inference time.
- Reconstruction loss: In addition to the adversarial loss, the generator incurs a reconstruction loss that penalizes mismatches between the generated data and the conditional vector. This focuses the generator on accurately reproducing the conditioned columns.
Usage
Adversarial training is automatically invoked by ACTGAN.fit(data) after data transformation is complete. Key hyperparameters that affect training behavior include:
- epochs: Number of full passes over the training data
- batch_size: Number of samples per training step
- discriminator_steps: Number of discriminator updates per generator update (default 1, WGAN paper suggests 5)
- generator_lr / discriminator_lr: Learning rates for the Adam optimizers
- reconstruction_loss_coef: Weight of the reconstruction loss relative to the adversarial loss
- conditional_vector_type: SINGLE_DISCRETE (one discrete column per step) or ANYWAY (any combination of columns)
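The interaction between these knobs can be seen in the update schedule: the discriminator (critic) is updated `discriminator_steps` times for every generator update. A minimal pure-Python sketch (illustrative names, not library code):

```python
def training_schedule(epochs, steps_per_epoch, discriminator_steps):
    """Record the order of updates produced by the alternating schedule."""
    log = []
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            for _ in range(discriminator_steps):
                log.append("D")  # one discriminator (critic) update
            log.append("G")      # one generator update
    return log

# With discriminator_steps=5 (the WGAN paper's suggested ratio),
# each generator update is preceded by five critic updates.
log = training_schedule(epochs=2, steps_per_epoch=3, discriminator_steps=5)
```

Raising `discriminator_steps` keeps the critic's Wasserstein estimate accurate at the cost of proportionally more compute per generator step.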
Theoretical Basis
The training follows the WGAN-GP (Wasserstein GAN with Gradient Penalty) formulation:
Discriminator Update
For each discriminator step:
1. Sample noise z ~ N(0, I) of shape [batch_size, embedding_dim]
2. Sample conditional vectors (fake_cond, real_cond, column_mask)
3. Generate fake data: fake = G(z, fake_cond)
4. Apply activation functions to get fakeact
5. Score fake: y_fake = D(fakeact, fake_cond)
6. Score real: y_real = D(real_encoded, real_cond)
7. Gradient penalty (averaged over the batch):
alpha ~ U(0,1)
interpolated = alpha * real_encoded + (1 - alpha) * fakeact
GP = lambda * mean((||grad(D(interpolated))||_2 - 1)^2)
8. Loss_D = -(mean(y_real) - mean(y_fake)) + GP
9. Update D with Adam optimizer
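Steps 7 and 8 can be sketched numerically. The following numpy walkthrough is illustrative only: it omits the conditional vectors and uses a toy *linear* critic so the input gradient is analytic (no autograd needed); `gp_lambda = 10` is the usual WGAN-GP default, and all data here is random stand-in values.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim, gp_lambda = 8, 4, 10.0

# Toy linear discriminator D(x) = x @ w, so grad_x D(x) = w for every x.
w = rng.normal(size=dim)
D = lambda x: x @ w

real = rng.normal(loc=1.0, size=(batch, dim))     # stand-in for real_encoded
fake = rng.normal(loc=-1.0, size=(batch, dim))    # stand-in for fakeact

# Step 7: gradient penalty on random interpolates between real and fake.
alpha = rng.uniform(size=(batch, 1))
interpolated = alpha * real + (1 - alpha) * fake
grad_norm = np.linalg.norm(w)                     # exact for the linear D
gp = gp_lambda * (grad_norm - 1.0) ** 2

# Step 8: critic loss = negative Wasserstein estimate plus the penalty.
loss_d = -(D(real).mean() - D(fake).mean()) + gp
```

The penalty pushes the critic's gradient norm toward 1, enforcing the 1-Lipschitz constraint required by the Wasserstein formulation.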
Generator Update
1. Sample new noise z ~ N(0, I)
2. Sample new conditional vectors
3. Generate fake data: fake, fakeact = G(z, fake_cond)
4. Score fake: y_fake = D(fakeact, fake_cond)
For SINGLE_DISCRETE conditional vectors:
Loss_R = CrossEntropy(fake[:, discrete_cols], argmax(fake_cond[:, discrete_cols]))
For ANYWAY conditional vectors:
Loss_R = sum of per-column losses (MSE for continuous, BCE for binary, CE for one-hot)
weighted by column_mask
5. Loss_G = -mean(y_fake) + reconstruction_loss_coef * Loss_R
6. Update G with Adam optimizer
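The generator's combined loss (step 5, SINGLE_DISCRETE case) can be sketched with numpy. All values below are illustrative stand-ins: `y_fake` plays the role of the critic scores, `fake_logits` the generator's pre-activation output for one conditioned discrete column, and `coef` stands in for `reconstruction_loss_coef`.

```python
import numpy as np

rng = np.random.default_rng(1)
batch, n_classes, coef = 4, 3, 1.0

# Critic scores for the newly generated batch (step 4).
y_fake = rng.normal(size=batch)

# Reconstruction loss: cross-entropy between the generator's logits for
# the conditioned column and the condition itself (its argmax index).
fake_logits = rng.normal(size=(batch, n_classes))
fake_cond = np.eye(n_classes)[rng.integers(n_classes, size=batch)]
target = fake_cond.argmax(axis=1)

log_probs = fake_logits - np.log(np.exp(fake_logits).sum(axis=1, keepdims=True))
loss_r = -log_probs[np.arange(batch), target].mean()

# Step 5: adversarial term plus weighted reconstruction term.
loss_g = -y_fake.mean() + coef * loss_r
```

Minimizing `-mean(y_fake)` pushes generated samples toward regions the critic scores as real, while `loss_r` anchors the conditioned column to the requested category.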
Network Architectures
Generator (with Residual layers):
Input: [noise, cond_vec] of dim (embedding_dim + cond_vec_dim)
For each d in generator_dim:
h = concat(ReLU(BatchNorm(Linear(input, d))), input)
input = h # dimension grows by d at each layer
Output: Linear(h, data_dim)
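Because each residual layer concatenates its output with its input, the generator's width grows by `d` at every layer. A dimension-bookkeeping sketch (the `embedding_dim`, `cond_vec_dim`, and `generator_dim` values are illustrative):

```python
# Width after each residual layer: concat(layer_output_of_size_d, input).
embedding_dim, cond_vec_dim = 128, 10
generator_dim = (256, 256)

width = embedding_dim + cond_vec_dim  # [noise, cond_vec] input
widths = [width]
for d in generator_dim:
    width = width + d                 # concatenation grows the width by d
    widths.append(width)
# The final Linear then projects `width` down to data_dim.
```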
Discriminator (with PAC packing):
Input: [data, cond_vec] reshaped to (batch_size/pac, pac * input_dim)
For each d in discriminator_dim:
h = Dropout(0.5, LeakyReLU(0.2, Linear(input, d)))
input = h
Output: Linear(h, 1)
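The PAC input reshape above can be sketched directly: `pac` consecutive rows are flattened into one packed input, so the critic scores groups of samples rather than single rows (the shapes here are illustrative toy values).

```python
import numpy as np

batch_size, input_dim, pac = 8, 5, 4
x = np.arange(batch_size * input_dim, dtype=float).reshape(batch_size, input_dim)

# (batch_size, input_dim) -> (batch_size / pac, pac * input_dim):
# each packed row is the concatenation of pac consecutive samples.
packed = x.reshape(batch_size // pac, pac * input_dim)
```

A mode-collapsed generator produces near-identical rows, which become conspicuous when the critic sees `pac` of them side by side; this is why packing helps detect mode collapse. Note that `batch_size` must be divisible by `pac`.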
Activation Functions
Different column types use different activation functions on the generator output:
| Column Type | Activation | Loss Function |
|---|---|---|
| Continuous (normalized value) | tanh | MSE |
| Continuous (component vector) | gumbel_softmax(tau=0.2) | Cross-entropy |
| Discrete (one-hot encoded) | gumbel_softmax(tau=0.2) | Cross-entropy |
| Discrete (binary encoded) | sigmoid | Binary cross-entropy |
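The gumbel_softmax activation used for one-hot outputs can be sketched in numpy: Gumbel noise is added to the logits and a low-temperature softmax (tau = 0.2, as in the table) pushes the result toward a one-hot vector while staying differentiable. This is a standalone illustration, not the library's torch implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def gumbel_softmax(logits, tau=0.2, rng=rng):
    """Differentiable approximation of one-hot sampling: perturb logits
    with Gumbel noise, then apply a softmax at temperature tau."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1)
    y = (logits + g) / tau
    y = y - y.max(axis=-1, keepdims=True)                 # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

probs = gumbel_softmax(np.array([[2.0, 0.5, -1.0]]), tau=0.2)
```

A small tau sharpens the output toward a hard one-hot choice; larger values yield softer, more uniform distributions.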