Principle: Gretel.ai gretel-synthetics DGAN Network Building
| Knowledge Sources | |
|---|---|
| Domains | Synthetic_Data, Time_Series, GAN |
| Last Updated | 2026-02-14 19:00 GMT |
Overview
DGAN Network Building is the process of constructing the Generator and Discriminator neural network architectures that form the DoppelGANger GAN, using Output metadata to determine layer dimensions and activation functions.
Description
The DoppelGANger model employs a specialized three-component architecture: a Generator network (which itself contains sub-networks for attributes and features), a feature Discriminator, and an optional attribute Discriminator.
Generator Architecture: The Generator is composed of three sub-networks:
- Attribute Generator: A feed-forward MLP that maps attribute noise to attribute values. It consists of configurable layers of Linear + ReLU + BatchNorm1d, followed by an OutputDecoder that applies the appropriate activation for each output variable (Softmax for one-hot encoded, Sigmoid for binary encoded, Sigmoid or Tanh for continuous depending on normalization). If there are no attributes, this sub-network is None.
- Additional Attribute Generator: A second MLP with the same architecture that generates per-example midpoint and half-range values. Its input is the concatenation of the generated attributes (with gradient detached via stop_gradient) and the original attribute noise. If there are no additional attributes, this sub-network is None.
- Feature Generator: An LSTM-based network. The input at each LSTM step is the concatenation of combined attributes (gradient-detached) and feature noise. The LSTM output passes through a Merger layer containing sample_len independent OutputDecoders, each producing one time step of output. The final output is reshaped to (batch, max_sequence_len, feature_dim).
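The attribute-generator pattern can be sketched in PyTorch. This is an illustrative reconstruction, not Gretel's exact code: the layer sizes, the single Softmax head standing in for the OutputDecoder, and all names here are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not Gretel defaults): noise dim, hidden
# units, and one one-hot encoded attribute with 3 categories.
NOISE_DIM, HIDDEN, NUM_CATEGORIES = 10, 100, 3

attribute_generator = nn.Sequential(
    # MLP trunk: each configured layer is Linear + ReLU + BatchNorm1d.
    nn.Linear(NOISE_DIM, HIDDEN),
    nn.ReLU(),
    nn.BatchNorm1d(HIDDEN),
    # OutputDecoder stand-in: one head per output variable; a one-hot
    # discrete attribute gets a Softmax so category probabilities sum to 1.
    nn.Linear(HIDDEN, NUM_CATEGORIES),
    nn.Softmax(dim=1),
)

noise = torch.randn(8, NOISE_DIM)   # 2D attribute noise: (batch, noise_dim)
attrs = attribute_generator(noise)  # (batch, num_categories), rows sum to 1
```

A real OutputDecoder holds one head per output variable and concatenates their results; this sketch shows only the single-variable case.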
Feature Discriminator: A simple feed-forward MLP with 5 layers of 200 units each (Linear + ReLU), ending in a single linear output. Its input is the concatenation of attributes, additional attributes, and the flattened features.
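The 5-layer, 200-unit critic described above is straightforward to sketch. A minimal PyTorch version, assuming illustrative input sizes (2 attributes, 4 additional attributes, 20 steps of 3 features):

```python
import torch
import torch.nn as nn

def build_discriminator(input_dim: int, num_layers: int = 5, units: int = 200) -> nn.Sequential:
    """Feed-forward critic: num_layers x (Linear + ReLU), then one linear unit."""
    layers, dim = [], input_dim
    for _ in range(num_layers):
        layers += [nn.Linear(dim, units), nn.ReLU()]
        dim = units
    layers.append(nn.Linear(dim, 1))  # no final activation (Wasserstein critic)
    return nn.Sequential(*layers)

# Input is concat(attributes, additional attributes, flattened features):
# 2 + 4 + 20 * 3 = 66 dimensions (sizes are assumptions for illustration).
disc = build_discriminator(input_dim=66)
x = torch.randn(8, 66)
scores = disc(x)  # (batch, 1) raw linear scores
```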
Attribute Discriminator: An optional second MLP discriminator with the same architecture as the feature discriminator, but operating only on the concatenation of attributes and additional attributes (no features). It is enabled by default when attributes or additional attributes exist.
Noise Functions: Two noise sampling functions are created: one for attribute noise (2D Gaussian) and one for feature noise (3D Gaussian with dimensions batch x num_LSTM_steps x feature_noise_dim).
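The two noise shapes can be illustrated with NumPy (dimension values here are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
BATCH, ATTR_NOISE_DIM = 8, 10
MAX_SEQUENCE_LEN, SAMPLE_LEN, FEATURE_NOISE_DIM = 20, 4, 10
NUM_LSTM_STEPS = MAX_SEQUENCE_LEN // SAMPLE_LEN  # one noise vector per LSTM step

# 2D Gaussian noise for the attribute generator.
attribute_noise = rng.standard_normal((BATCH, ATTR_NOISE_DIM))
# 3D Gaussian noise: batch x num_LSTM_steps x feature_noise_dim.
feature_noise = rng.standard_normal((BATCH, NUM_LSTM_STEPS, FEATURE_NOISE_DIM))
```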
Weight Initialization: When forget_bias=True, the LSTM forget gate biases are initialized to 1.0 (matching TensorFlow 1 behavior) by walking the generator's parameters and replacing the bias_hh tensor's forget gate slice.
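The forget-gate initialization relies on PyTorch's LSTM bias layout, which packs the input, forget, cell, and output gate biases into one vector in that order. A minimal sketch of the walk-and-slice step (hidden size and module names are illustrative):

```python
import torch
import torch.nn as nn

HIDDEN = 16
lstm = nn.LSTM(input_size=8, hidden_size=HIDDEN, num_layers=1)

# PyTorch packs each LSTM bias as [input | forget | cell | output],
# so the forget-gate slice is [hidden_size : 2 * hidden_size].
with torch.no_grad():
    for name, param in lstm.named_parameters():
        if "bias_hh" in name:
            param[HIDDEN:2 * HIDDEN] = 1.0  # TF1-style forget_bias=1.0

forget_bias = lstm.bias_hh_l0[HIDDEN:2 * HIDDEN]
```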
Usage
Network building happens automatically on the first call to train_numpy() or train_dataframe(), after Output metadata has been created from the training data. It can also be triggered at model initialization by providing both attribute_outputs and feature_outputs to the DGAN constructor, or during DGAN.load() deserialization.
Theoretical Basis
The architecture implements the DoppelGANger design from Lin et al. (2019):
Two-stage generation: Attributes are generated first by an MLP, then used to condition the LSTM that generates features. This separation allows the model to learn the joint distribution of static and temporal variables while respecting their structural difference (fixed vs. sequential).
Stop gradient: The attribute values are detached (stop_gradient) before being fed to the additional attribute generator and the feature generator LSTM. This prevents the feature generation gradients from flowing back through the attribute generator, allowing each sub-network to be optimized more independently.
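The effect of the detach can be seen in a tiny PyTorch example (variable names are illustrative): gradients from the detached path never reach the attribute generator's parameters.

```python
import torch

attribute_generator_param = torch.tensor([2.0], requires_grad=True)
attributes = attribute_generator_param * 3  # stand-in for generated attributes

# Detach before feeding downstream, as DoppelGANger does before the
# feature LSTM and the additional-attribute generator.
feature_input = attributes.detach()
loss = (feature_input * 5).sum() + attributes.sum()
loss.backward()

# Only the direct path contributes: d(attributes)/d(param) = 3.
# The detached path's factor of 5 never reaches the parameter.
print(attribute_generator_param.grad)  # tensor([3.])
```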
LSTM with sample_len: Each LSTM cell produces sample_len time points through sample_len independent output decoders merged together. This provides a balance between purely autoregressive generation (sample_len=1) and single-step generation (sample_len=max_sequence_len). The number of LSTM steps is:
num_steps = max_sequence_len / sample_len
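The step count and the final reshape fit together as follows (shapes chosen for illustration):

```python
import numpy as np

BATCH, MAX_SEQUENCE_LEN, SAMPLE_LEN, FEATURE_DIM = 8, 20, 4, 3
num_steps = MAX_SEQUENCE_LEN // SAMPLE_LEN  # 20 / 4 = 5 LSTM steps

# Each LSTM step emits sample_len decoded time points at once...
lstm_outputs = np.zeros((BATCH, num_steps, SAMPLE_LEN * FEATURE_DIM))
# ...and the concatenated step outputs reshape to the full sequence.
features = lstm_outputs.reshape(BATCH, MAX_SEQUENCE_LEN, FEATURE_DIM)
```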
Output-specific activations: The OutputDecoder applies the mathematically appropriate activation for each variable type:
- Softmax for one-hot encoded discrete variables (outputs sum to 1)
- Sigmoid for binary encoded discrete variables (each bit independently in [0,1])
- Sigmoid for ZERO_ONE continuous variables (output in [0,1])
- Tanh for MINUSONE_ONE continuous variables (output in [-1,1])
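The range guarantees above can be checked numerically; a small NumPy sketch (the helper functions are written here for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([[2.0, -1.0, 0.5]])
one_hot = softmax(logits)        # categories sum to 1
binary = sigmoid(logits)         # each bit independently in [0, 1]
zero_one = sigmoid(logits)       # ZERO_ONE continuous output in [0, 1]
minus_one_one = np.tanh(logits)  # MINUSONE_ONE continuous output in [-1, 1]
```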
Wasserstein discriminator: The discriminators have no final activation (raw linear output), as required by the Wasserstein distance formulation.