
Principle:Gretelai Gretel synthetics DGAN Configuration

From Leeroopedia
Knowledge Sources
Domains Synthetic_Data, Time_Series, GAN
Last Updated 2026-02-14 19:00 GMT

Overview

DGAN Configuration encapsulates the complete set of hyperparameters and behavioral flags that control how a DoppelGANger generative adversarial network is structured and trained, and how it generates synthetic time series data.

Description

A time series GAN such as DoppelGANger requires careful coordination of dozens of interrelated settings. These settings fall into several logical groups:

Model Structure parameters define the neural network architecture. The sequence length (max_sequence_len) and LSTM cell output length (sample_len) must satisfy the divisibility constraint max_sequence_len % sample_len == 0, since the generator produces sample_len time points per LSTM step across max_sequence_len / sample_len steps. Noise dimensions control the expressiveness of the latent space, while layer counts and unit counts control network capacity for both the attribute MLP and feature LSTM.
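The divisibility constraint and the resulting step count can be sketched as follows. This is illustrative stdlib Python, not the library's internal code; the function name is hypothetical.

```python
def lstm_steps(max_sequence_len: int, sample_len: int) -> int:
    """Number of LSTM steps the generator takes, given that each step
    emits sample_len time points of a max_sequence_len-long series."""
    if max_sequence_len % sample_len != 0:
        raise ValueError(
            f"max_sequence_len ({max_sequence_len}) must be divisible "
            f"by sample_len ({sample_len})"
        )
    return max_sequence_len // sample_len

# Example: a 20-point series emitted 4 points per step -> 5 LSTM steps.
print(lstm_steps(20, 4))  # -> 5
```

Choosing a larger sample_len shortens the LSTM unroll (fewer steps) at the cost of each step having to produce more points at once.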

Data Transformation parameters govern how raw user data is converted to the internal representation. The normalization mode determines whether continuous variables are scaled to [0,1] (sigmoid activation) or [-1,1] (tanh activation). Feature scaling controls whether the model normalizes variables globally, and example scaling adds per-example midpoint and half-range as additional generated attributes, which is critical for datasets where time series ranges vary widely across examples.
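The two normalization modes correspond to simple min-max rescalings; the sketch below shows both, with function names that are illustrative rather than the library's API.

```python
def to_zero_one(x: float, lo: float, hi: float) -> float:
    """Scale x from [lo, hi] to [0, 1] (pairs with a sigmoid output)."""
    return (x - lo) / (hi - lo)

def to_minus_one_one(x: float, lo: float, hi: float) -> float:
    """Scale x from [lo, hi] to [-1, 1] (pairs with a tanh output)."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

print(to_zero_one(5.0, 0.0, 10.0))       # -> 0.5
print(to_minus_one_one(5.0, 0.0, 10.0))  # -> 0.0
```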

Loss Function parameters control the Wasserstein loss with gradient penalty (WGAN-GP). The gradient penalty coefficient (default 10.0) enforces the Lipschitz constraint. A separate attribute discriminator with its own gradient penalty coefficient and loss weighting coefficient encourages the model to learn accurate attribute distributions.

Training parameters include separate Adam optimizer settings (learning rate, beta1) for the generator, feature discriminator, and attribute discriminator, as well as batch size, epoch count, and the number of discriminator and generator rounds per batch.

Hardware flags enable CUDA acceleration and mixed precision training for reduced memory usage.

Usage

Use DGAN Configuration whenever initializing a DoppelGANger model. The configuration must be specified before training begins, since its values determine the network architecture that is built. The two required parameters are max_sequence_len (matching the time series length in the training data) and sample_len (a divisor of max_sequence_len). All other parameters have sensible defaults.
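A typical initialization might look like the sketch below, assuming the gretel-synthetics timeseries_dgan module; exact import paths, parameter names, and defaults should be checked against the installed version's documentation.

```python
from gretel_synthetics.timeseries_dgan.config import DGANConfig
from gretel_synthetics.timeseries_dgan.dgan import DGAN

config = DGANConfig(
    max_sequence_len=20,  # must match the training series length
    sample_len=4,         # must evenly divide max_sequence_len
    batch_size=1000,
    epochs=100,
)
model = DGAN(config)
# model.train_numpy(features=training_array)  # (examples, 20, n_variables)
```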

Theoretical Basis

The configuration embodies the DoppelGANger architecture from the paper Generating High-fidelity, Synthetic Time Series Datasets with DoppelGANger (Lin et al., 2019). Key theoretical elements captured in the configuration include:

Sequence generation via LSTM: The generator uses an LSTM network where each cell produces sample_len time points. The total number of LSTM steps is:

num_steps = max_sequence_len / sample_len

Separate attribute and feature generation: Attributes (fixed per example) are generated by an MLP, then concatenated with noise to drive the LSTM for feature (time series) generation. This two-stage design decouples static and temporal characteristics.

WGAN-GP loss: The Wasserstein distance with gradient penalty is used for stable training:

L_D = E[D(G(z))] - E[D(x)] + lambda * E[(||grad_x_hat D(x_hat)||_2 - 1)^2]

where lambda is the gradient_penalty_coef and x_hat is an interpolation between real and generated data.
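The penalty term can be illustrated numerically. The gradient norms below are stand-in values (computing real ones requires autograd over an interpolated batch), and the helper names are hypothetical.

```python
def interpolate(x, g, eps: float):
    """x_hat = eps * x + (1 - eps) * g, elementwise over two samples."""
    return [eps * a + (1 - eps) * b for a, b in zip(x, g)]

def gradient_penalty(grad_norms, coef: float = 10.0) -> float:
    """Mean of coef * (||grad||_2 - 1)^2 over a batch of gradient norms."""
    return coef * sum((n - 1.0) ** 2 for n in grad_norms) / len(grad_norms)

# A norm of exactly 1 incurs zero penalty (the 1-Lipschitz target).
print(gradient_penalty([1.0, 1.0]))  # -> 0.0
print(gradient_penalty([1.5, 0.5]))  # -> 2.5
```

With the default gradient_penalty_coef of 10.0, norms straying 0.5 from 1 on average contribute 2.5 to the discriminator loss, which is what keeps training near the Lipschitz constraint.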

Attribute discriminator: An optional second discriminator operates only on attributes (and additional attributes), with its own gradient penalty coefficient. The combined generator loss is:

L_G = -E[D(G(z))] + attribute_loss_coef * (-E[D_attr(G(z)_attr)])

Per-example scaling: When apply_example_scaling is True, the model generates midpoint and half-range as additional attributes for each continuous feature, allowing the model to capture widely varying ranges across examples.
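The midpoint/half-range idea can be sketched as below. This is an illustrative stdlib Python rendering of the concept, not the library's transformation code.

```python
def example_scale(series):
    """Return (scaled series, midpoint, half_range) for one example.
    The midpoint and half-range become additional generated attributes,
    while the series is normalized to [-1, 1] relative to its own range."""
    lo, hi = min(series), max(series)
    midpoint = (hi + lo) / 2.0
    half_range = (hi - lo) / 2.0
    scaled = [
        (v - midpoint) / half_range if half_range else 0.0 for v in series
    ]
    return scaled, midpoint, half_range

scaled, mid, hr = example_scale([10.0, 20.0, 30.0])
print(mid, hr)   # -> 20.0 10.0
print(scaled)    # -> [-1.0, 0.0, 1.0]
```

Because each example carries its own midpoint and half-range, a dataset mixing series in the tens with series in the millions can still be modeled in a single normalized space.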

Related Pages

Implemented By

Uses Heuristic
