
Workflow:Gretelai Gretel synthetics DGAN Timeseries Generation

From Leeroopedia
Knowledge Sources
Domains Synthetic_Data, Time_Series, GAN
Last Updated 2026-02-14 19:00 GMT

Overview

End-to-end process for training a DoppelGANger (DGAN) generative adversarial network on time series data and generating synthetic sequences that preserve temporal correlations and attribute distributions.

Description

This workflow implements the DGAN (DoppelGANger) model for synthetic time series generation. DGAN is a GAN architecture designed specifically for multivariate time series data with associated fixed attributes. It uses separate generator components for attributes and temporal features, a feature discriminator operating on full sequences, and an optional attribute discriminator. The model supports both numpy array and DataFrame interfaces, handles both wide-format (one row per example) and long-format (one row per time point) data, and can model variable-length sequences. Training uses the Wasserstein GAN with gradient penalty (WGAN-GP) objective.

Key outputs:

  • A trained DGAN model
  • Synthetic time series data (attributes and features) matching the training distribution

Usage

Execute this workflow when you have multivariate time series data (with optional fixed attributes per series) and need to generate synthetic sequences that exhibit the same temporal patterns and correlations as the original data. Typical use cases include generating synthetic sensor readings, financial time series, network traffic data, or any domain where both the sequential feature dynamics and per-example attributes must be preserved.
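As a concrete illustration of the data this workflow consumes and produces, the numpy interface works in terms of an optional 2-D attributes array and a 3-D features array. The sketch below shows only the shape contract; the sizes and array names are illustrative, not library defaults or exact signatures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not library defaults).
n_examples, seq_len, n_features, n_attributes = 100, 20, 3, 2

# Fixed attributes: one row per example (e.g., sensor type, region).
attributes = rng.normal(size=(n_examples, n_attributes))

# Temporal features: one (seq_len, n_features) matrix per example,
# stacked into a single 3-D array for fixed-length sequences.
features = rng.normal(size=(n_examples, seq_len, n_features))

# A trained model returns synthetic arrays with these same shapes,
# drawn from (approximately) the same joint distribution.
print(attributes.shape, features.shape)
```

For variable-length sequences the 3-D stack is replaced by a list of 2-D arrays, one per example, as noted under Step 1 below.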

Execution Steps

Step 1: Configuration

Create a DGANConfig specifying the model architecture and training parameters. Key parameters include max_sequence_len (the length of each time series), sample_len (the number of time points generated per RNN step; it must evenly divide max_sequence_len), batch_size, the number of epochs, and the number of discriminator rounds per generator update. Additional settings control attribute and feature noise dimensions, gradient penalty weight, and whether to use an attribute discriminator.

Key considerations:

  • sample_len must evenly divide max_sequence_len; smaller values give finer temporal control
  • The attribute_noise_dim and feature_noise_dim control the latent space dimensions
  • apply_feature_scaling and apply_example_scaling control normalization strategies
  • For variable-length sequences, provide features as a list of 2D arrays instead of a 3D array
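The relationship between max_sequence_len and sample_len can be sketched with a minimal configuration check. SketchConfig below is a hypothetical stand-in for illustration, not the library's DGANConfig class:

```python
from dataclasses import dataclass


@dataclass
class SketchConfig:
    """Illustrative subset of a DGAN-style configuration."""
    max_sequence_len: int = 20   # length of each training sequence
    sample_len: int = 5          # time points emitted per RNN step
    batch_size: int = 1000
    epochs: int = 400
    attribute_noise_dim: int = 10
    feature_noise_dim: int = 10

    def __post_init__(self):
        # sample_len must evenly divide max_sequence_len: the feature
        # generator runs max_sequence_len // sample_len RNN steps.
        if self.max_sequence_len % self.sample_len != 0:
            raise ValueError("sample_len must evenly divide max_sequence_len")


cfg = SketchConfig(max_sequence_len=20, sample_len=5)
rnn_steps = cfg.max_sequence_len // cfg.sample_len
print(rnn_steps)  # 4 RNN steps per generated sequence
```

A smaller sample_len means more RNN steps per sequence and therefore finer-grained temporal modeling, at the cost of longer training.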

Step 2: Data Preparation and Transformation

Convert input data (numpy arrays or DataFrames) into the internal representation. For numpy input, output metadata is inferred from the data by detecting continuous vs. discrete columns. For DataFrame input, a converter handles wide-to-numpy or long-to-numpy conversion, including grouping rows by example_id in long format. Continuous features and attributes are normalized (min-max or standard scaling), and discrete values are one-hot encoded.

Key considerations:

  • NaN values in continuous features are handled via linear interpolation
  • Discrete columns can be specified explicitly or auto-detected based on data type
  • Wide format expects one row per example with time series values spread across columns
  • Long format expects one row per time point with an example_id column for grouping
  • The converter stores the mapping needed for inverse transformation during generation
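The two DataFrame layouts, and the long-to-array grouping step, can be sketched with pandas. Column names such as example_id and the t0/t1/t2 wide columns are illustrative, not the converter's required names:

```python
import numpy as np
import pandas as pd

# Long format: one row per time point, grouped by an example id column.
long_df = pd.DataFrame({
    "example_id": [0, 0, 0, 1, 1, 1],
    "value": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Group rows by example to recover one (seq_len, n_features) block each,
# then stack into the 3-D internal representation.
features = np.stack([
    g[["value"]].to_numpy()
    for _, g in long_df.groupby("example_id", sort=True)
])
print(features.shape)  # (2 examples, 3 time points, 1 feature)

# Wide format: one row per example, time steps spread across columns.
wide_df = pd.DataFrame(features[:, :, 0], columns=["t0", "t1", "t2"])
print(wide_df.shape)  # (2, 3)
```

The inverse of this grouping is what the stored converter mapping applies at generation time to return data in the original layout.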

Step 3: Network Building

Construct the DGAN neural network components: an attribute generator, a feature generator (autoregressive LSTM), a feature discriminator, and an optional attribute discriminator. The attribute generator maps noise to fixed attributes and additional per-time-step context. The feature generator takes noise and attribute context to produce sequences in an autoregressive fashion, generating sample_len time points per LSTM step.

Key considerations:

  • The generator produces normalized/encoded data in the internal representation
  • The feature discriminator operates on full sequences concatenated with attributes
  • The attribute discriminator (if enabled) provides additional gradient signal for attribute quality
  • Mixed precision training is supported for faster computation on compatible hardware
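The autoregressive structure of the feature generator, emitting sample_len time points per LSTM step conditioned on fixed attribute context, can be sketched without any deep-learning framework. Random draws stand in for the LSTM cell and its output projection, so only the shapes and the loop structure are meaningful here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions.
max_sequence_len, sample_len, n_features = 20, 5, 3
attribute_context_dim = 4

# Per-example context produced by the attribute generator.
attribute_context = rng.normal(size=attribute_context_dim)

chunks = []
for _ in range(max_sequence_len // sample_len):  # 4 LSTM steps
    # Each step would consume fresh noise plus the fixed attribute
    # context; here a random draw stands in for the LSTM output,
    # which emits sample_len time points of n_features values at once.
    chunk = np.tanh(rng.normal(size=(sample_len, n_features))
                    + attribute_context.mean())
    chunks.append(chunk)

sequence = np.concatenate(chunks, axis=0)
print(sequence.shape)  # (20, 3): one full generated sequence
```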

Step 4: Adversarial Training

Train the DGAN using the WGAN-GP objective in a loop over epochs and batches. Each training iteration: (1) trains the discriminator(s) for a configurable number of rounds by comparing real data batches against generated fakes, applying gradient penalty for Lipschitz constraint enforcement; (2) trains the generator to fool the discriminator(s) while learning to produce realistic attributes and temporally coherent features. A progress callback reports epoch and batch statistics.

Key considerations:

  • The gradient penalty is computed by interpolating between real and generated samples
  • Separate Adam optimizers are used for generator and discriminator with configurable learning rates
  • The attribute discriminator loss is weighted and combined with the feature discriminator loss
  • Training supports an optional progress_callback for monitoring epoch/batch progress
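The interpolation underlying the gradient penalty can be sketched in numpy. The penalty is lam * E[(||grad_x_hat D(x_hat)||_2 - 1)^2], where x_hat mixes real and generated samples; in the real model the gradient is obtained with autograd on the LSTM critic, so the toy critic below is a fixed linear map purely to make the gradient norm computable by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, n_features = 8, 20, 3
lam = 10.0  # gradient penalty weight

real = rng.normal(size=(batch, seq_len, n_features))
fake = rng.normal(size=(batch, seq_len, n_features))

# Interpolate between real and generated samples with per-example
# uniform mixing coefficients (broadcast over time and features).
eps = rng.uniform(size=(batch, 1, 1))
interp = eps * real + (1.0 - eps) * fake

# Toy linear critic D(x) = <w, x>: its gradient w.r.t. x is w for
# every sample, so each per-sample gradient norm is simply ||w||.
w = rng.normal(size=(seq_len, n_features))
grad_norms = np.full(batch, np.linalg.norm(w))

# WGAN-GP penalty: lam * mean((||grad||_2 - 1)^2), added to the
# discriminator loss to enforce the 1-Lipschitz constraint.
penalty = lam * np.mean((grad_norms - 1.0) ** 2)
print(interp.shape, round(float(penalty), 3))
```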

Step 5: Synthetic Data Generation

Generate synthetic time series by sampling noise vectors and passing them through the trained generator. The generator produces attributes and feature sequences in the internal (normalized/encoded) representation, which are then inverse-transformed back to the original data space. For DataFrame interfaces, the data is further converted back to the original wide or long format with proper column names and types.

Key considerations:

  • Generation runs in batches, accumulating results until the requested number of examples is produced
  • Custom noise vectors can be provided for reproducibility or controlled generation
  • Discrete columns in the output are converted back to integer types where applicable
  • The generate_dataframe method returns data in the same format (wide or long) used during training
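Batched accumulation until the requested count is reached can be sketched as follows; generate_batch is a hypothetical stand-in for one generator forward pass, and the trim at the end discards the overshoot from the final batch:

```python
import numpy as np

rng = np.random.default_rng(0)


def generate_batch(batch_size, seq_len=20, n_features=3):
    """Hypothetical stand-in for one generator forward pass."""
    return rng.normal(size=(batch_size, seq_len, n_features))


def generate(n_examples, batch_size=1000):
    # Accumulate full batches, then trim to the exact requested count.
    batches = []
    produced = 0
    while produced < n_examples:
        batch = generate_batch(batch_size)
        batches.append(batch)
        produced += len(batch)
    return np.concatenate(batches, axis=0)[:n_examples]


features = generate(2500, batch_size=1000)
print(features.shape)  # (2500, 20, 3)
```

In the actual workflow each batch would additionally be inverse-transformed from the internal normalized/encoded representation before being returned.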

Execution Diagram

GitHub URL

Workflow Repository