
Principle:Gretelai Gretel synthetics Timeseries Data Generation

From Leeroopedia
Knowledge Sources
Domains Synthetic_Data, Time_Series, GAN
Last Updated 2026-02-14 19:00 GMT

Overview

Timeseries Data Generation is the process of sampling noise vectors, passing them through a trained DoppelGANger Generator, and inverse-transforming the internal representation back to the original data space to produce synthetic time series examples.

Description

After a DoppelGANger model has been trained, it can generate an arbitrary number of synthetic examples. The generation pipeline reverses the data preparation pipeline:

Noise Sampling: Two random noise tensors are drawn from standard normal distributions. Attribute noise has shape (batch_size, attribute_noise_dim) and feature noise has shape (batch_size, max_sequence_len/sample_len, feature_noise_dim). For large generation requests, multiple batches are generated and concatenated before truncating to the exact requested count.
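The noise shapes above can be sketched with plain numpy; the dimension values below are illustrative placeholders, not the library's defaults, and in the real model they come from the trained configuration:

```python
import numpy as np

# Hypothetical dimensions for illustration only; real values come from
# the model's configuration.
batch_size = 4
attribute_noise_dim = 10
feature_noise_dim = 32
max_sequence_len = 20
sample_len = 5  # the feature generator emits sample_len time steps per step

rng = np.random.default_rng(0)

# Attribute noise: one vector per example.
attribute_noise = rng.standard_normal((batch_size, attribute_noise_dim))

# Feature noise: one vector per sample_len-sized chunk of the sequence.
feature_noise = rng.standard_normal(
    (batch_size, max_sequence_len // sample_len, feature_noise_dim)
)

print(attribute_noise.shape)  # (4, 10)
print(feature_noise.shape)    # (4, 4, 32)
```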

Generator Forward Pass: The generator's forward method produces a 3-element tuple: (attributes, additional_attributes, features), all in the internal encoded representation. The generator is set to evaluation mode (disabling dropout and using running batch normalization statistics) before generation.

Inverse Transform of Attributes: For each attribute output, the internal representation is decoded back to the original space. One-hot encoded columns are inverted via argmax selection, binary encoded columns are thresholded at 0.5 and decoded, and continuous columns are rescaled from [0,1] or [-1,1] back to the original range using the stored global min/max.
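The three attribute decodings can be sketched as follows; the category names, column values, and the choice of [0,1] scaling are illustrative assumptions, not the library's stored state:

```python
import numpy as np

# Illustrative category vocabulary for a one-hot encoded attribute.
categories = np.array(["dsl", "cable", "fiber"])

def decode_one_hot(encoded):
    # One-hot columns: select the category with the largest activation.
    return categories[np.argmax(encoded, axis=1)]

def decode_binary(encoded, threshold=0.5):
    # Binary-encoded columns: threshold at 0.5.
    return encoded >= threshold

def rescale_zero_one(scaled, global_min, global_max):
    # Continuous columns scaled to [0,1]: map back to the original range.
    return scaled * (global_max - global_min) + global_min

one_hot = np.array([[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]])
print(decode_one_hot(one_hot))                                  # ['cable' 'dsl']
print(decode_binary(np.array([0.9, 0.3])))                      # [ True False]
print(rescale_zero_one(np.array([0.0, 0.5, 1.0]), 10.0, 30.0))  # [10. 20. 30.]
```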

Inverse Transform of Features: Feature inversion follows a similar process but includes an additional step for per-example scaling. When apply_example_scaling was used during training, the generated additional attributes (midpoint and half-range) are used to reverse the per-example normalization before the global inverse scaling:

  1. Recover per-example min/max from midpoint and half-range: min = midpoint - half_range, max = midpoint + half_range
  2. Invert per-example scaling using these per-example min/max values
  3. Invert global scaling using the stored global min/max

Discrete features are decoded just as with attributes (argmax for one-hot, thresholding for binary).
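The three numbered steps can be sketched for a single continuous feature; this assumes ZERO_ONE scaling at both the per-example and global level, and the variable names are illustrative rather than the library's internals:

```python
import numpy as np

def inverse_feature_scaling(x_gen, midpoint, half_range, g_min, g_max):
    # Step 1: recover per-example min/max from the generated
    # additional attributes.
    ex_min = midpoint - half_range   # shape (batch,)
    ex_max = midpoint + half_range
    # Step 2: invert the per-example scaling (broadcast over time axis).
    x = x_gen * (ex_max - ex_min)[:, None] + ex_min[:, None]
    # Step 3: invert the global scaling with the stored training min/max.
    return x * (g_max - g_min) + g_min

x_gen = np.array([[0.0, 0.5, 1.0]])  # one example, 3 time steps
out = inverse_feature_scaling(
    x_gen,
    midpoint=np.array([0.5]),
    half_range=np.array([0.25]),
    g_min=0.0,
    g_max=100.0,
)
print(out)  # [[25. 50. 75.]]
```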

DataFrame Conversion: When using generate_dataframe(), the numpy arrays from generate_numpy() are passed through the stored _DataFrameConverter to reconstruct a DataFrame in the same format as the training data, including column names, example IDs, time columns, and proper data types.
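A minimal sketch of this reconstruction in "long" format is shown below; the column names (`example_id`, `time`, `connection_type`, `traffic`) are hypothetical stand-ins for whatever the stored converter recorded from the training data:

```python
import numpy as np
import pandas as pd

# Generated arrays in the shapes generate_numpy() would return
# (values here are made up for illustration).
attributes = np.array([["dsl"], ["fiber"]])           # (examples, attr_cols)
features = np.array([[[1.0], [2.0], [3.0]],
                     [[4.0], [5.0], [6.0]]])          # (examples, time, feat_cols)

rows = []
for example_id in range(features.shape[0]):
    for t in range(features.shape[1]):
        rows.append({
            "example_id": example_id,                        # example ID column
            "time": t,                                       # time column
            "connection_type": attributes[example_id, 0],    # static per example
            "traffic": features[example_id, t, 0],           # time-varying
        })
df = pd.DataFrame(rows)
print(df.shape)  # (6, 4)
```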

Usage

Call generate_numpy(n) to produce n synthetic examples as numpy arrays, or generate_dataframe(n) to get a pandas DataFrame. The model must be trained before generation. For reproducibility or controlled generation, explicit noise tensors can be passed instead of specifying n.

Theoretical Basis

The generation process implements the inference phase of the DoppelGANger model:

Latent Space Sampling: The generator has learned a mapping from the noise space to the data distribution. By sampling from the same noise distribution used during training (standard normal), the generator produces novel examples that exhibit the statistical properties of the training data.

Two-stage generation: The attribute MLP generates static variables first, which then condition the feature LSTM. This ensures that the temporal characteristics of generated features are consistent with the generated attributes (e.g., a generated "high bandwidth" attribute leads to feature values in the appropriate range).

Per-example denormalization: When example scaling is used, the generator produces midpoint (m) and half-range (h) as additional attributes. The actual feature values are recovered as:

x_original = inverse_global_scale(inverse_example_scale(x_generated, m - h, m + h))

For ZERO_ONE normalization, the inverse scaling is:

x = x_scaled * (max - min) + min

For MINUSONE_ONE normalization:

x = ((x_scaled + 1) / 2) * (max - min) + min
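The two inverse formulas can be checked with a quick round trip: scale a vector forward with each normalization, invert it, and confirm the original values come back (the forward formulas and sample values below are illustrative):

```python
import numpy as np

def inverse_zero_one(x_scaled, lo, hi):
    # Inverse of ZERO_ONE normalization.
    return x_scaled * (hi - lo) + lo

def inverse_minusone_one(x_scaled, lo, hi):
    # Inverse of MINUSONE_ONE normalization.
    return ((x_scaled + 1.0) / 2.0) * (hi - lo) + lo

x = np.array([-5.0, 0.0, 5.0])
lo, hi = x.min(), x.max()
zo = (x - lo) / (hi - lo)              # forward ZERO_ONE: values in [0, 1]
mo = 2.0 * (x - lo) / (hi - lo) - 1.0  # forward MINUSONE_ONE: values in [-1, 1]

print(np.allclose(inverse_zero_one(zo, lo, hi), x))      # True
print(np.allclose(inverse_minusone_one(mo, lo, hi), x))  # True
```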

Batch generation: For large n, generating in batches of batch_size avoids excessive memory usage. The results are concatenated and truncated to exactly n examples.
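The batching logic can be sketched as follows; `generate_batch` is a hypothetical stand-in for a real generator forward pass, used only to show the concatenate-then-truncate pattern:

```python
import numpy as np

def generate_batch(batch_size, rng):
    # Stand-in for one generator forward pass: returns batch_size
    # dummy "examples" with 3 values each.
    return rng.standard_normal((batch_size, 3))

def generate_n(n, batch_size=32, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    batches = []
    produced = 0
    while produced < n:
        # Generate one full batch at a time to bound memory usage.
        batches.append(generate_batch(batch_size, rng))
        produced += batch_size
    # Concatenate all batches, then truncate to exactly n examples.
    return np.concatenate(batches, axis=0)[:n]

print(generate_n(70).shape)  # (70, 3): 3 batches of 32, truncated from 96
```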

