Implementation:Gretelai Gretel synthetics DGAN Train Numpy

Knowledge Sources	gretel-synthetics
Domains	Synthetic_Data, Time_Series, GAN
Last Updated	2026-02-14 19:00 GMT

Overview

Concrete tool for preparing and ingesting time series training data into the DGAN model provided by the gretel-synthetics library.

Description

The DGAN.train_numpy() method is the primary entry point for training a DoppelGANger model on numpy array data. It orchestrates a multi-step pipeline:

(1) automatic type detection for attributes and features if types are not specified
(2) creation of Output metadata via create_outputs_from_data()
(3) model building via _build() on the first call
(4) NaN validation and linear interpolation for continuous features
(5) transformation of features via transform_features(), including scaling, encoding, and per-example attribute extraction
(6) transformation of attributes via transform_attributes()
(7) wrapping of the transformed arrays into a TensorDataset
(8) delegation to the internal _train() method
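The linear interpolation in step (4) can be pictured with a small numpy-only sketch. This is not the library's code: the helper name interpolate_nans is hypothetical, and the library's own rules for which NaN patterns it accepts or rejects may differ from what is shown here.

```python
import numpy as np


def interpolate_nans(sequence: np.ndarray) -> np.ndarray:
    """Fill NaNs in a 1D series by linear interpolation (hypothetical helper)."""
    filled = sequence.astype(float).copy()
    missing = np.isnan(filled)
    if missing.all():
        raise ValueError("cannot interpolate a series that is entirely NaN")
    positions = np.arange(len(filled))
    # np.interp interpolates linearly between known points and repeats the
    # nearest known value for NaNs at either end of the series.
    filled[missing] = np.interp(
        positions[missing], positions[~missing], filled[~missing]
    )
    return filled


series = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
print(interpolate_nans(series))
```

In this sketch, interior gaps are filled along a straight line between their neighbours, while leading or trailing gaps simply copy the closest observed value.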

The companion DGAN.train_dataframe() method accepts pandas DataFrames in "wide" or "long" format, converts them to numpy arrays using a _DataFrameConverter, and then delegates to train_numpy().
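To make the long-format conversion concrete, the sketch below groups rows by an example id, sorts each group by time, and stacks the result into a 3D array. It is only an illustration of the idea, written with plain Python and numpy: the function long_rows_to_features and the column names are hypothetical, it assumes equal-length sequences, and it does not reproduce the behaviour of _DataFrameConverter.

```python
import numpy as np

# Long format: one row per (example, time step). Column names are illustrative.
rows = [
    # (stock_id, day, open, close)
    ("A", 2, 11.0, 12.0),
    ("B", 1, 20.0, 21.0),
    ("A", 1, 10.0, 11.0),
    ("B", 2, 21.0, 22.0),
]


def long_rows_to_features(rows):
    """Group rows by example id, sort by time, and stack into a 3D array."""
    by_example = {}
    for example_id, time, *values in rows:
        by_example.setdefault(example_id, []).append((time, values))
    sequences = []
    for example_id in sorted(by_example):
        ordered = sorted(by_example[example_id], key=lambda item: item[0])
        sequences.append([values for _, values in ordered])
    # Assumes every example has the same number of time steps.
    return np.array(sequences, dtype=float)


features = long_rows_to_features(rows)
print(features.shape)  # (2, 2, 2): examples, time steps, features
```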

Usage

Call train_numpy() with a 3D numpy array (or list of 2D arrays for variable-length sequences) of features and an optional 2D array of attributes. On the first call, the model structure is automatically determined from the data. Use train_dataframe() when starting from a pandas DataFrame.
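The expected shapes can be checked before training with a small pre-flight helper. The function check_training_inputs below is a hypothetical sketch, not part of gretel-synthetics, and the library performs its own validation; it only encodes the shape rules stated above.

```python
import numpy as np


def check_training_inputs(features, attributes=None):
    """Return (n_examples, max_len, n_features) or raise ValueError."""
    if isinstance(features, np.ndarray):
        if features.ndim != 3:
            raise ValueError(f"features must be 3D, got {features.ndim}D")
        n_examples, max_len, n_features = features.shape
    else:
        # Variable-length case: a list of 2D arrays, one per example.
        if len(features) == 0:
            raise ValueError("features list is empty")
        if any(seq.ndim != 2 for seq in features):
            raise ValueError("every sequence in the list must be 2D")
        widths = {seq.shape[1] for seq in features}
        if len(widths) != 1:
            raise ValueError("all sequences must have the same number of features")
        n_examples = len(features)
        max_len = max(seq.shape[0] for seq in features)
        n_features = widths.pop()
    if attributes is not None:
        if attributes.ndim != 2 or attributes.shape[0] != n_examples:
            raise ValueError("attributes must be 2D with one row per example")
    return n_examples, max_len, n_features


print(check_training_inputs(np.zeros((4, 20, 2)), np.zeros((4, 3))))
print(check_training_inputs([np.zeros((5, 2)), np.zeros((3, 2))]))
```

The second value returned for the list case is the longest sequence found, which is useful for confirming that it does not exceed the max_sequence_len set in DGANConfig.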

Code Reference

Source Location

Repository: gretel-synthetics
File: src/gretel_synthetics/timeseries_dgan/dgan.py
Lines: 174-396 (train_numpy), 398-536 (train_dataframe)
File: src/gretel_synthetics/timeseries_dgan/transformations.py
Lines: 367-439 (create_outputs_from_data), 550-570 (transform_attributes), 612-709 (transform_features)

Signature

def train_numpy(
    self,
    features: Union[np.ndarray, list[np.ndarray]],
    feature_types: Optional[List[OutputType]] = None,
    attributes: Optional[np.ndarray] = None,
    attribute_types: Optional[List[OutputType]] = None,
    progress_callback: Optional[Callable[[ProgressInfo], None]] = None,
) -> None:

def train_dataframe(
    self,
    df: pd.DataFrame,
    attribute_columns: Optional[List[str]] = None,
    feature_columns: Optional[List[str]] = None,
    example_id_column: Optional[str] = None,
    time_column: Optional[str] = None,
    discrete_columns: Optional[List[str]] = None,
    df_style: DfStyle = DfStyle.WIDE,
    progress_callback: Optional[Callable[[ProgressInfo], None]] = None,
) -> None:

Import

from gretel_synthetics.timeseries_dgan.dgan import DGAN
from gretel_synthetics.timeseries_dgan.config import DGANConfig, OutputType

I/O Contract

Inputs (train_numpy)

Name	Type	Required	Description
features	np.ndarray or list[np.ndarray]	Yes	3D array of shape (examples, max_sequence_len, num_features) or list of 2D arrays for variable-length sequences
feature_types	Optional[List[OutputType]]	No	OutputType.CONTINUOUS or OutputType.DISCRETE per feature; auto-detected if None
attributes	Optional[np.ndarray]	No	2D array of shape (examples, num_attributes); None if no attributes
attribute_types	Optional[List[OutputType]]	No	OutputType per attribute; auto-detected if None
progress_callback	Optional[Callable[[ProgressInfo], None]]	No	Callback invoked after each training batch with progress information
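A progress callback can be as simple as a function that records a message every few batches. The sketch below uses a stand-in dataclass, ProgressInfoStandIn, because the exact fields of the library's ProgressInfo are not listed on this page; the field names epoch, total_epochs, batch, and total_batches are assumptions, as is the helper make_progress_logger.

```python
from dataclasses import dataclass


@dataclass
class ProgressInfoStandIn:
    """Stand-in for the library's ProgressInfo; field names are assumptions."""
    epoch: int
    total_epochs: int
    batch: int
    total_batches: int


def make_progress_logger(every_n_batches, sink):
    """Build a callback that appends a message to `sink` every N batches."""
    def callback(info):
        if info.batch % every_n_batches == 0:
            sink.append(
                f"epoch {info.epoch + 1}/{info.total_epochs}, "
                f"batch {info.batch + 1}/{info.total_batches}"
            )
    return callback


messages = []
log_progress = make_progress_logger(2, messages)
# Simulate one epoch of four batches instead of running real training.
for batch in range(4):
    log_progress(ProgressInfoStandIn(epoch=0, total_epochs=10, batch=batch, total_batches=4))
print(messages)
```

If the real ProgressInfo exposes the same attribute names, a callback built this way could be passed as progress_callback=log_progress to train_numpy().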

Inputs (train_dataframe)

Name	Type	Required	Description
df	pd.DataFrame	Yes	Training data in wide or long format
attribute_columns	Optional[List[str]]	No	Column names for attributes; must be disjoint from feature_columns
feature_columns	Optional[List[str]]	No	Column names for features; defaults to all non-attribute columns
example_id_column	Optional[str]	No	Column to split long-format data into examples
time_column	Optional[str]	No	Column used to sort long-format data by time
discrete_columns	Optional[List[str]]	No	Columns to treat as discrete (one-hot or binary encoded)
df_style	DfStyle	No (default WIDE)	Format of the DataFrame: DfStyle.WIDE or DfStyle.LONG
progress_callback	Optional[Callable[[ProgressInfo], None]]	No	Callback invoked after each training batch

Outputs

Name	Type	Description
(none)	None	Both methods return None; the DGAN model is trained in-place

Usage Examples

Basic Example

import numpy as np
from gretel_synthetics.timeseries_dgan.dgan import DGAN
from gretel_synthetics.timeseries_dgan.config import DGANConfig

attributes = np.random.rand(10000, 3)
features = np.random.rand(10000, 20, 2)

config = DGANConfig(
    max_sequence_len=20,
    sample_len=5,
    batch_size=1000,
    epochs=10,
)

model = DGAN(config)
model.train_numpy(attributes=attributes, features=features)

DataFrame Example

import pandas as pd
from gretel_synthetics.timeseries_dgan.dgan import DGAN
from gretel_synthetics.timeseries_dgan.config import DGANConfig, DfStyle

config = DGANConfig(max_sequence_len=20, sample_len=5, epochs=10)
model = DGAN(config)

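# `df` is assumed to be an existing long-format pandas DataFrame with one row per
# (stock_id, date) pair and the columns named below; it is not defined in this snippet.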
model.train_dataframe(
    df=df,
    attribute_columns=["sector", "country"],
    feature_columns=["open", "high", "low", "close"],
    example_id_column="stock_id",
    time_column="date",
    discrete_columns=["sector", "country"],
    df_style=DfStyle.LONG,
)

Explicit Type Annotations

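from gretel_synthetics.timeseries_dgan.config import OutputType

# Assumes a freshly constructed `model` plus the `features` (2 columns) and
# `attributes` (3 columns) arrays from the Basic Example.
# Each list needs one entry per feature or attribute column, in column order.
model.train_numpy(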
    features=features,
    feature_types=[OutputType.CONTINUOUS, OutputType.DISCRETE],
    attributes=attributes,
    attribute_types=[OutputType.DISCRETE, OutputType.CONTINUOUS, OutputType.CONTINUOUS],
)

Related Pages

Implements Principle

Principle:Gretelai_Gretel_synthetics_Timeseries_Data_Preparation

Requires Environment

Environment:Gretelai_Gretel_synthetics_PyTorch_CUDA_Environment
