Implementation:Gretelai Gretel synthetics DGAN Train Numpy
| Knowledge Sources | |
|---|---|
| Domains | Synthetic_Data, Time_Series, GAN |
| Last Updated | 2026-02-14 19:00 GMT |
Overview
Concrete tool for preparing and ingesting time series training data into the DGAN model provided by the gretel-synthetics library.
Description
The DGAN.train_numpy() method is the primary entry point for training a DoppelGANger model on numpy array data. It orchestrates a multi-step pipeline:
1. Automatic type detection for attributes and features if types are not specified.
2. Creation of Output metadata via create_outputs_from_data().
3. Model building via _build() on the first call.
4. NaN validation and linear interpolation for continuous features.
5. Transformation of features via transform_features(), including scaling, encoding, and per-example attribute extraction.
6. Transformation of attributes via transform_attributes().
7. Wrapping into a TensorDataset.
8. Delegation to the internal _train() method.
The companion DGAN.train_dataframe() method accepts pandas DataFrames in "wide" or "long" format, converts them to numpy arrays using a _DataFrameConverter, and then delegates to train_numpy().
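The NaN handling step in the pipeline can be pictured with plain numpy. The following is a minimal sketch, not the library's code: the helper name interpolate_nans is hypothetical, and the edge behavior (leading or trailing NaNs take the nearest valid value, as np.interp clamps) is an assumption rather than a documented guarantee of train_numpy().

```python
import numpy as np


def interpolate_nans(series: np.ndarray) -> np.ndarray:
    """Fill NaNs in a 1D series by linear interpolation (hypothetical helper)."""
    series = np.asarray(series, dtype=float)
    valid = ~np.isnan(series)
    if not valid.any():
        raise ValueError("cannot interpolate a series that is entirely NaN")
    idx = np.arange(series.size)
    # np.interp clamps to the first/last valid value outside the valid range.
    return np.interp(idx, idx[valid], series[valid])


# One example, 5 time steps, 1 continuous feature with two gaps.
features = np.array([[[1.0], [np.nan], [3.0], [np.nan], [7.0]]])

filled = features.copy()
for i in range(filled.shape[0]):      # loop over examples
    for j in range(filled.shape[2]):  # loop over features
        filled[i, :, j] = interpolate_nans(filled[i, :, j])
```

Each gap is replaced by the value on the straight line between its nearest valid neighbors, so the array no longer contains NaNs when it reaches the scaling step.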
Usage
Call train_numpy() with a 3D numpy array (or list of 2D arrays for variable-length sequences) of features and an optional 2D array of attributes. On the first call, the model structure is automatically determined from the data. Use train_dataframe() when starting from a pandas DataFrame.
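The two accepted feature layouts can be built as follows. This is a sketch using random data only; the sizes are arbitrary, and the check that no sequence exceeds max_sequence_len is an assumption about what the configured model expects rather than something stated above.

```python
import numpy as np

rng = np.random.default_rng(0)
num_features = 2
max_sequence_len = 20

# Fixed-length input: one 3D array (examples, max_sequence_len, num_features).
fixed = rng.random((100, max_sequence_len, num_features))

# Variable-length input: a list of 2D arrays, each (sequence_len, num_features).
lengths = rng.integers(5, max_sequence_len + 1, size=100)
variable = [rng.random((int(n), num_features)) for n in lengths]

# Sanity checks worth running before handing either layout to train_numpy().
assert fixed.ndim == 3
assert all(seq.ndim == 2 and seq.shape[1] == num_features for seq in variable)
assert max(seq.shape[0] for seq in variable) <= max_sequence_len
```

Either fixed or variable would then be passed as the features argument.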
Code Reference
Source Location
- Repository: gretel-synthetics
- File: src/gretel_synthetics/timeseries_dgan/dgan.py
- Lines: 174-396 (train_numpy), 398-536 (train_dataframe)
- File: src/gretel_synthetics/timeseries_dgan/transformations.py
- Lines: 367-439 (create_outputs_from_data), 550-570 (transform_attributes), 612-709 (transform_features)
Signature
def train_numpy(
    self,
    features: Union[np.ndarray, list[np.ndarray]],
    feature_types: Optional[List[OutputType]] = None,
    attributes: Optional[np.ndarray] = None,
    attribute_types: Optional[List[OutputType]] = None,
    progress_callback: Optional[Callable[[ProgressInfo], None]] = None,
) -> None:

def train_dataframe(
    self,
    df: pd.DataFrame,
    attribute_columns: Optional[List[str]] = None,
    feature_columns: Optional[List[str]] = None,
    example_id_column: Optional[str] = None,
    time_column: Optional[str] = None,
    discrete_columns: Optional[List[str]] = None,
    df_style: DfStyle = DfStyle.WIDE,
    progress_callback: Optional[Callable[[ProgressInfo], None]] = None,
) -> None:
Import
from gretel_synthetics.timeseries_dgan.dgan import DGAN
from gretel_synthetics.timeseries_dgan.config import DGANConfig, OutputType
I/O Contract
Inputs (train_numpy)
| Name | Type | Required | Description |
|---|---|---|---|
| features | np.ndarray or list[np.ndarray] | Yes | 3D array of shape (examples, max_sequence_len, num_features) or list of 2D arrays for variable-length sequences |
| feature_types | Optional[List[OutputType]] | No | OutputType.CONTINUOUS or OutputType.DISCRETE per feature; auto-detected if None |
| attributes | Optional[np.ndarray] | No | 2D array of shape (examples, num_attributes); None if no attributes |
| attribute_types | Optional[List[OutputType]] | No | OutputType per attribute; auto-detected if None |
| progress_callback | Optional[Callable[[ProgressInfo], None]] | No | Callback invoked after each training batch with progress information |
Inputs (train_dataframe)
| Name | Type | Required | Description |
|---|---|---|---|
| df | pd.DataFrame | Yes | Training data in wide or long format |
| attribute_columns | Optional[List[str]] | No | Column names for attributes; must be disjoint from feature_columns |
| feature_columns | Optional[List[str]] | No | Column names for features; defaults to all non-attribute columns |
| example_id_column | Optional[str] | No | Column to split long-format data into examples |
| time_column | Optional[str] | No | Column used to sort long-format data by time |
| discrete_columns | Optional[List[str]] | No | Columns to treat as discrete (one-hot or binary encoded) |
| df_style | DfStyle | No (default WIDE) | Format of the DataFrame: DfStyle.WIDE or DfStyle.LONG |
| progress_callback | Optional[Callable[[ProgressInfo], None]] | No | Callback invoked after each training batch |
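To make the long-format inputs concrete, the sketch below reshapes a toy long-format DataFrame into the (examples, sequence_len, num_features) layout that train_numpy() consumes. It illustrates the idea only and is not the library's _DataFrameConverter; the column names and values are made up, and every example is assumed to have the same number of time steps.

```python
import numpy as np
import pandas as pd

# Toy long-format data: 2 examples ("stock_id"), 3 time steps each,
# with rows deliberately out of time order for example "A".
df = pd.DataFrame(
    {
        "stock_id": ["A", "A", "A", "B", "B", "B"],
        "date": [3, 1, 2, 1, 2, 3],
        "sector": ["tech", "tech", "tech", "energy", "energy", "energy"],
        "open": [12.0, 10.0, 11.0, 20.0, 21.0, 22.0],
        "close": [12.5, 10.5, 11.5, 20.5, 21.5, 22.5],
    }
)

feature_columns = ["open", "close"]
attribute_columns = ["sector"]

# Sort by time within each example, then stack one 2D block per example.
ordered = df.sort_values(["stock_id", "date"])
groups = ordered.groupby("stock_id", sort=True)
features = np.stack([g[feature_columns].to_numpy() for _, g in groups])

# Attributes are constant per example, so take the first row of each group.
attributes = groups[attribute_columns].first().to_numpy()
```

Here example_id_column plays the role of the groupby key and time_column the role of the sort key, which is why both matter for long-format data.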
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | None | Both methods return None; the DGAN model is trained in-place |
Usage Examples
Basic Example
import numpy as np
from gretel_synthetics.timeseries_dgan.dgan import DGAN
from gretel_synthetics.timeseries_dgan.config import DGANConfig

# 10,000 examples, each with 3 attributes and 20 time steps of 2 features
attributes = np.random.rand(10000, 3)
features = np.random.rand(10000, 20, 2)

config = DGANConfig(
    max_sequence_len=20,
    sample_len=5,
    batch_size=1000,
    epochs=10,
)
model = DGAN(config)

# Types are not given, so they are auto-detected from the data
model.train_numpy(attributes=attributes, features=features)
DataFrame Example
import pandas as pd
from gretel_synthetics.timeseries_dgan.dgan import DGAN
from gretel_synthetics.timeseries_dgan.config import DGANConfig, DfStyle

config = DGANConfig(max_sequence_len=20, sample_len=5, epochs=10)
model = DGAN(config)

# df is assumed to be an existing long-format pd.DataFrame containing the
# columns named below; it is not defined in this snippet.
model.train_dataframe(
    df=df,
    attribute_columns=["sector", "country"],
    feature_columns=["open", "high", "low", "close"],
    example_id_column="stock_id",
    time_column="date",
    discrete_columns=["sector", "country"],
    df_style=DfStyle.LONG,
)
Explicit Type Annotations
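from gretel_synthetics.timeseries_dgan.config import OutputType

# Reuses model, features, and attributes from the Basic Example. The types
# are illustrative: columns declared DISCRETE should hold categorical values.
model.train_numpy(
    features=features,
    feature_types=[OutputType.CONTINUOUS, OutputType.DISCRETE],
    attributes=attributes,
    attribute_types=[OutputType.DISCRETE, OutputType.CONTINUOUS, OutputType.CONTINUOUS],
)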
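The call above declares one continuous and one discrete feature, plus one discrete and two continuous attributes. Arrays that actually match those declarations could be generated as in this sketch; the category counts and sizes are arbitrary, and it assumes types are matched to columns by position.

```python
import numpy as np

rng = np.random.default_rng(42)
n_examples, seq_len = 1000, 20

# Feature 0 is continuous; feature 1 is discrete with 3 categories (0, 1, 2).
continuous_feature = rng.normal(size=(n_examples, seq_len))
discrete_feature = rng.integers(0, 3, size=(n_examples, seq_len))
features = np.stack([continuous_feature, discrete_feature], axis=2)

# Attribute 0 is discrete with 2 categories; attributes 1 and 2 are continuous.
attributes = np.column_stack(
    [
        rng.integers(0, 2, size=n_examples),
        rng.random(n_examples),
        rng.random(n_examples),
    ]
)
```

Stacking integer and float columns yields float arrays, so the discrete columns hold the category codes as floats.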