Heuristic:Gretelai Gretel synthetics Batch Size Divisibility Constraints
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Tabular_Data, Time_Series |
| Last Updated | 2026-02-14 19:00 GMT |
Overview
Hard divisibility constraints for ACTGAN (`batch_size` must be divisible by both 2 and `pac`) and DGAN (`max_sequence_len` must be divisible by `sample_len`). Violating either one raises an exception instead of degrading gracefully.
Description
Both the ACTGAN and DGAN models impose strict divisibility requirements on their batch and sequence parameters. For ACTGAN, `batch_size` must be even (divisible by 2) and also divisible by the `pac` parameter (default 10), which controls the PacGAN grouping size. For DGAN, `max_sequence_len` must be exactly divisible by `sample_len`, because each LSTM cell generates `sample_len` time steps and the full sequence is composed of `max_sequence_len / sample_len` such cells. Violating the ACTGAN constraints raises a `ValueError`; violating the DGAN constraint raises a `ParameterError`.
Usage
Apply this heuristic whenever configuring `batch_size` (and `pac`) for ACTGAN, or `max_sequence_len` and `sample_len` for DGAN. These are hard constraints checked at initialization time, not soft recommendations.
The Insight (Rule of Thumb)
- ACTGAN Action: Set `batch_size` to a value divisible by both 2 and `pac`. With the default `pac=10`, any multiple of 10 satisfies both checks, for example 500 (the default), 100, 200, or 1000.
- DGAN Action: Set `max_sequence_len` to an exact multiple of `sample_len`. Example: `max_sequence_len=24`, `sample_len=4` (24 / 4 = 6 LSTM cells).
- Trade-off: These are hard constraints with no workaround. `batch_size / pac` is the number of packed groups per batch, so the relationship between the two affects PacGAN's ability to detect mode collapse. `max_sequence_len / sample_len` determines the number of LSTM generation steps.
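The two rules above can be checked before any model is constructed. The following is a minimal sketch in plain Python; the helper names `nearest_valid_batch_size` and `valid_sample_lens` are hypothetical and are not part of gretel-synthetics. It assumes only the divisibility rules stated in this page.

```python
import math
from typing import List


def nearest_valid_batch_size(requested: int, pac: int = 10) -> int:
    """Round `requested` down to a value divisible by both 2 and `pac`.

    Hypothetical helper, not a library function. The smallest valid step is
    lcm(2, pac); if `requested` is below that step, the step itself is returned.
    """
    if requested <= 0 or pac <= 0:
        raise ValueError("requested and pac must be positive integers")
    step = 2 * pac // math.gcd(2, pac)  # least common multiple of 2 and pac
    return max((requested // step) * step, step)


def valid_sample_lens(max_sequence_len: int) -> List[int]:
    """List every sample_len that divides max_sequence_len exactly."""
    if max_sequence_len <= 0:
        raise ValueError("max_sequence_len must be a positive integer")
    return [n for n in range(1, max_sequence_len + 1) if max_sequence_len % n == 0]


print(nearest_valid_batch_size(505))         # 500
print(nearest_valid_batch_size(100, pac=3))  # 96, since lcm(2, 3) = 6
print(valid_sample_lens(24))                 # [1, 2, 3, 4, 6, 8, 12, 24]
```

Rounding down rather than up is an arbitrary choice in this sketch; either direction yields a value that passes both checks.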
Reasoning
ACTGAN PacGAN: The PacGAN technique groups `pac` samples together before feeding them to the discriminator. This requires the batch to be evenly divisible by `pac` so each group is complete. The batch must also be even for the generator/discriminator split in adversarial training.
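To illustrate why an incomplete group is a problem, here is a plain-Python sketch of the packing idea: every `pac` consecutive rows are concatenated into one wider row before being scored. The function `pack_rows` and the toy data are assumptions made for illustration; the library performs the equivalent step on tensors, and this is not its code.

```python
from typing import List


def pack_rows(batch: List[List[float]], pac: int) -> List[List[float]]:
    """Concatenate every `pac` consecutive rows into one packed row.

    Illustrative only: raises if the last group would be incomplete,
    which mirrors the divisibility requirement described above.
    """
    if len(batch) % pac != 0:
        raise ValueError("batch length must be divisible by pac")
    packed = []
    for start in range(0, len(batch), pac):
        group = batch[start:start + pac]
        # Flatten the group so the discriminator would see pac samples at once.
        packed.append([value for row in group for value in row])
    return packed


# Toy batch: 20 rows with 2 features each.
batch = [[float(i), float(i) + 0.5] for i in range(20)]
packed = pack_rows(batch, pac=10)
print(len(packed), len(packed[0]))  # 2 20
```

With 20 rows and `pac=10`, the discriminator input in this sketch has 2 rows of width 10 * 2 = 20. A batch of 15 rows would leave a half-filled second group, which is why the check rejects it.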
DGAN LSTM: The generator uses an LSTM that produces `sample_len` time steps per cell. The total sequence length must be composed of an integer number of these cells, hence the divisibility requirement. A `sample_len` that does not divide `max_sequence_len` would result in incomplete or truncated generation at the sequence boundary.
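The arithmetic can be sketched by splitting a sequence into the chunks that each generation step would cover. The function `split_into_steps` is a hypothetical illustration using plain lists, not the DGAN generator itself.

```python
from typing import List


def split_into_steps(sequence: List[int], sample_len: int) -> List[List[int]]:
    """Split a sequence into consecutive chunks of `sample_len` time steps.

    Illustrative only: each chunk stands in for the output of one LSTM step.
    """
    if sample_len <= 0:
        raise ValueError("sample_len must be a positive integer")
    if len(sequence) % sample_len != 0:
        raise ValueError("sequence length must be divisible by sample_len")
    return [sequence[i:i + sample_len] for i in range(0, len(sequence), sample_len)]


sequence = list(range(24))                # stands in for max_sequence_len=24
steps = split_into_steps(sequence, sample_len=4)
print(len(steps))                         # 6
print(steps[-1])                          # [20, 21, 22, 23]
print(len(sequence) % 5)                  # 4 time steps left over if sample_len were 5
```

With `sample_len=5`, four full chunks cover 20 time steps and the remaining 4 cannot form a complete chunk, which is the situation the divisibility check prevents.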
Code Evidence
ACTGAN batch_size validation from `actgan/actgan.py:280-284`:
if batch_size % 2 != 0:
    raise ValueError("`batch_size` must be divisible by 2")
if batch_size % pac != 0:
    raise ValueError("`batch_size` must be divisible by `pac` (defaults to 10)")
DGAN sequence_len validation from `timeseries_dgan/dgan.py:160-163`:
if config.max_sequence_len % config.sample_len != 0:
    raise ParameterError(
        f"max_sequence_len={config.max_sequence_len} must be divisible by sample_len={config.sample_len}"
    )