Heuristic:Junyanz Pytorch CycleGAN and pix2pix Adam Beta1 Half
| Knowledge Sources | |
|---|---|
| Domains | GAN_Training, Optimization |
| Last Updated | 2026-02-09 16:00 GMT |
Overview
Use the Adam optimizer with `beta1=0.5` instead of the usual default of 0.9 for more stable GAN training dynamics.
Description
This codebase sets Adam's beta1 (the decay rate of the first-moment, or momentum, estimate) to 0.5 by default, rather than the 0.9 default of `torch.optim.Adam`. The lower value is inherited from the DCGAN training recipe and has become standard practice in GAN training. A lower beta1 reduces the influence of historical gradient information, so the optimizer responds more quickly to recent gradients. This helps in the adversarial minimax game, where the useful update direction changes rapidly as the generator and discriminator co-evolve. The default learning rate is 0.0002.
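To make the "influence of historical gradient information" concrete: Adam's first-moment estimate is an exponential moving average, so (ignoring bias correction) the gradient from k steps ago carries weight `(1 - beta1) * beta1**k`. The sketch below is illustrative only. The helper name `ema_weights` is hypothetical and not part of the codebase.

```python
def ema_weights(beta1, n):
    """Weights the first-moment EMA assigns to the n most recent gradients.

    Index 0 is the current gradient, index k the gradient from k steps ago.
    Bias correction is ignored, so this is the steady-state picture.
    """
    return [(1.0 - beta1) * beta1 ** k for k in range(n)]


for beta1 in (0.5, 0.9):
    # Share of the moving average held by the two newest gradients.
    recent = sum(ema_weights(beta1, 2))
    print(f"beta1={beta1}: newest two gradients carry {recent:.0%} of the weight")
```

With `beta1=0.5` the two most recent gradients account for 75% of the average, against 19% with `beta1=0.9`.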
Usage
This heuristic is applied by default. Keep `--beta1 0.5` and `--lr 0.0002` unless you have a specific reason to change them. These values have been empirically validated across the CycleGAN and pix2pix experiments.
The Insight (Rule of Thumb)
- Action: Set Adam `beta1 = 0.5` and `lr = 0.0002` for both generator and discriminator optimizers.
- Value: `--beta1 0.5`, `--lr 0.0002`, `beta2 = 0.999` (hard-coded in the optimizer construction, equal to the PyTorch default, and not configurable via CLI).
- Trade-off: Lower beta1 makes training less smooth but more responsive to adversarial dynamics. Standard beta1=0.9 can cause instability in GAN training.
Reasoning
In GAN training, the loss landscape changes rapidly because both networks are being updated simultaneously. A high momentum (beta1=0.9) causes the optimizer to "remember" stale gradient directions that may no longer be relevant, potentially causing oscillation or divergence. A lower beta1=0.5 gives more weight to current gradients, helping both networks adapt quickly to each other's updates. This setting was popularized by the DCGAN paper (Radford et al., 2015) and has become standard practice for GAN training.
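The "stale gradient direction" argument can be illustrated with a toy simulation, assuming a scalar gradient that stays at +1 for a while and then abruptly flips to -1 (a crude stand-in for the opposing network changing the loss landscape). The function name `steps_until_sign_flip` and the warm-up length are hypothetical choices for this sketch. It tracks only Adam's first-moment average, not the full update rule.

```python
def steps_until_sign_flip(beta1, warmup=20, max_steps=100):
    """Count steps until the first-moment average follows a reversed gradient.

    The gradient is +1 during warm-up and -1 afterwards. Returns the number
    of post-reversal steps before the moving average becomes negative, or
    None if that does not happen within max_steps.
    """
    m = 0.0
    for _ in range(warmup):
        m = beta1 * m + (1.0 - beta1) * 1.0  # gradient = +1
    for step in range(1, max_steps + 1):
        m = beta1 * m + (1.0 - beta1) * (-1.0)  # gradient = -1
        if m < 0.0:
            return step
    return None


for beta1 in (0.5, 0.9):
    print(f"beta1={beta1}: average changes sign after "
          f"{steps_until_sign_flip(beta1)} step(s)")
```

In this toy setup the average with `beta1=0.5` turns around on the first step after the reversal, while `beta1=0.9` keeps pushing in the old direction for several more steps. Real GAN gradients are noisy and high-dimensional, so this shows the mechanism only and does not predict training behaviour.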
Code evidence from `options/train_options.py:27-28`:
```python
parser.add_argument('--beta1', type=float, default=0.5,
                    help='momentum term of adam')
parser.add_argument('--lr', type=float, default=0.0002,
                    help='initial learning rate for adam')
```
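As a rough standalone check of how these two flags resolve, the sketch below reproduces only the two arguments quoted above in a bare `argparse` parser. The `build_parser` helper is hypothetical. The project's real options class defines many more arguments and is not used here.

```python
import argparse


def build_parser():
    """Minimal parser with only the two Adam-related flags (illustrative)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--beta1', type=float, default=0.5,
                        help='momentum term of adam')
    parser.add_argument('--lr', type=float, default=0.0002,
                        help='initial learning rate for adam')
    return parser


# No flags given: the heuristic's defaults apply.
defaults = build_parser().parse_args([])
# Explicit override, e.g. to compare against the usual Adam default.
override = build_parser().parse_args(['--beta1', '0.9'])

# beta2 is not a flag; 0.999 is paired with beta1 when the optimizer is built.
betas = (defaults.beta1, 0.999)
print(defaults.beta1, defaults.lr, override.beta1, betas)
```

Leaving both flags off the command line is therefore enough to apply the heuristic. Passing `--beta1` is only needed when deliberately deviating from it.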
Optimizer construction from `models/cycle_gan_model.py:96-97`:
```python
self.optimizer_G = torch.optim.Adam(
    itertools.chain(self.netG_A.parameters(), self.netG_B.parameters()),
    lr=opt.lr, betas=(opt.beta1, 0.999))
self.optimizer_D = torch.optim.Adam(
    itertools.chain(self.netD_A.parameters(), self.netD_B.parameters()),
    lr=opt.lr, betas=(opt.beta1, 0.999))
```