Heuristic:Junyanz Pytorch CycleGAN and pix2pix Instance Norm for Multi GPU
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Training, Optimization |
| Last Updated | 2026-02-09 16:00 GMT |
Overview
Use instance normalization or synchronized batch normalization instead of standard batch normalization for multi-GPU DDP training to avoid inconsistent statistics.
Description
Standard batch normalization (`nn.BatchNorm2d`) computes mean and variance over the local mini-batch on each GPU independently. In multi-GPU DDP training, this means each GPU sees different batch statistics, leading to inconsistent normalization and degraded training quality. The solution is to either use instance normalization (`--norm instance`), which normalizes per-image and is unaffected by batch distribution, or synchronized batch normalization (`--norm syncbatch`), which aggregates statistics across all GPUs via `nn.SyncBatchNorm`.
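The effect can be reproduced on a single machine. The following is a minimal sketch, assuming a CPU-only PyTorch install and made-up tensor shapes; no DDP processes are launched, and `chunk` merely imitates the per-GPU split of a batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical global batch: 4 images, 3 channels, 8x8 pixels.
x = torch.randn(4, 3, 8, 8)

# Layers that rely on current-batch statistics only (no learned affine,
# no running averages), so outputs depend purely on how the batch is grouped.
batch_norm = nn.BatchNorm2d(3, affine=False, track_running_stats=False)
instance_norm = nn.InstanceNorm2d(3)

# "Single GPU": normalize the whole batch at once.
bn_full = batch_norm(x)
in_full = instance_norm(x)

# "4 GPUs": each simulated rank only sees its own shard of 1 image.
shards = x.chunk(4, dim=0)
bn_sharded = torch.cat([batch_norm(s) for s in shards], dim=0)
in_sharded = torch.cat([instance_norm(s) for s in shards], dim=0)

bn_changed = not torch.allclose(bn_full, bn_sharded, atol=1e-5)
in_unchanged = torch.allclose(in_full, in_sharded, atol=1e-5)
print("BatchNorm output depends on the split:", bn_changed)
print("InstanceNorm output is the same either way:", in_unchanged)
```

Batch normalization gives different activations once the batch is split, while instance normalization does not, because it never looks beyond a single image.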
Usage
Apply this heuristic whenever training with multiple GPUs. For CycleGAN, instance normalization is already the default (`--norm instance`). For pix2pix (which defaults to `--norm batch`), switch to `--norm syncbatch` or `--norm instance` when using DDP.
The Insight (Rule of Thumb)
- Action: Set `--norm instance` or `--norm syncbatch` for multi-GPU training.
- Value: `--norm instance` (CycleGAN default), `--norm syncbatch` (synchronized alternative).
- Trade-off: Instance normalization ignores inter-image statistics, which may matter for some tasks. SyncBatchNorm adds communication overhead between GPUs but keeps statistics computed over the full global batch.
- Incompatibility: The README states that `--norm batch` is not compatible with DDP. Each process would normalize with its own local statistics, which are never shared across GPUs.
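The rule of thumb above could be enforced with a small guard before the networks are built. This is a sketch only: `check_norm_for_ddp` is a hypothetical helper that does not exist in the repository, and `world_size` is assumed to be the number of DDP processes.

```python
def check_norm_for_ddp(norm: str, world_size: int) -> str:
    """Return `norm` unchanged, or raise if it is plain batch norm under DDP.

    Hypothetical helper for illustration; not part of the repository.
    """
    if world_size > 1 and norm == "batch":
        raise ValueError(
            "--norm batch uses per-GPU statistics under DDP; "
            "use --norm instance or --norm syncbatch instead"
        )
    return norm


# Single-process training may keep plain batch normalization.
print(check_norm_for_ddp("batch", world_size=1))
# Multi-GPU training passes with either recommended option.
print(check_norm_for_ddp("instance", world_size=4))
print(check_norm_for_ddp("syncbatch", world_size=4))
```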
Reasoning
In DDP, each process runs on its own GPU and sees only its share of the global batch. Standard BatchNorm computes statistics from that local share alone, so a global batch of 4 spread across 4 GPUs leaves each GPU normalizing over a single image, which makes the batch statistics extremely noisy. SyncBatchNorm solves this by communicating statistics across GPUs, while InstanceNorm sidesteps the issue entirely by normalizing per instance.
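The arithmetic behind this can be worked through in plain Python. The sketch below is conceptual and uses made-up numbers: each image is reduced to one scalar per channel so the sums stay visible (real feature maps also average over spatial positions), and the summation across ranks only imitates an all-reduce. It is not a description of how `nn.SyncBatchNorm` is implemented internally.

```python
import math
from statistics import fmean, pvariance

# Hypothetical activations for one channel: a global batch of 4 values,
# one per GPU (local batch size 1).
global_batch = [0.5, 2.0, -1.0, 3.5]
shards = [[v] for v in global_batch]

# What plain BatchNorm sees on each rank: the local mean is the sample
# itself and the local variance is zero.
local_stats = [(fmean(s), pvariance(s)) for s in shards]

# What a synchronized layer can recover: every rank contributes its
# (count, sum, sum of squares); adding them up imitates an all-reduce.
count = sum(len(s) for s in shards)
total = sum(sum(s) for s in shards)
total_sq = sum(sum(v * v for v in s) for s in shards)
sync_mean = total / count
sync_var = total_sq / count - sync_mean ** 2

# The combined statistics match those of the undivided batch.
matches_global = math.isclose(sync_mean, fmean(global_batch)) and math.isclose(
    sync_var, pvariance(global_batch)
)
print("per-rank (mean, var):", local_stats)
print("synchronized mean/var:", sync_mean, sync_var)
print("matches full-batch statistics:", matches_global)
```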
From `README.md`:
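"To train a model on multiple GPUs, please use torchrun --nproc_per_node=4 train.py ... We also need to use synchronized batchnorm by setting --norm sync_batch (or --norm sync_instance for instance normalization). The --norm batch is not compatible with DDP."
Note that the flag spellings in this quote (`sync_batch`, `sync_instance`) differ from the `syncbatch` value matched in the `models/networks.py` excerpt on this page; check the option parser in your checkout for the spelling it accepts.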
From `docs/qa.md`:
"We also recommend that you use the instance normalization for multi-GPU training by setting --norm instance. The current batch normalization might not work for multi-GPUs as the batchnorm parameters are not shared across different GPUs."
Code evidence for SyncBatchNorm support from `models/networks.py:29-30`:

    elif norm_type == "syncbatch":
        norm_layer = functools.partial(
            nn.SyncBatchNorm, affine=True, track_running_stats=True
        )
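For context, the branch above could sit inside a factory along the following lines. This is an illustrative sketch, not the repository's code: `make_norm_layer` is a hypothetical name, and the arguments in the `batch` and `instance` branches are assumptions. Only the `syncbatch` branch mirrors the excerpt.

```python
import functools

import torch.nn as nn


def make_norm_layer(norm_type: str = "instance"):
    """Map a --norm style string to a layer constructor (illustrative sketch)."""
    if norm_type == "batch":
        # Assumed arguments; per-GPU statistics, so unsuitable for DDP.
        return functools.partial(
            nn.BatchNorm2d, affine=True, track_running_stats=True
        )
    if norm_type == "syncbatch":
        # Mirrors the excerpt: statistics are synchronized across processes.
        return functools.partial(
            nn.SyncBatchNorm, affine=True, track_running_stats=True
        )
    if norm_type == "instance":
        # Assumed arguments; per-image statistics, independent of the batch.
        return functools.partial(
            nn.InstanceNorm2d, affine=False, track_running_stats=False
        )
    raise NotImplementedError(f"normalization layer [{norm_type}] is not found")


# The factory returns a constructor; the channel count is supplied later.
sync_layer = make_norm_layer("syncbatch")(64)
inst_layer = make_norm_layer("instance")(64)
print(type(sync_layer).__name__, type(inst_layer).__name__)
```

Constructing `nn.SyncBatchNorm` works without a process group. Its synchronized forward pass during training is what requires an initialized distributed setup on GPUs.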
Related Pages
- Implementation:Junyanz_Pytorch_CycleGAN_and_pix2pix_Define_G_and_D
- Implementation:Junyanz_Pytorch_CycleGAN_and_pix2pix_CycleGANModel_Optimize_Parameters
- Implementation:Junyanz_Pytorch_CycleGAN_and_pix2pix_Pix2PixModel_Optimize_Parameters
- Principle:Junyanz_Pytorch_CycleGAN_and_pix2pix_GAN_Network_Architecture