Heuristic:Junyanz Pytorch CycleGAN and pix2pix Instance Norm for Multi GPU




Knowledge Sources
Domains Distributed_Training, Optimization
Last Updated 2026-02-09 16:00 GMT

Overview

For multi-GPU DistributedDataParallel (DDP) training, use instance normalization or synchronized batch normalization instead of standard batch normalization, whose statistics are computed separately on each GPU.

Description

Standard batch normalization (`nn.BatchNorm2d`) computes mean and variance over the local mini-batch on each GPU independently. In multi-GPU DDP training, each GPU therefore normalizes with statistics from its own shard of the batch, so the same image can be normalized differently depending on which other images share its GPU, and small per-GPU batches make those estimates noisy. This can degrade training quality. The solution is to use either instance normalization (`--norm instance`), which normalizes each image on its own and is unaffected by how the batch is distributed, or synchronized batch normalization (`--norm syncbatch`), which aggregates statistics across all GPUs via `nn.SyncBatchNorm`.
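The dependence on batch composition can be checked on a single CPU without any distributed setup. The following is a minimal sketch, assuming PyTorch is installed; the tensor shapes, scale factors, and variable names are illustrative and not taken from the repository. It normalizes the same image once together with three other images and once alone, as if it were the only image on its GPU.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Four images (N, C, H, W) with deliberately different intensity scales.
scales = torch.tensor([1.0, 2.0, 4.0, 8.0]).view(4, 1, 1, 1)
images = torch.randn(4, 3, 8, 8) * scales

# No learnable parameters and no running averages, so each output depends
# only on the statistics of the input passed in that call.
batch_norm = nn.BatchNorm2d(3, affine=False, track_running_stats=False)
instance_norm = nn.InstanceNorm2d(3, affine=False, track_running_stats=False)

# Image 0 normalized as part of the full batch vs. on its own.
bn_with_batch = batch_norm(images)[0:1]
bn_alone = batch_norm(images[0:1])
in_with_batch = instance_norm(images)[0:1]
in_alone = instance_norm(images[0:1])

batch_norm_depends_on_batch = not torch.allclose(bn_with_batch, bn_alone, atol=1e-4)
instance_norm_depends_on_batch = not torch.allclose(in_with_batch, in_alone, atol=1e-4)
print(batch_norm_depends_on_batch, instance_norm_depends_on_batch)
```

In this sketch the batch-normalized result for image 0 changes with its batch mates, while the instance-normalized result does not, which is the property that makes instance normalization insensitive to how a batch is sharded.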

Usage

Apply this heuristic whenever training with multiple GPUs. For CycleGAN, instance normalization is already the default (`--norm instance`). For pix2pix (which defaults to `--norm batch`), switch to `--norm syncbatch` or `--norm instance` when using DDP.
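The selection rule above can be written down as a small helper. This is only a sketch of the rule of thumb: `suggest_norm` and its parameters are hypothetical names introduced here, not part of the repository, and the value `syncbatch` follows the spelling used on this page.

```python
def suggest_norm(current_norm: str, num_gpus: int, prefer_sync: bool = True) -> str:
    """Return a --norm value following the multi-GPU rule of thumb.

    Hypothetical helper: only plain "batch" is replaced, and only when
    more than one GPU is used. Everything else is returned unchanged.
    """
    if num_gpus <= 1 or current_norm != "batch":
        return current_norm
    return "syncbatch" if prefer_sync else "instance"


# pix2pix-style default (batch) on 4 GPUs vs. CycleGAN-style default (instance).
pix2pix_choice = suggest_norm("batch", num_gpus=4)
cyclegan_choice = suggest_norm("instance", num_gpus=4)
print(pix2pix_choice, cyclegan_choice)
```

Whether to prefer `syncbatch` or `instance` for a model that was tuned with batch normalization is a judgment call; the trade-offs are listed in the next section.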

The Insight (Rule of Thumb)

  • Action: Set `--norm instance` or `--norm syncbatch` for multi-GPU training.
  • Value: `--norm instance` (CycleGAN default), `--norm syncbatch` (synchronized alternative).
  • Trade-off: Instance normalization ignores inter-image statistics, which may matter for some tasks. SyncBatchNorm adds communication overhead between GPUs but keeps the statistics of the full batch.
  • Incompatibility: The README states that `--norm batch` is not compatible with DDP, because batch statistics are not shared across GPUs.

Reasoning

In DDP, each process runs on its own GPU with a portion of the batch. Standard BatchNorm computes statistics only from that local portion: if a global batch of 4 is spread over 4 GPUs, each GPU normalizes with statistics from a single image, which is a very noisy estimate of the batch statistics. SyncBatchNorm solves this by communicating statistics across GPUs, while InstanceNorm sidesteps the issue entirely by normalizing each instance on its own.
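The arithmetic behind this can be illustrated with plain numbers. The sketch below is conceptual and uses only the Python standard library; the shard values and function names are made up for illustration, and it does not reproduce how `nn.SyncBatchNorm` is implemented internally. It compares the statistics each shard would compute locally with the statistics obtained by pooling counts, sums, and sums of squares across shards.

```python
def mean_and_variance(values):
    """Biased (population) mean and variance of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def synchronized_stats(shards):
    """Pool count, sum, and sum of squares over all shards, then
    derive the mean and variance of the full batch from those totals."""
    count = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)
    total_sq = sum(sum(v * v for v in s) for s in shards)
    mean = total / count
    var = total_sq / count - mean ** 2
    return mean, var


# One activation value per image; a batch of 4 split across 2 workers.
shards = [[1.0, 2.0], [3.0, 10.0]]

local_stats = [mean_and_variance(s) for s in shards]
global_stats = mean_and_variance([v for s in shards for v in s])
synced_stats = synchronized_stats(shards)

print("local :", local_stats)
print("global:", global_stats)
print("synced:", synced_stats)
```

Each worker's local mean and variance differ from the full-batch values, while the pooled totals recover them, which is the idea behind synchronizing batch statistics.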

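From `README.md` (this excerpt spells the flag values `sync_batch` and `sync_instance`, whereas the code excerpt from `models/networks.py` on this page matches on `syncbatch`; check which values your checkout accepts):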

"To train a model on multiple GPUs, please use torchrun --nproc_per_node=4 train.py ... We also need to use synchronized batchnorm by setting --norm sync_batch (or --norm sync_instance for instance normalization). The --norm batch is not compatible with DDP."

From `docs/qa.md`:

"We also recommend that you use the instance normalization for multi-GPU training by setting --norm instance. The current batch normalization might not work for multi-GPUs as the batchnorm parameters are not shared across different GPUs."

Code evidence for SyncBatchNorm support from `models/networks.py:29-30`:

```python
elif norm_type == "syncbatch":
    norm_layer = functools.partial(
        nn.SyncBatchNorm, affine=True, track_running_stats=True
    )
```
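For context, the branch above can be placed in a complete dispatch function. The following is a minimal sketch, assuming PyTorch is installed: `make_norm_layer` is a hypothetical name, the `syncbatch` branch mirrors the excerpt, and the `batch` and `instance` branches and their arguments are assumptions added for illustration rather than code quoted from the repository.

```python
import functools

import torch.nn as nn


def make_norm_layer(norm_type: str = "instance"):
    """Return a constructor that builds a normalization layer for a
    given number of channels (hypothetical helper for illustration)."""
    if norm_type == "batch":
        # Per-GPU statistics; the case this page advises against under DDP.
        return functools.partial(nn.BatchNorm2d, affine=True, track_running_stats=True)
    if norm_type == "syncbatch":
        # Same arguments as the excerpt above.
        return functools.partial(nn.SyncBatchNorm, affine=True, track_running_stats=True)
    if norm_type == "instance":
        # Assumed settings: no learnable parameters, no running statistics.
        return functools.partial(nn.InstanceNorm2d, affine=False, track_running_stats=False)
    raise NotImplementedError(f"normalization layer [{norm_type}] is not supported")


# Build a layer for 64 feature channels; construction works without GPUs.
sync_layer = make_norm_layer("syncbatch")(64)
print(type(sync_layer).__name__, sync_layer.num_features)
```

Constructing the layer does not require a distributed setup, but `nn.SyncBatchNorm` only synchronizes statistics during training when a process group has been initialized and the inputs live on a supported accelerator, so this sketch stops at construction.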
