Principle:LaurentMazare Tch rs Group Normalization
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Normalization, Computer Vision |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Group normalization divides feature channels into groups and normalizes within each group independently, providing stable training behavior regardless of batch size.
Description
Group normalization (GN) is a normalization technique that partitions the channels of a feature map into groups and computes normalization statistics (mean and variance) within each group, independently for each sample. Unlike batch normalization, which computes statistics across the batch dimension, group normalization operates entirely within a single sample, so its output for a given sample does not depend on the batch size.
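The contrast with batch normalization can be made concrete with a small sketch. The code below is an illustration only, not the tch-rs API: the function names `batch_channel_means` and `sample_group_means` are hypothetical, and the input is assumed to be a flat slice in (sample, channel, spatial) order with `hw` spatial positions per channel and contiguous, equally sized channel groups.

```rust
/// Batch-norm style mean: one value per channel, averaged over every sample
/// in the batch and every spatial position. (Hypothetical helper.)
fn batch_channel_means(x: &[f32], n: usize, c: usize, hw: usize) -> Vec<f32> {
    assert_eq!(x.len(), n * c * hw, "x must hold n * c * hw values");
    assert!(n * hw > 0, "need at least one value per channel");
    let mut sums = vec![0.0f32; c];
    for sample in 0..n {
        for channel in 0..c {
            let start = (sample * c + channel) * hw;
            sums[channel] += x[start..start + hw].iter().sum::<f32>();
        }
    }
    sums.iter().map(|s| s / (n * hw) as f32).collect()
}

/// Group-norm style mean: one value per (sample, group), computed from a
/// single sample only. (Hypothetical helper.)
fn sample_group_means(x: &[f32], n: usize, c: usize, hw: usize, g: usize) -> Vec<f32> {
    assert!(g > 0 && c % g == 0, "c must be divisible by g");
    assert_eq!(x.len(), n * c * hw, "x must hold n * c * hw values");
    let m = (c / g) * hw; // values per (sample, group)
    assert!(m > 0, "each group must contain at least one value");
    // With contiguous groups, each (sample, group) is one contiguous chunk.
    x.chunks_exact(m)
        .map(|chunk| chunk.iter().sum::<f32>() / m as f32)
        .collect()
}
```

If the second sample in a batch of two is replaced with different values, the first entry of `sample_group_means` stays the same, while the entries of `batch_channel_means` change, because the batch statistic mixes both samples.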
The key insight is that feature channels in neural networks often form natural groupings. For example, in convolutional networks, different filter groups may respond to different visual features (edges, textures, colors). By normalizing within these groups, GN preserves the relative differences between groups while standardizing the distribution within each group.
Given a feature map with $C$ channels, GN divides them into $G$ groups of $C/G$ channels each. The mean and variance are computed over the spatial dimensions and the channels within each group. After normalization, learnable scale ($\gamma$) and shift ($\beta$) parameters (per channel) allow the network to recover expressive power.
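As a minimal sketch of the channel partition, the helpers below map a channel index to its group. The function names are hypothetical, and the code assumes that groups are contiguous and equally sized (so the channel count must be divisible by the group count); it does not use tch-rs.

```rust
/// Returns the group index of `channel`, assuming contiguous, equally sized
/// groups. (Hypothetical helper for illustration.)
fn group_of_channel(channel: usize, num_channels: usize, num_groups: usize) -> usize {
    assert!(
        num_groups > 0 && num_channels % num_groups == 0,
        "num_channels must be divisible by num_groups"
    );
    assert!(channel < num_channels, "channel index out of range");
    let channels_per_group = num_channels / num_groups;
    channel / channels_per_group
}

/// Lists the channel indices that fall into each group.
fn partition_channels(num_channels: usize, num_groups: usize) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = vec![Vec::new(); num_groups];
    for c in 0..num_channels {
        groups[group_of_channel(c, num_channels, num_groups)].push(c);
    }
    groups
}
```

For example, `partition_channels(6, 3)` places channels 0 and 1 in the first group, 2 and 3 in the second, and 4 and 5 in the third.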
Group normalization is particularly effective when:
- Small batch sizes are required due to memory constraints (e.g., high-resolution images, 3D volumes)
- Batch statistics are unreliable because the batch is too small to estimate population statistics
- The task involves detection or segmentation where large input sizes limit batch size
Usage
Apply group normalization when:
- Training with small batch sizes where batch normalization degrades
- Building detection or segmentation models with memory-intensive inputs
- Needing normalization that is consistent between training and inference (no running statistics)
- Working with tasks where batch composition varies (e.g., variable-length sequences)
Theoretical Basis
Normalization Computation
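For input features $x$ with shape $(N, C, H, W)$, divide the $C$ channels into $G$ groups. For each sample $n$ and group $g$:

$$\mu_{n,g} = \frac{1}{m} \sum_{(c,h,w) \in S_g} x_{n,c,h,w}$$

$$\sigma^2_{n,g} = \frac{1}{m} \sum_{(c,h,w) \in S_g} \left(x_{n,c,h,w} - \mu_{n,g}\right)^2$$

where $S_g$ is the set of indices belonging to group $g$, and $m = |S_g| = (C/G) \cdot H \cdot W$.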
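The per-sample, per-group statistics can be sketched in plain Rust as follows. This is an illustration under stated assumptions, not the tch-rs implementation: the function name `group_stats` is hypothetical, the tensor is assumed to be a flat row-major slice in (sample, channel, height, width) order, groups are assumed contiguous, and the variance is the biased (population) variance.

```rust
/// Computes the mean and biased variance of every (sample, group) pair.
/// Returns `(means, variances)`, each of length `n * g`, ordered by sample
/// and then by group. (Hypothetical helper for illustration.)
fn group_stats(
    x: &[f32],
    n: usize,
    c: usize,
    h: usize,
    w: usize,
    g: usize,
) -> (Vec<f32>, Vec<f32>) {
    assert!(g > 0 && c % g == 0, "c must be divisible by g");
    assert_eq!(x.len(), n * c * h * w, "x must hold n * c * h * w values");
    let m = (c / g) * h * w; // number of values in one group of one sample
    assert!(m > 0, "each group must contain at least one value");

    let mut means = Vec::with_capacity(n * g);
    let mut vars = Vec::with_capacity(n * g);
    // In this layout each (sample, group) pair is one contiguous chunk of m values.
    for chunk in x.chunks_exact(m) {
        let mean = chunk.iter().sum::<f32>() / m as f32;
        let var = chunk.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / m as f32;
        means.push(mean);
        vars.push(var);
    }
    (means, vars)
}
```

Because the loop never reads across sample boundaries, the statistics of one sample are unaffected by the rest of the batch.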
Affine Transform
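After normalization, per-channel learnable parameters restore representational capacity:

$$y_{n,c,h,w} = \gamma_c \cdot \frac{x_{n,c,h,w} - \mu_{n,g(c)}}{\sqrt{\sigma^2_{n,g(c)} + \epsilon}} + \beta_c$$

where $g(c)$ maps channel $c$ to its group, $\mu_{n,g(c)}$ and $\sigma^2_{n,g(c)}$ are that group's mean and variance for sample $n$, and $\epsilon$ is a small constant for numerical stability.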
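Putting normalization and the affine step together, a forward pass might look like the sketch below. It is a plain-Rust illustration rather than the tch-rs layer: the function name `group_norm` and its argument order are hypothetical, the input is assumed to be a flat row-major slice in (sample, channel, height, width) order with contiguous groups, and `gamma` and `beta` hold one value per channel.

```rust
/// Group normalization followed by a per-channel scale and shift.
/// (Hypothetical reference implementation for illustration.)
fn group_norm(
    x: &[f32],
    n: usize,
    c: usize,
    h: usize,
    w: usize,
    g: usize,
    gamma: &[f32],
    beta: &[f32],
    eps: f32,
) -> Vec<f32> {
    assert!(g > 0 && c % g == 0, "c must be divisible by g");
    assert_eq!(x.len(), n * c * h * w, "x must hold n * c * h * w values");
    assert_eq!(gamma.len(), c, "gamma needs one value per channel");
    assert_eq!(beta.len(), c, "beta needs one value per channel");
    let hw = h * w;
    let channels_per_group = c / g;
    let m = channels_per_group * hw;
    assert!(m > 0, "each group must contain at least one value");

    let mut y = vec![0.0f32; x.len()];
    for (chunk_idx, chunk) in x.chunks_exact(m).enumerate() {
        let group = chunk_idx % g; // chunks are ordered by sample, then group
        let mean = chunk.iter().sum::<f32>() / m as f32;
        let var = chunk.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / m as f32;
        let inv_std = 1.0 / (var + eps).sqrt();
        let base = chunk_idx * m;
        for (i, v) in chunk.iter().enumerate() {
            // Statistics are shared per group; scale and shift are per channel.
            let channel = group * channels_per_group + i / hw;
            y[base + i] = gamma[channel] * (v - mean) * inv_std + beta[channel];
        }
    }
    y
}
```

Setting every `gamma` entry to 1 and every `beta` entry to 0 leaves the normalized values unchanged, which is a common initialization for such parameters.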
Relationship to Other Normalizations
Group normalization unifies several normalization schemes as special cases:
- When $G = C$ (each channel is its own group), GN becomes instance normalization
- When $G = 1$ (all channels in one group), GN becomes layer normalization
- Batch normalization differs fundamentally by computing statistics across the batch dimension
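The special cases can be seen in a small sketch where the three schemes differ only in how many values share one set of statistics. All function names here are hypothetical, the affine step is omitted, and the input is assumed to be a flat slice in (sample, channel, spatial) order with `hw` spatial positions per channel and contiguous groups. Layer normalization is taken here to mean normalization over all channels and spatial positions of a sample.

```rust
/// Normalizes each contiguous chunk of `chunk_len` values to zero mean and
/// (approximately) unit variance. (Hypothetical helper for illustration.)
fn normalize_chunks(x: &[f32], chunk_len: usize, eps: f32) -> Vec<f32> {
    assert!(
        chunk_len > 0 && x.len() % chunk_len == 0,
        "x must split evenly into chunks"
    );
    let mut y = Vec::with_capacity(x.len());
    for chunk in x.chunks_exact(chunk_len) {
        let mean = chunk.iter().sum::<f32>() / chunk_len as f32;
        let var = chunk.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / chunk_len as f32;
        let inv_std = 1.0 / (var + eps).sqrt();
        y.extend(chunk.iter().map(|v| (v - mean) * inv_std));
    }
    y
}

/// Group normalization: statistics over (c / g) channels and all positions.
fn group_norm_plain(x: &[f32], c: usize, hw: usize, g: usize, eps: f32) -> Vec<f32> {
    assert!(g > 0 && c % g == 0, "c must be divisible by g");
    normalize_chunks(x, (c / g) * hw, eps)
}

/// Instance normalization: statistics over one channel of one sample.
fn instance_norm_plain(x: &[f32], hw: usize, eps: f32) -> Vec<f32> {
    normalize_chunks(x, hw, eps)
}

/// Layer normalization over all channels and positions of one sample.
fn layer_norm_plain(x: &[f32], c: usize, hw: usize, eps: f32) -> Vec<f32> {
    normalize_chunks(x, c * hw, eps)
}
```

With `g` equal to `c`, `group_norm_plain` uses the same chunk length as `instance_norm_plain`; with `g` equal to 1, it uses the same chunk length as `layer_norm_plain`. None of the three reads across sample boundaries, which is the point of difference from batch normalization.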