Principle:LaurentMazare Tch rs Group Normalization

Knowledge Sources	LaurentMazare_Tch_rs; Wu & He, 2018
Domains	Deep Learning, Normalization, Computer Vision
Last Updated	2026-02-08 00:00 GMT

Overview

Group normalization divides feature channels into groups and normalizes within each group independently, providing stable training behavior regardless of batch size.

Description

Group normalization (GN) is a normalization technique that partitions the channels of a feature map into groups and computes normalization statistics (mean and variance) within each group, independently for each sample. Unlike batch normalization, which computes statistics across the batch dimension, group normalization operates entirely within a single sample, so its output for a given sample does not depend on the batch size or on the other samples in the batch.
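The difference in reduction axes can be sketched in plain Rust. This is an illustrative, standard-library-only sketch, not the tch-rs implementation: the function names `batch_mean` and `group_mean` are hypothetical, and the input is assumed to be a flat row-major buffer of shape $(N, C, S)$ with $S = H \times W$.

```rust
/// Batch-norm style mean for one channel: reduces over every sample and
/// every spatial position. `x` is a flat row-major (N, C, S) buffer.
/// (Hypothetical helper for illustration only.)
fn batch_mean(x: &[f64], n: usize, c: usize, s: usize, channel: usize) -> f64 {
    let mut sum = 0.0;
    for sample in 0..n {
        let start = (sample * c + channel) * s;
        sum += x[start..start + s].iter().sum::<f64>();
    }
    sum / (n * s) as f64
}

/// Group-norm style mean for one (sample, group) pair: reduces over the
/// channels of that group and every spatial position, inside one sample.
/// Assumes `groups` divides `c`. (Hypothetical helper for illustration only.)
fn group_mean(x: &[f64], c: usize, s: usize, groups: usize, sample: usize, group: usize) -> f64 {
    let per_group = c / groups;
    // Channels of one group are contiguous in (N, C, S) layout.
    let start = (sample * c + group * per_group) * s;
    let len = per_group * s;
    x[start..start + len].iter().sum::<f64>() / len as f64
}
```

Appending a second sample to the buffer changes the value returned by `batch_mean` for a channel, while `group_mean` for sample 0 stays the same, which is the batch-size independence described above.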

The key insight is that feature channels in neural networks often form natural groupings. For example, in convolutional networks, different filter groups may respond to different visual features (edges, textures, colors). By normalizing within these groups, GN preserves the relative differences between groups while standardizing the distribution within each group.

Given a feature map with $C$ channels, GN divides them into $G$ groups of $C / G$ channels each, so $C$ must be divisible by $G$. The mean and variance are computed over the spatial dimensions and the channels within each group. After normalization, per-channel learnable scale ($γ$) and shift ($β$) parameters allow the network to recover expressive power.

Group normalization is particularly effective when:

Small batch sizes are required due to memory constraints (e.g., high-resolution images, 3D volumes)
Batch statistics are unreliable because the batch is too small to estimate population statistics
The task involves detection or segmentation where large input sizes limit batch size

Usage

Apply group normalization when:

Training with small batch sizes where batch normalization degrades
Building detection or segmentation models with memory-intensive inputs
Requiring normalization that behaves identically in training and inference (no running statistics)
Working with tasks where batch composition varies (e.g., variable-length sequences)

Theoretical Basis

Normalization Computation

For input features $x$ with shape $(N, C, H, W)$, divide the $C$ channels into $G$ groups. For each sample $n$ and group $g$:

$μ_{n, g} = \frac{1}{| S_{g} |} \sum_{(c, h, w) \in S_{g}} x_{n, c, h, w}$

$σ_{n, g}^{2} = \frac{1}{| S_{g} |} \sum_{(c, h, w) \in S_{g}} (x_{n, c, h, w} - μ_{n, g})^{2}$

where $S_{g} = \{(c, h, w) : ⌊ c / (C / G) ⌋ = g\}$ is the set of indices belonging to group $g$ (channels and groups are indexed from 0), and $| S_{g} | = (C / G) \times H \times W$.
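The two formulas above can be sketched in plain Rust. This is a minimal illustration under stated assumptions, not the tch-rs code: `group_stats` is a hypothetical name, the input is assumed to be a flat row-major `(N, C, H, W)` buffer, and the variance is the biased (population) variance, matching the $1 / | S_{g} |$ factor.

```rust
/// Returns (mean, variance) for every (sample, group) pair in sample-major
/// order. `x` is a flat row-major buffer of shape (N, C, H, W).
/// (Hypothetical helper for illustration only.)
fn group_stats(
    x: &[f32],
    n: usize,
    c: usize,
    h: usize,
    w: usize,
    groups: usize,
) -> Vec<(f32, f32)> {
    assert!(groups > 0 && c % groups == 0, "C must be divisible by G");
    assert_eq!(x.len(), n * c * h * w, "buffer does not match shape");
    let group_len = (c / groups) * h * w; // |S_g|
    assert!(group_len > 0, "each group must contain at least one element");
    // Channels of a group are contiguous in this layout, so every
    // (sample, group) index set S_g is exactly one chunk of the buffer.
    x.chunks(group_len)
        .map(|chunk| {
            let mean = chunk.iter().sum::<f32>() / group_len as f32;
            let var = chunk.iter().map(|v| (v - mean).powi(2)).sum::<f32>()
                / group_len as f32;
            (mean, var)
        })
        .collect()
}
```

For a single sample with $C = 4$, $H = 1$, $W = 2$, and $G = 2$, the function returns two pairs, one per group, each computed over $| S_{g} | = 4$ values.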

Affine Transform

After normalization, per-channel learnable parameters restore representational capacity:

${\hat{x}}_{n, c, h, w} = γ_{c} \cdot \frac{x_{n, c, h, w} - μ_{n, g (c)}}{\sqrt{σ_{n, g (c)}^{2} + ϵ}} + β_{c}$

where $g (c) = ⌊ c / (C / G) ⌋$ maps channel $c$ to its group.
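Putting the statistics and the affine transform together, a forward pass could look roughly like the following. This is a sketch under assumptions rather than the tch-rs implementation: the name `group_norm`, the flat row-major `(N, C, H, W)` layout, and the explicit `eps` argument are all choices made for this illustration.

```rust
/// Group normalization forward pass on a flat row-major (N, C, H, W) buffer.
/// `gamma` and `beta` hold one value per channel.
/// (Hypothetical standalone sketch; names and layout are assumptions.)
fn group_norm(
    x: &[f32],
    (n, c, h, w): (usize, usize, usize, usize),
    groups: usize,
    gamma: &[f32],
    beta: &[f32],
    eps: f32,
) -> Vec<f32> {
    assert!(groups > 0 && c % groups == 0, "C must be divisible by G");
    assert_eq!(x.len(), n * c * h * w, "buffer does not match shape");
    assert!(gamma.len() == c && beta.len() == c, "need one gamma/beta per channel");
    let spatial = h * w;
    let per_group = c / groups;
    let group_len = per_group * spatial; // |S_g|
    let mut out = Vec::with_capacity(x.len());
    for sample in 0..n {
        for g in 0..groups {
            // Channels of a group are contiguous, so S_g is one slice.
            let start = (sample * c + g * per_group) * spatial;
            let chunk = &x[start..start + group_len];
            let mean = chunk.iter().sum::<f32>() / group_len as f32;
            let var = chunk.iter().map(|v| (v - mean).powi(2)).sum::<f32>()
                / group_len as f32;
            let inv_std = 1.0 / (var + eps).sqrt();
            for (i, v) in chunk.iter().enumerate() {
                // Recover the channel index so gamma and beta stay per channel.
                let ch = g * per_group + i / spatial;
                out.push(gamma[ch] * (v - mean) * inv_std + beta[ch]);
            }
        }
    }
    out
}
```

Note that the statistics are shared by all channels in a group, while $γ_{c}$ and $β_{c}$ are looked up per channel, which is why the inner loop recovers the channel index.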

Relationship to Other Normalizations

Group normalization unifies several normalization schemes as special cases:

When $G = C$ (each channel is its own group), GN becomes instance normalization
When $G = 1$ (all channels in one group), GN becomes layer normalization
Batch normalization differs fundamentally by computing statistics across the batch dimension
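The two special cases follow directly from the channel-to-group map $g (c) = ⌊ c / (C / G) ⌋$. The short sketch below, using a hypothetical helper named `group_assignments`, lists the group index of every channel so the cases can be inspected side by side.

```rust
/// Returns g(c) = floor(c / (C / G)) for every channel c in 0..channels.
/// (Hypothetical helper for illustration only.)
fn group_assignments(channels: usize, groups: usize) -> Vec<usize> {
    assert!(
        groups > 0 && channels % groups == 0,
        "C must be divisible by G"
    );
    let per_group = channels / groups;
    (0..channels).map(|c| c / per_group).collect()
}
```

With $C = 6$, choosing $G = 6$ puts every channel in its own group (the instance normalization case), $G = 1$ puts all channels in group 0 (the layer normalization case), and $G = 3$ pairs adjacent channels.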

Related Pages

Implementation:LaurentMazare_Tch_rs_Group_Norm
