Principle:LaurentMazare Tch rs Group Normalization

From Leeroopedia


Knowledge Sources
Domains Deep Learning, Normalization, Computer Vision
Last Updated 2026-02-08 00:00 GMT

Overview

Group normalization divides feature channels into groups and normalizes within each group independently, providing stable training behavior regardless of batch size.

Description

Group normalization (GN) is a normalization technique that partitions the channels of a feature map into groups and computes normalization statistics (mean and variance) within each group, independently for each sample. Unlike batch normalization, which computes statistics across the batch dimension, group normalization operates entirely within a single sample, so its output does not depend on batch size.

The key insight is that feature channels in neural networks often form natural groupings. For example, in convolutional networks, different filter groups may respond to different visual features (edges, textures, colors). By normalizing within these groups, GN preserves the relative differences between groups while standardizing the distribution within each group.

Given a feature map with C channels, GN divides them into G groups of C/G channels each, so C must be divisible by G. The mean and variance are computed over the spatial dimensions and the channels within each group. After normalization, learnable per-channel scale (γ) and shift (β) parameters allow the network to recover expressive power.

Group normalization is particularly effective when:

  • Small batch sizes are required due to memory constraints (e.g., high-resolution images, 3D volumes)
  • Batch statistics are unreliable because the batch is too small to estimate population statistics
  • The task involves detection or segmentation where large input sizes limit batch size

Usage

Apply group normalization when:

  • Training with small batch sizes where batch normalization degrades
  • Building detection or segmentation models with memory-intensive inputs
  • Needing normalization that is consistent between training and inference (no running statistics)
  • Working with tasks where batch composition varies (e.g., variable-length sequences)

Theoretical Basis

Normalization Computation

For input features x with shape (N,C,H,W), divide C channels into G groups. For each sample n and group g:

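$$\mu_{n,g} = \frac{1}{|S_g|} \sum_{(c,h,w) \in S_g} x_{n,c,h,w}$$

$$\sigma_{n,g}^2 = \frac{1}{|S_g|} \sum_{(c,h,w) \in S_g} \left(x_{n,c,h,w} - \mu_{n,g}\right)^2$$

where $S_g = \{(c,h,w) : \lfloor c/(C/G) \rfloor = g\}$ is the set of indices belonging to group $g$, and $|S_g| = (C/G) \times H \times W$.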
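The statistics above can be sketched in plain Rust, as an illustration only. This sketch assumes a contiguous buffer in (N, C, H, W) order, in which case the elements of each (sample, group) pair form one contiguous chunk of length |S_g|. The function name `group_stats` and its signature are hypothetical and are not part of tch-rs.

```rust
/// Mean and (biased) variance for every (sample, group) pair.
/// Assumes `x` is a contiguous NCHW buffer; names here are illustrative.
fn group_stats(
    x: &[f32],
    n: usize,
    c: usize,
    h: usize,
    w: usize,
    groups: usize,
) -> Vec<(f32, f32)> {
    assert!(groups > 0 && c % groups == 0, "C must be divisible by G");
    assert_eq!(x.len(), n * c * h * w, "buffer length must equal N*C*H*W");
    let group_len = (c / groups) * h * w; // |S_g|
    assert!(group_len > 0, "each group must contain at least one element");

    // In NCHW order, chunk i holds sample i / G, group i % G.
    x.chunks_exact(group_len)
        .map(|chunk| {
            let mean = chunk.iter().sum::<f32>() / group_len as f32;
            let var = chunk
                .iter()
                .map(|v| (v - mean).powi(2))
                .sum::<f32>()
                / group_len as f32;
            (mean, var)
        })
        .collect()
}
```

The result holds N × G pairs, ordered by sample and then by group. The variance divides by |S_g| rather than |S_g| - 1, matching the definition above.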

Affine Transform

After normalization, per-channel learnable parameters restore representational capacity:

$$\hat{x}_{n,c,h,w} = \gamma_c \, \frac{x_{n,c,h,w} - \mu_{n,g(c)}}{\sqrt{\sigma_{n,g(c)}^2 + \epsilon}} + \beta_c$$

where $g(c) = \lfloor c/(C/G) \rfloor$ maps channel $c$ to its group.
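As a rough sketch of the full transform (normalization followed by the per-channel affine step), the following plain-Rust function works on a contiguous (N, C, H, W) buffer. The name `group_norm`, the tuple-based `dims` argument, and the slice-based parameters are assumptions made for illustration; this is not the tch-rs implementation.

```rust
/// Group normalization with per-channel scale and shift.
/// Assumes `x` is a contiguous NCHW buffer; all names are illustrative.
fn group_norm(
    x: &[f32],
    dims: (usize, usize, usize, usize), // (N, C, H, W)
    groups: usize,
    gamma: &[f32], // per-channel scale, length C
    beta: &[f32],  // per-channel shift, length C
    eps: f32,
) -> Vec<f32> {
    let (n, c, h, w) = dims;
    assert!(groups > 0 && c % groups == 0, "C must be divisible by G");
    assert_eq!(x.len(), n * c * h * w, "buffer length must equal N*C*H*W");
    assert!(gamma.len() == c && beta.len() == c, "gamma/beta need length C");
    let hw = h * w;
    let per_group = c / groups;
    let group_len = per_group * hw; // |S_g|
    assert!(group_len > 0, "each group must contain at least one element");

    let mut out = Vec::with_capacity(x.len());
    for (i, chunk) in x.chunks_exact(group_len).enumerate() {
        // Statistics are shared by every element of this (sample, group) chunk.
        let mean = chunk.iter().sum::<f32>() / group_len as f32;
        let var = chunk.iter().map(|v| (v - mean).powi(2)).sum::<f32>()
            / group_len as f32;
        let inv_std = 1.0 / (var + eps).sqrt();

        // Chunk i belongs to group i % G, whose first channel is g * (C/G).
        let first_channel = (i % groups) * per_group;
        for (j, v) in chunk.iter().enumerate() {
            let ch = first_channel + j / hw; // channel index for gamma/beta
            out.push(gamma[ch] * (v - mean) * inv_std + beta[ch]);
        }
    }
    out
}
```

Note that the statistics are indexed by group while γ and β are indexed by channel, which is why the inner loop recovers the channel index from the position within the chunk.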

Relationship to Other Normalizations

Group normalization unifies several normalization schemes as special cases:

  • When G=C (each channel is its own group), GN becomes instance normalization
  • When G=1 (all channels in one group), GN becomes layer normalization
  • Batch normalization differs fundamentally by computing statistics across the batch dimension
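The two special cases can be checked by looking at how many elements share one mean and variance. The helpers below are a small illustrative sketch; `group_of` and `reduction_size` are hypothetical names that simply restate g(c) and |S_g| from the formulas above.

```rust
/// Group index of channel `ch`: g(c) = floor(c / (C/G)).
fn group_of(ch: usize, channels: usize, groups: usize) -> usize {
    assert!(groups > 0 && channels % groups == 0, "C must be divisible by G");
    assert!(ch < channels, "channel index out of range");
    ch / (channels / groups) // integer division performs the floor
}

/// Number of elements that share one mean/variance: |S_g| = (C/G) * H * W.
fn reduction_size(channels: usize, groups: usize, h: usize, w: usize) -> usize {
    assert!(groups > 0 && channels % groups == 0, "C must be divisible by G");
    (channels / groups) * h * w
}
```

For C = 6 and a 4×4 feature map, G = 6 gives a reduction size of 16 (one channel's spatial positions, as in instance normalization), G = 1 gives 96 (all channels and positions of a sample, as in layer normalization), and G = 3 gives 32.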
