Principle:Zai org CogVideo Activation Normalization

Knowledge Sources	Glow: Generative Flow with Invertible 1x1 Convolutions
Domains	Normalization, Normalizing_Flows
Last Updated	2026-02-10 00:00 GMT

Overview

Activation normalization (ActNorm) is an invertible per-channel affine transformation layer with data-dependent initialization that normalizes activations without dependence on mini-batch statistics at inference time.

Description

ActNorm was introduced in Glow as a replacement for Batch Normalization in flow-based generative models. There, batch statistics become noisy at the small per-device mini-batch sizes used to train large flow models, and every flow layer must be an exactly invertible, per-sample map with a tractable Jacobian. BatchNorm normalizes with the mean and variance of the current batch during training and with running averages at inference. ActNorm instead learns per-channel scale and bias parameters that are initialized once from the first training mini-batch and then optimized as regular parameters, with no further use of batch statistics.

The initialization procedure ensures that, on the first mini-batch, the post-ActNorm activations have zero mean and approximately unit variance per channel. This data-dependent initialization addresses the training instability that can occur with randomly initialized scale and bias, particularly in deep networks where poor initialization can lead to vanishing or exploding activations.

Key properties that distinguish ActNorm from BatchNorm:

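No batch dependence after initialization: Apart from the one-time initialization, the transformation uses only its stored parameters, so it produces the same output for a given sample regardless of batch size, making it suitable for single-sample inference.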
Invertibility: The transformation $y = s ⊙ (x + b)$ can be exactly inverted as $x = y / s - b$ (elementwise, provided every $s_c \neq 0$), which is essential for normalizing flows.
Tractable Jacobian: The log-determinant of the Jacobian is analytically computable, enabling exact likelihood evaluation in flow models.
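The first two properties can be checked numerically with a minimal sketch. It uses NumPy rather than a deep-learning framework, and the class name `ActNormSketch`, its methods, the `(1, C, 1, 1)` parameter layout, and the `eps` default are illustrative assumptions, not the CogVideo API.

```python
import numpy as np


class ActNormSketch:
    """Hypothetical ActNorm layer computing y = s * (x + b) per channel."""

    def __init__(self, num_channels, eps=1e-6):
        # Parameters shaped (1, C, 1, 1) so they broadcast over batch and spatial dims.
        self.s = np.ones((1, num_channels, 1, 1))
        self.b = np.zeros((1, num_channels, 1, 1))
        self.eps = eps  # assumed small constant to avoid division by zero
        self.initialized = False

    def _initialize(self, x):
        # Data-dependent init from the first batch: zero mean, unit std per channel.
        mean = x.mean(axis=(0, 2, 3), keepdims=True)
        std = x.std(axis=(0, 2, 3), keepdims=True)
        self.b = -mean
        self.s = 1.0 / (std + self.eps)
        self.initialized = True

    def forward(self, x):
        if not self.initialized:
            self._initialize(x)  # runs only on the very first call
        y = self.s * (x + self.b)
        # Per-sample log|det J| = H * W * sum_c log|s_c|.
        h, w = x.shape[2], x.shape[3]
        logdet = h * w * np.sum(np.log(np.abs(self.s)))
        return y, logdet

    def inverse(self, y):
        return y / self.s - self.b


rng = np.random.default_rng(0)
layer = ActNormSketch(num_channels=3)
first_batch = rng.normal(loc=5.0, scale=2.0, size=(16, 3, 4, 4))
y, logdet = layer.forward(first_batch)  # triggers the one-time initialization

# Batch independence: one sample alone gives the same output as inside the batch.
single_y, _ = layer.forward(first_batch[:1])
print(np.allclose(single_y, y[:1]))  # True

# Invertibility: the inverse recovers the original input.
print(np.allclose(layer.inverse(y), first_batch))  # True
```

Later calls to `forward` reuse the stored `s` and `b`; in a real model those would then be updated by the optimizer rather than re-estimated from each batch.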

Usage

Use ActNorm in any architecture that requires:

Normalization without batch-size dependence (e.g., single-sample inference)
Invertible normalization for normalizing flow models
A drop-in replacement for BatchNorm in discriminator networks when batch statistics are unreliable (e.g., with very small batches)

Theoretical Basis

The ActNorm layer implements a per-channel affine transformation. Given input $x \in ℝ^{B \times C \times H \times W}$ , the forward pass computes:

$y_{c} = s_{c} \cdot (x_{c} + b_{c})$

where $s_{c}$ and $b_{c}$ are learnable per-channel scale and bias parameters (broadcast over spatial dimensions).
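In array terms, storing the per-channel parameters with shape `(1, C, 1, 1)` lets broadcasting apply $s_c$ and $b_c$ to every batch element and spatial position. This is a minimal NumPy sketch; `actnorm_forward` is a hypothetical helper that takes `s` and `b` as 1-D arrays of length $C$.

```python
import numpy as np


def actnorm_forward(x, s, b):
    """Apply y[:, c] = s[c] * (x[:, c] + b[c]) to an array of shape (B, C, H, W)."""
    # Reshape (C,) -> (1, C, 1, 1) so each channel's parameters broadcast over B, H, W.
    s4 = s.reshape(1, -1, 1, 1)
    b4 = b.reshape(1, -1, 1, 1)
    return s4 * (x + b4)


x = np.arange(2 * 2 * 2 * 2, dtype=float).reshape(2, 2, 2, 2)
s = np.array([2.0, 0.5])
b = np.array([1.0, -4.0])
y = actnorm_forward(x, s, b)
print(y[0, 0])  # channel 0 of sample 0: 2 * (x + 1)
print(y[0, 1])  # channel 1 of sample 0: 0.5 * (x - 4)
```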

Data-dependent initialization: Given the first mini-batch $x^{(0)}$ , the parameters are set as:

$b_{c} = -\mu_{c}(x^{(0)}), \qquad s_{c} = \frac{1}{\sigma_{c}(x^{(0)}) + \epsilon}$

where $\mu_{c}$ and $\sigma_{c}$ are the mean and standard deviation of channel $c$ computed over all batch and spatial dimensions, and $\epsilon$ is a small positive constant that prevents division by zero.
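The initialization rule can be checked on a synthetic batch whose channels have very different offsets and scales: after setting $b_c$ and $s_c$ from that batch, the same batch comes out with near-zero mean and near-unit standard deviation in every channel. In this sketch, `init_actnorm` and `eps=1e-6` are illustrative assumptions, and $\sigma_c$ is taken as NumPy's population standard deviation (`ddof=0`).

```python
import numpy as np


def init_actnorm(x0, eps=1e-6):
    """Return per-channel (s, b) computed from the first batch x0 of shape (B, C, H, W)."""
    mu = x0.mean(axis=(0, 2, 3))    # mean over batch and spatial dims, shape (C,)
    sigma = x0.std(axis=(0, 2, 3))  # population std over the same dims, shape (C,)
    b = -mu
    s = 1.0 / (sigma + eps)
    return s, b


rng = np.random.default_rng(42)
# Channels deliberately get different means (-2, 0, 7) and stds (0.5, 3, 10).
loc = np.array([-2.0, 0.0, 7.0]).reshape(1, 3, 1, 1)
scale = np.array([0.5, 3.0, 10.0]).reshape(1, 3, 1, 1)
x0 = rng.normal(loc=loc, scale=scale, size=(8, 3, 5, 5))

s, b = init_actnorm(x0)
y0 = s.reshape(1, -1, 1, 1) * (x0 + b.reshape(1, -1, 1, 1))
print(y0.mean(axis=(0, 2, 3)))  # approximately [0, 0, 0]
print(y0.std(axis=(0, 2, 3)))   # approximately [1, 1, 1]
```

The standard deviation is slightly below 1 because of $\epsilon$: each channel ends up at $\sigma_c / (\sigma_c + \epsilon)$.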

Log-determinant of the Jacobian: For flow-based models, the change-of-variables formula requires the log-determinant:

$\log \left| \det \left( \frac{\partial y}{\partial x} \right) \right| = H \cdot W \cdot \sum_{c=1}^{C} \log |s_{c}|$

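This follows because, for a single sample, the Jacobian is a diagonal matrix whose entries are the $s_{c}$, each repeated $H \times W$ times, so its determinant is the product of those entries. The log-determinant is thus $O(C)$ to compute, instead of the $O((CHW)^3)$ cost of a general dense determinant, which is essential for efficient training of normalizing flows.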
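For a tiny input, the closed form can be compared against a brute-force determinant of the full $CHW \times CHW$ Jacobian. The sketch below builds that Jacobian column by column (a unit step is enough because the map is affine in $x$) and evaluates it with `np.linalg.slogdet`, which is only practical for very small shapes. The helper names and test values are hypothetical.

```python
import numpy as np


def actnorm_forward_sample(x, s, b):
    """y_c = s_c * (x_c + b_c) for one sample x of shape (C, H, W)."""
    return s[:, None, None] * (x + b[:, None, None])


def actnorm_logdet(s, height, width):
    """Closed-form per-sample log|det(dy/dx)| = H * W * sum_c log|s_c|."""
    return height * width * np.sum(np.log(np.abs(s)))


C, H, W = 3, 2, 4
rng = np.random.default_rng(7)
s = np.array([0.5, -1.5, 2.0])  # a negative scale is allowed; only |s_c| enters
b = rng.normal(size=C)
x = rng.normal(size=(C, H, W))

# Brute force: column i is the change in the flattened output per unit step in input i.
# Because the map is affine, this recovers the Jacobian up to floating-point round-off.
n = C * H * W
y_base = actnorm_forward_sample(x, s, b).ravel()
jac = np.zeros((n, n))
for i in range(n):
    x_step = x.ravel().copy()
    x_step[i] += 1.0
    jac[:, i] = actnorm_forward_sample(x_step.reshape(C, H, W), s, b).ravel() - y_base
_, brute = np.linalg.slogdet(jac)

closed = actnorm_logdet(s, H, W)
print(np.isclose(closed, brute))  # True
```

The brute-force Jacobian comes out diagonal with each $s_c$ repeated $H \cdot W = 8$ times, matching the argument above.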

Inverse transformation: The reverse pass computes:

$x_{c} = \frac{y_{c}}{s_{c}} - b_{c}$
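A round-trip check of the inverse, as a minimal NumPy sketch with hypothetical helper names and arbitrary nonzero scales (the inverse is undefined if any $s_c = 0$):

```python
import numpy as np


def actnorm_forward(x, s, b):
    """y_c = s_c * (x_c + b_c) for x of shape (B, C, H, W)."""
    return s.reshape(1, -1, 1, 1) * (x + b.reshape(1, -1, 1, 1))


def actnorm_inverse(y, s, b):
    """x_c = y_c / s_c - b_c; requires every s_c to be nonzero."""
    return y / s.reshape(1, -1, 1, 1) - b.reshape(1, -1, 1, 1)


rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3, 6, 6))
s = np.array([0.25, 1.0, -3.0])
b = np.array([0.1, -2.0, 5.0])

x_rec = actnorm_inverse(actnorm_forward(x, s, b), s, b)
print(np.allclose(x_rec, x))  # True: only floating-point round-off remains
```

Since the reverse pass is the inverse map, its log-determinant is the negation of the forward value, $-H \cdot W \cdot \sum_{c} \log |s_{c}|$.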

This exact invertibility, combined with the tractable Jacobian, makes ActNorm a fundamental building block in architectures like Glow and other invertible neural networks.

Related Pages

Implementation:Zai_org_CogVideo_ActNorm
