Implementation:Zai org CogVideo ActNorm
| Knowledge Sources | |
|---|---|
| Domains | Video_Generation, Normalization |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Implements ActNorm (Activation Normalization), a data-dependent normalization layer that initializes its affine parameters from the first mini-batch statistics and supports both forward and reverse transformation modes.
Description
This module provides the ActNorm class along with checkpoint download and verification utilities:
ActNorm -- A normalization layer that performs a per-channel affine transformation `y = scale * (x + loc)`. Unlike BatchNorm, ActNorm uses data-dependent initialization: on the first forward pass, the location (`loc`) and `scale` parameters are set so that the output has zero mean and unit variance per channel. After initialization, the parameters are treated as regular learnable parameters. Key features include:

- Data-dependent initialization: computes the channel-wise mean and standard deviation from the first mini-batch and sets `loc = -mean`, `scale = 1/std`.
- Reverse mode: supports invertible computation via `reverse=True`, computing `x = y / scale - loc`.
- Log-determinant: optionally computes `log|det(dy/dx)| = H * W * sum(log|scale|)` for use in normalizing flow models.
- 2D input support: handles both 4D `[B, C, H, W]` and 2D `[B, C]` inputs (the latter are temporarily reshaped to 4D).
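The forward, reverse, and log-determinant rules above can be checked on plain Python lists. This is a minimal sketch and not the CogVideo implementation. It handles only 2D `[B, C]` input, so H = W = 1. It uses the sample standard deviation for `std`, and the class name `ListActNorm` is hypothetical.

```python
import math
import statistics


class ListActNorm:
    """Hypothetical list-based ActNorm for [B, C] inputs (H = W = 1)."""

    def __init__(self, num_features):
        self.loc = [0.0] * num_features
        self.scale = [1.0] * num_features
        self.initialized = False

    def initialize(self, batch):
        # Per-channel statistics over the batch dimension.
        for c in range(len(self.loc)):
            column = [row[c] for row in batch]
            self.loc[c] = -statistics.mean(column)
            self.scale[c] = 1.0 / statistics.stdev(column)
        self.initialized = True

    def forward(self, batch):
        # Data-dependent initialization happens on the first call only.
        if not self.initialized:
            self.initialize(batch)
        out = [[s * (x + l) for x, s, l in zip(row, self.scale, self.loc)] for row in batch]
        # log|det| = H * W * sum(log|scale|) with H = W = 1 for 2D input,
        # repeated once per sample because loc/scale are shared across the batch.
        logdet = sum(math.log(abs(s)) for s in self.scale)
        return out, [logdet] * len(batch)

    def reverse(self, batch):
        # Inverse of y = scale * (x + loc).
        return [[y / s - l for y, s, l in zip(row, self.scale, self.loc)] for row in batch]


norm = ListActNorm(num_features=2)
x = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
y, logdet = norm.forward(x)
x_rec = norm.reverse(y)
```

After the first call, every channel of `y` has zero mean and unit sample standard deviation, and `reverse` recovers `x` up to floating-point rounding.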
Checkpoint utilities:

- get_ckpt_path() -- Resolves a model checkpoint name (e.g., "vgg_lpips") to a local file path under `root`. If the file is missing, or if `check=True` and its MD5 hash does not match, the file is downloaded from a remote URL and its MD5 hash is verified.
- download() -- Streams a file from a URL to `local_path` with a progress bar.
- md5_hash() -- Computes the MD5 hash of a file for integrity verification.
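The resolve-or-download pattern behind `get_ckpt_path()` can be sketched with the standard library alone. This is a hedged sketch with several hypothetical pieces: the checkpoint `table`, the file name, and the `fetch` callable (a stand-in for `download()`). No real URL or hash from the repository is reproduced. It also assumes that `check=True` forces a hash check of an already-present file. Hashing is done in chunks here, which is a choice of this sketch.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_checkpoint(name, root, table, fetch, check=False):
    """Hypothetical resolver: table maps name -> (filename, expected_md5).

    Fetches the file if it is missing or, when check=True, if its hash does
    not match; the hash is always verified after a fetch.
    """
    filename, expected_md5 = table[name]
    path = os.path.join(root, filename)
    if not os.path.exists(path) or (check and md5_of_file(path) != expected_md5):
        os.makedirs(root, exist_ok=True)
        fetch(name, path)  # stand-in for download(url, path)
        actual = md5_of_file(path)
        if actual != expected_md5:
            raise ValueError(f"MD5 mismatch for {name}: {actual}")
    return path


# Demo with an in-memory payload instead of a real remote file.
payload = b"fake weights"
table = {"demo": ("demo.pth", hashlib.md5(payload).hexdigest())}
fetched = []


def fake_fetch(name, path):
    fetched.append(name)
    with open(path, "wb") as f:
        f.write(payload)


with tempfile.TemporaryDirectory() as root:
    first = resolve_checkpoint("demo", root, table, fake_fetch)
    resolve_checkpoint("demo", root, table, fake_fetch, check=True)  # cache hit
    calls_after_cache_hit = len(fetched)
    with open(first, "wb") as f:
        f.write(b"corrupted")
    resolve_checkpoint("demo", root, table, fake_fetch, check=True)  # re-fetched
    calls_after_corruption = len(fetched)
    first_name = os.path.basename(first)
```

Injecting the fetcher keeps the cache and verification logic testable without network access.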
Usage
Used as an alternative to nn.BatchNorm2d in the NLayerDiscriminator when use_actnorm=True. Also useful in normalizing flow architectures where invertible normalization with known log-determinant is required. The checkpoint utilities support the LPIPS module by ensuring pretrained VGG weights are available locally.
Code Reference
Source Location
- Repository: Zai_org_CogVideo
- File: sat/sgm/modules/autoencoding/lpips/util.py
Signature
```python
from typing import Union

import torch
import torch.nn as nn


# Signatures only; bodies elided.
class ActNorm(nn.Module):
    def __init__(
        self,
        num_features,
        logdet=False,
        affine=True,
        allow_reverse_init=False,
    ): ...
    def initialize(self, input): ...
    def forward(self, input, reverse=False) -> Union[torch.Tensor, tuple[torch.Tensor, torch.Tensor]]: ...
    def reverse(self, output) -> torch.Tensor: ...


def get_ckpt_path(name, root, check=False) -> str: ...
def download(url, local_path, chunk_size=1024): ...
def md5_hash(path) -> str: ...
```
Import
```python
from sat.sgm.modules.autoencoding.lpips.util import ActNorm, get_ckpt_path
```
I/O Contract
Inputs (ActNorm.forward)
| Name | Type | Required | Description |
|---|---|---|---|
| input | torch.Tensor | Yes | Input tensor of shape [B, C, H, W] or [B, C] |
| reverse | bool | No | If True, apply the inverse transformation. Default: False |
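The 2D input path is mainly shape bookkeeping. The hypothetical helper below works on shape tuples only. It assumes the reshape adds two singleton spatial dimensions (`[B, C]` → `[B, C, 1, 1]`) and is undone after normalization. It also shows why a 2D input contributes a factor of `H * W = 1` to the log-determinant.

```python
def as_4d_shape(shape):
    """Hypothetical helper: map [B, C] or [B, C, H, W] to a 4D shape plus a squeeze flag."""
    if len(shape) == 2:
        batch, channels = shape
        # Assumed reshape: two singleton spatial dims, squeezed away afterwards.
        return (batch, channels, 1, 1), True
    if len(shape) == 4:
        return tuple(shape), False
    raise ValueError(f"expected a 2D or 4D shape, got {shape}")


shape_2d, squeeze_2d = as_4d_shape((8, 64))
shape_4d, squeeze_4d = as_4d_shape((8, 64, 32, 32))
# Spatial positions per channel, i.e. the H * W factor in the log-determinant.
hw_2d = shape_2d[2] * shape_2d[3]
hw_4d = shape_4d[2] * shape_4d[3]
```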
Constructor Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| num_features | int | Yes | Number of channels (features) to normalize |
| logdet | bool | No | Whether forward also returns the log-determinant of the Jacobian. Default: False |
| affine | bool | No | Must be True (asserted). Default: True |
| allow_reverse_init | bool | No | Allow the data-dependent initialization to be triggered by a reverse pass on an uninitialized layer. Default: False |
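One reading of `allow_reverse_init` is a guard. Under this reading, an uninitialized layer refuses to take its statistics from a reverse pass unless the flag is set. The class below is a hypothetical sketch of that guard only, with no tensor math. Two details are assumptions and do not come from the source: the `RuntimeError` type, and the point at which initialization would occur.

```python
class ReverseInitGuard:
    """Hypothetical sketch of the allow_reverse_init check (no tensor math)."""

    def __init__(self, allow_reverse_init=False):
        self.allow_reverse_init = allow_reverse_init
        self.initialized = False

    def reverse(self, output):
        if not self.initialized:
            if not self.allow_reverse_init:
                # Assumed behavior: refuse to derive statistics from a reverse pass.
                raise RuntimeError("reverse-pass initialization is disabled")
            # Assumed behavior: statistics would be computed from `output` here.
            self.initialized = True
        return output


strict = ReverseInitGuard()
try:
    strict.reverse([0.0])
    strict_raised = False
except RuntimeError:
    strict_raised = True

lenient = ReverseInitGuard(allow_reverse_init=True)
lenient.reverse([0.0])
```

Defaulting to the strict behavior means the statistics come from real data in the forward direction unless the caller opts in.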
Outputs (ActNorm.forward)
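| Name | Type | Description |
|---|---|---|
| h | torch.Tensor | Normalized output, same shape as input |
| logdet | torch.Tensor | (Only if logdet=True) Log-determinant of shape [B] |

With `reverse=True`, forward returns only the reconstructed tensor, matching the return type of `reverse()`.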
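The value returned as `logdet` follows from the Jacobian of the per-channel affine map. Write $s_c$ for `scale` and $\ell_c$ for `loc` of channel $c$. For one sample with $C$ channels and $H \times W$ spatial positions, each output element depends only on its own input element:

$$
y_{c,i,j} = s_c\,(x_{c,i,j} + \ell_c)
\quad\Rightarrow\quad
\frac{\partial y_{c,i,j}}{\partial x_{c',i',j'}} = s_c\,\delta_{cc'}\,\delta_{ii'}\,\delta_{jj'}
$$

The Jacobian is therefore diagonal, and each $s_c$ appears $H W$ times on the diagonal:

$$
\log\left|\det\frac{\partial y}{\partial x}\right|
= \sum_{c=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\log|s_c|
= H\,W\sum_{c=1}^{C}\log|s_c|
$$

Because `loc` and `scale` are shared across the batch, every sample gets the same value, so the `[B]` output holds $B$ identical entries.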
Usage Examples
```python
import torch

from sat.sgm.modules.autoencoding.lpips.util import ActNorm

feature_map = torch.randn(8, 64, 32, 32)  # example input: [B, 64, H, W]

# Create an ActNorm layer for 64-channel features
norm = ActNorm(num_features=64, logdet=False)
# First forward pass triggers data-dependent initialization
output = norm(feature_map)

# With log-determinant for flow models
norm_flow = ActNorm(num_features=64, logdet=True)
output, log_det = norm_flow(feature_map)  # log_det: [B]
# Reverse transformation (invertible); returns only the reconstructed tensor
reconstructed = norm_flow(output, reverse=True)
```