Implementation:Zai org CogVideo LookupFreeQuantization

Knowledge Sources	Zai_org_CogVideo Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Domains	Video_Generation, Autoencoding, Quantization
Last Updated	2026-02-10 00:00 GMT

Overview

LFQ (Lookup-Free Quantization) is a discrete quantization module that maps continuous latent features to binary codes by thresholding at zero, using straight-through gradients and entropy-based auxiliary losses to encourage full codebook utilization without maintaining an explicit codebook embedding table.

Description

The LFQ class implements the quantization method proposed in "Language Model Beats Diffusion" (arXiv:2310.05737). Traditional vector quantization (VQ) maintains a learned codebook of embedding vectors and assigns each feature to its nearest neighbor. LFQ instead quantizes each dimension independently to one of two values, {-codebook_scale, +codebook_scale}, by thresholding at zero.

Quantization Process:

Input features of dimension dim are optionally projected to codebook_dims (= codebook_dim * num_codebooks) via a learned linear projection if the dimensions do not match.
Features are reshaped into (B, N, C, D) where C is the number of codebooks and D is log2(codebook_size).
Each value is quantized: positive values become +codebook_scale, negative values become -codebook_scale.
During training, straight-through gradients are used: the forward pass uses discrete quantized values, but gradients flow through an optional activation function applied to the original input.
Discrete indices are computed by treating the binary decisions as bits and packing them into integer codes via a binary mask.
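
The thresholding and bit-packing steps above can be sketched for a single token in plain Python. This is a minimal illustration, not the module's code: the helper names lfq_quantize and pack_bits are hypothetical, the sketch assumes the first dimension is the most significant bit, and it assumes a value of exactly zero maps to the negative code.

```python
import math

def lfq_quantize(features, codebook_scale=1.0):
    """Threshold each value at zero: positive -> +scale, otherwise -> -scale."""
    return [codebook_scale if v > 0 else -codebook_scale for v in features]

def pack_bits(quantized):
    """Read positive entries as 1-bits and pack them into one integer.

    Assumption: the first dimension is the most significant bit.
    """
    num_bits = len(quantized)
    # Binary mask of place values, e.g. [8, 4, 2, 1] for 4 bits.
    mask = [2 ** (num_bits - 1 - i) for i in range(num_bits)]
    return sum(m for m, q in zip(mask, quantized) if q > 0)

codebook_size = 16
num_bits = int(math.log2(codebook_size))  # bits (dimensions) per code

token = [0.7, -1.2, 0.05, -0.3]           # one continuous latent vector
quantized = lfq_quantize(token)           # binary code at +/- codebook_scale
index = pack_bits(quantized)              # integer code index
```

Because every dimension is one bit, a codebook of size 2^D needs no embedding table: the index is just the sign pattern of the D values.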

Auxiliary Losses:

Per-sample entropy loss: Soft distances to all possible codes are computed and converted to probabilities via softmax with inv_temperature. The entropy of these per-sample distributions is minimized to encourage confident, low-entropy code assignments.
Batch codebook entropy: The average probability distribution across the batch is computed, and its entropy is maximized (weighted by diversity_gamma) to encourage uniform usage of all codes.
Commitment loss: MSE between the original continuous input and the quantized output, weighted by commitment_loss_weight.
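
A rough sketch of the three loss terms on a toy batch follows, using only the standard library. The function name lfq_aux_losses is hypothetical, the use of a scaled inner product as the softmax logit is an assumption about how the "soft distances" are formed, and the final weighted combination is one plausible reading of the description rather than a confirmed formula.

```python
import math
from itertools import product

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs, eps=1e-5):
    return -sum(p * math.log(max(p, eps)) for p in probs)

def lfq_aux_losses(tokens, inv_temperature=100.0, codebook_scale=1.0):
    dim = len(tokens[0])
    # Implicit codebook: every sign pattern of length dim (2**dim codes).
    codebook = [[codebook_scale * s for s in signs]
                for signs in product((-1.0, 1.0), repeat=dim)]
    probs = []
    for x in tokens:
        # Assumed logit: scaled inner product with each code (closer code -> larger logit).
        logits = [2.0 * inv_temperature * sum(a * b for a, b in zip(x, c))
                  for c in codebook]
        probs.append(softmax(logits))
    # Mean entropy of each token's own distribution over codes.
    per_sample_entropy = sum(entropy(p) for p in probs) / len(probs)
    # Entropy of the batch-averaged distribution over codes.
    avg_prob = [sum(p[k] for p in probs) / len(probs) for k in range(len(codebook))]
    batch_entropy = entropy(avg_prob)
    # Commitment: mean squared error between inputs and their quantized values.
    sq_errors = [(v - (codebook_scale if v > 0 else -codebook_scale)) ** 2
                 for x in tokens for v in x]
    commitment = sum(sq_errors) / len(sq_errors)
    return per_sample_entropy, batch_entropy, commitment

# Four tokens, each close to a different one of the 4 two-bit codes.
tokens = [[0.9, 0.8], [-0.9, 0.8], [0.9, -0.8], [-0.9, -0.8]]
per_sample, batch, commit = lfq_aux_losses(tokens)

# Assumed combination, using the constructor defaults from the signature.
entropy_loss_weight, commitment_loss_weight, diversity_gamma = 0.1, 0.25, 1.0
aux_loss = (entropy_loss_weight * (per_sample - diversity_gamma * batch)
            + commitment_loss_weight * commit)
```

In this toy batch each token is confidently assigned (per-sample entropy near zero) and all four codes are used equally (batch entropy near log 4), which is the regime the two entropy terms push toward.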

The module supports multiple codebooks via num_codebooks, handles both image (B, D, H, W) and video (B, D, T, H, W) tensors automatically, and can optionally subsample tokens for entropy computation to reduce memory usage via frac_per_sample_entropy.
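
The token subsampling controlled by frac_per_sample_entropy can be pictured as keeping a random fraction of the flattened tokens before the entropy terms are computed. The sketch below is an assumption about the general idea only: the helper subsample_for_entropy is hypothetical and the module's actual sampling scheme may differ.

```python
import random

def subsample_for_entropy(tokens, frac_per_sample_entropy=1.0, seed=None):
    """Keep a random fraction of tokens (hypothetical helper, order preserved)."""
    if frac_per_sample_entropy >= 1.0:
        return list(tokens)  # default: use every token
    rng = random.Random(seed)
    num_kept = int(len(tokens) * frac_per_sample_entropy)
    kept = sorted(rng.sample(range(len(tokens)), num_kept))
    return [tokens[i] for i in kept]

tokens = list(range(1000))  # stand-in for B * N flattened token vectors
subset = subsample_for_entropy(tokens, frac_per_sample_entropy=0.25, seed=0)
```

The memory saving comes from the probability matrix: it has one row per token and one column per code, so keeping a quarter of the tokens shrinks it by the same factor.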

Usage

Use this module as the quantization bottleneck in a video or image autoencoder when you want to avoid the complexity and potential codebook collapse issues of traditional vector quantization. It is the default quantizer used by VideoTokenizer in the MagViT2 architecture.

Code Reference

Source Location

Repository: Zai_org_CogVideo
File: sat/sgm/modules/autoencoding/regularizers/lookup_free_quantization.py
Lines: 62-315

Signature

class LFQ(Module):
    def __init__(
        self,
        *,
        dim=None,
        codebook_size=None,
        entropy_loss_weight=0.1,
        commitment_loss_weight=0.25,
        diversity_gamma=1.0,
        straight_through_activation=nn.Identity(),
        num_codebooks=1,
        keep_num_codebooks_dim=None,
        codebook_scale=1.0,
        frac_per_sample_entropy=1.0,
    ):

Import

from sat.sgm.modules.autoencoding.regularizers.lookup_free_quantization import LFQ

I/O Contract

Inputs

Name	Type	Required	Description
x	torch.Tensor	Yes	Continuous input features, shape `(B, D, ...)` for images/video or `(B, N, D)` for sequences; dimension D must match dim
inv_temperature	float	No	Inverse temperature for the softmax over code distances; defaults to 100.0. Higher values produce sharper distributions
return_loss_breakdown	bool	No	If True, the call additionally returns a LossBreakdown named tuple (per_sample_entropy, batch_entropy, commitment) alongside the main Return tuple; defaults to False
mask	Optional[torch.Tensor]	No	Boolean mask for selecting which tokens to include in entropy and commitment loss computation

Outputs

Name	Type	Description
quantized	torch.Tensor	Quantized output tensor, same shape as input. Binary values scaled by codebook_scale, projected back to original dimension
indices	torch.Tensor	Integer codebook indices, shape depends on input dimensionality and keep_num_codebooks_dim setting
entropy_aux_loss	torch.Tensor	Scalar auxiliary loss combining weighted per-sample entropy, codebook entropy, and commitment loss

The return type is a named tuple Return(quantized, indices, entropy_aux_loss). When return_loss_breakdown=True, the method returns a tuple of (Return, LossBreakdown) where LossBreakdown(per_sample_entropy, batch_entropy, commitment).
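
The two return shapes can be illustrated with stand-in named tuples. The field names below are taken from the description above; fake_forward is a hypothetical placeholder that returns dummy values and is not part of the module.

```python
from collections import namedtuple

Return = namedtuple("Return", ["quantized", "indices", "entropy_aux_loss"])
LossBreakdown = namedtuple(
    "LossBreakdown", ["per_sample_entropy", "batch_entropy", "commitment"]
)

def fake_forward(return_loss_breakdown=False):
    """Hypothetical stand-in for the forward call, with placeholder values."""
    ret = Return(quantized="q", indices="i", entropy_aux_loss=0.0)
    if not return_loss_breakdown:
        return ret
    breakdown = LossBreakdown(per_sample_entropy=0.0, batch_entropy=0.0, commitment=0.0)
    return ret, breakdown

# Default: a single 3-field named tuple, unpacked directly.
quantized, indices, aux_loss = fake_forward()

# With the breakdown: a pair whose first element is the same 3-field tuple.
ret, breakdown = fake_forward(return_loss_breakdown=True)
```

Code that toggles return_loss_breakdown needs to change its unpacking pattern accordingly, since the outer tuple goes from three elements to two.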

Usage Examples

import torch
from sat.sgm.modules.autoencoding.regularizers.lookup_free_quantization import LFQ

# Initialize LFQ quantizer
quantizer = LFQ(
    dim=256,
    codebook_size=1024,       # Must be a power of 2; log2(1024) = 10 bits per code
    num_codebooks=1,
    entropy_loss_weight=0.1,
    commitment_loss_weight=1.0,
    diversity_gamma=2.5,
)

# Quantize encoder output (video features)
encoder_output = torch.randn(4, 256, 8, 16, 16)  # (B, D, T, H, W)
(quantized, indices, aux_loss), loss_breakdown = quantizer(
    encoder_output, return_loss_breakdown=True
)
# quantized: (4, 256, 8, 16, 16)
# indices:   (4, 8, 16, 16) or (4, 8, 16, 16, 1) depending on keep_num_codebooks_dim

# Decode indices back to continuous representation
reconstructed = quantizer.indices_to_codes(indices)
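
Conceptually, decoding an index reverses the bit packing: each bit of the integer becomes one dimension of the code at plus or minus codebook_scale. The plain-Python sketch below illustrates that idea for one index. The helper names are hypothetical, the most-significant-bit-first order is an assumption, and any projection back to the original feature dimension is omitted.

```python
def index_to_bits(index, num_bits):
    """Unpack an integer into bits, most significant bit first (assumed order)."""
    return [(index >> (num_bits - 1 - i)) & 1 for i in range(num_bits)]

def bits_to_code(bits, codebook_scale=1.0):
    """Map bit 1 -> +scale and bit 0 -> -scale."""
    return [codebook_scale * (2 * b - 1) for b in bits]

bits = index_to_bits(10, num_bits=4)   # 10 is 0b1010
code = bits_to_code(bits)              # one value per bit, at +/- codebook_scale
```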

Related Pages

Principle:Zai_org_CogVideo_Lookup_Free_Quantization
