Implementation:Zai org CogVideo LookupFreeQuantization

From Leeroopedia


Knowledge Sources
Domains Video_Generation, Autoencoding, Quantization
Last Updated 2026-02-10 00:00 GMT

Overview

LFQ (Lookup-Free Quantization) is a discrete quantization module that maps continuous latent features to binary codes by thresholding at zero. It trains with straight-through gradients and uses entropy-based auxiliary losses to encourage full codebook utilization, without maintaining an explicit codebook embedding table.

Description

The LFQ class implements the quantization method proposed in "Language Model Beats Diffusion: Tokenizer is Key to Visual Generation" (arXiv:2310.05737). Traditional vector quantization (VQ) maintains a learned codebook of embedding vectors and assigns each input to its nearest neighbor. LFQ instead quantizes each dimension independently to one of two values, {-codebook_scale, +codebook_scale}, by thresholding at zero.

Quantization Process:

  1. Input features of dimension dim are projected to codebook_dims (= codebook_dim * num_codebooks) by a learned linear projection when the two dimensions differ; otherwise they are passed through unchanged.
  2. Features are reshaped into (B, N, C, D) where C is the number of codebooks and D is log2(codebook_size).
  3. Each value is quantized by its sign: values greater than zero become +codebook_scale, all other values become -codebook_scale.
  4. During training, straight-through gradients are used: the forward pass uses discrete quantized values, but gradients flow through an optional activation function applied to the original input.
  5. Discrete indices are computed by treating the binary decisions as bits and packing them into integer codes via a binary mask.
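The thresholding and bit-packing steps above can be sketched in plain Python for a single token. This is a minimal illustration, not the module's code: `lfq_quantize` is a hypothetical helper, the most-significant-bit-first ordering is an assumption, and the straight-through step is omitted because it needs autograd (in PyTorch it is commonly written as `x + (quantized - x).detach()`).

```python
import math

def lfq_quantize(token, codebook_scale=1.0):
    """Quantize one token (a list of D floats) and pack its bits into an index.

    Hypothetical helper for illustration only.
    """
    # Threshold each dimension at zero; in this sketch 0.0 maps to the negative value.
    bits = [1 if v > 0 else 0 for v in token]
    quantized = [codebook_scale if b else -codebook_scale for b in bits]
    # Pack bits into an integer, assuming the first dimension is the most significant bit.
    index = 0
    for b in bits:
        index = (index << 1) | b
    return quantized, index

codebook_size = 16
num_bits = int(math.log2(codebook_size))  # D = 4 bits per code

token = [0.3, -1.2, 0.0, 2.5]
quantized, index = lfq_quantize(token)
print(quantized, index)
```

Because every index is just the bit pattern of the sign decisions, no embedding table or nearest-neighbor search is involved.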

Auxiliary Losses:

  • Per-sample entropy loss: Soft distances to all possible codes are computed and converted to probabilities via softmax with inv_temperature. The entropy of these per-sample distributions is minimized to encourage confident, low-entropy code assignments.
  • Batch codebook entropy: The average probability distribution across the batch is computed, and its entropy is maximized (weighted by diversity_gamma) to encourage uniform usage of all codes.
  • Commitment loss: MSE between the original continuous input and the quantized output, weighted by commitment_loss_weight.
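The three loss terms can be illustrated with a small standard-library sketch. It assumes the similarity to each code is twice the dot product with that code, that the implicit codebook is the set of all sign patterns, and that the entropy term is per-sample entropy minus diversity_gamma times batch entropy; `lfq_aux_losses` and its helpers are hypothetical names, and the entropy_loss_weight and commitment_loss_weight factors are left to the caller.

```python
import math
from itertools import product

def entropy(probs, eps=1e-9):
    """Shannon entropy in nats, with a floor to avoid log(0)."""
    return -sum(p * math.log(max(p, eps)) for p in probs)

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def lfq_aux_losses(tokens, inv_temperature=100.0, diversity_gamma=1.0,
                   codebook_scale=1.0):
    """Return (per_sample_entropy, batch_entropy, commitment, entropy_aux)."""
    dim = len(tokens[0])
    # Implicit codebook: every sign pattern of length dim, scaled.
    codes = [[codebook_scale * s for s in signs]
             for signs in product((-1.0, 1.0), repeat=dim)]

    per_token_probs = []
    for x in tokens:
        # Assumed similarity: 2 * <x, code>, sharpened by inv_temperature.
        logits = [inv_temperature * 2.0 * sum(a * b for a, b in zip(x, c))
                  for c in codes]
        per_token_probs.append(softmax(logits))

    # Minimized: each token should commit to a single code.
    per_sample_entropy = sum(entropy(p) for p in per_token_probs) / len(tokens)
    # Maximized: averaged over tokens, all codes should be used equally.
    avg_probs = [sum(p[j] for p in per_token_probs) / len(tokens)
                 for j in range(len(codes))]
    batch_entropy = entropy(avg_probs)
    entropy_aux = per_sample_entropy - diversity_gamma * batch_entropy

    # Commitment: mean squared error between input and its quantized value.
    sq_errors = [(v - (codebook_scale if v > 0 else -codebook_scale)) ** 2
                 for x in tokens for v in x]
    commitment = sum(sq_errors) / len(sq_errors)
    return per_sample_entropy, batch_entropy, commitment, entropy_aux

# Four tokens that each land confidently in a different one of the 4 codes.
tokens = [[0.9, 0.8], [-0.7, 1.1], [1.2, -0.6], [-1.0, -0.9]]
per_sample, batch, commit, aux = lfq_aux_losses(tokens)
print(per_sample, batch, commit, aux)
```

In this toy batch the per-sample entropy is near zero and the batch entropy is near log(4), which is the best case the entropy term rewards.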

The module supports multiple codebooks via num_codebooks and automatically handles both image (B, D, H, W) and video (B, D, T, H, W) tensors. To reduce memory usage, it can compute the per-sample entropy on a random subset of tokens, with the fraction set by frac_per_sample_entropy.
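To make the token counts concrete, the sketch below computes how many tokens an image or video tensor yields and draws a random subset of them. It assumes the subset is drawn across all tokens in the batch and that the count is rounded down; `num_tokens` and `subsample_token_ids` are hypothetical helpers, not functions of the module.

```python
import math
import random

def num_tokens(shape):
    """Tokens per sample for (B, D, H, W) or (B, D, T, H, W): product of dims after D."""
    return math.prod(shape[2:])

def subsample_token_ids(total_tokens, frac, seed=0):
    """Pick int(total_tokens * frac) distinct token positions at random."""
    count = int(total_tokens * frac)
    rng = random.Random(seed)
    return sorted(rng.sample(range(total_tokens), count))

video_shape = (4, 256, 8, 16, 16)      # (B, D, T, H, W)
n = num_tokens(video_shape)            # 8 * 16 * 16 tokens per sample
total = video_shape[0] * n             # tokens across the whole batch
picked = subsample_token_ids(total, frac=0.25)
print(n, total, len(picked))
```

With frac=0.25 the entropy computation in this sketch touches a quarter of the tokens, and the distance matrix to all codes shrinks by the same factor.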

Usage

Use this module as the quantization bottleneck in a video or image autoencoder when you want to avoid the complexity and potential codebook collapse issues of traditional vector quantization. It is the default quantizer used by VideoTokenizer in the MagViT2 architecture.

Code Reference

Source Location

  • Repository: Zai_org_CogVideo
  • File: sat/sgm/modules/autoencoding/regularizers/lookup_free_quantization.py
  • Lines: 62-315

Signature

class LFQ(Module):
    def __init__(
        self,
        *,
        dim=None,
        codebook_size=None,
        entropy_loss_weight=0.1,
        commitment_loss_weight=0.25,
        diversity_gamma=1.0,
        straight_through_activation=nn.Identity(),
        num_codebooks=1,
        keep_num_codebooks_dim=None,
        codebook_scale=1.0,
        frac_per_sample_entropy=1.0,
    ):

Import

from sat.sgm.modules.autoencoding.regularizers.lookup_free_quantization import LFQ

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| x | torch.Tensor | Yes | Continuous input features, shape (B, D, ...) for images/video or (B, N, D) for sequences; dimension D must match dim |
| inv_temperature | float | No | Inverse temperature for the softmax over code distances; defaults to 100.0. Higher values produce sharper distributions |
| return_loss_breakdown | bool | No | If True, additionally returns a LossBreakdown named tuple with per_sample_entropy, batch_entropy, and commitment fields; defaults to False |
| mask | Optional[torch.Tensor] | No | Boolean mask selecting which tokens to include in the entropy and commitment loss computation |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| quantized | torch.Tensor | Quantized output tensor, same shape as input. Binary values scaled by codebook_scale, projected back to original dimension |
| indices | torch.Tensor | Integer codebook indices, shape depends on input dimensionality and keep_num_codebooks_dim setting |
| entropy_aux_loss | torch.Tensor | Scalar auxiliary loss combining weighted per-sample entropy, codebook entropy, and commitment loss |

The return type is a named tuple Return(quantized, indices, entropy_aux_loss). When return_loss_breakdown=True, the method instead returns a pair (Return, LossBreakdown), where LossBreakdown is a named tuple with the fields per_sample_entropy, batch_entropy, and commitment.
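The two return shapes unpack differently, which the following sketch shows using stand-in named tuples. The field names come from the description above; `fake_forward` is a hypothetical placeholder that mimics only the return structure, not the quantizer itself.

```python
from collections import namedtuple

Return = namedtuple("Return", ["quantized", "indices", "entropy_aux_loss"])
LossBreakdown = namedtuple(
    "LossBreakdown", ["per_sample_entropy", "batch_entropy", "commitment"]
)

def fake_forward(return_loss_breakdown=False):
    """Hypothetical stand-in that reproduces only the return structure."""
    ret = Return(quantized="q", indices="i", entropy_aux_loss=0.0)
    if not return_loss_breakdown:
        return ret
    return ret, LossBreakdown(0.0, 0.0, 0.0)

# Default: a single named tuple that unpacks into three values.
quantized, indices, aux_loss = fake_forward()

# With the breakdown: a pair, so the first element must be unpacked separately.
(quantized, indices, aux_loss), breakdown = fake_forward(return_loss_breakdown=True)
print(breakdown.commitment)
```

Toggling return_loss_breakdown therefore changes the unpacking pattern at the call site, not just the amount of information returned.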

Usage Examples

import torch
from sat.sgm.modules.autoencoding.regularizers.lookup_free_quantization import LFQ

# Initialize LFQ quantizer
quantizer = LFQ(
    dim=256,
    codebook_size=1024,       # Must be a power of 2; log2(1024) = 10 bits per code
    num_codebooks=1,
    entropy_loss_weight=0.1,
    commitment_loss_weight=1.0,
    diversity_gamma=2.5,
)

# Quantize encoder output (video features)
encoder_output = torch.randn(4, 256, 8, 16, 16)  # (B, D, T, H, W)
(quantized, indices, aux_loss), loss_breakdown = quantizer(
    encoder_output, return_loss_breakdown=True
)
# quantized: (4, 256, 8, 16, 16)
# indices:   (4, 8, 16, 16) or (4, 8, 16, 16, 1) depending on keep_num_codebooks_dim
# loss_breakdown: LossBreakdown(per_sample_entropy, batch_entropy, commitment)

# Decode indices back to continuous representation
reconstructed = quantizer.indices_to_codes(indices)
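Decoding an index is the inverse of bit packing: each bit of the integer selects the sign of one dimension. The sketch below shows this for a single index in plain Python, assuming most-significant-bit-first ordering; `index_to_code` is a hypothetical helper, and it omits the projection back to the original feature dimension that the module applies to its output.

```python
def index_to_code(index, num_bits, codebook_scale=1.0):
    """Unpack an integer index into a vector of +/- codebook_scale values.

    Hypothetical helper; assumes the first dimension is the most significant bit.
    """
    bits = [(index >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]
    return [codebook_scale if b else -codebook_scale for b in bits]

code = index_to_code(9, num_bits=4)  # 9 = 0b1001
print(code)
```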
