Implementation:Zai org CogVideo LookupFreeQuantization
| Knowledge Sources | |
|---|---|
| Domains | Video_Generation, Autoencoding, Quantization |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
LFQ (Lookup-Free Quantization) is a discrete quantization module that maps continuous latent features to binary codes by thresholding at zero, using straight-through gradients and entropy-based auxiliary losses to encourage full codebook utilization without maintaining an explicit codebook embedding table.
Description
The LFQ class implements the quantization method proposed in "Language Model Beats Diffusion" (arXiv:2310.05737). Unlike traditional vector quantization (VQ) which maintains a learned codebook of embedding vectors and finds nearest neighbors, LFQ quantizes each dimension independently to binary values {-codebook_scale, +codebook_scale} by simply thresholding at zero.
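The claim that LFQ needs no codebook table can be checked directly. The codes for one codebook are the 2^D corners of the hypercube {-codebook_scale, +codebook_scale}^D, so a VQ-style nearest-neighbour search over those corners always returns the same code as per-dimension thresholding. The sketch below enumerates that implicit codebook and compares the two. The helper names (`implicit_codebook`, `nearest_code`, `threshold_code`) are hypothetical, and the most-significant-bit-first code ordering is an assumption.

```python
from itertools import product
from math import log2


def implicit_codebook(codebook_size, codebook_scale=1.0):
    """Enumerate every code one LFQ codebook can emit.

    Each code is a corner of the hypercube {-s, +s}^D with D = log2(codebook_size).
    Codes are listed in binary counting order, most significant bit first
    (an assumed ordering), where bit 1 means +s and bit 0 means -s.
    """
    d = int(log2(codebook_size))
    if 2 ** d != codebook_size:
        raise ValueError("codebook_size must be a power of 2")
    return [
        tuple(codebook_scale if bit else -codebook_scale for bit in bits)
        for bits in product((0, 1), repeat=d)
    ]


def nearest_code(x, codes):
    """Explicit VQ-style lookup: the code with the smallest squared L2 distance."""
    return min(codes, key=lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, c)))


def threshold_code(x, codebook_scale=1.0):
    """LFQ-style quantization: threshold every dimension at zero, no lookup."""
    return tuple(codebook_scale if v > 0 else -codebook_scale for v in x)


codes = implicit_codebook(8)
x = (0.4, -2.0, 0.1)
print(len(codes), nearest_code(x, codes), threshold_code(x))
# prints: 8 (1.0, -1.0, 1.0) (1.0, -1.0, 1.0)
```

Because the squared distance to a hypercube corner decomposes into independent per-dimension terms, each term is minimised by matching the sign of that dimension. This is why the lookup can be dropped.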
Quantization Process:
- Input features of dimension dim are projected to codebook_dims (= codebook_dim * num_codebooks, where codebook_dim = log2(codebook_size)) by a learned linear projection when the two sizes differ; otherwise no projection is applied.
- Features are reshaped into (B, N, C, D), where C is the number of codebooks and D is log2(codebook_size).
- Each value is quantized independently: positive values become +codebook_scale, and all other values (including exact zeros) become -codebook_scale.
- During training, straight-through gradients are used: the forward pass uses the discrete quantized values, but gradients flow through an optional activation function (straight_through_activation) applied to the original input.
- Discrete indices are computed by treating each positive decision as a 1 bit and packing the D bits of each codebook into an integer in [0, codebook_size) with a binary mask of powers of two.
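The reshape, threshold, and bit-packing steps can be traced for a single token in plain Python. This is a value-only sketch: it assumes the input projection has already been applied and that the first dimension of each group is the most significant bit. In training, the module additionally applies the straight-through trick. In PyTorch this is commonly written as `x + (quantized - x).detach()`, with `x` first passed through straight_through_activation. The sketch has no autograd and omits that step. The function name `lfq_quantize_token` is hypothetical.

```python
from math import log2


def lfq_quantize_token(z, codebook_size, num_codebooks=1, codebook_scale=1.0):
    """Quantize one token's feature vector following the steps above.

    z is assumed to already have length num_codebooks * log2(codebook_size),
    i.e. the optional input projection has been applied. Returns the flat
    quantized vector and one integer index per codebook.
    """
    d = int(log2(codebook_size))
    if len(z) != num_codebooks * d:
        raise ValueError("expected len(z) == num_codebooks * log2(codebook_size)")
    quantized, indices = [], []
    for c in range(num_codebooks):
        group = z[c * d:(c + 1) * d]  # one row of the (C, D) reshape
        # Threshold at zero: > 0 -> +scale, everything else -> -scale.
        quantized.extend(codebook_scale if v > 0 else -codebook_scale for v in group)
        # Pack bits with the mask 2^(D-1), ..., 2^0 (first dimension = MSB, assumed).
        indices.append(sum(1 << (d - 1 - i) for i, v in enumerate(group) if v > 0))
    return quantized, indices


print(lfq_quantize_token([0.5, -0.1, -2.0, 0.4], codebook_size=4, num_codebooks=2))
# prints: ([1.0, -1.0, -1.0, 1.0], [2, 1])
```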
Auxiliary Losses:
- Per-sample entropy loss: Soft distances to all possible codes are computed and converted to probabilities via softmax with inv_temperature. The entropy of these per-sample distributions is minimized to encourage confident, low-entropy code assignments.
- Batch codebook entropy: The average probability distribution across the batch is computed, and its entropy is maximized (weighted by diversity_gamma) to encourage uniform usage of all codes.
- Commitment loss: MSE between the continuous input (after any input projection) and its quantized values, weighted by commitment_loss_weight.
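The three terms can be computed on a handful of tokens for a single codebook. This sketch makes several assumptions. It uses the squared Euclidean distance from each token to every code. Because all hypercube corners have the same norm, this gives the same softmax as an inner-product distance. It also combines the terms as `entropy_loss_weight * (per_sample_entropy - diversity_gamma * batch_entropy) + commitment_loss_weight * commitment`, which is consistent with the description above but is not copied from the source. The real module works on tensors and also handles multiple codebooks, masks, and entropy subsampling. The function `lfq_aux_losses` is hypothetical.

```python
import math
from itertools import product


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def lfq_aux_losses(tokens, codebook_scale=1.0, inv_temperature=100.0,
                   diversity_gamma=1.0, entropy_loss_weight=0.1,
                   commitment_loss_weight=0.25):
    """Single-codebook sketch of the three auxiliary terms for a list of tokens."""
    d = len(tokens[0])
    codes = [tuple(codebook_scale if b else -codebook_scale for b in bits)
             for bits in product((0, 1), repeat=d)]
    # Soft assignment of every token to every code (squared L2 distance, assumed).
    probs = []
    for x in tokens:
        dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in codes]
        probs.append(softmax([-inv_temperature * dist for dist in dists]))
    # Per-sample entropy: low when each token commits to one code (minimized).
    per_sample_entropy = sum(entropy(p) for p in probs) / len(probs)
    # Batch (codebook) entropy: high when codes are used uniformly (maximized).
    avg_prob = [sum(p[k] for p in probs) / len(probs) for k in range(len(codes))]
    batch_entropy = entropy(avg_prob)
    # Commitment: MSE between the continuous tokens and their thresholded codes.
    sq_err = sum((xi - (codebook_scale if xi > 0 else -codebook_scale)) ** 2
                 for x in tokens for xi in x)
    commitment = sq_err / (len(tokens) * d)
    entropy_aux = per_sample_entropy - diversity_gamma * batch_entropy
    total = entropy_loss_weight * entropy_aux + commitment_loss_weight * commitment
    return total, per_sample_entropy, batch_entropy, commitment
```

Tokens sitting exactly on four different corners give near-zero per-sample entropy and a batch entropy of log 4, the best case for both terms. A token at the origin is equidistant from every code, so its per-sample entropy is maximal.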
The module supports multiple codebooks via num_codebooks and handles both image (B, D, H, W) and video (B, D, T, H, W) tensors automatically. To reduce memory usage, it can optionally compute the per-sample entropy on a random subset of tokens, controlled by frac_per_sample_entropy.
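The shape handling and token subsampling can be expressed as bookkeeping. In the sketch below, channel-first image and video tensors are viewed as (B, N, D) token sequences with N = H*W or T*H*W, and a fraction of the N tokens is drawn at random for the per-sample entropy. The helper names are hypothetical, and rounding the sample count down with `int(num_tokens * frac_per_sample_entropy)` is an assumption about the source.

```python
import random
from math import prod


def token_layout(shape):
    """Map a channel-first image/video shape to the (B, N, D) token layout."""
    if len(shape) == 3:  # already a (B, N, D) sequence
        return tuple(shape)
    b, d, *spatial = shape  # (B, D, H, W) or (B, D, T, H, W)
    return (b, prod(spatial), d)


def sample_entropy_tokens(num_tokens, frac_per_sample_entropy, rng=random):
    """Choose which token positions enter the per-sample entropy (assumed rule)."""
    k = int(num_tokens * frac_per_sample_entropy)
    return sorted(rng.sample(range(num_tokens), k))


print(token_layout((4, 256, 8, 16, 16)))  # prints: (4, 2048, 256)
```

With frac_per_sample_entropy=0.25, only a quarter of the 2048 tokens per sample would enter the per-sample entropy, which shrinks the (tokens x codebook_size) probability matrix for that term accordingly.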
Usage
Use this module as the quantization bottleneck in a video or image autoencoder when you want to avoid the complexity and potential codebook collapse issues of traditional vector quantization. It is the default quantizer used by VideoTokenizer in the MagViT2 architecture.
Code Reference
Source Location
- Repository: Zai_org_CogVideo
- File: sat/sgm/modules/autoencoding/regularizers/lookup_free_quantization.py
- Lines: 62-315
Signature
```python
class LFQ(Module):
    def __init__(
        self,
        *,
        dim=None,
        codebook_size=None,
        entropy_loss_weight=0.1,
        commitment_loss_weight=0.25,
        diversity_gamma=1.0,
        straight_through_activation=nn.Identity(),
        num_codebooks=1,
        keep_num_codebooks_dim=None,
        codebook_scale=1.0,
        frac_per_sample_entropy=1.0,
    ):
```
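The signature does not show how its size arguments relate. The helper below spells out the relationships given under Quantization Process: codebook_dim = log2(codebook_size), codebook_dims = codebook_dim * num_codebooks, and a projection when dim differs from codebook_dims. `derive_lfq_sizes` is a hypothetical name. The fallbacks for an omitted codebook_size (2 ** dim), an omitted dim (codebook_dims), and keep_num_codebooks_dim (num_codebooks > 1) are assumptions to check against the source.

```python
from math import log2


def derive_lfq_sizes(dim=None, codebook_size=None, num_codebooks=1, keep_num_codebooks_dim=None):
    """Hypothetical helper: the sizes an LFQ(...) call implies (assumed defaulting rules)."""
    if dim is None and codebook_size is None:
        raise ValueError("specify dim or codebook_size")
    if codebook_size is None:
        codebook_size = 2 ** dim  # assumed default: one bit per input dimension
    codebook_dim = int(log2(codebook_size))  # bits per code (D)
    if 2 ** codebook_dim != codebook_size:
        raise ValueError("codebook_size must be a power of 2")
    codebook_dims = codebook_dim * num_codebooks  # width quantized per token (C * D)
    if dim is None:
        dim = codebook_dims
    if keep_num_codebooks_dim is None:
        keep_num_codebooks_dim = num_codebooks > 1  # assumed default
    return {
        "dim": dim,
        "codebook_dim": codebook_dim,
        "codebook_dims": codebook_dims,
        "has_projections": dim != codebook_dims,  # learned in/out projections needed?
        "keep_num_codebooks_dim": keep_num_codebooks_dim,
    }
```

For example, dim=256 with codebook_size=1024 quantizes only 10 bits per token, so a 256 to 10 projection (and its inverse) would be required.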
Import
```python
from sat.sgm.modules.autoencoding.regularizers.lookup_free_quantization import LFQ
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| x | torch.Tensor | Yes | Continuous input features, shape (B, D, ...) for images/video or (B, N, D) for sequences; dimension D must match dim |
| inv_temperature | float | No | Inverse temperature for the softmax over code distances; defaults to 100.0. Higher values produce sharper distributions |
| return_loss_breakdown | bool | No | If True, returns a LossBreakdown named tuple with per_sample_entropy, batch_entropy, and commitment loss; defaults to False |
| mask | Optional[torch.Tensor] | No | Boolean mask for selecting which tokens to include in entropy and commitment loss computation |
Outputs
| Name | Type | Description |
|---|---|---|
| quantized | torch.Tensor | Quantized output tensor, same shape as input. Binary values scaled by codebook_scale, projected back to original dimension |
| indices | torch.Tensor | Integer codebook indices, shape depends on input dimensionality and keep_num_codebooks_dim setting |
| entropy_aux_loss | torch.Tensor | Scalar auxiliary loss combining weighted per-sample entropy, codebook entropy, and commitment loss |
By default the method returns a named tuple Return(quantized, indices, entropy_aux_loss). When return_loss_breakdown=True, it instead returns a pair (Return, LossBreakdown), where LossBreakdown has the fields per_sample_entropy, batch_entropy, and commitment.
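Callers that toggle return_loss_breakdown need to unpack two different shapes. The sketch below uses local `namedtuple` stand-ins with the field names listed above, not the module's own classes, and a hypothetical `unpack_lfq_output` helper that normalises both forms.

```python
from collections import namedtuple

# Stand-ins with the documented field names; the real classes live in the LFQ module.
Return = namedtuple("Return", ["quantized", "indices", "entropy_aux_loss"])
LossBreakdown = namedtuple("LossBreakdown", ["per_sample_entropy", "batch_entropy", "commitment"])


def unpack_lfq_output(out, return_loss_breakdown=False):
    """Normalise both return forms to (quantized, indices, aux_loss, breakdown_or_None)."""
    if return_loss_breakdown:
        ret, breakdown = out  # (Return, LossBreakdown) pair
    else:
        ret, breakdown = out, None  # a bare Return
    return ret.quantized, ret.indices, ret.entropy_aux_loss, breakdown
```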
Usage Examples
```python
import torch

from sat.sgm.modules.autoencoding.regularizers.lookup_free_quantization import LFQ

# Initialize LFQ quantizer
quantizer = LFQ(
    dim=256,
    codebook_size=1024,  # Must be a power of 2; log2(1024) = 10 bits per code
    num_codebooks=1,
    entropy_loss_weight=0.1,
    commitment_loss_weight=1.0,
    diversity_gamma=2.5,
)

# Quantize encoder output (video features)
encoder_output = torch.randn(4, 256, 8, 16, 16)  # (B, D, T, H, W)
(quantized, indices, aux_loss), loss_breakdown = quantizer(
    encoder_output, return_loss_breakdown=True
)
# quantized: (4, 256, 8, 16, 16)
# indices: (4, 8, 16, 16) or (4, 8, 16, 16, 1) depending on keep_num_codebooks_dim

# Decode indices back to their continuous codes
reconstructed = quantizer.indices_to_codes(indices)
```
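Decoding an index is the reverse of the bit packing. Each bit of the integer selects +codebook_scale or -codebook_scale for one dimension. The sketch below covers only that unpacking for a single codebook, assuming most-significant-bit-first ordering. indices_to_codes in the module presumably also handles multiple codebooks and the output projection back to dim, which are omitted here. `index_to_code` and `code_to_index` are hypothetical names.

```python
def index_to_code(index, codebook_size, codebook_scale=1.0):
    """Unpack an integer index into its +/-codebook_scale code (MSB first, assumed)."""
    d = codebook_size.bit_length() - 1  # log2 for a power of two
    if 1 << d != codebook_size:
        raise ValueError("codebook_size must be a power of 2")
    if not 0 <= index < codebook_size:
        raise ValueError("index out of range")
    bits = [(index >> (d - 1 - i)) & 1 for i in range(d)]
    return [codebook_scale if b else -codebook_scale for b in bits]


def code_to_index(code):
    """Inverse direction: threshold at zero and pack the bits (MSB first)."""
    d = len(code)
    return sum(1 << (d - 1 - i) for i, v in enumerate(code) if v > 0)


print(index_to_code(5, 8))  # prints: [1.0, -1.0, 1.0]
```

Because thresholding a code returns exactly the bits it was built from, the round trip index -> code -> index is lossless for every index in [0, codebook_size).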