Implementation:AUTOMATIC1111 Stable diffusion webui VectorQuantizer2
| Knowledge Sources | |
|---|---|
| Domains | Vector_Quantization, VQ_VAE, LDSR |
| Last Updated | 2025-05-15 00:00 GMT |
Overview
Implements an improved vector quantizer module for VQ-VAE models, vendored from the CompVis taming-transformers repository.
Description
VectorQuantizer2 is a PyTorch nn.Module that performs vector quantization by maintaining a learned codebook of embedding vectors. Given a continuous latent tensor, it finds the nearest codebook entry for each spatial position and returns the quantized representation along with a commitment loss. Squared distances to all codebook entries are computed with the expansion ||z - e||^2 = ||z||^2 + ||e||^2 - 2*z·e, so pairwise differences are never materialized, and the quantized vectors are fetched by direct index lookup, which avoids the one-hot encoding matrix multiplication of the earlier VectorQuantizer. It supports post-hoc index remapping to a subset of used codebook entries, a configurable beta for the commitment loss term, and a legacy mode that preserves an earlier bug in the loss computation for backward compatibility. The get_codebook_entry method converts codebook indices back into quantized latent vectors.
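The distance expansion and nearest-entry lookup described above can be sketched as follows. This is a minimal illustration rather than the vendored code: the tensor sizes and the names `codebook` and `z_flat` are assumptions chosen for the example, and `torch.cdist` appears only as an independent check.

```python
import torch

torch.manual_seed(0)

n_e, e_dim = 16, 4                      # illustrative codebook size and embedding dimension
codebook = torch.randn(n_e, e_dim)      # stand-in for the learned embedding weights
z = torch.randn(2, e_dim, 8, 8)         # latent tensor: (batch, channels, height, width)

# Move channels last and flatten so each row is one spatial position.
z_flat = z.permute(0, 2, 3, 1).reshape(-1, e_dim)   # (batch*height*width, e_dim)

# Expanded squared distance: ||z||^2 + ||e||^2 - 2 z.e
d = (
    z_flat.pow(2).sum(dim=1, keepdim=True)
    + codebook.pow(2).sum(dim=1)
    - 2 * z_flat @ codebook.t()
)                                       # (batch*height*width, n_e)

# Nearest codebook entry per position, then a direct index lookup.
indices = d.argmin(dim=1)
z_q = codebook[indices].view(2, 8, 8, e_dim).permute(0, 3, 1, 2).contiguous()

# Independent check against pairwise Euclidean distances.
d_ref = torch.cdist(z_flat, codebook).pow(2)
```

The expanded form needs only one matrix product of shape (positions, n_e) and never builds the (positions, n_e, e_dim) tensor of differences.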
Usage
This module is used internally by the VQModel autoencoder in the LDSR extension. It is instantiated as the quantization layer within VQModel and is not typically used directly by end users. It provides the core vector quantization logic needed for VQ-VAE based image generation and super-resolution pipelines.
Code Reference
Source Location
- Repository: AUTOMATIC1111_Stable_diffusion_webui
- File: extensions-builtin/LDSR/vqvae_quantize.py
- Lines: 1-147
Signature
class VectorQuantizer2(nn.Module):
    def __init__(self, n_e, e_dim, beta, remap=None, unknown_index="random",
                 sane_index_shape=False, legacy=True):
    def remap_to_used(self, inds):
    def unmap_to_all(self, inds):
    def forward(self, z, temp=None, rescale_logits=False, return_logits=False):
    def get_codebook_entry(self, indices, shape):
Import
from vqvae_quantize import VectorQuantizer2
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n_e | int | Yes | Number of embedding vectors in the codebook |
| e_dim | int | Yes | Dimensionality of each embedding vector |
| beta | float | Yes | Commitment loss weight factor |
| remap | str | No | Path to a numpy file containing used codebook indices for remapping |
| unknown_index | str or int | No | Strategy for handling unknown indices during remapping ("random", "extra", or integer) |
| sane_index_shape | bool | No | If True, returns indices shaped as (batch, height, width) instead of flattened |
| legacy | bool | No | If True, keeps the original (buggy) loss in which beta weights the codebook term instead of the commitment term, for backward compatibility |
| z | torch.Tensor | Yes | Input latent tensor of shape (batch, channels, height, width) for forward pass |
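The effect of `beta` and `legacy` on the loss can be illustrated with a short sketch. The helper `vq_loss` is hypothetical and written for this page; it follows the term ordering described in the table and should be checked against the vendored file before being relied on.

```python
import torch

def vq_loss(z, z_q, beta, legacy=True):
    """Hypothetical helper showing the two placements of the beta term."""
    commitment = torch.mean((z_q.detach() - z) ** 2)   # gradient reaches the encoder output z
    codebook = torch.mean((z_q - z.detach()) ** 2)     # gradient reaches the codebook vectors z_q
    if legacy:
        return commitment + beta * codebook             # legacy: beta on the codebook term
    return beta * commitment + codebook                 # corrected: beta on the commitment term

torch.manual_seed(0)
z = torch.randn(1, 4, 8, 8, requires_grad=True)        # stand-in encoder output
z_q = torch.randn(1, 4, 8, 8, requires_grad=True)      # stand-in quantized vectors

loss_legacy = vq_loss(z, z_q, beta=0.25, legacy=True)
loss_fixed = vq_loss(z, z_q, beta=0.25, legacy=False)

# Gradients under the legacy ordering: the codebook side is scaled by beta.
grad_z, grad_zq = torch.autograd.grad(loss_legacy, (z, z_q))
```

Both orderings give the same forward loss value, because the two squared-error terms are numerically identical. The `legacy` flag only changes how the gradient is weighted between the encoder and the codebook.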
Outputs
| Name | Type | Description |
|---|---|---|
| z_q | torch.Tensor | Quantized latent tensor of shape (batch, channels, height, width) |
| loss | torch.Tensor | Scalar loss combining the commitment (encoder) term and the codebook term |
| info | tuple | Tuple of (perplexity, min_encodings, min_encoding_indices); in this implementation perplexity and min_encodings are always None |
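When `remap` is set, the returned indices refer to positions in the list of used codebook entries rather than to the full codebook. The sketch below shows one way such a remapping can work. The standalone function and the example index values are assumptions for illustration, and only the "random" and integer settings of `unknown_index` are covered.

```python
import torch

def remap_to_used(inds, used, unknown_index="random"):
    """Hypothetical standalone remap: full codebook indices -> positions in `used`."""
    shape = inds.shape
    inds = inds.reshape(shape[0], -1)                          # (batch, positions)
    match = (inds[:, :, None] == used[None, None, :]).long()   # (batch, positions, len(used))
    new = match.argmax(-1)                                     # position of each index within `used`
    unknown = match.sum(-1) < 1                                # indices absent from `used`
    if unknown_index == "random":
        new[unknown] = torch.randint(0, len(used), size=new[unknown].shape)
    else:
        new[unknown] = unknown_index                           # fixed integer fallback
    return new.reshape(shape)

used = torch.tensor([3, 7, 42])          # illustrative subset of codebook entries in use
inds = torch.tensor([[7, 42, 3, 5]])     # 5 is not in `used`
remapped = remap_to_used(inds, used, unknown_index=2)
print(remapped)                          # -> tensor([[1, 2, 0, 2]])
```

With `unknown_index="random"`, the entry for 5 would instead be drawn uniformly from the valid positions 0 to 2.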
Usage Examples
import torch
from vqvae_quantize import VectorQuantizer2
# Create a quantizer with 8192 codebook entries of dimension 4
quantizer = VectorQuantizer2(n_e=8192, e_dim=4, beta=0.25)
# Quantize a latent tensor (batch=1, channels=4, height=32, width=32)
z = torch.randn(1, 4, 32, 32)
z_q, loss, (perplexity, min_encodings, min_encoding_indices) = quantizer(z)
# Retrieve codebook entries from indices; shape is (batch, height, width, channels)
# and the result is returned as (batch, channels, height, width)
z_q = quantizer.get_codebook_entry(min_encoding_indices, shape=(1, 32, 32, 4))