Implementation:AUTOMATIC1111 Stable diffusion webui VQModel Autoencoder
| Knowledge Sources | |
|---|---|
| Domains | Autoencoder, VQ_VAE, LDSR |
| Last Updated | 2025-05-15 00:00 GMT |
Overview
Provides the VQModel and VQModelInterface classes and patches them back into the ldm.models.autoencoder module so that the LDSR upscaler extension can use them.
Description
This module re-introduces the VQModel and VQModelInterface classes, which were present in the CompVis stable-diffusion repository but were removed when the codebase migrated to the stability-ai/stablediffusion repository. VQModel is a PyTorch Lightning module implementing a Vector Quantized Variational Autoencoder (VQ-VAE) with an encoder, a decoder, and a vector quantizer. It supports EMA (Exponential Moving Average) weights, checkpoint loading, configurable loss functions with a discriminator, batch resizing, and learning rate scheduling. The VQModelInterface subclass changes the encode/decode split: encode returns the latent before quantization, and decode applies quantization before decoding unless force_not_quantize=True is passed. At module load time, both classes are monkey-patched into ldm.models.autoencoder to restore compatibility with the LDSR upscaler.
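The difference between the two encode/decode splits can be illustrated with a toy sketch. This is not the module's code: scalars stand in for latent tensors, identity functions stand in for the encoder and decoder networks, and `CODEBOOK`, `quantize`, `ToyVQModel`, and `ToyVQModelInterface` are hypothetical names chosen for the illustration.

```python
# Toy illustration only: scalars stand in for latent tensors, and the
# codebook, class names, and identity "networks" are hypothetical.
CODEBOOK = [-1.0, 0.0, 1.0]


def quantize(h):
    """Snap each latent value to its nearest codebook entry."""
    indices = [
        min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - v)) for v in h
    ]
    return [CODEBOOK[i] for i in indices], indices


class ToyVQModel:
    """Quantizes inside encode(); decode() expects quantized latents."""

    def encode(self, x):
        quant, indices = quantize(x)
        return quant, indices

    def decode(self, quant):
        return list(quant)  # identity stands in for the decoder network


class ToyVQModelInterface(ToyVQModel):
    """Returns unquantized latents from encode() and quantizes in decode()."""

    def encode(self, x):
        return list(x)  # pre-quantization latents

    def decode(self, h, force_not_quantize=False):
        if not force_not_quantize:
            h, _ = quantize(h)
        return super().decode(h)


latents = [0.2, 0.9, -0.6]
interface = ToyVQModelInterface()
h = interface.encode(latents)
quantized_out = interface.decode(h)                       # snapped to the codebook
raw_out = interface.decode(h, force_not_quantize=True)    # passed through unchanged
```

Deferring quantization this way lets a caller work on continuous latents between encode and decode, then decide at decode time whether to snap them to the codebook.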
Usage
This module is used internally by the LDSR (Latent Diffusion Super Resolution) built-in extension. It is loaded automatically when the LDSR upscaler is invoked, ensuring that the required VQModel and VQModelInterface classes are available in the ldm.models.autoencoder namespace. End users do not interact with this module directly; it is a compatibility shim for the LDSR pipeline.
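The patching step that makes the classes available in another module's namespace can be sketched in isolation. The example below is a minimal illustration, assuming a stand-in module named `toy_autoencoder` (hypothetical) in place of ldm.models.autoencoder and an empty placeholder class in place of the real VQModel.

```python
import importlib
import sys
import types

# Hypothetical stand-in for a package module that no longer defines the class.
target = types.ModuleType("toy_autoencoder")
sys.modules["toy_autoencoder"] = target


class VQModel:
    """Placeholder for the re-introduced class."""


# The shim assigns the class as a module attribute when it is imported.
target.VQModel = VQModel

# A later lookup by dotted path, such as one driven by a config entry that
# names the class as a string, now resolves to the patched-in class.
resolved = getattr(importlib.import_module("toy_autoencoder"), "VQModel")
```

Because the assignment happens as a side effect of importing the shim, the shim must be imported before any code tries to resolve the class from the target module.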
Code Reference
Source Location
- Repository: AUTOMATIC1111_Stable_diffusion_webui
- File: extensions-builtin/LDSR/sd_hijack_autoencoder.py
- Lines: 1-293
Signature
class VQModel(pl.LightningModule):
    def __init__(self, ddconfig, lossconfig, n_embed, embed_dim, ckpt_path=None,
                 ignore_keys=None, image_key="image", colorize_nlabels=None,
                 monitor=None, batch_resize_range=None, scheduler_config=None,
                 lr_g_factor=1.0, remap=None, sane_index_shape=False, use_ema=False):
    def encode(self, x):
    def encode_to_prequant(self, x):
    def decode(self, quant):
    def decode_code(self, code_b):
    def forward(self, input, return_pred_indices=False):

class VQModelInterface(VQModel):
    def __init__(self, embed_dim, *args, **kwargs):
    def encode(self, x):
    def decode(self, h, force_not_quantize=False):
Import
from ldm.models.autoencoder import VQModel, VQModelInterface
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ddconfig | dict | Yes | Configuration dictionary for the encoder and decoder architecture |
| lossconfig | dict | Yes | Configuration for the loss function, instantiated via instantiate_from_config |
| n_embed | int | Yes | Number of embedding vectors in the codebook |
| embed_dim | int | Yes | Dimensionality of each embedding vector |
| ckpt_path | str | No | Path to a checkpoint file for weight initialization |
| ignore_keys | list | No | List of key prefixes to ignore when loading checkpoint state dict |
| image_key | str | No | Key to extract images from batch dict, defaults to "image" |
| use_ema | bool | No | Whether to use Exponential Moving Average weights |
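The prefix-based filtering described for ignore_keys can be sketched with plain dictionaries. This is an illustration of the idea rather than the module's loading code; `drop_ignored_keys` is a hypothetical helper, and the key names in the sample checkpoint are made up.

```python
def drop_ignored_keys(state_dict, ignore_keys=None):
    """Return a copy of state_dict without keys that start with an ignored prefix."""
    prefixes = tuple(ignore_keys or ())
    # str.startswith accepts a tuple of prefixes; an empty tuple matches nothing.
    return {k: v for k, v in state_dict.items() if not k.startswith(prefixes)}


# Hypothetical checkpoint contents; real keys depend on the saved model.
checkpoint = {
    "encoder.conv_in.weight": 1,
    "loss.discriminator.main.0.weight": 2,
    "quantize.embedding.weight": 3,
}

filtered = drop_ignored_keys(checkpoint, ignore_keys=["loss."])
```

Skipping a prefix such as the loss module's is one plausible use: those weights are only needed for training, not for inference.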
Outputs
| Name | Type | Description |
|---|---|---|
| dec | torch.Tensor | Decoded/reconstructed image tensor |
| diff | torch.Tensor | Quantization embedding loss |
| ind | torch.Tensor | Predicted codebook indices (when return_pred_indices=True) |
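How the return value changes with return_pred_indices can be shown with a toy forward pass. The sketch below assumes scalar latents, a nearest-neighbor lookup, and a mean squared error as a stand-in for the embedding loss; `toy_forward` and its arguments are hypothetical and do not reproduce the real loss formula.

```python
def toy_forward(values, codebook, return_pred_indices=False):
    """Quantize, 'decode' by identity, and mirror the two return shapes."""
    # ind: index of the nearest codebook entry for each value
    ind = [min(range(len(codebook)), key=lambda i: abs(codebook[i] - v)) for v in values]
    # dec: reconstruction (identity decoder, so it equals the quantized values)
    dec = [codebook[i] for i in ind]
    # diff: stand-in embedding loss, mean squared distance to the chosen entries
    diff = sum((d - v) ** 2 for d, v in zip(dec, values)) / len(values)
    if return_pred_indices:
        return dec, diff, ind
    return dec, diff


codebook = [0.0, 1.0]
dec, diff = toy_forward([0.25, 1.0], codebook)
dec2, diff2, ind = toy_forward([0.25, 1.0], codebook, return_pred_indices=True)
```

Callers therefore need to unpack two or three values depending on the flag they passed.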
Usage Examples
# Importing the shim performs the patch; afterwards both classes are
# reachable through ldm.models.autoencoder.
# Assumes the LDSR extension directory is on sys.path, and that dd_config,
# loss_config, and image_tensor are defined by the caller.
import sd_hijack_autoencoder  # noqa: F401
import ldm.models.autoencoder

model = ldm.models.autoencoder.VQModel(
    ddconfig=dd_config,
    lossconfig=loss_config,
    n_embed=8192,
    embed_dim=4,
)

# Encode and decode an image
quant, emb_loss, info = model.encode(image_tensor)
reconstructed = model.decode(quant)

# Using VQModelInterface (deferred quantization)
interface = ldm.models.autoencoder.VQModelInterface(
    embed_dim=4,
    ddconfig=dd_config,
    lossconfig=loss_config,
    n_embed=8192,
)
h = interface.encode(image_tensor)  # pre-quantization latents
decoded = interface.decode(h, force_not_quantize=False)  # quantizes, then decodes