Implementation:AUTOMATIC1111 Stable diffusion webui VQModel Autoencoder

Knowledge Sources	AUTOMATIC1111_Stable_diffusion_webui
Domains	Autoencoder, VQ_VAE, LDSR
Last Updated	2025-05-15 00:00 GMT

Overview

Provides VQModel and VQModelInterface classes that are hijacked back into the ldm.models.autoencoder module to support the LDSR upscaler extension.

Description

This module re-introduces the VQModel and VQModelInterface classes from the CompVis stable-diffusion repository. Both were removed when the codebase migrated to the stability-ai/stablediffusion repository, but the LDSR upscaler still depends on them. VQModel is a PyTorch Lightning module implementing a Vector Quantized Variational Autoencoder (VQ-VAE) built from an encoder, a decoder, and a vector quantizer. It supports EMA (Exponential Moving Average) weights, checkpoint loading, configurable loss functions (including discriminator-based losses), per-batch resizing, and learning-rate scheduling. The VQModelInterface subclass simplifies the encode/decode interface by deferring quantization: encode stops before the quantizer, decode quantizes its input, and passing force_not_quantize=True to decode skips quantization entirely. When the module is imported, both classes are monkey-patched into ldm.models.autoencoder so that the LDSR upscaler can find them there again.
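EMA support means keeping a shadow copy of the weights that trails the trained weights. In the webui code base VQModel delegates this to an EMA helper class (LitEma). The sketch below is a minimal, hypothetical illustration of that kind of update on plain floats; the helper name ema_update, the 0.9999 default and the warm-up formula are assumptions chosen for illustration, not a copy of that class.

```python
# Hypothetical helper illustrating an EMA weight update on plain floats.
# Not the webui's LitEma; names and defaults here are assumptions.

def ema_update(shadow, params, num_updates, decay=0.9999):
    """Move each shadow weight a fraction (1 - effective) toward the live weight.

    The warm-up rule min(decay, (1 + n) / (10 + n)) uses a small decay for
    the first updates, so the shadow weights move quickly at first; the
    effective decay approaches `decay` as updates accumulate.
    """
    effective = min(decay, (1 + num_updates) / (10 + num_updates))
    for name, value in params.items():
        shadow[name] -= (1.0 - effective) * (shadow[name] - value)
    return effective


shadow = {"w": 0.0}  # shadow (EMA) copy of a single weight
live = {"w": 1.0}    # current trained value of that weight
eff = ema_update(shadow, live, num_updates=0)
print(round(eff, 4), round(shadow["w"], 4))  # 0.1 0.9
```

With a large update count the effective decay settles at the configured value, so each step moves the shadow weights only slightly.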

Usage

This module is used internally by the built-in LDSR (Latent Diffusion Super Resolution) extension. The extension imports it, which guarantees that the VQModel and VQModelInterface classes are present in the ldm.models.autoencoder namespace before the LDSR model is built. End users do not interact with this module directly; it is a compatibility shim for the LDSR pipeline.
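The compatibility shim relies on ordinary Python monkey-patching: assigning new attributes to a module object that is already loaded. The self-contained sketch below reproduces the pattern with a hypothetical stand-in module named fake_autoencoder and placeholder classes; the resolve helper is also hypothetical, and only mimics in spirit how a dotted-path config target is turned into a class.

```python
import importlib
import sys
import types

# Hypothetical stand-in for ldm.models.autoencoder after the migration:
# the module exists but no longer defines VQModel.
target = types.ModuleType("fake_autoencoder")
sys.modules["fake_autoencoder"] = target


class VQModel:  # placeholder for the re-introduced class
    pass


class VQModelInterface(VQModel):  # placeholder subclass
    pass


# The hijack itself: plain attribute assignment on the already-loaded module.
target.VQModel = VQModel
target.VQModelInterface = VQModelInterface


def resolve(dotted_path):
    """Hypothetical resolver: import the module part, then getattr the class."""
    module_name, class_name = dotted_path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)


# Both a normal import and a dotted-path lookup now find the restored class.
from fake_autoencoder import VQModelInterface as Restored  # noqa: E402

print(Restored is VQModelInterface)                                      # True
print(resolve("fake_autoencoder.VQModelInterface") is VQModelInterface)  # True
```

Because the patch only mutates the module object, it takes effect for any code that looks the names up after the patching module has been imported, which is why import order matters.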

Code Reference

Source Location

Repository: AUTOMATIC1111_Stable_diffusion_webui
File: extensions-builtin/LDSR/sd_hijack_autoencoder.py
Lines: 1-293

Signature

class VQModel(pl.LightningModule):
    def __init__(self, ddconfig, lossconfig, n_embed, embed_dim, ckpt_path=None,
                 ignore_keys=None, image_key="image", colorize_nlabels=None,
                 monitor=None, batch_resize_range=None, scheduler_config=None,
                 lr_g_factor=1.0, remap=None, sane_index_shape=False, use_ema=False):

    def encode(self, x):
    def encode_to_prequant(self, x):
    def decode(self, quant):
    def decode_code(self, code_b):
    def forward(self, input, return_pred_indices=False):

class VQModelInterface(VQModel):
    def __init__(self, embed_dim, *args, **kwargs):
    def encode(self, x):
    def decode(self, h, force_not_quantize=False):
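The two classes differ only in where quantization happens. The sketch below is a hypothetical, dependency-free model of the call order: each "layer" (Stage) is a stub that just records its name, and SketchVQModel / SketchVQModelInterface are illustrative stand-ins whose method bodies follow the behavior described above, not the real PyTorch modules.

```python
class Stage:
    """Hypothetical stand-in for a layer: logs its name and returns x unchanged."""

    def __init__(self, name, trace):
        self.name = name
        self.trace = trace

    def __call__(self, x):
        self.trace.append(self.name)
        return x


class SketchVQModel:
    def __init__(self):
        self.trace = []
        for name in ("encoder", "quant_conv", "post_quant_conv", "decoder"):
            setattr(self, name, Stage(name, self.trace))

    def quantize(self, h):
        self.trace.append("quantize")
        # (quant, emb_loss, info); info[2] plays the role of the indices
        return h, 0.0, (None, None, [0])

    def encode(self, x):  # quantizes immediately
        return self.quantize(self.quant_conv(self.encoder(x)))

    def decode(self, quant):
        return self.decoder(self.post_quant_conv(quant))

    def forward(self, x, return_pred_indices=False):
        quant, diff, (_, _, ind) = self.encode(x)
        dec = self.decode(quant)
        return (dec, diff, ind) if return_pred_indices else (dec, diff)


class SketchVQModelInterface(SketchVQModel):
    def encode(self, x):  # stops before quantization
        return self.quant_conv(self.encoder(x))

    def decode(self, h, force_not_quantize=False):
        quant = h if force_not_quantize else self.quantize(h)[0]
        return self.decoder(self.post_quant_conv(quant))


m = SketchVQModel()
m.forward("x")
print(m.trace)  # ['encoder', 'quant_conv', 'quantize', 'post_quant_conv', 'decoder']

vi = SketchVQModelInterface()
vi.decode(vi.encode("x"), force_not_quantize=True)
print(vi.trace)  # ['encoder', 'quant_conv', 'post_quant_conv', 'decoder']
```

Deferring quantization to decode lets a caller such as a diffusion model work on the continuous pre-quantization latent and only snap it to the codebook when decoding.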

Import

import sd_hijack_autoencoder  # noqa: F401  (applies the patch; must be imported first)
from ldm.models.autoencoder import VQModel, VQModelInterface

I/O Contract

Inputs

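Name	Type	Required	Description
ddconfig	dict	Yes	Encoder and decoder architecture configuration; must include "z_channels", which sizes the 1x1 quant_conv and post_quant_conv layers
lossconfig	dict	Yes	Loss function configuration, instantiated via instantiate_from_config
n_embed	int	Yes	Number of embedding vectors in the codebook
embed_dim	int	Yes	Dimensionality of each embedding vector
ckpt_path	str	No	Path to a checkpoint file used for weight initialization
ignore_keys	list	No	Key prefixes to drop from the checkpoint state dict before loading
image_key	str	No	Key used to extract images from a batch dict; defaults to "image"
colorize_nlabels	int	No	Number of label channels; registers a random 3-channel projection used to colorize label maps
monitor	str	No	Name of the metric to monitor, stored as self.monitor
batch_resize_range	tuple	No	If set, training batches are resized to a random size within this range
scheduler_config	dict	No	Learning-rate scheduler configuration, applied through LambdaLR
lr_g_factor	float	No	Multiplier on the autoencoder (generator) learning rate; defaults to 1.0
remap	str	No	Passed to the vector quantizer to remap codebook indices
sane_index_shape	bool	No	If True, the quantizer returns indices shaped (batch, height, width); defaults to False
use_ema	bool	No	Whether to maintain Exponential Moving Average weights; defaults to False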
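The ignore_keys parameter works by prefix matching on state-dict keys. The sketch below is a hypothetical helper (filter_state_dict) that shows only that filtering step on a plain dict; the example key names are illustrative, and the real loading path (reading the checkpoint with torch and calling load_state_dict) is not shown.

```python
# Hypothetical helper: drop state-dict entries whose key starts with any
# prefix in ignore_keys. Plain dicts stand in for a torch state dict.

def filter_state_dict(state_dict, ignore_keys=None):
    prefixes = tuple(ignore_keys or [])  # None behaves like "ignore nothing"
    kept, dropped = {}, []
    for key, value in state_dict.items():
        if prefixes and key.startswith(prefixes):
            dropped.append(key)
        else:
            kept[key] = value
    return kept, dropped


# Illustrative key names only.
sd = {
    "encoder.conv_in.weight": 1,
    "loss.discriminator.main.0.weight": 2,
    "quantize.embedding.weight": 3,
}
kept, dropped = filter_state_dict(sd, ignore_keys=["loss."])
print(sorted(kept))  # ['encoder.conv_in.weight', 'quantize.embedding.weight']
print(dropped)       # ['loss.discriminator.main.0.weight']
```

Filtering by prefix makes it easy to discard a whole submodule, such as a training-only loss network, when only the autoencoder weights are needed.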

Outputs (returned by forward)

Name	Type	Description
dec	torch.Tensor	Decoded (reconstructed) image tensor
diff	torch.Tensor	Quantization embedding loss reported by the vector quantizer
ind	torch.Tensor	Predicted codebook indices; returned as a third element only when return_pred_indices=True
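To make ind and diff concrete, the sketch below is a toy vector quantizer on plain lists, not the quantizer class VQModel actually uses. It picks the nearest codebook entry for each latent vector (giving the indices), computes the embedding loss, and shows the index-to-vector lookup that decode_code performs before decoding. beta=0.25 mirrors the value VQModel passes to its quantizer; the codebook and latent values are made up.

```python
# Toy vector quantizer on plain lists; illustrative only.

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors, codebook, beta=0.25):
    # Nearest codebook entry for each latent vector -> indices ("ind").
    ind = [min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
           for v in vectors]
    quant = [codebook[k] for k in ind]
    # Mean squared error between latents and their quantized versions.
    mse = sum(sq_dist(v, q) for v, q in zip(vectors, quant)) / (
        len(vectors) * len(vectors[0]))
    # The loss has two MSE terms, one weighted by beta, that differ only in
    # which side is detached under autograd; numerically they are equal,
    # so the total is (1 + beta) * mse.
    diff = mse + beta * mse
    return quant, diff, ind


def decode_code(ind, codebook):
    # Index -> codebook vector lookup, the step decode_code performs
    # before running the decoder.
    return [codebook[k] for k in ind]


codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
latents = [[0.1, -0.1], [0.9, 1.2]]
quant, diff, ind = quantize(latents, codebook)
print(ind)                                  # [0, 1]
print(decode_code(ind, codebook) == quant)  # True
```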

Usage Examples

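# Importing the hijack module patches VQModel and VQModelInterface into
# ldm.models.autoencoder. dd_config, loss_config and image_tensor are placeholders.
import sd_hijack_autoencoder  # noqa: F401
import ldm.models.autoencoder
model = ldm.models.autoencoder.VQModel(
    ddconfig=dd_config,
    lossconfig=loss_config,
    n_embed=8192,
    embed_dim=4
)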

# Encode and decode an image
quant, emb_loss, info = model.encode(image_tensor)
reconstructed = model.decode(quant)

# Using VQModelInterface (deferred quantization)
interface = ldm.models.autoencoder.VQModelInterface(embed_dim=4, ddconfig=dd_config, lossconfig=loss_config, n_embed=8192)
h = interface.encode(image_tensor)
decoded = interface.decode(h, force_not_quantize=False)

Related Pages

Principle:AUTOMATIC1111_Stable_diffusion_webui_LDSR_Upscaler
