Implementation: AUTOMATIC1111 Stable Diffusion web UI DDPM V1 Diffusion Model
| Knowledge Sources | |
|---|---|
| Domains | Diffusion Models, Image Generation, LDSR |
| Last Updated | 2025-05-15 00:00 GMT |
Overview
Reinstates the original Stable Diffusion V1 DDPM and Latent Diffusion model classes, which are compatible with VQ-based first stages. The V2 DDPM implementation dropped this compatibility, so these classes are required for LDSR upscaling to function correctly.
Description
This module is copied from the CompVis/stable-diffusion repository (the SD V1 repo) and provides full implementations of four classes suffixed with "V1":

- DDPMV1: implements classic Gaussian diffusion in image/latent space with training, sampling, EMA (Exponential Moving Average) support, and noise-schedule registration.
- LatentDiffusionV1: extends DDPMV1 with first-stage encoding/decoding via VQ or KL autoencoders, conditioning-stage integration (CLIP, class labels, etc.), and patch-based split-input processing for large images.
- DiffusionWrapperV1: routes conditioning to the U-Net through concat, cross-attention, hybrid, or ADM modes.
- Layout2ImgDiffusionV1: provides layout-to-image generation capabilities.

All four classes are monkey-patched into the ldm.models.diffusion.ddpm module at load time so they are discoverable by the model-loading infrastructure.
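The monkey-patching amounts to assigning the V1 classes as attributes of the already-imported module, so dotted lookups used by config-driven loaders resolve. A minimal sketch of that pattern (the placeholder classes and the standalone ModuleType stand-in are illustrative; in the web UI the real imported module object is patched):

```python
import types

# Stand-in for the real ldm.models.diffusion.ddpm module object
# (assumption: the actual patch assigns onto the imported module).
ddpm = types.ModuleType("ldm.models.diffusion.ddpm")

class DDPMV1:
    """Placeholder for the full ~1,400-line implementation."""

class LatentDiffusionV1(DDPMV1):
    """Placeholder subclass."""

# Assigning the classes onto the module makes dotted paths such as
# "ldm.models.diffusion.ddpm.LatentDiffusionV1" resolve for
# config-driven loaders (e.g. instantiate_from_config).
ddpm.DDPMV1 = DDPMV1
ddpm.LatentDiffusionV1 = LatentDiffusionV1
```

Because the patch only adds new attribute names, the V2 classes already living in the module are left untouched.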
Usage
This code is used internally by the LDSR (Latent Diffusion Super Resolution) extension. It is loaded automatically when LDSR upscaling is invoked. Users do not need to interact with it directly; it is a compatibility layer ensuring that VQ-quantized models work with the current codebase.
Code Reference
Source Location
- Repository: AUTOMATIC1111_Stable_diffusion_webui
- File: extensions-builtin/LDSR/sd_hijack_ddpm_v1.py
- Lines: 1-1443
Signature
class DDPMV1(pl.LightningModule):
    def __init__(self, unet_config, timesteps=1000, beta_schedule="linear",
                 loss_type="l2", ckpt_path=None, ignore_keys=None,
                 load_only_unet=False, monitor="val/loss", use_ema=True,
                 first_stage_key="image", image_size=256, channels=3,
                 log_every_t=100, clip_denoised=True, linear_start=1e-4,
                 linear_end=2e-2, cosine_s=8e-3, given_betas=None,
                 original_elbo_weight=0., v_posterior=0.,
                 l_simple_weight=1., conditioning_key=None,
                 parameterization="eps", scheduler_config=None,
                 use_positional_encodings=False, learn_logvar=False,
                 logvar_init=0.): ...

class LatentDiffusionV1(DDPMV1):
    def __init__(self, first_stage_config, cond_stage_config,
                 num_timesteps_cond=None, cond_stage_key="image",
                 cond_stage_trainable=False, concat_mode=True,
                 cond_stage_forward=None, conditioning_key=None,
                 scale_factor=1.0, scale_by_std=False, *args, **kwargs): ...

class DiffusionWrapperV1(pl.LightningModule):
    def __init__(self, diff_model_config, conditioning_key): ...

class Layout2ImgDiffusionV1(LatentDiffusionV1):
    def __init__(self, cond_stage_key='coordinates_bbox', **kwargs): ...
Import
import ldm.models.diffusion.ddpm
# Classes are monkey-patched into this module:
# ldm.models.diffusion.ddpm.DDPMV1
# ldm.models.diffusion.ddpm.LatentDiffusionV1
# ldm.models.diffusion.ddpm.DiffusionWrapperV1
# ldm.models.diffusion.ddpm.Layout2ImgDiffusionV1
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| unet_config | dict | Yes | Configuration dictionary for instantiating the U-Net backbone model |
| timesteps | int | No | Number of diffusion timesteps (default: 1000) |
| beta_schedule | str | No | Type of noise schedule: "linear", "cosine", etc. (default: "linear") |
| loss_type | str | No | Loss function type: "l1" or "l2" (default: "l2") |
| parameterization | str | No | Prediction target: "eps" (noise) or "x0" (clean image) (default: "eps") |
| first_stage_config | dict | Yes (LatentDiffusionV1) | Configuration for the first-stage autoencoder (VQ-VAE or KL-VAE) |
| cond_stage_config | dict | Yes (LatentDiffusionV1) | Configuration for the conditioning stage model (e.g., CLIP) |
| scale_factor | float | No | Scaling factor for latent representations (default: 1.0) |
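The defaults (timesteps=1000, linear_start=1e-4, linear_end=2e-2) reproduce the standard SD V1 noise schedule. A sketch of the "linear" schedule, following the sqrt-space interpolation used by the CompVis make_beta_schedule helper (helper name and interpolation detail taken from the upstream repo; treat as an assumption here):

```python
import numpy as np

def linear_betas(n_timesteps=1000, linear_start=1e-4, linear_end=2e-2):
    # The CompVis "linear" schedule interpolates in sqrt-space and then
    # squares, so betas ramp up slightly slower than a plain linear ramp.
    return np.linspace(linear_start ** 0.5, linear_end ** 0.5, n_timesteps) ** 2

betas = linear_betas()
# Cumulative products of (1 - beta_t): the alpha-bar values that define
# the forward process q(x_t | x_0); they decay monotonically toward 0.
alphas_cumprod = np.cumprod(1.0 - betas)
```

The same betas feed the buffers registered by the model's noise-schedule registration (sqrt_alphas_cumprod, posterior variances, and so on).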
Outputs
| Name | Type | Description |
|---|---|---|
| samples | torch.Tensor | Generated image tensors from the diffusion sampling process |
| loss | torch.Tensor | Training loss value combining simple loss and VLB terms |
| loss_dict | dict | Dictionary of individual loss components for logging |
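The training loss combines the "simple" denoising term with the variational lower bound term using the constructor weights listed under Signature. A minimal NumPy sketch of that weighting (the toy tensors are illustrative, and the real per-timestep VLB reweighting is elided):

```python
import numpy as np

def simple_loss(pred, target, loss_type="l2"):
    # "l1" and "l2" mirror the loss_type constructor argument.
    if loss_type == "l1":
        return np.abs(pred - target).mean()
    return ((pred - target) ** 2).mean()

# Toy tensors standing in for the U-Net output vs. the target
# ("eps" parameterization targets the injected noise).
pred = np.zeros((2, 3, 8, 8))
target = np.ones((2, 3, 8, 8))

loss_simple = simple_loss(pred, target)
loss_vlb = loss_simple  # placeholder: the real code reweights per timestep
l_simple_weight, original_elbo_weight = 1.0, 0.0  # DDPMV1 defaults
loss = l_simple_weight * loss_simple + original_elbo_weight * loss_vlb
```

With the default original_elbo_weight of 0, the VLB term contributes nothing and training reduces to the simple denoising objective; loss_dict still logs both components separately.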
Usage Examples
# This module is used internally by the LDSR extension.
# The classes are automatically patched into ldm.models.diffusion.ddpm
# and instantiated via config files. Typical usage:
from ldm.models.diffusion.ddpm import LatentDiffusionV1
# Model is instantiated from a config file by instantiate_from_config()
# during LDSR model loading. Direct instantiation example:
model = LatentDiffusionV1(
    first_stage_config=vq_config,
    cond_stage_config=cond_config,
    unet_config=unet_config,
    timesteps=1000,
    image_size=64,
    channels=3,
    conditioning_key="crossattn",
)

# Sampling (inference):
with model.ema_scope():
    samples = model.sample(batch_size=1)
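The ema_scope() context in the sampling snippet temporarily swaps the EMA weights in and restores the live training weights on exit. A toy sketch of that swap-and-restore pattern (TinyModel and its scalar weights are illustrative, not from the source):

```python
from contextlib import contextmanager

class TinyModel:
    def __init__(self):
        self.weight = 1.0      # live training weight
        self.ema_weight = 0.9  # exponential-moving-average copy

    @contextmanager
    def ema_scope(self):
        # Swap the EMA weight in for sampling; restore afterwards,
        # even if sampling raises.
        backup = self.weight
        self.weight = self.ema_weight
        try:
            yield
        finally:
            self.weight = backup

m = TinyModel()
with m.ema_scope():
    weight_during = m.weight  # EMA weight is active inside the scope
weight_after = m.weight       # training weight restored on exit
```

Sampling under the EMA weights is what makes use_ema=True matter at inference time: the averaged weights typically produce smoother samples than the raw training weights.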