Implementation: AUTOMATIC1111 Stable Diffusion web UI DDPM V1 Diffusion Model
| Knowledge Sources | |
|---|---|
| Domains | Diffusion Models, Image Generation, LDSR |
| Last Updated | 2025-05-15 00:00 GMT |
Overview
Reinstates the original Stable Diffusion V1 DDPM and Latent Diffusion model classes, which are compatible with VQ-based first stages. The V2 DDPM implementation dropped this compatibility, so these classes are required for LDSR upscaling to function correctly.
Description
This module is copied from the CompVis/stable-diffusion repository (the SD V1 repo) and provides full implementations of four classes suffixed with "V1":

- DDPMV1: implements classic Gaussian diffusion in image/latent space with training, sampling, EMA (Exponential Moving Average) support, and noise-schedule registration.
- LatentDiffusionV1: extends DDPMV1 with first-stage encoding/decoding via VQ or KL autoencoders, conditioning-stage integration (CLIP, class labels, etc.), and patch-based split-input processing for large images.
- DiffusionWrapperV1: routes conditioning to the U-Net through concat, cross-attention, hybrid, or ADM modes.
- Layout2ImgDiffusionV1: provides layout-to-image generation capabilities.

All four classes are monkey-patched into the ldm.models.diffusion.ddpm module at load time so they are discoverable by the model-loading infrastructure.
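The monkey-patching amounts to assigning the V1 classes as attributes of the already-imported module, so dotted lookups used by config-driven loaders resolve. A minimal sketch of that pattern (the placeholder classes and the standalone ModuleType stand-in are illustrative; in the web UI the real imported module object is patched):

```python
import types

# Stand-in for the real ldm.models.diffusion.ddpm module object
# (assumption: the actual patch assigns onto the imported module).
ddpm = types.ModuleType("ldm.models.diffusion.ddpm")

class DDPMV1:
    """Placeholder for the full ~1,400-line implementation."""

class LatentDiffusionV1(DDPMV1):
    """Placeholder subclass."""

# Assigning the classes onto the module makes dotted paths such as
# "ldm.models.diffusion.ddpm.LatentDiffusionV1" resolve for
# config-driven loaders (e.g. instantiate_from_config).
ddpm.DDPMV1 = DDPMV1
ddpm.LatentDiffusionV1 = LatentDiffusionV1
```

Because the patch only adds new attribute names, the V2 classes already living in the module are left untouched.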
Usage
This code is used internally by the LDSR (Latent Diffusion Super Resolution) extension. It is loaded automatically when LDSR upscaling is invoked. Users do not need to interact with it directly; it is a compatibility layer ensuring that VQ-quantized models work with the current codebase.
Code Reference
Source Location
- Repository: AUTOMATIC1111_Stable_diffusion_webui
- File: extensions-builtin/LDSR/sd_hijack_ddpm_v1.py
- Lines: 1-1443
Signature
class DDPMV1(pl.LightningModule):
    def __init__(self, unet_config, timesteps=1000, beta_schedule="linear",
                 loss_type="l2", ckpt_path=None, ignore_keys=None,
                 load_only_unet=False, monitor="val/loss", use_ema=True,
                 first_stage_key="image", image_size=256, channels=3,
                 log_every_t=100, clip_denoised=True, linear_start=1e-4,
                 linear_end=2e-2, cosine_s=8e-3, given_betas=None,
                 original_elbo_weight=0., v_posterior=0.,
                 l_simple_weight=1., conditioning_key=None,
                 parameterization="eps", scheduler_config=None,
                 use_positional_encodings=False, learn_logvar=False,
                 logvar_init=0.): ...

class LatentDiffusionV1(DDPMV1):
    def __init__(self, first_stage_config, cond_stage_config,
                 num_timesteps_cond=None, cond_stage_key="image",
                 cond_stage_trainable=False, concat_mode=True,
                 cond_stage_forward=None, conditioning_key=None,
                 scale_factor=1.0, scale_by_std=False, *args, **kwargs): ...

class DiffusionWrapperV1(pl.LightningModule):
    def __init__(self, diff_model_config, conditioning_key): ...

class Layout2ImgDiffusionV1(LatentDiffusionV1):
    def __init__(self, cond_stage_key='coordinates_bbox', **kwargs): ...
Import
import ldm.models.diffusion.ddpm
# Classes are monkey-patched into this module:
# ldm.models.diffusion.ddpm.DDPMV1
# ldm.models.diffusion.ddpm.LatentDiffusionV1
# ldm.models.diffusion.ddpm.DiffusionWrapperV1
# ldm.models.diffusion.ddpm.Layout2ImgDiffusionV1
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| unet_config | dict | Yes | Configuration dictionary for instantiating the U-Net backbone model |
| timesteps | int | No | Number of diffusion timesteps (default: 1000) |
| beta_schedule | str | No | Type of noise schedule: "linear", "cosine", etc. (default: "linear") |
| loss_type | str | No | Loss function type: "l1" or "l2" (default: "l2") |
| parameterization | str | No | Prediction target: "eps" (noise) or "x0" (clean image) (default: "eps") |
| first_stage_config | dict | Yes (LatentDiffusionV1) | Configuration for the first-stage autoencoder (VQ-VAE or KL-VAE) |
| cond_stage_config | dict | Yes (LatentDiffusionV1) | Configuration for the conditioning stage model (e.g., CLIP) |
| scale_factor | float | No | Scaling factor for latent representations (default: 1.0) |
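The defaults (timesteps=1000, linear_start=1e-4, linear_end=2e-2) reproduce the standard SD V1 noise schedule. A sketch of the "linear" schedule, following the sqrt-space interpolation used by the CompVis make_beta_schedule helper (helper name and interpolation detail taken from the upstream repo; treat as an assumption here):

```python
import numpy as np

def linear_betas(n_timesteps=1000, linear_start=1e-4, linear_end=2e-2):
    # The CompVis "linear" schedule interpolates in sqrt-space and then
    # squares, so betas ramp up slightly slower than a plain linear ramp.
    return np.linspace(linear_start ** 0.5, linear_end ** 0.5, n_timesteps) ** 2

betas = linear_betas()
# Cumulative products of (1 - beta_t): the alpha-bar values that define
# the forward process q(x_t | x_0); they decay monotonically toward 0.
alphas_cumprod = np.cumprod(1.0 - betas)
```

The same betas feed the buffers registered by the model's noise-schedule registration (sqrt_alphas_cumprod, posterior variances, and so on).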
Outputs
| Name | Type | Description |
|---|---|---|
| samples | torch.Tensor | Generated image tensors from the diffusion sampling process |
| loss | torch.Tensor | Training loss value combining simple loss and VLB terms |
| loss_dict | dict | Dictionary of individual loss components for logging |
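The training loss combines the "simple" denoising term with the variational lower bound term using the constructor weights listed under Signature. A minimal NumPy sketch of that weighting (the toy tensors are illustrative, and the real per-timestep VLB reweighting is elided):

```python
import numpy as np

def simple_loss(pred, target, loss_type="l2"):
    # "l1" and "l2" mirror the loss_type constructor argument.
    if loss_type == "l1":
        return np.abs(pred - target).mean()
    return ((pred - target) ** 2).mean()

# Toy tensors standing in for the U-Net output vs. the target
# ("eps" parameterization targets the injected noise).
pred = np.zeros((2, 3, 8, 8))
target = np.ones((2, 3, 8, 8))

loss_simple = simple_loss(pred, target)
loss_vlb = loss_simple  # placeholder: the real code reweights per timestep
l_simple_weight, original_elbo_weight = 1.0, 0.0  # DDPMV1 defaults
loss = l_simple_weight * loss_simple + original_elbo_weight * loss_vlb
```

With the default original_elbo_weight of 0, the VLB term contributes nothing and training reduces to the simple denoising objective; loss_dict still logs both components separately.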
Usage Examples
# This module is used internally by the LDSR extension.
# The classes are automatically patched into ldm.models.diffusion.ddpm
# and instantiated via config files. Typical usage:
from ldm.models.diffusion.ddpm import LatentDiffusionV1
# Model is instantiated from a config file by instantiate_from_config()
# during LDSR model loading. Direct instantiation example:
model = LatentDiffusionV1(
    first_stage_config=vq_config,
    cond_stage_config=cond_config,
    unet_config=unet_config,
    timesteps=1000,
    image_size=64,
    channels=3,
    conditioning_key="crossattn",
)

# Sampling (inference):
with model.ema_scope():
    samples = model.sample(batch_size=1)
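The ema_scope() context in the sampling snippet temporarily swaps the EMA weights in and restores the live training weights on exit. A toy sketch of that swap-and-restore pattern (TinyModel and its scalar weights are illustrative, not from the source):

```python
from contextlib import contextmanager

class TinyModel:
    def __init__(self):
        self.weight = 1.0      # live training weight
        self.ema_weight = 0.9  # exponential-moving-average copy

    @contextmanager
    def ema_scope(self):
        # Swap the EMA weight in for sampling; restore afterwards,
        # even if sampling raises.
        backup = self.weight
        self.weight = self.ema_weight
        try:
            yield
        finally:
            self.weight = backup

m = TinyModel()
with m.ema_scope():
    weight_during = m.weight  # EMA weight is active inside the scope
weight_after = m.weight       # training weight restored on exit
```

Sampling under the EMA weights is what makes use_ema=True matter at inference time: the averaged weights typically produce smoother samples than the raw training weights.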