
Implementation:AUTOMATIC1111 Stable diffusion webui DDPM V1 Diffusion Model

From Leeroopedia


Knowledge Sources
Domains Diffusion Models, Image Generation, LDSR
Last Updated 2025-05-15 00:00 GMT

Overview

Reinstates the original Stable Diffusion V1 DDPM and Latent Diffusion model classes, which support the VQ-based first stages that the V2 DDPM implementation dropped, so that LDSR upscaling functions correctly.

Description

This module is copied from the compvis/stable-diffusion repository (the SD V1 repo) and provides full implementations of four classes suffixed with "V1":

- DDPMV1 implements classic Gaussian diffusion in image/latent space with training, sampling, EMA (Exponential Moving Average) support, and noise schedule registration.
- LatentDiffusionV1 extends DDPMV1 with first-stage encoding/decoding via VQ or KL autoencoders, conditioning-stage integration (CLIP, class labels, etc.), and patch-based split-input processing for large images.
- DiffusionWrapperV1 routes conditioning to the U-Net through concat, cross-attention, hybrid, or ADM modes.
- Layout2ImgDiffusionV1 provides layout-to-image generation capabilities.

All four classes are monkey-patched into the ldm.models.diffusion.ddpm module at load time so they are discoverable by the model loading infrastructure.
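The monkey-patching step can be sketched as follows. This is a minimal illustration using a throwaway module object; the real code assigns its full class definitions onto the already-imported ldm.models.diffusion.ddpm module.

```python
import types

# Stand-in for the real ldm.models.diffusion.ddpm module (illustrative only).
ddpm = types.ModuleType("ldm.models.diffusion.ddpm")

class DDPMV1:
    """Placeholder for the full V1 class defined in this module."""
    parameterization = "eps"

# The compatibility layer attaches each V1 class as a module attribute,
# so config-driven loaders that resolve the dotted path
# "ldm.models.diffusion.ddpm.DDPMV1" find it by name.
ddpm.DDPMV1 = DDPMV1

cls = getattr(ddpm, "DDPMV1")
print(cls is DDPMV1)  # -> True
```

Because the attributes are set on the live module object, any later `getattr`-based class resolution (the pattern used by config-driven instantiation) succeeds without the V1 repo being installed.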

Usage

This code is used internally by the LDSR (Latent Diffusion Super Resolution) extension. It is loaded automatically when LDSR upscaling is invoked. Users do not need to interact with it directly; it is a compatibility layer ensuring that VQ-quantized models work with the current codebase.

Code Reference

Source Location

Signature

class DDPMV1(pl.LightningModule):
    def __init__(self, unet_config, timesteps=1000, beta_schedule="linear",
                 loss_type="l2", ckpt_path=None, ignore_keys=None,
                 load_only_unet=False, monitor="val/loss", use_ema=True,
                 first_stage_key="image", image_size=256, channels=3,
                 log_every_t=100, clip_denoised=True, linear_start=1e-4,
                 linear_end=2e-2, cosine_s=8e-3, given_betas=None,
                 original_elbo_weight=0., v_posterior=0.,
                 l_simple_weight=1., conditioning_key=None,
                 parameterization="eps", scheduler_config=None,
                 use_positional_encodings=False, learn_logvar=False,
                 logvar_init=0.): ...

class LatentDiffusionV1(DDPMV1):
    def __init__(self, first_stage_config, cond_stage_config,
                 num_timesteps_cond=None, cond_stage_key="image",
                 cond_stage_trainable=False, concat_mode=True,
                 cond_stage_forward=None, conditioning_key=None,
                 scale_factor=1.0, scale_by_std=False, *args, **kwargs): ...

class DiffusionWrapperV1(pl.LightningModule):
    def __init__(self, diff_model_config, conditioning_key): ...

class Layout2ImgDiffusionV1(LatentDiffusionV1):
    def __init__(self, cond_stage_key='coordinates_bbox', **kwargs): ...
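The conditioning routing performed by DiffusionWrapperV1 can be sketched as a simple dispatcher over its conditioning_key. The function and argument names below are illustrative, not the real forward() signature; the real wrapper concatenates tensors and invokes the U-Net.

```python
def route_conditioning(conditioning_key, c_concat=None, c_crossattn=None):
    """Return (mode, payload) describing what the U-Net would receive."""
    if conditioning_key is None:
        # Unconditional: the U-Net sees only the noised latent.
        return "unconditional", None
    if conditioning_key == "concat":
        # Conditioning is channel-concatenated with the noised latent.
        return "concat", c_concat
    if conditioning_key == "crossattn":
        # Conditioning is fed as cross-attention context (e.g. CLIP embeddings).
        return "crossattn", c_crossattn
    if conditioning_key == "hybrid":
        # Both the concat and cross-attention paths are used at once.
        return "hybrid", (c_concat, c_crossattn)
    if conditioning_key == "adm":
        # ADM-style class-label conditioning.
        return "adm", c_crossattn
    raise NotImplementedError(conditioning_key)

mode, payload = route_conditioning("crossattn", c_crossattn="clip_embeddings")
print(mode)  # -> crossattn
```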

Import

import ldm.models.diffusion.ddpm
# Classes are monkey-patched into this module:
# ldm.models.diffusion.ddpm.DDPMV1
# ldm.models.diffusion.ddpm.LatentDiffusionV1
# ldm.models.diffusion.ddpm.DiffusionWrapperV1
# ldm.models.diffusion.ddpm.Layout2ImgDiffusionV1

I/O Contract

Inputs

Name Type Required Description
unet_config dict Yes Configuration dictionary for instantiating the U-Net backbone model
timesteps int No Number of diffusion timesteps (default: 1000)
beta_schedule str No Type of noise schedule: "linear", "cosine", etc. (default: "linear")
loss_type str No Loss function type: "l1" or "l2" (default: "l2")
parameterization str No Prediction target: "eps" (noise) or "x0" (clean image) (default: "eps")
first_stage_config dict Yes (LatentDiffusionV1) Configuration for the first-stage autoencoder (VQ-VAE or KL-VAE)
cond_stage_config dict Yes (LatentDiffusionV1) Configuration for the conditioning stage model (e.g., CLIP)
scale_factor float No Scaling factor for latent representations (default: 1.0)
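For reference, the default schedule implied by linear_start=1e-4, linear_end=2e-2, and timesteps=1000 can be sketched as follows, assuming the SD V1 convention of interpolating in sqrt-space. The function name make_linear_betas is illustrative.

```python
import numpy as np

def make_linear_betas(timesteps=1000, linear_start=1e-4, linear_end=2e-2):
    # SD V1's "linear" schedule interpolates between sqrt(start) and
    # sqrt(end), then squares, giving betas spanning [linear_start, linear_end].
    return np.linspace(linear_start ** 0.5, linear_end ** 0.5, timesteps) ** 2

betas = make_linear_betas()
alphas_cumprod = np.cumprod(1.0 - betas)

# The model registers sqrt(alphas_cumprod) and sqrt(1 - alphas_cumprod)
# as buffers; the forward (noising) process is
#   q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)
print(betas[0], betas[-1])  # endpoints match linear_start and linear_end
```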

Outputs

Name Type Description
samples torch.Tensor Generated image tensors from the diffusion sampling process
loss torch.Tensor Training loss value combining simple loss and VLB terms
loss_dict dict Dictionary of individual loss components for logging
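The reported loss combines the simple denoising objective with a VLB (ELBO) term, weighted by l_simple_weight and original_elbo_weight. A minimal numeric sketch follows; the names and the scalar lvlb_weight are illustrative, whereas the real code uses a per-timestep weight buffer.

```python
import numpy as np

def combined_loss(pred, target, lvlb_weight=1.0,
                  l_simple_weight=1.0, original_elbo_weight=0.0):
    # "l2" loss on the prediction target (noise for "eps", image for "x0").
    loss_simple = float(np.mean((pred - target) ** 2))
    # The VLB term reweights the same error by a timestep-dependent factor.
    loss_vlb = lvlb_weight * loss_simple
    loss = l_simple_weight * loss_simple + original_elbo_weight * loss_vlb
    loss_dict = {"loss_simple": loss_simple, "loss_vlb": loss_vlb, "loss": loss}
    return loss, loss_dict

pred, target = np.zeros(4), np.ones(4)
loss, loss_dict = combined_loss(pred, target)
print(loss)  # -> 1.0 (default original_elbo_weight=0, so loss == loss_simple)
```

With the default original_elbo_weight=0.0 the VLB term is logged but does not contribute to the optimized loss, matching the defaults in the signature above.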

Usage Examples

# This module is used internally by the LDSR extension.
# The classes are automatically patched into ldm.models.diffusion.ddpm
# and instantiated via config files. Typical usage:

from ldm.models.diffusion.ddpm import LatentDiffusionV1

# The model is normally instantiated from a config file by
# instantiate_from_config() during LDSR model loading.
# Direct instantiation (vq_config, cond_config, and unet_config are
# placeholder config dicts):
model = LatentDiffusionV1(
    first_stage_config=vq_config,
    cond_stage_config=cond_config,
    unet_config=unet_config,
    timesteps=1000,
    image_size=64,
    channels=3,
    conditioning_key="crossattn"
)

# Sampling (inference) with EMA weights swapped in; LatentDiffusionV1.sample
# requires a conditioning argument:
with model.ema_scope():
    samples = model.sample(cond, batch_size=1)  # cond: conditioning tensor(s)
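The ema_scope() context manager used above temporarily swaps the EMA weights in for sampling and restores the live training weights on exit. An illustrative single-parameter sketch (TinyModel is hypothetical; the real implementation copies whole parameter sets):

```python
from contextlib import contextmanager

class TinyModel:
    def __init__(self):
        self.weight = 1.0      # live training weight
        self.ema_weight = 0.5  # exponential-moving-average shadow copy

    def update_ema(self, decay=0.999):
        # EMA update applied after each optimizer step.
        self.ema_weight = decay * self.ema_weight + (1 - decay) * self.weight

    @contextmanager
    def ema_scope(self):
        # Swap the smoother EMA weights in for inference; restore on exit.
        backup = self.weight
        self.weight = self.ema_weight
        try:
            yield self
        finally:
            self.weight = backup

m = TinyModel()
with m.ema_scope():
    inside = m.weight   # 0.5: EMA weight is active
after = m.weight        # 1.0: live weight restored
print(inside, after)    # -> 0.5 1.0
```

Sampling inside ema_scope() is why use_ema=True is the default: EMA weights are typically smoother and yield better samples than the raw training weights.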
