Implementation:AUTOMATIC1111 Stable diffusion webui DDPM Edit Model
| Knowledge Sources | |
|---|---|
| Domains | Diffusion Models, Image Editing, Latent Diffusion |
| Last Updated | 2025-05-15 00:00 GMT |
Overview
Implements the Denoising Diffusion Probabilistic Model (DDPM) and Latent Diffusion Model classes modified for InstructPix2Pix-style image editing, providing the core diffusion training and inference pipeline for instruction-based image-to-image generation.
Description
This module defines several key classes for the diffusion pipeline:
DDPM: A PyTorch Lightning module that manages noise schedules (the betas and the cumulative products of the alphas), EMA (Exponential Moving Average) weights, and the forward/reverse diffusion processes. It supports both epsilon-prediction and x0-prediction parameterizations.
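The schedule bookkeeping and the two parameterizations can be illustrated with a small NumPy sketch. It assumes the "linear" schedule of the CompVis code (evenly spaced in sqrt(beta), then squared), with the start/end values from the Stable Diffusion v1 configs (0.00085 and 0.012). The function names mirror CompVis method names for readability, but these are standalone illustrations and not the module's API.

```python
import numpy as np

def make_linear_betas(n_timesteps=1000, linear_start=0.00085, linear_end=0.012):
    # "linear" schedule: evenly spaced in sqrt(beta), then squared (assumed values)
    return np.linspace(linear_start ** 0.5, linear_end ** 0.5, n_timesteps, dtype=np.float64) ** 2

def q_sample(x0, t, noise, alphas_cumprod):
    # Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise
    a = np.sqrt(alphas_cumprod[t]).reshape(-1, 1, 1, 1)
    s = np.sqrt(1.0 - alphas_cumprod[t]).reshape(-1, 1, 1, 1)
    return a * x0 + s * noise

def predict_start_from_noise(x_t, t, eps, alphas_cumprod):
    # Converts an epsilon prediction into an x0 estimate by inverting q_sample
    a = np.sqrt(alphas_cumprod[t]).reshape(-1, 1, 1, 1)
    s = np.sqrt(1.0 - alphas_cumprod[t]).reshape(-1, 1, 1, 1)
    return (x_t - s * eps) / a

betas = make_linear_betas()
alphas_cumprod = np.cumprod(1.0 - betas)  # abar_t, decreasing in t

rng = np.random.default_rng(0)
x0 = rng.standard_normal((2, 4, 8, 8))   # batch of 2 latents, shape (N, C, H, W)
noise = rng.standard_normal(x0.shape)
t = np.array([10, 900])                  # one timestep index per sample
x_t = q_sample(x0, t, noise, alphas_cumprod)
x0_hat = predict_start_from_noise(x_t, t, noise, alphas_cumprod)
print(np.allclose(x0, x0_hat))  # True: given the true noise, x0 is recovered
```

An "eps" model is trained to output `noise`, and an "x0" model to output `x0` directly; the conversion above is what lets either kind of prediction drive the same sampler.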
LatentDiffusion: Extends DDPM to operate in VAE latent space with cross-attention conditioning. It handles first-stage encoding/decoding via a VAE or VQ-VAE and cond-stage processing through CLIP or other text encoders, and it supports multiple conditioning keys (concat, crossattn, hybrid, adm).
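In the CompVis-derived code, the first-stage encoding samples the VAE posterior (a diagonal Gaussian) and multiplies the sample by a `scale_factor`, and decoding divides by that factor first. The NumPy sketch below shows the shape arithmetic and the scaling round trip. It assumes the Stable Diffusion v1 values (8x spatial downsampling, 4 latent channels, `scale_factor` 0.18215). `toy_encode`, `encode_to_scaled_latent`, and `unscale_latent` are hypothetical stand-ins; a real VAE encoder is a learned network, not average pooling.

```python
import numpy as np

SCALE_FACTOR = 0.18215  # assumed: value used by Stable Diffusion v1 configs
DOWNSAMPLE = 8          # assumed: spatial downsampling of the SD v1 VAE
LATENT_CHANNELS = 4

rng = np.random.default_rng(0)
_PROJ = rng.standard_normal((LATENT_CHANNELS, 3))  # hypothetical stand-in for learned encoder weights

def toy_encode(image):
    """Hypothetical encoder stand-in: 8x average pooling plus a channel projection.
    Returns (mean, logvar) of a diagonal Gaussian posterior, like a VAE encoder."""
    n, c, h, w = image.shape
    pooled = image.reshape(n, c, h // DOWNSAMPLE, DOWNSAMPLE, w // DOWNSAMPLE, DOWNSAMPLE).mean(axis=(3, 5))
    mean = np.einsum("lc,nchw->nlhw", _PROJ, pooled)
    logvar = np.full_like(mean, -10.0)
    return mean, logvar

def encode_to_scaled_latent(mean, logvar, rng):
    # Sample z ~ N(mean, exp(logvar)), then apply the scale factor
    z = mean + np.exp(0.5 * logvar) * rng.standard_normal(mean.shape)
    return SCALE_FACTOR * z

def unscale_latent(z):
    # Undo the scaling before the latent is handed to the VAE decoder
    return z / SCALE_FACTOR

image = rng.uniform(-1.0, 1.0, size=(1, 3, 512, 512))  # RGB image in [-1, 1]
mean, logvar = toy_encode(image)
z = encode_to_scaled_latent(mean, logvar, rng)
print(z.shape)  # (1, 4, 64, 64)
```

The diffusion process runs entirely on the 4x64x64 latent, which is why the model is far cheaper than pixel-space diffusion at 3x512x512.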
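DiffusionWrapper: Routes conditioning inputs to the underlying UNet according to its conditioning_key: image conditions are concatenated to the UNet input channels, text conditions are passed as cross-attention context, and ADM vector conditions are passed as the UNet's vector (class) input.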
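The routing can be sketched in a few lines of NumPy, modeled on the CompVis wrapper's branches for the concat, crossattn, and hybrid keys (the ADM branch is left out). InstructPix2Pix uses the hybrid key: the encoded input image joins the noisy latent along the channel axis, and the text embedding becomes the cross-attention context. `toy_unet` and `route_conditioning` are hypothetical names. The tensor sizes assume Stable Diffusion v1 (4x64x64 latents, 77x768 CLIP ViT-L/14 token embeddings).

```python
import numpy as np

def toy_unet(x, t, context=None):
    # Hypothetical UNet stand-in: reports the shapes it received
    return {"x": x.shape, "context": None if context is None else context.shape}

def route_conditioning(x, t, conditioning_key, c_concat=None, c_crossattn=None, unet=toy_unet):
    if conditioning_key is None:
        return unet(x, t)
    if conditioning_key == "concat":
        # Image conditions join the noisy latent along the channel axis
        return unet(np.concatenate([x] + c_concat, axis=1), t)
    if conditioning_key == "crossattn":
        # Text conditions become the cross-attention context
        return unet(x, t, context=np.concatenate(c_crossattn, axis=1))
    if conditioning_key == "hybrid":
        # Both at once, as InstructPix2Pix does
        xc = np.concatenate([x] + c_concat, axis=1)
        cc = np.concatenate(c_crossattn, axis=1)
        return unet(xc, t, context=cc)
    raise NotImplementedError(f"conditioning key {conditioning_key!r} not covered by this sketch")

x_noisy = np.zeros((1, 4, 64, 64))   # noisy latent
img_cond = np.zeros((1, 4, 64, 64))  # encoded input-image latent
text_cond = np.zeros((1, 77, 768))   # text token embeddings
t = np.array([500])
print(route_conditioning(x_noisy, t, "hybrid", c_concat=[img_cond], c_crossattn=[text_cond]))
# {'x': (1, 8, 64, 64), 'context': (1, 77, 768)}
```

The 8-channel input in the hybrid case is the reason the UNet's first layer needs extra input channels.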
Layout2ImgDiffusion: A specialized variant for layout-to-image generation.
The InstructPix2Pix authors modified this module from the original CompVis stable-diffusion implementation, adding input channels to the first UNet layer so that the model can condition on the encoded input image.
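The InstructPix2Pix paper describes initializing the weights for the newly added input channels to zero, so that the widened model initially produces the same output as the pretrained one and the image latent only gains influence during fine-tuning. The NumPy sketch below demonstrates that property. `widen_input_weight` and `conv2d_valid` are hypothetical helpers, and the channel counts are toy sizes. The real first layer is a padded 3x3 convolution, but padding does not affect the argument.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def widen_input_weight(weight, extra_in_channels):
    """Widen a conv weight of shape (out, in, kh, kw): copy the pretrained
    input channels and zero-initialize the new ones."""
    out_c, in_c, kh, kw = weight.shape
    widened = np.zeros((out_c, in_c + extra_in_channels, kh, kw), dtype=weight.dtype)
    widened[:, :in_c] = weight
    return widened

def conv2d_valid(x, weight):
    # Minimal "valid" 2D convolution (cross-correlation, stride 1, no bias)
    kh, kw = weight.shape[2:]
    windows = sliding_window_view(x, (kh, kw), axis=(2, 3))  # (n, c, h', w', kh, kw)
    return np.einsum("nchwij,ocij->nohw", windows, weight)

rng = np.random.default_rng(0)
pretrained = rng.standard_normal((8, 4, 3, 3))  # 4 latent input channels (toy sizes)
widened = widen_input_weight(pretrained, 4)     # 4 noisy-latent + 4 image-latent channels
z_noisy = rng.standard_normal((1, 4, 16, 16))
image_latent = rng.standard_normal((1, 4, 16, 16))

before = conv2d_valid(z_noisy, pretrained)
after = conv2d_valid(np.concatenate([z_noisy, image_latent], axis=1), widened)
print(widened.shape, np.allclose(before, after))  # (8, 8, 3, 3) True
```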
Usage
Use this module when working with InstructPix2Pix-style image editing workflows. The model is loaded when the user selects an InstructPix2Pix checkpoint, providing the core diffusion backbone for instruction-guided image modification.
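One way to recognize such a checkpoint is the input-channel count of the UNet's first convolution. A standard SD v1 UNet takes 4 latent channels, InstructPix2Pix takes 8 (noisy latent plus input-image latent), and the SD inpainting model takes 9 (adding a masked-image latent and a mask). The sketch below assumes the CompVis state-dict key `model.diffusion_model.input_blocks.0.0.weight`. `guess_unet_variant` and its labels are hypothetical, and this is not necessarily how the webui's loader makes the decision.

```python
import numpy as np

# Assumed CompVis-style key of the UNet's first convolution in a LatentDiffusion state dict
INPUT_CONV_KEY = "model.diffusion_model.input_blocks.0.0.weight"

def guess_unet_variant(state_dict):
    """Hypothetical helper: classify an SD v1-family checkpoint by the
    input-channel count of its first UNet convolution."""
    weight = state_dict.get(INPUT_CONV_KEY)
    if weight is None:
        return "unknown"
    variants = {4: "standard", 8: "instruct-pix2pix", 9: "inpainting"}
    return variants.get(weight.shape[1], "unknown")

# Only the shape matters, so zero-filled arrays stand in for real weights
print(guess_unet_variant({INPUT_CONV_KEY: np.zeros((320, 8, 3, 3))}))  # instruct-pix2pix
print(guess_unet_variant({INPUT_CONV_KEY: np.zeros((320, 4, 3, 3))}))  # standard
```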
Code Reference
Source Location
- Repository: AUTOMATIC1111_Stable_diffusion_webui
- File: modules/models/diffusion/ddpm_edit.py
- Lines: 1-1460
Signature
```python
class DDPM(pl.LightningModule):
    def __init__(self, unet_config, timesteps=1000, beta_schedule="linear",
                 loss_type="l2", ckpt_path=None, ignore_keys=None, ...):
    def forward(self, x, *args, **kwargs):

class LatentDiffusion(DDPM):
    def __init__(self, first_stage_config, cond_stage_config,
                 num_timesteps_cond=None, cond_stage_key="image", ...):

class DiffusionWrapper(pl.LightningModule):
    def __init__(self, diff_model_config, conditioning_key):
    def forward(self, x, t, c_concat=None, c_crossattn=None, c_adm=None):
```
Import
```python
from modules.models.diffusion.ddpm_edit import DDPM, LatentDiffusion, DiffusionWrapper
```
I/O Contract
Inputs (DiffusionWrapper.forward, which returns the UNet's prediction)
| Name | Type | Required | Description |
|---|---|---|---|
| x | torch.Tensor | Yes | Input image tensor or latent representation with shape (N, C, H, W) |
| t | torch.Tensor | Yes | Diffusion timestep indices as a long tensor with shape (N,) |
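| c_concat | list[torch.Tensor] | No | Image conditions (e.g. the encoded input-image latent) concatenated with x along the channel dimension before the UNet |
| c_crossattn | list[torch.Tensor] | No | Text conditioning tensors, concatenated along dim 1 and used as the UNet's cross-attention context |
| c_adm | torch.Tensor | No | Vector conditioning in the style of ADM (Ablated Diffusion Model), such as class labels or embedding vectors |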
Outputs (DDPM.forward / LatentDiffusion.forward during training)
| Name | Type | Description |
|---|---|---|
| loss | torch.Tensor | Scalar training loss: the simple loss (L1 or L2, per loss_type) scaled by l_simple_weight, plus the VLB loss scaled by original_elbo_weight |
| loss_dict | dict | Dictionary of named loss components for logging |
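A simplified NumPy sketch of how these terms combine, modeled on the structure of the CompVis `p_losses` method. It makes several assumptions: L2 loss only, unprefixed dictionary keys (the real keys carry a train/val prefix), and defaults `l_simple_weight=1`, `original_elbo_weight=0`, and an all-zero per-timestep `logvar`. `p_losses_sketch` is a hypothetical name, and the `lvlb_weights` used below are placeholders rather than the real VLB weights.

```python
import numpy as np

def p_losses_sketch(model_output, target, t, logvar, lvlb_weights,
                    l_simple_weight=1.0, original_elbo_weight=0.0):
    """Simplified loss combination (L2 only). `target` is the noise for the
    "eps" parameterization or the clean latent for "x0"."""
    per_sample = ((target - model_output) ** 2).mean(axis=(1, 2, 3))  # simple loss per sample
    loss_dict = {"loss_simple": per_sample.mean()}
    logvar_t = logvar[t]
    # Per-timestep log-variance weighting; with logvar == 0 this leaves the loss unchanged
    loss = l_simple_weight * (per_sample / np.exp(logvar_t) + logvar_t).mean()
    loss_vlb = (lvlb_weights[t] * per_sample).mean()
    loss_dict["loss_vlb"] = loss_vlb
    loss = loss + original_elbo_weight * loss_vlb
    loss_dict["loss"] = loss
    return loss, loss_dict

rng = np.random.default_rng(0)
target = rng.standard_normal((2, 4, 8, 8))
model_output = rng.standard_normal((2, 4, 8, 8))
t = np.array([3, 700])
logvar = np.zeros(1000)       # assumed default: logvar initialized to 0
lvlb_weights = np.ones(1000)  # placeholder weights for illustration only
loss, loss_dict = p_losses_sketch(model_output, target, t, logvar, lvlb_weights)
```

With the defaults, the total loss reduces to the plain mean squared error; the VLB term only contributes when `original_elbo_weight` is nonzero.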
Usage Examples
```python
from modules.models.diffusion.ddpm_edit import LatentDiffusion, DiffusionWrapper

# The model is typically instantiated from a config and checkpoint
# by the model loading infrastructure, not directly by user code.

# Example of how the DiffusionWrapper routes conditioning:
# wrapper = DiffusionWrapper(unet_config, conditioning_key="hybrid")
# output = wrapper(x_noisy, timesteps, c_concat=[img_cond], c_crossattn=[text_cond])

# The LatentDiffusion class handles encoding/decoding:
# z = model.get_first_stage_encoding(model.encode_first_stage(image))
# c = model.get_learned_conditioning(prompt)
# With the "hybrid" key, apply_model expects cond as a dict keyed like the wrapper's
# arguments; a bare tensor would be routed to c_crossattn only, leaving c_concat unset:
# noise_pred = model.apply_model(z_noisy, t, cond={"c_crossattn": [c], "c_concat": [img_cond]})
```