Principle:AUTOMATIC1111 Stable diffusion webui Latent sampling
| Knowledge Sources | |
|---|---|
| Domains | Diffusion Models, Sampling, Stochastic Processes |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Latent sampling is the iterative denoising process that transforms a random noise tensor in latent space into a structured representation conditioned on text embeddings, using a trained UNet with classifier-free guidance.
Description
The core of image generation in a latent diffusion model is the reverse diffusion process (sampling). Starting from pure Gaussian noise, the model progressively removes noise over a sequence of steps until a clean latent image emerges. At each step, the UNet predicts the noise component in the current noisy latent, and this prediction is used to compute a less noisy version.
The process is governed by three interacting components:
- The sampling algorithm -- Determines how the noise prediction is used to update the latent at each step. Different algorithms (Euler, DPM++, Heun, etc.) trade off speed, quality, and stochasticity.
- Classifier-free guidance (CFG) -- A technique that combines conditional (prompt-guided) and unconditional (negative prompt) noise predictions to strengthen the model's adherence to the prompt.
- The noise schedule -- A monotonically decreasing sequence of noise levels (sigmas) that defines the trajectory from pure noise to clean signal.
Usage
Latent sampling is the central computational step in every text-to-image and image-to-image generation. It is the most time-consuming phase, as it requires multiple forward passes through the UNet (typically 20-50 sampling steps, with two UNet passes per step when CFG is active).
Theoretical Basis
Diffusion Forward and Reverse Process
The forward diffusion process gradually adds Gaussian noise to data:
q(z_t | z_0) = N(z_t; alpha_t * z_0, sigma_t^2 * I)
The reverse process learns to denoise:
p_theta(z_{t-1} | z_t) = N(z_{t-1}; mu_theta(z_t, t), sigma_t^2 * I)
The UNet epsilon_theta(z_t, t, c) predicts the noise epsilon that was added, and the denoised sample is recovered as:
z_0_pred = (z_t - sigma_t * epsilon_theta(z_t, t, c)) / alpha_t
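The forward-noising and denoised-recovery formulas above can be checked numerically. A minimal NumPy sketch, where the array shapes and the alpha_t/sigma_t values are purely illustrative, and a "perfect" noise prediction (the true epsilon) stands in for the trained UNet:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent and noise level; alpha_t and sigma_t are illustrative values
# following the parametrisation in the formulas above.
z0 = rng.standard_normal((4, 8, 8))   # clean latent z_0
alpha_t, sigma_t = 0.8, 0.6
eps = rng.standard_normal(z0.shape)

# Forward process: q(z_t | z_0) = N(z_t; alpha_t * z_0, sigma_t^2 * I)
z_t = alpha_t * z0 + sigma_t * eps

# With a perfect noise prediction epsilon_theta == eps, the clean latent
# is recovered exactly by the z_0_pred formula.
z0_pred = (z_t - sigma_t * eps) / alpha_t
print(np.allclose(z0_pred, z0))  # True
```

In practice the UNet's noise prediction is imperfect at high noise levels, which is why sampling iterates the estimate over many steps rather than jumping to z_0 in one shot.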
Classifier-Free Guidance (CFG)
CFG combines conditional and unconditional predictions:
epsilon_guided = epsilon_uncond + cfg_scale * (epsilon_cond - epsilon_uncond)
This requires two UNet forward passes per step: one with the positive prompt conditioning and one with the negative prompt (unconditional) conditioning. The cfg_scale parameter controls the strength of guidance.
For multi-prompt conditioning (e.g., prompt editing), the guided prediction is:
epsilon_guided = epsilon_uncond + sum_i(weight_i * cfg_scale * (epsilon_cond_i - epsilon_uncond))
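Both guidance formulas reduce to simple tensor arithmetic on the UNet outputs. A minimal NumPy sketch, where random arrays stand in for actual noise predictions and the weights and cfg_scale values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
shape = (4, 8, 8)

eps_uncond = rng.standard_normal(shape)  # negative-prompt (unconditional) branch
eps_cond = rng.standard_normal(shape)    # positive-prompt branch
cfg_scale = 7.0

# Standard CFG: push the prediction away from the unconditional branch.
eps_guided = eps_uncond + cfg_scale * (eps_cond - eps_uncond)

# Multi-prompt variant: each conditional branch contributes with its own weight.
conds = [rng.standard_normal(shape) for _ in range(2)]
weights = [0.7, 0.3]
eps_multi = eps_uncond + sum(
    w * cfg_scale * (c - eps_uncond) for w, c in zip(weights, conds)
)

# With a single branch of weight 1.0, the multi-prompt formula reduces
# to standard CFG.
single = eps_uncond + 1.0 * cfg_scale * (eps_cond - eps_uncond)
print(np.allclose(single, eps_guided))  # True
```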
Noise Schedules
The noise schedule defines the sequence of sigma values across sampling steps:
- Uniform/Default -- Linearly interpolated from the model's trained noise schedule
- Karras -- Proposed by Karras et al. (2022):
  sigma_i = (sigma_max^(1/rho) + i/(n-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho
  where rho=7 by default. Concentrates steps at lower noise levels.
- Exponential -- Exponentially (log-linearly) spaced from sigma_max down to sigma_min:
  sigma_i = sigma_max * (sigma_min/sigma_max)^(i/(n-1))
- Beta -- Uses a beta distribution CDF for non-uniform spacing
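The Karras and exponential schedules above can be sketched in a few lines of NumPy. The sigma_min/sigma_max defaults below are the approximate sigma range of Stable Diffusion v1 models and are used here only for illustration:

```python
import numpy as np

def karras_sigmas(n, sigma_min=0.0292, sigma_max=14.6146, rho=7.0):
    """Karras et al. (2022) schedule: concentrates steps at low noise levels."""
    ramp = np.linspace(0.0, 1.0, n)
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    return (max_inv + ramp * (min_inv - max_inv)) ** rho

def exponential_sigmas(n, sigma_min=0.0292, sigma_max=14.6146):
    """Log-linearly spaced sigmas from sigma_max down to sigma_min."""
    return np.exp(np.linspace(np.log(sigma_max), np.log(sigma_min), n))

s = karras_sigmas(10)
print(np.all(np.diff(s) < 0))  # True: monotonically decreasing, as required
```

Note how the rho exponent warps the spacing: larger rho places an even greater share of the steps near sigma_min, where fine detail is resolved.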
Sampling Algorithms
Common samplers and their characteristics:
- Euler -- First-order method, simple and fast
- Euler a (ancestral) -- Euler with added stochastic noise at each step
- Heun -- Second-order method, higher quality but 2x UNet evaluations per step
- DPM++ 2M -- Multi-step DPM-Solver++ with second-order accuracy
- DPM++ 2M Karras -- DPM++ 2M with Karras noise schedule
- DPM++ SDE -- Stochastic variant using Brownian noise
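The simplest of these, the Euler sampler, can be sketched as follows. This sketch uses the denoised-prediction (z_0) parametrisation rather than raw epsilon, and a toy denoiser standing in for the UNet that always predicts an all-zero clean latent; the schedule values are illustrative:

```python
import numpy as np

def euler_sample(denoise, z, sigmas):
    """Minimal first-order Euler sampler.

    `denoise(z, sigma)` returns the predicted clean latent z_0 at noise
    level sigma; `sigmas` is a decreasing schedule ending in 0.
    """
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (z - denoise(z, sigma)) / sigma   # derivative dz/dsigma
        z = z + d * (sigma_next - sigma)      # Euler step toward lower noise
    return z

# Toy setup: the "true" clean latent is all zeros, so the exact denoiser
# returns zeros and sampling should shrink z toward zero.
rng = np.random.default_rng(2)
sigmas = np.append(np.exp(np.linspace(np.log(14.6), np.log(0.03), 20)), 0.0)
z = rng.standard_normal((4, 8, 8)) * sigmas[0]  # start from pure noise
out = euler_sample(lambda z, s: np.zeros_like(z), z, sigmas)
print(np.allclose(out, 0.0))  # True
```

Ancestral variants (e.g. Euler a) add fresh Gaussian noise after each step, trading determinism for sample diversity; second-order methods like Heun add a correction pass at the cost of an extra model evaluation per step.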
Related Pages
Implemented By
Uses Heuristic
- Heuristic:AUTOMATIC1111_Stable_diffusion_webui_VRAM_Management_Strategies
- Heuristic:AUTOMATIC1111_Stable_diffusion_webui_Cross_Attention_Memory_Slicing
- Heuristic:AUTOMATIC1111_Stable_diffusion_webui_GTX_16_Series_FP16_Workaround
- Heuristic:AUTOMATIC1111_Stable_diffusion_webui_NaN_Detection_And_Precision_Fixes
- Heuristic:AUTOMATIC1111_Stable_diffusion_webui_UNet_Performance_Patches