Principle:AUTOMATIC1111 Stable diffusion webui Latent sampling

Knowledge Sources
Domains Diffusion Models, Sampling, Stochastic Processes
Last Updated 2026-02-08 00:00 GMT

Overview

Latent sampling is the iterative denoising process that transforms a random noise tensor in latent space into a structured representation conditioned on text embeddings, using a trained UNet with classifier-free guidance.

Description

The core of image generation in a latent diffusion model is the reverse diffusion process (sampling). Starting from pure Gaussian noise, the model progressively removes noise over a sequence of steps until a clean latent image emerges. At each step, the UNet predicts the noise component in the current noisy latent, and this prediction is used to compute a less noisy version.

The process is governed by three interacting components:

  • The sampling algorithm -- Determines how the noise prediction is used to update the latent at each step. Different algorithms (Euler, DPM++, Heun, etc.) trade off speed, quality, and stochasticity.
  • Classifier-free guidance (CFG) -- A technique that combines conditional (prompt-guided) and unconditional (negative prompt) noise predictions to strengthen the model's adherence to the prompt.
  • The noise schedule -- A monotonically decreasing sequence of noise levels (sigmas) that defines the trajectory from pure noise to clean signal.
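The interaction of these three components can be sketched as a minimal Euler-style sampling loop. This is a simplified illustration, not the webui's implementation: the `denoise` callable below is a hypothetical stand-in for the UNet plus CFG, and would be a model call in a real sampler.

```python
import numpy as np

def euler_sample(denoise, sigmas, shape, seed=0):
    """Minimal Euler sampler sketch.

    denoise(z, sigma) must return an estimate of the clean latent z0
    at noise level sigma (a stand-in for the UNet + CFG combination).
    sigmas is a monotonically decreasing noise schedule ending at 0.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape) * sigmas[0]  # start from pure noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        z0_pred = denoise(z, sigma)
        d = (z - z0_pred) / sigma           # derivative dz/dsigma
        z = z + d * (sigma_next - sigma)    # Euler step toward lower noise
    return z

# Toy denoiser: pretend the true clean latent is all zeros, so the best
# z0 estimate is 0 regardless of input (hypothetical, for illustration).
sigmas = np.array([10.0, 5.0, 2.0, 1.0, 0.0])
out = euler_sample(lambda z, s: np.zeros_like(z), sigmas, (4, 4))
```

With this toy denoiser the loop contracts the latent toward zero at each step, reaching exactly zero when the schedule terminates at sigma = 0.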

Usage

Latent sampling is the central computational step in every text-to-image and image-to-image generation. It is the most time-consuming phase: a typical run uses 20-50 sampling steps, each requiring a forward pass through the UNet (doubled when classifier-free guidance is enabled).

Theoretical Basis

Diffusion Forward and Reverse Process

The forward diffusion process gradually adds Gaussian noise to data:

q(z_t | z_0) = N(z_t; alpha_t * z_0, sigma_t^2 * I)

The reverse process learns to denoise:

p_theta(z_{t-1} | z_t) = N(z_{t-1}; mu_theta(z_t, t), sigma_t^2 * I)

The UNet epsilon_theta(z_t, t, c) predicts the noise epsilon that was added, and the denoised sample is recovered as:

z_0_pred = (z_t - sigma_t * epsilon_theta(z_t, t, c)) / alpha_t
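If the UNet predicted the added noise exactly, this formula would invert the forward process and recover z_0 perfectly. A small numeric check of that identity, with made-up alpha_t and sigma_t values:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_t, sigma_t = 0.8, 0.6           # example noise-level coefficients
z0 = rng.standard_normal((2, 2))      # "clean" latent
eps = rng.standard_normal((2, 2))     # the noise actually added

# Forward process sample: z_t = alpha_t * z0 + sigma_t * eps
z_t = alpha_t * z0 + sigma_t * eps

# Given a perfect noise prediction, inverting the formula recovers z0
z0_pred = (z_t - sigma_t * eps) / alpha_t
```

In practice epsilon_theta is only an estimate, so z0_pred is a (progressively improving) approximation rather than an exact recovery.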

Classifier-Free Guidance (CFG)

CFG combines conditional and unconditional predictions:

epsilon_guided = epsilon_uncond + cfg_scale * (epsilon_cond - epsilon_uncond)

This requires two UNet forward passes per step: one with the positive prompt conditioning and one with the negative prompt (unconditional) conditioning. The cfg_scale parameter controls the strength of guidance.

For multi-prompt conditioning (e.g., prompt editing), the guided prediction is:

epsilon_guided = epsilon_uncond + sum_i(weight_i * cfg_scale * (epsilon_cond_i - epsilon_uncond))
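Both the single-prompt and multi-prompt forms can be expressed by one combination function: the single-prompt case is simply one conditional prediction with weight 1. A sketch with dummy arrays (names and shapes are illustrative, not the webui's actual tensors):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_conds, weights, cfg_scale):
    """Combine unconditional and conditional noise predictions.

    eps_conds: list of conditional predictions (one per sub-prompt)
    weights:   per-prompt weights (a single prompt uses [1.0])
    """
    guided = eps_uncond.copy()
    for eps_c, w in zip(eps_conds, weights):
        guided += w * cfg_scale * (eps_c - eps_uncond)
    return guided

eps_u = np.zeros((2, 2))   # stand-in unconditional prediction
eps_c = np.ones((2, 2))    # stand-in conditional prediction
# Single prompt at cfg_scale=7: guided = 0 + 7 * (1 - 0) = 7
out = cfg_combine(eps_u, [eps_c], [1.0], 7.0)
```

Note that cfg_scale = 1 reduces the guided prediction to the conditional one, and larger values extrapolate past it, which is why high scales can over-saturate outputs.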

Noise Schedules

The noise schedule defines the sequence of sigma values across sampling steps:

  • Uniform/Default -- Linearly interpolated from the model's trained noise schedule
  • Karras -- Proposed by Karras et al. (2022): sigma_i = (sigma_max^(1/rho) + i/(n-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho where rho=7 by default. Concentrates steps at lower noise levels.
  • Exponential -- Evenly spaced in log sigma, descending from sigma_max to sigma_min: sigma_i = sigma_max * (sigma_min/sigma_max)^(i/(n-1))
  • Beta -- Uses a beta distribution CDF for non-uniform spacing
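The Karras and exponential schedules above can be computed directly from their formulas. This is a standalone sketch rather than the webui's own code; the sigma_min, sigma_max, and rho values are example defaults:

```python
import numpy as np

def karras_sigmas(n, sigma_min=0.1, sigma_max=10.0, rho=7.0):
    # Interpolate linearly in sigma^(1/rho) space, then raise back to
    # the rho power; rho > 1 concentrates steps at low noise levels.
    ramp = np.linspace(0.0, 1.0, n)
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    return (max_inv + ramp * (min_inv - max_inv)) ** rho

def exponential_sigmas(n, sigma_min=0.1, sigma_max=10.0):
    # Evenly spaced in log-sigma, descending from sigma_max to sigma_min
    return np.exp(np.linspace(np.log(sigma_max), np.log(sigma_min), n))

ks = karras_sigmas(10)
es = exponential_sigmas(10)
```

Both schedules share the same endpoints and decrease monotonically; they differ only in how the intermediate noise levels are spaced.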

Sampling Algorithms

Common samplers and their characteristics:

  • Euler -- First-order method, simple and fast
  • Euler a (ancestral) -- Euler with added stochastic noise at each step
  • Heun -- Second-order method, higher quality but 2x UNet evaluations per step
  • DPM++ 2M -- Multi-step DPM-Solver++ with second-order accuracy
  • DPM++ 2M Karras -- DPM++ 2M with Karras noise schedule
  • DPM++ SDE -- Stochastic variant using Brownian noise
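The difference between Euler and its ancestral variant comes down to a single step update: the ancestral form steps further down the schedule deterministically, then re-injects fresh noise to land at the target level. A sketch following the variance-splitting used by k-diffusion-style ancestral samplers (eta = 1 assumed; z0_pred stands in for the model's denoised estimate):

```python
import numpy as np

def euler_step(z, z0_pred, sigma, sigma_next):
    """Deterministic Euler update (no added noise)."""
    d = (z - z0_pred) / sigma
    return z + d * (sigma_next - sigma)

def euler_ancestral_step(z, z0_pred, sigma, sigma_next, rng):
    """Euler step plus fresh noise, split so total variance matches."""
    # Split sigma_next into a deterministic part (sigma_down) and a
    # stochastic part (sigma_up): sigma_down^2 + sigma_up^2 = sigma_next^2
    sigma_up = min(
        sigma_next,
        (sigma_next**2 * (sigma**2 - sigma_next**2) / sigma**2) ** 0.5,
    )
    sigma_down = (sigma_next**2 - sigma_up**2) ** 0.5
    d = (z - z0_pred) / sigma
    z = z + d * (sigma_down - sigma)
    return z + rng.standard_normal(z.shape) * sigma_up

# Deterministic step from sigma=2 to sigma=1 with a zero denoised
# estimate simply halves the latent: ones -> 0.5
out_det = euler_step(np.ones(4), np.zeros(4), 2.0, 1.0)
```

The injected noise is why ancestral samplers keep changing the image as step count grows, while deterministic samplers converge to a fixed result.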

Related Pages

Implemented By

Uses Heuristic
