Principle:AUTOMATIC1111 Stable diffusion webui Noise addition and guided denoising

Knowledge Sources	SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations; Denoising Diffusion Probabilistic Models; Denoising Diffusion Implicit Models
Domains	Diffusion Models, Image Generation, Image Editing, Inpainting
Last Updated	2026-02-08 00:00 GMT

Overview

Noise addition and guided denoising is the core diffusion process in image-to-image generation. Calibrated noise is added to the encoded source image and then iteratively removed under text guidance; the denoising strength controls the balance between fidelity to the source and creative freedom.

Description

The image-to-image sampling process implements the SDEdit algorithm: rather than starting from pure noise (as in text-to-image), it starts from the source image encoded in latent space with a controlled amount of noise added. The diffusion model then denoises this noisy latent back to a clean image, guided by the text prompt.

The process involves three key stages:

1. Noise Generation: A noise tensor is generated using the ImageRNG system, which supports deterministic seeded noise, subseed blending via spherical linear interpolation (slerp), and seed-resize functionality for resolution-independent seeds. The noise may be scaled by an initial_noise_multiplier to adjust the overall noise amplitude.

2. Sampler Invocation: The sampler's sample_img2img() method receives the initial latent, the noise tensor, and the text conditioning (positive and negative). Internally, the sampler:

Determines the starting timestep based on denoising strength
Adds noise to the init_latent corresponding to that timestep
Iteratively denoises from the starting timestep to timestep 0
Uses classifier-free guidance to steer generation toward the text prompt

3. Mask Compositing: After sampling, if a mask is present, the denoised samples are composited with the original init_latent using the mask tensors. The blend keeps the generated content in masked regions and the original latent content in unmasked regions. Script hooks (on_mask_blend) can modify this blending behavior.
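The subseed blending mentioned in stage 1 can be illustrated with a small, standard-library-only sketch of spherical linear interpolation. It is not the ImageRNG code: flat Python lists stand in for latent tensors, and seeded_noise, slerp, and std are hypothetical helper names chosen for this example.

```python
import math
import random

def seeded_noise(seed, n):
    """Hypothetical stand-in for a seeded noise source: n standard-normal samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def slerp(t, low, high, eps=1e-6):
    """Spherical linear interpolation between two equal-length noise vectors."""
    norm_low = math.sqrt(sum(v * v for v in low))
    norm_high = math.sqrt(sum(v * v for v in high))
    dot = sum(a * b for a, b in zip(low, high)) / (norm_low * norm_high)
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > 1.0 - eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return [(1.0 - t) * a + t * b for a, b in zip(low, high)]
    omega = math.acos(dot)
    w_low = math.sin((1.0 - t) * omega) / math.sin(omega)
    w_high = math.sin(t * omega) / math.sin(omega)
    return [w_low * a + w_high * b for a, b in zip(low, high)]

def std(values):
    """Population standard deviation of a list of floats."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

n = 4 * 64 * 64                       # size of a 4x64x64 latent, flattened
base = seeded_noise(1234, n)          # noise from the main seed
variation = seeded_noise(5678, n)     # noise from the subseed
blended = slerp(0.25, base, variation)
linear = [0.75 * a + 0.25 * b for a, b in zip(base, variation)]
print(round(std(blended), 2), round(std(linear), 2))
```

For two independent Gaussian noise vectors, the slerp result keeps a standard deviation close to 1, whereas the plain linear blend shrinks it to roughly 0.79. That is the usual reason for preferring slerp when mixing noise that a sampler expects to be unit-variance.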

Usage

The sampling stage is the computational bottleneck of image-to-image generation. Key considerations:

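Denoising strength directly controls the starting timestep. Lower values start the sampler at a less noisy timestep, so fewer denoising steps run and the result stays closer to the source; values near 1.0 approach text-to-image behavior.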
Initial noise multiplier provides fine-grained control over noise amplitude independently of the timestep schedule.
Sampler choice affects both quality and speed. Different samplers (Euler, DPM++, etc.) have different convergence properties.
The mask compositing step after sampling is critical for inpainting: it restores the original latent in unmasked regions, so the denoising process cannot introduce drift there.
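To make the link between denoising strength and step count concrete, here is a minimal sketch of one plausible mapping. The convention used (clamp the strength just below 1, then scale by the step count) and the function name start_step are assumptions for illustration; the actual sampler code may compute this differently depending on settings.

```python
def start_step(denoising_strength, steps):
    """Map a denoising strength in [0, 1] to the number of denoising steps to run.

    Assumed convention: clamp the strength just below 1, then scale by the step count.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be in [0, 1]")
    return int(min(denoising_strength, 0.999) * steps)

for strength in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(strength, start_step(strength, steps=20))
```

Under this assumed mapping with 20 steps, a strength of 0.5 runs 10 denoising steps and a strength of 1.0 runs 19, while a strength of 0.0 runs none and leaves the source latent untouched.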

Theoretical Basis

The SDEdit process can be formalized as follows. Given the encoded source latent z_0, denoising strength s, and total timesteps T:

Step 1: Determine starting timestep
  t_start = schedule(s, T)    # maps strength to a timestep index

Step 2: Add noise to source latent
  epsilon ~ N(0, I)           # generated from seeded RNG
  z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * epsilon
  where alpha_bar_t is the cumulative product of (1 - beta_i) for i <= t, evaluated at t = t_start

Step 3: Iterative denoising from t_start to 0
  for t = t_start, t_start-1, ..., 1:                      # the last iteration (t = 1) yields z_0
    epsilon_pred = UNet(z_t, t, c_text)                    # conditional prediction
    epsilon_uncond = UNet(z_t, t, c_uncond)                # unconditional prediction
    epsilon_guided = epsilon_uncond + cfg_scale * (epsilon_pred - epsilon_uncond)
    z_{t-1} = sampler_step(z_t, epsilon_guided, t)

Step 4: Mask compositing (for inpainting)
  z_final = z_denoised * nmask + z_0 * mask
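Steps 2 and 3 above can be sketched in plain Python. This is an illustration under stated assumptions, not the webui's sampler code: the linear beta schedule and its endpoint values are assumed, flat lists stand in for latent tensors, and alpha_bar_schedule, add_noise, and cfg_combine are hypothetical names. The UNet itself is omitted, so the two noise predictions are hand-written placeholders.

```python
import math
import random

def alpha_bar_schedule(num_timesteps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed values)."""
    alpha_bars, running = [], 1.0
    for i in range(num_timesteps):
        beta = beta_start + (beta_end - beta_start) * i / (num_timesteps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(z0, epsilon, alpha_bar_t):
    """Step 2: z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * epsilon."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * z + b * e for z, e in zip(z0, epsilon)]

def cfg_combine(eps_uncond, eps_cond, cfg_scale):
    """Step 3 guidance: eps_uncond + cfg_scale * (eps_cond - eps_uncond)."""
    return [u + cfg_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

rng = random.Random(0)                       # seeded RNG, so the noise is reproducible
z0 = [0.5, -1.0, 2.0, 0.0]                   # toy "latent"
epsilon = [rng.gauss(0.0, 1.0) for _ in z0]
alpha_bars = alpha_bar_schedule(1000)
t_start = 499                                # e.g. roughly half of the schedule
z_t = add_noise(z0, epsilon, alpha_bars[t_start])

# Placeholder predictions standing in for the two UNet calls.
guided = cfg_combine(eps_uncond=[0.0, 0.2], eps_cond=[1.0, 0.2], cfg_scale=7.0)
print(z_t, guided)
```

Where the conditional and unconditional predictions agree, guidance leaves the value unchanged; where they differ, the difference is amplified by cfg_scale. A cfg_scale of 1.0 reduces to the conditional prediction alone.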

Written in terms of the sampled and initial latents, the mask compositing formula preserves the unmasked regions of the latent:

samples_final = samples * nmask + init_latent * mask

where:
  nmask = 1.0 in regions to regenerate (masked area)
  mask  = 1.0 in regions to preserve (unmasked area)
  nmask + mask = 1.0 everywhere
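The compositing rule can be checked on a toy example. The sketch below assumes flat lists in place of latent tensors and derives mask as 1 - nmask, matching the constraint stated above; composite is a hypothetical helper name.

```python
def composite(samples, init_latent, nmask):
    """Compute samples * nmask + init_latent * mask, with mask = 1 - nmask."""
    if not (len(samples) == len(init_latent) == len(nmask)):
        raise ValueError("all inputs must have the same length")
    return [s * n + z * (1.0 - n) for s, z, n in zip(samples, init_latent, nmask)]

init_latent = [0.1, 0.2, 0.3, 0.4]   # original encoded source
samples = [9.0, 8.0, 7.0, 6.0]       # output of the denoising loop
nmask = [0.0, 1.0, 0.5, 0.0]         # 1.0 = regenerate, 0.0 = preserve, 0.5 = soft edge
print(composite(samples, init_latent, nmask))
```

Entries where nmask is exactly 0 come back equal to init_latent, and entries where it is exactly 1 come back equal to samples. A fractional value, as produced by a blurred mask edge, yields a weighted mix, so exact preservation holds only where mask is exactly 1.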

The noise multiplier provides an additional scaling factor:

if initial_noise_multiplier != 1.0:
    noise = noise * initial_noise_multiplier

This allows boosting or reducing the noise amplitude beyond what the denoising strength alone controls, useful for fine-tuning the balance between randomness and source fidelity.

The script hook system allows extensions to modify the blending behavior via MaskBlendArgs, enabling custom compositing strategies such as gradient-aware blending or frequency-domain compositing.

Related Pages

Implemented By

Implementation:AUTOMATIC1111_Stable_diffusion_webui_Img2img_sample
