Implementation:AUTOMATIC1111 Stable diffusion webui Img2img sample

Knowledge Sources	stable-diffusion-webui
Domains	Diffusion Models, Image Generation, Image Editing, Inpainting
Last Updated	2026-02-08 00:00 GMT

Overview

A concrete implementation of noise injection, text-guided denoising, and mask-based latent compositing for the image-to-image sampling pipeline in the AUTOMATIC1111 stable-diffusion-webui repository.

Description

The sample() method of StableDiffusionProcessingImg2Img implements the core SDEdit sampling loop for image-to-image generation. It generates noise, optionally scales it by initial_noise_multiplier, invokes the sampler to denoise the noisy latent under text guidance, and then composites the result with the original latent using the inpainting mask.
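As a rough illustration of that flow, the sketch below mirrors the order of operations using plain Python lists in place of latent tensors. The names sample_sketch and fake_sampler, the flat-list representation, and the dictionary key are hypothetical stand-ins rather than the webui's code; the real method works on torch tensors and delegates denoising to the selected sampler. Script callbacks and memory cleanup are omitted.

```python
import random


def fake_sampler(init_latent, noise):
    """Hypothetical stand-in for the sampler: returns a dummy 'denoised' latent."""
    return [0.5 * z + 0.1 * n for z, n in zip(init_latent, noise)]


def sample_sketch(init_latent, mask, nmask, seed, noise_multiplier=1.0, extra_params=None):
    """Mirror the described order of operations on flat lists of floats."""
    extra_params = {} if extra_params is None else extra_params

    # Seeded noise, one value per latent element.
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in init_latent]

    # Scale the noise only when the multiplier differs from 1.0, and record
    # the non-default value (the key name here is illustrative).
    if noise_multiplier != 1.0:
        extra_params["Noise multiplier"] = noise_multiplier
        x = [v * noise_multiplier for v in x]

    # Denoise (stubbed out in this sketch).
    samples = fake_sampler(init_latent, x)

    # Composite with the original latent only when a mask is present.
    if mask is not None:
        samples = [s * nm + z * m for s, nm, z, m in zip(samples, nmask, init_latent, mask)]

    return samples, extra_params
```

Elements where mask is 1.0 come back unchanged from init_latent, whatever the stubbed sampler returns.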

The method proceeds through these stages:

1. Noise generation: A noise tensor x is drawn from the seeded ImageRNG instance via self.rng.next(). If initial_noise_multiplier is not 1.0, the noise is multiplied by it and the multiplier value is recorded in extra_generation_params.

2. Script pre-processing: The process_before_every_sampling script callback is invoked, passing the init_latent, noise, and conditioning tensors. This allows extensions to modify any of these before sampling begins.

3. Sampler invocation: self.sampler.sample_img2img() is called with:

self.init_latent as the encoded source image
x as the noise tensor
conditioning and unconditional_conditioning for classifier-free guidance
self.image_conditioning for inpainting-model-specific conditioning

The sampler internally determines the starting timestep from self.denoising_strength, adds noise to the init_latent, and runs the iterative denoising loop.

4. Mask compositing: If self.mask is not None (inpainting mode), the method blends the denoised samples with the original init_latent:

blended_samples = samples * self.nmask + self.init_latent * self.mask

The on_mask_blend script hook is then called with a MaskBlendArgs object, allowing extensions to override the blending result.

5. Cleanup: The noise tensor is deleted and GPU memory is freed via devices.torch_gc().
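Stage 3 leaves the mapping from denoising_strength to a starting point in the schedule to the sampler. A minimal sketch of one common convention follows; the function name encode_steps_sketch and the exact clamping and rounding are assumptions for illustration, and the webui's samplers may compute this differently depending on settings.

```python
def encode_steps_sketch(steps: int, denoising_strength: float) -> int:
    """Return how many of `steps` denoising steps are run (assumed convention).

    A strength near 1.0 runs almost the full schedule, so the source image is
    mostly replaced; a strength of 0.0 runs no steps, so the source is kept.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be in [0.0, 1.0]")
    # Clamp just below 1.0 so the result never reaches `steps`.
    strength = min(denoising_strength, 0.999)
    return int(strength * steps)
```

Under this convention, 20 steps at a strength of 0.75 run 15 denoising steps, and a strength of 1.0 runs 19 rather than 20.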

Usage

This method is called by process_images() once per batch iteration. The call runs inside a torch.no_grad() context, with autocast either enabled or explicitly disabled depending on the UNet dtype requirements. The returned tensor is subsequently decoded by the VAE to produce pixel-space images.

Code Reference

Source Location

Repository: stable-diffusion-webui
File: modules/processing.py
Lines: 1759-1789

Signature

def sample(
    self,
    conditioning,
    unconditional_conditioning,
    seeds,
    subseeds,
    subseed_strength,
    prompts
) -> torch.Tensor:

Import

from modules.processing import StableDiffusionProcessingImg2Img
# sample() is called as a method: p.sample(conditioning, unconditional_conditioning, seeds, subseeds, subseed_strength, prompts)

I/O Contract

Inputs

Name	Type	Required	Description
conditioning	tuple	Yes	Positive text conditioning tensors (output of prompt encoding)
unconditional_conditioning	tuple	Yes	Negative/unconditional text conditioning tensors
seeds	list[int]	Yes	Per-image seeds for the current batch
subseeds	list[int]	Yes	Per-image subseeds for the current batch
subseed_strength	float	Yes	Strength of subseed blending (0.0 = no blending)
prompts	list[str]	Yes	Text prompts for the current batch

Instance state read:

Name	Type	Description
self.rng	ImageRNG	Seeded random number generator for noise
self.init_latent	torch.Tensor	VAE-encoded source image latent
self.initial_noise_multiplier	float	Noise amplitude scaling factor
self.sampler	Sampler	The selected diffusion sampler instance
self.image_conditioning	torch.Tensor	Inpainting model conditioning
self.mask	torch.Tensor or None	Latent mask (1.0 in preserved regions)
self.nmask	torch.Tensor or None	Inverse latent mask (1.0 in regenerated regions)
self.scripts	ScriptRunner	Script runner for callbacks

Outputs

Name	Type	Description
samples	torch.Tensor	Denoised latent tensor, shape [B, C, H/8, W/8], ready for VAE decoding
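The shape relation in the table can be checked with a small helper. This is a sketch that assumes the 8x spatial downsampling stated above and a 4-channel latent; latent_shape is a hypothetical name, not a webui function.

```python
def latent_shape(batch_size: int, width: int, height: int, channels: int = 4, factor: int = 8):
    """Return the (B, C, H/8, W/8) latent shape for a pixel-space image size."""
    if width % factor or height % factor:
        raise ValueError(f"width and height must be multiples of {factor}")
    return (batch_size, channels, height // factor, width // factor)


# A batch of two 512x768 (width x height) images:
print(latent_shape(2, 512, 768))  # (2, 4, 96, 64)
```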

Usage Examples

Basic Usage

# Called internally by process_images() in the generation loop:
# (shown here for reference; normally not called directly)

# After p.init() has been called and conditioning is set up:
samples = p.sample(
    conditioning=p.c,
    unconditional_conditioning=p.uc,
    seeds=p.seeds,
    subseeds=p.subseeds,
    subseed_strength=p.subseed_strength,
    prompts=p.prompts,
)

# samples is a torch.Tensor of shape [batch_size, 4, height//8, width//8]
# It is then passed to decode_latent_batch() for VAE decoding

Understanding the Mask Compositing

# The mask compositing ensures unmasked regions are preserved:
# Given:
#   self.nmask: 1.0 where content should be regenerated
#   self.mask:  1.0 where content should be preserved
#   self.init_latent: original encoded image

# After denoising:
blended = samples * self.nmask + self.init_latent * self.mask

# Per-element behaviour for a binary mask:
# Regenerated element (nmask=1.0, mask=0.0): 1.0 * denoised + 0.0 * original = denoised
# Preserved element   (nmask=0.0, mask=1.0): 0.0 * denoised + 1.0 * original = original
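
The same formula also covers soft mask edges, where an element is only partially regenerated. The sketch below applies it element-wise to plain Python lists instead of tensors; blend_latents is a hypothetical helper, and deriving nmask as 1.0 - mask is an assumption consistent with the descriptions of the two masks above.

```python
def blend_latents(samples, init_latent, mask):
    """Blend denoised and original values element-wise.

    mask is 1.0 where the original is preserved; nmask = 1.0 - mask is
    1.0 where the denoised result is used.
    """
    blended = []
    for s, z, m in zip(samples, init_latent, mask):
        nm = 1.0 - m
        blended.append(s * nm + z * m)
    return blended


denoised = [10.0, 10.0, 10.0]
original = [2.0, 2.0, 2.0]
mask = [1.0, 0.0, 0.5]  # preserved, regenerated, soft edge

print(blend_latents(denoised, original, mask))  # [2.0, 10.0, 6.0]
```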

Related Pages

Implements Principle

Principle:AUTOMATIC1111_Stable_diffusion_webui_Noise_addition_and_guided_denoising
