Implementation:AUTOMATIC1111 Stable diffusion webui Img2img sample
| Knowledge Sources | |
|---|---|
| Domains | Diffusion Models, Image Generation, Image Editing, Inpainting |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for performing noise injection, text-guided denoising, and mask-based latent compositing in the image-to-image sampling pipeline provided by the AUTOMATIC1111 stable-diffusion-webui repository.
Description
The sample() method of StableDiffusionProcessingImg2Img implements the core SDEdit sampling loop for image-to-image generation. It generates noise, optionally scales it by initial_noise_multiplier, invokes the sampler to denoise the noisy latent under text guidance, and then composites the result with the original latent using the inpainting mask.
The method proceeds through these stages:
1. Noise generation: A noise tensor x is drawn from the seeded ImageRNG instance via self.rng.next(). If initial_noise_multiplier is not 1.0, the noise is multiplied by it and the multiplier is recorded in extra_generation_params.
2. Script pre-processing: The process_before_every_sampling script callback is invoked, passing the init_latent, noise, and conditioning tensors. This allows extensions to modify any of these before sampling begins.
3. Sampler invocation: self.sampler.sample_img2img() is called with:
- self.init_latent as the encoded source image
- x as the noise tensor
- conditioning and unconditional_conditioning for classifier-free guidance
- self.image_conditioning for inpainting-model-specific conditioning
The sampler internally determines the starting timestep from self.denoising_strength, adds noise to the init_latent, and runs the iterative denoising loop.
4. Mask compositing: If self.mask is not None (inpainting mode), the method blends the denoised samples with the original init_latent:
blended_samples = samples * self.nmask + self.init_latent * self.mask
The on_mask_blend script hook is then called with a MaskBlendArgs object, allowing extensions to override the blending result.
5. Cleanup: The noise tensor is deleted and GPU memory is freed via devices.torch_gc().
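The five stages above can be sketched without the webui installed. This is a minimal, torch-free illustration under stated assumptions, not the repository's code: plain Python lists stand in for latent tensors, `denoise_fn` is a hypothetical stand-in for self.sampler.sample_img2img(), `BlendArgs` and the `on_mask_blend` callback are hypothetical stand-ins for MaskBlendArgs and the script hook, and the "Noise multiplier" key is an assumed name for the recorded parameter.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Latent = List[float]  # flat list standing in for a latent tensor


@dataclass
class BlendArgs:
    """Hypothetical stand-in for the object handed to the mask-blend hook."""
    samples: Latent
    nmask: Latent
    init_latent: Latent
    mask: Latent
    blended: Latent


def sample_sketch(
    init_latent: Latent,
    seed: int,
    noise_multiplier: float,
    denoise_fn: Callable[[Latent, Latent], Latent],
    extra_params: Dict[str, float],
    mask: Optional[Latent] = None,
    nmask: Optional[Latent] = None,
    on_mask_blend: Optional[Callable[[BlendArgs], None]] = None,
) -> Latent:
    # 1. Noise generation from a seeded generator, with optional scaling.
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in init_latent]
    if noise_multiplier != 1.0:
        extra_params["Noise multiplier"] = noise_multiplier  # assumed key name
        x = [v * noise_multiplier for v in x]

    # 2. Script pre-processing would run here (omitted in this sketch).

    # 3. Sampler invocation: the stand-in receives the source latent and the noise.
    samples = denoise_fn(init_latent, x)

    # 4. Mask compositing, only in inpainting mode.
    if mask is not None and nmask is not None:
        blended = [s * n + i * m
                   for s, n, i, m in zip(samples, nmask, init_latent, mask)]
        args = BlendArgs(samples, nmask, init_latent, mask, blended)
        if on_mask_blend is not None:
            on_mask_blend(args)  # a hook may replace args.blended
        samples = args.blended

    # 5. Cleanup: drop the noise; the real method also frees GPU memory.
    del x
    return samples


# Usage: a fake sampler that returns zeros makes the compositing easy to see.
params: Dict[str, float] = {}
result = sample_sketch(
    init_latent=[1.0, 2.0, 3.0, 4.0],
    seed=42,
    noise_multiplier=1.5,
    denoise_fn=lambda init, noise: [0.0] * len(init),
    extra_params=params,
    mask=[1.0, 1.0, 0.0, 0.0],   # 1.0 = keep the original latent
    nmask=[0.0, 0.0, 1.0, 1.0],  # 1.0 = take the denoised sample
)
print(result)  # [1.0, 2.0, 0.0, 0.0]
```

With the zero-returning sampler, the first two elements come back from the original latent and the last two from the "denoised" output, which is the behavior stage 4 describes.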
Usage
This method is called by process_images() once per batch iteration. The call runs inside a torch.no_grad() context, with autocast either enabled or disabled depending on the UNet dtype requirements. The returned latent tensor is then decoded by the VAE to produce pixel-space images.
Code Reference
Source Location
- Repository: stable-diffusion-webui
- File: modules/processing.py
- Lines: 1759-1789
Signature
def sample(
    self,
    conditioning,
    unconditional_conditioning,
    seeds,
    subseeds,
    subseed_strength,
    prompts
) -> torch.Tensor:
Import
from modules.processing import StableDiffusionProcessingImg2Img
# sample() is called as a method: p.sample(conditioning, unconditional_conditioning, seeds, subseeds, subseed_strength, prompts)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| conditioning | tuple | Yes | Positive text conditioning tensors (output of prompt encoding) |
| unconditional_conditioning | tuple | Yes | Negative/unconditional text conditioning tensors |
| seeds | list[int] | Yes | Per-image seeds for the current batch |
| subseeds | list[int] | Yes | Per-image subseeds for the current batch |
| subseed_strength | float | Yes | Strength of subseed blending (0.0 = no blending) |
| prompts | list[str] | Yes | Text prompts for the current batch |
Instance state read:
| Name | Type | Description |
|---|---|---|
| self.rng | ImageRNG | Seeded random number generator for noise |
| self.init_latent | torch.Tensor | VAE-encoded source image latent |
| self.initial_noise_multiplier | float | Noise amplitude scaling factor |
| self.sampler | Sampler | The selected diffusion sampler instance |
| self.image_conditioning | torch.Tensor | Inpainting model conditioning |
| self.mask | torch.Tensor or None | Latent mask (1.0 in preserved regions) |
| self.nmask | torch.Tensor or None | Inverse latent mask (1.0 in regenerated regions) |
| self.scripts | ScriptRunner | Script runner for callbacks |
Outputs
| Name | Type | Description |
|---|---|---|
| samples | torch.Tensor | Denoised latent tensor, shape [B, C, H/8, W/8], ready for VAE decoding |
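The output shape can be worked out from the pixel dimensions alone. The helper below is a hypothetical illustration, not part of the webui: it assumes the factor-of-8 downscale from the table above and the 4 latent channels mentioned in the usage comment, and the divisibility check is an added assumption of this sketch.

```python
from typing import Tuple


def latent_shape(batch_size: int, width: int, height: int,
                 channels: int = 4, factor: int = 8) -> Tuple[int, int, int, int]:
    """Return (B, C, H/factor, W/factor) for a pixel-space image size."""
    if width % factor or height % factor:
        # Assumption of this sketch: sizes are exact multiples of the factor.
        raise ValueError("width and height must be multiples of the downscale factor")
    return (batch_size, channels, height // factor, width // factor)


# A batch of two 512x768 (width x height) images:
print(latent_shape(2, 512, 768))  # (2, 4, 96, 64)
```

Note that height comes before width in the tensor layout, so a portrait image yields a latent that is taller than it is wide.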
Usage Examples
Basic Usage
# Called internally by process_images() in the generation loop:
# (shown here for reference; normally not called directly)
# After p.init() has been called and conditioning is set up:
samples = p.sample(
    conditioning=p.c,
    unconditional_conditioning=p.uc,
    seeds=p.seeds,
    subseeds=p.subseeds,
    subseed_strength=p.subseed_strength,
    prompts=p.prompts,
)
# samples is a torch.Tensor of shape [batch_size, 4, height//8, width//8]
# It is then passed to decode_latent_batch() for VAE decoding
Understanding the Mask Compositing
# Mask compositing restores the original latent wherever self.mask is 1.0:
# Given:
# self.nmask: 1.0 where content should be regenerated
# self.mask: 1.0 where content should be preserved
# self.init_latent: original encoded image
# After denoising:
blended = samples * self.nmask + self.init_latent * self.mask
# Example for a binary mask, per latent element:
# Regenerated element: 1.0 * denoised + 0.0 * original = denoised
# Preserved element: 0.0 * denoised + 1.0 * original = original
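The same formula also covers fractional mask values, where the result is a weighted mix of the two latents. The following is a small torch-free sketch with plain lists in place of tensors; `blend` is a hypothetical helper, it assumes nmask is exactly 1.0 - mask, and whether fractional values actually occur depends on how the mask was prepared.

```python
from typing import List


def blend(samples: List[float], init_latent: List[float], mask: List[float]) -> List[float]:
    """Elementwise samples * nmask + init_latent * mask, with nmask = 1 - mask."""
    nmask = [1.0 - m for m in mask]  # assumption: nmask is the exact inverse
    return [s * n + i * m for s, n, i, m in zip(samples, nmask, init_latent, mask)]


denoised = [8.0, 8.0, 8.0]
original = [4.0, 4.0, 4.0]
mask = [0.0, 0.25, 1.0]  # regenerate, mostly regenerate, preserve

print(blend(denoised, original, mask))  # [8.0, 7.0, 4.0]
```

The middle element shows the mix: 0.75 * 8.0 + 0.25 * 4.0 = 7.0.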