Implementation:AUTOMATIC1111 Stable diffusion webui Img2img init

Knowledge Sources	stable-diffusion-webui
Domains	Image Generation, Variational Autoencoders, Latent Space, Image Processing
Last Updated	2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the AUTOMATIC1111 stable-diffusion-webui repository, that preprocesses input images, computes inpainting masks, encodes images into latent space via the VAE, and constructs overlay images for the img2img pipeline.

Description

The init() method of StableDiffusionProcessingImg2Img performs the complete initialization pipeline that transforms user-provided pixel images and masks into the latent-space tensors required by the diffusion sampler.

The method executes the following stages in order:

1. Sampler creation: Instantiates the selected sampler via sd_samplers.create_sampler().

2. Mask processing: If an image mask is provided, it is converted to binary via create_binary_mask(), optionally inverted, and blurred with separate horizontal/vertical Gaussian kernels using OpenCV. The kernel sizes are computed as 2 * int(2.5 * blur + 0.5) + 1.

3. Crop region computation: For inpaint-full-res mode, masking.get_crop_region_v2() computes the bounding box of the masked area, which is then expanded by masking.expand_crop_region() to match the target aspect ratio. The paste-back coordinates are stored in self.paste_to.

4. Overlay construction: For each init image, an RGBA overlay is built by pasting the original image through the inverted mask, so unmasked regions stay opaque and masked regions become transparent. The overlay is kept for compositing over the generated result later.

5. Content-aware fill: If inpainting_fill != 1, the image is processed through masking.fill() to fill the masked region with plausible content before encoding.

6. Color correction calibration: If enabled, setup_color_correction() captures the LAB histogram of each source image for later histogram matching.

7. VAE encoding: The image batch is converted to a torch tensor, moved to the compute device (typically the GPU), cast to the VAE dtype, and encoded via images_tensor_to_samples() to produce self.init_latent.

8. Latent mask construction: The pixel mask is downsampled to latent dimensions, creating self.mask (preserve regions) and self.nmask (regenerate regions). For fill mode 2, masked latent regions are replaced with random noise; for fill mode 3, they are zeroed.

9. Image conditioning: self.img2img_image_conditioning() produces the conditioning tensor required by inpainting-capable models.
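
The kernel-size formula from the mask processing stage (stage 2) can be worked through by hand. The sketch below is only an illustration of that arithmetic: the helper name blur_kernel_size is hypothetical and does not exist in the repository, and no OpenCV call is made.

```python
def blur_kernel_size(blur: float) -> int:
    """Kernel size for a given blur value, per the stage 2 formula.

    Hypothetical helper: it only mirrors 2 * int(2.5 * blur + 0.5) + 1.
    """
    return 2 * int(2.5 * blur + 0.5) + 1


# mask_blur_x and mask_blur_y each get their own size from the same formula.
for blur in (0, 1, 4, 8):
    size = blur_kernel_size(blur)
    # The "2 * n + 1" form guarantees an odd size, so the kernel has a centre tap.
    print(f"blur={blur} -> kernel size {size}")
```

A blur of 0 yields a size of 1, which leaves the mask unchanged along that axis, and the default-style value of 4 yields 21.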

The helper function create_binary_mask() extracts the alpha channel from RGBA images and thresholds it at 128 (when round=True) to produce a clean binary mask, or preserves the full gradient otherwise.
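
The effect of the round flag can be illustrated on plain alpha values. This is a minimal sketch, not the repository's code: binarize_alpha is a hypothetical stand-in that works on a list of integers rather than a PIL image, and it assumes the comparison at 128 is a strict greater-than.

```python
def binarize_alpha(alpha_values, round_mask=True):
    """Threshold alpha values (0..255) the way the text describes.

    Assumption: values strictly above 128 become 255, everything else 0.
    With round_mask=False the gradient is passed through untouched.
    """
    if not round_mask:
        return list(alpha_values)
    return [255 if a > 128 else 0 for a in alpha_values]


soft_edge = [0, 64, 128, 129, 200, 255]
print(binarize_alpha(soft_edge, round_mask=True))
print(binarize_alpha(soft_edge, round_mask=False))
```

Keeping the gradient (round_mask=False here) is what allows a soft mask edge to survive into later stages instead of being snapped to a hard boundary.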

Usage

This method is called automatically by process_images() at the start of each generation run. It should not normally be called directly, but understanding its behavior is essential for debugging inpainting artifacts, mask boundary issues, and VAE encoding problems.

Code Reference

Source Location

Repository: stable-diffusion-webui
File: modules/processing.py
Lines: 1602-1758 (init method), 90-98 (create_binary_mask)

Signature

def init(self, all_prompts: list[str], all_seeds: list[int], all_subseeds: list[int]) -> None:

def create_binary_mask(image, round=True):

Import

from modules.processing import StableDiffusionProcessingImg2Img
# init() is called as a method: p.init(all_prompts, all_seeds, all_subseeds)

from modules.processing import create_binary_mask

I/O Contract

Inputs

Name	Type	Required	Description
all_prompts	list[str]	Yes	All prompts for the batch
all_seeds	list[int]	Yes	All seeds for the batch (used for random tensor creation in fill mode 2)
all_subseeds	list[int]	Yes	All subseeds for the batch

Instance state read:

Name	Type	Description
self.init_images	list[PIL.Image]	Source images to encode
self.image_mask	PIL.Image or None	Inpainting mask (set from mask parameter in __post_init__)
self.mask_blur_x, self.mask_blur_y	int	Gaussian blur radii for mask
self.inpainting_fill	int	Fill mode: 0=fill, 1=original, 2=latent noise, 3=latent nothing
self.inpaint_full_res	bool	Whether to crop to mask region for full-res inpainting
self.inpaint_full_res_padding	int	Padding around masked crop region
self.resize_mode	int	Image resize strategy
self.mask_round	bool	Whether to binarize the mask
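
The integer codes for self.inpainting_fill are easier to reason about with names attached. The sketch below is illustrative only: the enum and both helper functions are hypothetical and are not defined in modules/processing.py; they simply restate the rules from stages 5 and 8 of the description, which apply when a mask is provided.

```python
from enum import IntEnum


class InpaintingFill(IntEnum):
    """Hypothetical names for the integer fill-mode codes."""
    FILL = 0
    ORIGINAL = 1
    LATENT_NOISE = 2
    LATENT_NOTHING = 3


def runs_pixel_fill(mode: int) -> bool:
    # Stage 5: masking.fill() runs whenever inpainting_fill != 1.
    return mode != InpaintingFill.ORIGINAL


def latent_treatment(mode: int) -> str:
    # Stage 8: only modes 2 and 3 alter the masked latent regions.
    if mode == InpaintingFill.LATENT_NOISE:
        return "noise"
    if mode == InpaintingFill.LATENT_NOTHING:
        return "zero"
    return "unchanged"


for mode in InpaintingFill:
    print(mode.name, runs_pixel_fill(mode), latent_treatment(mode))
```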

Outputs

Name	Type	Description
return	None	Method modifies instance state in-place
self.init_latent	torch.Tensor	VAE-encoded latent tensor, shape [B, C, H/8, W/8]
self.nmask	torch.Tensor	Latent mask (1.0 in regions to regenerate), shape [C, H/8, W/8]
self.mask	torch.Tensor	Inverse latent mask (1.0 in regions to preserve), shape [C, H/8, W/8]
self.image_conditioning	torch.Tensor	Model-specific conditioning tensor
self.overlay_images	list[PIL.Image]	RGBA overlays of original unmasked regions
self.mask_for_overlay	PIL.Image	Pixel-space mask for post-generation compositing
self.paste_to	tuple or None	(x, y, w, h) coordinates for pasting back cropped inpaint results
self.color_corrections	list	LAB histogram targets for color correction
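
The relationship between self.mask, self.nmask, and the fill modes can be sketched on a flattened toy latent. This is an assumption-laden illustration rather than the actual implementation: blend_latent is a hypothetical helper, plain Python lists stand in for torch tensors, and the blending formula is one straightforward way to realise "replace with noise" and "zero out" as described in stage 8.

```python
def blend_latent(init_latent, mask, nmask, fill_mode, noise=None):
    """Apply the latent-space part of a fill mode to a flat list of values.

    mask  is 1.0 where the original latent is preserved.
    nmask is 1.0 where content is regenerated (nmask = 1 - mask).
    """
    if fill_mode == 2:  # latent noise: masked values replaced by noise
        return [x * m + n * nm
                for x, m, n, nm in zip(init_latent, mask, noise, nmask)]
    if fill_mode == 3:  # latent nothing: masked values zeroed
        return [x * m for x, m in zip(init_latent, mask)]
    return list(init_latent)  # modes 0 and 1 leave the latent as encoded


latent = [0.5, 1.0, 2.0, 0.25]
nmask = [0.0, 1.0, 1.0, 0.0]      # middle two positions are regenerated
mask = [1.0 - v for v in nmask]   # complement: positions to preserve
noise = [9.0, 9.0, 9.0, 9.0]      # stand-in for seeded random values

print(blend_latent(latent, mask, nmask, fill_mode=2, noise=noise))
print(blend_latent(latent, mask, nmask, fill_mode=3))
```

Because mask and nmask are complements, every latent position is governed by exactly one of them, which is why preserved regions pass through unchanged in both modes.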

Usage Examples

Basic Usage

from modules import shared
from modules.processing import StableDiffusionProcessingImg2Img, process_images

# pil_image and mask_image are PIL images supplied by the caller.
# A checkpoint must already be loaded so that shared.sd_model is set.
# Normally called internally by process_images():
p = StableDiffusionProcessingImg2Img(
    sd_model=shared.sd_model,
    init_images=[pil_image],
    mask=mask_image,
    mask_blur=4,
    inpainting_fill=0,
    denoising_strength=0.75,
    width=512,
    height=512,
)

# init() is called automatically by process_images(),
# but can be called manually for inspection:
p.init(
    all_prompts=["a painting of a cat"],
    all_seeds=[42],
    all_subseeds=[0],
)

# After init(), the following are available:
print(p.init_latent.shape)   # torch.Size([1, 4, 64, 64])
print(p.nmask.shape)         # torch.Size([4, 64, 64]) if mask was provided
print(p.overlay_images)      # list of RGBA PIL images
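
The shapes printed above follow from the 8x spatial downsampling stated in the I/O contract. The helper below is a hypothetical convenience for sanity-checking sizes while debugging; the 4 latent channels are taken from the example output above and are an assumption that may not hold for every model family.

```python
def expected_latent_shape(batch, width, height, channels=4, factor=8):
    """Expected shape of init_latent as (B, C, H/8, W/8).

    Hypothetical helper; channels=4 and factor=8 are assumptions
    based on the example output and the documented tensor shapes.
    """
    return (batch, channels, height // factor, width // factor)


def expected_mask_shape(width, height, channels=4, factor=8):
    """Expected shape of mask / nmask, which carry no batch dimension."""
    return (channels, height // factor, width // factor)


print(expected_latent_shape(batch=1, width=512, height=512))
print(expected_mask_shape(width=512, height=512))
print(expected_latent_shape(batch=2, width=768, height=512))
```

Note that height comes before width in the tensor layout, so a 768x512 (width x height) request produces a latent that is 64 rows by 96 columns.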

create_binary_mask Usage

from modules.processing import create_binary_mask
from PIL import Image

# Convert an RGBA mask to binary L-mode image
rgba_mask = Image.open("mask.png")  # RGBA with alpha channel
binary_mask = create_binary_mask(rgba_mask, round=True)
# Result: L-mode image with values 0 or 255

Related Pages

Implements Principle

Principle:AUTOMATIC1111_Stable_diffusion_webui_Image_preprocessing_and_latent_encoding
