

Implementation:AUTOMATIC1111 Stable diffusion webui Img2img init



Knowledge Sources
Domains: Image Generation, Variational Autoencoders, Latent Space, Image Processing
Last Updated: 2026-02-08 00:00 GMT

Overview

A concrete implementation, from the AUTOMATIC1111 stable-diffusion-webui repository, that preprocesses input images, computes inpainting masks, encodes images into latent space via the VAE, and constructs overlay images for the img2img pipeline.

Description

The init() method of StableDiffusionProcessingImg2Img performs the complete initialization pipeline that transforms user-provided pixel images and masks into the latent-space tensors required by the diffusion sampler.

The method executes the following stages in order:

1. Sampler creation: Instantiates the selected sampler via sd_samplers.create_sampler().

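2. Mask processing: If an image mask is provided, it is converted to a single-channel mask via create_binary_mask(), inverted if inpainting_mask_invert is set, and blurred with OpenCV's cv2.GaussianBlur in two separable passes: a horizontal pass with radius mask_blur_x and a vertical pass with radius mask_blur_y. Each pass is skipped when its radius is 0, uses the radius as the Gaussian sigma, and uses a kernel length of 2 * int(2.5 * blur + 0.5) + 1, where blur is that pass's radius.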

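3. Crop region computation: In inpaint-full-res mode, masking.get_crop_region_v2() computes the bounding box of the masked area, padded by inpaint_full_res_padding. masking.expand_crop_region() then expands that box to match the aspect ratio of the target width and height. The cropped mask is resized to the target size, and the paste-back rectangle is stored in self.paste_to as (x, y, w, h). If the mask is blank, no crop region exists and init() falls back to plain img2img.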

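4. Overlay construction: When a mask is present, an RGBA overlay is built for each init image by pasting the original image onto a transparent canvas through the inverted mask. Unmasked pixels stay opaque and masked pixels become transparent, so the overlay can later be composited over the generated result to restore the untouched regions.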

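5. Masked-region pre-fill: When a mask is present and inpainting_fill != 1 (any mode other than "original"), the image is passed through masking.fill() before encoding. This function replaces the masked region with colors smeared in from the surrounding unmasked pixels by compositing progressively smaller Gaussian blurs. It is a rough color fill rather than true content-aware synthesis.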

6. Color correction calibration: If img2img color correction is enabled in the settings, setup_color_correction() converts each prepared init image to the LAB color space and stores it as the target for histogram matching after generation.

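7. VAE encoding: Each image is converted to a float32 array scaled to [0, 1] in channels-first layout. A single init image is repeated to fill the batch, and passing more images than batch_size raises an error. The batch is converted to a torch tensor, moved to the compute device (shared.device) with the VAE dtype, and encoded via images_tensor_to_samples() to produce self.init_latent. For resize_mode 3 (latent upscale), the encoded latent is then resized bilinearly to the target latent size.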

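8. Latent mask construction: The pixel mask is resized to the latent resolution (H/8, W/8), scaled to [0, 1], rounded when mask_round is set, and tiled across the latent channels. self.nmask holds this mask (1.0 in regions to regenerate), and self.mask holds 1.0 - nmask (1.0 in regions to preserve). For fill mode 2 (latent noise), the masked latent region is replaced with random noise seeded from all_seeds: init_latent * mask + noise * nmask. For fill mode 3 (latent nothing), it is zeroed: init_latent * mask.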

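9. Image conditioning: self.img2img_image_conditioning() builds the model-specific conditioning tensor. It dispatches on the model type (depth-to-image, instruct-pix2pix-style edit, inpainting, unCLIP) and returns a dummy all-zero tensor for models that need no extra image conditioning.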
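Steps 7 and 8 can be sketched in plain numpy. This is a minimal illustration under stated assumptions, not the webui's code. It assumes a VAE with 4 latent channels and an 8x downsampling factor, uses block averaging in place of the PIL resize the webui applies to the mask, and draws fill-mode-2 noise from a single numpy seed instead of the webui's per-image seeded tensors. The helper names to_batch, latent_masks and apply_fill_mode are hypothetical.

```python
import numpy as np

# Assumptions for this sketch: a 4-channel VAE latent and an 8x downsampling factor.
LATENT_CHANNELS = 4
LATENT_FACTOR = 8


def to_batch(images, batch_size):
    """Step 7 preprocessing: (H, W, 3) uint8 images -> (B, 3, H, W) float32 in [0, 1].

    A single image is repeated to fill the batch; too many images is an error.
    """
    arrays = [np.moveaxis(np.asarray(img, dtype=np.float32) / 255.0, 2, 0) for img in images]
    if len(arrays) == 1:
        return np.repeat(arrays[0][None], batch_size, axis=0)
    if len(arrays) <= batch_size:
        return np.stack(arrays)
    raise ValueError(f"bad number of images: {len(arrays)}; expecting {batch_size} or less")


def latent_masks(pixel_mask, round_mask=True):
    """Step 8: (H, W) uint8 mask, 255 = regenerate -> (mask, nmask), each (C, H/8, W/8).

    Block averaging stands in for the PIL resize used by the webui.
    """
    h, w = pixel_mask.shape
    f = LATENT_FACTOR
    small = pixel_mask.astype(np.float32).reshape(h // f, f, w // f, f).mean(axis=(1, 3)) / 255.0
    if round_mask:
        small = np.around(small)  # snap to 0 or 1, as with mask_round
    nmask = np.tile(small[None], (LATENT_CHANNELS, 1, 1))  # 1.0 = regenerate
    return 1.0 - nmask, nmask  # (preserve, regenerate)


def apply_fill_mode(init_latent, mask, nmask, fill_mode, seed):
    """Mode 2 swaps seeded Gaussian noise into the masked region; mode 3 zeroes it."""
    if fill_mode == 2:
        noise = np.random.default_rng(seed).standard_normal(init_latent.shape)
        return init_latent * mask + noise.astype(init_latent.dtype) * nmask
    if fill_mode == 3:
        return init_latent * mask
    return init_latent  # modes 0 and 1 leave the encoded latent unchanged here
```

For a 512x512 input, latent_masks returns two arrays of shape (4, 64, 64). Because mask and nmask sum to 1 everywhere, fill mode 2 keeps the encoded latent outside the mask and substitutes noise inside it; with a blurred, unrounded mask the two blend along the boundary.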

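The helper function create_binary_mask() handles an RGBA mask whose alpha channel is not fully opaque by extracting that alpha channel. With round=True it maps alpha values above 128 to 255 and everything else to 0, producing a clean binary mask; with round=False it keeps the full gradient. Any other image (non-RGBA, or RGBA with a fully opaque alpha channel) is simply converted to grayscale (L mode) without thresholding.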
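A hypothetical numpy analogue of create_binary_mask(), operating on an (H, W, 4) uint8 array instead of a PIL image, can make the branching concrete. As we understand the webui's behavior, the alpha channel is used only when it is not fully opaque; otherwise the image is reduced to grayscale without thresholding. The name binary_mask_from_rgba is illustrative, and the grayscale fallback uses the ITU-R 601-2 luma weights, which only approximates PIL's integer L conversion.

```python
import numpy as np


def binary_mask_from_rgba(rgba, round_mask=True):
    """Hypothetical numpy analogue of create_binary_mask() for an (H, W, 4) uint8 array.

    Returns an (H, W) uint8 mask.
    """
    rgba = np.asarray(rgba, dtype=np.uint8)
    alpha = rgba[..., 3]
    if alpha.min() == 255:
        # Fully opaque alpha carries no mask information: fall back to luma
        # (approximating PIL's "L" conversion) with no thresholding.
        rgb = rgba[..., :3].astype(np.float64)
        luma = rgb @ np.array([0.299, 0.587, 0.114])
        return np.clip(np.round(luma), 0, 255).astype(np.uint8)
    if round_mask:
        # Strictly greater than 128 becomes 255; everything else becomes 0.
        return np.where(alpha > 128, 255, 0).astype(np.uint8)
    return alpha.copy()  # keep the full alpha gradient
```

Note that a mask painted as an opaque black-and-white RGB image takes the grayscale branch, so soft edges in it survive even with round_mask=True.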

Usage

This method is called automatically by process_images() at the start of each generation run. It should not normally be called directly, but understanding its behavior is essential for debugging inpainting artifacts, mask boundary issues, and VAE encoding problems.

Code Reference

Source Location

  • Repository: stable-diffusion-webui
  • File: modules/processing.py
  • Lines: 1602-1758 (init method), 90-98 (create_binary_mask)

Signature

def init(self, all_prompts: list[str], all_seeds: list[int], all_subseeds: list[int]) -> None:
def create_binary_mask(image, round=True):

Import

from modules.processing import StableDiffusionProcessingImg2Img
# init() is called as a method: p.init(all_prompts, all_seeds, all_subseeds)

from modules.processing import create_binary_mask

I/O Contract

Inputs

Name | Type | Required | Description
all_prompts | list[str] | Yes | All prompts for the batch (part of the shared init() signature; the fill-mode-2 latent noise is seeded from all_seeds, not from prompts)
all_seeds | list[int] | Yes | All seeds for the batch (used to create the random latent noise in fill mode 2)
all_subseeds | list[int] | Yes | All subseeds for the batch

Instance state read:

Name | Type | Description
self.init_images | list[PIL.Image] | Source images to encode
self.image_mask | PIL.Image or None | Inpainting mask (set from the mask parameter in __post_init__)
self.mask_blur_x, self.mask_blur_y | int | Horizontal and vertical Gaussian blur radii (sigma) for the mask; 0 skips that pass
self.inpainting_fill | int | Fill mode: 0 = fill, 1 = original, 2 = latent noise, 3 = latent nothing
self.inpainting_mask_invert | int | Nonzero to invert the mask (inpaint the unmasked area)
self.inpaint_full_res | bool | Whether to crop to the mask region for full-resolution inpainting
self.inpaint_full_res_padding | int | Padding, in pixels, around the masked crop region
self.resize_mode | int | Image resize strategy
self.mask_round | bool | Whether to binarize the mask (also rounds the latent mask)
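The mask_blur_x and mask_blur_y radii act as Gaussian sigmas. The sketch below computes the kernel length from the formula 2 * int(2.5 * blur + 0.5) + 1 and applies a separable horizontal-then-vertical blur in numpy. It is an illustration, not the webui's code: the webui calls cv2.GaussianBlur with kernel sizes (k, 1) and (1, k), whose default border mode reflects pixels, while this sketch repeats edge pixels, so values near the image border can differ slightly. All function names are hypothetical.

```python
import numpy as np


def blur_kernel_size(radius: float) -> int:
    """Odd kernel length for a blur radius: 2 * int(2.5 * radius + 0.5) + 1 (about +/- 2.5 sigma)."""
    return 2 * int(2.5 * radius + 0.5) + 1


def gaussian_kernel_1d(radius: float) -> np.ndarray:
    """Normalized 1-D Gaussian with sigma = radius and the length given above."""
    half = blur_kernel_size(radius) // 2
    x = np.arange(-half, half + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * radius ** 2))
    return k / k.sum()


def _blur_axis(arr: np.ndarray, radius: float, axis: int) -> np.ndarray:
    k = gaussian_kernel_1d(radius)
    half = len(k) // 2

    def blur_line(line):
        # Edge padding keeps the output the same length as the input line.
        return np.convolve(np.pad(line, half, mode="edge"), k, mode="valid")

    return np.apply_along_axis(blur_line, axis, arr)


def blur_mask(mask: np.ndarray, blur_x: float, blur_y: float) -> np.ndarray:
    """Horizontal pass (blur_x), then vertical pass (blur_y); a radius of 0 skips that pass."""
    out = mask.astype(np.float64)
    if blur_x > 0:
        out = _blur_axis(out, blur_x, axis=1)
    if blur_y > 0:
        out = _blur_axis(out, blur_y, axis=0)
    return np.clip(np.round(out), 0, 255).astype(np.uint8)
```

With the default radius of 4, blur_kernel_size(4) is 21, so each pass averages over 21 pixels. Keeping the two passes separate is what allows different horizontal and vertical radii.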

Outputs

Name | Type | Description
return | None | The method modifies instance state in place
self.init_latent | torch.Tensor | VAE-encoded latent tensor, shape [B, C, H/8, W/8]
self.nmask | torch.Tensor or None | Latent mask (1.0 in regions to regenerate), shape [C, H/8, W/8]; None when no mask is given
self.mask | torch.Tensor or None | Inverse latent mask (1.0 in regions to preserve), shape [C, H/8, W/8]; None when no mask is given
self.image_conditioning | torch.Tensor | Model-specific conditioning tensor
self.overlay_images | list[PIL.Image] | RGBA overlays of the original unmasked regions
self.mask_for_overlay | PIL.Image | Pixel-space mask for post-generation compositing
self.paste_to | tuple or None | (x, y, w, h) rectangle for pasting back cropped inpaint results
self.color_corrections | list or None | LAB-converted images used as histogram-matching targets for color correction
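The self.paste_to rectangle comes from the full-resolution crop computation. The following numpy sketch is a simplified, hypothetical re-implementation of that geometry, not the webui's code. bbox_with_padding takes the bounding box of nonzero mask pixels, grows it by the padding, and clamps it to the image (returning None for a blank mask). expand_to_ratio follows the idea behind masking.expand_crop_region(): grow the shorter side symmetrically until the region matches the processing aspect ratio, then shift it back inside the image. Edge-case behavior may differ from the real functions.

```python
import numpy as np


def bbox_with_padding(mask, pad=0):
    """(x1, y1, x2, y2) of nonzero pixels in an (H, W) mask, padded and clamped; None if blank."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    h, w = mask.shape
    x1 = max(int(xs.min()) - pad, 0)
    y1 = max(int(ys.min()) - pad, 0)
    x2 = min(int(xs.max()) + 1 + pad, w)
    y2 = min(int(ys.max()) + 1 + pad, h)
    return x1, y1, x2, y2


def _fit(lo, hi, size):
    """Shift the span [lo, hi) back inside [0, size), then clamp if it is still too large."""
    if hi > size:
        lo, hi = lo - (hi - size), size
    if lo < 0:
        lo, hi = 0, hi - lo
    return lo, min(hi, size)


def expand_to_ratio(region, proc_w, proc_h, img_w, img_h):
    """Grow the shorter side of region so its aspect ratio matches proc_w / proc_h."""
    x1, y1, x2, y2 = region
    target = proc_w / proc_h
    if (x2 - x1) / (y2 - y1) > target:
        # Too wide for the target ratio: grow vertically.
        diff = int((x2 - x1) / target - (y2 - y1))
        y1, y2 = _fit(y1 - diff // 2, y2 + diff - diff // 2, img_h)
    else:
        # Too tall (or already matching): grow horizontally.
        diff = int((y2 - y1) * target - (x2 - x1))
        x1, x2 = _fit(x1 - diff // 2, x2 + diff - diff // 2, img_w)
    return x1, y1, x2, y2


mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:50, 20:60] = 255  # a 40x10 stroke
x1, y1, x2, y2 = expand_to_ratio(bbox_with_padding(mask), 64, 64, 100, 100)
paste_to = (x1, y1, x2 - x1, y2 - y1)
print(paste_to)  # (20, 25, 40, 40)
```

The 40x10 stroke is grown into a 40x40 square so that it can be resized to the 64x64 processing size without distortion; paste_to then records where the result goes back.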

Usage Examples

Basic Usage

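from PIL import Image
import torch

from modules import devices, shared
from modules.processing import StableDiffusionProcessingImg2Img

# Assumes a running webui environment with a model loaded into shared.sd_model.
# "input.png" and "mask.png" are placeholder paths.
pil_image = Image.open("input.png").convert("RGB")
mask_image = Image.open("mask.png")  # white pixels mark the area to inpaint

p = StableDiffusionProcessingImg2Img(
    sd_model=shared.sd_model,
    init_images=[pil_image],
    mask=mask_image,
    mask_blur=4,
    inpainting_fill=0,
    denoising_strength=0.75,
    width=512,
    height=512,
)

# init() is normally called by process_images(); a manual call is only useful
# for inspection. Mirror process_images() by disabling autograd and using autocast.
with torch.no_grad(), devices.autocast():
    p.init(
        all_prompts=["a painting of a cat"],
        all_seeds=[42],
        all_subseeds=[0],
    )

# After init(), the following are available:
print(p.init_latent.shape)   # torch.Size([1, 4, 64, 64]) for a 4-channel VAE
print(p.nmask.shape)         # torch.Size([4, 64, 64]) if a mask was provided
print(p.overlay_images)      # list of RGBA PIL images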
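Each entry of p.overlay_images keeps the original pixels opaque outside the mask and transparent inside it, so laying it over the generated image restores the untouched regions. The following is a minimal numpy sketch of that idea, assuming straight (non-premultiplied) alpha and a uint8 mask where 255 marks the inpainted area. The webui itself does this with PIL image operations; build_overlay and composite are hypothetical names.

```python
import numpy as np


def build_overlay(image_rgb, pixel_mask):
    """RGBA overlay: original colors with alpha = 255 - mask (masked pixels transparent)."""
    image_rgb = np.asarray(image_rgb, dtype=np.uint8)
    alpha = 255 - np.asarray(pixel_mask, dtype=np.uint8)
    return np.dstack([image_rgb, alpha])


def composite(generated_rgb, overlay_rgba):
    """Alpha-composite the overlay over the generated image (straight alpha)."""
    a = overlay_rgba[..., 3:4].astype(np.float64) / 255.0
    out = overlay_rgba[..., :3] * a + generated_rgb * (1.0 - a)
    return np.round(out).astype(np.uint8)
```

With a blurred mask, the alpha ramps smoothly across the boundary, which is why mask_blur softens the seam between original and generated pixels.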

create_binary_mask Usage

from modules.processing import create_binary_mask
from PIL import Image

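# Convert a mask image to an L-mode (grayscale) mask
rgba_mask = Image.open("mask.png")  # e.g. an RGBA mask whose alpha channel marks the area
binary_mask = create_binary_mask(rgba_mask, round=True)
# If the alpha channel is not fully opaque, the result is that channel thresholded to
# 0 or 255 (values above 128 become 255). A non-RGBA or fully opaque image is only
# converted to grayscale and is not thresholded.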

Related Pages

Implements Principle
