Implementation:AUTOMATIC1111 Stable diffusion webui Img2img init
| Knowledge Sources | |
|---|---|
| Domains | Image Generation, Variational Autoencoders, Latent Space, Image Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the AUTOMATIC1111 stable-diffusion-webui repository, that prepares inputs for the img2img pipeline: it preprocesses input images, computes inpainting masks, encodes images into latent space via the VAE, and constructs overlay images.
Description
The init() method of StableDiffusionProcessingImg2Img performs the complete initialization pipeline that transforms user-provided pixel images and masks into the latent-space tensors required by the diffusion sampler.
The method executes the following stages in order:
1. Sampler creation: Instantiates the selected sampler via sd_samplers.create_sampler().
2. Mask processing: If an image mask is provided, it is converted to binary via create_binary_mask(), optionally inverted, and blurred with separate horizontal and vertical Gaussian kernels using OpenCV. Each kernel size is computed as 2 * int(2.5 * blur + 0.5) + 1, where blur is mask_blur_x for the horizontal pass and mask_blur_y for the vertical pass.
3. Crop region computation: For inpaint-full-res mode, masking.get_crop_region_v2() computes the bounding box of the masked area, which is then expanded by masking.expand_crop_region() to match the target aspect ratio. The paste-back coordinates are stored in self.paste_to.
4. Overlay construction: For each init image, an RGBA overlay is constructed by compositing the original image with an inverted mask, creating a transparent overlay that preserves unmasked regions for later compositing.
5. Content-aware fill: If a mask is present and inpainting_fill != 1 (any fill mode other than "original"), the image is processed through masking.fill() to fill the masked region with plausible content before encoding.
6. Color correction calibration: If enabled, setup_color_correction() captures the LAB histogram of each source image for later histogram matching.
7. VAE encoding: The image batch is converted to a torch tensor, moved to the GPU with VAE dtype, and encoded via images_tensor_to_samples() to produce self.init_latent.
8. Latent mask construction: The pixel mask is downsampled to latent dimensions, creating self.mask (preserve regions) and self.nmask (regenerate regions). For fill mode 2, masked latent regions are replaced with random noise; for fill mode 3, they are zeroed.
9. Image conditioning: self.img2img_image_conditioning() produces the conditioning tensor required by inpainting-capable models.
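The kernel-size formula from stage 2 always produces an odd width, which suits a centered Gaussian kernel (OpenCV's GaussianBlur expects odd kernel dimensions). The following is a minimal sketch of the arithmetic only; the helper name `blur_kernel_size` is hypothetical and is not a function in the repository.

```python
def blur_kernel_size(blur: float) -> int:
    """Return the kernel width given by 2 * int(2.5 * blur + 0.5) + 1."""
    return 2 * int(2.5 * blur + 0.5) + 1


# Kernel widths for a few blur values (hypothetical helper, illustration only).
for blur in (1, 4, 8):
    print(blur, blur_kernel_size(blur))  # prints 1 7, then 4 21, then 8 41
```

Because the horizontal and vertical passes are separate, mask_blur_x and mask_blur_y would each go through this formula independently.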
The helper function create_binary_mask() extracts the alpha channel from RGBA images. When round=True, it thresholds the alpha values at 128 to produce a clean binary mask; when round=False, it preserves the full alpha gradient.
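The thresholding described above can be illustrated on a plain list of alpha values, without PIL. This is a sketch only: `binarize_alpha` is a hypothetical name, and treating the cut as a strict `> 128` comparison is an assumption, since the description says only that the threshold is 128.

```python
def binarize_alpha(alpha_values: list[int], round: bool = True) -> list[int]:
    """Map 0-255 alpha values to a binary mask, or pass them through unchanged.

    Assumption: values strictly above 128 become 255 and the rest become 0.
    The parameter name mirrors create_binary_mask() and shadows the builtin.
    """
    if not round:
        # Keep the full gradient, as described for round=False.
        return list(alpha_values)
    return [255 if a > 128 else 0 for a in alpha_values]


alphas = [0, 64, 128, 129, 255]
hard = binarize_alpha(alphas)               # every value is 0 or 255
soft = binarize_alpha(alphas, round=False)  # gradient preserved
```

A hard mask gives crisp inpainting boundaries, while the soft variant keeps partial alpha values available to later stages.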
Usage
This method is called automatically by process_images() at the start of each generation run. It should not normally be called directly, but understanding its behavior is essential for debugging inpainting artifacts, mask boundary issues, and VAE encoding problems.
Code Reference
Source Location
- Repository: stable-diffusion-webui
- File: modules/processing.py
- Lines: 1602-1758 (init method), 90-98 (create_binary_mask)
Signature
def init(self, all_prompts: list[str], all_seeds: list[int], all_subseeds: list[int]) -> None:
def create_binary_mask(image, round=True):
Import
from modules.processing import StableDiffusionProcessingImg2Img
# init() is called as a method: p.init(all_prompts, all_seeds, all_subseeds)
from modules.processing import create_binary_mask
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| all_prompts | list[str] | Yes | All prompts for the batch |
| all_seeds | list[int] | Yes | All seeds for the batch (used for random tensor creation in fill mode 2) |
| all_subseeds | list[int] | Yes | All subseeds for the batch |
Instance state read:
| Name | Type | Description |
|---|---|---|
| self.init_images | list[PIL.Image] | Source images to encode |
| self.image_mask | PIL.Image or None | Inpainting mask (set from mask parameter in __post_init__) |
| self.mask_blur_x, self.mask_blur_y | int | Gaussian blur radii for mask |
| self.inpainting_fill | int | Fill mode: 0=fill, 1=original, 2=latent noise, 3=latent nothing |
| self.inpaint_full_res | bool | Whether to crop to mask region for full-res inpainting |
| self.inpaint_full_res_padding | int | Padding around masked crop region |
| self.resize_mode | int | Image resize strategy |
| self.mask_round | bool | Whether to binarize the mask |
Outputs
| Name | Type | Description |
|---|---|---|
| return | None | Method modifies instance state in-place |
| self.init_latent | torch.Tensor | VAE-encoded latent tensor, shape [B, C, H/8, W/8] |
| self.nmask | torch.Tensor | Latent mask (1.0 in regions to regenerate), shape [C, H/8, W/8] |
| self.mask | torch.Tensor | Inverse latent mask (1.0 in regions to preserve), shape [C, H/8, W/8] |
| self.image_conditioning | torch.Tensor | Model-specific conditioning tensor |
| self.overlay_images | list[PIL.Image] | RGBA overlays of original unmasked regions |
| self.mask_for_overlay | PIL.Image | Pixel-space mask for post-generation compositing |
| self.paste_to | tuple or None | (x, y, w, h) coordinates for pasting back cropped inpaint results |
| self.color_corrections | list | LAB histogram targets for color correction |
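The relationship between self.mask, self.nmask, and the two latent fill modes can be sketched on flat Python lists. This is an illustration under stated assumptions, not the repository code: `apply_latent_fill` is a hypothetical helper, the lists stand in for tensors, `random.Random` stands in for the seeded noise generator, and the blend formula (keep the latent where mask is 1, use noise or zero where nmask is 1) is inferred from the stage descriptions on this page.

```python
import random


def apply_latent_fill(latent, nmask, fill_mode, seed=0):
    """Blend a flat latent with noise or zeros according to the fill mode.

    latent and nmask are equal-length lists of floats; nmask is 1.0 where
    content is regenerated, and mask is its complement (1.0 where preserved).
    """
    mask = [1.0 - m for m in nmask]
    if fill_mode == 2:
        # Latent noise: seeded Gaussian noise replaces the masked positions.
        rng = random.Random(seed)
        noise = [rng.gauss(0.0, 1.0) for _ in latent]
        return [x * k + n * m for x, k, n, m in zip(latent, mask, noise, nmask)]
    if fill_mode == 3:
        # Latent nothing: masked positions are zeroed.
        return [x * k for x, k in zip(latent, mask)]
    # Modes 0 and 1 leave the encoded latent unchanged at this stage.
    return list(latent)


latent = [0.5, -1.0, 2.0, 0.25]
nmask = [0.0, 1.0, 1.0, 0.0]
zeroed = apply_latent_fill(latent, nmask, fill_mode=3)
noised = apply_latent_fill(latent, nmask, fill_mode=2, seed=42)
```

In both modes the positions where nmask is 0.0 keep their encoded values, which is why unmasked regions survive into sampling.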
Usage Examples
Basic Usage
from modules import shared
from modules.processing import StableDiffusionProcessingImg2Img, process_images
# pil_image and mask_image are PIL images supplied by the caller.
# Normally this object is passed to process_images(), which calls init() itself:
p = StableDiffusionProcessingImg2Img(
    sd_model=shared.sd_model,
    init_images=[pil_image],
    mask=mask_image,
    mask_blur=4,
    inpainting_fill=0,
    denoising_strength=0.75,
    width=512,
    height=512,
)
# init() can also be called manually for inspection:
p.init(
    all_prompts=["a painting of a cat"],
    all_seeds=[42],
    all_subseeds=[0],
)
# After init(), the following are available:
print(p.init_latent.shape)  # torch.Size([1, 4, 64, 64])
print(p.nmask.shape)        # torch.Size([4, 64, 64]) if mask was provided
print(p.overlay_images)     # list of RGBA PIL images
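The shape printed for init_latent above follows from the [B, C, H/8, W/8] layout listed in the Outputs table. A small sketch of that arithmetic is shown here; `latent_shape` is a hypothetical helper, and the defaults of 4 channels and a downscale factor of 8 are assumptions taken from the example output, so other model families may differ.

```python
def latent_shape(batch: int, width: int, height: int,
                 channels: int = 4, factor: int = 8) -> tuple[int, int, int, int]:
    """Return the expected (B, C, H/factor, W/factor) shape for a pixel size."""
    return (batch, channels, height // factor, width // factor)


square = latent_shape(batch=1, width=512, height=512)  # (1, 4, 64, 64)
wide = latent_shape(batch=2, width=768, height=512)    # (2, 4, 64, 96)
```

Note that height comes before width in the result, so a 768x512 request yields 64 rows and 96 columns in latent space.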
create_binary_mask Usage
from modules.processing import create_binary_mask
from PIL import Image
# Convert an RGBA mask to binary L-mode image
rgba_mask = Image.open("mask.png") # RGBA with alpha channel
binary_mask = create_binary_mask(rgba_mask, round=True)
# Result: L-mode image with values 0 or 255