Principle:AUTOMATIC1111 Stable diffusion webui Input image mode configuration
| Knowledge Sources | |
|---|---|
| Domains | Image Generation, Image Editing, Inpainting, Diffusion Models |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Input image mode configuration defines how the source image is interpreted and preprocessed before entering the image-to-image diffusion pipeline, dispatching among standard transformation, sketch-based input, masked inpainting, and batch processing modes.
Description
In an image-to-image generation workflow, the user can supply visual input through several distinct modalities. Each modality determines how the source pixels are extracted, whether a mask is computed, and how that mask is blended before the image enters the latent encoding stage.
The fundamental modes are:
- Standard img2img (mode 0): The user provides a single image. No mask is generated; the entire image is encoded and serves as the starting point for the diffusion process. The denoising strength controls how far the output may diverge from the input.
- Sketch (mode 1): The user provides a sketch image drawn on a canvas. Like standard mode, no mask is generated, but the source is expected to be a hand-drawn sketch rather than a photograph.
- Inpaint (mode 2): The user provides an image together with a painted mask region. A binary mask is created from the alpha or luminance channel, indicating which regions should be regenerated. Only masked areas are denoised; unmasked areas are preserved from the original.
- Inpaint sketch (mode 3): The user modifies the original image by painting colored strokes. The mask is computed automatically by detecting pixel differences between the modified and original images. A mask alpha parameter controls the blending strength, and Gaussian blur smooths the transition boundary.
- Inpaint upload mask (mode 4): The user provides both the source image and a separate mask image. This is useful for programmatic workflows where the mask is generated externally.
- Batch (mode 5): Multiple images are processed sequentially through the pipeline, each with optional per-image masks and metadata extracted from PNG info.
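The binary mask for inpaint mode (mode 2) can be sketched in NumPy as follows. This is a minimal sketch, not the webui's code: the helper name `binary_mask_from_canvas`, the rule of preferring the alpha channel only when it is not fully opaque, and the threshold of 128 are all assumptions made for illustration.

```python
import numpy as np


def binary_mask_from_canvas(canvas: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Return an H x W uint8 mask (0 or 255) from an H x W x 4 RGBA or H x W x 3 RGB array.

    Hypothetical helper: the alpha channel is used when it carries information
    (not fully opaque); otherwise the luminance of the RGB channels is used.
    """
    arr = np.asarray(canvas)
    if arr.shape[-1] == 4 and not np.all(arr[..., 3] == 255):
        channel = arr[..., 3].astype(np.float64)
    else:
        rgb = arr[..., :3].astype(np.float64)
        # ITU-R 601-2 luma weights (the same weights Pillow uses for convert("L"))
        channel = rgb @ np.array([0.299, 0.587, 0.114])
    # Pixels above the threshold are regenerated (255); the rest are preserved (0).
    return np.where(channel > threshold, 255, 0).astype(np.uint8)
```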
Usage
Mode configuration should be selected based on the creative task:
- Use mode 0 for general style transfer, enhancement, or prompt-guided transformation of existing images.
- Use mode 1 for converting rough sketches into detailed images.
- Use modes 2-4 for targeted region editing (inpainting), choosing the mask input method most convenient for the workflow.
- Use mode 5 for automated batch processing of image directories or upload sets.
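Batch mode (mode 5) needs some rule for matching each input image to its optional mask. A minimal sketch, assuming masks are matched to images by file stem and that only the listed suffixes count as images; the pairing rule, `IMAGE_SUFFIXES` and `pair_batch` are hypothetical, not taken from the webui:

```python
from pathlib import Path
from typing import Iterable, List, Optional, Tuple

# Hypothetical set of file types treated as images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def pair_batch(
    image_paths: Iterable[Path], mask_paths: Iterable[Path] = ()
) -> List[Tuple[Path, Optional[Path]]]:
    """Pair each image with the mask that shares its file stem, or None if there is none."""
    masks = {p.stem: p for p in mask_paths if p.suffix.lower() in IMAGE_SUFFIXES}
    pairs: List[Tuple[Path, Optional[Path]]] = []
    for image_path in sorted(image_paths):  # process in a stable, sorted order
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files such as .txt sidecars
        pairs.append((image_path, masks.get(image_path.stem)))
    return pairs
```

In practice the two lists could come from `Path("inputs").iterdir()` and `Path("masks").iterdir()`; keeping filesystem access outside the function makes the pairing rule easy to test.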
Theoretical Basis
Mode dispatch follows a strategy pattern: the integer mode value selects both the image source and the mask computation strategy. Formally, the mode determines two quantities:
```
Given mode m in {0, 1, 2, 3, 4, 5}:

image(m) = source image selected by mode
mask(m)  = None                          if m in {0, 1, 5}
         = binary_threshold(alpha)       if m == 2
         = pixel_diff(modified, orig)    if m == 3
         = uploaded_mask                 if m == 4
```
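The piecewise definition above maps directly onto a table of strategies. The sketch below keys each mode to a source-image field and a mask function. The field names (`image`, `sketch`, `color_sketch`, `original`, `mask_rgba`, `mask`), the alpha threshold of 128, and all function names are hypothetical; the blur and compositing of mode 3 are left out here.

```python
from typing import Callable, Dict, Optional, Tuple

import numpy as np

Inputs = Dict[str, np.ndarray]
MaskStrategy = Callable[[Inputs], Optional[np.ndarray]]


def no_mask(inputs: Inputs) -> Optional[np.ndarray]:
    """Modes 0, 1 and 5: nothing is masked at dispatch time."""
    return None


def threshold_alpha(inputs: Inputs) -> np.ndarray:
    """Mode 2: binarize the alpha channel of the painted mask layer (threshold assumed)."""
    alpha = inputs["mask_rgba"][..., 3]
    return np.where(alpha > 128, 255, 0).astype(np.uint8)


def pixel_diff(inputs: Inputs) -> np.ndarray:
    """Mode 3: mark every pixel whose color differs from the original in any channel."""
    changed = np.any(inputs["color_sketch"] != inputs["original"], axis=-1)
    return changed.astype(np.uint8) * 255


def uploaded(inputs: Inputs) -> np.ndarray:
    """Mode 4: use the externally supplied mask as-is."""
    return inputs["mask"]


# mode -> (key of the source image in `inputs`, mask strategy)
STRATEGIES: Dict[int, Tuple[Optional[str], MaskStrategy]] = {
    0: ("image", no_mask),
    1: ("sketch", no_mask),
    2: ("image", threshold_alpha),
    3: ("color_sketch", pixel_diff),
    4: ("image", uploaded),
    5: (None, no_mask),  # batch: images and masks are resolved per file later
}


def select_image_and_mask(
    mode: int, inputs: Inputs
) -> Tuple[Optional[np.ndarray], Optional[np.ndarray]]:
    """Return the uniform (image, mask) pair for the given mode."""
    if mode not in STRATEGIES:
        raise ValueError(f"unsupported img2img mode: {mode}")
    source_key, strategy = STRATEGIES[mode]
    image = inputs[source_key] if source_key is not None else None
    return image, strategy(inputs)
```

A dictionary keeps the dispatch data-driven: supporting another mode means adding one entry rather than another branch.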
For inpaint sketch mode (mode 3), the mask derivation involves a per-pixel comparison:
```
pred            = any(image_pixels != original_pixels, axis=color_channel)
mask            = pred * 255    (as uint8 grayscale)
mask            = adjust_brightness(mask, 1 - mask_alpha / 100)
blurred_mask    = gaussian_blur(mask, radius=mask_blur)
composite_image = blend(blur(image, mask_blur), original, blurred_mask)
```
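The mode 3 pseudocode can be made runnable with NumPy alone. This sketch keeps values in floating point until the end and uses a hand-written separable Gaussian (treating `mask_blur` as the standard deviation, truncated at three standard deviations, with edge padding), which is an assumption and may differ slightly from an image library's blur. The function names are hypothetical. Pillow's `ImageFilter.GaussianBlur`, `ImageEnhance.Brightness` and `Image.composite` could replace the hand-written pieces.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at ceil(3 * sigma)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(arr: np.ndarray, sigma: float) -> np.ndarray:
    """Separable blur over the H and W axes with edge padding; returns float64."""
    out = arr.astype(np.float64)
    if sigma <= 0:
        return out
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    for axis in (0, 1):
        pad = [(0, 0)] * out.ndim
        pad[axis] = (r, r)
        padded = np.pad(out, pad, mode="edge")
        # "valid" convolution of the padded signal restores the original length
        out = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), axis, padded)
    return out


def inpaint_sketch_mask(painted, original, mask_alpha, mask_blur):
    """Return (blurred uint8 mask, uint8 composite image) for inpaint sketch mode."""
    # 1. Pixels whose color changed in any channel.
    pred = np.any(painted != original, axis=-1)
    mask = pred.astype(np.float64) * 255.0
    # 2. Brightness scaling: mask_alpha = 0 keeps the full mask, 100 removes it.
    mask *= 1.0 - mask_alpha / 100.0
    # 3. Soften the boundary.
    blurred_mask = gaussian_blur(mask, mask_blur)
    # 4. Blurred painted image where the mask is high, original where it is low.
    w = (blurred_mask / 255.0)[..., None]
    composite = w * gaussian_blur(painted, mask_blur) + (1.0 - w) * original.astype(np.float64)
    to_u8 = lambda a: np.clip(np.rint(a), 0, 255).astype(np.uint8)
    return to_u8(blurred_mask), to_u8(composite)
```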
This produces a soft-edged mask that transitions smoothly between painted and unpainted regions, allowing the diffusion model to generate coherent boundaries.
The dispatch architecture ensures that downstream processing (VAE encoding, latent noise injection, denoising) receives a uniform (image, mask) pair regardless of input mode, enabling a single unified pipeline for all image-to-image workflows.
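To illustrate how a downstream stage can consume the (image, mask) pair without knowing the mode, here is a sketch of one common way masked img2img keeps unmasked regions from the original: blend the denoised latent with the initial latent, using the mask downsampled to latent resolution. The 8x downsampling factor (typical of the Stable Diffusion VAE), the average pooling, the (C, h, w) latent layout, and the helper names are assumptions; this is not presented as the webui's exact implementation.

```python
from typing import Optional

import numpy as np


def downsample_mask(mask: np.ndarray, factor: int = 8) -> np.ndarray:
    """Average-pool an H x W uint8 mask to latent resolution, scaled to [0, 1]."""
    h, w = mask.shape
    m = mask[: h - h % factor, : w - w % factor].astype(np.float64) / 255.0
    return m.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def blend_latents(
    denoised: np.ndarray, init_latent: np.ndarray, mask: Optional[np.ndarray], factor: int = 8
) -> np.ndarray:
    """Keep init_latent where the mask is 0 and the denoised latent where it is 1."""
    if mask is None:
        return denoised  # no mask: every latent position is regenerated
    m = downsample_mask(mask, factor)[None, ...]  # broadcast over latent channels
    return m * denoised + (1.0 - m) * init_latent
```

Because the mask is simply `None` when no region is selected, the same function serves every mode.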