Principle:AUTOMATIC1111 Stable diffusion webui Input image mode configuration

From Leeroopedia


Knowledge Sources
Domains: Image Generation, Image Editing, Inpainting, Diffusion Models
Last Updated: 2026-02-08 00:00 GMT

Overview

Input image mode configuration defines how the source image is interpreted and preprocessed before entering the image-to-image diffusion pipeline, dispatching among standard transformation, sketch-based input, masked inpainting, and batch processing modes.

Description

In an image-to-image generation workflow, the user can supply visual input through several distinct modalities. Each modality determines how the source pixels are extracted, whether a mask is computed, and how that mask is blended before the image enters the latent encoding stage.

The fundamental modes are:

  • Standard img2img (mode 0): The user provides a single image. No mask is generated. The entire image is treated as the conditioning source for the diffusion process. The denoising strength controls how much the output diverges from the input.
  • Sketch (mode 1): The user provides a sketch image drawn on a canvas. Like standard mode, no mask is generated, but the source is expected to be a hand-drawn sketch rather than a photograph.
  • Inpaint (mode 2): The user provides an image together with a painted mask region. A binary mask is created from the alpha or luminance channel, indicating which regions should be regenerated. Only masked areas are denoised; unmasked areas are preserved from the original.
  • Inpaint sketch (mode 3): The user modifies the original image by painting colored strokes. The mask is computed automatically by detecting pixel differences between the modified and original images. A mask alpha parameter controls the blending strength, and Gaussian blur smooths the transition boundary.
  • Inpaint upload mask (mode 4): The user provides both the source image and a separate mask image. This is useful for programmatic workflows where the mask is generated externally.
  • Batch (mode 5): Multiple images are processed sequentially through the pipeline. Each image can have its own optional mask, and its generation metadata can be read from the image's PNG info.
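
The six mode integers above can be given symbolic names. The following is a minimal sketch only: the enum, its member names, and the helper `uses_mask` are hypothetical labels chosen for illustration and are not taken from the webui source.

```python
from enum import IntEnum


class Img2ImgMode(IntEnum):
    """Hypothetical names for the six mode integers described above."""
    IMG2IMG = 0
    SKETCH = 1
    INPAINT = 2
    INPAINT_SKETCH = 3
    INPAINT_UPLOAD_MASK = 4
    BATCH = 5


# Modes that, per the list above, supply or derive a mask up front.
MASKED_MODES = frozenset({
    Img2ImgMode.INPAINT,
    Img2ImgMode.INPAINT_SKETCH,
    Img2ImgMode.INPAINT_UPLOAD_MASK,
})


def uses_mask(mode: int) -> bool:
    """Return True for the inpainting modes (2-4); raise ValueError on unknown modes."""
    return Img2ImgMode(mode) in MASKED_MODES
```

Naming the integers this way keeps call sites readable (`Img2ImgMode.INPAINT_SKETCH` rather than `3`) and makes an out-of-range mode fail loudly instead of silently falling through.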

Usage

Select the mode according to the creative task:

  • Use mode 0 for general style transfer, enhancement, or prompt-guided transformation of existing images.
  • Use mode 1 for converting rough sketches into detailed images.
  • Use modes 2-4 for targeted region editing (inpainting), choosing the mask input method most convenient for the workflow.
  • Use mode 5 for automated batch processing of image directories or upload sets.

Theoretical Basis

The mode dispatch follows a strategy-style pattern in which the integer mode value selects both the image source and the mask computation. Formally, the mode determines two values, the image and the mask:

Given mode m in {0, 1, 2, 3, 4, 5}:
  image(m) = source image selected by mode
  mask(m)  = None                         if m in {0, 1, 5}
           = binary_threshold(alpha)      if m == 2
           = pixel_diff(modified, orig)   if m == 3
           = uploaded_mask                if m == 4
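
The piecewise definition above might be sketched as a single dispatch function. This is an illustration under stated assumptions, not the webui's actual code: `ModeInputs`, its field names, and `dispatch` are hypothetical, and the two mask strategies are passed in as callables so the sketch stays independent of any image library. Returning `(None, None)` for batch mode is also an assumption, namely that batch items are loaded by a separate loop after dispatch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class ModeInputs:
    """Hypothetical container for whatever the UI supplies in each mode."""
    image: Any = None          # source (or user-modified) image, modes 0-4
    original: Any = None       # untouched original, needed by mode 3
    painted_mask: Any = None   # painted mask layer, needed by mode 2
    uploaded_mask: Any = None  # externally produced mask, needed by mode 4


def dispatch(
    mode: int,
    inputs: ModeInputs,
    binary_threshold: Callable[[Any], Any],
    pixel_diff: Callable[[Any, Any], Any],
) -> Tuple[Any, Optional[Any]]:
    """Return the (image, mask) pair selected by the integer mode."""
    if mode in (0, 1):
        # Standard img2img and sketch: whole image, no mask.
        return inputs.image, None
    if mode == 2:
        return inputs.image, binary_threshold(inputs.painted_mask)
    if mode == 3:
        return inputs.image, pixel_diff(inputs.image, inputs.original)
    if mode == 4:
        return inputs.image, inputs.uploaded_mask
    if mode == 5:
        # Assumption: batch images are loaded later, one at a time.
        return None, None
    raise ValueError(f"unknown img2img mode: {mode}")
```

Because every branch returns the same two-element shape, code that consumes the result does not need to know which mode produced it.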

For inpaint sketch mode (mode 3), the mask derivation involves a per-pixel comparison:

pred = any(image_pixels != original_pixels, axis=color_channel)
mask = pred * 255    (as uint8 grayscale)
mask = adjust_brightness(mask, 1 - alpha / 100)
blurred_mask = gaussian_blur(mask, radius=mask_blur)
composite_image = blend(blur(image, mask_blur), original, blurred_mask)
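
The five pseudocode steps above can be written as runnable Python. This is a sketch that assumes NumPy and Pillow are available; the function name `inpaint_sketch_mask`, its parameter names, and its default values are hypothetical, and `mask_alpha` is assumed to be a percentage between 0 and 100.

```python
import numpy as np
from PIL import Image, ImageEnhance, ImageFilter


def inpaint_sketch_mask(image, original, mask_alpha=0, mask_blur=4):
    """Return (composite_image, blurred_mask) for a painted image and its original.

    mask_alpha: percentage in [0, 100]; higher values dim the mask.
    mask_blur:  Gaussian blur radius in pixels.
    """
    image = image.convert("RGB")
    original = original.convert("RGB")

    # pred is True wherever any colour channel differs from the original.
    pred = np.any(np.array(image) != np.array(original), axis=-1)

    # Grayscale uint8 mask: 255 on painted pixels, 0 elsewhere.
    mask = Image.fromarray(pred.astype(np.uint8) * 255)

    # Scale mask brightness by (1 - alpha / 100); alpha = 0 keeps full strength.
    mask = ImageEnhance.Brightness(mask).enhance(1 - mask_alpha / 100)

    # Blur both the mask and the painted image with the same radius,
    # then blend the blurred painting over the original through the mask.
    blur = ImageFilter.GaussianBlur(mask_blur)
    blurred_mask = mask.filter(blur)
    composite = Image.composite(image.filter(blur), original, blurred_mask)
    return composite, blurred_mask
```

A possible usage, with hypothetical file names: `composite, mask = inpaint_sketch_mask(Image.open("painted.png"), Image.open("original.png"), mask_alpha=20, mask_blur=4)`. Both inputs are expected to have the same dimensions, since the comparison is done pixel by pixel.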

This produces a soft-edged mask that transitions smoothly between painted and unpainted regions, allowing the diffusion model to generate coherent boundaries.

The dispatch architecture ensures that downstream processing (VAE encoding, latent noise injection, denoising) receives a uniform (image, mask) pair regardless of input mode, enabling a single unified pipeline for all image-to-image workflows.
