Workflow:AUTOMATIC1111 Stable Diffusion WebUI image-to-image generation

From Leeroopedia


Knowledge Sources
Domains Image_Generation, Stable_Diffusion, Inpainting, Generative_AI
Last Updated 2026-02-08 08:00 GMT

Overview

End-to-end process for transforming existing images using Stable Diffusion with support for standard img2img, sketch input, inpainting, and batch processing modes.

Description

This workflow covers the image-to-image generation pipeline, which takes an existing image as input and transforms it under the guidance of a text prompt. Unlike txt2img, which starts from pure noise, img2img encodes the input image into latent space, adds noise proportional to the denoising strength, and then denoises to produce a modified version. The workflow supports five distinct input modes: standard img2img, sketch-based generation, inpainting with a drawn mask, inpaint sketch with automatic change detection, and mask upload. A batch processing mode handles multiple images from a directory, with optional per-image masks.

Usage

Execute this workflow when you have an existing image that you want to modify, enhance, or selectively edit using Stable Diffusion. Use inpainting mode when you need to regenerate specific regions of an image while preserving the rest. Use batch mode when processing multiple images with consistent settings.

Execution Steps

Step 1: Input image selection and mode configuration

Select the input mode from the five available options: standard img2img (upload a reference image), sketch (draw on a canvas), inpaint (upload image and draw a mask to specify regions to regenerate), inpaint sketch (the system detects changed pixels as the mask), or upload mask (provide a separate mask image). For batch processing, specify an input directory and optionally a mask directory. Each mode determines how the initial image and optional mask are prepared.

Key considerations:

  • Standard img2img works best for global style transfer and variations
  • Inpainting requires a binary mask where white areas are regenerated
  • Batch mode can extract generation parameters from PNG metadata for per-image settings
  • The resize mode controls how input images are scaled to match the target dimensions
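The inpaint sketch mode's automatic change detection can be sketched as a per-pixel comparison between the original and the edited canvas. The function name, grid representation, and threshold below are illustrative assumptions, not the webui's actual code:

```python
def sketch_to_mask(original, sketched, threshold=8):
    """Inpaint-sketch mode: pixels the user changed become the mask.

    `original` and `sketched` are same-shaped 2D grids of (r, g, b)
    tuples; returns a grid of 0/255 mask values (255 = regenerate).
    The threshold and names are illustrative, not the webui's exact code.
    """
    mask = []
    for row_o, row_s in zip(original, sketched):
        mask.append([
            # A pixel counts as "changed" if any channel moved more
            # than the threshold; changed pixels are regenerated.
            255 if max(abs(a - b) for a, b in zip(po, ps)) > threshold else 0
            for po, ps in zip(row_o, row_s)
        ])
    return mask
```

For example, drawing a red stroke over one pixel of a gray image yields a mask that is white only at that pixel, which then drives the same inpainting path as a hand-drawn mask.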

Step 2: Prompt and parameter configuration

Configure the text prompt, negative prompt, and generation parameters as in txt2img. The critical additional parameter is denoising strength (0.0 to 1.0), which controls how much the output differs from the input. Lower values preserve more of the original image; higher values allow more creative freedom. For inpainting, configure the masked content fill mode (fill, original, latent noise, latent nothing) and mask blur radius.

Key considerations:

  • Denoising strength of 0.0 returns the original image; 1.0 is equivalent to txt2img
  • Mask blur softens the boundary between inpainted and preserved regions
  • Inpaint at full resolution crops and processes only the masked region for higher detail
  • The padding parameter controls how much context around the masked region is included
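The relationship between denoising strength and the sampling schedule can be sketched as a simple mapping, following the common convention (used, for example, by diffusers' img2img pipeline) that only a fraction of the configured steps actually run. The function name and return shape are assumptions for illustration:

```python
def img2img_schedule(num_steps: int, strength: float):
    """Map denoising strength to the sampler's starting point.

    Only int(num_steps * strength) steps actually run: strength 1.0
    denoises from pure noise like txt2img, strength 0.0 runs no steps
    and returns the input unchanged. Returns (start_step, steps_to_run).
    Illustrative sketch of the convention, not the webui's exact code.
    """
    strength = max(0.0, min(1.0, strength))
    steps_to_run = int(num_steps * strength)
    return num_steps - steps_to_run, steps_to_run
```

At strength 0.4 with 30 configured steps, the sampler skips the first 18 steps of the schedule and runs only the last 12, which is why low strengths both preserve the input and finish faster.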

Step 3: Image preprocessing and latent encoding

Resize the input image to the target dimensions using the selected resize mode (just resize, crop and resize, resize and fill, or latent upscale). For inpainting, process the mask by applying blur, creating a binary threshold, and optionally inverting it. Encode the preprocessed image through the VAE encoder into latent space. For batch mode, iterate through the input directory loading each image and its corresponding mask.

Key considerations:

  • Crop and resize preserves aspect ratio by center-cropping
  • Resize and fill preserves aspect ratio by padding with blurred content
  • The latent representation is 8x smaller in each spatial dimension than the pixel image
  • Color correction can be applied to match the output color distribution to the input
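The "crop and resize" mode above can be sketched as computing a center-crop box whose aspect ratio matches the target before scaling. The geometry below is a minimal sketch of that behavior; the function name and tuple layout are assumptions, not the webui's implementation:

```python
def crop_and_resize_box(src_w, src_h, dst_w, dst_h):
    """'Crop and resize': scale to cover the target, then center-crop.

    Returns the (left, top, right, bottom) crop box in source pixels
    whose aspect ratio matches the target; scaling that box to
    (dst_w, dst_h) then fills the target without distortion.
    """
    src_ratio = src_w / src_h
    dst_ratio = dst_w / dst_h
    if src_ratio > dst_ratio:
        # Source is wider than the target: crop the sides equally.
        crop_w = round(src_h * dst_ratio)
        left = (src_w - crop_w) // 2
        return (left, 0, left + crop_w, src_h)
    # Source is taller (or equal): crop the top and bottom equally.
    crop_h = round(src_w / dst_ratio)
    top = (src_h - crop_h) // 2
    return (0, top, src_w, top + crop_h)
```

A 1024x512 input targeted at 512x512, for instance, keeps the central 512x512 square and discards 256 pixels from each side.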

Step 4: Noise addition and guided denoising

Add noise to the encoded latent based on the denoising strength and sampling schedule. The amount of noise determines the starting point on the diffusion timeline: lower denoising strength starts closer to the clean image, higher starts closer to pure noise. Execute the sampling loop with CFG guidance from the encoded prompts. For inpainting, the denoiser blends the noised original latent with the denoised prediction at each step according to the mask, preserving unmasked regions.

Key considerations:

  • The number of steps that actually run scales with the denoising strength: at strength 0.5, roughly half the configured steps execute
  • Masked regions receive full denoising while unmasked areas maintain the original content
  • Soft inpainting (optional extension) provides gradual blending at mask boundaries
  • The same sampler and scheduler options from txt2img are available
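The per-step mask blend described above can be sketched as a linear mix in latent space: where the mask is 1, the sampler's prediction is kept; where it is 0, the original latent (noised to the same timestep) is restored. Flat lists stand in for latent tensors here; names and shapes are illustrative assumptions, not the webui's exact code:

```python
def blend_masked_latent(denoised, noised_original, mask):
    """One inpainting blend step: keep original content where mask=0.

    `denoised` holds the sampler's current prediction, `noised_original`
    is the input latent noised to the same timestep, and `mask` holds
    floats in [0, 1] (1 = regenerate). Applying this after every
    sampling step keeps unmasked regions locked to the input image
    while masked regions receive full denoising.
    """
    return [m * d + (1.0 - m) * o
            for d, o, m in zip(denoised, noised_original, mask)]
```

Fractional mask values (as produced by mask blur) mix the two latents proportionally, which is what soft inpainting extensions exploit for gradual boundary blending.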

Step 5: VAE decoding and output composition

Decode the final denoised latent through the VAE decoder to produce the output pixel image. For inpainting, composite the generated content into the original image using the mask, with optional overlay blending at the boundaries. Apply optional post-processing (face restoration, color correction). Save the output with generation metadata embedded in the PNG. For batch processing, repeat the entire process for each input image.

Key considerations:

  • The overlay step ensures seamless blending between inpainted and original regions
  • Face restoration can be applied selectively to the output
  • Batch processing preserves individual image metadata and supports custom output directories
  • Generation info includes the denoising strength and mask parameters for reproducibility
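The final overlay compositing can be sketched as an alpha blend driven by the blurred mask: fully white mask pixels take the generated content, fully black ones keep the original, and the blurred fringe crossfades between the two. Grayscale pixel lists stand in for image buffers; the function name is an assumption for illustration:

```python
def composite_inpaint(original_px, generated_px, blurred_mask):
    """Final overlay: paste generated pixels where the mask is white.

    Inputs are flat lists of 0-255 grayscale values; the blurred mask
    acts as a per-pixel alpha, so fractional values at the softened
    boundary produce a seamless crossfade between generated and
    original content. Illustrative sketch, not the webui's exact code.
    """
    out = []
    for o, g, m in zip(original_px, generated_px, blurred_mask):
        alpha = m / 255.0
        out.append(round(alpha * g + (1.0 - alpha) * o))
    return out
```

This is why a larger mask blur radius in Step 2 widens the crossfade band in the final composite rather than changing which pixels are regenerated in latent space.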

Execution Diagram

GitHub URL

Workflow Repository