Principle:AUTOMATIC1111 Stable diffusion webui Img2img parameter configuration
| Knowledge Sources | |
|---|---|
| Domains | Image Generation, Image Editing, Inpainting, Diffusion Models |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Img2img parameter configuration describes the set of parameters that control image-conditioned diffusion generation, including denoising strength (the fraction of the noise schedule applied to the source image), inpainting fill strategies, and mask processing behavior.
Description
When generating images conditioned on an existing source image, the parameter space extends beyond what text-to-image requires. The core additional parameters are:
Denoising Strength is the most critical parameter. It controls the fraction of diffusion timesteps that are executed, which directly determines how much noise is added to the encoded source image before denoising begins. A value of 0.0 means no noise is added and the output is essentially the input; a value of 1.0 means the full noise schedule is applied and the source image has minimal influence on the output.
Init Images provides the list of source images to condition on. For batch operations, each image in the list is processed independently.
Mask and Mask Parameters define which regions of the image should be regenerated:
- mask / image_mask: The grayscale or binary mask indicating regions to inpaint.
- mask_blur_x / mask_blur_y: Gaussian blur radii for softening mask edges, preventing harsh transitions.
- mask_round: Whether to binarize the mask (round to 0 or 255) or preserve soft gradients.
- inpainting_mask_invert: Whether to swap masked and unmasked regions.
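How these mask parameters interact can be illustrated with a small sketch. It is not the webui's code: a one-dimensional row of pixel values stands in for a mask, a box blur stands in for the Gaussian blur, the helper names are hypothetical, and the order of operations (binarize, invert, blur) is an assumption.

```python
def binarize(mask, threshold=128):
    """Round every value to 0 or 255, as mask_round is described to do."""
    return [255 if v >= threshold else 0 for v in mask]


def invert(mask):
    """Swap masked and unmasked regions."""
    return [255 - v for v in mask]


def box_blur(mask, radius):
    """Average each value with its neighbours (a stand-in for Gaussian blur)."""
    if radius <= 0:
        return list(mask)
    blurred = []
    for i in range(len(mask)):
        window = mask[max(0, i - radius): i + radius + 1]
        blurred.append(round(sum(window) / len(window)))
    return blurred


def prepare_mask(mask, mask_round=True, inpainting_mask_invert=False, mask_blur=0):
    """Hypothetical helper: binarize, optionally invert, then soften edges."""
    if mask_round:
        mask = binarize(mask)
    if inpainting_mask_invert:
        mask = invert(mask)
    return box_blur(mask, mask_blur)


row = [0, 0, 40, 200, 255, 255, 0, 0]
hard = prepare_mask(row)               # hard 0/255 edges
soft = prepare_mask(row, mask_blur=1)  # edges ramp up and down gradually
```

In this toy example the blurred mask ramps between 0 and 255 instead of jumping, which is the effect that softens the transition at mask boundaries.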
Inpainting Fill Modes determine what content fills the masked area before denoising:
- Fill (0): The masked region is filled using a content-aware fill algorithm from the surrounding pixels.
- Original (1): The original pixel content is preserved in the masked area, relying on denoising to transform it.
- Latent Noise (2): The masked region in latent space is replaced with random noise, giving the model maximum creative freedom.
- Latent Nothing (3): The masked region in latent space is zeroed out.
Resize Mode controls how the source image is fitted to the target dimensions:
- 0: Just resize (may distort aspect ratio)
- 1: Crop and resize (preserves aspect ratio, crops excess)
- 2: Resize and fill (preserves aspect ratio, pads with background)
- 3: Latent upscale (resize happens in latent space via interpolation)
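The geometric difference between modes 0, 1, and 2 can be sketched as a scale-factor choice. This is an illustrative calculation only; `fitted_size` is a hypothetical helper, the rounding rule is an assumption, and mode 3 is left out because it operates on latents rather than pixels.

```python
def fitted_size(src_w, src_h, dst_w, dst_h, resize_mode):
    """Return the size the source is scaled to before any crop or padding."""
    if resize_mode == 0:
        # Just resize: stretch to the target, ignoring aspect ratio.
        return dst_w, dst_h
    scale_w = dst_w / src_w
    scale_h = dst_h / src_h
    if resize_mode == 1:
        # Crop and resize: cover the whole target, overflow is cropped.
        scale = max(scale_w, scale_h)
    elif resize_mode == 2:
        # Resize and fill: fit inside the target, the remainder is padded.
        scale = min(scale_w, scale_h)
    else:
        raise ValueError("only pixel-space modes 0, 1 and 2 are sketched here")
    return round(src_w * scale), round(src_h * scale)


# A 1024x768 source fitted to a 512x512 target:
stretched = fitted_size(1024, 768, 512, 512, 0)
cropped = fitted_size(1024, 768, 512, 512, 1)
padded = fitted_size(1024, 768, 512, 512, 2)
```

For the 4:3 source above, mode 1 scales to 683x512 and crops the extra width, while mode 2 scales to 512x384 and pads the missing height.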
Usage
- Set denoising_strength low (0.2-0.4) for subtle modifications that preserve the source composition.
- Set denoising_strength high (0.7-1.0) for creative transformations guided primarily by the text prompt.
- For inpainting, choose the fill mode based on desired behavior: use fill (0) for seamless blending, original (1) when the source content should inform the regeneration, or latent noise (2) for entirely new content in the masked area.
- Always set mask_blur to a positive value (typically 4-8) for inpainting to avoid hard seams at mask boundaries.
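The guidance above can be collected into a small settings builder. This is a minimal sketch under stated assumptions: `build_inpaint_settings` and `FILL_MODES` are hypothetical names, and the `inpainting_fill` key is assumed rather than taken from this page, so check it against the interface you actually call.

```python
# Fill mode names mapped to the numeric codes listed in the Description.
FILL_MODES = {"fill": 0, "original": 1, "latent noise": 2, "latent nothing": 3}


def build_inpaint_settings(denoising_strength, fill_mode, mask_blur=4):
    """Hypothetical helper: validate values and return a settings dict."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be within [0.0, 1.0]")
    if fill_mode not in FILL_MODES:
        raise ValueError(f"unknown fill mode: {fill_mode!r}")
    if mask_blur < 0:
        raise ValueError("mask_blur must not be negative")
    return {
        "denoising_strength": denoising_strength,
        "inpainting_fill": FILL_MODES[fill_mode],  # assumed key name
        "mask_blur": mask_blur,
    }


# Subtle edit that lets the existing content inform the result.
subtle = build_inpaint_settings(0.3, "original")
# Entirely new content in the masked area, with a wider soft edge.
creative = build_inpaint_settings(0.9, "latent noise", mask_blur=8)
```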
Theoretical Basis
The denoising strength parameter maps directly to the noise schedule of the diffusion model. Given a total of T timesteps and denoising strength s:
t_start = floor(T * s)
The source image x_0 is encoded to latent z_0, then noise is added corresponding to timestep t_start:
z_noisy = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * epsilon
where epsilon ~ N(0, I) and alpha_bar_t is the cumulative product of (1 - beta_i) over the noise schedule up to timestep t_start.
Denoising then proceeds from t_start back to 0, guided by the text conditioning. At s = 0 no noise is added (z_noisy = z_0, taking alpha_bar at timestep 0 to be 1); at s = 1.0 the full schedule is traversed, so the source image has minimal influence on the output.
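The two formulas above can be sketched in plain Python. The linear beta schedule and its endpoints are assumptions chosen for illustration, not the schedule any particular model uses, and the latent is a flat list of floats rather than a tensor. The function names are hypothetical.

```python
import math
import random


def alpha_bar_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta); index 0 is 1.0 (no noise yet)."""
    alpha_bar = [1.0]
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / max(T - 1, 1)
        alpha_bar.append(alpha_bar[-1] * (1.0 - beta))
    return alpha_bar


def noise_latent(z0, s, T, rng):
    """Return (t_start, z_noisy) for denoising strength s over T timesteps."""
    t_start = math.floor(T * s)
    a = alpha_bar_schedule(T)[t_start]
    z_noisy = [
        math.sqrt(a) * z + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
        for z in z0
    ]
    return t_start, z_noisy


rng = random.Random(0)
z0 = [0.5, -1.0, 2.0]
t_none, z_same = noise_latent(z0, 0.0, 1000, rng)   # s = 0: latent unchanged
t_half, z_half = noise_latent(z0, 0.5, 1000, rng)   # s = 0.5: start at step 500
```

With s = 0 the weight on the source is 1 and the weight on the noise is 0, so the latent passes through unchanged; as s grows, the source weight sqrt(alpha_bar_t) shrinks toward 0.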
For inpainting fill modes, the behavior in latent space is:
Fill (0): z_0 = encode(content_aware_fill(image, mask))
Original (1): z_0 = encode(image) # mask area kept as-is
Latent Noise (2): z_0 = z_0 * mask + random_noise * nmask
Latent Nothing (3): z_0 = z_0 * mask # zeros in masked area
Here, mask is 1.0 in the unmasked (preserved) region and 0.0 elsewhere, while nmask is 1.0 in the masked (regenerated) region and 0.0 elsewhere.
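The latent-space expressions for modes 2 and 3 can be checked with a short sketch. It assumes a flat list of floats as the latent and a binary mask in the convention stated above (1.0 where content is preserved); `apply_latent_fill` is a hypothetical name. Modes 0 and 1 are passed through untouched here because, per the expressions above, they act on the image before encoding.

```python
import random


def apply_latent_fill(z0, mask, mode, rng):
    """Apply an inpainting fill mode to latent z0; mask is 1.0 where preserved."""
    nmask = [1.0 - m for m in mask]
    if mode == 2:
        # Latent noise: keep preserved values, put random noise in the masked area.
        return [z * m + rng.gauss(0.0, 1.0) * n for z, m, n in zip(z0, mask, nmask)]
    if mode == 3:
        # Latent nothing: keep preserved values, zero out the masked area.
        return [z * m for z, m in zip(z0, mask)]
    if mode in (0, 1):
        # Fill / original: the latent already encodes the prepared image.
        return list(z0)
    raise ValueError(f"unknown fill mode: {mode}")


rng = random.Random(0)
z0 = [0.5, -1.0, 2.0, 0.25]
mask = [1.0, 1.0, 0.0, 0.0]   # last two positions are to be regenerated
noised = apply_latent_fill(z0, mask, 2, rng)
zeroed = apply_latent_fill(z0, mask, 3, rng)
```

In both cases the first two values survive unchanged, and only the masked positions are replaced.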