Principle:AUTOMATIC1111 Stable diffusion webui High resolution fix

From Leeroopedia


Knowledge Sources
Domains: Diffusion Models, Image Upscaling, Multi-Pass Generation
Last Updated: 2026-02-08 00:00 GMT

Overview

High-resolution fix is a two-pass generation technique that produces high-resolution images by first generating at native model resolution, then upscaling and denoising at the target resolution to avoid the compositional artifacts caused by direct high-resolution generation.

Description

Stable Diffusion models (particularly SD1.x) are trained at a specific resolution, typically 512x512 pixels. When generating directly at significantly higher resolutions (e.g., 1024x1024 or 1024x1536), the model tends to produce characteristic artifacts:

  • Duplicate subjects -- The model may generate two or more copies of the main subject, as if tiling
  • Anatomical distortions -- Limbs, faces, and body proportions become severely distorted
  • Compositional incoherence -- The overall scene layout breaks down

These artifacts occur because the model's UNet was trained with a fixed receptive field relative to the training resolution. At higher resolutions, the same receptive field covers a proportionally smaller area of the image, causing the model to treat different regions as separate compositions.

The high-resolution fix (hires fix) solves this by splitting generation into two denoising passes with an upscale step between them:

  1. First pass (composition) -- Generate at or near the model's native resolution (e.g., 512x512) to establish a coherent composition
  2. Upscale -- Scale the first-pass result to the target high resolution using either latent-space interpolation or a pixel-space upscaler
  3. Second pass (refinement) -- Denoise the upscaled result at a reduced denoising strength (typically 0.4-0.7) to add fine detail while preserving the established composition
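The resolution bookkeeping behind these steps can be sketched as follows. This is a minimal illustration, not the webui's own logic: the names `HiresPlan`, `round_to_multiple`, and `plan_hires_fix` are hypothetical, and the choice to fit the longer side of the first pass to the native resolution is an assumption made for the example. Rounding to multiples of 8 reflects the 8x downsampling between pixel and latent space in SD1.x.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class HiresPlan:
    first_pass_size: tuple[int, int]  # (width, height) for the composition pass
    target_size: tuple[int, int]      # (width, height) for the refinement pass
    scale: float                      # nominal upscale factor between the passes


def round_to_multiple(value: float, multiple: int = 8) -> int:
    """Round to the nearest multiple of `multiple`, never below one multiple."""
    return max(multiple, int(round(value / multiple)) * multiple)


def plan_hires_fix(target_w: int, target_h: int, native: int = 512) -> HiresPlan:
    """Pick a first-pass size whose longer side matches the native resolution.

    Assumption for this sketch: fitting the longer side keeps the whole
    first pass at or below the training resolution.
    """
    scale = max(target_w, target_h) / native
    if scale <= 1.0:
        # The target already fits the native resolution: one pass is enough.
        return HiresPlan((target_w, target_h), (target_w, target_h), 1.0)
    first_w = round_to_multiple(target_w / scale)
    first_h = round_to_multiple(target_h / scale)
    # After rounding, the per-axis scale may differ slightly from `scale`.
    return HiresPlan((first_w, first_h), (target_w, target_h), scale)


if __name__ == "__main__":
    print(plan_hires_fix(1920, 1080))
```

Under these assumptions a 1920x1080 target gets a 512x288 first pass. A planner that fits the shorter side instead would give a larger first pass, with more risk of the artifacts described above.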

Usage

Hires fix is used whenever the desired output resolution significantly exceeds the model's native training resolution. Common scenarios include:

  • Generating wallpaper-sized images (1920x1080 or higher) from SD1.x models
  • Creating detailed portraits at resolutions suitable for printing
  • Producing images with fine detail that would be lost at 512x512

The technique is a trade-off: it adds a second sampling pass at the larger resolution, which roughly doubles the generation time or more, but dramatically improves quality at high resolutions.

Theoretical Basis

Why Direct High-Resolution Fails

The UNet's convolutional layers and attention mechanisms have effective receptive fields calibrated to the training resolution. At resolution R_train, the deepest layers of the UNet can "see" the entire image. At resolution R > R_train, the same layers only see a portion:

Linear coverage = R_train / R
Area coverage = (R_train / R)^2

At 2x resolution: the deepest layers "see" approximately 1/2 of each side, or 1/4 of the image area
At 3x resolution: the deepest layers "see" approximately 1/3 of each side, or 1/9 of the image area

This causes the model to generate independent compositions in different spatial regions.
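The area fractions quoted above (1/4 at 2x, 1/9 at 3x) are the square of the per-side ratio R_train / R. The sketch below computes both values. The function name `coverage` is hypothetical, and the model is the same simplified one used in this section, not a measurement of any real UNet's receptive field.

```python
def coverage(train_res: int, gen_res: int) -> tuple[float, float]:
    """Return (linear_fraction, area_fraction) of the image the deepest layers cover.

    Simplified model: coverage is capped at 1.0 when generating at or
    below the training resolution.
    """
    if train_res <= 0 or gen_res <= 0:
        raise ValueError("resolutions must be positive")
    linear = min(1.0, train_res / gen_res)
    return linear, linear ** 2


if __name__ == "__main__":
    for res in (512, 1024, 1536):
        linear, area = coverage(512, res)
        print(f"{res}px: {linear:.3f} of each side, {area:.3f} of the area")
```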

Two-Pass Denoising

The hires fix leverages the img2img principle (SDEdit): given an upscaled image with correct global composition but lacking fine detail, partial denoising can add detail while preserving structure.

The denoising strength parameter controls this trade-off:

denoising_strength = 0.0  -> No change (output = upscaled input)
denoising_strength = 0.5  -> Add significant detail, mostly preserve composition
denoising_strength = 1.0  -> Full re-generation (composition may change)

The second pass starts from a noise level corresponding to denoising_strength on the noise schedule, skipping the early high-noise steps.
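One common convention for mapping denoising strength to a starting point on the sampling schedule is sketched below. It is illustrative only: the function name `second_pass_schedule` is hypothetical, and the webui's own step handling may differ, for example in how it rounds or whether it rescales the step count.

```python
def second_pass_schedule(total_steps: int, denoising_strength: float) -> tuple[int, int]:
    """Return (start_index, steps_run) for a partial denoise.

    Assumption: the schedule has `total_steps` entries ordered from most
    noisy (index 0) to least noisy, and strength selects the trailing part.
    """
    if total_steps < 0:
        raise ValueError("total_steps must be non-negative")
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be in [0, 1]")
    steps_run = min(int(total_steps * denoising_strength), total_steps)
    start_index = total_steps - steps_run  # earlier, high-noise steps are skipped
    return start_index, steps_run


if __name__ == "__main__":
    print(second_pass_schedule(20, 0.5))
```

With 20 steps and a strength of 0.5, this sketch skips the first 10 high-noise steps and runs the remaining 10. A strength of 0.0 runs no steps, and a strength of 1.0 runs the full schedule.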

Upscaling Methods

The intermediate upscaling can use:

  • Latent space upscalers -- Nearest-neighbor, bilinear, or bicubic interpolation directly in latent space (fast, no decoding/re-encoding needed)
  • Pixel space upscalers -- Decode to pixels, upscale with a neural upscaler (ESRGAN, SwinIR, etc.) or traditional algorithm (Lanczos), then re-encode to latent space (higher quality, more compute)
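To make the latent-space option concrete, the sketch below applies nearest-neighbor interpolation to a single latent channel held as a nested list. It is a toy illustration: the names `latent_size` and `upscale_nearest` are hypothetical, and a real implementation would operate on tensors with a library interpolation routine. The factor of 8 is the pixel-to-latent downsampling of the SD1.x VAE.

```python
from __future__ import annotations


def latent_size(width_px: int, height_px: int, factor: int = 8) -> tuple[int, int]:
    """Pixel dimensions to latent grid dimensions (width, height)."""
    return width_px // factor, height_px // factor


def upscale_nearest(grid: list[list[float]], out_h: int, out_w: int) -> list[list[float]]:
    """Nearest-neighbor resize of one 2D channel.

    Each output cell copies the input cell whose index is the output
    index scaled back by in_size / out_size and floored.
    """
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


if __name__ == "__main__":
    channel = [[1.0, 2.0], [3.0, 4.0]]
    for row in upscale_nearest(channel, 4, 4):
        print(row)
    print(latent_size(1024, 1536))
```

No decode or re-encode step appears here, which is why the latent route is cheap. The pixel-space route would instead decode the latent, run the image upscaler, and encode the result again.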
