Principle:AUTOMATIC1111 Stable diffusion webui Face restoration

Knowledge Sources	Towards Robust Blind Face Restoration with Codebook Lookup Transformer (CodeFormer); GFP-GAN: Towards Real-World Blind Face Restoration with Generative Facial Prior; RetinaFace: Single-stage Dense Face Localisation in the Wild
Domains	Face Restoration, Deep Learning, Computer Vision, Generative Models
Last Updated	2026-02-08 00:00 GMT

Overview

Face restoration is the process of detecting degraded faces within an image, enhancing each face individually using a specialized neural network, and compositing the restored faces back into the original image.

Description

AI-generated images, particularly those produced by diffusion models at lower resolutions, frequently contain faces with artifacts: blurred features, asymmetric eyes, distorted teeth, or merged facial structures. Face restoration addresses this by running a specialized face enhancement model inside a detect-crop-restore-paste pipeline.

The face restoration pipeline consists of three stages:

Face Detection: A face detection model (typically RetinaFace with a ResNet-50 backbone) locates all faces in the image and extracts facial landmarks (five-point: two eyes, nose tip, two mouth corners).
Face Alignment and Cropping: Using the detected landmarks, each face is aligned to a canonical orientation via affine transformation and cropped to a fixed size (typically 512x512 pixels). This normalization ensures the restoration model receives consistently formatted inputs.
Face Restoration and Pasting: Each cropped face is processed by the restoration network, then inverse-transformed and pasted back into the original image at its detected location with appropriate blending.
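
The alignment step can be illustrated with a minimal sketch. It assumes a similarity transform (rotation, uniform scale, translation), which is a restricted form of the affine transformation mentioned above, and fits it by least squares using complex arithmetic. The template coordinates and all function names are hypothetical, chosen only for illustration; they are not taken from the WebUI or any face library.

```python
import cmath
import math

# Hypothetical five-point template for a 512x512 crop (illustrative values only):
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
TEMPLATE_512 = [(190.0, 240.0), (320.0, 240.0), (256.0, 315.0),
                (200.0, 370.0), (312.0, 370.0)]


def fit_similarity(src, dst):
    """Least-squares similarity transform z -> a*z + b mapping src onto dst.

    Points are (x, y) tuples treated as complex numbers x + iy, so the
    complex factor a encodes rotation and uniform scale, and b the shift.
    """
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den
    b = q_mean - a * p_mean
    return a, b


def apply_similarity(a, b, point):
    """Apply z -> a*z + b to a single (x, y) point."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


def invert_similarity(a, b):
    """Inverse transform, as needed to map the crop back into the image."""
    return 1 / a, -b / a


# Simulate detected landmarks: the template rotated 10 degrees, halved, shifted.
true_a = cmath.rect(0.5, math.radians(10))
true_b = complex(100.0, 50.0)
detected = [apply_similarity(true_a, true_b, pt) for pt in TEMPLATE_512]

a, b = fit_similarity(detected, TEMPLATE_512)        # image -> crop
aligned = [apply_similarity(a, b, pt) for pt in detected]
inv_a, inv_b = invert_similarity(a, b)               # crop -> image
```

The inverse transform is what the paste-back step needs: it maps pixels of the restored crop to their location in the original image.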

Two dominant approaches exist for the restoration step:

CodeFormer: Uses a Codebook Lookup Transformer architecture. It encodes the degraded face into discrete tokens by looking up entries in a learned codebook of high-quality face features, then decodes these tokens back to a restored face. A key parameter is the fidelity weight (w), which controls the tradeoff between quality and identity preservation. Lower values of w produce higher-quality faces but may deviate from the original identity; higher values preserve identity more faithfully but produce less enhancement.
GFPGAN (Generative Facial Prior GAN): Leverages pre-trained GAN priors (from a StyleGAN2 model) to provide rich facial texture generation. It uses a degradation removal module followed by a GAN-based generation module, with spatial feature transform layers that inject the generative prior at multiple scales.
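
As a rough illustration of what a spatial feature transform does, the sketch below applies an element-wise scale and shift to a small feature grid. In GFPGAN the scale and shift maps would be predicted by learned layers; here they are passed in directly, and the function name is hypothetical.

```python
def spatial_feature_transform(features, scale, shift):
    """Element-wise affine modulation: out = scale * features + shift.

    All three arguments are 2D grids (lists of rows) of identical shape.
    """
    return [
        [s * f + t for f, s, t in zip(f_row, s_row, t_row)]
        for f_row, s_row, t_row in zip(features, scale, shift)
    ]


features = [[1.0, 2.0], [3.0, 4.0]]
scale = [[1.0, 1.0], [2.0, 0.0]]   # assumed to come from a learned predictor
shift = [[0.0, 1.0], [0.0, 5.0]]   # assumed to come from a learned predictor
modulated = spatial_feature_transform(features, scale, shift)
```

Because the modulation varies per spatial position, it can adjust different facial regions independently.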

Usage

Use face restoration when:

AI-generated images contain faces with visible artifacts (particularly at generation resolutions of 512x512 or below)
Upscaled images need face detail refinement after super-resolution processing
Batch processing a set of portraits or group photos that require consistent face quality
The quality-fidelity tradeoff needs to be tuned (CodeFormer's weight parameter allows per-use adjustment)

Theoretical Basis

The Detection-Restoration Pipeline

def restore_faces_pipeline(image_bgr, restore_fn, device):
    # Stage 1: Detect faces using RetinaFace
    face_helper = create_face_helper(device)
    face_helper.read_image(image_bgr)
    face_helper.get_face_landmarks_5(resize=640, eye_dist_threshold=5)

    # Stage 2: Align and crop each face to 512x512
    face_helper.align_warp_face()

    # Stage 3: Restore each cropped face
    for cropped_face in face_helper.cropped_faces:
        normalized = normalize(to_tensor(cropped_face), mean=0.5, std=0.5)
        restored = restore_fn(normalized)  # model-specific restoration
        face_helper.add_restored_face(to_numpy(restored))

    # Stage 3 (continued): Invert the alignment and paste restored faces back
    face_helper.get_inverse_affine(None)
    result = face_helper.paste_faces_to_input_image()
    return result
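
The normalization with mean 0.5 and standard deviation 0.5 in the loop above maps pixel values from [0, 1] to [-1, 1], and the network output has to be mapped back before pasting. A minimal sketch on flat lists of values, with hypothetical helper names:

```python
def normalize_pixels(values, mean=0.5, std=0.5):
    """Map values in [0, 1] to [-1, 1] (for the default mean and std)."""
    return [(v - mean) / std for v in values]


def denormalize_pixels(values, mean=0.5, std=0.5):
    """Inverse of normalize_pixels."""
    return [v * std + mean for v in values]


def to_uint8(values):
    """Scale [0, 1] values to 8-bit integers, clamping out-of-range outputs."""
    return [max(0, min(255, round(v * 255))) for v in values]


model_input = normalize_pixels([0.0, 0.5, 1.0])
round_trip = denormalize_pixels(model_input)
clamped = to_uint8([1.2, -0.1])   # values a network might overshoot to
```

Clamping matters because a restoration network is not guaranteed to keep its output inside the normalized range.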

CodeFormer: Codebook Lookup

CodeFormer's key innovation is discrete code prediction. The degraded face x is encoded into a sequence of code indices:

z = Encoder(x)              # Encode degraded face to feature map
codes = Transformer(z)       # Predict codebook indices via transformer
z_hat = Codebook[codes]      # Look up high-quality features from codebook
x_hat = Decoder(z_hat, x; w) # Decode with controllable fidelity weight w
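
The lookup step `Codebook[codes]` can be sketched with a toy codebook. Note the substitution: CodeFormer predicts the indices with a transformer, whereas this example picks the nearest codebook entry by Euclidean distance as a simple stand-in. The codebook values and function names are hypothetical.

```python
def nearest_code(feature, codebook):
    """Index of the codebook entry closest to feature (squared Euclidean)."""
    def sq_dist(entry):
        return sum((f - e) ** 2 for f, e in zip(feature, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def quantize(features, codebook):
    """Replace each feature vector with its nearest codebook entry."""
    codes = [nearest_code(f, codebook) for f in features]
    z_hat = [codebook[c] for c in codes]
    return codes, z_hat


# Toy codebook of four 2-D "high-quality" feature vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
degraded = [[0.1, -0.2], [0.9, 0.8], [0.2, 0.7]]
codes, z_hat = quantize(degraded, codebook)
```

Whatever method selects the indices, the decoder only ever sees entries from the codebook, which is why the output stays within the space of clean face features.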

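The fidelity weight w in [0, 1] controls how much of the encoder features from the input (lower quality but identity-preserving) is mixed into the codebook-predicted features (high quality, potentially different identity). The CodeFormer paper realizes this with controllable feature transformation modules, in which w scales an encoder-conditioned adjustment that is added to the decoder features; conceptually, the effect resembles a linear blend: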

z_final = w * z_encoder + (1 - w) * z_codebook
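
The blend formula above can be written out directly. This is a sketch of the formula as stated here, operating on flat lists of numbers, and not a reproduction of the CodeFormer network code; the function name is hypothetical.

```python
def blend_features(z_encoder, z_codebook, w):
    """Compute w * z_encoder + (1 - w) * z_codebook element-wise."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * e + (1.0 - w) * c for e, c in zip(z_encoder, z_codebook)]


z_encoder = [0.0, 1.0]    # identity-preserving but degraded features
z_codebook = [1.0, 0.0]   # clean features looked up from the codebook

max_quality = blend_features(z_encoder, z_codebook, 0.0)
max_fidelity = blend_features(z_encoder, z_codebook, 1.0)
balanced = blend_features(z_encoder, z_codebook, 0.5)
```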

Quality-Fidelity Tradeoff

The core tradeoff in face restoration is between perceptual quality and identity fidelity:

w = 0 (maximum quality): The model relies entirely on codebook features, producing the sharpest and most detailed restoration but potentially altering the face's identity.
w = 1 (maximum fidelity): The model relies entirely on the encoder's representation of the input face, preserving identity at the cost of retaining more degradation artifacts.
w = 0.5 (default): A balanced setting that typically produces good quality while maintaining recognizable identity.

GFPGAN does not expose an equivalent continuous control; instead, its visibility parameter in the WebUI controls a simple alpha blend between the restored and original face in pixel space.
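
A pixel-space alpha blend of this kind can be sketched as follows. The function name is hypothetical and the example works on flat lists of 8-bit channel values; it illustrates the arithmetic only and is not the WebUI's code.

```python
def blend_visibility(original, restored, visibility):
    """Alpha blend: (1 - visibility) * original + visibility * restored."""
    if not 0.0 <= visibility <= 1.0:
        raise ValueError("visibility must lie in [0, 1]")
    return [
        round((1.0 - visibility) * o + visibility * r)
        for o, r in zip(original, restored)
    ]


original = [100, 200]
restored = [200, 100]

untouched = blend_visibility(original, restored, 0.0)
fully_restored = blend_visibility(original, restored, 1.0)
half = blend_visibility(original, restored, 0.5)
```

Unlike CodeFormer's w, this blend happens after restoration, so it can only fade the restored result in or out; it cannot change what the network produces.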

Related Pages

Implemented By

Implementation:AUTOMATIC1111_Stable_diffusion_webui_Restore_faces
