Principle:AUTOMATIC1111 Stable diffusion webui Face restoration
| Knowledge Sources | |
|---|---|
| Domains | Face Restoration, Deep Learning, Computer Vision, Generative Models |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Face restoration is the process of detecting degraded faces within an image, enhancing each face individually using a specialized neural network, and compositing the restored faces back into the original image.
Description
AI-generated images, particularly from diffusion models at lower resolutions, frequently produce faces with artifacts: blurred features, asymmetric eyes, distorted teeth, or merged facial structures. Face restoration addresses this by applying specialized face enhancement models in a detect-crop-restore-paste pipeline.
The face restoration pipeline consists of three stages:
- Face Detection: A face detection model (typically RetinaFace with a ResNet-50 backbone) locates all faces in the image and extracts facial landmarks (five-point: two eyes, nose tip, two mouth corners).
- Face Alignment and Cropping: Using the detected landmarks, each face is aligned to a canonical orientation via affine transformation and cropped to a fixed size (typically 512x512 pixels). This normalization ensures the restoration model receives consistently formatted inputs.
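- Face Restoration and Pasting: Each cropped face is processed by the restoration network. It is then inverse-transformed back to image coordinates and composited into the original image at its detected location. The composite uses a soft mask (a feathered face mask or a face-parsing mask) so that the seam around the face is not visible.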
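The alignment geometry can be sketched in NumPy. This is a minimal illustration, not facexlib's implementation. It fits a least-squares similarity transform (rotation, uniform scale, translation) that maps the five detected landmarks onto a canonical 512x512 template, then inverts that transform for the paste-back step. The coordinates in `TEMPLATE_512` are assumed values meant to mirror facexlib's default template. The helper names `estimate_similarity`, `invert_affine` and `apply_affine` are hypothetical. Production code typically delegates the fit and the pixel warp to OpenCV (`cv2.estimateAffinePartial2D`, `cv2.warpAffine`).

```python
import numpy as np

# Assumed canonical landmark positions inside a 512x512 crop, ordered as
# left eye, right eye, nose tip, left mouth corner, right mouth corner.
TEMPLATE_512 = np.array([
    [192.98138, 239.94708],
    [318.90277, 240.19360],
    [256.63416, 314.01935],
    [201.26117, 371.41043],
    [313.08905, 371.15118],
])


def estimate_similarity(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2x3 similarity transform M such that dst ~= M @ [x, y, 1]."""
    n = src.shape[0]
    ones, zeros = np.ones(n), np.zeros(n)
    # Unknowns (a, s, tx, ty) with M = [[a, -s, tx], [s, a, ty]]
    A = np.empty((2 * n, 4))
    A[0::2] = np.stack([src[:, 0], -src[:, 1], ones, zeros], axis=1)  # u equations
    A[1::2] = np.stack([src[:, 1], src[:, 0], zeros, ones], axis=1)   # v equations
    b = dst.reshape(-1)  # interleaved [u0, v0, u1, v1, ...]
    a, s, tx, ty = np.linalg.lstsq(A, b, rcond=None)[0]
    return np.array([[a, -s, tx], [s, a, ty]])


def invert_affine(M: np.ndarray) -> np.ndarray:
    """Inverse of a 2x3 affine matrix (crop coordinates -> image coordinates)."""
    R_inv = np.linalg.inv(M[:, :2])
    return np.hstack([R_inv, (-R_inv @ M[:, 2])[:, None]])


def apply_affine(M: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 2x3 affine matrix to an (N, 2) array of points."""
    return pts @ M[:, :2].T + M[:, 2]


# Synthetic detection: the template rotated 10 degrees, scaled 0.4x, and shifted.
theta = np.deg2rad(10.0)
R = 0.4 * np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
landmarks = TEMPLATE_512 @ R.T + np.array([300.0, 120.0])

M = estimate_similarity(landmarks, TEMPLATE_512)  # image -> aligned crop
M_inv = invert_affine(M)                          # aligned crop -> image (for pasting)
print(np.allclose(apply_affine(M, landmarks), TEMPLATE_512))  # True
```

The synthetic landmarks are an exact similarity transform of the template, so the fit here is exact. Real detections are noisy, which is one reason robust estimators are commonly used in practice.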
Two dominant approaches exist for the restoration step:
- CodeFormer: Uses a Codebook Lookup Transformer architecture. An encoder maps the degraded face to a low-resolution feature grid. A Transformer then predicts, for each grid position, an index into a learned codebook of high-quality face features. Finally, a decoder reconstructs the face from the looked-up entries. A key parameter is the fidelity weight (w), which controls the tradeoff between quality and identity preservation. Lower values of w produce higher-quality faces that may deviate from the original identity. Higher values preserve identity more faithfully but apply less enhancement.
- GFPGAN (Generative Facial Prior GAN): Leverages pre-trained GAN priors (from a StyleGAN2 model) to provide rich facial texture generation. It uses a degradation removal module followed by a GAN-based generation module, with spatial feature transform layers that inject the generative prior at multiple scales.
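The spatial feature transform idea can be illustrated with the channel-split variant described in the GFPGAN paper. Half of the generator's feature channels pass through unchanged, while the other half are modulated element-wise by spatial scale and shift maps derived from the degradation removal module. This split is intended to balance realness and fidelity. The NumPy sketch below assumes the scale and shift maps are already computed; in the network they are predicted by small convolutional layers. The function name `channel_split_sft` is hypothetical.

```python
import numpy as np


def channel_split_sft(features: np.ndarray, scale: np.ndarray, shift: np.ndarray) -> np.ndarray:
    """Channel-split spatial feature transform on a (C, H, W) feature map.

    The first C // 2 channels are kept as-is; the remaining channels are
    modulated element-wise as features * scale + shift, where scale and
    shift have the shape of that second half.
    """
    half = features.shape[0] // 2
    kept, modulated = features[:half], features[half:]
    return np.concatenate([kept, modulated * scale + shift], axis=0)


rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))   # toy generator features: C=8, H=W=4
scale = np.full((4, 4, 4), 2.0)          # toy spatial scale map for the modulated half
shift = np.ones((4, 4, 4))               # toy spatial shift map
out = channel_split_sft(feats, scale, shift)
```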
Usage
Use face restoration when:
- AI-generated images contain faces with visible artifacts (particularly at generation resolutions of 512x512 or below)
- Upscaled images need face detail refinement after super-resolution processing
- Batch processing a set of portraits or group photos that require consistent face quality
- The quality-fidelity tradeoff needs to be tuned (CodeFormer's weight parameter allows per-use adjustment)
Theoretical Basis
The Detection-Restoration Pipeline
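```python
import numpy as np
import torch
from torchvision.transforms.functional import normalize
from facexlib.utils.face_restoration_helper import FaceRestoreHelper

def restore_faces_pipeline(image, restore_fn, device=torch.device("cpu")):
    """image: (H, W, 3) RGB uint8 array; restore_fn: model-specific callable
    mapping a (1, 3, 512, 512) tensor in [-1, 1] to a tensor of the same shape."""
    # Stage 1: detect faces and 5-point landmarks with RetinaFace (ResNet-50),
    # using a helper configured roughly like the WebUI's create_face_helper()
    face_helper = FaceRestoreHelper(
        upscale_factor=1, face_size=512, crop_ratio=(1, 1),
        det_model="retinaface_resnet50", save_ext="png", use_parse=True, device=device)
    face_helper.read_image(np.ascontiguousarray(image[:, :, ::-1]))  # facexlib expects BGR
    face_helper.get_face_landmarks_5(resize=640, eye_dist_threshold=5)
    # Stage 2: align and crop each face to 512x512
    face_helper.align_warp_face()
    # Stage 3: restore each cropped face...
    for cropped_face in face_helper.cropped_faces:
        # BGR uint8 (H, W, C) -> RGB float (C, H, W), then normalize to [-1, 1]
        face_t = torch.from_numpy(cropped_face[:, :, ::-1].copy()).permute(2, 0, 1).float() / 255.0
        normalize(face_t, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5], inplace=True)
        with torch.no_grad():
            restored_t = restore_fn(face_t.unsqueeze(0).to(device))  # model-specific restoration
        # RGB [-1, 1] -> BGR uint8, the format facexlib stores
        restored = ((restored_t[0].clamp(-1, 1) + 1) / 2).permute(1, 2, 0).cpu().numpy()
        face_helper.add_restored_face((restored[:, :, ::-1] * 255.0).round().astype(np.uint8))
    # ...then inverse-warp the restored faces and paste them back
    face_helper.get_inverse_affine(None)
    result = face_helper.paste_faces_to_input_image()
    return np.ascontiguousarray(result[:, :, ::-1])  # back to RGB
```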
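The paste-back step is where the blending happens. The sketch below assumes the restored face has already been inverse-warped into image coordinates, together with a binary face mask. It shows one common way to hide the seam: erode the mask, feather it with a blur, then alpha-composite. facexlib's paste step follows a broadly similar erode-then-blur pattern, using a Gaussian blur or a face-parsing mask when parsing is enabled. This version uses a NumPy box blur so that it stays self-contained. The names `box_blur` and `composite_face` are hypothetical.

```python
import numpy as np


def box_blur(img: np.ndarray, radius: int) -> np.ndarray:
    """Separable box blur of a 2-D float array (zero padding at the borders)."""
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")


def composite_face(original: np.ndarray, pasted_face: np.ndarray,
                   face_mask: np.ndarray, radius: int = 3) -> np.ndarray:
    """Blend an inverse-warped face into the original (H, W, 3) uint8 image.

    face_mask is a binary (H, W) array marking valid face pixels. It is eroded by
    `radius` first, so feathering by the same radius keeps the soft edge inside
    the original mask and never pulls in pixels from outside the face.
    """
    mask = face_mask.astype(np.float64)
    eroded = (box_blur(mask, radius) > 1.0 - 1e-6).astype(np.float64)  # erosion via full-window test
    alpha = box_blur(eroded, radius)[:, :, None]                        # feathered alpha in [0, 1]
    out = alpha * pasted_face.astype(np.float64) + (1.0 - alpha) * original.astype(np.float64)
    return np.clip(np.round(out), 0, 255).astype(np.uint8)


original = np.zeros((40, 40, 3), dtype=np.uint8)
pasted = np.full((40, 40, 3), 200, dtype=np.uint8)
face_mask = np.zeros((40, 40), dtype=bool)
face_mask[10:30, 10:30] = True
result = composite_face(original, pasted, face_mask)
```

Pixels deep inside the mask take the restored value and pixels outside keep the original. A narrow band just inside the mask edge ramps smoothly between the two.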
CodeFormer: Codebook Lookup
CodeFormer's key innovation is discrete code prediction. The degraded face x is encoded into a sequence of code indices:
```
z = Encoder(x)                         # encode the degraded face to a low-resolution feature map
codes = Transformer(z)                 # predict one codebook index per feature-map position
z_hat = Codebook[codes]                # look up high-quality features from the codebook
x_hat = Decoder(z_hat, enc_feats, w)   # decode, fusing multi-scale encoder features enc_feats with weight w
```
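The lookup step can be illustrated with toy arrays. The sketch below contrasts two ways of choosing codebook entries. The first is nearest-neighbor matching of features, the vector-quantization style used when the codebook itself is learned. The second is the CodeFormer-style approach of taking the most likely index from per-position scores. The CodeFormer paper motivates the second approach by the unreliability of nearest-neighbor matching on degraded features. Array shapes, function names and toy values are illustrative assumptions; in the real model the logits come from the Transformer.

```python
import numpy as np


def nearest_neighbor_codes(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """(M, D) features -> (M,) indices of the closest codebook rows (squared L2)."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def predicted_codes_to_features(logits: np.ndarray, codebook: np.ndarray,
                                grid_hw: tuple) -> np.ndarray:
    """(H*W, N) per-position logits -> (D, H, W) map of looked-up codebook rows."""
    codes = logits.argmax(axis=1)      # most likely codebook index per position
    z_hat = codebook[codes]            # (H*W, D) high-quality features
    h, w = grid_hw
    return z_hat.reshape(h, w, -1).transpose(2, 0, 1)


codebook = np.arange(8, dtype=np.float64).reshape(4, 2)  # toy codebook: N=4 entries, D=2
logits = np.eye(4)[[3, 0, 2, 1]]                          # toy scores for a 2x2 grid
z_hat = predicted_codes_to_features(logits, codebook, (2, 2))
```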
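The fidelity weight w in [0, 1] controls how strongly features from the input face (lower quality but identity-preserving) are injected into the decoder on top of the codebook-predicted features (high quality, potentially different identity). The CodeFormer paper implements this with Controllable Feature Transformation (CFT) blocks. In each block, a learned network P predicts a spatial scale alpha and shift beta from the concatenated encoder and decoder features, and w scales the resulting modulation:
```
alpha, beta = P(concat(F_enc, F_dec))           # learned spatial scale and shift
F_dec_hat = F_dec + w * (alpha * F_dec + beta)  # w = 0 leaves the codebook-decoded features unchanged
```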
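How w enters the decoder can be sketched with NumPy, assuming the residual form F_dec + w * (alpha * F_dec + beta) described for CodeFormer's Controllable Feature Transformation (CFT) blocks. In the real network, alpha and beta are predicted by learned convolutions. Here `predict_scale_shift` is a hypothetical callable, and `interpolating_predictor` is a toy choice (alpha = 0, beta = F_enc - F_dec). Under that choice the fusion reduces to the linear interpolation (1 - w) * F_dec + w * F_enc. The learned predictor is not restricted to this form, so a linear blend is best read as intuition rather than the exact computation.

```python
import numpy as np
from typing import Callable, Tuple

ScaleShiftFn = Callable[[np.ndarray], Tuple[np.ndarray, np.ndarray]]


def cft_fuse(dec_feat: np.ndarray, enc_feat: np.ndarray, w: float,
             predict_scale_shift: ScaleShiftFn) -> np.ndarray:
    """Fuse encoder features into decoder features with fidelity weight w."""
    stacked = np.concatenate([enc_feat, dec_feat], axis=0)  # channel-wise concat
    alpha, beta = predict_scale_shift(stacked)               # spatial scale and shift
    return dec_feat + w * (alpha * dec_feat + beta)          # w = 0 leaves dec_feat untouched


def interpolating_predictor(stacked: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Toy predictor: alpha = 0, beta = F_enc - F_dec (assumes equal channel counts)."""
    c = stacked.shape[0] // 2
    enc, dec = stacked[:c], stacked[c:]
    return np.zeros_like(dec), enc - dec


dec = np.ones((2, 3, 3))        # toy codebook-derived decoder features
enc = np.full((2, 3, 3), 3.0)   # toy encoder features from the degraded input
for w in (0.0, 0.5, 1.0):
    print(w, cft_fuse(dec, enc, w, interpolating_predictor).mean())  # prints: 0.0 1.0 / 0.5 2.0 / 1.0 3.0
```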
Quality-Fidelity Tradeoff
The core tradeoff in face restoration is between perceptual quality and identity fidelity:
- w = 0 (maximum quality): The model relies entirely on codebook features, producing the sharpest and most detailed restoration but potentially altering the face's identity.
- w = 1 (maximum fidelity): The input's encoder features are given their full weight, preserving identity most faithfully at the cost of retaining more degradation artifacts.
- w = 0.5 (default): A balanced setting that typically produces good quality while maintaining recognizable identity.
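GFPGAN does not expose an equivalent continuous control. Instead, its visibility setting in the WebUI is a simple pixel-space alpha blend between the face-restored output and the original image. It fades the restoration in or out without changing how the network itself trades quality against identity.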
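The visibility blend is ordinary pixel-space alpha compositing, so it can be sketched in a few lines. This is an illustration with a hypothetical `apply_visibility` helper, not the WebUI's code. It uses the same formula as PIL's `Image.blend(im1, im2, alpha)`, namely `im1 * (1 - alpha) + im2 * alpha`, with the original image as `im1` and the face-restored image as `im2`.

```python
import numpy as np


def apply_visibility(original: np.ndarray, restored: np.ndarray, visibility: float) -> np.ndarray:
    """Blend uint8 images: visibility=0 returns the original, 1 returns the restored image."""
    if not 0.0 <= visibility <= 1.0:
        raise ValueError("visibility must be in [0, 1]")
    mixed = (1.0 - visibility) * original.astype(np.float64) + visibility * restored.astype(np.float64)
    return np.clip(np.round(mixed), 0, 255).astype(np.uint8)


original = np.full((2, 2, 3), 100, dtype=np.uint8)
restored = np.full((2, 2, 3), 200, dtype=np.uint8)
half = apply_visibility(original, restored, 0.5)   # every pixel becomes 150
```

Regions that face restoration left untouched are identical in both inputs, so the blend only affects the restored faces.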