Principle:Huggingface Diffusers Model Freezing

**Metadata**
Knowledge Sources	Huggingface Diffusers Transfer Learning
Domains	Generative_AI Transfer_Learning Parameter_Efficient_Fine_Tuning
Last Updated	2026-02-13 00:00 GMT

Overview

A design principle for loading pretrained diffusion model components and freezing their weights before injecting parameter-efficient adapters. Model freezing is the prerequisite step that enables LoRA-based fine-tuning by ensuring that only the newly added adapter parameters receive gradients during training.

Description

In DreamBooth LoRA training, the diffusion pipeline is assembled from several large pretrained components plus a noise scheduler:

UNet2DConditionModel -- The denoising network (typically hundreds of millions of parameters).
AutoencoderKL (VAE) -- The variational autoencoder for encoding/decoding between pixel and latent space.
Text encoder -- The CLIP or similar text encoder that converts prompts to embeddings.
DDPMScheduler -- The noise scheduler (no trainable parameters).

The model freezing principle dictates that all pretrained weights are loaded in evaluation mode and their gradients are disabled via requires_grad_(False) before any adapter layers are added. This achieves several goals:

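Memory efficiency -- Frozen parameters need no gradient buffers and no optimizer state, so that part of training memory scales with the adapter size rather than with the full model. The actual saving depends on the optimizer, precision, and activation memory.
Training stability -- Only the small set of adapter parameters is updated, preventing catastrophic changes to the pretrained weights.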
Mixed precision compatibility -- Frozen weights can be cast to half-precision (fp16/bf16) for inference, while adapter weights remain in full precision for training stability.

Usage

Apply model freezing immediately after loading pretrained components and before adding LoRA adapters:

Load all model components with from_pretrained().
Freeze all parameters with model.requires_grad_(False).
Cast frozen models to the inference dtype (fp16/bf16).
Move all models to the target device.
Then add LoRA adapters (which will have requires_grad=True by default).
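
The steps above can be sketched with plain PyTorch. This is a minimal illustration rather than the actual training script: the three nn.Module objects are tiny hypothetical stand-ins for the components that would normally come from from_pretrained(), and freeze_and_cast is a helper name invented for this example.

```python
import torch
from torch import nn


def freeze_and_cast(model: nn.Module, dtype: torch.dtype, device: str) -> nn.Module:
    """Freeze every parameter, then cast and move the module (steps 2-4)."""
    model.requires_grad_(False)            # step 2: no gradients for base weights
    model.to(device=device, dtype=dtype)   # steps 3-4: inference dtype + target device
    return model


# Step 1 (assumption): in a real script these would be loaded with from_pretrained().
# Tiny stand-in modules are used here so the sketch runs without downloading weights.
unet = nn.Sequential(nn.Linear(8, 8), nn.SiLU(), nn.Linear(8, 8))
vae = nn.Linear(8, 4)
text_encoder = nn.Linear(16, 8)

weight_dtype = torch.bfloat16
device = "cuda" if torch.cuda.is_available() else "cpu"

for component in (unet, vae, text_encoder):
    freeze_and_cast(component, weight_dtype, device)

# Before any adapter is added, nothing should be trainable.
trainable = sum(
    p.numel()
    for component in (unet, vae, text_encoder)
    for p in component.parameters()
    if p.requires_grad
)
print(trainable)  # 0
```

Checking the trainable-parameter count at this point is a cheap sanity test: it should be zero before adapters are injected and equal to the adapter size afterwards.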

Theoretical Basis

Model freezing underpins transfer learning and parameter-efficient fine-tuning (PEFT). The core insight is that a model pretrained on a large dataset has already learned rich feature representations, so adapting it to a new task requires training only a small number of parameters: either a subset of the existing weights or, as with LoRA, newly added adapter weights.

FREEZE-THEN-ADAPT:
    theta_pretrained = load_pretrained(model_id)

    For all p in theta_pretrained:
        p.requires_grad = False       # Freeze base model

    theta_adapter = initialize_adapter(theta_pretrained)
    # Only theta_adapter receives gradients

    theta_total = union(theta_pretrained, theta_adapter)
    # Forward pass uses both parameter sets; the optimizer updates only theta_adapter

MEMORY ANALYSIS:
    Base model:    ~860M params (UNet) + ~123M (text_encoder) + ~83M (VAE)
    LoRA adapters: ~1-4M params (rank 4, targeting attention layers)
    Gradient memory: proportional to |theta_adapter| only
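
The estimate above can be turned into a rough back-of-the-envelope calculation. The sketch below assumes fp32 gradients and an Adam-style optimizer that keeps two extra buffers per trainable parameter; both are assumptions made for illustration, and weights and activations are deliberately left out. The function name is hypothetical.

```python
def training_state_bytes(trainable_params: int,
                         bytes_per_value: int = 4,
                         optimizer_buffers: int = 2) -> int:
    """Bytes for gradients plus optimizer buffers of the trainable parameters.

    Assumes fp32 values (4 bytes) and two optimizer buffers per parameter,
    as in Adam. Weights and activations are not counted.
    """
    gradients = trainable_params * bytes_per_value
    optimizer = trainable_params * bytes_per_value * optimizer_buffers
    return gradients + optimizer


UNET_PARAMS = 860_000_000   # UNet size from the estimate above
LORA_PARAMS = 4_000_000     # upper end of the ~1-4M adapter range above

full = training_state_bytes(UNET_PARAMS)
lora = training_state_bytes(LORA_PARAMS)

print(f"full UNet fine-tune: {full / 1e9:.2f} GB")   # 10.32 GB
print(f"LoRA adapters only:  {lora / 1e9:.3f} GB")   # 0.048 GB
```

Under these assumptions the gradient and optimizer state shrinks by roughly two orders of magnitude, while the memory for the frozen weights themselves and for activations is unchanged.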

Key theoretical properties:

Gradient flow control -- requires_grad_(False) prevents gradient computation and storage for frozen parameters, but the frozen weights still participate in the forward pass. Gradients flow through frozen layers to reach adapter parameters via the chain rule.
Weight preservation -- Frozen weights remain at their pretrained values throughout training, preserving the model's general capabilities.
Dtype separation -- Frozen weights can use reduced precision (fp16/bf16) because they are never updated: they are only read during the forward and backward passes, so no small weight updates are lost to rounding. Trainable adapter weights stay in full precision (fp32) for numerical stability during gradient updates.

Related Pages

Implementation:Huggingface_Diffusers_ModelMixin_From_Pretrained_Frozen
