Principle:Huggingface Diffusers Model Freezing

From Leeroopedia
Metadata
Last Updated 2026-02-13 00:00 GMT

Overview

A design principle for loading pretrained diffusion model components and freezing their weights before injecting parameter-efficient adapters. Model freezing is the prerequisite step that enables LoRA-based fine-tuning by ensuring that only the newly added adapter parameters receive gradients during training.

Description

In DreamBooth LoRA training, the full diffusion pipeline consists of several large pretrained components:

  • UNet2DConditionModel -- The denoising network (typically hundreds of millions of parameters).
  • AutoencoderKL (VAE) -- The variational autoencoder for encoding/decoding between pixel and latent space.
  • Text encoder -- The CLIP or similar text encoder that converts prompts to embeddings.
  • DDPMScheduler -- The noise scheduler (no trainable parameters).

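The model freezing principle dictates that gradients for all pretrained weights are disabled via requires_grad_(False) before any adapter layers are added. Components returned by from_pretrained() are already in evaluation mode by default, but evaluation mode only changes the behavior of layers such as dropout; it does not disable gradients, so the explicit requires_grad_(False) call is still required. Freezing achieves several goals: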

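  • Memory efficiency -- Frozen parameters need neither gradient buffers nor optimizer state (for Adam-style optimizers, two extra values per trainable parameter). The overall saving depends on the optimizer, the precision, and the activation memory, so no single percentage applies to every setup.
  • Training stability -- Only the small set of adapter parameters is updated, so the pretrained weights cannot drift or be catastrophically overwritten.
  • Mixed precision compatibility -- Frozen weights are only read, never updated, so they can be cast to half precision (fp16/bf16), while adapter weights are kept in full precision (fp32) for stable updates.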
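
The memory-efficiency point can be made concrete with a back-of-the-envelope estimate. The sketch below assumes fp32 gradients and an Adam-style optimizer that keeps two state values per trainable parameter. The function name and the 1M adapter size are illustrative assumptions; the 860M figure is the approximate UNet size quoted on this page. Activation memory is deliberately left out, since freezing does not remove it.

```python
def training_state_bytes(n_trainable, bytes_per_value=4, optimizer_slots=2):
    """Bytes needed for gradients plus optimizer state of the trainable parameters.

    optimizer_slots=2 models Adam/AdamW (first and second moment per parameter).
    This is an illustrative helper, not part of any library.
    """
    grad_bytes = n_trainable * bytes_per_value
    optimizer_bytes = n_trainable * bytes_per_value * optimizer_slots
    return grad_bytes + optimizer_bytes


UNET_PARAMS = 860_000_000  # approximate UNet size quoted on this page
LORA_PARAMS = 1_000_000    # hypothetical adapter size, for illustration only

full_finetune = training_state_bytes(UNET_PARAMS)
lora_only = training_state_bytes(LORA_PARAMS)

print(f"full fine-tuning: {full_finetune / 1e9:.2f} GB")  # 10.32 GB
print(f"LoRA only:        {lora_only / 1e6:.2f} MB")      # 12.00 MB
```

Under these assumptions the gradient and optimizer footprint shrinks from roughly 10 GB to roughly 12 MB, while the memory for the weights themselves and for activations is unchanged.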

Usage

Apply model freezing immediately after loading pretrained components and before adding LoRA adapters:

  1. Load all model components with from_pretrained().
  2. Freeze all parameters with model.requires_grad_(False).
  3. Cast frozen models to the inference dtype (fp16/bf16).
  4. Move all models to the target device.
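  5. Then add LoRA adapters, whose parameters are created with requires_grad=True by default. If they inherit a half-precision dtype from the frozen base model, upcast them to fp32 before training.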
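
The steps above can be sketched as follows. This is a minimal sketch, not the reference training script: it assumes a Stable Diffusion style checkpoint with scheduler, text_encoder, vae, and unet subfolders, a CLIP text encoder, and that diffusers, transformers, and peft are installed. The function name, its defaults, and the choice of target modules are illustrative assumptions.

```python
def load_frozen_components(model_id, device="cuda", weight_dtype=None, rank=4):
    """Load the pipeline components, freeze them, then attach a LoRA adapter to the UNet."""
    # Imports are deferred so the sketch can be imported without the heavy dependencies.
    import torch
    from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
    from diffusers.training_utils import cast_training_params
    from peft import LoraConfig
    from transformers import CLIPTextModel

    weight_dtype = weight_dtype or torch.float16

    # 1. Load all components (the scheduler has no trainable parameters).
    noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
    text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
    vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
    unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")

    frozen_models = (unet, vae, text_encoder)

    # 2. Freeze every pretrained parameter.
    for model in frozen_models:
        model.requires_grad_(False)

    # 3 and 4. Cast the frozen weights to the inference dtype and move them to the device.
    for model in frozen_models:
        model.to(device, dtype=weight_dtype)

    # 5. Add LoRA adapters to the attention projections (assumed target modules).
    lora_config = LoraConfig(
        r=rank,
        lora_alpha=rank,
        init_lora_weights="gaussian",
        target_modules=["to_k", "to_q", "to_v", "to_out.0"],
    )
    unet.add_adapter(lora_config)

    # Keep the trainable (adapter) parameters in fp32 even if the base model is half precision.
    cast_training_params(unet, dtype=torch.float32)

    return noise_scheduler, text_encoder, vae, unet
```

After this call, only parameters with requires_grad=True (the adapter weights) should be passed to the optimizer, for example via filter(lambda p: p.requires_grad, unet.parameters()).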

Theoretical Basis

Model freezing is the foundation of transfer learning and parameter-efficient fine-tuning (PEFT). The core insight is that a model pretrained on a large dataset has already learned rich feature representations, and adapting it to a new task requires modifying only a small subset of parameters.

FREEZE-THEN-ADAPT:
    theta_pretrained = load_pretrained(model_id)

    For all p in theta_pretrained:
        p.requires_grad = False       # Freeze base model

    theta_adapter = initialize_adapter(theta_pretrained)
    # Only theta_adapter receives gradients

    theta_total = (theta_pretrained, theta_adapter)   # combined parameter set, not an arithmetic sum
    # Forward pass uses both; backward pass updates only theta_adapter

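MEMORY ANALYSIS (approximate, Stable Diffusion v1.x):
    Base model:    ~860M params (UNet) + ~123M (text_encoder) + ~83M (VAE)
    LoRA adapters: ~0.8M params at rank 4 on the UNet attention projections; grows linearly with rank (~3.2M at rank 16)
    Gradient and optimizer-state memory: proportional to |theta_adapter| only
    Activation memory: still required through the frozen layers for backpropagation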
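
The adapter size can be estimated by hand: a LoRA pair on a linear layer with in_features inputs and out_features outputs adds rank * (in_features + out_features) parameters. The sketch below applies this to an assumed Stable Diffusion v1.x UNet layout (16 transformer blocks at widths 320, 640, and 1280, a 768-dimensional text embedding, and adapters on to_q, to_k, to_v, and to_out.0). The layout constants and helper names are assumptions made for illustration, not values read from a checkpoint.

```python
def lora_params(in_features, out_features, rank):
    """Parameters added by one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


def block_lora_params(dim, cross_dim, rank):
    """LoRA parameters for one transformer block (self-attention plus cross-attention)."""
    # Self-attention: to_q, to_k, to_v, to_out.0 all map dim -> dim.
    self_attn = 4 * lora_params(dim, dim, rank)
    # Cross-attention: to_q and to_out.0 map dim -> dim; to_k and to_v map cross_dim -> dim.
    cross_attn = 2 * lora_params(dim, dim, rank) + 2 * lora_params(cross_dim, dim, rank)
    return self_attn + cross_attn


# Assumed layout: number of transformer blocks at each channel width.
BLOCKS_PER_WIDTH = {320: 5, 640: 5, 1280: 6}
CROSS_ATTENTION_DIM = 768  # assumed text-embedding width


def unet_lora_params(rank):
    return sum(
        count * block_lora_params(dim, CROSS_ATTENTION_DIM, rank)
        for dim, count in BLOCKS_PER_WIDTH.items()
    )


print(unet_lora_params(4))   # 797184
print(unet_lora_params(16))  # 3188736
```

Under these assumptions, rank 4 adds just under 0.8M trainable parameters, which is less than 0.1% of an 860M-parameter UNet.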

Key theoretical properties:

  • Gradient flow control -- requires_grad_(False) prevents gradient computation and storage for frozen parameters, but the frozen weights still participate in the forward pass. Gradients flow through frozen layers to reach adapter parameters via the chain rule.
  • Weight preservation -- Frozen weights remain at their pretrained values throughout training, preserving the model's general capabilities.
  • Dtype separation -- Frozen weights can use reduced precision (fp16/bf16) because they are only read during the forward and backward passes and never receive an update, while trainable adapter weights maintain full precision (fp32) for numerical stability during gradient updates.
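
The first two properties can be checked on a toy model. The sketch below uses plain PyTorch with small stand-in layers (it is not a diffusion model, and the layer names are hypothetical): a frozen layer with a low-rank adapter beside it, followed by a second frozen layer that the gradient must pass through.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-ins for pretrained layers; both are frozen.
frozen = nn.Linear(8, 8)
head = nn.Linear(8, 1)
frozen.requires_grad_(False)
head.requires_grad_(False)

# A low-rank adapter (rank 2) running in parallel with the frozen layer.
adapter_down = nn.Linear(8, 2, bias=False)
adapter_up = nn.Linear(2, 8, bias=False)
adapter_params = [*adapter_down.parameters(), *adapter_up.parameters()]

weights_before = frozen.weight.clone()

x = torch.randn(4, 8)
hidden = frozen(x) + adapter_up(adapter_down(x))
loss = head(hidden).pow(2).mean()  # the gradient must pass back through the frozen head
loss.backward()

# Only the adapter is handed to the optimizer.
optimizer = torch.optim.SGD(adapter_params, lr=0.1)
optimizer.step()

print(frozen.weight.grad)                        # None: no gradient stored for frozen weights
print(adapter_up.weight.grad is not None)        # True: the adapter received a gradient
print(torch.equal(frozen.weight, weights_before))  # True: frozen weights are unchanged
```

The adapter gradient is non-empty even though every layer between it and the loss is frozen, which is the chain-rule behavior described in the first bullet.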
