Implementation: Hugging Face PEFT Stable Diffusion Adapter
Metadata
- Source: examples/lora_dreambooth/train_dreambooth.py:L719-744, examples/stable_diffusion/train_dreambooth.py:L88-165
- Repository: huggingface/peft
- Type: API Doc / Pattern Doc (hybrid)
- Domains: Computer_Vision, Diffusion_Models
Overview
This implementation documents the pattern of applying PEFT adapters (LoRA, LoHa, LoKr) to the UNet and text encoder components of a Stable Diffusion pipeline for DreamBooth personalization. The pattern involves defining target modules for each component, creating the appropriate adapter configuration, and wrapping the model with get_peft_model.
Two example scripts demonstrate this pattern:
- examples/lora_dreambooth/train_dreambooth.py -- LoRA-only DreamBooth with UNet and optional text encoder adaptation
- examples/stable_diffusion/train_dreambooth.py -- Multi-adapter DreamBooth supporting LoRA, LoHa, and LoKr
Target Module Definitions
The target modules identify which layers inside the UNet and text encoder receive adapter injections.
UNet Target Modules
From examples/stable_diffusion/train_dreambooth.py (comprehensive set):
UNET_TARGET_MODULES = [
    "to_q",
    "to_k",
    "to_v",
    "proj",
    "proj_in",
    "proj_out",
    "conv",
    "conv1",
    "conv2",
    "conv_shortcut",
    "to_out.0",
    "time_emb_proj",
    "ff.net.2",
]
From examples/lora_dreambooth/train_dreambooth.py (minimal set):
UNET_TARGET_MODULES = ["to_q", "to_v", "query", "value"]
Text Encoder Target Modules
From examples/stable_diffusion/train_dreambooth.py (comprehensive set):
TEXT_ENCODER_TARGET_MODULES = ["fc1", "fc2", "q_proj", "k_proj", "v_proj", "out_proj"]
From examples/lora_dreambooth/train_dreambooth.py (minimal set):
TEXT_ENCODER_TARGET_MODULES = ["q_proj", "v_proj"]
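Short entries like "to_q" match every attention block because PEFT compares each target_modules entry against the full dotted module name. A stdlib sketch of that matching rule (simplified; the real implementation also accepts regex patterns):

```python
# Simplified sketch of how PEFT matches target_modules entries:
# an entry matches when it equals the module's dotted name or is a
# suffix starting at a dot boundary (real PEFT also accepts regexes).
def matches_target(module_name: str, target_modules: list[str]) -> bool:
    return any(
        module_name == t or module_name.endswith("." + t)
        for t in target_modules
    )

UNET_TARGET_MODULES = ["to_q", "to_k", "to_v", "to_out.0"]

# A typical UNet attention projection matches "to_q":
print(matches_target(
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q",
    UNET_TARGET_MODULES,
))  # True
# An unrelated layer does not:
print(matches_target("conv_in", UNET_TARGET_MODULES))  # False
```

This is why the dotted entry "to_out.0" is needed for the output projection: the bare name "0" alone would match far too many modules.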
API: Adapter Configuration and Model Wrapping
LoRA Configuration (lora_dreambooth example)
from peft import LoraConfig, get_peft_model

# UNet adapter
config = LoraConfig(
    r=args.lora_r,
    lora_alpha=args.lora_alpha,
    target_modules=UNET_TARGET_MODULES,
    lora_dropout=args.lora_dropout,
    bias=args.lora_bias,
)
unet = get_peft_model(unet, config)
unet.print_trainable_parameters()

# Optional text encoder adapter
config = LoraConfig(
    r=args.lora_text_encoder_r,
    lora_alpha=args.lora_text_encoder_alpha,
    target_modules=TEXT_ENCODER_TARGET_MODULES,
    lora_dropout=args.lora_text_encoder_dropout,
    bias=args.lora_text_encoder_bias,
)
text_encoder = get_peft_model(text_encoder, config)
text_encoder.print_trainable_parameters()
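The `args.*` values above come from the script's command-line parser. A minimal argparse sketch with hypothetical defaults (the real script defines these flags among many others, with its own defaults):

```python
import argparse

# Hypothetical defaults for illustration only; consult the actual
# script for its full flag set and default values.
parser = argparse.ArgumentParser()
parser.add_argument("--lora_r", type=int, default=8)
parser.add_argument("--lora_alpha", type=int, default=32)
parser.add_argument("--lora_dropout", type=float, default=0.0)
parser.add_argument("--lora_bias", type=str, default="none")
parser.add_argument("--lora_text_encoder_r", type=int, default=8)
parser.add_argument("--lora_text_encoder_alpha", type=int, default=32)
parser.add_argument("--lora_text_encoder_dropout", type=float, default=0.0)
parser.add_argument("--lora_text_encoder_bias", type=str, default="none")

# Parse a sample command line overriding two of the UNet settings
args = parser.parse_args(["--lora_r", "16", "--lora_dropout", "0.1"])
print(args.lora_r, args.lora_dropout)  # 16 0.1
```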
Multi-Adapter Configuration (stable_diffusion example)
The examples/stable_diffusion/train_dreambooth.py script supports three adapter types through factory functions:
from peft import LoraConfig, LoHaConfig, LoKrConfig, get_peft_model

def create_unet_adapter_config(args):
    if args.adapter == "lora":
        config = LoraConfig(
            r=args.unet_r,
            lora_alpha=args.unet_alpha,
            target_modules=UNET_TARGET_MODULES,
            lora_dropout=args.unet_dropout,
            bias=args.unet_bias,
            init_lora_weights=True,
        )
    elif args.adapter == "loha":
        config = LoHaConfig(
            r=args.unet_r,
            alpha=args.unet_alpha,
            target_modules=UNET_TARGET_MODULES,
            rank_dropout=args.unet_rank_dropout,
            module_dropout=args.unet_module_dropout,
            use_effective_conv2d=args.unet_use_effective_conv2d,
            init_weights=True,
        )
    elif args.adapter == "lokr":
        config = LoKrConfig(
            r=args.unet_r,
            alpha=args.unet_alpha,
            target_modules=UNET_TARGET_MODULES,
            rank_dropout=args.unet_rank_dropout,
            module_dropout=args.unet_module_dropout,
            use_effective_conv2d=args.unet_use_effective_conv2d,
            decompose_both=args.unet_decompose_both,
            decompose_factor=args.unet_decompose_factor,
            init_weights=True,
        )
    else:
        # Guard for unsupported values; only lora/loha/lokr are handled here
        raise ValueError(f"Unknown adapter type: {args.adapter}")
    return config
The text encoder configuration follows the same pattern via create_text_encoder_adapter_config, substituting TEXT_ENCODER_TARGET_MODULES and text-encoder-specific hyperparameters.
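The three adapter families trade parameter count for structure. A back-of-envelope sketch for one adapted Linear(in_f, out_f) layer (ignores bias terms and conv-specific shapes; LoKr's count additionally depends on the decomposition factor, so it is omitted here):

```python
# Rough trainable-parameter counts per adapted linear layer (sketch).
def lora_params(in_f: int, out_f: int, r: int) -> int:
    # LoRA: delta_W = B @ A, with A of shape (r, in_f) and B of shape (out_f, r)
    return r * (in_f + out_f)

def loha_params(in_f: int, out_f: int, r: int) -> int:
    # LoHa: delta_W = (B1 @ A1) * (B2 @ A2), two low-rank pairs combined
    # by a Hadamard product -> twice the LoRA count at the same rank
    return 2 * r * (in_f + out_f)

# Example: a 1280 -> 1280 attention projection at rank 8
print(lora_params(1280, 1280, 8))  # 20480
print(loha_params(1280, 1280, 8))  # 40960
```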
Usage Pattern
The complete pattern for applying PEFT to a diffusion pipeline follows these steps:
from diffusers import UNet2DConditionModel, AutoencoderKL, DDPMScheduler
from transformers import CLIPTextModel
from peft import LoraConfig, get_peft_model

# 1. Load pretrained diffusion components
noise_scheduler = DDPMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    num_train_timesteps=1000,
)
text_encoder = CLIPTextModel.from_pretrained(model_path, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_path, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_path, subfolder="unet")

# 2. Freeze the VAE (never trained)
vae.requires_grad_(False)

# 3. Apply a PEFT adapter to the UNet
unet_config = LoraConfig(r=8, lora_alpha=32, target_modules=UNET_TARGET_MODULES)
unet = get_peft_model(unet, unet_config)

# 4. Optionally apply a PEFT adapter to the text encoder
text_encoder.requires_grad_(False)  # freeze if not training the text encoder
# Or apply an adapter:
te_config = LoraConfig(r=8, lora_alpha=32, target_modules=TEXT_ENCODER_TARGET_MODULES)
text_encoder = get_peft_model(text_encoder, te_config)

# 5. Train using the standard DreamBooth loop with prior preservation
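Step 5's prior-preservation objective can be sketched as follows. The DreamBooth scripts concatenate instance and class ("prior") examples in one batch, split the model prediction back in two, and weight the class-image term (simplified sketch; in the real scripts the weight comes from a `--prior_loss_weight` flag):

```python
import torch
import torch.nn.functional as F

def dreambooth_loss(model_pred: torch.Tensor, target: torch.Tensor,
                    prior_loss_weight: float = 1.0) -> torch.Tensor:
    # Batch layout assumption: first half = instance examples,
    # second half = class (prior) examples, concatenated along dim 0.
    pred_inst, pred_prior = torch.chunk(model_pred, 2, dim=0)
    tgt_inst, tgt_prior = torch.chunk(target, 2, dim=0)
    instance_loss = F.mse_loss(pred_inst.float(), tgt_inst.float())
    prior_loss = F.mse_loss(pred_prior.float(), tgt_prior.float())
    return instance_loss + prior_loss_weight * prior_loss

# Toy shapes standing in for UNet noise predictions (B=4, C=4, 8x8 latents)
pred = torch.randn(4, 4, 8, 8)
target = torch.randn(4, 4, 8, 8)
loss = dreambooth_loss(pred, target, prior_loss_weight=1.0)
```

The prior term regularizes the adapter toward the base model's notion of the subject's class, countering overfitting to the handful of instance images.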
Key Parameters
| Parameter | Description | Typical Values |
|---|---|---|
| `r` | Adapter rank (controls capacity) | 4, 8, 16, 32, 64 |
| `lora_alpha` / `alpha` | Scaling factor for the adapter contribution | 16, 32 |
| `target_modules` | Module names to inject adapters into | See definitions above |
| `lora_dropout` | Dropout applied to adapter layers | 0.0, 0.05, 0.1 |
| `bias` | Which bias terms to train | `"none"`, `"all"`, `"lora_only"` |
| `use_effective_conv2d` | LoHa/LoKr: use the efficient Conv2d decomposition | `True`, `False` |
| `decompose_both` | LoKr: decompose both Kronecker factors | `True`, `False` |
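Note that `r` and `lora_alpha` interact: in the standard LoRA formulation the weight update is scaled by `lora_alpha / r`, so raising the rank without raising alpha shrinks the adapter's effective contribution. A quick numeric check:

```python
# Standard LoRA scaling: delta_W is multiplied by lora_alpha / r.
def lora_scaling(lora_alpha: float, r: int) -> float:
    return lora_alpha / r

print(lora_scaling(32, 8))   # 4.0
print(lora_scaling(32, 32))  # 1.0
print(lora_scaling(16, 64))  # 0.25
```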
Design Decisions
- Separate configs for UNet and text encoder: Each component may use different ranks, alphas, and target modules optimized for its architecture
- VAE is never adapted: The variational autoencoder is frozen in all examples since it only encodes/decodes pixel space and does not participate in the diffusion denoising process
- Gradient checkpointing caveat: When using LoRA with the text encoder, gradient checkpointing on the text encoder is disabled to avoid compatibility issues (see train_dreambooth.py:L756-758)