Implementation: Hugging Face PEFT Stable Diffusion Adapter
Metadata
- Source: examples/lora_dreambooth/train_dreambooth.py:L719-744, examples/stable_diffusion/train_dreambooth.py:L88-165
- Repository: huggingface/peft
- Type: API Doc / Pattern Doc (hybrid)
- Domains: Computer_Vision, Diffusion_Models
Overview
This implementation documents the pattern of applying PEFT adapters (LoRA, LoHa, LoKr) to the UNet and text encoder components of a Stable Diffusion pipeline for DreamBooth personalization. The pattern involves defining target modules for each component, creating the appropriate adapter configuration, and wrapping the model with get_peft_model.
Two example scripts demonstrate this pattern:
- examples/lora_dreambooth/train_dreambooth.py -- LoRA-only DreamBooth with UNet and optional text encoder adaptation
- examples/stable_diffusion/train_dreambooth.py -- Multi-adapter DreamBooth supporting LoRA, LoHa, and LoKr
Target Module Definitions
The target modules identify which layers inside the UNet and text encoder receive adapter injections.
UNet Target Modules
From examples/stable_diffusion/train_dreambooth.py (comprehensive set):
UNET_TARGET_MODULES = [
    "to_q",
    "to_k",
    "to_v",
    "proj",
    "proj_in",
    "proj_out",
    "conv",
    "conv1",
    "conv2",
    "conv_shortcut",
    "to_out.0",
    "time_emb_proj",
    "ff.net.2",
]
From examples/lora_dreambooth/train_dreambooth.py (minimal set):
UNET_TARGET_MODULES = ["to_q", "to_v", "query", "value"]
Text Encoder Target Modules
From examples/stable_diffusion/train_dreambooth.py (comprehensive set):
TEXT_ENCODER_TARGET_MODULES = ["fc1", "fc2", "q_proj", "k_proj", "v_proj", "out_proj"]
From examples/lora_dreambooth/train_dreambooth.py (minimal set):
TEXT_ENCODER_TARGET_MODULES = ["q_proj", "v_proj"]
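Short entries like "to_q" match every attention block because PEFT compares each target_modules entry against the full dotted module name. A stdlib sketch of that matching rule (simplified; the real implementation also accepts regex patterns):

```python
# Simplified sketch of how PEFT matches target_modules entries:
# an entry matches when it equals the module's dotted name or is a
# suffix starting at a dot boundary (real PEFT also accepts regexes).
def matches_target(module_name: str, target_modules: list[str]) -> bool:
    return any(
        module_name == t or module_name.endswith("." + t)
        for t in target_modules
    )

UNET_TARGET_MODULES = ["to_q", "to_k", "to_v", "to_out.0"]

# A typical UNet attention projection matches "to_q":
print(matches_target(
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q",
    UNET_TARGET_MODULES,
))  # True
# An unrelated layer does not:
print(matches_target("conv_in", UNET_TARGET_MODULES))  # False
```

This is why the dotted entry "to_out.0" is needed for the output projection: the bare name "0" alone would match far too many modules.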
API: Adapter Configuration and Model Wrapping
LoRA Configuration (lora_dreambooth example)
from peft import LoraConfig, get_peft_model

# UNet adapter
config = LoraConfig(
    r=args.lora_r,
    lora_alpha=args.lora_alpha,
    target_modules=UNET_TARGET_MODULES,
    lora_dropout=args.lora_dropout,
    bias=args.lora_bias,
)
unet = get_peft_model(unet, config)
unet.print_trainable_parameters()

# Optional text encoder adapter
config = LoraConfig(
    r=args.lora_text_encoder_r,
    lora_alpha=args.lora_text_encoder_alpha,
    target_modules=TEXT_ENCODER_TARGET_MODULES,
    lora_dropout=args.lora_text_encoder_dropout,
    bias=args.lora_text_encoder_bias,
)
text_encoder = get_peft_model(text_encoder, config)
text_encoder.print_trainable_parameters()
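The `args.*` values above come from the script's command-line parser. A minimal argparse sketch with hypothetical defaults (the real script defines these flags among many others, with its own defaults):

```python
import argparse

# Hypothetical defaults for illustration only; consult the actual
# script for its full flag set and default values.
parser = argparse.ArgumentParser()
parser.add_argument("--lora_r", type=int, default=8)
parser.add_argument("--lora_alpha", type=int, default=32)
parser.add_argument("--lora_dropout", type=float, default=0.0)
parser.add_argument("--lora_bias", type=str, default="none")
parser.add_argument("--lora_text_encoder_r", type=int, default=8)
parser.add_argument("--lora_text_encoder_alpha", type=int, default=32)
parser.add_argument("--lora_text_encoder_dropout", type=float, default=0.0)
parser.add_argument("--lora_text_encoder_bias", type=str, default="none")

# Parse a sample command line overriding two of the UNet settings
args = parser.parse_args(["--lora_r", "16", "--lora_dropout", "0.1"])
print(args.lora_r, args.lora_dropout)  # 16 0.1
```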
Multi-Adapter Configuration (stable_diffusion example)
The examples/stable_diffusion/train_dreambooth.py script supports three adapter types through factory functions:
from peft import LoraConfig, LoHaConfig, LoKrConfig, get_peft_model

def create_unet_adapter_config(args):
    if args.adapter == "lora":
        config = LoraConfig(
            r=args.unet_r,
            lora_alpha=args.unet_alpha,
            target_modules=UNET_TARGET_MODULES,
            lora_dropout=args.unet_dropout,
            bias=args.unet_bias,
            init_lora_weights=True,
        )
    elif args.adapter == "loha":
        config = LoHaConfig(
            r=args.unet_r,
            alpha=args.unet_alpha,
            target_modules=UNET_TARGET_MODULES,
            rank_dropout=args.unet_rank_dropout,
            module_dropout=args.unet_module_dropout,
            use_effective_conv2d=args.unet_use_effective_conv2d,
            init_weights=True,
        )
    elif args.adapter == "lokr":
        config = LoKrConfig(
            r=args.unet_r,
            alpha=args.unet_alpha,
            target_modules=UNET_TARGET_MODULES,
            rank_dropout=args.unet_rank_dropout,
            module_dropout=args.unet_module_dropout,
            use_effective_conv2d=args.unet_use_effective_conv2d,
            decompose_both=args.unet_decompose_both,
            decompose_factor=args.unet_decompose_factor,
            init_weights=True,
        )
    else:
        # Guard for unsupported values; only lora/loha/lokr are handled here
        raise ValueError(f"Unknown adapter type: {args.adapter}")
    return config
The text encoder configuration follows the same pattern via create_text_encoder_adapter_config, substituting TEXT_ENCODER_TARGET_MODULES and text-encoder-specific hyperparameters.
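The three adapter families trade parameter count for structure. A back-of-envelope sketch for one adapted Linear(in_f, out_f) layer (ignores bias terms and conv-specific shapes; LoKr's count additionally depends on the decomposition factor, so it is omitted here):

```python
# Rough trainable-parameter counts per adapted linear layer (sketch).
def lora_params(in_f: int, out_f: int, r: int) -> int:
    # LoRA: delta_W = B @ A, with A of shape (r, in_f) and B of shape (out_f, r)
    return r * (in_f + out_f)

def loha_params(in_f: int, out_f: int, r: int) -> int:
    # LoHa: delta_W = (B1 @ A1) * (B2 @ A2), two low-rank pairs combined
    # by a Hadamard product -> twice the LoRA count at the same rank
    return 2 * r * (in_f + out_f)

# Example: a 1280 -> 1280 attention projection at rank 8
print(lora_params(1280, 1280, 8))  # 20480
print(loha_params(1280, 1280, 8))  # 40960
```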
Usage Pattern
The complete pattern for applying PEFT to a diffusion pipeline follows these steps:
from diffusers import UNet2DConditionModel, AutoencoderKL, DDPMScheduler
from transformers import CLIPTextModel
from peft import LoraConfig, get_peft_model

# 1. Load pretrained diffusion components
noise_scheduler = DDPMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    num_train_timesteps=1000,
)
text_encoder = CLIPTextModel.from_pretrained(model_path, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_path, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_path, subfolder="unet")

# 2. Freeze the VAE (never trained)
vae.requires_grad_(False)

# 3. Apply a PEFT adapter to the UNet
unet_config = LoraConfig(r=8, lora_alpha=32, target_modules=UNET_TARGET_MODULES)
unet = get_peft_model(unet, unet_config)

# 4. Optionally apply a PEFT adapter to the text encoder
text_encoder.requires_grad_(False)  # freeze if not training the text encoder
# Or apply an adapter:
te_config = LoraConfig(r=8, lora_alpha=32, target_modules=TEXT_ENCODER_TARGET_MODULES)
text_encoder = get_peft_model(text_encoder, te_config)

# 5. Train using the standard DreamBooth loop with prior preservation
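Step 5's prior-preservation objective can be sketched as follows. The DreamBooth scripts concatenate instance and class ("prior") examples in one batch, split the model prediction back in two, and weight the class-image term (simplified sketch; in the real scripts the weight comes from a `--prior_loss_weight` flag):

```python
import torch
import torch.nn.functional as F

def dreambooth_loss(model_pred: torch.Tensor, target: torch.Tensor,
                    prior_loss_weight: float = 1.0) -> torch.Tensor:
    # Batch layout assumption: first half = instance examples,
    # second half = class (prior) examples, concatenated along dim 0.
    pred_inst, pred_prior = torch.chunk(model_pred, 2, dim=0)
    tgt_inst, tgt_prior = torch.chunk(target, 2, dim=0)
    instance_loss = F.mse_loss(pred_inst.float(), tgt_inst.float())
    prior_loss = F.mse_loss(pred_prior.float(), tgt_prior.float())
    return instance_loss + prior_loss_weight * prior_loss

# Toy shapes standing in for UNet noise predictions (B=4, C=4, 8x8 latents)
pred = torch.randn(4, 4, 8, 8)
target = torch.randn(4, 4, 8, 8)
loss = dreambooth_loss(pred, target, prior_loss_weight=1.0)
```

The prior term regularizes the adapter toward the base model's notion of the subject's class, countering overfitting to the handful of instance images.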
Key Parameters
| Parameter | Description | Typical Values |
|---|---|---|
| `r` | Adapter rank (controls capacity) | 4, 8, 16, 32, 64 |
| `lora_alpha` / `alpha` | Scaling factor for the adapter contribution | 16, 32 |
| `target_modules` | Module names to inject adapters into | See definitions above |
| `lora_dropout` | Dropout applied to adapter layers | 0.0, 0.05, 0.1 |
| `bias` | Which bias terms to train | `"none"`, `"all"`, `"lora_only"` |
| `use_effective_conv2d` | LoHa/LoKr: use the efficient Conv2d decomposition | `True`, `False` |
| `decompose_both` | LoKr: decompose both Kronecker factors | `True`, `False` |
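Note that `r` and `lora_alpha` interact: in the standard LoRA formulation the weight update is scaled by `lora_alpha / r`, so raising the rank without raising alpha shrinks the adapter's effective contribution. A quick numeric check:

```python
# Standard LoRA scaling: delta_W is multiplied by lora_alpha / r.
def lora_scaling(lora_alpha: float, r: int) -> float:
    return lora_alpha / r

print(lora_scaling(32, 8))   # 4.0
print(lora_scaling(32, 32))  # 1.0
print(lora_scaling(16, 64))  # 0.25
```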
Design Decisions
- Separate configs for UNet and text encoder: Each component may use different ranks, alphas, and target modules optimized for its architecture
- VAE is never adapted: The variational autoencoder is frozen in all examples since it only encodes/decodes pixel space and does not participate in the diffusion denoising process
- Gradient checkpointing caveat: When using LoRA with the text encoder, gradient checkpointing on the text encoder is disabled to avoid compatibility issues (see train_dreambooth.py:L756-758)