
Principle:Huggingface Diffusers Dual LoRA Configuration

From Leeroopedia
Last Updated 2026-02-13 00:00 GMT

Overview

A design principle for configuring Low-Rank Adaptation (LoRA) adapters on both the UNet denoising network and the text encoder simultaneously. Dual LoRA configuration enables stronger concept binding during DreamBooth personalization by allowing the model to learn subject-specific representations in both the visual and textual pathways.

Description

Standard DreamBooth LoRA applies adapters only to the UNet's attention layers, modifying how the denoising network responds to text conditioning. However, for challenging subjects -- especially faces, artistic styles, or abstract concepts -- training adapters on both the UNet and the text encoder yields significantly better results.

The dual LoRA configuration involves:

  • UNet LoRA -- Adapters are injected into the UNet's attention projection layers: to_k, to_q, to_v, to_out.0 (self-attention and cross-attention), plus add_k_proj and add_v_proj (added cross-attention projections in certain architectures).
  • Text encoder LoRA -- Adapters are injected into the text encoder's self-attention layers: q_proj, k_proj, v_proj, out_proj.
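Assuming the PEFT library's LoraConfig (the mechanism the Diffusers DreamBooth LoRA script builds on), the two adapter configurations above can be sketched as follows; the rank value of 4 is illustrative:

```python
from peft import LoraConfig

rank = 4  # low-rank dimension; lora_alpha = rank gives a scaling factor alpha/r = 1.0

# Adapter for the UNet's attention projection layers.
unet_lora_config = LoraConfig(
    r=rank,
    lora_alpha=rank,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0", "add_k_proj", "add_v_proj"],
)

# Adapter for the text encoder's self-attention layers.
text_lora_config = LoraConfig(
    r=rank,
    lora_alpha=rank,
    init_lora_weights="gaussian",
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
)
```

Because target_modules matches by module name, the same config can be reused across every attention block in the respective component.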

Each LoRA adapter introduces a pair of low-rank matrices, A (shape r x d_in) and B (shape d_out x r), that modify the original weight matrix: W' = W + alpha/r * B @ A, where r is the rank and alpha is the scaling factor.
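A minimal NumPy sketch of this update rule (dimensions chosen arbitrarily for illustration) makes the low-rank structure concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 32, 4, 4

W = rng.normal(size=(d_out, d_in))          # frozen base weight
A = rng.normal(scale=0.02, size=(r, d_in))  # low-rank down-projection
B = np.zeros((d_out, r))                    # up-projection, zero-initialized

delta = (alpha / r) * B @ A                 # update has rank at most r
W_adapted = W + delta

assert np.linalg.matrix_rank(delta) <= r
assert np.allclose(W_adapted, W)  # B = 0 means the adapter starts as a no-op
```

Only A and B receive gradients during training; W stays frozen, which is why the trainable parameter count is a small fraction of the full model.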

Usage

Configure dual LoRA when:

  • Fine-grained concept binding is needed -- the text encoder LoRA helps the model associate the identifier token with the subject's visual features more strongly.
  • Text encoder training is enabled via --train_text_encoder.
  • The subject has distinctive visual characteristics that benefit from modified text embeddings (e.g., specific faces, unique art styles).

When text encoder LoRA is not used, the text encoder remains fully frozen and only the UNet adapters are trained.
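The freezing policy above can be sketched as follows. This is an illustrative fragment, assuming `unet` and `text_encoder` are loaded Diffusers components, `train_text_encoder` mirrors the CLI flag, and the LoraConfig objects follow the dual configuration described earlier:

```python
# Base weights stay frozen in every configuration.
unet.requires_grad_(False)
text_encoder.requires_grad_(False)

# UNet adapters are always trained.
unet.add_adapter(unet_lora_config)

# Text encoder adapters only when --train_text_encoder is set;
# otherwise the text encoder remains fully frozen.
if train_text_encoder:
    text_encoder.add_adapter(text_lora_config)
```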

Theoretical Basis

Dual LoRA for DreamBooth extends the standard LoRA formulation to a multi-component adaptation setting:

SINGLE-COMPONENT LoRA (UNet only):
    W'_unet = W_unet + (alpha/r) * B_unet @ A_unet
    Trainable params: { A_unet, B_unet } for each target module

DUAL-COMPONENT LoRA (UNet + Text Encoder):
    W'_unet = W_unet + (alpha/r) * B_unet @ A_unet
    W'_text = W_text + (alpha/r) * B_text @ A_text
    Trainable params: { A_unet, B_unet, A_text, B_text } for each target module

TARGET MODULE SELECTION:
    UNet targets:         ["to_k", "to_q", "to_v", "to_out.0", "add_k_proj", "add_v_proj"]
    Text encoder targets: ["q_proj", "k_proj", "v_proj", "out_proj"]

PARAMETER COUNT (rank r=4):
    UNet LoRA:         ~1.6M trainable params  (out of ~860M total)
    Text encoder LoRA: ~0.3M trainable params  (out of ~123M total)
    Total:             ~1.9M trainable params  (~0.2% of full model)
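The text encoder figure can be checked by hand. Each LoRA pair adds r * (d_in + d_out) parameters per target module; assuming SD v1.5's CLIP text encoder (12 transformer layers, hidden size 768, four 768x768 attention projections per layer), the total comes out to roughly 0.3M:

```python
def lora_param_count(r, shapes):
    """Trainable params for LoRA pairs A (r x d_in) and B (d_out x r)."""
    return sum(r * (d_in + d_out) for d_out, d_in in shapes)

# 12 layers x 4 attention projections, each 768 x 768 (assumed CLIP dims).
text_shapes = [(768, 768)] * (12 * 4)
total = lora_param_count(4, text_shapes)
print(total)  # 294912, i.e. ~0.3M, matching the table above
```

The same formula applied across the UNet's (heterogeneously sized) attention projections yields the ~1.6M figure.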

Key theoretical properties:

  • Text encoder concept binding -- LoRA on the text encoder modifies how the identifier token is embedded, creating a more subject-specific text representation that propagates through cross-attention to the UNet.
  • Target module selection -- Only attention projection layers are targeted because these are the critical interaction points between text and visual features. Feed-forward and normalization layers are left frozen, as they encode more general transformations.
  • Rank and alpha coupling -- In the DreamBooth implementation, lora_alpha is set equal to rank, meaning the effective scaling factor is alpha/r = 1.0. This avoids the need for separate alpha tuning.
  • Gaussian initialization -- LoRA weights are initialized with init_lora_weights="gaussian", which draws the A matrices from a small-scale normal distribution instead of the PEFT default Kaiming-uniform; the B matrices remain zero-initialized, so the adapter still starts as a no-op. This initialization has been found to work better for DreamBooth personalization.
