
Principle:Huggingface Diffusers Prior Preservation Training

From Leeroopedia
Metadata
Knowledge Sources
Domains
Last Updated 2026-02-13 00:00 GMT

Overview

A training principle that combines instance-specific denoising loss with prior preservation regularization loss in a single training loop. This dual-objective formulation is the core of DreamBooth training, enabling subject personalization while preventing catastrophic forgetting of the model's general capabilities.

Description

The DreamBooth training loop extends the standard diffusion-model training objective with a prior preservation term. Each training step processes a batch in which instance (subject) images and class-prior images are concatenated along the batch dimension. The key steps are:

  1. Latent encoding -- Pixel values are encoded to the latent space via the frozen VAE encoder.
  2. Noise sampling -- Random Gaussian noise is sampled and added to the latents at a random timestep according to the noise schedule.
  3. Noise prediction -- The UNet predicts the noise (or velocity, depending on prediction_type) from the noisy latents and text conditioning.
  4. Loss decomposition -- When prior preservation is enabled, the model prediction and target are split into instance and class halves using torch.chunk().
  5. Dual loss computation -- Instance loss and prior loss are computed separately as MSE losses, then combined: loss = loss_instance + lambda * loss_prior.
  6. Gradient update -- Gradients are computed, clipped, and applied only to the trainable LoRA parameters.

Without prior preservation, the loop simplifies to the standard denoising objective computed only on instance images.
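The steps above can be sketched in plain PyTorch. This is a minimal, self-contained illustration with toy tensors and a toy convolution standing in for the frozen VAE and the LoRA-augmented UNet (names like `toy_unet` are illustrative, not the Diffusers API); the noise schedule is a simple linear-beta DDPM schedule standing in for `scheduler.add_noise`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins for the frozen VAE latents and the trainable noise predictor.
# First half of the batch: instance images; second half: class-prior images.
batch = 4
latents = torch.randn(batch, 4, 8, 8)           # step 1: pretend VAE-encoded latents
toy_unet = torch.nn.Conv2d(4, 4, 3, padding=1)  # illustrative noise predictor

# Step 2: sample Gaussian noise and a random timestep per example.
noise = torch.randn_like(latents)
timesteps = torch.randint(0, 1000, (batch,))

# Step 3: add noise with a linear-beta DDPM schedule (stand-in for
# scheduler.add_noise): z_t = sqrt(a_bar_t) * z + sqrt(1 - a_bar_t) * eps.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
sqrt_ac = alphas_cumprod[timesteps].sqrt().view(-1, 1, 1, 1)
sqrt_1m = (1.0 - alphas_cumprod[timesteps]).sqrt().view(-1, 1, 1, 1)
noisy_latents = sqrt_ac * latents + sqrt_1m * noise

# Step 4: predict the noise (epsilon prediction, so the target is the noise).
model_pred = toy_unet(noisy_latents)
target = noise

# Step 5: split instance/prior halves and combine the two MSE losses.
pred_instance, pred_prior = torch.chunk(model_pred, 2, dim=0)
target_instance, target_prior = torch.chunk(target, 2, dim=0)
prior_loss_weight = 1.0
loss_instance = F.mse_loss(pred_instance, target_instance)
loss_prior = F.mse_loss(pred_prior, target_prior)
loss = loss_instance + prior_loss_weight * loss_prior

# Step 6: backprop and clip gradients of the trainable parameters.
loss.backward()
torch.nn.utils.clip_grad_norm_(toy_unet.parameters(), max_norm=1.0)
```

In the real training loop the chunked forward pass is what makes prior preservation cheap: both halves share one UNet call, and only the loss bookkeeping differs.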

Usage

The training loop is configured by several key parameters:

  • with_prior_preservation -- Enables the dual-objective formulation.
  • prior_loss_weight (lambda) -- Controls the relative importance of the prior loss. Default 1.0.
  • prediction_type -- Either "epsilon" (predict noise) or "v_prediction" (predict velocity).
  • max_grad_norm -- Maximum gradient norm for clipping. Default 1.0.
  • gradient_accumulation_steps -- Number of steps to accumulate gradients before an optimizer step.
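These parameters map onto command-line flags of the Diffusers DreamBooth LoRA example script. A sketch launch, assuming the standard `train_dreambooth_lora.py` script; model name, paths, and prompts are placeholders:

```shell
accelerate launch train_dreambooth_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --instance_data_dir="./instance_images" \
  --class_data_dir="./class_images" \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --with_prior_preservation \
  --prior_loss_weight=1.0 \
  --max_grad_norm=1.0 \
  --gradient_accumulation_steps=1 \
  --output_dir="./dreambooth-lora-out"
```

When `--with_prior_preservation` is set, the script generates class images from `--class_prompt` if `--class_data_dir` is empty, then pairs them with instance images in each batch.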

Theoretical Basis

The DreamBooth training objective is a dual-objective denoising score matching loss:

TRAINING STEP:
    1. ENCODE: z = VAE.encode(x).latent_dist.sample() * scaling_factor
    2. NOISE:  eps ~ N(0, I),  t ~ Uniform{0, ..., T-1}
    3. NOISY:  z_t = scheduler.add_noise(z, eps, t)
    4. PREDICT: model_pred = UNet(z_t, t, text_embed(prompt))

    5. TARGET SELECTION:
       If prediction_type == "epsilon":  target = eps
       If prediction_type == "v_prediction": target = scheduler.get_velocity(z, eps, t)

    6. LOSS DECOMPOSITION (with prior preservation):
       model_pred_instance, model_pred_prior = chunk(model_pred, 2, dim=0)
       target_instance, target_prior         = chunk(target, 2, dim=0)

       L_instance = MSE(model_pred_instance, target_instance)
       L_prior    = MSE(model_pred_prior, target_prior)
       L_total    = L_instance + lambda * L_prior

    7. GRADIENT UPDATE:
       backward(L_total)
       clip_grad_norm_(lora_params, max_grad_norm)
       optimizer.step()
       lr_scheduler.step()
       optimizer.zero_grad()

Key theoretical properties:

  • Batch-level decomposition -- Instance and class images are concatenated in a single batch and processed in one forward pass. The torch.chunk() operation splits the output along the batch dimension, enabling separate loss computation without redundant forward passes.
  • Lambda weighting -- The prior loss weight lambda (default 1.0) balances personalization fidelity against class diversity. Higher values preserve more class diversity but may reduce subject fidelity; lower values risk overfitting.
  • Gradient clipping -- The max_grad_norm parameter prevents gradient explosions that can occur when the small LoRA adapter receives large gradients from the denoising loss, especially early in training.
  • Prediction type flexibility -- The training target adapts to the scheduler's prediction type: "epsilon" for noise prediction or "v_prediction" for velocity prediction, ensuring compatibility with different pretrained model configurations.
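The last point can be made concrete. For epsilon prediction the target is simply the sampled noise; for v-prediction the target is the velocity v_t = sqrt(a_bar_t) * eps - sqrt(1 - a_bar_t) * z, which is what Diffusers' `scheduler.get_velocity` computes. A self-contained sketch with a toy linear-beta schedule (the helper names here are illustrative, not the library API):

```python
import torch

def get_velocity(latents, noise, timesteps, alphas_cumprod):
    # v-prediction target: v = sqrt(a_bar_t) * eps - sqrt(1 - a_bar_t) * z
    a = alphas_cumprod[timesteps].view(-1, 1, 1, 1)
    return a.sqrt() * noise - (1.0 - a).sqrt() * latents

def training_target(prediction_type, latents, noise, timesteps, alphas_cumprod):
    # Select the regression target to match the scheduler's prediction type.
    if prediction_type == "epsilon":
        return noise
    if prediction_type == "v_prediction":
        return get_velocity(latents, noise, timesteps, alphas_cumprod)
    raise ValueError(f"unknown prediction_type: {prediction_type}")

torch.manual_seed(0)
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
latents = torch.randn(2, 4, 8, 8)
noise = torch.randn_like(latents)
timesteps = torch.randint(0, 1000, (2,))

eps_target = training_target("epsilon", latents, noise, timesteps, alphas_cumprod)
v_target = training_target("v_prediction", latents, noise, timesteps, alphas_cumprod)
```

Because the target swap is the only change, the same dual-loss decomposition applies unmodified under either prediction type.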
