Principle: Hugging Face Diffusers Prior Preservation Training
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A training principle that combines instance-specific denoising loss with prior preservation regularization loss in a single training loop. This dual-objective formulation is the core of DreamBooth training, enabling subject personalization while preventing catastrophic forgetting of the model's general capabilities.
Description
The DreamBooth training loop extends the standard diffusion model training objective with a prior preservation term. In each training step, the loop processes a batch that contains both instance (subject) images and class-prior images concatenated together. The key steps are:
- Latent encoding -- Pixel values are encoded to the latent space via the frozen VAE encoder.
- Noise sampling -- Random Gaussian noise is sampled and added to the latents at a random timestep according to the noise schedule.
- Noise prediction -- The UNet predicts the noise (or velocity, depending on prediction_type) from the noisy latents and text conditioning.
- Loss decomposition -- When prior preservation is enabled, the model prediction and target are split into instance and class halves using torch.chunk().
- Dual loss computation -- Instance loss and prior loss are computed separately as MSE losses, then combined: loss = loss_instance + lambda * loss_prior.
- Gradient update -- Gradients are computed, clipped, and applied only to the trainable LoRA parameters.
Without prior preservation, the loop simplifies to the standard denoising objective computed only on instance images.
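The loss decomposition and dual loss computation steps can be sketched in a few lines of PyTorch. This is a minimal sketch, not the exact Diffusers code: the helper name dual_loss and the toy tensor shapes are illustrative assumptions; only the chunk-and-combine pattern mirrors the training loop.

```python
import torch
import torch.nn.functional as F

def dual_loss(model_pred: torch.Tensor, target: torch.Tensor,
              prior_loss_weight: float = 1.0) -> torch.Tensor:
    """Split a concatenated [instance; class-prior] batch along dim 0
    and combine the two MSE losses (DreamBooth-style dual objective)."""
    pred_instance, pred_prior = torch.chunk(model_pred, 2, dim=0)
    target_instance, target_prior = torch.chunk(target, 2, dim=0)
    loss_instance = F.mse_loss(pred_instance, target_instance)
    loss_prior = F.mse_loss(pred_prior, target_prior)
    return loss_instance + prior_loss_weight * loss_prior

# Toy tensors standing in for the UNet prediction and the training target:
# 2 instance samples stacked on top of 2 class-prior samples.
pred = torch.randn(4, 4, 8, 8)
target = torch.randn(4, 4, 8, 8)
loss = dual_loss(pred, target, prior_loss_weight=1.0)
```

Because both halves go through the UNet in one forward pass, the split costs only a view over the batch dimension.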
Usage
The training loop is configured by several key parameters:
- with_prior_preservation -- Enables the dual-objective formulation.
- prior_loss_weight (lambda) -- Controls the relative importance of the prior loss. Default 1.0.
- prediction_type -- Either "epsilon" (predict noise) or "v_prediction" (predict velocity).
- max_grad_norm -- Maximum gradient norm for clipping. Default 1.0.
- gradient_accumulation_steps -- Number of steps to accumulate gradients before an optimizer step.
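How max_grad_norm and gradient_accumulation_steps interact can be sketched as follows. A single Linear layer stands in for the trainable LoRA parameters, and random tensors stand in for batches; both are illustrative assumptions, not the Diffusers script.

```python
import torch

# Stand-in for the trainable LoRA parameters (assumption for illustration).
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
gradient_accumulation_steps = 4
max_grad_norm = 1.0

for step, batch in enumerate(torch.randn(8, 2, 8)):
    # Scale each micro-step loss so the accumulated gradient matches
    # one large-batch step.
    loss = model(batch).pow(2).mean() / gradient_accumulation_steps
    loss.backward()  # gradients accumulate across micro-steps
    if (step + 1) % gradient_accumulation_steps == 0:
        # Clip once per optimizer step, on the accumulated gradient.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
        optimizer.zero_grad()
```

Clipping after accumulation (rather than per micro-step) is what bounds the norm of the gradient the optimizer actually applies.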
Theoretical Basis
The DreamBooth training objective is a dual-objective denoising score matching loss:
TRAINING STEP:
1. ENCODE: z = VAE.encode(x).latent_dist.sample() * scaling_factor
2. NOISE: eps ~ N(0, I), t ~ Uniform(0, T)
3. NOISY: z_t = scheduler.add_noise(z, eps, t)
4. PREDICT: eps_hat = UNet(z_t, t, text_embed(prompt))
5. TARGET SELECTION:
If prediction_type == "epsilon": target = eps
If prediction_type == "v_prediction": target = scheduler.get_velocity(z, eps, t)
6. LOSS DECOMPOSITION (with prior preservation):
eps_hat_instance, eps_hat_prior = chunk(eps_hat, 2, dim=0)
target_instance, target_prior = chunk(target, 2, dim=0)
L_instance = MSE(eps_hat_instance, target_instance)
L_prior = MSE(eps_hat_prior, target_prior)
L_total = L_instance + lambda * L_prior
7. GRADIENT UPDATE:
backward(L_total)
clip_grad_norm_(lora_params, max_grad_norm)
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()
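The seven steps above can be exercised end-to-end with toy stand-ins. Everything here is an assumption for illustration: a one-layer conv replaces the UNet, random tensors replace the VAE latents and text conditioning, and add_noise is written out by hand as the standard forward-diffusion formula under a linear beta schedule.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

unet = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)  # toy "UNet"
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-3)
T = 1000
betas = torch.linspace(1e-4, 0.02, T)                   # linear schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
prior_loss_weight = 1.0

# 1. ENCODE (stand-in): first half instance, second half class-prior latents.
z = torch.randn(4, 4, 8, 8)
# 2. NOISE: sample eps and per-sample timesteps.
eps = torch.randn_like(z)
t = torch.randint(0, T, (z.shape[0],))
# 3. NOISY: z_t = sqrt(a_bar_t) * z + sqrt(1 - a_bar_t) * eps
a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
z_t = a_bar.sqrt() * z + (1 - a_bar).sqrt() * eps
# 4. PREDICT (epsilon prediction; no text conditioning in this toy).
eps_hat = unet(z_t)
# 5-6. TARGET + LOSS DECOMPOSITION: split halves, combine the two MSEs.
pred_i, pred_p = torch.chunk(eps_hat, 2, dim=0)
tgt_i, tgt_p = torch.chunk(eps, 2, dim=0)
loss = F.mse_loss(pred_i, tgt_i) + prior_loss_weight * F.mse_loss(pred_p, tgt_p)
# 7. GRADIENT UPDATE with clipping.
loss.backward()
torch.nn.utils.clip_grad_norm_(unet.parameters(), 1.0)
optimizer.step()
optimizer.zero_grad()
```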
Key theoretical properties:
- Batch-level decomposition -- Instance and class images are concatenated in a single batch and processed in one forward pass. The torch.chunk() operation splits the output along the batch dimension, enabling separate loss computation without redundant forward passes.
- Lambda weighting -- The prior loss weight lambda (default 1.0) balances personalization fidelity against class diversity. Higher values preserve more class diversity but may reduce subject fidelity; lower values risk overfitting.
- Gradient clipping -- The max_grad_norm parameter prevents gradient explosions that can occur when the small LoRA adapter receives large gradients from the denoising loss, especially early in training.
- Prediction type flexibility -- The training target adapts to the scheduler's prediction type: "epsilon" for noise prediction or "v_prediction" for velocity prediction, ensuring compatibility with different pretrained model configurations.
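For the v-prediction case, the target follows the velocity parameterization of Salimans and Ho, v = sqrt(a_bar_t) * eps - sqrt(1 - a_bar_t) * z. The sketch below computes it by hand; the helper name velocity_target is an assumption for illustration, mirroring in spirit what scheduler.get_velocity returns.

```python
import torch

def velocity_target(z: torch.Tensor, eps: torch.Tensor,
                    alphas_cumprod: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """v = sqrt(a_bar_t) * eps - sqrt(1 - a_bar_t) * z
    (velocity-prediction target; replaces eps as the regression target)."""
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * eps - (1.0 - a_bar).sqrt() * z

betas = torch.linspace(1e-4, 0.02, 1000)           # linear schedule (assumption)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
z = torch.randn(2, 4, 8, 8)
eps = torch.randn_like(z)
t = torch.tensor([10, 500])
target = velocity_target(z, eps, alphas_cumprod, t)  # used in place of eps
```

At small t (little noise, a_bar near 1) the velocity target is dominated by the noise term; at large t it is dominated by the negated clean latent.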