# Principle:Alibaba ROLL LoRA Parameter Optimization
| Knowledge Sources | |
|---|---|
| Domains | Diffusion_Models, Optimization |
| Last Updated | 2026-02-07 20:00 GMT |
## Overview
A parameter-efficient optimization principle for updating LoRA adapters on diffusion models using reward flow gradients.
## Description
LoRA Parameter Optimization updates only the low-rank adapter parameters on the DiT model, keeping all other components frozen. The loss combines normalized face identity reward with KL regularization:
```
loss = -(face_score - 0.54) / 0.16 * 0.1 + kl_loss
```

The reward normalization (subtracting the 0.54 baseline, then dividing by the 0.16 scale) keeps gradient magnitudes stable.
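As a sketch, the loss above can be written as a small function. The 0.54 baseline, 0.16 scale, and 0.1 reward weight come from the formula; the function and argument names are illustrative, not the project's actual API:

```python
# Sketch of the reward-flow training loss described above.
FACE_BASELINE = 0.54   # baseline subtracted from the face-identity reward
FACE_SCALE = 0.16      # scale used to normalize the reward
REWARD_WEIGHT = 0.1    # weight on the (negated) normalized reward term

def roll_lora_loss(face_score: float, kl_loss: float) -> float:
    """loss = -(face_score - 0.54) / 0.16 * 0.1 + kl_loss"""
    normalized = (face_score - FACE_BASELINE) / FACE_SCALE
    return -normalized * REWARD_WEIGHT + kl_loss

# A face score above the baseline drives the loss down (maximizing reward),
# while the KL term penalizes drift away from the frozen base model.
print(roll_lora_loss(0.70, kl_loss=0.02))
```

Because the reward term is negated, minimizing this loss maximizes the normalized face-identity reward subject to the KL regularizer.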
## Usage
Use this loss as the training objective when fine-tuning a diffusion model with reward flow gradients.
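A minimal NumPy sketch of the "update only the LoRA adapters" setup from the Description: the base weight stays frozen and only the low-rank factors would be handed to the optimizer. The single linear layer, rank, shapes, and initialization here are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 8, 2  # illustrative sizes, not the DiT's real dimensions

W = rng.standard_normal((d_out, d_in))        # frozen base weight, never updated
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable LoRA down-projection
B = np.zeros((d_out, rank))                   # trainable LoRA up-projection (zero init)

def lora_forward(x):
    # The base path W @ x is frozen; only the low-rank residual B @ (A @ x)
    # changes during reward-flow fine-tuning.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
y = lora_forward(x)

# With B initialized to zero, the adapted model starts identical to the base
# model; the optimizer would receive only [A, B], leaving W untouched.
trainable_params = [A, B]
```

Zero-initializing one of the two factors is the standard LoRA choice: training begins from the base model's behavior, and the adapter only gradually steers it toward higher reward.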
## Theoretical Basis
The normalized reward guides LoRA updates:

$$\mathcal{L} = -\alpha \, \frac{r_{\text{face}} - b}{s} + \mathcal{L}_{\text{KL}}$$

Where $b = 0.54$ is the baseline, $s = 0.16$ is the scale, and $\alpha = 0.1$ is the reward weight.
## Related Pages
### Implemented By
### Related Heuristics
The following heuristics inform this principle: