
Principle:Alibaba ROLL LoRA Parameter Optimization

From Leeroopedia


Knowledge Sources
Domains Diffusion_Models, Optimization
Last Updated 2026-02-07 20:00 GMT

Overview

A parameter-efficient optimization principle for updating LoRA adapters on diffusion models using reward flow gradients.

Description

LoRA Parameter Optimization updates only the low-rank adapter parameters on the DiT model, keeping all other components frozen. The loss combines normalized face identity reward with KL regularization:

loss = -(face_score - 0.54) / 0.16 * 0.1 + kl_loss

The reward normalization (subtracting 0.54 baseline, dividing by 0.16 scale) ensures stable gradient magnitudes.
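This objective can be sketched in a few lines of plain Python (in a real training loop `face_score` and `kl_loss` would be scalar tensors; the constants are those given above):

```python
def roll_lora_loss(face_score, kl_loss, baseline=0.54, scale=0.16, reward_weight=0.1):
    """Reward-flow training loss: normalized face identity reward plus KL penalty."""
    normalized_reward = (face_score - baseline) / scale  # stabilize gradient magnitudes
    # Negate the reward term: the optimizer minimizes the loss, so this maximizes reward.
    return -normalized_reward * reward_weight + kl_loss

# Example: a face score one scale unit above baseline, with a small KL term.
print(round(roll_lora_loss(0.70, kl_loss=0.02), 4))  # -0.08
```

A face score equal to the baseline contributes nothing, so the loss reduces to the KL term alone.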

Usage

Use as the training objective for reward flow diffusion model fine-tuning.
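In practice this means the optimizer covers the adapter weights only. A minimal sketch with plain SGD, assuming adapter parameters are identified by a "lora" substring in their names (a common convention; the actual ROLL parameter names and optimizer are not specified on this page):

```python
# Toy parameter store: name -> scalar value. Only "lora" entries are trainable.
params = {
    "dit.blocks.0.attn.to_q.weight": 0.80,   # frozen base DiT weight (hypothetical name)
    "dit.blocks.0.attn.to_q.lora_A": 0.10,   # trainable low-rank adapter
    "dit.blocks.0.attn.to_q.lora_B": 0.05,
}
grads = {name: 0.2 for name in params}  # pretend every parameter received a gradient
lr = 1e-2

for name, g in grads.items():
    if "lora" in name:            # update LoRA adapters only; base model stays frozen
        params[name] -= lr * g    # plain SGD step for illustration

print(round(params["dit.blocks.0.attn.to_q.weight"], 3))  # 0.8   (unchanged)
print(round(params["dit.blocks.0.attn.to_q.lora_A"], 3))  # 0.098 (updated)
```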

Theoretical Basis

The normalized reward guides the LoRA parameter update:

$$\theta_{\text{LoRA}} \leftarrow \theta_{\text{LoRA}} - \eta\,\nabla_{\theta_{\text{LoRA}}}\left[-\frac{r - b}{s}\,w + D_{\mathrm{KL}}\right]$$

where $r$ is the face identity reward, $b = 0.54$ is the baseline, $s = 0.16$ is the scale, $w = 0.1$ is the reward weight, and $\eta$ is the learning rate.
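A toy worked example of one update step, using a central finite difference in place of backpropagation and invented scalar stand-ins for the reward and KL terms (these stand-in functions are illustrative assumptions, not ROLL's actual models):

```python
def loss(theta, b=0.54, s=0.16, w=0.1):
    face_score = 0.5 + 0.2 * theta   # assumed: reward rises linearly with theta
    kl = 0.05 * theta ** 2           # assumed: KL penalty grows away from the base model
    return -(face_score - b) / s * w + kl

eta, theta, eps = 0.1, 1.0, 1e-6
# Central finite-difference estimate of the gradient at theta.
grad = (loss(theta + eps) - loss(theta - eps)) / (2 * eps)
theta = theta - eta * grad           # one step of the update rule above
print(round(theta, 4))  # 1.0025  (analytic gradient: -0.2/0.16*0.1 + 0.1*1.0 = -0.025)
```

The reward term pushes theta up while the KL term pulls it back toward the base model; here the reward pressure wins and theta increases slightly.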

Related Pages

Implemented By

Related Heuristics

The following heuristics inform this principle:
