Principle:NVIDIA NeMo Aligner DRaFT Plus Image Alignment

Knowledge Sources	NVIDIA_NeMo_Aligner
Domains	Multimodal, Image Generation, Diffusion Models, Alignment
Last Updated	2026-02-08 00:00 GMT

Overview

DRaFT+ (Direct Reward Fine-Tuning Plus) aligns diffusion models with human preferences by backpropagating the score of a differentiable reward model through the denoising process. It truncates backpropagation to the final denoising steps to bound cost, and adds a KL divergence penalty against the base model to prevent mode collapse.

Description

DRaFT+ is a method for fine-tuning text-to-image diffusion models (such as Stable Diffusion and SDXL) to better align with human aesthetic preferences. It combines three training-time components with one inference-time component:

Differentiable reward optimization -- During training, the model generates images through its standard denoising process. The generated images are passed through a reward model (e.g., a CLIP-based model trained on human preference data), and the reward signal is backpropagated through the diffusion process to update the model parameters.

Truncated backpropagation -- Computing gradients through the entire denoising chain (e.g., 50 DDIM steps) would be prohibitively expensive. DRaFT+ uses truncated backpropagation through only the final truncation_steps of the denoising process. Earlier steps are run with torch.no_grad(), dramatically reducing memory and compute requirements while still providing meaningful gradient signal.

KL divergence regularization -- To prevent the fine-tuned model from collapsing to a narrow mode of high-reward images, DRaFT+ computes a Gaussian KL penalty between the noise predictions (epsilon) of the fine-tuned model and those of the original (frozen) base model. This penalty, weighted by kl_coeff, ensures the fine-tuned model stays close to the original distribution.

Annealed sampling -- At inference time, DRaFT+ supports annealed guidance, which interpolates between the base model's noise prediction and the fine-tuned model's noise prediction at each denoising step using a configurable weighting function (linear, power, step, etc.). This enables smooth control over the alignment-diversity tradeoff.
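The three training-time pieces above (differentiable reward, truncated backpropagation, and the KL penalty) can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not NeMo Aligner's implementation: the linear layers standing in for the denoiser and the frozen base model, the toy `reward_fn`, the simplified update rule, and the helper name `draft_plus_loss` are all hypothetical. Only `truncation_steps` and `kl_coeff` are names taken from the description.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins: a linear layer plays the noise predictor, and a
# frozen copy of it plays the original base model.
model = torch.nn.Linear(4, 4)
base = torch.nn.Linear(4, 4)
base.load_state_dict(model.state_dict())
for p in base.parameters():
    p.requires_grad_(False)


def reward_fn(x):
    # Toy differentiable reward (assumption): highest when samples equal 1.
    return -((x - 1.0) ** 2).mean()


def draft_plus_loss(model, base, x_T, num_steps=10, truncation_steps=2,
                    kl_coeff=0.1, step_size=0.1):
    x = x_T
    kl = x.new_zeros(())
    for i in range(num_steps):
        if i >= num_steps - truncation_steps:
            # Final steps: record the graph so the reward can backpropagate.
            eps = model(x)
            with torch.no_grad():
                eps_base = base(x)
            # Mean squared difference of the two noise predictions.
            kl = kl + ((eps - eps_base) ** 2).mean()
        else:
            # Earlier steps: no graph is built, so no activations are stored.
            with torch.no_grad():
                eps = model(x)
        # Simplified update for illustration; a real sampler would apply
        # its DDIM (or similar) step here.
        x = x - step_size * eps
    return -reward_fn(x) + kl_coeff * kl


x_T = torch.randn(8, 4)
loss = draft_plus_loss(model, base, x_T)
loss.backward()  # gradients flow only through the last truncation_steps
```

In this sketch the stored computation graph grows with `truncation_steps` rather than with `num_steps`, which is the source of the memory savings. Setting `truncation_steps=0` leaves no path from the loss to the model parameters.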

The framework supports Stable Diffusion (SD) via MegatronSDDRaFTPModel and Stable Diffusion XL (SDXL) via MegatronSDXLDRaFTPModel. Either model can be trained with full fine-tuning or with PEFT (LoRA).

Usage

DRaFT+ is used when:

You want to improve the aesthetic quality of a text-to-image diffusion model according to human preferences.
You have a trained, differentiable reward model (e.g., PickScore, HPSv2) that scores how well a generated image matches human preferences for its prompt.
You want to fine-tune while preventing mode collapse via KL regularization.
You want to control the alignment-diversity tradeoff at inference time via annealed sampling.

Theoretical Basis

Training Objective: $ℒ = - 𝔼_{x_{T} \sim 𝒩 (0, I), c \sim D} [R (G_{θ} (x_{T}, c), c)] + λ \cdot D_{KL}$

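where $R$ is the reward model, $G_{θ}$ is the diffusion generator, $x_{T}$ is the initial Gaussian noise, $c$ is the text conditioning drawn from the prompt dataset $D$, $λ$ is the KL coefficient, and $D_{KL}$ is the penalty defined below.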

Gaussian KL Penalty (shared variance): $D_{KL} = \frac{1}{d} \sum_{t \in \text{trunc}} ‖ ϵ_{θ} (x_{t}, t, c) - ϵ_{θ_{0}} (x_{t}, t, c) ‖^{2}$

where $d$ is the dimensionality of the noise prediction, $ϵ_{θ}$ is the fine-tuned model's noise prediction, and $ϵ_{θ_{0}}$ is the frozen base model's noise prediction. The sum is over the truncated denoising steps.
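
One way to read the "shared variance" label, offered here as an interpretation rather than something the source states: suppose the fine-tuned and base models define Gaussian denoising transitions with the same covariance $σ_{t}^{2} I$ and means $μ_{θ}$ and $μ_{θ_{0}}$ (symbols introduced only for this illustration). The KL divergence between two such Gaussians depends only on the distance between their means:

$D_{KL} (𝒩 (μ_{θ}, σ_{t}^{2} I) \, ‖ \, 𝒩 (μ_{θ_{0}}, σ_{t}^{2} I)) = \frac{1}{2 σ_{t}^{2}} ‖ μ_{θ} - μ_{θ_{0}} ‖^{2}$

For samplers whose step mean is affine in the noise prediction, $μ = a_{t} x_{t} - b_{t} ϵ$, and with both models evaluated at the same $x_{t}$, the difference of means is $- b_{t} (ϵ_{θ} - ϵ_{θ_{0}})$, so

$D_{KL} = \frac{b_{t}^{2}}{2 σ_{t}^{2}} ‖ ϵ_{θ} (x_{t}, t, c) - ϵ_{θ_{0}} (x_{t}, t, c) ‖^{2}$

Under this reading, the penalty above amounts to replacing the per-step constant $b_{t}^{2} / (2 σ_{t}^{2})$ with $1 / d$ and leaving the overall strength to $λ$.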

Annealed Sampling:

At inference step $i$ out of $T$ total steps, the combined noise prediction is: $ϵ_{\text{combined}} = (1 - w (i, T)) \cdot ϵ_{θ_{0}} + w (i, T) \cdot ϵ_{θ}$

where $w (i, T)$ is a weighting function such as:

Linear: $w (i, T) = i / T$
Power: $w (i, T) = (i / T)^{p}$, for a chosen exponent $p$
Step: $w (i, T) = 𝟙 [i / T \geq τ]$, for a chosen threshold $τ$
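
The interpolation and the three example schedules can be sketched in plain Python. This is an illustrative sketch only: the function names, the default values of `p` and `tau`, and the use of flat lists for noise predictions are assumptions made for the example, not NeMo Aligner's API.

```python
def linear_weight(i, T):
    # w(i, T) = i / T
    return i / T


def power_weight(i, T, p=2.0):
    # w(i, T) = (i / T) ** p; the default p is an arbitrary choice.
    return (i / T) ** p


def step_weight(i, T, tau=0.5):
    # w(i, T) = 1 if i / T >= tau else 0; the default tau is arbitrary.
    return 1.0 if i / T >= tau else 0.0


def combine_eps(eps_base, eps_tuned, w):
    # Elementwise (1 - w) * eps_base + w * eps_tuned.
    return [(1.0 - w) * b + w * f for b, f in zip(eps_base, eps_tuned)]


# Weight on the fine-tuned model at each of T = 4 steps, per schedule.
T = 4
schedule = [(linear_weight(i, T), power_weight(i, T), step_weight(i, T))
            for i in range(T + 1)]
```

With these definitions, $w = 0$ reproduces the base model's prediction and $w = 1$ reproduces the fine-tuned model's, so a schedule that rises more slowly keeps more of the denoising trajectory under the base model.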
