Heuristic:Huggingface Alignment handbook LoRA Rank Selection

Knowledge Sources	Alignment Handbook Internal
Domains	Optimization, LLMs
Last Updated	2026-02-07 00:00 GMT

Overview

When training preference-aligned models with the alignment-handbook recipes, raise the LoRA rank from 16 for SFT to 128 for DPO, so that the adapter has enough capacity to capture more nuanced preference distinctions.

Description

The alignment-handbook uses different LoRA ranks for different training stages. SFT uses rank 16, which is treated as sufficient for learning instruction-following patterns, while DPO uses rank 128 to capture the more subtle distinctions between chosen and rejected responses. Both stages set alpha equal to the rank (alpha=r), which fixes the LoRA scaling factor (alpha / r) at 1.

Usage

Apply this when configuring LoRA adapters for different training stages. Use lower rank (16) for SFT and higher rank (128) for DPO/preference optimization.

The Insight (Rule of Thumb)

Action: Set `lora_r` to 16 for SFT training and 128 for DPO training. Set `lora_alpha` equal to `lora_r`.
Value:
- SFT: `lora_r: 16`, `lora_alpha: 16`
- DPO: `lora_r: 128`, `lora_alpha: 128`
Trade-off: Higher rank means more trainable parameters and higher memory usage, but better capacity to learn preference-relevant features.
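
To put a rough number on the trade-off, the sketch below counts LoRA parameters for both ranks. It assumes Mistral-7B-style layer shapes (hidden size 4096, key/value projection width 1024, MLP width 14336, 32 layers); these dimensions are not stated on this page, and `lora_trainable_params` is a hypothetical helper. Each adapted linear layer of shape (d_in, d_out) adds r * (d_in + d_out) trainable parameters.

```python
# Assumed Mistral-7B-style dimensions; not taken from this page.
HIDDEN = 4096          # model hidden size
KV_DIM = 1024          # key/value projection width (grouped-query attention)
INTERMEDIATE = 14336   # MLP intermediate size
NUM_LAYERS = 32        # number of transformer blocks

# (d_in, d_out) for every module listed in lora_target_modules.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank: int) -> int:
    """Count LoRA parameters: A is (rank x d_in), B is (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * NUM_LAYERS


sft_params = lora_trainable_params(16)
dpo_params = lora_trainable_params(128)
print(f"SFT r=16:  {sft_params:,}")   # 41,943,040
print(f"DPO r=128: {dpo_params:,}")   # 335,544,320
print(f"ratio: {dpo_params / sft_params:.0f}x")  # 8x
```

Under these assumptions the parameter count grows linearly with rank, so the DPO adapter is eight times larger than the SFT adapter.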

Reasoning

SFT teaches general instruction-following behavior, which can be captured with a low-rank perturbation of the base weights. DPO has to learn fine-grained distinctions between preferred and dispreferred responses, and the handbook's configs give it a higher-rank update for that purpose. Setting alpha equal to rank (alpha=r) makes the LoRA scaling factor alpha / r equal to 1 at both ranks, so the scale of the update does not have to be retuned when the rank changes.
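
A minimal sketch of the scaling argument, assuming the standard LoRA formulation in which the low-rank update is multiplied by alpha / r. The helper name `lora_scaling` is hypothetical and is not part of the alignment-handbook.

```python
def lora_scaling(lora_alpha: float, lora_r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    if lora_r <= 0:
        raise ValueError("lora_r must be a positive integer")
    return lora_alpha / lora_r


# Values taken from the two stages described on this page.
stages = {"sft": (16, 16), "dpo": (128, 128)}
for stage, (r, alpha) in stages.items():
    print(stage, lora_scaling(alpha, r))  # 1.0 for both stages

# For contrast: keeping alpha at 16 while raising r to 128 shrinks the update.
print(lora_scaling(16, 128))  # 0.125
```

Tying alpha to the rank keeps the factor at 1 for both stages, whereas holding alpha fixed while raising the rank would scale the update down.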

SFT QLoRA config from `recipes/zephyr-7b-beta/sft/config_qlora.yaml:10-11`:

lora_r: 16
lora_alpha: 16

DPO QLoRA config from `recipes/zephyr-7b-beta/dpo/config_qlora.yaml:9-10`:

lora_r: 128
lora_alpha: 128

Both configs target all linear layers:

lora_target_modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
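
Since the two recipes differ only in rank and alpha, the choice can be expressed as a small lookup. The following is an illustrative sketch, not code from the alignment-handbook; `STAGE_RANK`, `ALL_LINEAR`, and `lora_settings` are hypothetical names, and the returned keys simply mirror the YAML fields shown above.

```python
# Rank per training stage, as used in the configs quoted on this page.
STAGE_RANK = {"sft": 16, "dpo": 128}

# Target modules shared by both configs.
ALL_LINEAR = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
]


def lora_settings(stage: str) -> dict:
    """Return LoRA settings for a stage, with alpha tied to the rank."""
    key = stage.lower()
    if key not in STAGE_RANK:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(STAGE_RANK)}")
    rank = STAGE_RANK[key]
    return {
        "lora_r": rank,
        "lora_alpha": rank,  # alpha = r keeps the scaling factor at 1
        "lora_target_modules": list(ALL_LINEAR),
    }


print(lora_settings("sft")["lora_r"])  # 16
print(lora_settings("dpo")["lora_r"])  # 128
```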
