# Heuristic: Hugging Face Alignment Handbook QLoRA Learning Rate Scaling
| Knowledge Sources | |
|---|---|
| Domains | Optimization, LLMs |
| Last Updated | 2026-02-07 00:00 GMT |
## Overview
QLoRA training requires approximately 10x higher learning rates than full fine-tuning, because only the small LoRA adapter matrices are updated.
## Description
When switching from full fine-tuning to QLoRA, the learning rate must be increased significantly. This is because LoRA adapters are low-rank decompositions of the weight updates, and only a small fraction of parameters are trainable. The alignment-handbook recipes encode this consistently: 2e-5 for full SFT vs. 2e-4 for QLoRA SFT, and 5e-7 for full DPO vs. 5e-6 for QLoRA DPO, a 10x increase in both cases.
## Usage
Apply this when switching between full and QLoRA fine-tuning recipes. If you are creating custom QLoRA configs, scale the learning rate up by approximately 10x compared to the full fine-tuning equivalent.
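The scaling rule above can be sketched as a small helper. This is a minimal illustration, not part of the alignment-handbook codebase; the function name and config dict are hypothetical.

```python
# Illustrative sketch: derive a QLoRA learning rate from a full
# fine-tuning recipe by applying the ~10x heuristic.

def qlora_learning_rate(full_ft_lr: float, factor: float = 10.0) -> float:
    """Scale a full fine-tuning learning rate for QLoRA training."""
    return full_ft_lr * factor

# e.g. starting from the full SFT recipe value (config_full.yaml)
full_sft = {"learning_rate": 2.0e-05}
qlora_sft = dict(full_sft, learning_rate=qlora_learning_rate(full_sft["learning_rate"]))

print(qlora_sft["learning_rate"])  # ~2e-04, matching the QLoRA SFT recipe
```

The same call with the full DPO value (`5.0e-7`) yields the QLoRA DPO value (`5.0e-6`).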
## The Insight (Rule of Thumb)
- Action: Multiply the learning rate by ~10x when switching from full fine-tuning to QLoRA.
- Values:
  - SFT full: `learning_rate: 2.0e-05` -> SFT QLoRA: `learning_rate: 2.0e-04`
  - DPO full: `learning_rate: 5.0e-7` -> DPO QLoRA: `learning_rate: 5.0e-6`
- Trade-off: Too low a learning rate with QLoRA leads to underfitting; too high leads to instability.
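The trade-off can be encoded as a simple sanity check that flags a QLoRA config whose learning rate looks like it was copied from a full fine-tuning recipe without rescaling. This checker is purely illustrative (not part of the alignment-handbook), and the 5x tolerance band around the recipe values is an assumption.

```python
# Illustrative sanity check: compare a QLoRA config's learning rate
# against the alignment-handbook recipe values for each stage.

TYPICAL_QLORA_LR = {"sft": 2.0e-04, "dpo": 5.0e-06}

def check_qlora_lr(stage: str, lr: float) -> str:
    expected = TYPICAL_QLORA_LR[stage]
    if lr < expected / 5:   # e.g. 2e-5 for SFT: a full-FT rate, risks underfitting
        return "too low: looks like a full fine-tuning rate"
    if lr > expected * 5:   # well above the recipe value: risks instability
        return "too high: risk of training instability"
    return "ok"

print(check_qlora_lr("sft", 2.0e-05))  # flags the unscaled full-FT rate
```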
## Reasoning
LoRA adapters add a low-rank perturbation (B * A) to the frozen base weights. The adapter matrices are small (typically rank 16-128), with B initialized to zero and A initialized randomly, so the product starts at zero and needs larger gradient steps to produce meaningful updates. The alignment-handbook recipes encode this knowledge consistently:
SFT learning rates from recipe configs:

```yaml
# recipes/zephyr-7b-beta/sft/config_full.yaml:37
learning_rate: 2.0e-05

# recipes/zephyr-7b-beta/sft/config_qlora.yaml:52
learning_rate: 2.0e-04
```
DPO learning rates from recipe configs:

```yaml
# recipes/zephyr-7b-beta/dpo/config_full.yaml:37
learning_rate: 5.0e-7

# recipes/zephyr-7b-beta/dpo/config_qlora.yaml:44
learning_rate: 5.0e-6
```
The 10x factor is consistent across both SFT and DPO stages.
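A back-of-envelope parameter count shows why the adapter updates are so small relative to full fine-tuning. The shapes below are illustrative assumptions (a 4096x4096 projection, as in a 7B-class model, with rank 16 at the low end of the common 16-128 range), not values taken from the recipes.

```python
# Count trainable parameters for a rank-r LoRA adapter on a
# d_out x d_in weight matrix versus full fine-tuning.

d_out, d_in, r = 4096, 4096, 16       # illustrative shapes

full_params = d_out * d_in            # every base weight is trainable
lora_params = r * d_in + d_out * r    # A (r x d_in) + B (d_out x r)

fraction = lora_params / full_params
print(f"trainable fraction: {fraction:.4%}")  # under 1% of the matrix
```

With so few trainable parameters, and the B * A product starting at zero, larger learning rates are needed for the adapter to move the effective weights a comparable distance.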