Heuristic: Hugging Face PEFT RSLoRA Scaling
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Optimization, Fine_Tuning |
| Last Updated | 2026-02-07 06:44 GMT |
Overview
Set `use_rslora=True` in `LoraConfig` so adapter weights are scaled by `lora_alpha/sqrt(r)` instead of `lora_alpha/r`, a change shown in the RSLoRA paper to improve fine-tuning, especially at higher ranks.
Description
Rank-Stabilized LoRA (RSLoRA) modifies the adapter scaling factor from the original `lora_alpha / r` to `lora_alpha / math.sqrt(r)`. This change was proven in the paper "A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA" to provide better training dynamics, especially at higher ranks. The standard scaling `lora_alpha/r` causes the effective learning rate to decrease as rank increases, while `lora_alpha/sqrt(r)` maintains a more stable effective learning rate across different rank values.
Usage
Use this heuristic whenever configuring LoRA adapters. It is a no-cost improvement that can be enabled via a single boolean flag. It is especially beneficial when:
- Experimenting with different ranks (ensures consistent scaling behavior)
- Using higher ranks (r=32, 64, 128) where the standard scaling attenuates updates too aggressively
- Looking for incremental performance improvements at no additional compute cost
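A minimal configuration sketch using the real `use_rslora` flag; the `r`, `lora_alpha`, and `target_modules` values here are illustrative placeholders, not recommendations:

```python
from peft import LoraConfig

# Enable rank-stabilized scaling with a single flag; all other
# hyperparameters below are illustrative placeholders.
config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    use_rslora=True,  # scale adapters by lora_alpha/sqrt(r) instead of lora_alpha/r
)
```

The resulting config can be passed to `get_peft_model` as usual; no other changes are required.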
The Insight (Rule of Thumb)
- Action: Set `use_rslora=True` in `LoraConfig`.
- Value: Changes scaling from `lora_alpha/r` to `lora_alpha/math.sqrt(r)`.
- Trade-off: Essentially none: same memory, same compute, same training time. Note, however, that the effective scale changes, so a `lora_alpha` tuned for the standard scaling may need re-tuning.
- Compatibility: Works with all LoRA variants (DoRA, PiSSA, EVA, CorDA, etc.) and all quantization backends.
Reasoning
The original LoRA paper uses `lora_alpha/r` as the scaling factor. However, as rank `r` increases, this scaling shrinks rapidly. At r=64, the effective scale is 64x smaller than at r=1. RSLoRA uses `lora_alpha/sqrt(r)` which provides more stable gradients across ranks:
| Rank (r) | Standard (alpha/r) | RSLoRA (alpha/sqrt(r)) |
|---|---|---|
| 4 | 0.250 | 0.500 |
| 8 | 0.125 | 0.354 |
| 16 | 0.063 | 0.250 |
| 64 | 0.016 | 0.125 |
| 128 | 0.008 | 0.088 |
(Assuming lora_alpha=1 for illustration)
The RSLoRA scaling ensures that increasing rank does not disproportionately suppress the adapter's contribution.
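The table values above can be reproduced with a few lines of standard-library Python (assuming `lora_alpha=1`; `lora_scale` is a hypothetical helper for illustration, not a PEFT API):

```python
import math

def lora_scale(lora_alpha: float, r: int, use_rslora: bool) -> float:
    """Adapter scaling factor: alpha/sqrt(r) with RSLoRA, alpha/r otherwise."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

# Print the standard vs. RSLoRA scaling factor for each rank in the table.
for r in (4, 8, 16, 64, 128):
    std = lora_scale(1.0, r, use_rslora=False)
    rs = lora_scale(1.0, r, use_rslora=True)
    print(f"r={r:3d}  standard={std:.3f}  rslora={rs:.3f}")
```

Note how the standard factor falls by 32x from r=4 to r=128, while the RSLoRA factor falls by less than 6x.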
Code Evidence
RSLoRA configuration from `src/peft/tuners/lora/config.py:489-498`:
```python
use_rslora: bool = field(
    default=False,
    metadata={
        "help": (
            "When set to True, uses [Rank-Stabilized LoRA]"
            "(https://huggingface.co/papers/2312.03732)"
            " which sets the adapter scaling factor to `lora_alpha/math.sqrt(r)`, since it"
            " was proven to work better. Otherwise, it will use the original default"
            " value of `lora_alpha/r`."
        )
    },
)
```