
Heuristic:Huggingface Peft RSLoRA Scaling

From Leeroopedia




Knowledge Sources
Domains LLMs, Optimization, Fine_Tuning
Last Updated 2026-02-07 06:44 GMT

Overview

Set `use_rslora=True` in `LoraConfig` to scale adapter weights by `lora_alpha/sqrt(r)` instead of `lora_alpha/r`, a change the rank-stabilized LoRA paper shows improves fine-tuning performance, especially at higher ranks.

Description

Rank-Stabilized LoRA (RSLoRA) modifies the adapter scaling factor from the original `lora_alpha / r` to `lora_alpha / math.sqrt(r)`. The paper "A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA" shows that this change yields better training dynamics, especially at higher ranks: the standard `lora_alpha/r` scaling causes the adapter's effective learning rate to shrink as rank increases, while `lora_alpha/sqrt(r)` keeps it more stable across different rank values.

Usage

Use this heuristic whenever configuring LoRA adapters. It is a no-cost improvement that can be enabled via a single boolean flag. It is especially beneficial when:

  • Experimenting with different ranks (ensures consistent scaling behavior)
  • Using higher ranks (r=32, 64, 128) where the standard scaling attenuates updates too aggressively
  • Looking for incremental performance improvements at no additional compute cost
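Enabling the heuristic is a one-line change to a standard LoRA setup. A minimal sketch, assuming the Hugging Face `peft` package is installed (the rank, alpha, and `target_modules` values below are illustrative, not recommendations):

```python
from peft import LoraConfig

config = LoraConfig(
    r=64,
    lora_alpha=16,
    use_rslora=True,  # scale adapters by lora_alpha / sqrt(r) instead of lora_alpha / r
    target_modules=["q_proj", "v_proj"],  # example module names; depends on the base model
    lora_dropout=0.05,
)
```

The resulting config is passed to `get_peft_model` exactly as a standard LoRA config would be; no other changes are needed.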

The Insight (Rule of Thumb)

  • Action: Set `use_rslora=True` in `LoraConfig`.
  • Value: Changes scaling from `lora_alpha/r` to `lora_alpha/math.sqrt(r)`.
  • Trade-off: None. Same memory, same compute, same training time. Only changes the scaling factor.
  • Compatibility: Works with all LoRA variants (DoRA, PiSSA, EVA, CorDA, etc.) and all quantization backends.

Reasoning

The original LoRA paper uses `lora_alpha/r` as the scaling factor. However, as rank `r` increases, this scaling shrinks rapidly. At r=64, the effective scale is 64x smaller than at r=1. RSLoRA uses `lora_alpha/sqrt(r)` which provides more stable gradients across ranks:

| Rank (r) | Standard (alpha/r) | RSLoRA (alpha/sqrt(r)) |
|----------|--------------------|------------------------|
| 4        | 0.250              | 0.500                  |
| 8        | 0.125              | 0.354                  |
| 16       | 0.063              | 0.250                  |
| 64       | 0.016              | 0.125                  |
| 128      | 0.008              | 0.088                  |

(Assuming lora_alpha=1 for illustration)

The RSLoRA scaling ensures that increasing rank does not disproportionately suppress the adapter's contribution.
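The two scaling rules are simple enough to verify directly. A short script reproducing the table above (the helper name `lora_scaling` is ours, not peft's):

```python
import math

def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Adapter scaling factor applied to the low-rank update B @ A."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

# Reproduce the comparison table (lora_alpha=1 for illustration).
for r in (4, 8, 16, 64, 128):
    std = lora_scaling(1, r)
    rs = lora_scaling(1, r, use_rslora=True)
    print(f"r={r:<4} standard={std:.3f} rslora={rs:.3f}")
```

Note that going from r=4 to r=128 shrinks the standard factor by 32x but the RSLoRA factor by only sqrt(32) ≈ 5.7x.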

Code Evidence

RSLoRA configuration from `src/peft/tuners/lora/config.py:489-498`:

```python
use_rslora: bool = field(
    default=False,
    metadata={
        "help": (
            "When set to True, uses [Rank-Stabilized LoRA]"
            "(https://huggingface.co/papers/2312.03732)"
            " which sets the adapter scaling factor to "
            "`lora_alpha/math.sqrt(r)`, since it"
            " was proven to work better. Otherwise, it will "
            "use the original default"
            " value of `lora_alpha/r`."
        )
    },
)
```
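The field above only defines the flag. How the resulting factor enters the forward pass can be illustrated with a pure-Python sketch (a deliberate simplification of how LoRA works, not peft's actual code): the frozen base output is combined with the low-rank update `B @ A @ x`, multiplied by the scaling factor.

```python
import math
import random

def matvec(W, x):
    """Plain-Python matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

d, r, lora_alpha = 6, 2, 16
use_rslora = True
scaling = lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

random.seed(0)
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # frozen base weight
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(r)]  # trainable, d -> r
B = [[0.0] * r for _ in range(d)]                               # trainable, r -> d (zero-init)

x = [random.gauss(0, 1) for _ in range(d)]
update = matvec(B, matvec(A, x))                   # low-rank adapter update
out = [b + scaling * u for b, u in zip(matvec(W, x), update)]
```

Because `B` is zero-initialized (as in standard LoRA), the adapter contributes nothing at step zero; the scaling factor then governs how strongly the trained update perturbs the base output.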
