
Heuristic:Huggingface Peft RSLoRA Scaling

From Leeroopedia




Knowledge Sources
Domains LLMs, Optimization, Fine_Tuning
Last Updated 2026-02-07 06:44 GMT

Overview

Set `use_rslora=True` in `LoraConfig` to scale adapter weights by `lora_alpha/sqrt(r)` instead of `lora_alpha/r`, a change the rank-stabilized LoRA paper shows improves fine-tuning performance, especially at higher ranks.

Description

Rank-Stabilized LoRA (RSLoRA) modifies the adapter scaling factor from the original `lora_alpha / r` to `lora_alpha / math.sqrt(r)`. The paper "A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA" shows that this change yields better training dynamics, especially at higher ranks: the standard `lora_alpha/r` scaling causes the adapter's effective learning rate to shrink as rank increases, while `lora_alpha/sqrt(r)` keeps it more stable across different rank values.

Usage

Use this heuristic whenever configuring LoRA adapters. It is a no-cost improvement that can be enabled via a single boolean flag. It is especially beneficial when:

  • Experimenting with different ranks (ensures consistent scaling behavior)
  • Using higher ranks (r=32, 64, 128) where the standard scaling attenuates updates too aggressively
  • Looking for incremental performance improvements at no additional compute cost
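Enabling the heuristic is a one-line change to a standard LoRA setup. A minimal sketch, assuming the Hugging Face `peft` package is installed (the rank, alpha, and `target_modules` values below are illustrative, not recommendations):

```python
from peft import LoraConfig

config = LoraConfig(
    r=64,
    lora_alpha=16,
    use_rslora=True,  # scale adapters by lora_alpha / sqrt(r) instead of lora_alpha / r
    target_modules=["q_proj", "v_proj"],  # example module names; depends on the base model
    lora_dropout=0.05,
)
```

The resulting config is passed to `get_peft_model` exactly as a standard LoRA config would be; no other changes are needed.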

The Insight (Rule of Thumb)

  • Action: Set `use_rslora=True` in `LoraConfig`.
  • Value: Changes scaling from `lora_alpha/r` to `lora_alpha/math.sqrt(r)`.
  • Trade-off: None. Same memory, same compute, same training time. Only changes the scaling factor.
  • Compatibility: Works with all LoRA variants (DoRA, PiSSA, EVA, CorDA, etc.) and all quantization backends.

Reasoning

The original LoRA paper uses `lora_alpha/r` as the scaling factor. However, as rank `r` increases, this scaling shrinks rapidly. At r=64, the effective scale is 64x smaller than at r=1. RSLoRA uses `lora_alpha/sqrt(r)` which provides more stable gradients across ranks:

| Rank (r) | Standard (alpha/r) | RSLoRA (alpha/sqrt(r)) |
|----------|--------------------|------------------------|
| 4        | 0.250              | 0.500                  |
| 8        | 0.125              | 0.354                  |
| 16       | 0.063              | 0.250                  |
| 64       | 0.016              | 0.125                  |
| 128      | 0.008              | 0.088                  |

(Assuming lora_alpha=1 for illustration)

The RSLoRA scaling ensures that increasing rank does not disproportionately suppress the adapter's contribution.
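The two scaling rules are simple enough to verify directly. A short script reproducing the table above (the helper name `lora_scaling` is ours, not peft's):

```python
import math

def lora_scaling(lora_alpha: float, r: int, use_rslora: bool = False) -> float:
    """Adapter scaling factor applied to the low-rank update B @ A."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

# Reproduce the comparison table (lora_alpha=1 for illustration).
for r in (4, 8, 16, 64, 128):
    std = lora_scaling(1, r)
    rs = lora_scaling(1, r, use_rslora=True)
    print(f"r={r:<4} standard={std:.3f} rslora={rs:.3f}")
```

Note that going from r=4 to r=128 shrinks the standard factor by 32x but the RSLoRA factor by only sqrt(32) ≈ 5.7x.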

Code Evidence

RSLoRA configuration from `src/peft/tuners/lora/config.py:489-498`:

```python
use_rslora: bool = field(
    default=False,
    metadata={
        "help": (
            "When set to True, uses [Rank-Stabilized LoRA]"
            "(https://huggingface.co/papers/2312.03732)"
            " which sets the adapter scaling factor to "
            "`lora_alpha/math.sqrt(r)`, since it"
            " was proven to work better. Otherwise, it will "
            "use the original default"
            " value of `lora_alpha/r`."
        )
    },
)
```
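The field above only defines the flag. How the resulting factor enters the forward pass can be illustrated with a pure-Python sketch (a deliberate simplification of how LoRA works, not peft's actual code): the frozen base output is combined with the low-rank update `B @ A @ x`, multiplied by the scaling factor.

```python
import math
import random

def matvec(W, x):
    """Plain-Python matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

d, r, lora_alpha = 6, 2, 16
use_rslora = True
scaling = lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

random.seed(0)
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]  # frozen base weight
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(r)]  # trainable, d -> r
B = [[0.0] * r for _ in range(d)]                               # trainable, r -> d (zero-init)

x = [random.gauss(0, 1) for _ in range(d)]
update = matvec(B, matvec(A, x))                   # low-rank adapter update
out = [b + scaling * u for b, u in zip(matvec(W, x), update)]
```

Because `B` is zero-initialized (as in standard LoRA), the adapter contributes nothing at step zero; the scaling factor then governs how strongly the trained update perturbs the base output.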
