Heuristic: Unsloth (unslothai) LoRA Rank Selection
| Knowledge Sources | |
|---|---|
| Domains | Optimization, LLMs, Fine_Tuning |
| Last Updated | 2026-02-07 09:00 GMT |
Overview
Unsloth defaults to LoRA rank `r=16` with `lora_alpha=16` for SFT, while its RL/vLLM inference path uses `max_lora_rank=64` to accommodate the higher-capacity adapters needed for reinforcement learning.
Description
The LoRA rank (`r`) controls the dimensionality of the low-rank adapter matrices. Higher rank means more trainable parameters and greater capacity to learn new behaviors, but also more VRAM usage and slower training. Unsloth's defaults reflect empirical findings: `r=16` is sufficient for most supervised fine-tuning tasks where the model is learning to follow a specific format or domain. RL workflows (GRPO) benefit from higher rank (`r=64`) because the reward-driven optimization explores a wider parameter space. The `max_lora_rank` parameter in vLLM inference must match or exceed the training rank.
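The linear growth in trainable parameters can be illustrated with a quick back-of-the-envelope calculation. The dimensions below are illustrative Llama-style sizes, not values taken from the source:

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA adds two matrices per adapted layer: A (r x d_in) and B (d_out x r).
    return r * d_in + d_out * r

# Illustrative 4096 -> 4096 projection (e.g. a q_proj in a 7B-class model).
d = 4096
for r in (16, 64):
    print(f"r={r}: {lora_params(d, d, r):,} trainable params per layer")
```

Going from `r=16` to `r=64` quadruples the adapter parameter count (and, proportionally, adapter VRAM and optimizer state).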
Usage
Use `r=16` for standard SFT tasks (instruction tuning, chat fine-tuning, domain adaptation). Consider `r=32` or `r=64` for RL training (GRPO, DPO) where the model needs more capacity to learn complex reward-driven behaviors. Set `max_lora_rank` to match your training rank when using `fast_inference=True`.
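A configuration sketch of how these values fit together in Unsloth's API. This is based on the defaults cited below; the exact call shape and the model name are assumptions of this note, not verified against a specific Unsloth version:

```python
from unsloth import FastLanguageModel

# Load with vLLM fast inference; max_lora_rank must be >= the training rank.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Llama-3.2-3B-Instruct",  # illustrative model name
    max_seq_length = 2048,
    load_in_4bit   = True,
    fast_inference = True,   # enables the vLLM inference backend
    max_lora_rank  = 64,     # >= r below; 64 is the RL default
)

# Attach LoRA adapters: r=16 / alpha=16 for SFT, r=64 for GRPO-style RL.
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha   = 64,
    lora_dropout = 0.0,
    bias         = "none",
)
```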
The Insight (Rule of Thumb)
- Action: Set `r=16, lora_alpha=16` for SFT; increase to `r=64` for RL training.
- Value: SFT default: `r=16`; RL/inference default: `max_lora_rank=64`.
- Trade-off: Doubling rank roughly doubles LoRA VRAM usage and training time. Rank 16 is the sweet spot for SFT where additional capacity yields diminishing returns.
- Compatibility: Target modules default to `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]` (all attention + MLP projections).
Reasoning
The default `r=16` with `lora_alpha=16` (effective scaling = alpha/r = 1.0) has been empirically validated across many Unsloth users. For SFT, the model is typically learning formatting patterns and domain knowledge that don't require high-rank adaptations. RL tasks like GRPO involve more complex optimization landscapes where the reward function drives exploration across a broader parameter space, benefiting from the additional capacity of `r=64`.
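The effective scaling mentioned above can be checked numerically: the LoRA update applied to a weight is `delta_W = (alpha / r) * B @ A`, so with `alpha == r` the update is unscaled. A minimal pure-Python sketch with tiny illustrative matrices (not Unsloth code):

```python
def matmul(X, Y):
    # Plain nested-loop matrix multiply for small lists-of-lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_delta(B, A, r, alpha):
    # LoRA weight update: delta_W = (alpha / r) * B @ A.
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]

A = [[1.0, 2.0]]    # shape (r=1, d_in=2)
B = [[3.0], [4.0]]  # shape (d_out=2, r=1)

# With alpha == r the scaling factor is exactly 1.0 (Unsloth's SFT default).
print(lora_delta(B, A, r=1, alpha=1))  # [[3.0, 6.0], [4.0, 8.0]]
```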
SFT defaults from `models/llama.py:2635-2648`:
```python
@staticmethod
def get_peft_model(
    model,
    r = 16,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = 16,
    lora_dropout = 0.0,
    bias = "none",
    ...
)
```
RL inference rank from `models/loader.py:146`:
```python
max_lora_rank = 64,  # Default for vLLM-based RL inference
```
SFT inference rank from `models/llama.py:2150`:
```python
max_lora_rank = 16,  # Default for standard inference
```