Heuristic: Unsloth (unslothai) LoRA Rank Selection
| Knowledge Sources | |
|---|---|
| Domains | Optimization, LLMs, Fine_Tuning |
| Last Updated | 2026-02-07 09:00 GMT |
Overview
Unsloth defaults to LoRA rank `r=16` with `lora_alpha=16` for SFT, while its RL/vLLM inference path uses `max_lora_rank=64` to accommodate the higher-capacity adapters needed for reinforcement learning.
Description
The LoRA rank (`r`) controls the dimensionality of the low-rank adapter matrices. Higher rank means more trainable parameters and greater capacity to learn new behaviors, but also more VRAM usage and slower training. Unsloth's defaults reflect empirical findings: `r=16` is sufficient for most supervised fine-tuning tasks where the model is learning to follow a specific format or domain. RL workflows (GRPO) benefit from higher rank (`r=64`) because the reward-driven optimization explores a wider parameter space. The `max_lora_rank` parameter in vLLM inference must match or exceed the training rank.
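The linear growth in trainable parameters can be illustrated with a quick back-of-the-envelope calculation. The dimensions below are illustrative Llama-style sizes, not values taken from the source:

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    # LoRA adds two matrices per adapted layer: A (r x d_in) and B (d_out x r).
    return r * d_in + d_out * r

# Illustrative 4096 -> 4096 projection (e.g. a q_proj in a 7B-class model).
d = 4096
for r in (16, 64):
    print(f"r={r}: {lora_params(d, d, r):,} trainable params per layer")
```

Going from `r=16` to `r=64` quadruples the adapter parameter count (and, proportionally, adapter VRAM and optimizer state).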
Usage
Use `r=16` for standard SFT tasks (instruction tuning, chat fine-tuning, domain adaptation). Consider `r=32` or `r=64` for RL training (GRPO, DPO) where the model needs more capacity to learn complex reward-driven behaviors. Set `max_lora_rank` to match your training rank when using `fast_inference=True`.
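A configuration sketch of how these values fit together in Unsloth's API. This is based on the defaults cited below; the exact call shape and the model name are assumptions of this note, not verified against a specific Unsloth version:

```python
from unsloth import FastLanguageModel

# Load with vLLM fast inference; max_lora_rank must be >= the training rank.
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Llama-3.2-3B-Instruct",  # illustrative model name
    max_seq_length = 2048,
    load_in_4bit   = True,
    fast_inference = True,   # enables the vLLM inference backend
    max_lora_rank  = 64,     # >= r below; 64 is the RL default
)

# Attach LoRA adapters: r=16 / alpha=16 for SFT, r=64 for GRPO-style RL.
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha   = 64,
    lora_dropout = 0.0,
    bias         = "none",
)
```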
The Insight (Rule of Thumb)
- Action: Set `r=16, lora_alpha=16` for SFT; increase to `r=64` for RL training.
- Value: SFT default: `r=16`; RL/inference default: `max_lora_rank=64`.
- Trade-off: Doubling rank roughly doubles LoRA VRAM usage and training time. Rank 16 is the sweet spot for SFT where additional capacity yields diminishing returns.
- Compatibility: Target modules default to `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]` (all attention + MLP projections).
Reasoning
The default `r=16` with `lora_alpha=16` (effective scaling = alpha/r = 1.0) has been empirically validated across many Unsloth users. For SFT, the model is typically learning formatting patterns and domain knowledge that don't require high-rank adaptations. RL tasks like GRPO involve more complex optimization landscapes where the reward function drives exploration across a broader parameter space, benefiting from the additional capacity of `r=64`.
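The effective scaling mentioned above can be checked numerically: the LoRA update applied to a weight is `delta_W = (alpha / r) * B @ A`, so with `alpha == r` the update is unscaled. A minimal pure-Python sketch with tiny illustrative matrices (not Unsloth code):

```python
def matmul(X, Y):
    # Plain nested-loop matrix multiply for small lists-of-lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_delta(B, A, r, alpha):
    # LoRA weight update: delta_W = (alpha / r) * B @ A.
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]

A = [[1.0, 2.0]]    # shape (r=1, d_in=2)
B = [[3.0], [4.0]]  # shape (d_out=2, r=1)

# With alpha == r the scaling factor is exactly 1.0 (Unsloth's SFT default).
print(lora_delta(B, A, r=1, alpha=1))  # [[3.0, 6.0], [4.0, 8.0]]
```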
SFT defaults from `models/llama.py:2635-2648`:
```python
@staticmethod
def get_peft_model(
    model,
    r = 16,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = 16,
    lora_dropout = 0.0,
    bias = "none",
    ...
)
```
RL inference rank from `models/loader.py:146`:
```python
max_lora_rank = 64,  # Default for vLLM-based RL inference
```
SFT inference rank from `models/llama.py:2150`:
```python
max_lora_rank = 16,  # Default for standard inference
```