# Heuristic: Hugging Face PEFT LoRA Default Configuration
| Metadata | Value |
|---|---|
| Domains | LLMs, Fine_Tuning, Configuration |
| Last Updated | 2026-02-07 06:44 GMT |
## Overview
Practical guidelines for choosing LoRA hyperparameters (rank, alpha, dropout, target modules) based on library defaults and example configurations.
## Description
The PEFT library defines conservative defaults for LoRA: `r=8`, `lora_alpha=8`, `lora_dropout=0.0`, `bias="none"`. However, the official SFT training example uses more aggressive settings: `r=64`, `lora_alpha=16`, `lora_dropout=0.1`, targeting all attention and MLP projections. Understanding when to use defaults vs. example-recommended values is important for achieving good fine-tuning results.
## Usage
Use this heuristic when setting up any LoRA fine-tuning run. The choice of parameters depends on:
- Task complexity: Classification tasks may need lower rank; language generation benefits from higher rank
- Model size: Larger models can benefit from higher ranks
- Available VRAM: Higher rank = more trainable parameters = more memory
- Regularization needs: Dropout helps prevent overfitting on small datasets
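The VRAM consideration above can be made concrete with a back-of-the-envelope count. For a linear layer of shape `(d_out, d_in)`, LoRA adds an `A` matrix of shape `(r, d_in)` and a `B` matrix of shape `(d_out, r)`, so trainable parameters grow linearly with `r`. This is a sketch; the 4096 dimension is illustrative, not taken from any particular model.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable LoRA parameters for one linear layer:
    A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)

# Example: a 4096x4096 projection (size chosen for illustration).
print(lora_params(4096, 4096, 8))   # 65,536 at r=8
print(lora_params(4096, 4096, 64))  # 524,288 at r=64 (8x more)
```

Multiply by the number of targeted modules per layer and the number of layers to estimate the total adapter size.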
## The Insight (Rule of Thumb)
### Conservative Start (Library Defaults)
- r=8: Minimal rank, very low parameter overhead. Good for testing and simple tasks.
- lora_alpha=8: Equal to r, giving a scaling factor of 1.0 (or ~2.83 with RSLoRA).
- lora_dropout=0.0: No regularization.
- bias="none": Do not train biases.
- target_modules=None: Auto-detected per model architecture.
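Spelled out as a config object, the defaults above look like the following sketch (the values mirror `src/peft/tuners/lora/config.py`; constructing `LoraConfig()` with no arguments yields the same settings):

```python
from peft import LoraConfig

# Library defaults written out explicitly for clarity.
default_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.0,
    bias="none",
    target_modules=None,  # auto-detected from the model architecture
)
```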
### Recommended for LLM Fine-Tuning (From Examples)
- r=64: Higher rank for richer adaptation capacity.
- lora_alpha=16: alpha/r = 0.25 (moderate scaling).
- lora_dropout=0.1: Mild regularization.
- target_modules=`"q_proj,k_proj,v_proj,o_proj,down_proj,up_proj,gate_proj"`: All attention and MLP projections for comprehensive adaptation.
- use_rslora=True: Better rank-scaling behavior.
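As a sketch, the example-recommended settings translate to a `LoraConfig` like this (the comma-separated module string from the SFT script is split into the list form `LoraConfig` accepts):

```python
from peft import LoraConfig

modules = "q_proj,k_proj,v_proj,o_proj,down_proj,up_proj,gate_proj"

# Mirrors the examples/sft/train.py settings.
sft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=modules.split(","),
    use_rslora=True,  # scale by alpha/sqrt(r) instead of alpha/r
)
```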
### Key Guidelines
- target_modules="all-linear": Shorthand to target all linear layers (excluding output head in PreTrainedModel). Use when you want maximum adapter coverage.
- modules_to_save: Set this for classifier/score heads in classification tasks, as they are randomly initialized and need training.
- bias="lora_only": Only use when LoRA weights were extracted from fully fine-tuned parameters.
- use_dora=True: Improves performance especially at low ranks (r=4-8), but adds overhead. Recommend merging weights for inference.
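A minimal sketch combining the `"all-linear"` and `modules_to_save` guidelines for a classification fine-tune. The rank and the head name are assumptions, not from the source: Llama-style `*ForSequenceClassification` models name the head `score`, while BERT-style models use `classifier`, so check your model before copying.

```python
from peft import LoraConfig

cls_config = LoraConfig(
    r=16,                         # illustrative rank
    target_modules="all-linear",  # every linear layer except the output head
    modules_to_save=["score"],    # fully train the randomly initialized head
    task_type="SEQ_CLS",
)
```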
## Reasoning
The library defaults (`r=8`) are intentionally conservative to work across all use cases with minimal memory overhead. The SFT training example represents a battle-tested production configuration for LLM fine-tuning.
Key parameter interactions:
| Parameter | Default | SFT Example | When to Change |
|---|---|---|---|
| r | 8 | 64 | Increase for complex tasks; decrease if VRAM-limited |
| lora_alpha | 8 | 16 | Usually set to r (default scaling) or r/4 with RSLoRA |
| lora_dropout | 0.0 | 0.1 | Add dropout when dataset is small to prevent overfitting |
| target_modules | auto | all 7 projections | Target more modules for richer adaptation |
| bias | "none" | "none" | Only change to "all" or "lora_only" for specific tasks |
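The alpha/r interaction in the table can be checked directly. The sketch below computes the effective scaling factor applied to the LoRA update under standard LoRA (`alpha / r`) and under RSLoRA (`alpha / sqrt(r)`), reproducing the factors quoted earlier in this document:

```python
import math

def lora_scaling(alpha: int, r: int, use_rslora: bool = False) -> float:
    # Standard LoRA scales the update by alpha / r; RSLoRA by alpha / sqrt(r).
    return alpha / math.sqrt(r) if use_rslora else alpha / r

print(lora_scaling(8, 8))                    # library defaults -> 1.0
print(lora_scaling(16, 64))                  # SFT example -> 0.25
print(round(lora_scaling(16, 64, True), 2))  # SFT example with RSLoRA -> 2.0
print(round(lora_scaling(8, 8, True), 2))    # defaults with RSLoRA -> ~2.83
```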
## Code Evidence
Library defaults from `src/peft/tuners/lora/config.py:460-481`:
```python
r: int = field(default=8, metadata={"help": "Lora attention dimension"})
target_modules: Optional[Union[list[str], str]] = field(default=None, ...)
lora_alpha: int = field(default=8, metadata={"help": "Lora alpha"})
lora_dropout: float = field(default=0.0, metadata={"help": "Lora dropout"})
bias: Literal["none", "all", "lora_only"] = field(default="none", ...)
```
SFT example configuration from `examples/sft/train.py:27-73`:
```python
lora_alpha: Optional[int] = field(default=16)
lora_dropout: Optional[float] = field(default=0.1)
lora_r: Optional[int] = field(default=64)
lora_target_modules: Optional[str] = field(
    default="q_proj,k_proj,v_proj,o_proj,down_proj,up_proj,gate_proj",
)
use_flash_attn: Optional[bool] = field(default=False)
use_4bit_quantization: Optional[bool] = field(default=False)
use_reentrant: Optional[bool] = field(default=False)
```
Auto-detection of target modules from `src/peft/tuners/lora/config.py:330-338`:
```python
# If this is specified as 'all-linear', then all linear/Conv1D
# modules are chosen (if the model is a PreTrainedModel, the
# output layer excluded). If this is not specified, modules will
# be chosen according to the model architecture.
```
BD-LoRA serving guidelines from `src/peft/tuners/lora/config.py:130-139`:
```python
# For attention, set:
#   Q,K,V projections to be LoRA-B block-diagonal
#   Out projection to be LoRA-A block-diagonal
# For MLPs, set:
#   Up, Gate projection to be LoRA-B block-diagonal
#   Down projection to be LoRA-A block-diagonal
# Modules that are row-sharded should have LoRA-A block-diagonal,
# modules that are column-sharded should have LoRA-B block-diagonal.
```