# Heuristic: Hugging Face PEFT LoRA Default Configuration
| Metadata | Value |
|---|---|
| Domains | LLMs, Fine_Tuning, Configuration |
| Last Updated | 2026-02-07 06:44 GMT |
## Overview
Practical guidelines for choosing LoRA hyperparameters (rank, alpha, dropout, target modules) based on library defaults and example configurations.
## Description
The PEFT library defines conservative defaults for LoRA: `r=8`, `lora_alpha=8`, `lora_dropout=0.0`, `bias="none"`. However, the official SFT training example uses more aggressive settings: `r=64`, `lora_alpha=16`, `lora_dropout=0.1`, targeting all attention and MLP projections. Understanding when to use defaults vs. example-recommended values is important for achieving good fine-tuning results.
## Usage
Use this heuristic when setting up any LoRA fine-tuning run. The choice of parameters depends on:
- Task complexity: Classification tasks may need lower rank; language generation benefits from higher rank
- Model size: Larger models can benefit from higher ranks
- Available VRAM: Higher rank = more trainable parameters = more memory
- Regularization needs: Dropout helps prevent overfitting on small datasets
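The VRAM consideration above can be made concrete with a back-of-the-envelope count. For a linear layer of shape `(d_out, d_in)`, LoRA adds an `A` matrix of shape `(r, d_in)` and a `B` matrix of shape `(d_out, r)`, so trainable parameters grow linearly with `r`. This is a sketch; the 4096 dimension is illustrative, not taken from any particular model.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable LoRA parameters for one linear layer:
    A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)

# Example: a 4096x4096 projection (size chosen for illustration).
print(lora_params(4096, 4096, 8))   # 65,536 at r=8
print(lora_params(4096, 4096, 64))  # 524,288 at r=64 (8x more)
```

Multiply by the number of targeted modules per layer and the number of layers to estimate the total adapter size.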
## The Insight (Rule of Thumb)
### Conservative Start (Library Defaults)
- r=8: Minimal rank, very low parameter overhead. Good for testing and simple tasks.
- lora_alpha=8: Equal to r, giving a scaling factor of 1.0 (or ~2.83 with RSLoRA).
- lora_dropout=0.0: No regularization.
- bias="none": Do not train biases.
- target_modules=None: Auto-detected per model architecture.
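Spelled out as a config object, the defaults above look like the following sketch (the values mirror `src/peft/tuners/lora/config.py`; constructing `LoraConfig()` with no arguments yields the same settings):

```python
from peft import LoraConfig

# Library defaults written out explicitly for clarity.
default_config = LoraConfig(
    r=8,
    lora_alpha=8,
    lora_dropout=0.0,
    bias="none",
    target_modules=None,  # auto-detected from the model architecture
)
```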
### Recommended for LLM Fine-Tuning (From Examples)
- r=64: Higher rank for richer adaptation capacity.
- lora_alpha=16: alpha/r = 0.25 (moderate scaling).
- lora_dropout=0.1: Mild regularization.
- target_modules=`"q_proj,k_proj,v_proj,o_proj,down_proj,up_proj,gate_proj"`: All attention and MLP projections for comprehensive adaptation.
- use_rslora=True: Better rank-scaling behavior.
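As a sketch, the example-recommended settings translate to a `LoraConfig` like this (the comma-separated module string from the SFT script is split into the list form `LoraConfig` accepts):

```python
from peft import LoraConfig

modules = "q_proj,k_proj,v_proj,o_proj,down_proj,up_proj,gate_proj"

# Mirrors the examples/sft/train.py settings.
sft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=modules.split(","),
    use_rslora=True,  # scale by alpha/sqrt(r) instead of alpha/r
)
```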
### Key Guidelines
- target_modules="all-linear": Shorthand to target all linear layers (excluding output head in PreTrainedModel). Use when you want maximum adapter coverage.
- modules_to_save: Set this for classifier/score heads in classification tasks, as they are randomly initialized and need training.
- bias="lora_only": Only use when LoRA weights were extracted from fully fine-tuned parameters.
- use_dora=True: Improves performance especially at low ranks (r=4-8), but adds overhead. Recommend merging weights for inference.
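A minimal sketch combining the `"all-linear"` and `modules_to_save` guidelines for a classification fine-tune. The rank and the head name are assumptions, not from the source: Llama-style `*ForSequenceClassification` models name the head `score`, while BERT-style models use `classifier`, so check your model before copying.

```python
from peft import LoraConfig

cls_config = LoraConfig(
    r=16,                         # illustrative rank
    target_modules="all-linear",  # every linear layer except the output head
    modules_to_save=["score"],    # fully train the randomly initialized head
    task_type="SEQ_CLS",
)
```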
## Reasoning
The library defaults (`r=8`) are intentionally conservative to work across all use cases with minimal memory overhead. The SFT training example represents a battle-tested production configuration for LLM fine-tuning.
Key parameter interactions:
| Parameter | Default | SFT Example | When to Change |
|---|---|---|---|
| r | 8 | 64 | Increase for complex tasks; decrease if VRAM-limited |
| lora_alpha | 8 | 16 | Usually set to r (default scaling) or r/4 with RSLoRA |
| lora_dropout | 0.0 | 0.1 | Add dropout when dataset is small to prevent overfitting |
| target_modules | auto | all 7 projections | Target more modules for richer adaptation |
| bias | "none" | "none" | Only change to "all" or "lora_only" for specific tasks |
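The alpha/r interaction in the table can be checked directly. The sketch below computes the effective scaling factor applied to the LoRA update under standard LoRA (`alpha / r`) and under RSLoRA (`alpha / sqrt(r)`), reproducing the factors quoted earlier in this document:

```python
import math

def lora_scaling(alpha: int, r: int, use_rslora: bool = False) -> float:
    # Standard LoRA scales the update by alpha / r; RSLoRA by alpha / sqrt(r).
    return alpha / math.sqrt(r) if use_rslora else alpha / r

print(lora_scaling(8, 8))                    # library defaults -> 1.0
print(lora_scaling(16, 64))                  # SFT example -> 0.25
print(round(lora_scaling(16, 64, True), 2))  # SFT example with RSLoRA -> 2.0
print(round(lora_scaling(8, 8, True), 2))    # defaults with RSLoRA -> ~2.83
```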
## Code Evidence
Library defaults from `src/peft/tuners/lora/config.py:460-481`:
```python
r: int = field(default=8, metadata={"help": "Lora attention dimension"})
target_modules: Optional[Union[list[str], str]] = field(default=None, ...)
lora_alpha: int = field(default=8, metadata={"help": "Lora alpha"})
lora_dropout: float = field(default=0.0, metadata={"help": "Lora dropout"})
bias: Literal["none", "all", "lora_only"] = field(default="none", ...)
```
SFT example configuration from `examples/sft/train.py:27-73`:
```python
lora_alpha: Optional[int] = field(default=16)
lora_dropout: Optional[float] = field(default=0.1)
lora_r: Optional[int] = field(default=64)
lora_target_modules: Optional[str] = field(
    default="q_proj,k_proj,v_proj,o_proj,down_proj,up_proj,gate_proj",
)
use_flash_attn: Optional[bool] = field(default=False)
use_4bit_quantization: Optional[bool] = field(default=False)
use_reentrant: Optional[bool] = field(default=False)
```
Auto-detection of target modules from `src/peft/tuners/lora/config.py:330-338`:
```python
# If this is specified as 'all-linear', then all linear/Conv1D
# modules are chosen (if the model is a PreTrainedModel, the
# output layer excluded). If this is not specified, modules will
# be chosen according to the model architecture.
```
BD-LoRA serving guidelines from `src/peft/tuners/lora/config.py:130-139`:
```python
# For attention, set:
#   Q,K,V projections to be LoRA-B block-diagonal
#   Out projection to be LoRA-A block-diagonal
# For MLPs, set:
#   Up, Gate projection to be LoRA-B block-diagonal
#   Down projection to be LoRA-A block-diagonal
# Modules that are row-sharded should have LoRA-A block-diagonal,
# modules that are column-sharded should have LoRA-B block-diagonal.
```