Principle:Huggingface Alignment handbook LoRA Adapter Configuration
| Knowledge Sources | |
|---|---|
| Domains | NLP, Deep_Learning, Optimization |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A parameter-efficient fine-tuning technique that injects trainable low-rank decomposition matrices into transformer layers, enabling adaptation with minimal additional parameters.
Description
Low-Rank Adaptation (LoRA) freezes the pretrained model weights and injects pairs of trainable rank decomposition matrices into each targeted transformer layer. Instead of updating the full weight matrix $W \in \mathbb{R}^{d \times k}$, LoRA trains two smaller matrices $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, where r is much smaller than both d and k.
This reduces the number of trainable parameters for each adapted matrix from d × k to r × (d + k), typically a small fraction of the original count, while often achieving performance comparable to full fine-tuning. The LoRA adapter weights are saved separately from the base model, enabling efficient storage and switching between multiple fine-tuned versions.
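The size of the reduction can be checked with a little arithmetic. The sketch below counts trainable parameters for one adapted matrix; the helper name `lora_param_counts` and the 4096 × 4096 layer size are illustrative assumptions, not values taken from the alignment-handbook.

```python
def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one d x k weight matrix."""
    full = d * k          # every entry of W is trainable in full fine-tuning
    lora = r * (d + k)    # B is d x r and A is r x k
    return full, lora


# Hypothetical layer size chosen only for illustration.
full, lora = lora_param_counts(d=4096, k=4096, r=16)
fraction = lora / full

print(f"full fine-tuning: {full:,} parameters")   # 16,777,216
print(f"LoRA (r=16):      {lora:,} parameters")   # 131,072
print(f"fraction trained: {fraction:.4%}")        # 0.7812%
```

For this hypothetical square layer the adapter trains 1/128 of the parameters; the fraction grows linearly with the rank r.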
In the alignment-handbook, LoRA configuration is specified in YAML recipe configs and the adapters are injected automatically by the TRL trainers when a PEFT config is provided.
Usage
Use LoRA adapter configuration when:
- Parameter-efficient fine-tuning is needed (for example, in a QLoRA workflow)
- Multiple fine-tuned model variants need to share the same base model
- GPU memory is limited and full fine-tuning is not feasible
- Quick experimentation with different LoRA hyperparameters (rank, target modules, alpha) is desired
Theoretical Basis
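LoRA decomposes weight updates into low-rank matrices:
$$W' = W + \Delta W = W + \frac{\alpha}{r} B A$$
Where:
- $W \in \mathbb{R}^{d \times k}$ is the frozen pretrained weight matrix
- $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ are trainable
- $r$ is the rank (e.g., 16, 32, 64, 128)
- $\alpha$ is the scaling factor (typically set equal to $r$, which makes the effective scale $\alpha / r$ equal to 1)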
# Abstract LoRA forward pass (NOT real implementation)
# During training, for each targeted linear layer:
output = W @ x + (alpha / r) * (B @ (A @ x))
# Only B and A receive gradients; W is frozen
# Target modules in alignment-handbook (all linear projections):
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
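The abstract forward pass above can be turned into a small runnable check. The following is a minimal pure-Python sketch, not the PEFT or TRL implementation: the helper names (`matvec`, `matmul`, `lora_forward`, `merge`) and the tiny dimensions are assumptions made for illustration. It shows that applying the adapter on the fly gives the same output as folding the scaled product of B and A into W, and that an all-zero B leaves the base model's output unchanged.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(P, Q):
    """Multiply matrix P by matrix Q (both lists of rows)."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def lora_forward(W, A, B, x, alpha, r):
    """output = W @ x + (alpha / r) * (B @ (A @ x)); W stays frozen."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def merge(W, A, B, alpha, r):
    """Fold the adapter into the base weight: W + (alpha / r) * B @ A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(W, BA)]


# Tiny hypothetical dimensions: W is d x k, B is d x r, A is r x k.
rng = random.Random(0)
d, k, r, alpha = 4, 3, 2, 2
W = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(d)]
A = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(r)]
B = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(d)]
x = [rng.uniform(-1, 1) for _ in range(k)]

y_adapter = lora_forward(W, A, B, x, alpha, r)      # adapter applied on the fly
y_merged = matvec(merge(W, A, B, alpha, r), x)      # adapter folded into W

# With B set to zero, the adapter contributes nothing.
B_zero = [[0.0] * r for _ in range(d)]
y_zero = lora_forward(W, A, B_zero, x, alpha, r)
```

Because the two paths agree up to floating-point error, an adapter can be kept separate for cheap switching between variants or merged into the base weights for inference.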
Key hyperparameter choices in alignment-handbook:
- SFT LoRA rank: 16 (sufficient for instruction following)
- DPO LoRA rank: 128 (preference optimization needs more capacity)
- Target modules: All attention projections + MLP projections for maximum expressiveness
- Dropout: 0.05 for regularization
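The choices listed above can be collected into one small structure per training stage. This is a hedged sketch rather than the handbook's actual recipe format: `LoraSettings` and `SETTINGS` are hypothetical names, the field names merely mirror common LoRA configuration arguments, and setting alpha equal to the rank follows the "typically set equal to r" convention stated earlier rather than a value confirmed for each recipe.

```python
from dataclasses import dataclass

# All attention and MLP projections named in the target-module list.
ALL_LINEAR = (
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
)


@dataclass(frozen=True)
class LoraSettings:
    """Hypothetical container for the LoRA hyperparameters of one stage."""
    r: int                              # rank of the decomposition
    lora_alpha: int                     # scaling numerator (assumed equal to r)
    lora_dropout: float = 0.05          # dropout listed in the text
    target_modules: tuple = ALL_LINEAR  # layers that receive adapters

    @property
    def scaling(self) -> float:
        """Effective multiplier applied to B @ A."""
        return self.lora_alpha / self.r


SETTINGS = {
    "sft": LoraSettings(r=16, lora_alpha=16),
    "dpo": LoraSettings(r=128, lora_alpha=128),
}

for stage, cfg in SETTINGS.items():
    print(stage, cfg.r, cfg.scaling, len(cfg.target_modules))
```

Keeping alpha tied to the rank holds the effective scale at 1, so changing the rank between stages changes adapter capacity without also changing the magnitude of the update.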