Principle:Huggingface Alignment handbook LoRA Adapter Configuration

Knowledge Sources	Alignment Handbook; LoRA: Low-Rank Adaptation of Large Language Models; PEFT Documentation
Domains	NLP, Deep_Learning, Optimization
Last Updated	2026-02-07 00:00 GMT

Overview

LoRA is a parameter-efficient fine-tuning technique that injects trainable low-rank decomposition matrices into transformer layers, so a model can be adapted by training only a small fraction of its parameters.

Description

Low-Rank Adaptation (LoRA) freezes the pretrained model weights and injects a pair of trainable rank-decomposition matrices into each targeted linear layer of the transformer. Instead of updating the full weight matrix $W \in \mathbb{R}^{d \times k}$, LoRA trains two smaller matrices $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, where the rank $r$ is much smaller than both $d$ and $k$.

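This reduces the number of trainable parameters per adapted matrix from $d \cdot k$ to $r \cdot (d + k)$ while achieving performance comparable to full fine-tuning. For example, with $d = k = 4096$ and $r = 16$, that is 131,072 trainable parameters instead of 16,777,216 (about 0.8%). The LoRA adapter weights are saved separately from the base model, which keeps checkpoints small and makes it easy to switch between multiple fine-tuned versions of the same base.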

In the alignment-handbook, LoRA configuration is specified in YAML recipe configs and the adapters are injected automatically by the TRL trainers when a PEFT config is provided.
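As a rough illustration of how recipe fields could map onto a PEFT configuration, the sketch below turns a dict (standing in for a parsed YAML recipe) into keyword arguments for `peft.LoraConfig`. The recipe key names (`use_peft`, `lora_r`, `lora_alpha`, `lora_dropout`, `lora_target_modules`) and the helper `lora_kwargs_from_recipe` are assumptions made for illustration and are not confirmed by this page. The values follow the SFT settings listed on this page, with alpha set equal to the rank.

```python
from typing import Any, Optional

# Stand-in for a parsed YAML recipe; the key names are assumed for illustration.
recipe = {
    "use_peft": True,
    "lora_r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "lora_target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}


def lora_kwargs_from_recipe(cfg: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Hypothetical helper: build keyword arguments for peft.LoraConfig."""
    if not cfg.get("use_peft", False):
        return None  # no adapter requested, so all weights would be trained
    return {
        "r": cfg["lora_r"],
        "lora_alpha": cfg["lora_alpha"],
        "lora_dropout": cfg["lora_dropout"],
        "target_modules": list(cfg["lora_target_modules"]),
        "bias": "none",
        "task_type": "CAUSAL_LM",
    }


kwargs = lora_kwargs_from_recipe(recipe)
# With peft installed: peft_config = peft.LoraConfig(**kwargs)
```

The resulting config object would then be handed to a TRL trainer through its `peft_config` argument, which is the point at which the adapters are injected.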

Usage

Use LoRA adapter configuration when:

Parameter-efficient fine-tuning is needed (including the QLoRA workflow, which trains LoRA adapters on top of a quantized base model)
Multiple fine-tuned model variants need to share the same base model
GPU memory is limited and full fine-tuning is not feasible
Quick experimentation with different LoRA hyperparameters (rank, target modules, alpha) is desired

Theoretical Basis

LoRA decomposes weight updates into low-rank matrices:

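$W' = W + \frac{\alpha}{r} \cdot BA$

Where:

$W \in \mathbb{R}^{d \times k}$ is the frozen pretrained weight matrix
$B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ are trainable, so the product $BA$ has the same shape as $W$
$r$ is the rank (e.g., 16, 32, 64, 128)
$\alpha$ is the scaling factor (typically set equal to $r$, which makes the effective scale $\alpha / r = 1$)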

# Abstract LoRA forward pass (pseudocode, not the real implementation)
# During training, for each targeted linear layer with input x:
output = W @ x + (alpha / r) * (B @ (A @ x))
# Only B and A receive gradients; W is frozen

# Target modules in alignment-handbook (all linear projections):
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
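The forward pass above can be made concrete with a small, dependency-free sketch. It uses tiny made-up dimensions and plain Python lists rather than a real tensor library, and the helper names (`matvec`, `matmul`, `lora_forward`) are hypothetical. It checks two properties that follow from the formula: with the usual LoRA initialization ($B = 0$) the adapted layer reproduces the frozen layer exactly, and after training the adapter can be folded into a single merged matrix $W + \frac{\alpha}{r} BA$.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen path W @ x plus the scaled low-rank path (alpha / r) * B @ (A @ x)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


random.seed(0)
d, k, r, alpha = 4, 3, 2, 2  # toy sizes, chosen only for illustration
W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]  # frozen, d x k
A = [[random.gauss(0, 1) for _ in range(k)] for _ in range(r)]  # trainable, r x k
B = [[0.0] * r for _ in range(d)]                               # trainable, d x r, zero init
x = [1.0, -2.0, 0.5]

# With B = 0 the adapter contributes nothing, so the output equals W @ x.
y_init = lora_forward(W, A, B, x, alpha, r)

# Pretend training moved B away from zero, then fold the adapter into W.
B = [[random.gauss(0, 1) for _ in range(r)] for _ in range(d)]
BA = matmul(B, A)
W_merged = [[w + (alpha / r) * u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(W, BA)]
y_adapter = lora_forward(W, A, B, x, alpha, r)
y_merged = matvec(W_merged, x)  # same result as y_adapter, up to rounding
```

Keeping `B` and `A` separate is what allows adapters to be stored and swapped independently of the base weights; merging trades that flexibility for a single matrix multiply at inference time.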

Key hyperparameter choices in alignment-handbook:

SFT LoRA rank: 16 (sufficient for instruction following)
DPO LoRA rank: 128 (preference optimization needs more capacity)
Target modules: All attention projections + MLP projections for maximum expressiveness
Dropout: 0.05 for regularization
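To give a sense of what the two rank choices cost, the sketch below counts adapter parameters when all seven target modules are adapted. The projection shapes and layer count are assumptions modeled on a Mistral-7B-style decoder and do not come from this page; the helper `lora_trainable_params` is hypothetical.

```python
# Assumed (in_features, out_features) for each targeted projection in one
# decoder layer of a Mistral-7B-style model; these values are not from this page.
PROJECTION_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32  # assumed number of decoder layers


def lora_trainable_params(rank, shapes=PROJECTION_SHAPES, num_layers=NUM_LAYERS):
    """Each adapted projection adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


sft_params = lora_trainable_params(16)
dpo_params = lora_trainable_params(128)
print(f"rank 16:  {sft_params:,}")   # rank 16:  41,943,040
print(f"rank 128: {dpo_params:,}")   # rank 128: 335,544,320
```

Under these assumptions the adapter size grows linearly with rank, so the rank-128 setting trains eight times as many parameters as the rank-16 setting.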

Related Pages

Implemented By

Implementation:Huggingface_Alignment_handbook_Get_Peft_Config

Uses Heuristic

Heuristic:Huggingface_Alignment_handbook_LoRA_Rank_Selection
