

Principle:Huggingface Alignment handbook LoRA Adapter Configuration

From Leeroopedia


Knowledge Sources
Domains NLP, Deep_Learning, Optimization
Last Updated 2026-02-07 00:00 GMT

Overview

A parameter-efficient fine-tuning technique that injects trainable low-rank decomposition matrices into transformer layers, enabling adaptation with minimal additional parameters.

Description

Low-Rank Adaptation (LoRA) freezes the pretrained model weights and injects pairs of trainable rank decomposition matrices into each targeted transformer layer. Instead of updating the full weight matrix $W \in \mathbb{R}^{d \times k}$, LoRA trains two smaller matrices $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, where the rank $r$ is much smaller than both $d$ and $k$.
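To make the shapes concrete, the plain-Python sketch below multiplies a toy $d \times r$ matrix by an $r \times k$ matrix and compares parameter counts. The dimensions (a 4 x 3 toy update, and a 4096 x 4096 projection with $r = 16$) are illustrative assumptions, not values read from any particular model.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    inner = len(Y)
    assert all(len(row) == inner for row in X), "inner dimensions must agree"
    return [[sum(X[i][t] * Y[t][j] for t in range(inner)) for j in range(len(Y[0]))]
            for i in range(len(X))]

def shape(M):
    return (len(M), len(M[0]))

def lora_param_counts(d, k, r):
    """Parameters in a full d x k matrix vs. in the B (d x r) and A (r x k) factors."""
    return d * k, r * (d + k)

# Toy example: d = 4, k = 3, r = 1
B = [[1.0], [2.0], [0.0], [-1.0]]   # d x r
A = [[0.5, -1.0, 2.0]]              # r x k
delta_W = matmul(B, A)              # d x k update, rank at most r
print(shape(delta_W))               # (4, 3)

# A 4096 x 4096 projection adapted with r = 16
full, lora = lora_param_counts(4096, 4096, 16)
print(full, lora, f"{lora / full:.2%}")  # 16777216 131072 0.78%
```

The product $BA$ has the same shape as $W$, so it can be added to it, yet it is stored with $r(d + k)$ numbers instead of $dk$.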

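This cuts the number of trainable parameters by orders of magnitude: each adapted $d \times k$ matrix needs only $r(d + k)$ trainable parameters instead of $dk$, so for a 7B-parameter model the adapters typically amount to tens to a few hundred million parameters rather than billions, depending on the rank and target modules. Performance is often comparable to full fine-tuning. Because the LoRA adapter weights are saved separately from the base model, multiple fine-tuned versions can be stored cheaply and switched on top of a single copy of the base weights.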
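The storage and switching benefit follows from keeping $W$ fixed: each fine-tuned variant only needs its own $(B, A)$ pair. The toy sketch below applies two hypothetical adapters (the names `"chat"` and `"summarize"` and their values are made up for illustration) to one shared base weight. It is not how PEFT stores or loads adapters, only an illustration of the idea.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, adapter, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W is shared and never modified."""
    base = matvec(W, x)
    if adapter is None:          # no adapter: plain base model
        return base
    B, A = adapter
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# One shared 2 x 2 base weight (frozen)
W = [[1.0, 0.0],
     [0.0, 1.0]]

# Two hypothetical rank-1 adapters: (B is 2 x 1, A is 1 x 2)
adapters = {
    "chat": ([[1.0], [0.0]], [[0.0, 1.0]]),
    "summarize": ([[0.0], [2.0]], [[1.0, 0.0]]),
}

x = [3.0, 4.0]
for name in [None, "chat", "summarize"]:
    # adapters.get(None) returns None, i.e. the base model alone
    print(name, lora_forward(W, adapters.get(name), x, alpha=1, r=1))
# None [3.0, 4.0]
# chat [7.0, 4.0]
# summarize [3.0, 10.0]
```

Switching variants only swaps the small $(B, A)$ pair; the base weight `W` is never copied or modified.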

In the alignment-handbook, LoRA configuration is specified in YAML recipe configs and the adapters are injected automatically by the TRL trainers when a PEFT config is provided.
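As a hedged illustration of how such a recipe might become a PEFT config, the sketch below holds recipe-style keys in a plain dict (instead of YAML, to stay dependency-free) and maps them to keyword arguments accepted by `peft.LoraConfig`. The key names (`use_peft`, `lora_r`, `lora_alpha`, `lora_dropout`, `lora_target_modules`) are assumed to mirror TRL's `ModelConfig` fields, and `recipe_to_lora_kwargs` is a hypothetical helper; the real handbook and TRL code paths may differ between versions. The values follow the SFT settings listed under the key hyperparameter choices.

```python
# Recipe-style settings; key names assumed to mirror TRL's ModelConfig fields.
sft_recipe = {
    "use_peft": True,
    "lora_r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "lora_target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                            "gate_proj", "up_proj", "down_proj"],
}

def recipe_to_lora_kwargs(recipe):
    """Hypothetical helper: map recipe keys to peft.LoraConfig keyword arguments.

    Returns None when PEFT is disabled, in which case no adapter would be
    injected and the full model would be trained.
    """
    if not recipe.get("use_peft", False):
        return None
    return {
        "r": recipe["lora_r"],
        "lora_alpha": recipe["lora_alpha"],
        "lora_dropout": recipe["lora_dropout"],
        "target_modules": recipe["lora_target_modules"],
        "bias": "none",
        "task_type": "CAUSAL_LM",
    }

kwargs = recipe_to_lora_kwargs(sft_recipe)
print(kwargs["r"], kwargs["lora_alpha"] / kwargs["r"])  # 16 1.0
# With peft installed, these could be passed on as: peft.LoraConfig(**kwargs)
```

A `LoraConfig` built this way would then be handed to a TRL trainer such as `SFTTrainer` or `DPOTrainer` through its `peft_config` argument, which is the step that wraps the targeted layers with adapters.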

Usage

Use LoRA adapter configuration when:

  • Parameter-efficient fine-tuning is needed (e.g., in a QLoRA workflow)
  • Multiple fine-tuned model variants need to share the same base model
  • GPU memory is limited and full fine-tuning is not feasible
  • Quick experimentation with different LoRA hyperparameters (rank, target modules, alpha) is desired

Theoretical Basis

LoRA decomposes weight updates into low-rank matrices:

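$$W' = W + \frac{\alpha}{r} B A$$

Where:

  • $W \in \mathbb{R}^{d \times k}$ is the frozen pretrained weight matrix and $W'$ is the adapted weight
  • $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ are trainable
  • $r$ is the rank (e.g., 16, 32, 64, 128)
  • $\alpha$ is the scaling factor (typically set equal to $r$, so that $\alpha / r = 1$)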
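
# Abstract LoRA forward pass (illustrative, not the PEFT implementation)
import numpy as np
d, k, r, alpha = 8, 6, 2, 2        # toy sizes
W = np.random.randn(d, k)          # frozen pretrained weight
A = np.random.randn(r, k) * 0.01   # trainable, r x k
B = np.zeros((d, r))               # trainable, d x r (zero-initialized, as in the LoRA paper)
x = np.random.randn(k)
output = W @ x + (alpha / r) * (B @ (A @ x))
# Only B and A receive gradients; W is frozen

# Target modules in alignment-handbook (all linear projections):
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]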
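The forward pass and the update rule $W' = W + \frac{\alpha}{r} BA$ describe the same computation. The plain-Python sketch below checks this numerically on small random matrices, and also shows that with $B$ initialized to zero (the initialization used in the original LoRA paper) the adapted layer initially reproduces the frozen layer exactly. The dimensions and random seed are arbitrary choices for illustration.

```python
import random

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d, k, r, alpha = 5, 4, 2, 2
W = rand_matrix(d, k, rng)          # frozen pretrained weight
A = rand_matrix(r, k, rng)          # trainable, r x k
B = rand_matrix(d, r, rng)          # trainable, d x r (values stand in for a trained B)
x = [rng.gauss(0.0, 1.0) for _ in range(k)]
scale = alpha / r

# Unmerged: W x + (alpha / r) * B (A x)
unmerged = [w + scale * u for w, u in zip(matvec(W, x), matvec(B, matvec(A, x)))]

# Merged: (W + (alpha / r) * B A) x
BA = matmul(B, A)
W_merged = [[W[i][j] + scale * BA[i][j] for j in range(k)] for i in range(d)]
merged = matvec(W_merged, x)

# Largest difference: only floating-point rounding error remains
print(max(abs(a - b) for a, b in zip(unmerged, merged)))

# Zero-initialized B: the adapter contributes nothing before training starts
B0 = [[0.0] * r for _ in range(d)]
start = [w + scale * u for w, u in zip(matvec(W, x), matvec(B0, matvec(A, x)))]
print(start == matvec(W, x))  # True
```

Because the two forms agree, the low-rank update can be kept as a separate adapter during training and, if desired, folded into $W$ afterwards.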

Key hyperparameter choices in alignment-handbook:

  • SFT LoRA rank: 16 (sufficient for instruction following)
  • DPO LoRA rank: 128 (preference optimization needs more capacity)
  • Target modules: All attention projections + MLP projections for maximum expressiveness
  • Dropout: 0.05 for regularization
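To see what the rank choice means in practice, the sketch below counts LoRA parameters when all seven projections are targeted. The layer shapes are an assumption: they correspond to a Mistral-7B-style decoder (hidden size 4096, MLP size 14336, 32 layers, grouped-query attention with 8 key/value heads of dimension 128). Other base models will give different totals.

```python
# Assumed Mistral-7B-style shapes (illustrative, not read from any config file)
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
KV_DIM = 8 * 128  # 8 key/value heads x head_dim 128

# (in_features, out_features) for each targeted linear layer in one decoder block
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_trainable_params(r, shapes=TARGET_SHAPES, num_layers=NUM_LAYERS):
    """Each adapted layer adds A (r x in) and B (out x r): r * (in + out) parameters."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers

for rank in (16, 128):
    print(rank, f"{lora_trainable_params(rank):,}")
# 16 41,943,040
# 128 335,544,320
```

Under these assumptions, moving from the SFT rank of 16 to the DPO rank of 128 multiplies the adapter size by 8, from roughly 42M to roughly 336M trainable parameters, both still a small fraction of the roughly 7B frozen base weights.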

Related Pages

Implemented By

Uses Heuristic
