Principle:Huggingface Transformers LoRA Configuration

From Leeroopedia
Knowledge Sources
Domains: Parameter_Efficient_Fine_Tuning, NLP, Low_Rank_Adaptation
Last Updated: 2026-02-13 00:00 GMT

Overview

LoRA configuration defines the hyperparameters that control how low-rank adaptation matrices are constructed and injected into a pretrained model, determining the trade-off between parameter efficiency and task performance.

Description

Low-Rank Adaptation (LoRA) is one of the most widely adopted parameter-efficient fine-tuning (PEFT) methods. Rather than fine-tuning all parameters of a pretrained model, LoRA freezes the original weights and injects small trainable low-rank decomposition matrices into selected layers. The configuration object specifies exactly how these matrices are constructed.

Key configuration decisions include:

  • Rank (r): The inner dimension of the low-rank matrices. Lower rank means fewer trainable parameters but potentially reduced expressiveness. Typical values range from 4 to 64, with 8 or 16 being common defaults.
  • Alpha (lora_alpha): A scaling factor that controls the magnitude of the adapter's contribution. The effective scaling is lora_alpha / r. Higher alpha amplifies the adapter's effect relative to the base model.
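  • Target modules (target_modules): Which layers in the model receive LoRA adapters. Common targets include attention projection layers (q_proj, v_proj, k_proj, o_proj) and MLP layers. Setting this to None lets PEFT pick default targets from a built-in mapping for known model architectures (an error is raised if the architecture is not recognized). The special value "all-linear" targets all linear layers except the output layer.
  • Dropout (lora_dropout): Dropout probability applied to the input of the LoRA branch during training to reduce overfitting.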
  • Bias (bias): Whether to train bias terms alongside the LoRA matrices. Options are "none", "all", or "lora_only".
  • Task type (task_type): Informs PEFT of the model architecture type (e.g., CAUSAL_LM) for correct output layer handling.
  • Initialization (init_lora_weights): Controls how LoRA matrices are initialized. The default initializes A with Kaiming-uniform values and B with zeros, so the adapter initially produces zero output, preserving the base model's behavior at the start of training.
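
As an illustration, the options listed above could be combined into a configuration like the following. This is a minimal sketch that assumes the `peft` package is installed and that the base model uses LLaMA-style module names (q_proj, k_proj, v_proj, o_proj). The specific values are illustrative choices, not recommendations taken from this page.

```python
from peft import LoraConfig, TaskType

# Illustrative values only; tune rank, alpha, and dropout for the task at hand.
lora_config = LoraConfig(
    r=16,                         # rank of the low-rank matrices
    lora_alpha=32,                # effective scaling is lora_alpha / r = 2.0
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module names
    lora_dropout=0.05,            # dropout on the LoRA branch during training
    bias="none",                  # do not train bias terms
    task_type=TaskType.CAUSAL_LM, # causal language modeling
    init_lora_weights=True,       # default init: adapter output starts at zero
)
```

A configuration like this is typically passed, together with a loaded base model, to `get_peft_model` from the same package to obtain the adapted model.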

Usage

Configure LoRA whenever you want to:

  • Fine-tune a large model with a fraction of the parameters (typically 0.1-1% of total)
  • Control the expressiveness-efficiency trade-off via rank and alpha settings
  • Target specific layers for adaptation based on task requirements
  • Set up QLoRA by combining LoRA configuration with a quantized base model

Theoretical Basis

LoRA is based on the hypothesis that the weight updates during fine-tuning have a low intrinsic rank. For a pretrained weight matrix W of dimension d x k, the fine-tuned weight can be expressed as:

W' = W + delta_W = W + B * A

where:

  • A is a matrix of dimension r x k (the "down-projection")
  • B is a matrix of dimension d x r (the "up-projection")
  • r << min(d, k) is the rank, making the total number of added parameters r * (d + k) instead of d * k

The scaling factor is applied as:

W' = W + (alpha / r) * B * A

At initialization, A is filled with random values (Gaussian in the original LoRA formulation; Kaiming-uniform by default in PEFT) and B is initialized to zero, so the initial adapter contribution B * A = 0. This ensures the model begins training from the pretrained checkpoint's behavior.
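
The scaled update and the zero-initialization property can be checked numerically. The following is a small pure-Python sketch, not the PEFT implementation: the helper names `matmul` and `lora_weight` and the tiny dimensions are hypothetical and chosen only for illustration.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * B * A for list-of-rows matrices."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * u for w, u in zip(w_row, u_row)]
            for w_row, u_row in zip(W, BA)]

random.seed(0)
d, k, r, alpha = 4, 6, 2, 4  # toy sizes, far smaller than a real model

W = [[random.gauss(0, 1) for _ in range(k)] for _ in range(d)]  # frozen weight, d x k
A = [[random.gauss(0, 1) for _ in range(k)] for _ in range(r)]  # random init, r x k
B = [[0.0] * r for _ in range(d)]                               # zero init, d x r

# With B = 0 the adapter contributes nothing, so W' equals W.
unchanged_at_init = lora_weight(W, A, B, alpha, r) == W

# Simulate a training update by making one entry of B nonzero.
B[0][0] = 1.0
W_new = lora_weight(W, A, B, alpha, r)
# Row 0 of W' is now row 0 of W plus (alpha / r) times row 0 of A;
# all other rows are still unchanged.
```

Only A and B would receive gradients during training; W stays fixed throughout.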

The rank r controls the expressiveness of the adaptation:

  • r = 1: Minimal adaptation, rank-1 update (fewest parameters)
  • r = min(d, k): B * A can represent any d x k update, so the adapter is as expressive as standard fine-tuning of that layer, but with no parameter savings
  • In practice, r between 4 and 64 is often enough to capture most of the task-specific adaptation

The total number of trainable parameters for LoRA applied to L weight matrices, each of dimension d x k, is:

Total trainable = L * r * (d + k)

For a 7B parameter model with rank 16 applied to all attention projections, this typically results in approximately 10-40 million trainable parameters (roughly 0.1-0.5% of total).
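
The estimate above can be reproduced with a short calculation. The model shape used here is an assumption made for illustration (32 transformer layers, hidden size 4096, four square attention projections per layer, and a round total of 7 billion parameters); it is not taken from a specific checkpoint.

```python
# Assumed model shape (illustrative, roughly 7B-scale).
num_layers = 32
hidden_size = 4096
projections_per_layer = 4          # q_proj, k_proj, v_proj, o_proj
total_params = 7_000_000_000       # round figure, an assumption

r = 16
d = k = hidden_size                # square attention projections assumed

# L in the formula counts adapted weight matrices, not transformer layers.
L = num_layers * projections_per_layer
trainable = L * r * (d + k)
fraction = trainable / total_params

print(f"adapted matrices: {L}")
print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of total: {fraction:.2%}")
```

Under these assumptions the count is about 16.8 million parameters, which falls inside the 10-40 million range quoted above. Adding MLP projections to target_modules or raising r would increase it proportionally.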

Related Pages

Implemented By
