Principle:Huggingface Transformers LoRA Configuration

Knowledge Sources	LoRA, QLoRA, PEFT Docs, Transformers Docs
Domains	Parameter_Efficient_Fine_Tuning, NLP, Low_Rank_Adaptation
Last Updated	2026-02-13 00:00 GMT

Overview

LoRA configuration defines the hyperparameters that control how low-rank adaptation matrices are constructed and injected into a pretrained model, determining the trade-off between parameter efficiency and task performance.

Description

Low-Rank Adaptation (LoRA) is one of the most widely adopted parameter-efficient fine-tuning (PEFT) methods. Rather than updating all parameters of a pretrained model, LoRA freezes the original weights and injects small trainable low-rank decomposition matrices into selected layers. The configuration object specifies exactly how these matrices are constructed.

Key configuration decisions include:

Rank (r): The inner dimension of the low-rank matrices. Lower rank means fewer trainable parameters but potentially reduced expressiveness. Typical values range from 4 to 64, with 8 or 16 being common defaults.
Alpha (lora_alpha): A scaling factor that controls the magnitude of the adapter's contribution. The effective scaling is lora_alpha / r. Higher alpha amplifies the adapter's effect relative to the base model.
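Target modules (target_modules): Which layers in the model receive LoRA adapters, given as module names or name suffixes. Common targets include attention projection layers (q_proj, v_proj, k_proj, o_proj) and MLP layers. If left as None, PEFT falls back to a built-in default list for known architectures (for many models only the query and value projections); the special value "all-linear" targets every linear layer except the output layer.
Dropout (lora_dropout): Dropout probability applied within the LoRA branch during training, as regularization to reduce overfitting.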
Bias (bias): Whether to train bias terms alongside the LoRA matrices. Options are "none", "all", or "lora_only".
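Task type (task_type): Tells PEFT which task the model is used for (e.g., CAUSAL_LM, SEQ_CLS), which selects the matching PeftModel wrapper and determines how task-specific parts such as the output head are handled.
Initialization (init_lora_weights): Controls how LoRA matrices are initialized. The default initializes A with Kaiming-uniform values and B with zeros; because B is zero, the adapter initially produces zero output, preserving the base model's behavior at the start of training. Other options include "gaussian", which draws A from a Gaussian distribution instead.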
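A minimal sketch of the settings above, written as a plain dictionary of keyword arguments so it runs without third-party packages. The keys match the options listed above; with the peft package installed, the same dictionary could be passed as LoraConfig(**lora_kwargs). The values are illustrative, and check_lora_kwargs is a hypothetical helper that only mirrors the constraints described in this section; it is not part of PEFT.

```python
# Illustrative values for the LoRA options described above.
# With peft installed, these could be passed as peft.LoraConfig(**lora_kwargs).
lora_kwargs = {
    "r": 16,                       # rank of the A/B matrices
    "lora_alpha": 32,              # effective scaling = lora_alpha / r
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "lora_dropout": 0.05,          # dropout inside the LoRA branch during training
    "bias": "none",                # one of "none", "all", "lora_only"
    "task_type": "CAUSAL_LM",
}

VALID_BIAS = {"none", "all", "lora_only"}


def check_lora_kwargs(kwargs):
    """Hypothetical helper: sanity-check the values and return lora_alpha / r."""
    if not isinstance(kwargs["r"], int) or kwargs["r"] <= 0:
        raise ValueError("r must be a positive integer")
    if not 0.0 <= kwargs["lora_dropout"] < 1.0:
        raise ValueError("lora_dropout must be in [0, 1)")
    if kwargs["bias"] not in VALID_BIAS:
        raise ValueError(f"bias must be one of {sorted(VALID_BIAS)}")
    return kwargs["lora_alpha"] / kwargs["r"]


scale = check_lora_kwargs(lora_kwargs)
print(f"effective scaling lora_alpha / r = {scale}")  # 32 / 16 = 2.0
```

Note that doubling r while keeping lora_alpha fixed halves the effective scaling, so changing the rank alone also changes how strongly the adapter's update is weighted.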

Usage

Configure LoRA whenever you want to:

Fine-tune a large model while training only a small fraction of its parameters (typically 0.1-1% of the total)
Control the expressiveness-efficiency trade-off via rank and alpha settings
Target specific layers for adaptation based on task requirements
Set up QLoRA by combining LoRA configuration with a quantized base model

Theoretical Basis

LoRA is based on the hypothesis that the weight updates during fine-tuning have a low intrinsic rank. For a pretrained weight matrix W of dimension d x k, the fine-tuned weight can be expressed as:

W' = W + delta_W = W + B * A

where:

A is a matrix of dimension r x k (the "down-projection")
B is a matrix of dimension d x r (the "up-projection")
r << min(d, k) is the rank, making the total number of added parameters r * (d + k) instead of d * k
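To make the savings concrete, the short calculation below compares the d * k parameters of a dense update with the r * (d + k) parameters of the two LoRA factors. The 4096 x 4096 shape and r = 16 are illustrative assumptions, not values taken from a specific model.

```python
def full_update_params(d, k):
    """Parameters in a dense update delta_W of shape d x k."""
    return d * k


def lora_params(d, k, r):
    """Parameters in B (d x r) plus A (r x k)."""
    return d * r + r * k


d, k, r = 4096, 4096, 16
full = full_update_params(d, k)
lora = lora_params(d, k, r)
print(f"full: {full:,}  lora: {lora:,}")  # full: 16,777,216  lora: 131,072
```

For this square shape and r = 16, the two factors together hold 1/128 of the parameters of the dense update.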

The scaling factor is applied as:

W' = W + (alpha / r) * B * A
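The scaled update can be applied without materializing W'. The plain-Python sketch below uses small hand-picked matrices (all values are assumptions for illustration) to compute the adapted output as W x + (alpha / r) * B (A x), then checks that it matches a single multiplication by the merged weight W' = W + (alpha / r) * B * A.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def add(X, Y, scale=1.0):
    """Return X + scale * Y elementwise."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


# Illustrative shapes: d = 3 outputs, k = 2 inputs, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # frozen d x k weight
B = [[1.0], [2.0], [0.0]]                 # d x r up-projection
A = [[0.5, -1.0]]                         # r x k down-projection
alpha, r = 2.0, 1
scaling = alpha / r

x = [2.0, 3.0]

# Unmerged path: frozen output plus the scaled low-rank branch.
base = matvec(W, x)
lora_branch = matvec(B, matvec(A, x))
y_unmerged = [b + scaling * l for b, l in zip(base, lora_branch)]

# Merged path: fold the scaled update into W once, then do one matvec.
W_merged = add(W, matmul(B, A), scale=scaling)
y_merged = matvec(W_merged, x)

print(y_unmerged, y_merged)  # [-2.0, -5.0, 5.0] [-2.0, -5.0, 5.0]
```

Keeping the branch separate lets the frozen W stay untouched during training, while merging removes the extra low-rank computation at inference time.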

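At initialization, one factor is random and the other is zero: the original LoRA formulation draws A from a Gaussian distribution and sets B to zero (PEFT's default keeps B at zero but initializes A with Kaiming-uniform values). Either way, the initial adapter contribution is B * A = 0, so training begins from the pretrained checkpoint's behavior.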
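A sketch of the Gaussian-A, zero-B scheme using Python's standard random module: whatever values A receives, the product B * A is exactly zero at step 0. The standard deviation of 0.02 and the seed are arbitrary assumptions for illustration, and init_lora is a hypothetical helper, not a PEFT function.

```python
import random


def init_lora(d, k, r, std=0.02, seed=0):
    """Hypothetical helper: A ~ N(0, std^2) with shape r x k, B = 0 with shape d x r."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, std) for _ in range(k)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d)]
    return A, B


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


A, B = init_lora(d=6, k=4, r=2)
delta_W = matmul(B, A)  # d x k update contributed by the adapter
print(all(v == 0.0 for row in delta_W for v in row))  # True
```

Only one factor is zeroed because the gradient with respect to B depends on A x, and the gradient with respect to A depends on B; if both started at zero, neither would receive a nonzero gradient.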

The rank r controls the expressiveness of the adaptation:

r = 1: Minimal adaptation, rank-1 update (fewest parameters)
r = min(d, k): Full-rank update; B * A can then represent any d x k matrix, so the adapter is as expressive as standard fine-tuning of that layer
In practice, r between 4 and 64 is commonly found to capture most of the task-specific adaptation
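The product B * A can never have rank above r, because every column of B * A is a combination of B's r columns. The exact-arithmetic check below illustrates this with small integer factors chosen by hand (hypothetical values); it uses fractions.Fraction so the Gaussian elimination has no rounding error.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matrix_rank(M):
    """Rank via Gaussian elimination in exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in M]
    rank = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current rank with a nonzero entry here.
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        # Eliminate this column from every other row.
        for i in range(len(rows)):
            if i != rank and rows[i][col] != 0:
                factor = rows[i][col] / rows[rank][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
        if rank == len(rows):
            break
    return rank


# Hand-picked factors for a 4 x 4 update (d = k = 4).
B1, A1 = [[1], [2], [3], [4]], [[1, 0, 2, 1]]                             # r = 1
B2, A2 = [[1, 0], [0, 1], [1, 1], [2, 1]], [[1, 2, 0, 1], [0, 1, 1, 3]]  # r = 2

print(matrix_rank(matmul(B1, A1)))  # 1
print(matrix_rank(matmul(B2, A2)))  # 2
```

The full-rank end of the scale is rarely worth it: for a square layer with d = k, choosing r = d costs r * (d + k) = 2 * d * d parameters, twice the size of a dense update.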

The total number of trainable parameters for LoRA applied to L weight matrices, each of dimension d x k, is:

Total trainable = L * r * (d + k)

For a 7B parameter model with rank 16 applied to all attention projections, this typically results in approximately 10-40 million trainable parameters (roughly 0.1-0.5% of total).
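The estimate above can be reproduced with a short calculation. The dimensions are assumptions describing a LLaMA-style 7B decoder (32 layers, hidden size 4096, four 4096 x 4096 attention projections per layer) and an assumed round total of 6.7 billion parameters; substitute the real shapes of your model.

```python
def lora_trainable(shapes, r):
    """Sum r * (d + k) over every adapted d x k weight matrix."""
    return sum(r * (d + k) for d, k in shapes)


# Assumed LLaMA-style 7B layout: q_proj, k_proj, v_proj, o_proj are 4096 x 4096.
num_layers, hidden = 32, 4096
attn_shapes = [(hidden, hidden)] * 4 * num_layers  # 128 adapted matrices
total_params = 6.7e9  # assumed round figure for a "7B" model

trainable = lora_trainable(attn_shapes, r=16)
print(f"{trainable:,} trainable ({trainable / total_params:.2%} of total)")
# 16,777,216 trainable (0.25% of total)
```

Because the sum runs over each matrix's own shape, the same function also covers layers whose d and k differ, such as MLP projections, where the single formula L * r * (d + k) no longer applies directly.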

Related Pages

Implemented By

Implementation:Huggingface_Transformers_LoraConfig
