Principle: Low-Rank Adaptation (LoRA)
| Knowledge Sources | LLMBook-zh (llmbook-zh.github.io) |
|---|---|
| Domains | Deep_Learning, Parameter_Efficient_Finetuning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A parameter-efficient fine-tuning technique that adds trainable low-rank decomposition matrices to frozen pre-trained model layers.
Description
Low-Rank Adaptation (LoRA) addresses the prohibitive cost of full fine-tuning for large language models by freezing the pre-trained weights and injecting small trainable matrices. For each targeted linear layer with weight matrix W, LoRA adds a parallel path BA, where A is a down-projection and B is an up-projection, both with rank r much smaller than the layer dimensions. The output becomes Wx + BAx, adding only 2 × r × d trainable parameters per layer (for a square d × d layer) instead of d × d.
This enables fine-tuning models with billions of parameters on consumer hardware while maintaining performance comparable to full fine-tuning.
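As an illustration, the parallel low-rank path and its equivalence to a merged weight update can be checked directly. This is a plain-NumPy sketch with made-up dimensions, not any library's API:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 6, 4, 2                 # layer dims and LoRA rank (illustrative)
W = rng.standard_normal((d, k))   # frozen pre-trained weight
A = rng.standard_normal((r, k))   # trainable down-projection
B = rng.standard_normal((d, r))   # trainable up-projection
x = rng.standard_normal(k)

# Output with the parallel LoRA path: Wx + BAx
out = W @ x + B @ (A @ x)

# Identical to folding the rank-r update into the weight: (W + BA)x
merged = (W + B @ A) @ x
assert np.allclose(out, merged)
```

Because B @ A has the same shape as W, a trained adapter can be merged into the frozen weight for inference, or kept separate so that several adapters share one base model.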
Usage
Use LoRA when fine-tuning large language models with limited GPU memory or when you need multiple task-specific adapters that can be swapped without duplicating the base model. Common choices are r=8 or r=16 for the rank, with LoRA applied to attention projection layers.
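To make the savings concrete, here is the parameter count for a hypothetical square 4096 × 4096 attention projection with r=8 (both numbers are assumed, not taken from a specific model):

```python
d, r = 4096, 8                  # hidden size and LoRA rank (assumed values)
full_params = d * d             # full fine-tuning: update the whole matrix
lora_params = 2 * r * d         # LoRA: A (r x d) plus B (d x r)
print(full_params, lora_params, full_params // lora_params)
# 16777216 65536 256
```

At this rank, each adapted layer trains 256 times fewer parameters than full fine-tuning.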
Theoretical Basis
For a pre-trained weight matrix $W_0 \in \mathbb{R}^{d \times k}$, LoRA parameterizes the update as:

$$W_0 + \Delta W = W_0 + BA$$

where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$, with rank $r \ll \min(d, k)$.

The forward pass becomes:

$$h = W_0 x + BAx$$
Initialization:
- A is initialized with a small normal distribution (std=0.02).
- B is initialized to zero, ensuring the LoRA path starts as a no-op.
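This no-op property is easy to verify in a toy NumPy check (dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
d, k, r = 8, 8, 2
W = rng.standard_normal((d, k))
A = rng.normal(0.0, 0.02, size=(r, k))  # small normal init for A
B = np.zeros((d, r))                    # zero init for B
x = rng.standard_normal(k)

# With B = 0, BAx = 0, so the adapted layer matches the frozen layer exactly
adapted = W @ x + B @ (A @ x)
assert np.allclose(adapted, W @ x)
```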
Pseudo-code:
```python
# Abstract LoRA computation (not a real implementation)
original_output = W @ x + bias
lora_output = B @ (A @ dropout(x))   # low-rank path, rank r
output = original_output + lora_output
```
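The pseudo-code above can be turned into a small runnable sketch. The inverted-dropout helper and all names here are illustrative, not a library API:

```python
import numpy as np

def dropout(x, p, rng, training=True):
    """Inverted dropout: identity at inference, rescaled random mask in training."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def lora_forward(x, W, bias, A, B, p=0.0, rng=None, training=False):
    original_output = W @ x + bias
    lora_output = B @ (A @ dropout(x, p, rng or np.random.default_rng(), training))
    return original_output + lora_output

rng = np.random.default_rng(1)
d, k, r = 6, 4, 2
W, bias = rng.standard_normal((d, k)), rng.standard_normal(d)
A, B = rng.standard_normal((r, k)), rng.standard_normal((d, r))
x = rng.standard_normal(k)

# At inference (no dropout) this equals (W + BA)x + bias
y = lora_forward(x, W, bias, A, B)
assert np.allclose(y, (W + B @ A) @ x + bias)
```

Dropout is applied only to the LoRA path, so the frozen layer's behavior is untouched; at inference the adapter reduces to the plain rank-r update.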