Principle: Unsloth LoRA Adapter Injection
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Parameter_Efficient_Finetuning, NLP |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A parameter-efficient fine-tuning technique that injects trainable low-rank decomposition matrices into frozen pretrained model layers, enabling adaptation with a fraction of the original parameter count.
Description
Low-Rank Adaptation (LoRA) addresses the prohibitive cost of full fine-tuning for large language models. Instead of updating all model parameters, LoRA freezes the pretrained weights and adds small trainable rank decomposition matrices to selected linear layers. For a weight matrix $W_0 \in \mathbb{R}^{d \times k}$, LoRA adds $\Delta W = BA$, where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ with rank $r \ll \min(d, k)$.
The key advantages are:
- Memory Efficiency: Only LoRA parameters require optimizer states and gradients, reducing training memory by 3-4x.
- Training Speed: Fewer trainable parameters mean faster gradient computation.
- Composability: Multiple LoRA adapters can be trained independently and switched at inference time.
- Merge Capability: Trained LoRA weights can be merged back into the base model for deployment without inference overhead.
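The memory savings above follow directly from the parameter counts. A quick back-of-the-envelope calculation makes this concrete; the 4096×4096 projection size and rank 16 below are illustrative choices, not tied to a specific model:

```python
# Trainable-parameter comparison for one linear layer (illustrative sizes)
d, k = 4096, 4096   # hypothetical projection dimensions
r = 16              # LoRA rank

full_params = d * k        # full fine-tuning updates the whole d x k matrix
lora_params = r * (d + k)  # LoRA trains only B (d x r) and A (r x k)

print(full_params)                # 16777216
print(lora_params)                # 131072
print(lora_params / full_params)  # 0.0078125 -> under 1% of the original
```

Since optimizer states (e.g. Adam moments) are kept only for trainable parameters, this sub-1% ratio is where the bulk of the memory reduction comes from.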
In the Unsloth context, LoRA injection also involves:
- Patching forward methods with fused LoRA MLP kernels
- Configuring Unsloth's optimized gradient checkpointing
- Attaching `for_inference` and `for_training` mode-switching methods
Usage
Apply this principle immediately after loading a quantized model and before configuring the trainer. Target modules typically include all attention projections (q, k, v, o) and MLP layers (gate, up, down). The rank r controls the capacity-efficiency tradeoff: higher ranks (32-64) for complex tasks, lower ranks (8-16) for simpler adaptations.
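In Unsloth this principle maps onto `FastLanguageModel.get_peft_model`. The sketch below follows the public Unsloth API, but the checkpoint name and hyperparameter values are illustrative, not prescriptive:

```python
from unsloth import FastLanguageModel

# Load a quantized base model first (illustrative checkpoint name)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    load_in_4bit=True,
)

# Inject LoRA adapters into all attention and MLP projections
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                  # rank: capacity-efficiency tradeoff
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    use_gradient_checkpointing="unsloth",  # Unsloth's optimized checkpointing
)
```

Only after this step is the model handed to the trainer; `FastLanguageModel.for_inference(model)` later switches the patched forward methods into inference mode.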
Theoretical Basis
For a pretrained weight matrix $W_0 \in \mathbb{R}^{d \times k}$, the adapted forward pass becomes:

$$h = W_0 x + \frac{\alpha}{r} B A x$$

Where:
- $A \in \mathbb{R}^{r \times k}$ is initialized from a random Gaussian distribution
- $B \in \mathbb{R}^{d \times r}$ is initialized to zero (so $BA = 0$ at start)
- $\alpha$ is a scaling factor (lora_alpha) controlling adaptation magnitude
- $r$ is the rank (r parameter)
```python
# Abstract LoRA forward pass
def lora_forward(x, W_frozen, A, B, alpha, r):
    base_output = x @ W_frozen.T  # Frozen pretrained computation
    lora_output = x @ A.T @ B.T   # Low-rank adaptation
    return base_output + (alpha / r) * lora_output
```
The ratio $\alpha / r$ acts as a learning rate modifier for the LoRA parameters. Setting $\alpha = r$ (a scaling factor of 1) means the LoRA update has the same magnitude as a full-rank update scaled by the learning rate.
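A runnable pure-Python version of the abstract forward pass makes the zero-initialization property easy to verify; the tiny 2×3 weight, rank-1 adapter, and input values below are illustrative only:

```python
# Minimal pure-Python check of the LoRA forward pass (illustrative sizes)

def matmul(X, Y):
    """Multiply matrices represented as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def lora_forward(x, W, A, B, alpha, r):
    base = matmul(x, transpose(W))                        # frozen path
    lora = matmul(matmul(x, transpose(A)), transpose(B))  # low-rank path
    scale = alpha / r
    return [[b + scale * l for b, l in zip(br, lr)]
            for br, lr in zip(base, lora)]

# d = 2 outputs, k = 3 inputs, rank r = 1
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]   # frozen 2x3 weight
A = [[0.5, 0.5, 0.5]]   # 1x3, stand-in for the Gaussian init
B0 = [[0.0], [0.0]]     # 2x1, zero init -> no update at start
x = [[1.0, 2.0, 3.0]]   # one input row

# With B = 0 the adapted output equals the frozen output exactly
print(lora_forward(x, W, A, B0, alpha=16, r=1))  # [[1.0, 2.0]]
```

Because $B$ starts at zero, the adapter is a no-op at initialization, so training begins from the pretrained model's exact behavior and the low-rank path grows from there.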