Principle: WAInjectBench LoRA Adapter Injection
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Parameter_Efficient_Finetuning |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A parameter-efficient fine-tuning technique that injects low-rank adapter matrices into a frozen pre-trained model, enabling training with a fraction of the original parameter count.
Description
Low-Rank Adaptation (LoRA) adds trainable low-rank decomposition matrices $B$ and $A$ to selected weight matrices in a pre-trained model while keeping the original weights frozen. For a weight matrix $W_0 \in \mathbb{R}^{d \times k}$, the modified forward pass becomes $h = W_0 x + \frac{\alpha}{r} B A x$, where $r$ is the rank and $\alpha$ is the scaling factor.
In the WAInjectBench LLaVA fine-tuning pipeline, LoRA is applied to a comprehensive set of target modules spanning both the language model and vision components: attention projections (q_proj, k_proj, v_proj, o_proj), MLP layers (gate_proj, up_proj, down_proj), and vision encoder layers (fc1, fc2, Wqkv, out_proj, proj, dense).
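A minimal sketch of how such a configuration might look with the Hugging Face PEFT library; the module names come from the list above, while the rank, alpha, and dropout values are illustrative assumptions, not the pipeline's actual settings:

```python
from peft import LoraConfig

# Illustrative config covering the language-model and vision-encoder modules
# named above; r, lora_alpha, and lora_dropout are assumed values.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        # language model: attention projections and MLP layers
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        # vision encoder layers
        "fc1", "fc2", "Wqkv", "out_proj", "proj", "dense",
    ],
    lora_dropout=0.05,
    bias="none",
)
```

Listing modules from both towers means the adapters can adjust visual feature extraction as well as language generation, at a small additional parameter cost.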
Usage
Use LoRA when fine-tuning large models under limited GPU memory. It reduces the trainable parameter count from billions to millions while preserving most of the model's pre-trained knowledge in the frozen base weights.
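To see the scale of that reduction, the trainable-parameter count for a single adapted layer can be computed directly; the layer dimensions and rank below are hypothetical:

```python
def lora_param_count(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d x k weight: B (d x r) plus A (r x k)."""
    return r * (d + k)

# Hypothetical 4096 x 4096 attention projection, rank 16:
full = 4096 * 4096                       # 16,777,216 frozen weights
lora = lora_param_count(4096, 4096, 16)  # 131,072 trainable weights
print(f"LoRA trains {lora / full:.2%} of the layer's parameters")
```

Because the count grows linearly in `r` rather than with the product of the layer dimensions, even modest ranks keep the trainable fraction below one percent per layer.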
Theoretical Basis
$$h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x$$

Where:
- $W_0 \in \mathbb{R}^{d \times k}$ is the frozen pre-trained weight
- $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ are the trainable low-rank matrices
- $r$ is the rank (typically 4-64)
- $\alpha$ is the scaling factor
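The forward pass described above can be sketched numerically with NumPy; the zero initialization of $B$ follows the standard LoRA setup, while the dimensions chosen here are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, alpha = 64, 32, 8, 16

W0 = rng.normal(size=(d, k))         # frozen pre-trained weight
A = rng.normal(size=(r, k)) * 0.01   # trainable, small random init
B = np.zeros((d, r))                 # trainable, zero init -> delta starts at 0
x = rng.normal(size=(k,))

# Modified forward pass: h = W0 x + (alpha / r) * B A x
h = W0 @ x + (alpha / r) * (B @ (A @ x))

# With B initialized to zero, the adapted model initially matches the base model
assert np.allclose(h, W0 @ x)
```

Initializing $B$ to zero guarantees that training starts from exactly the pre-trained model's behavior, so the adapters only gradually steer the output as their weights are updated.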
```python
# LoRA injection via Hugging Face PEFT
from peft import LoraConfig, get_peft_model

config = LoraConfig(r=rank, lora_alpha=alpha, target_modules=[...])
model = get_peft_model(model, config)
# get_peft_model freezes all base params and enables only lora_* params
```