Principle: OpenGVLab InternVL LoRA Adapter Injection
| Knowledge Sources | |
|---|---|
| Domains | Parameter_Efficient_Finetuning, Deep_Learning, NLP |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A parameter-efficient fine-tuning technique that injects low-rank adapter matrices into pretrained model layers, enabling training with a fraction of the full parameter count.
Description
Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning method that freezes the pretrained model weights and injects trainable low-rank decomposition matrices into specific layers. Instead of updating the full weight matrix $W \in \mathbb{R}^{d \times k}$, LoRA adds a parallel path $\Delta W = BA$, where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ with rank $r \ll \min(d, k)$.
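The parallel low-rank path can be sketched in a few lines of NumPy. This is a minimal illustration, not InternVL's implementation; the dimensions are hypothetical, and it uses the common initialization where $B$ starts at zero so the adapter initially leaves the frozen output unchanged:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; alpha = 2 * r follows the InternVL convention below.
d, k, r, alpha = 64, 64, 16, 32

W0 = rng.standard_normal((d, k))        # frozen pretrained weight (never updated)
A = rng.standard_normal((r, k)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # trainable, zero-initialized so BA = 0 at start

x = rng.standard_normal(k)

# LoRA forward pass: frozen path plus scaled low-rank path
h = W0 @ x + (alpha / r) * (B @ (A @ x))

# With B initialized to zero, the adapted output equals the frozen output.
assert np.allclose(h, W0 @ x)
```

Because only $A$ and $B$ receive gradients, training touches $r(d+k)$ parameters per layer instead of $dk$.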
In InternVL, LoRA can be applied to:
- Language model (LLM): Adapts attention and MLP layers of the LLM backbone
- Vision encoder (ViT): Adapts attention and MLP layers of InternViT (less common)
The target modules are automatically selected based on the LLM architecture:
- InternLM2: `attention.wqkv`, `attention.wo`, `feed_forward.w1`/`w2`/`w3`
- Qwen2/LLaMA: `self_attn.q_proj`/`k_proj`/`v_proj`/`o_proj`, `mlp.gate_proj`/`down_proj`/`up_proj`
Usage
Use LoRA when fine-tuning InternVL on custom datasets with limited GPU memory, or when you want to keep the base model weights unchanged so that multiple task-specific adapters can share a single base model.
Theoretical Basis
The LoRA update rule:

$$h = W_0 x + \frac{\alpha}{r} B A x$$

Where:
- $W_0 \in \mathbb{R}^{d \times k}$ is the frozen pretrained weight
- $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$ are trainable
- $r$ is the rank (typical: 16)
- $\alpha$ is the scaling factor (convention in InternVL: $\alpha = 2r$)
The trainable parameter count for one LoRA layer is $r \times (d + k)$, compared to $d \times k$ for full fine-tuning.
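A worked instance of this count, using a hypothetical $4096 \times 4096$ projection (typical of a ~7B-parameter LLM) and $r = 16$:

```python
# Parameter count for one LoRA layer vs. full fine-tuning of the same layer.
d, k, r = 4096, 4096, 16  # hypothetical projection size and rank

lora_params = r * (d + k)  # trainable entries in B (d x r) and A (r x k)
full_params = d * k        # entries updated by full fine-tuning

print(lora_params)               # 131072
print(full_params)               # 16777216
print(lora_params / full_params) # 0.0078125, i.e. under 1% of the full count
```

Summed over all targeted attention and MLP projections, the adapter typically stays in the low single-digit percent of the backbone's parameters.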
InternVL convention:
- lora_alpha = 2 * r (scaling factor)
- lora_dropout = 0.05 (dropout on LoRA path)
- All base model parameters frozen; only LoRA matrices and optionally MLP projector are trainable