# Heuristic: OpenGVLab InternVL LoRA Alpha Scaling
| Knowledge Sources | |
|---|---|
| Domains | Parameter_Efficient_Finetuning, Optimization |
| Last Updated | 2026-02-07 14:00 GMT |
## Overview
LoRA alpha scaling rule: always set `lora_alpha = 2 * rank` for stable LoRA fine-tuning in InternVL models.
## Description
When injecting LoRA adapters into InternVL models (either the vision encoder or the language model), the codebase enforces a fixed relationship between the LoRA rank and the alpha scaling parameter: alpha is always set to twice the chosen rank. This 2x factor controls the effective scale of the LoRA update, `delta_W = (alpha / r) * BA`, so with `alpha = 2r` the multiplier on the low-rank product `BA` is always 2, independent of the rank.
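The scaling rule can be demonstrated with a minimal pure-Python sketch (the `matmul` and `lora_delta` helpers and the toy matrices below are illustrative, not part of the InternVL codebase):

```python
def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_delta(B, A, rank, alpha):
    """Scaled LoRA update: delta_W = (alpha / rank) * (B @ A)."""
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(B, A)]

# Toy rank-2 factors for a 2x2 weight update. With the InternVL
# convention alpha = 2 * rank, the product B @ A is scaled by exactly 2.
B = [[1.0, 0.0],
     [0.0, 1.0]]
A = [[0.5, 0.0],
     [0.0, 0.5]]
delta = lora_delta(B, A, rank=2, alpha=4)  # alpha = 2 * rank
```

Here `delta` comes out as twice `B @ A`; changing the rank (with alpha kept at twice the rank) leaves that multiplier unchanged.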
## Usage
Apply this heuristic whenever configuring LoRA fine-tuning for InternVL models. The convention is hardcoded in the training scripts, so users only need to set the `--use_backbone_lora` or `--use_llm_lora` argument to the desired rank value; the alpha is computed automatically.
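As a sketch of what the training scripts do with that argument (the helper name `lora_kwargs_from_arg` is hypothetical; the real scripts pass the values to `wrap_backbone_lora` / `wrap_llm_lora` directly):

```python
def lora_kwargs_from_arg(rank_arg: int) -> dict:
    """Map a --use_backbone_lora / --use_llm_lora rank argument to LoRA
    keyword arguments under the InternVL convention alpha = 2 * rank.
    Hypothetical helper for illustration only."""
    if rank_arg <= 0:
        raise ValueError("rank must be a positive integer")
    return {"r": rank_arg, "lora_alpha": 2 * rank_arg}
```

For example, passing `--use_llm_lora 128` corresponds to `{"r": 128, "lora_alpha": 256}`.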
## The Insight (Rule of Thumb)
- Action: Set `lora_alpha = 2 * lora_rank` when configuring LoRA adapters.
- Value: For rank=128, use alpha=256. For rank=16, use alpha=32.
- Trade-off: Higher alpha increases the magnitude of the LoRA update. The 2x convention provides a balance between adaptation strength and stability.
## Reasoning
The LoRA effective update is `delta_W = (alpha / r) * BA`, where B and A are the low-rank matrices. With `alpha = 2r`, the multiplier is always 2, providing a consistent adaptation magnitude regardless of rank. This convention is used across all InternVL training scripts and model initialization code, suggesting it was empirically validated during model development. It provides stronger adaptation than the common convention of `alpha = r` (multiplier = 1) used in some other frameworks.
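The difference between the two conventions reduces to the `alpha / r` ratio, which a short sketch makes explicit (illustrative helper only):

```python
def effective_multiplier(rank: int, alpha: int) -> float:
    """The scalar applied to the low-rank product BA in delta_W."""
    return alpha / rank

# The multiplier is rank-independent under either convention:
for rank in (16, 64, 128):
    assert effective_multiplier(rank, rank) == 1.0      # alpha = r (common elsewhere)
    assert effective_multiplier(rank, 2 * rank) == 2.0  # InternVL: alpha = 2r
```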
## Code Evidence
From `internvl_chat_finetune.py:1003-1008`:
```python
if model_args.use_backbone_lora:
    model.wrap_backbone_lora(r=model_args.use_backbone_lora,
                             lora_alpha=2 * model_args.use_backbone_lora)
    model.config.use_backbone_lora = model_args.use_backbone_lora
if model_args.use_llm_lora:
    model.wrap_llm_lora(r=model_args.use_llm_lora,
                        lora_alpha=2 * model_args.use_llm_lora)
    model.config.use_llm_lora = model_args.use_llm_lora
```
Same pattern in `modeling_internvl_chat.py:104-108`:
```python
if config.use_backbone_lora:
    self.wrap_backbone_lora(r=config.use_backbone_lora,
                            lora_alpha=2 * config.use_backbone_lora)
if config.use_llm_lora:
    self.wrap_llm_lora(r=config.use_llm_lora,
                       lora_alpha=2 * config.use_llm_lora)
```