Principle: Haotian Liu LLaVA LoRA Configuration
Overview
Parameter-efficient fine-tuning technique that adds small trainable rank-decomposition matrices to frozen model weights.
Description
Low-Rank Adaptation (LoRA) freezes the original model weights and injects trainable rank-decomposition matrices into specified layers. In LLaVA, LoRA adapters are applied to all nn.Linear layers except mm_projector, vision_tower, vision_resampler, and lm_head (auto-detected via find_all_linear_names()). QLoRA extends this with 4-bit NF4 quantization of the base model via the bitsandbytes library.
The configuration is managed through LLaVA's extended TrainingArguments dataclass, which adds LoRA-specific fields to HuggingFace's standard training arguments. These fields control adapter rank, scaling factor, dropout, quantization precision, and bias handling.
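A minimal sketch of what that extended dataclass might look like. The field names follow LLaVA's llava/train/train.py, but the defaults shown here are illustrative, not authoritative:

```python
from dataclasses import dataclass

# Hedged sketch of the LoRA-specific fields LLaVA adds on top of
# HuggingFace's TrainingArguments (defaults are illustrative).
@dataclass
class LoRAArguments:
    lora_enable: bool = False     # switch LoRA fine-tuning on
    lora_r: int = 64              # adapter rank r
    lora_alpha: int = 16          # scaling numerator alpha
    lora_dropout: float = 0.05    # dropout inside the adapter
    lora_bias: str = "none"       # bias handling: "none", "all", "lora_only"
    bits: int = 16                # base-model precision; 4 enables QLoRA
    double_quant: bool = True     # QLoRA double quantization
    quant_type: str = "nf4"       # QLoRA 4-bit quantization data type
```

In the real code these fields live inside the TrainingArguments subclass itself, so they are parsed by the same HfArgumentParser invocation as the standard training arguments.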
Usage
Use LoRA when you want to fine-tune LLaVA for a custom task with limited GPU memory. Use QLoRA (bits=4) when GPU memory is extremely constrained. LLaVA v1.5 uses r=128, alpha=256 for LoRA fine-tuning, as specified in scripts/v1_5/finetune_lora.sh.
| Configuration | When to Use | Memory Requirement |
|---|---|---|
| LoRA (bits=16) | Standard parameter-efficient finetuning | ~1/3 of full finetuning |
| QLoRA (bits=4) | Extremely memory-constrained environments | ~1/6 of full finetuning |
| Full finetuning | Sufficient data and compute available | Full model size in memory |
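The savings in the table come largely from the trainable-parameter count, since optimizer states are kept only for the adapters (QLoRA then shrinks the frozen base weights further via 4-bit quantization). A quick back-of-the-envelope check, using illustrative dimensions rather than LLaVA's actual layer shapes:

```python
def lora_param_fraction(d: int, k: int, r: int) -> float:
    """Fraction of a d x k linear layer's parameters that LoRA trains:
    r*(d+k) adapter parameters versus d*k for full fine-tuning."""
    return r * (d + k) / (d * k)

# Example: a square 4096 x 4096 projection with LLaVA v1.5's LoRA rank r=128
# trains only r*(d+k) = 1,048,576 of d*k = 16,777,216 parameters (~6.25%).
frac = lora_param_fraction(4096, 4096, 128)
```

Note this fraction only bounds the optimizer-state memory; total footprint also includes the frozen base weights, which is exactly what QLoRA's quantization targets.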
Theoretical Basis
For a weight matrix W in R^(d x k), LoRA adds:
- Delta W = B * A
where B in R^(d x r), A in R^(r x k), and r << min(d, k).
The forward pass becomes:
- h = W * x + (B * A) * x * (alpha / r)
Only A and B matrices are trained. The alpha/r ratio controls the scaling of the adapter contribution. With r=128 and alpha=256, the effective scaling factor is 2.0.
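The forward pass and the alpha/r scaling can be checked with a toy example in pure Python (tiny made-up dimensions; real LoRA initializes B to zero so the adapter starts as a no-op):

```python
# Toy LoRA forward pass: h = W x + (alpha/r) * B A x.
d, k, r = 3, 2, 1
alpha = 2                                   # effective scaling alpha / r = 2.0
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # frozen d x k base weight
B = [[0.0], [0.0], [0.0]]                   # trainable d x r, zero-initialized
A = [[0.5, -0.5]]                           # trainable r x k
x = [2.0, 3.0]

def matvec(M, v):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

base = matvec(W, x)                         # W x
delta = matvec(B, matvec(A, x))             # (B A) x
h = [b + (alpha / r) * dl for b, dl in zip(base, delta)]

# Because B starts at zero, the adapted output equals the base output:
# h == W x == [2.0, 3.0, 5.0].
```

During training, gradients flow only into A and B; W stays frozen, so the adapter gradually steers the output away from the base model's.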
The find_all_linear_names() function auto-discovers target modules by iterating model.named_modules() and collecting all nn.Linear layer names, excluding components that should not receive adapters:
- mm_projector -- The multimodal projector (trained separately with its own learning rate)
- vision_tower -- The CLIP vision encoder (frozen)
- vision_resampler -- The vision resampler module (if present)
- lm_head -- The language model output head (excluded for stability)
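The discovery logic above can be sketched as follows. Module objects are mocked with plain classes here so the example runs without PyTorch; LLaVA's actual implementation iterates model.named_modules() and checks isinstance(module, torch.nn.Linear):

```python
class Linear: pass       # stand-in for torch.nn.Linear
class LayerNorm: pass    # stand-in for any non-Linear module

# Multimodal components that must not receive adapters.
EXCLUDE_KEYWORDS = ["mm_projector", "vision_tower", "vision_resampler"]

def find_all_linear_names(named_modules):
    """Collect the leaf names of Linear layers eligible for LoRA."""
    lora_module_names = set()
    for name, module in named_modules:
        if any(key in name for key in EXCLUDE_KEYWORDS):
            continue
        if isinstance(module, Linear):
            # PEFT targets the final path component, e.g. "q_proj".
            lora_module_names.add(name.split(".")[-1])
    # The output head is excluded separately, for stability.
    lora_module_names.discard("lm_head")
    return list(lora_module_names)

modules = [
    ("model.layers.0.self_attn.q_proj", Linear()),
    ("model.layers.0.input_layernorm", LayerNorm()),
    ("model.mm_projector.0", Linear()),
    ("lm_head", Linear()),
]
targets = find_all_linear_names(modules)    # only "q_proj" survives
```

The returned names are then passed as target_modules to PEFT's LoraConfig, which wraps every matching layer with an adapter.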
Knowledge Sources
- Paper -- LoRA: Low-Rank Adaptation of Large Language Models -- https://arxiv.org/abs/2106.09685
- Paper -- QLoRA: Efficient Finetuning of Quantized LLMs -- https://arxiv.org/abs/2305.14314
Domains
- Parameter_Efficient_Fine_Tuning
- Model_Adaptation
Metadata
| Field | Value |
|---|---|
| last_updated | 2026-02-13 14:00 GMT |
| source_repo | Haotian_liu_LLaVA |
| commit | 799f5f207c89 |
| type | Principle |