Principle: Haotian Liu LLaVA LoRA Configuration
Overview
Parameter-efficient fine-tuning technique that adds small trainable rank-decomposition matrices to frozen model weights.
Description
Low-Rank Adaptation (LoRA) freezes the original model weights and injects trainable rank-decomposition matrices into specified layers. In LLaVA, LoRA adapters are applied to all nn.Linear layers except mm_projector, vision_tower, vision_resampler, and lm_head (auto-detected via find_all_linear_names()). QLoRA extends this with 4-bit NF4 quantization of the base model via the bitsandbytes library.
The configuration is managed through LLaVA's extended TrainingArguments dataclass, which adds LoRA-specific fields to HuggingFace's standard training arguments. These fields control adapter rank, scaling factor, dropout, quantization precision, and bias handling.
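A minimal sketch of what that extended dataclass might look like. The field names follow LLaVA's llava/train/train.py, but the defaults shown here are illustrative, not authoritative:

```python
from dataclasses import dataclass

# Hedged sketch of the LoRA-specific fields LLaVA adds on top of
# HuggingFace's TrainingArguments (defaults are illustrative).
@dataclass
class LoRAArguments:
    lora_enable: bool = False     # switch LoRA fine-tuning on
    lora_r: int = 64              # adapter rank r
    lora_alpha: int = 16          # scaling numerator alpha
    lora_dropout: float = 0.05    # dropout inside the adapter
    lora_bias: str = "none"       # bias handling: "none", "all", "lora_only"
    bits: int = 16                # base-model precision; 4 enables QLoRA
    double_quant: bool = True     # QLoRA double quantization
    quant_type: str = "nf4"       # QLoRA 4-bit quantization data type
```

In the real code these fields live inside the TrainingArguments subclass itself, so they are parsed by the same HfArgumentParser invocation as the standard training arguments.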
Usage
Use LoRA when you want to fine-tune LLaVA for a custom task with limited GPU memory. Use QLoRA (bits=4) when GPU memory is extremely constrained. LLaVA v1.5 uses r=128, alpha=256 for LoRA fine-tuning, as specified in scripts/v1_5/finetune_lora.sh.
| Configuration | When to Use | Memory Requirement |
|---|---|---|
| LoRA (bits=16) | Standard parameter-efficient finetuning | ~1/3 of full finetuning |
| QLoRA (bits=4) | Extremely memory-constrained environments | ~1/6 of full finetuning |
| Full finetuning | Sufficient data and compute available | Full model size in memory |
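The savings in the table come largely from the trainable-parameter count, since optimizer states are kept only for the adapters (QLoRA then shrinks the frozen base weights further via 4-bit quantization). A quick back-of-the-envelope check, using illustrative dimensions rather than LLaVA's actual layer shapes:

```python
def lora_param_fraction(d: int, k: int, r: int) -> float:
    """Fraction of a d x k linear layer's parameters that LoRA trains:
    r*(d+k) adapter parameters versus d*k for full fine-tuning."""
    return r * (d + k) / (d * k)

# Example: a square 4096 x 4096 projection with LLaVA v1.5's LoRA rank r=128
# trains only r*(d+k) = 1,048,576 of d*k = 16,777,216 parameters (~6.25%).
frac = lora_param_fraction(4096, 4096, 128)
```

Note this fraction only bounds the optimizer-state memory; total footprint also includes the frozen base weights, which is exactly what QLoRA's quantization targets.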
Theoretical Basis
For a weight matrix W in R^(d x k), LoRA adds:
- Delta W = B * A
where B in R^(d x r), A in R^(r x k), and r << min(d, k).
The forward pass becomes:
- h = W * x + (B * A) * x * (alpha / r)
Only A and B matrices are trained. The alpha/r ratio controls the scaling of the adapter contribution. With r=128 and alpha=256, the effective scaling factor is 2.0.
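The forward pass and the alpha/r scaling can be checked with a toy example in pure Python (tiny made-up dimensions; real LoRA initializes B to zero so the adapter starts as a no-op):

```python
# Toy LoRA forward pass: h = W x + (alpha/r) * B A x.
d, k, r = 3, 2, 1
alpha = 2                                   # effective scaling alpha / r = 2.0
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # frozen d x k base weight
B = [[0.0], [0.0], [0.0]]                   # trainable d x r, zero-initialized
A = [[0.5, -0.5]]                           # trainable r x k
x = [2.0, 3.0]

def matvec(M, v):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

base = matvec(W, x)                         # W x
delta = matvec(B, matvec(A, x))             # (B A) x
h = [b + (alpha / r) * dl for b, dl in zip(base, delta)]

# Because B starts at zero, the adapted output equals the base output:
# h == W x == [2.0, 3.0, 5.0].
```

During training, gradients flow only into A and B; W stays frozen, so the adapter gradually steers the output away from the base model's.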
The find_all_linear_names() function auto-discovers target modules by iterating model.named_modules() and collecting all nn.Linear layer names, excluding components that should not receive adapters:
- mm_projector -- The multimodal projector (trained separately with its own learning rate)
- vision_tower -- The CLIP vision encoder (frozen)
- vision_resampler -- The vision resampler module (if present)
- lm_head -- The language model output head (excluded for stability)
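The discovery logic above can be sketched as follows. Module objects are mocked with plain classes here so the example runs without PyTorch; LLaVA's actual implementation iterates model.named_modules() and checks isinstance(module, torch.nn.Linear):

```python
class Linear: pass       # stand-in for torch.nn.Linear
class LayerNorm: pass    # stand-in for any non-Linear module

# Multimodal components that must not receive adapters.
EXCLUDE_KEYWORDS = ["mm_projector", "vision_tower", "vision_resampler"]

def find_all_linear_names(named_modules):
    """Collect the leaf names of Linear layers eligible for LoRA."""
    lora_module_names = set()
    for name, module in named_modules:
        if any(key in name for key in EXCLUDE_KEYWORDS):
            continue
        if isinstance(module, Linear):
            # PEFT targets the final path component, e.g. "q_proj".
            lora_module_names.add(name.split(".")[-1])
    # The output head is excluded separately, for stability.
    lora_module_names.discard("lm_head")
    return list(lora_module_names)

modules = [
    ("model.layers.0.self_attn.q_proj", Linear()),
    ("model.layers.0.input_layernorm", LayerNorm()),
    ("model.mm_projector.0", Linear()),
    ("lm_head", Linear()),
]
targets = find_all_linear_names(modules)    # only "q_proj" survives
```

The returned names are then passed as target_modules to PEFT's LoraConfig, which wraps every matching layer with an adapter.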
Knowledge Sources
- Paper -- LoRA: Low-Rank Adaptation of Large Language Models -- https://arxiv.org/abs/2106.09685
- Paper -- QLoRA: Efficient Finetuning of Quantized LLMs -- https://arxiv.org/abs/2305.14314
Domains
- Parameter_Efficient_Fine_Tuning
- Model_Adaptation
Metadata
| Field | Value |
|---|---|
| last_updated | 2026-02-13 14:00 GMT |
| source_repo | Haotian_liu_LLaVA |
| commit | 799f5f207c89 |
| type | Principle |