
Principle:Haotian Liu LLaVA LoRA Configuration

From Leeroopedia

Overview

A parameter-efficient fine-tuning technique that adds small trainable rank-decomposition matrices to frozen model weights.

Description

Low-Rank Adaptation (LoRA) freezes the original model weights and injects trainable rank-decomposition matrices into specified layers. In LLaVA, LoRA adapters are applied to all nn.Linear layers except mm_projector, vision_tower, vision_resampler, and lm_head (auto-detected via find_all_linear_names()). QLoRA extends this with 4-bit NF4 quantization of the base model via the bitsandbytes library.
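On the QLoRA side, 4-bit NF4 quantization of the base model is typically configured through the `BitsAndBytesConfig` class from `transformers`. The sketch below is illustrative; the exact arguments LLaVA passes may differ.

```python
import torch
from transformers import BitsAndBytesConfig

# Sketch of a 4-bit NF4 quantization config for the frozen base model,
# in the style used for QLoRA fine-tuning (illustrative values).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize base weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls
)
# bnb_config would then be passed as `quantization_config` to from_pretrained().
```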

The configuration is managed through LLaVA's extended TrainingArguments dataclass, which adds LoRA-specific fields to HuggingFace's standard training arguments. These fields control adapter rank, scaling factor, dropout, quantization precision, and bias handling.
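A minimal sketch of what those extra fields look like as a dataclass. The field names follow LLaVA's train.py, but the defaults shown here are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class LoraArguments:
    """Sketch of the LoRA-specific fields LLaVA adds to TrainingArguments."""
    lora_enable: bool = False   # switch LoRA fine-tuning on
    lora_r: int = 64            # adapter rank r
    lora_alpha: int = 16        # scaling numerator; effective scale is alpha / r
    lora_dropout: float = 0.05  # dropout applied inside the adapter
    lora_bias: str = "none"     # bias handling: "none", "all", or "lora_only"
    bits: int = 16              # 16 -> plain LoRA, 4 -> QLoRA (NF4 base model)

# The v1.5 LoRA recipe overrides the rank and scaling factor:
args = LoraArguments(lora_enable=True, lora_r=128, lora_alpha=256)
```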

Usage

Use LoRA when you want to fine-tune LLaVA for a custom task with limited GPU memory. Use QLoRA (bits=4) when GPU memory is extremely constrained. LLaVA v1.5 uses r=128, alpha=256 for LoRA fine-tuning, as specified in scripts/v1_5/finetune_lora.sh.

Configuration    | When to Use                               | Memory Requirement
LoRA (bits=16)   | Standard parameter-efficient fine-tuning  | ~1/3 of full fine-tuning
QLoRA (bits=4)   | Extremely memory-constrained environments | ~1/6 of full fine-tuning
Full fine-tuning | Sufficient data and compute available     | Full model size in memory
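In PEFT terms, the v1.5 recipe (r=128, alpha=256) corresponds to a LoraConfig along these lines. This is a sketch: the target_modules list shown is an illustrative subset, since LLaVA actually auto-detects targets with find_all_linear_names().

```python
from peft import LoraConfig

# Sketch of a PEFT LoraConfig matching the v1.5 LoRA recipe.
lora_config = LoraConfig(
    r=128,                  # adapter rank, per scripts/v1_5/finetune_lora.sh
    lora_alpha=256,         # scaling numerator; effective scale = 256/128 = 2.0
    lora_dropout=0.05,      # illustrative dropout value
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # example subset only
    bias="none",
    task_type="CAUSAL_LM",
)
```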

Theoretical Basis

For a weight matrix W in R^(d x k), LoRA adds:

Delta W = B * A

where B in R^(d x r), A in R^(r x k), and r << min(d, k).

The forward pass becomes:

h = W * x + (B * A) * x * (alpha / r)

Only A and B matrices are trained. The alpha/r ratio controls the scaling of the adapter contribution. With r=128 and alpha=256, the effective scaling factor is 2.0.
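A quick worked example for a single square projection layer (the 4096x4096 dimensions are chosen for illustration, not taken from the source):

```python
# Trainable parameters and scaling for one linear layer under LoRA.
d, k = 4096, 4096      # weight matrix W is d x k (illustrative dims)
r, alpha = 128, 256    # LLaVA v1.5 LoRA settings

full_params = d * k          # frozen W: 16,777,216 parameters
lora_params = d * r + r * k  # trainable B and A: 1,048,576 parameters
scale = alpha / r            # adapter scaling factor

print(full_params, lora_params, scale)
```

Even at the comparatively large rank r=128, the adapter holds 1/16 of the layer's parameters; smaller ranks shrink this further.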

The find_all_linear_names() function auto-discovers target modules by iterating model.named_modules() and collecting all nn.Linear layer names, excluding components that should not receive adapters:

  • mm_projector -- The multimodal projector (trained separately with its own learning rate)
  • vision_tower -- The CLIP vision encoder (frozen)
  • vision_resampler -- The vision resampler module (if present)
  • lm_head -- The language model output head (excluded for stability)
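The exclusion logic above can be sketched without loading a model by operating on (name, module-type) pairs in place of model.named_modules(). This is a simplified stand-in: LLaVA's real function inspects actual nn.Linear instances (including 4/8-bit subclasses).

```python
# Simplified stand-in for find_all_linear_names(): given (name, type) pairs
# as model.named_modules() would yield them, collect linear-layer leaf names,
# skipping multimodal components and lm_head.
MULTIMODAL_KEYWORDS = ["mm_projector", "vision_tower", "vision_resampler"]

def find_all_linear_names(named_modules):
    lora_module_names = set()
    for name, module_type in named_modules:
        if any(k in name for k in MULTIMODAL_KEYWORDS):
            continue  # trained separately or frozen -- no adapters here
        if module_type == "Linear":
            parts = name.split(".")
            lora_module_names.add(parts[-1])  # PEFT matches on leaf names
    lora_module_names.discard("lm_head")  # excluded for stability
    return sorted(lora_module_names)

modules = [
    ("model.layers.0.self_attn.q_proj", "Linear"),
    ("model.layers.0.mlp.gate_proj", "Linear"),
    ("model.mm_projector.0", "Linear"),     # skipped: multimodal projector
    ("model.vision_tower.proj", "Linear"),  # skipped: frozen vision encoder
    ("lm_head", "Linear"),                  # skipped: output head
]
print(find_all_linear_names(modules))  # -> ['gate_proj', 'q_proj']
```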

Knowledge Sources

Domains

  • Parameter_Efficient_Fine_Tuning
  • Model_Adaptation

Metadata

Field        | Value
last_updated | 2026-02-13 14:00 GMT
source_repo  | Haotian_liu_LLaVA
commit       | 799f5f207c89
type         | Principle
