Principle: Haotian Liu LLaVA LoRA Training
Overview
Training procedure that applies LoRA adapters to a pre-trained LLaVA model and trains them on task-specific data.
Description
LoRA training in LLaVA uses the same train() function as full finetuning but with lora_enable=True. The procedure follows these steps:
- Load the base LLaVA model (LlavaLlamaForCausalLM.from_pretrained())
- If using QLoRA (bits=4 or 8), quantize the base model via BitsAndBytesConfig and prepare it with prepare_model_for_kbit_training()
- Auto-detect target linear layers via find_all_linear_names()
- Create a LoraConfig and apply LoRA adapters via get_peft_model()
- Initialize vision modules, tokenizer, and data pipeline
- Train with standard cross-entropy loss using LLaVATrainer
- Save only LoRA adapter weights and non-LoRA trainables separately
At checkpoint time, only the LoRA adapter weights and the non-LoRA trainable parameters (chiefly the mm_projector) are saved. The custom LLaVATrainer handles adapter-aware checkpoint saving through get_peft_state_maybe_zero_3() and get_peft_state_non_lora_maybe_zero_3().
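The layer-targeting step above can be sketched as follows. This is an illustrative reimplementation of the find_all_linear_names() idea on a toy module, not the real LLaVA architecture: every nn.Linear in the LLM becomes a LoRA target, while vision/projector modules and the output head are excluded.

```python
# Sketch of find_all_linear_names(): collect nn.Linear layer names,
# skip multimodal modules, and drop lm_head (toy model, not real LLaVA).
import torch.nn as nn

def find_all_linear_names(model):
    multimodal_keywords = ["mm_projector", "vision_tower", "vision_resampler"]
    lora_module_names = set()
    for name, module in model.named_modules():
        # skip vision/projector modules -- LoRA targets the LLM only
        if any(kw in name for kw in multimodal_keywords):
            continue
        if isinstance(module, nn.Linear):
            lora_module_names.add(name.split(".")[-1])
    # the output head is excluded from LoRA targeting
    lora_module_names.discard("lm_head")
    return sorted(lora_module_names)

# toy stand-in: two attention projections, the projector, and the head
model = nn.ModuleDict({
    "q_proj": nn.Linear(8, 8),
    "v_proj": nn.Linear(8, 8),
    "mm_projector": nn.Linear(8, 8),
    "lm_head": nn.Linear(8, 8),
})
print(find_all_linear_names(model))  # ['q_proj', 'v_proj']
```

The returned names would then be passed as target_modules to a LoraConfig before get_peft_model() wraps the model.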
Usage
Use this when you want parameter-efficient finetuning of LLaVA on custom visual instruction data. Requires a pre-trained LLaVA checkpoint (or base LLM + pretrained mm_projector) as the starting point. This approach is recommended when:
- You have limited task-specific data
- GPU memory is constrained
- You want to maintain multiple task-specific adapters sharing one base model
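An abridged launch command, modeled on scripts/v1_5/finetune_lora.sh in the LLaVA repo, shows how these usage options map to flags; all data and checkpoint paths below are placeholders for your own:

```shell
# Illustrative, abridged LoRA launch (paths are placeholders)
deepspeed llava/train/train_mem.py \
    --deepspeed ./scripts/zero3.json \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --model_name_or_path lmsys/vicuna-13b-v1.5 \
    --pretrain_mm_mlp_adapter ./checkpoints/llava-v1.5-13b-pretrain/mm_projector.bin \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --data_path ./your_instruction_data.json \
    --image_folder ./your_images \
    --output_dir ./checkpoints/llava-v1.5-13b-lora
```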
Theoretical Basis
After get_peft_model() wraps the model, only LoRA parameters (A and B matrices) have requires_grad=True. The base model weights remain frozen, and gradients flow only through the low-rank adapters.
At save time:
- get_peft_state_maybe_zero_3() extracts only LoRA weights (parameters containing "lora_" in their name), handling DeepSpeed ZeRO-3 parameter gathering
- get_peft_state_non_lora_maybe_zero_3() extracts non-LoRA trainable parameters (primarily mm_projector weights) into non_lora_trainables.bin
- model.save_pretrained() saves the LoRA adapter configuration and weights (adapter_config.json, adapter_model.bin)
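The save-time split above can be sketched as a simple name-based filter over trainable state; this mirrors the "lora_" substring test used by get_peft_state_maybe_zero_3() and get_peft_state_non_lora_maybe_zero_3(), with DeepSpeed ZeRO-3 gathering omitted for brevity:

```python
# Illustrative checkpoint split: LoRA weights vs. other trainables.
def split_trainable_state(named_params):
    """named_params: iterable of (name, tensor, requires_grad) triples."""
    lora_state = {n: t for n, t, _ in named_params if "lora_" in n}
    non_lora_state = {n: t for n, t, g in named_params
                      if g and "lora_" not in n}
    return lora_state, non_lora_state

# toy listing: two LoRA factors, the projector, and a frozen base weight
params = [
    ("layers.0.q_proj.lora_A.weight", "A", True),
    ("layers.0.q_proj.lora_B.weight", "B", True),
    ("mm_projector.0.weight", "P", True),
    ("layers.0.q_proj.weight", "W", False),
]
lora, non_lora = split_trainable_state(params)
print(sorted(lora))      # LoRA weights -> adapter_model.bin
print(sorted(non_lora))  # everything else trainable -> non_lora_trainables.bin
```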
This enables efficient checkpoint storage: roughly 100 MB of LoRA adapter weights versus ~26 GB for a full 13B-parameter checkpoint in fp16.
The mm_projector_lr parameter allows training the multimodal projector at a different (typically lower) learning rate than the LoRA adapters, providing fine-grained control over the adaptation of different model components.
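A hedged sketch of how such a split learning rate can be wired up with optimizer parameter groups (LLaVATrainer builds similar groups internally when mm_projector_lr is set; the module layout and rates below are illustrative):

```python
# Separate learning rates for LoRA adapters and the mm_projector via
# torch.optim parameter groups (illustrative, not LLaVATrainer itself).
import torch
import torch.nn as nn

model = nn.ModuleDict({
    "lora_adapter": nn.Linear(8, 8),   # stand-in for LoRA parameters
    "mm_projector": nn.Linear(8, 8),   # multimodal projector
})
projector = [p for n, p in model.named_parameters() if "mm_projector" in n]
others = [p for n, p in model.named_parameters() if "mm_projector" not in n]
optimizer = torch.optim.AdamW([
    {"params": others, "lr": 2e-4},     # LoRA adapter learning rate
    {"params": projector, "lr": 2e-5},  # mm_projector_lr (lower)
])
print([g["lr"] for g in optimizer.param_groups])  # [0.0002, 2e-05]
```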
Knowledge Sources
- Paper -- LoRA: Low-Rank Adaptation of Large Language Models -- https://arxiv.org/abs/2106.09685
- Repo -- LLaVA -- https://github.com/haotian-liu/LLaVA
Domains
- Fine_Tuning
- Parameter_Efficient_Fine_Tuning
Metadata
| Field | Value |
|---|---|
| last_updated | 2026-02-13 14:00 GMT |
| source_repo | Haotian_liu_LLaVA |
| commit | 799f5f207c89 |
| type | Principle |