
Principle:Haotian Liu LLaVA LoRA Training

From Leeroopedia

Overview

Training procedure that applies LoRA adapters to a pre-trained LLaVA model and trains them on task-specific data.

Description

LoRA training in LLaVA uses the same train() function as full finetuning but with lora_enable=True. The procedure follows these steps:

  1. Load the base LLaVA model (LlavaLlamaForCausalLM.from_pretrained())
  2. If using QLoRA (bits=4 or 8), quantize the base model via BitsAndBytesConfig and prepare it with prepare_model_for_kbit_training()
  3. Auto-detect target linear layers via find_all_linear_names()
  4. Create a LoraConfig and apply LoRA adapters via get_peft_model()
  5. Initialize vision modules, tokenizer, and data pipeline
  6. Train with standard cross-entropy loss using LLaVATrainer
  7. Save only LoRA adapter weights and non-LoRA trainables separately
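Step 3's layer auto-detection can be sketched in plain PyTorch. This is a minimal re-implementation, not LLaVA's exact code: the toy model and the `skip_keywords` default are illustrative, though LLaVA's version likewise excludes multimodal modules (projector, vision tower) and the output head before collecting `nn.Linear` leaf names:

```python
import torch.nn as nn

def find_all_linear_names(model, skip_keywords=("mm_projector", "vision_tower", "lm_head")):
    """Collect leaf names of nn.Linear modules, skipping multimodal parts.

    Minimal sketch of the auto-detection idea: LoRA targets every linear
    layer of the language model, but not the projector, vision encoder,
    or output head.
    """
    names = set()
    for full_name, module in model.named_modules():
        if any(k in full_name for k in skip_keywords):
            continue
        if isinstance(module, nn.Linear):
            names.add(full_name.split(".")[-1])  # leaf name, e.g. "q_proj"
    return sorted(names)

# Toy stand-in model (hypothetical shapes, not real LLaVA dimensions)
toy = nn.ModuleDict({
    "q_proj": nn.Linear(16, 16),
    "v_proj": nn.Linear(16, 16),
    "mm_projector": nn.Linear(16, 16),
})
print(find_all_linear_names(toy))  # → ['q_proj', 'v_proj']; mm_projector is excluded
```

The returned leaf names are what a `LoraConfig(target_modules=...)` would consume in step 4.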

Only LoRA adapter weights and non-LoRA trainables (mm_projector) are saved at checkpoint time. The custom LLaVATrainer handles adapter-aware checkpoint saving through get_peft_state_maybe_zero_3() and get_peft_state_non_lora_maybe_zero_3().

Usage

Use this when you want parameter-efficient finetuning of LLaVA on custom visual instruction data. Requires a pre-trained LLaVA checkpoint (or base LLM + pretrained mm_projector) as the starting point. This approach is recommended when:

  • You have limited task-specific data
  • GPU memory is constrained
  • You want to maintain multiple task-specific adapters sharing one base model

Theoretical Basis

After get_peft_model() wraps the model, only LoRA parameters (A and B matrices) have requires_grad=True. The base model weights remain frozen, and gradients flow only through the low-rank adapters.
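The freezing behavior can be illustrated with a hand-rolled LoRA layer. This is a minimal sketch, not PEFT's implementation; the rank and scaling values are illustrative:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA layer: frozen base weight plus trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze W
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # B=0: init is a no-op
        self.scaling = alpha / r

    def forward(self, x):
        # y = Wx + (alpha/r) * B A x : gradients reach only A and B
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(32, 32))
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
print(trainable)  # → ['lora_A', 'lora_B']
```

Only the low-rank factors appear in the trainable set; the base `weight` and `bias` stay frozen.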

At save time:

  • get_peft_state_maybe_zero_3() extracts only LoRA weights (parameters containing "lora_" in their name), handling DeepSpeed ZeRO-3 parameter gathering
  • get_peft_state_non_lora_maybe_zero_3() extracts non-LoRA trainable parameters (primarily mm_projector weights) into non_lora_trainables.bin
  • model.save_pretrained() saves the LoRA adapter configuration and weights (adapter_config.json, adapter_model.bin)
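The name-based split at save time can be sketched as follows. This is a simplified stand-in for the two helpers, omitting the DeepSpeed ZeRO-3 parameter gathering; the toy module names are illustrative:

```python
import torch
import torch.nn as nn

# Toy model: a frozen base linear with LoRA factors attached, plus a
# trainable projector standing in for mm_projector.
model = nn.ModuleDict({
    "base": nn.Linear(8, 8),
    "mm_projector": nn.Linear(8, 8),
})
model.base.weight.requires_grad_(False)
model.base.bias.requires_grad_(False)
# Register LoRA factors directly so their names contain "lora_"
model.base.lora_A = nn.Parameter(torch.zeros(4, 8))
model.base.lora_B = nn.Parameter(torch.zeros(8, 4))

def get_peft_state(named_params):
    """Keep only LoRA weights (sketch of get_peft_state_maybe_zero_3,
    minus the ZeRO-3 gathering)."""
    return {k: v.detach() for k, v in named_params if "lora_" in k}

def get_non_lora_trainables(named_params):
    """Keep trainable parameters that are not LoRA weights (sketch of
    get_peft_state_non_lora_maybe_zero_3)."""
    return {k: v.detach() for k, v in named_params
            if "lora_" not in k and v.requires_grad}

lora_sd = get_peft_state(model.named_parameters())
other_sd = get_non_lora_trainables(model.named_parameters())
print(sorted(lora_sd))   # → ['base.lora_A', 'base.lora_B']
print(sorted(other_sd))  # → ['mm_projector.bias', 'mm_projector.weight']
```

The first dict is what ends up in the adapter checkpoint; the second is what would be written to `non_lora_trainables.bin`.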

This enables efficient checkpoint storage: approximately 100 MB for the LoRA adapters versus roughly 26 GB for a full 13B-model checkpoint.
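The storage figure can be sanity-checked with a back-of-envelope count. The rank, layer shapes, and fp16 assumption below are illustrative, not LLaVA's exact defaults:

```python
# Rank-r LoRA on every linear layer of a 13B LLaMA-style backbone
# (assumed shapes: hidden 5120, intermediate 13824, 40 layers, r=16).
# Each adapted layer stores A (r x in) and B (out x r): r*(in+out) params.
hidden, inter, layers, r = 5120, 13824, 40, 16
per_layer = (
    4 * r * (hidden + hidden)   # q/k/v/o projections
    + 2 * r * (hidden + inter)  # gate/up projections
    + 1 * r * (inter + hidden)  # down projection
)
adapter_params = per_layer * layers
adapter_mb = adapter_params * 2 / 1e6  # fp16: 2 bytes per parameter
full_gb = 13e9 * 2 / 1e9               # full 13B checkpoint in fp16
print(f"adapter ~ {adapter_mb:.0f} MB vs full ~ {full_gb:.0f} GB")
```

Under these assumptions the adapters come out around 125 MB against 26 GB for the full model, the same order of magnitude as the figures above; a larger rank scales the adapter size linearly.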

The mm_projector_lr parameter allows training the multimodal projector at a different (typically lower) learning rate than the LoRA adapters, providing fine-grained control over the adaptation of different model components.
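One way to realize `mm_projector_lr` is separate optimizer parameter groups. This is a sketch with illustrative names and learning rates, not LLaVA's trainer code:

```python
import torch
import torch.nn as nn

# Toy model: names are illustrative stand-ins for LoRA factors and the
# multimodal projector.
model = nn.ModuleDict({
    "lora_A": nn.Linear(8, 8, bias=False),
    "mm_projector": nn.Linear(8, 8, bias=False),
})
lora_params = [p for n, p in model.named_parameters() if "lora_" in n]
proj_params = [p for n, p in model.named_parameters() if "mm_projector" in n]

# Two parameter groups: adapters at the main learning rate, the
# projector at its own (here lower) mm_projector_lr.
optimizer = torch.optim.AdamW([
    {"params": lora_params, "lr": 2e-4},
    {"params": proj_params, "lr": 2e-5},
])
print([g["lr"] for g in optimizer.param_groups])  # → [0.0002, 2e-05]
```

Each group's learning rate is applied independently at every step, so the projector adapts more conservatively than the adapters.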

Knowledge Sources

Domains

  • Fine_Tuning
  • Parameter_Efficient_Fine_Tuning

Metadata

  • last_updated: 2026-02-13 14:00 GMT
  • source_repo: Haotian_liu_LLaVA
  • commit: 799f5f207c89
  • type: Principle
