Principle: Haotian Liu LLaVA Model Loading
Overview
Unified procedure for loading pre-trained vision-language models with automatic detection of model type, architecture, and adapter configuration.
Description
Model loading in LLaVA handles multiple model variants through a single entry point. The loading procedure automatically detects and handles the following dimensions:
- Multimodal vs. plain language model -- Detects whether the model is a LLaVA multimodal model (contains 'llava' in the name) or a plain language model.
- Language model architecture -- Identifies the underlying LLM architecture from the model name:
- LLaMA -- Default architecture
- Mistral -- Detected by 'mistral' in model name
- MPT -- Detected by 'mpt' in model name
- LoRA adapter detection -- If 'lora' is present in the model name, loads the base model first, then applies and merges LoRA adapter weights.
- Projector-only checkpoint -- If a `model_base` is provided without LoRA indicators, loads the base model and replaces the multimodal projector weights.
- Quantization settings -- Supports 4-bit NF4 quantization and 8-bit quantization via `BitsAndBytesConfig`.
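The detection dimensions above can be sketched as a single name-based dispatch. This is a schematic of the string-matching logic, not the repository's actual builder code; the function and field names are illustrative:

```python
def detect_variant(model_name, model_base=None):
    """Classify a checkpoint by substring matching on its (lowercased) name."""
    name = model_name.lower()
    return {
        "multimodal": "llava" in name,
        # LoRA adapter mode needs a base model to merge into.
        "lora": "lora" in name and model_base is not None,
        # Projector-only: a base is given but no LoRA indicator in the name.
        "projector_only": model_base is not None and "lora" not in name,
        "architecture": ("mpt" if "mpt" in name
                         else "mistral" if "mistral" in name
                         else "llama"),  # LLaMA is the default
    }

print(detect_variant("llava-v1.5-13b-lora", model_base="vicuna-13b"))
```

Because dispatch is purely name-based, renaming a checkpoint directory (e.g. dropping 'llava' from it) changes how it is loaded.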
After loading the language model and any adapters, the procedure:
- Initializes the vision tower (CLIP ViT-L/14) if not already loaded
- Returns the complete inference stack: tokenizer, model, image_processor, context_len
Usage
Use as the primary model loading function for any LLaVA inference or evaluation task. This single function handles all model variants automatically, eliminating the need for variant-specific loading code.
Common scenarios:
- Standard model -- Provide only `model_path`
- LoRA model -- Provide `model_path` (adapter) + `model_base` (base model)
- Quantized inference -- Add `load_4bit=True` or `load_8bit=True`
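The scenarios above map onto argument sets as sketched below. The helper function is illustrative (not part of the repository); it assumes the usual entry point `load_pretrained_model` from `llava.model.builder`, which derives detection from the model name:

```python
def build_load_kwargs(model_path, model_base=None, load_4bit=False, load_8bit=False):
    """Assemble the argument set for the loader; model_name is taken from the
    last path component, since the name-based detection keys off it."""
    model_name = model_path.rstrip("/").split("/")[-1]
    return {
        "model_path": model_path,
        "model_base": model_base,   # required for LoRA / projector-only
        "model_name": model_name,   # drives 'llava'/'lora'/architecture detection
        "load_4bit": load_4bit,
        "load_8bit": load_8bit,
    }

# Standard model
kw = build_load_kwargs("liuhaotian/llava-v1.5-7b")
# LoRA model: adapter path plus base model
kw_lora = build_load_kwargs("liuhaotian/llava-v1.5-13b-lora",
                            model_base="lmsys/vicuna-13b-v1.5")
# Quantized inference
kw_4bit = build_load_kwargs("liuhaotian/llava-v1.5-7b", load_4bit=True)

# Each dict is then unpacked into the loader, e.g.:
# tokenizer, model, image_processor, context_len = load_pretrained_model(**kw)
```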
Theoretical Basis
Model type detection uses string matching on `model_name`:
- `'llava'` in name -- multimodal model path
- `'lora'` in name -- LoRA adapter mode
- `'mpt'` in name -- MPT architecture selection
- `'mistral'` in name -- Mistral architecture selection
LoRA loading sequence:
- Load the base model with full precision (or quantized)
- Load `non_lora_trainables.bin` (additional non-LoRA weights like the projector)
- Apply LoRA adapters via `PeftModel.from_pretrained()`
- Merge and unload LoRA weights via `merge_and_unload()` for inference efficiency
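The order of operations in this sequence can be illustrated on plain weight dictionaries. This is a toy numerical model of the merge semantics, not the actual PEFT machinery:

```python
def load_lora_merged(base_weights, non_lora_trainables, lora_delta):
    """Toy illustration of the LoRA loading order on dicts of floats."""
    weights = dict(base_weights)                # 1. load the base model
    weights.update(non_lora_trainables)         # 2. non-LoRA weights (projector) override base
    merged = {k: v + lora_delta.get(k, 0.0)     # 3.+4. apply adapter deltas and fold
              for k, v in weights.items()}      #       them into the weights ("merge")
    return merged                               # no adapter indirection left at inference

base = {"lm.q_proj": 1.0, "mm_projector": 0.0}
extra = {"mm_projector": 0.5}        # non_lora_trainables.bin analogue
delta = {"lm.q_proj": 0.25}          # merged LoRA update (scaled B @ A)
print(load_lora_merged(base, extra, delta))
# → {'lm.q_proj': 1.25, 'mm_projector': 0.5}
```

Merging the deltas up front is why `merge_and_unload()` helps inference: the forward pass runs on ordinary dense weights with no per-layer adapter lookup.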
Quantization: 4-bit uses NF4 (Normal Float 4) with double quantization and bfloat16 compute type. 8-bit uses the default BitsAndBytesConfig 8-bit configuration.
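A quantization configuration matching this description would look like the following, using the Hugging Face `transformers` `BitsAndBytesConfig` (a sketch; exact kwargs may vary with your installed version):

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit: NF4 quant type, double quantization, bfloat16 compute dtype
bnb_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# 8-bit: default 8-bit settings
bnb_8bit = BitsAndBytesConfig(load_in_8bit=True)
```

The config object is passed to the model's `from_pretrained()` call as `quantization_config`.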
Metadata
| Field | Value |
|---|---|
| Knowledge Sources | Repo - LLaVA - https://github.com/haotian-liu/LLaVA |
| Domains | Model_Management, Inference |
| Last Updated | 2026-02-13 14:00 GMT |