Implementation: Haotian Liu LLaVA Load Pretrained Model
Overview
Unified model loading function that handles all LLaVA model variants, architectures, and adapter configurations through a single entry point.
Source
- File: llava/model/builder.py (lines 26-167)
Signature
```python
def load_pretrained_model(
    model_path: str,
    model_base: str,
    model_name: str,
    load_8bit: bool = False,
    load_4bit: bool = False,
    device_map: str = "auto",
    device: str = "cuda",
    use_flash_attn: bool = False,
    **kwargs
) -> Tuple[AutoTokenizer, LlavaLlamaForCausalLM, CLIPImageProcessor, int]:
    """
    Load a LLaVA model with automatic variant detection.

    Returns:
        Tuple of (tokenizer, model, image_processor, context_len)
    """
```
Import
```python
from llava.model.builder import load_pretrained_model
```
Inputs
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model_path` | str | Yes | -- | HuggingFace model ID or local checkpoint path |
| `model_base` | str | For LoRA/projector | None | Base model path (required for LoRA adapters and projector-only checkpoints) |
| `model_name` | str | Yes | -- | Model name string used for architecture detection (e.g., 'llava-v1.5-13b') |
| `load_8bit` | bool | No | False | Enable 8-bit quantization via BitsAndBytes |
| `load_4bit` | bool | No | False | Enable 4-bit NF4 quantization via BitsAndBytes |
| `device_map` | str | No | "auto" | Device mapping strategy for model parallelism |
| `device` | str | No | "cuda" | Target device for model loading |
| `use_flash_attn` | bool | No | False | Enable Flash Attention 2 for faster inference |
Outputs
| Output | Type | Description |
|---|---|---|
| `tokenizer` | AutoTokenizer | Tokenizer for the loaded model |
| `model` | LlavaLlamaForCausalLM | Loaded model (or LlavaMistralForCausalLM / LlavaMptForCausalLM, depending on architecture) |
| `image_processor` | CLIPImageProcessor | CLIP image preprocessor from the vision tower |
| `context_len` | int | Maximum context length for the model |
Usage Examples
Standard model

```python
from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="liuhaotian/llava-v1.5-13b",
    model_base=None,
    model_name="llava-v1.5-13b"
)
```

LoRA model

```python
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="/path/to/llava-v1.5-13b-lora",
    model_base="liuhaotian/llava-v1.5-13b",
    model_name="llava-v1.5-13b-lora"
)
```

4-bit quantized

```python
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="liuhaotian/llava-v1.5-13b",
    model_base=None,
    model_name="llava-v1.5-13b",
    load_4bit=True
)
```
Description
load_pretrained_model() is the single entry point for loading any LLaVA model variant. It follows a decision tree based on the model name:
Decision flow:
1. Check for quantization -- if `load_4bit` or `load_8bit` is set, configure a `BitsAndBytesConfig`.
2. Detect the model type:
   - If `'llava'` is in the model name and `model_base` is provided:
     - If `'lora'` is in the model name -- load the base model, apply the LoRA adapters, then merge and unload.
     - Otherwise -- load the base model and replace its projector weights from the checkpoint.
   - If `'llava'` is in the model name and no `model_base` is given -- load the full LLaVA model directly.
   - Otherwise -- load as a plain language model (no vision components).
3. Detect the architecture from the model name: LLaMA (default), Mistral (`'mistral'`), or MPT (`'mpt'`).
4. Initialize the vision tower -- call `model.get_vision_tower()` and load the CLIP weights if not already loaded.
5. Set the model to eval mode and return the full inference stack.
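The name-based branching above can be sketched as two small pure functions. This is a simplified mirror of the logic, not the actual builder.py code; the function names and the returned strategy labels are made up for illustration:

```python
from typing import Optional

def select_loading_strategy(model_name: str, model_base: Optional[str]) -> str:
    """Pick a loading path from the model name and optional base model."""
    name = model_name.lower()
    if "llava" in name and model_base is not None:
        if "lora" in name:
            return "lora-merge"      # base model + LoRA adapters, then merge
        return "projector-swap"      # base model + projector weights from checkpoint
    if "llava" in name:
        return "full-llava"          # complete LLaVA checkpoint, load directly
    return "language-only"           # plain LM, no vision components

def select_architecture(model_name: str) -> str:
    """Architecture detection by substring match; LLaMA is the default."""
    name = model_name.lower()
    if "mistral" in name:
        return "mistral"
    if "mpt" in name:
        return "mpt"
    return "llama"
```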
LoRA merge process:
1. Loads `non_lora_trainables.bin` from the adapter checkpoint (contains the projector weights).
2. Applies the LoRA adapters via `PeftModel.from_pretrained()`.
3. Calls `merge_and_unload()` to fold the LoRA weights into the base model for efficient inference.
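A hedged sketch of those three steps, assuming `torch` and `peft` are available; the helper name is hypothetical, and the real builder.py additionally normalizes checkpoint key prefixes before loading:

```python
import os

def merge_lora_checkpoint(base_model, model_path):
    """Hypothetical helper mirroring the LoRA merge path described above."""
    import torch
    from peft import PeftModel

    # 1. Projector (and other non-LoRA) weights are saved separately
    #    alongside the adapter; load them into the base model first.
    non_lora = torch.load(
        os.path.join(model_path, "non_lora_trainables.bin"), map_location="cpu"
    )
    base_model.load_state_dict(non_lora, strict=False)

    # 2. Attach the LoRA adapters on top of the base model.
    model = PeftModel.from_pretrained(base_model, model_path)

    # 3. Fold the adapter weights into the base weights so inference
    #    no longer needs the peft runtime.
    return model.merge_and_unload()
```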
Metadata
| Field | Value |
|---|---|
| Knowledge Sources | Repo - LLaVA - https://github.com/haotian-liu/LLaVA |
| Domains | Model_Management, Inference |
| Last Updated | 2026-02-13 14:00 GMT |