Implementation: Haotian Liu LLaVA Load Pretrained Model
Overview
Unified model loading function that handles all LLaVA model variants, architectures, and adapter configurations through a single entry point.
Source
- File: llava/model/builder.py (lines 26-167)
Signature
```python
def load_pretrained_model(
    model_path: str,
    model_base: str,
    model_name: str,
    load_8bit: bool = False,
    load_4bit: bool = False,
    device_map: str = "auto",
    device: str = "cuda",
    use_flash_attn: bool = False,
    **kwargs
) -> Tuple[AutoTokenizer, LlavaLlamaForCausalLM, CLIPImageProcessor, int]:
    """
    Load a LLaVA model with automatic variant detection.

    Returns:
        Tuple of (tokenizer, model, image_processor, context_len)
    """
```
Import
```python
from llava.model.builder import load_pretrained_model
```
Inputs
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model_path` | str | Yes | -- | HuggingFace model ID or local checkpoint path |
| `model_base` | str | For LoRA/projector | None | Base model path (required for LoRA adapters and projector-only checkpoints) |
| `model_name` | str | Yes | -- | Model name string used for architecture detection (e.g., 'llava-v1.5-13b') |
| `load_8bit` | bool | No | False | Enable 8-bit quantization via BitsAndBytes |
| `load_4bit` | bool | No | False | Enable 4-bit NF4 quantization via BitsAndBytes |
| `device_map` | str | No | "auto" | Device mapping strategy for model parallelism |
| `device` | str | No | "cuda" | Target device for model loading |
| `use_flash_attn` | bool | No | False | Enable Flash Attention 2 for faster inference |
Outputs
| Output | Type | Description |
|---|---|---|
| `tokenizer` | AutoTokenizer | Tokenizer for the loaded model |
| `model` | LlavaLlamaForCausalLM | Loaded model (or LlavaMistralForCausalLM / LlavaMptForCausalLM, depending on architecture) |
| `image_processor` | CLIPImageProcessor | CLIP image preprocessor from the vision tower |
| `context_len` | int | Maximum context length for the model |
Usage Examples
Standard model

```python
from llava.model.builder import load_pretrained_model

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="liuhaotian/llava-v1.5-13b",
    model_base=None,
    model_name="llava-v1.5-13b"
)
```

LoRA model

```python
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="/path/to/llava-v1.5-13b-lora",
    model_base="liuhaotian/llava-v1.5-13b",
    model_name="llava-v1.5-13b-lora"
)
```

4-bit quantized

```python
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="liuhaotian/llava-v1.5-13b",
    model_base=None,
    model_name="llava-v1.5-13b",
    load_4bit=True
)
```
Description
load_pretrained_model() is the single entry point for loading any LLaVA model variant. It follows a decision tree based on the model name:
Decision flow:
1. Check for quantization -- if `load_4bit` or `load_8bit` is set, configure a `BitsAndBytesConfig`.
2. Detect the model type:
   - If `'llava'` is in the model name and `model_base` is provided:
     - If `'lora'` is in the model name -- load the base model, apply the LoRA adapters, then merge and unload.
     - Otherwise -- load the base model and replace its projector weights from the checkpoint.
   - If `'llava'` is in the model name and no `model_base` is given -- load the full LLaVA model directly.
   - Otherwise -- load as a plain language model (no vision components).
3. Detect the architecture from the model name: LLaMA (default), Mistral (`'mistral'`), or MPT (`'mpt'`).
4. Initialize the vision tower -- call `model.get_vision_tower()` and load the CLIP weights if not already loaded.
5. Set the model to eval mode and return the full inference stack.
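The name-based branching above can be sketched as two small pure functions. This is a simplified mirror of the logic, not the actual builder.py code; the function names and the returned strategy labels are made up for illustration:

```python
from typing import Optional

def select_loading_strategy(model_name: str, model_base: Optional[str]) -> str:
    """Pick a loading path from the model name and optional base model."""
    name = model_name.lower()
    if "llava" in name and model_base is not None:
        if "lora" in name:
            return "lora-merge"      # base model + LoRA adapters, then merge
        return "projector-swap"      # base model + projector weights from checkpoint
    if "llava" in name:
        return "full-llava"          # complete LLaVA checkpoint, load directly
    return "language-only"           # plain LM, no vision components

def select_architecture(model_name: str) -> str:
    """Architecture detection by substring match; LLaMA is the default."""
    name = model_name.lower()
    if "mistral" in name:
        return "mistral"
    if "mpt" in name:
        return "mpt"
    return "llama"
```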
LoRA merge process:
1. Loads `non_lora_trainables.bin` from the adapter checkpoint (contains the projector weights).
2. Applies the LoRA adapters via `PeftModel.from_pretrained()`.
3. Calls `merge_and_unload()` to fold the LoRA weights into the base model for efficient inference.
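A hedged sketch of those three steps, assuming `torch` and `peft` are available; the helper name is hypothetical, and the real builder.py additionally normalizes checkpoint key prefixes before loading:

```python
import os

def merge_lora_checkpoint(base_model, model_path):
    """Hypothetical helper mirroring the LoRA merge path described above."""
    import torch
    from peft import PeftModel

    # 1. Projector (and other non-LoRA) weights are saved separately
    #    alongside the adapter; load them into the base model first.
    non_lora = torch.load(
        os.path.join(model_path, "non_lora_trainables.bin"), map_location="cpu"
    )
    base_model.load_state_dict(non_lora, strict=False)

    # 2. Attach the LoRA adapters on top of the base model.
    model = PeftModel.from_pretrained(base_model, model_path)

    # 3. Fold the adapter weights into the base weights so inference
    #    no longer needs the peft runtime.
    return model.merge_and_unload()
```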
Metadata
| Field | Value |
|---|---|
| Knowledge Sources | Repo - LLaVA - https://github.com/haotian-liu/LLaVA |
| Domains | Model_Management, Inference |
| Last Updated | 2026-02-13 14:00 GMT |