Implementation:OpenGVLab InternVL LLaVA Model Builder
| Knowledge Sources | |
|---|---|
| Domains | Model_Loading, LoRA, Quantization |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
This module provides the central model loading function that handles all LLaVA model variants, including full checkpoints, LoRA adapters, projector-only weights, and quantized configurations.
Description
The builder.py module contains the load_pretrained_model function, which serves as the single entry point for loading any LLaVA model configuration. The function implements a multi-branch routing strategy based on the model name and provided arguments:
Quantization support:
- 8-bit loading via load_in_8bit=True
- 4-bit loading via BitsAndBytesConfig with NF4 quantization, double quantization, and float16 compute dtype
- FP16 default when neither quantization flag is set
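The precedence among these three modes can be sketched as follows. This is a minimal illustration, not the module's actual code: `build_load_kwargs` is a hypothetical helper, the 8-bit flag is assumed to win when both flags are set, and the 4-bit settings are shown as a plain dict with string stand-ins so the snippet runs without `torch` or `transformers` installed. In practice those fields would be passed to `BitsAndBytesConfig`.

```python
def build_load_kwargs(load_8bit=False, load_4bit=False, device_map="auto"):
    """Hypothetical helper: assemble keyword arguments for from_pretrained()."""
    kwargs = {"device_map": device_map}
    if load_8bit:
        # Assumption: 8-bit takes precedence if both flags are set.
        kwargs["load_in_8bit"] = True
    elif load_4bit:
        kwargs["load_in_4bit"] = True
        # Field names follow transformers.BitsAndBytesConfig; the dict and the
        # "float16" string stand in for the config object and torch.float16.
        kwargs["quantization_config"] = {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_compute_dtype": "float16",
        }
    else:
        # Neither flag set: fall back to half precision.
        kwargs["torch_dtype"] = "float16"
    return kwargs


print(build_load_kwargs(load_4bit=True))
```

Keeping the unquantized dtype in the final `else` branch means a quantized load never also receives an explicit `torch_dtype`.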
LLaVA model loading (when "llava" or "intern" in model name):
- LoRA models: Loads the base model with LoRA config, loads non-LoRA trainables (from local file or HuggingFace Hub), applies PEFT adapter, then merges and unloads LoRA weights. Handles weight key prefix stripping for compatibility.
- Projector-only models: Loads the base model (LLaMA or MPT variant) with the fine-tuned config, then loads only the mm_projector.bin weights
- Full checkpoints: Loads LlavaLlamaForCausalLM or LlavaMptForCausalLM directly from the model path
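The weight key prefix stripping mentioned for LoRA models might look like the sketch below. The helper name `strip_wrapper_prefixes` and the specific prefixes (`base_model.` and a doubled `model.`) are assumptions about how a PEFT-wrapped checkpoint names its keys; check the saved state dict before relying on them.

```python
def strip_wrapper_prefixes(state_dict):
    """Hypothetical helper: normalize keys of the non-LoRA trainable weights."""
    # Drop the outermost wrapper prefix if present (assumed to be "base_model.").
    outer = "base_model."
    cleaned = {
        (key[len(outer):] if key.startswith(outer) else key): value
        for key, value in state_dict.items()
    }
    # If any key is still doubly nested ("model.model..."), drop one "model." level
    # from every key that starts with it.
    inner = "model."
    if any(key.startswith(inner + inner) for key in cleaned):
        cleaned = {
            (key[len(inner):] if key.startswith(inner) else key): value
            for key, value in cleaned.items()
        }
    return cleaned


# Strings stand in for tensors so the example runs without torch.
saved = {"base_model.model.model.mm_projector.weight": "W"}
print(strip_wrapper_prefixes(saved))
```

The cleaned dict would then typically be passed to `model.load_state_dict(..., strict=False)`, since it covers only a subset of the model's parameters.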
Language-only model loading:
- PEFT models: Loads base model + LoRA adapter, merges, and converts to FP16
- Standard models: Direct loading via AutoModelForCausalLM
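Taken together, the branches above form a small decision tree. The sketch below mirrors it with a hypothetical `select_loading_branch` function that returns a label instead of loading anything. The branch order, the case-insensitive matching, and the label strings are assumptions made for illustration.

```python
def select_loading_branch(model_name, model_base=None):
    """Hypothetical helper: name the loading branch a given call would take."""
    name = model_name.lower()
    arch = "mpt" if "mpt" in name else "llama"

    if "llava" in name or "intern" in name:
        if "lora" in name and model_base is not None:
            return "llava-lora"
        if model_base is not None:
            # A base model without "lora" in the name: projector-only weights.
            return f"llava-projector-only/{arch}"
        return f"llava-full/{arch}"

    # Language-only models: a base model implies a PEFT adapter on top of it.
    if model_base is not None:
        return "language-peft"
    return "language-standard"


print(select_loading_branch("llava-lora-v1.5", "/path/to/llama-base"))
```

Because routing depends on substrings of the model name, a checkpoint directory renamed to drop "llava" or "lora" would presumably be routed to a different branch.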
Post-loading initialization:
- Adds special image tokens (DEFAULT_IMAGE_PATCH_TOKEN, start/end tokens) to the tokenizer
- Resizes token embeddings to match the tokenizer's enlarged vocabulary
- Loads and initializes the vision tower (CLIP encoder) and extracts the image processor
- Determines context length from config (default: 2048)
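Two of these steps reduce to simple lookups on the model config, sketched below. The config attribute names (`mm_use_im_patch_token`, `mm_use_im_start_end`, `max_sequence_length`), their defaults, and the token strings are assumptions for illustration; the real constants are defined in the LLaVA codebase.

```python
from types import SimpleNamespace

# Placeholder values (assumed); import the real constants from the LLaVA package.
DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"


def image_tokens_to_add(config):
    """Hypothetical helper: list the special tokens the tokenizer should gain."""
    tokens = []
    if getattr(config, "mm_use_im_patch_token", True):
        tokens.append(DEFAULT_IMAGE_PATCH_TOKEN)
    if getattr(config, "mm_use_im_start_end", False):
        tokens += [DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN]
    return tokens


def resolve_context_len(config, default=2048):
    """Hypothetical helper: read the context length, falling back to the default."""
    return getattr(config, "max_sequence_length", default)


config = SimpleNamespace(mm_use_im_start_end=True)
print(image_tokens_to_add(config), resolve_context_len(config))
```

After adding tokens, the embedding matrix would be resized with `model.resize_token_embeddings(len(tokenizer))` so every new token ID has a row.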
Usage
Use this function as the single entry point for loading LLaVA models in any evaluation or inference script. It handles all model variants and returns a consistent (tokenizer, model, image_processor, context_len) tuple.
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: internvl_chat_llava/llava/model/builder.py
- Lines: 1-148
Signature
def load_pretrained_model(
    model_path: str,
    model_base: str,
    model_name: str,
    load_8bit: bool = False,
    load_4bit: bool = False,
    device_map: str = "auto",
    device: str = "cuda"
) -> tuple:  # (tokenizer, model, image_processor, context_len)
Import
from llava.model.builder import load_pretrained_model
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_path | str | Yes | Path to model checkpoint directory or HuggingFace model ID |
| model_base | str or None | Yes | Base model path for LoRA or projector-only models; has no default, so pass None explicitly for full checkpoints |
| model_name | str | Yes | Model name string used for routing (checked for "llava", "intern", "lora", "mpt") |
| load_8bit | bool | No | Enable 8-bit quantization (default: False) |
| load_4bit | bool | No | Enable 4-bit NF4 quantization (default: False) |
| device_map | str | No | Device mapping strategy (default: "auto") |
| device | str | No | Target device for vision tower (default: "cuda") |
Outputs
| Name | Type | Description |
|---|---|---|
| tokenizer | AutoTokenizer | Configured tokenizer with special image tokens added |
| model | LlavaLlamaForCausalLM or AutoModelForCausalLM | Loaded model (with LoRA merged if applicable) |
| image_processor | object or None | Vision tower's image processor (None for language-only models) |
| context_len | int | Maximum sequence length from config (default: 2048) |
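Callers that prefer named fields over positional unpacking could wrap the returned tuple, as in this sketch. `LoadedModel` and its `is_multimodal` property are hypothetical and not part of the module; the strings below stand in for real tokenizer and model objects.

```python
from typing import Any, NamedTuple, Optional


class LoadedModel(NamedTuple):
    """Hypothetical wrapper around the 4-tuple returned by the loader."""
    tokenizer: Any
    model: Any
    image_processor: Optional[Any]  # None for language-only models
    context_len: int

    @property
    def is_multimodal(self) -> bool:
        return self.image_processor is not None


# Stand-in tuple; in practice this would be load_pretrained_model(...).
loaded = LoadedModel(*("tokenizer", "language-model", None, 2048))
print(loaded.is_multimodal, loaded.context_len)
```

Checking `image_processor` for `None` before preprocessing images avoids an attribute error when a language-only model has been loaded.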
Usage Examples
Basic Usage
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

# Load a full checkpoint (no base model required)
model_path = "/path/to/llava-v1.5-7b"
model_name = get_model_name_from_path(model_path)
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, model_base=None, model_name=model_name
)

# Load a LoRA model on top of its base model, with 4-bit quantization
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="/path/to/llava-lora-weights",
    model_base="/path/to/llama-base",
    model_name="llava-lora-v1.5",
    load_4bit=True
)