Implementation:OpenGVLab InternVL LLaVA Model Builder

Knowledge Sources	OpenGVLab_InternVL
Domains	Model_Loading, LoRA, Quantization
Last Updated	2026-02-07 14:00 GMT

Overview

This module provides the central model loading function that handles all LLaVA model variants, including full checkpoints, LoRA adapters, projector-only weights, and quantized configurations.

Description

The builder.py module contains the load_pretrained_model function, which serves as the single entry point for loading any LLaVA model configuration. The function implements a multi-branch routing strategy based on the model name and provided arguments:

Quantization support:

8-bit loading via load_in_8bit=True
4-bit loading via BitsAndBytesConfig with NF4 quantization, double quantization, and float16 compute dtype
FP16 default when neither quantization flag is set
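The three quantization modes above can be sketched as the keyword arguments handed to `from_pretrained`. This is a minimal, stdlib-only sketch. It assumes an if/elif/else precedence, so 8-bit wins if both flags are set. The helper name `build_load_kwargs` is hypothetical. The 4-bit settings appear as a plain dict of `BitsAndBytesConfig` keyword arguments rather than a real `BitsAndBytesConfig` object, so the sketch runs without `transformers` or `torch`.

```python
def build_load_kwargs(load_8bit: bool = False, load_4bit: bool = False,
                      device_map: str = "auto") -> dict:
    """Hypothetical helper: map the quantization flags to from_pretrained kwargs."""
    kwargs = {"device_map": device_map}
    if load_8bit:
        # 8-bit weights via bitsandbytes
        kwargs["load_in_8bit"] = True
    elif load_4bit:
        # Stand-in for BitsAndBytesConfig(...); real code would pass
        # torch.float16 instead of the string "float16".
        kwargs["quantization_config"] = {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_use_double_quant": True,
            "bnb_4bit_compute_dtype": "float16",
        }
    else:
        # No quantization requested: load weights in half precision
        kwargs["torch_dtype"] = "float16"
    return kwargs


print(build_load_kwargs(load_4bit=True)["quantization_config"]["bnb_4bit_quant_type"])  # nf4
```

With the if/elif chain, the FP16 default only applies when neither flag is set, which matches the list above.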

LLaVA model loading (when the model name contains "llava" or "intern"):

LoRA models: Loads the base model with LoRA config, loads non-LoRA trainables (from local file or HuggingFace Hub), applies PEFT adapter, then merges and unloads LoRA weights. Handles weight key prefix stripping for compatibility.
Projector-only models: Loads the base model (LLaMA or MPT variant) with the fine-tuned config, then loads only the mm_projector.bin weights
Full checkpoints: Loads LlavaLlamaForCausalLM or LlavaMptForCausalLM directly from the model path
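The key prefix stripping in the LoRA branch can be sketched as follows. This minimal sketch assumes the upstream LLaVA convention. Keys saved through a PEFT wrapper start with `base_model.`, and some checkpoints also carry a doubled `model.model.` prefix. The helper name `strip_non_lora_prefixes` is hypothetical, and plain integers stand in for tensors.

```python
def strip_non_lora_prefixes(state_dict: dict) -> dict:
    """Hypothetical helper: normalize keys of non-LoRA trainable weights."""
    # Drop the PEFT wrapper prefix "base_model." where present.
    cleaned = {
        (k[len("base_model."):] if k.startswith("base_model.") else k): v
        for k, v in state_dict.items()
    }
    # If any key still carries a doubled "model.model." prefix,
    # drop one leading "model." from every key that has it.
    if any(k.startswith("model.model.") for k in cleaned):
        cleaned = {
            (k[len("model."):] if k.startswith("model.") else k): v
            for k, v in cleaned.items()
        }
    return cleaned


print(strip_non_lora_prefixes({"base_model.model.model.mm_projector.0.weight": 1}))
# {'model.mm_projector.0.weight': 1}
```

After normalization, the keys line up with the base model's own parameter names, so the weights can be loaded with a non-strict `load_state_dict` before the PEFT adapter is applied.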

Language-only model loading:

PEFT models: Loads base model + LoRA adapter, merges, and converts to FP16
Standard models: Direct loading via AutoModelForCausalLM
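The branch selection described above can be summarized as a pure function of `model_name` and `model_base`. This sketch assumes case-insensitive substring checks and the branch order listed above. The function `select_branch` and its return labels are hypothetical. It only names the branch that would be taken and does not load anything.

```python
from typing import Optional


def select_branch(model_name: str, model_base: Optional[str]) -> str:
    """Hypothetical helper: return which loading branch a call would take."""
    name = model_name.lower()  # assumption: matching is case-insensitive
    variant = "mpt" if "mpt" in name else "llama"
    if "llava" in name or "intern" in name:
        if "lora" in name and model_base is not None:
            # base + non-LoRA trainables + PEFT adapter, then merge and unload
            return "llava-lora"
        if model_base is not None:
            # base weights + fine-tuned config + mm_projector.bin only
            return f"llava-projector-only-{variant}"
        # full LLaVA checkpoint loaded directly from model_path
        # (LoRA needs model_base; without one this sketch lands here)
        return f"llava-full-{variant}"
    if model_base is not None:
        # language-only: base + LoRA adapter, merged, cast to FP16
        return "language-peft"
    # language-only: AutoModelForCausalLM straight from model_path
    return "language-standard"


print(select_branch("llava-lora-v1.5", "/path/to/llama-base"))  # llava-lora
```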

Post-loading initialization:

Adds special image tokens (DEFAULT_IMAGE_PATCH_TOKEN, start/end tokens) to the tokenizer
Resizes token embeddings to match
Loads and initializes the vision tower (CLIP encoder) and extracts the image processor
Determines context length from config.max_sequence_length, falling back to 2048 when the attribute is absent
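The post-loading token and context-length logic can be sketched with a plain dict standing in for `model.config`. This sketch assumes the upstream LLaVA conventions. Token strings come from `llava/constants.py`. The `mm_use_im_patch_token` flag defaults to on and `mm_use_im_start_end` defaults to off. Context length comes from the `max_sequence_length` attribute. The helper names are hypothetical.

```python
# Token strings assumed from upstream llava/constants.py.
DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"


def image_tokens_to_add(config: dict) -> list:
    """Hypothetical helper: special tokens to register, based on config flags."""
    tokens = []
    # Assumed defaults: patch token on, start/end tokens off.
    if config.get("mm_use_im_patch_token", True):
        tokens.append(DEFAULT_IMAGE_PATCH_TOKEN)
    if config.get("mm_use_im_start_end", False):
        tokens.extend([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN])
    return tokens


def context_length(config: dict) -> int:
    """Use max_sequence_length from the config if present, else 2048."""
    return config.get("max_sequence_length", 2048)


print(image_tokens_to_add({"mm_use_im_start_end": True}))
# ['<im_patch>', '<im_start>', '<im_end>']
```

In a transformers-based loader, the resulting list would typically go to `tokenizer.add_tokens(tokens, special_tokens=True)`, followed by `model.resize_token_embeddings(len(tokenizer))`. That second call is the embedding resize step listed above.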

Usage

Use this function as the single entry point for loading LLaVA models in any evaluation or inference script. It handles all model variants and returns a consistent (tokenizer, model, image_processor, context_len) tuple.

Code Reference

Source Location

Repository: OpenGVLab_InternVL
File: internvl_chat_llava/llava/model/builder.py
Lines: 1-148

Signature

def load_pretrained_model(
    model_path: str,
    model_base: Optional[str],  # None for full checkpoints
    model_name: str,
    load_8bit: bool = False,
    load_4bit: bool = False,
    device_map: str = "auto",
    device: str = "cuda"
) -> tuple:  # (tokenizer, model, image_processor, context_len)

Import

from llava.model.builder import load_pretrained_model

I/O Contract

Inputs

Name	Type	Required	Description
model_path	str	Yes	Path to model checkpoint directory or HuggingFace model ID
model_base	str or None	Yes	Base model path for LoRA or projector-only models; pass None for full checkpoints
model_name	str	Yes	Model name string used for routing (checked for "llava", "intern", "lora", "mpt")
load_8bit	bool	No	Enable 8-bit quantization (default: False)
load_4bit	bool	No	Enable 4-bit NF4 quantization (default: False)
device_map	str	No	Device mapping strategy (default: "auto")
device	str	No	Target device for vision tower (default: "cuda")

Outputs

Name	Type	Description
tokenizer	AutoTokenizer	Configured tokenizer with special image tokens added
model	LlavaLlamaForCausalLM, LlavaMptForCausalLM, or AutoModelForCausalLM	Loaded model (with LoRA weights merged if applicable)
image_processor	object or None	Vision tower's image processor (None for language-only models)
context_len	int	Maximum sequence length from config (default: 2048)

Usage Examples

Basic Usage

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

# Load a full checkpoint (no base model needed)
model_path = "/path/to/llava-v1.5-7b"
model_name = get_model_name_from_path(model_path)
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, model_base=None, model_name=model_name
)

# Load LoRA weights on top of a base model, with 4-bit quantization
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="/path/to/llava-lora-weights",
    model_base="/path/to/llama-base",
    model_name="llava-lora-v1.5",
    load_4bit=True
)
