Implementation:OpenGVLab InternVL LLaVA Model Builder

From Leeroopedia


Knowledge Sources
Domains Model_Loading, LoRA, Quantization
Last Updated 2026-02-07 14:00 GMT

Overview

This module provides the central model loading function that handles all LLaVA model variants, including full checkpoints, LoRA adapters, projector-only weights, and quantized configurations.

Description

The builder.py module contains the load_pretrained_model function, which serves as the single entry point for loading any LLaVA model configuration. The function implements a multi-branch routing strategy based on the model name and provided arguments:

Quantization support:

  • 8-bit loading: load_8bit=True passes load_in_8bit=True to the underlying from_pretrained call
  • 4-bit loading: load_4bit=True builds a BitsAndBytesConfig with NF4 quantization, double quantization, and float16 compute dtype
  • FP16 by default when neither quantization flag is set
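
The three quantization cases above can be sketched as a small helper that assembles the keyword arguments for a Hugging Face from_pretrained call. This is a minimal sketch, not the module's actual code: the helper name build_load_kwargs is hypothetical, and the precedence of 8-bit over 4-bit when both flags are set is an assumption.

```python
def build_load_kwargs(load_8bit=False, load_4bit=False, device_map="auto"):
    """Return keyword arguments for a Hugging Face ``from_pretrained`` call.

    Hypothetical helper; assumes 8-bit wins if both flags are set.
    """
    kwargs = {"device_map": device_map}
    if load_8bit:
        # 8-bit path: a single flag, handled by bitsandbytes at load time.
        kwargs["load_in_8bit"] = True
    elif load_4bit:
        # Imported lazily so the 8-bit branch does not need these packages.
        import torch
        from transformers import BitsAndBytesConfig

        kwargs["quantization_config"] = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",             # NF4 quantization
            bnb_4bit_use_double_quant=True,        # double quantization
            bnb_4bit_compute_dtype=torch.float16,  # float16 compute dtype
        )
    else:
        # No quantization requested: load the weights in half precision.
        import torch

        kwargs["torch_dtype"] = torch.float16
    return kwargs
```

The resulting dictionary would then be unpacked into the model class's from_pretrained call, so every loading branch shares the same precision settings.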

LLaVA model loading (when the model name contains "llava" or "intern"):

  • LoRA models: Loads the base model with LoRA config, loads non-LoRA trainables (from local file or HuggingFace Hub), applies PEFT adapter, then merges and unloads LoRA weights. Handles weight key prefix stripping for compatibility.
  • Projector-only models: Loads the base model (LLaMA or MPT variant) with the fine-tuned config, then loads only the mm_projector.bin weights
  • Full checkpoints: Loads LlavaLlamaForCausalLM or LlavaMptForCausalLM directly from the model path
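
The weight key prefix stripping mentioned for LoRA models can be illustrated on a plain dictionary. This is a sketch under assumptions: the function name strip_key_prefixes is hypothetical, and the specific prefixes ("base_model." and a doubled "model.model.") are assumed to be what a PEFT-wrapped checkpoint produces, not something this page confirms.

```python
def strip_key_prefixes(state_dict):
    """Normalize checkpoint keys saved from a wrapped model (hypothetical helper)."""
    # Assumed wrapper prefix; drop it wherever it appears.
    prefix = "base_model."
    cleaned = {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }
    # If any key is still doubly nested ("model.model."), drop one "model." level
    # from every key that starts with it.
    if any(key.startswith("model.model.") for key in cleaned):
        cleaned = {
            (key[len("model."):] if key.startswith("model.") else key): value
            for key, value in cleaned.items()
        }
    return cleaned


# Example with placeholder values standing in for tensors.
saved = {
    "base_model.model.model.mm_projector.weight": 1,
    "base_model.model.lm_head.weight": 2,
}
print(strip_key_prefixes(saved))
```

After normalization the keys line up with the parameter names of the unwrapped model, so the non-LoRA trainables can be loaded with a non-strict load_state_dict.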

Language-only model loading:

  • PEFT models: Loads base model + LoRA adapter, merges, and converts to FP16
  • Standard models: Direct loading via AutoModelForCausalLM
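
The routing described in the two lists above can be sketched as a pure function over model_name and model_base. This is a minimal sketch: the function name select_loading_branch and the returned labels are hypothetical, and the case-insensitive matching and the fallback of a LoRA name without a base model to the full-checkpoint branch are assumptions.

```python
def select_loading_branch(model_name, model_base=None):
    """Return a label for the loading branch (hypothetical labels)."""
    name = model_name.lower()  # assumption: matching is case-insensitive
    if "llava" in name or "intern" in name:
        if "lora" in name and model_base is not None:
            return "llava_lora"            # base model + LoRA adapter, then merge
        if model_base is not None:
            return "llava_projector_only"  # base model + mm_projector.bin
        return "llava_full_checkpoint"     # everything loaded from model_path
    if model_base is not None:
        return "language_peft"             # language-only base + LoRA adapter
    return "language_standard"             # direct AutoModelForCausalLM load


print(select_loading_branch("llava-lora-v1.5", "/path/to/llama-base"))
print(select_loading_branch("llava-v1.5-7b"))
```

Because routing depends on substrings of the name, a checkpoint directory that omits "llava" or "intern" would be treated as a language-only model under this sketch.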

Post-loading initialization:

  • Adds special image tokens (DEFAULT_IMAGE_PATCH_TOKEN, start/end tokens) to the tokenizer
  • Resizes token embeddings to match the enlarged tokenizer vocabulary
  • Loads and initializes the vision tower (CLIP encoder) and extracts the image processor
  • Determines context length from config (default: 2048)
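
The context-length fallback in the last step can be sketched as follows. The attribute name max_sequence_length is an assumption about the model config, and resolve_context_len is a hypothetical helper; only the default of 2048 comes from the description above.

```python
from types import SimpleNamespace

DEFAULT_CONTEXT_LEN = 2048  # default stated in the description


def resolve_context_len(config, attr="max_sequence_length"):
    """Read the context length from a config object, else use the default.

    ``attr`` is an assumed attribute name, not confirmed by this page.
    """
    return getattr(config, attr, DEFAULT_CONTEXT_LEN)


# Stand-in config objects for illustration.
print(resolve_context_len(SimpleNamespace(max_sequence_length=4096)))
print(resolve_context_len(SimpleNamespace()))
```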

Usage

Use this function as the single entry point for loading LLaVA models in any evaluation or inference script. It handles all model variants and returns a consistent (tokenizer, model, image_processor, context_len) tuple.

Code Reference

Source Location

Signature

def load_pretrained_model(
    model_path: str,
    model_base: Optional[str],  # pass None for full checkpoints
    model_name: str,
    load_8bit: bool = False,
    load_4bit: bool = False,
    device_map: str = "auto",
    device: str = "cuda"
) -> tuple:  # (tokenizer, model, image_processor, context_len)

Import

from llava.model.builder import load_pretrained_model

I/O Contract

Inputs

Name | Type | Required | Description
model_path | str | Yes | Path to model checkpoint directory or HuggingFace model ID
model_base | str or None | Yes (may be None) | Base model path for LoRA or projector-only models; pass None for full checkpoints
model_name | str | Yes | Model name string used for routing (checked for "llava", "intern", "lora", "mpt")
load_8bit | bool | No | Enable 8-bit quantization (default: False)
load_4bit | bool | No | Enable 4-bit NF4 quantization (default: False)
device_map | str | No | Device mapping strategy (default: "auto")
device | str | No | Target device for vision tower (default: "cuda")

Outputs

Name | Type | Description
tokenizer | AutoTokenizer | Configured tokenizer with special image tokens added
model | LlavaLlamaForCausalLM or AutoModelForCausalLM | Loaded model (with LoRA merged if applicable)
image_processor | object or None | Vision tower's image processor (None for language-only models)
context_len | int | Maximum sequence length from config (default: 2048)
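
Callers that prefer named fields over positional unpacking could wrap the returned tuple. This is an optional convenience sketch, not part of builder.py: the LoadedModel class and its is_multimodal property are hypothetical.

```python
from typing import Any, NamedTuple, Optional


class LoadedModel(NamedTuple):
    """Named view of the 4-tuple returned by the loader (hypothetical wrapper)."""

    tokenizer: Any
    model: Any
    image_processor: Optional[Any]  # None for language-only models
    context_len: int

    @property
    def is_multimodal(self) -> bool:
        # A vision-capable model always comes with an image processor.
        return self.image_processor is not None


# Placeholder values stand in for real objects; in practice one would write
# LoadedModel(*load_pretrained_model(model_path, None, model_name)).
loaded = LoadedModel("tokenizer", "model", None, 2048)
print(loaded.is_multimodal, loaded.context_len)
```

Since a NamedTuple is still a tuple, existing code that unpacks four values keeps working unchanged.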

Usage Examples

Basic Usage

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path

# Load a full checkpoint; model_base is None because no separate base model is needed
model_path = "/path/to/llava-v1.5-7b"
model_name = get_model_name_from_path(model_path)
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, model_base=None, model_name=model_name
)

# Load a LoRA model on top of its base model, with 4-bit quantization enabled
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path="/path/to/llava-lora-weights",
    model_base="/path/to/llama-base",
    model_name="llava-lora-v1.5",
    load_4bit=True
)
