Principle:Haotian Liu LLaVA Model Loading

From Leeroopedia

Overview

Unified procedure for loading pre-trained vision-language models with automatic detection of model type, architecture, and adapter configuration.

Description

Model loading in LLaVA handles multiple model variants through a single entry point. The loading procedure automatically detects and handles the following dimensions:

  1. Multimodal vs. plain language model -- Detects whether the model is a LLaVA multimodal model (contains 'llava' in the name) or a plain language model.
  2. Language model architecture -- Identifies the underlying LLM architecture from the model name:
    • LLaMA -- Default architecture
    • Mistral -- Detected by 'mistral' in model name
    • MPT -- Detected by 'mpt' in model name
  3. LoRA adapter detection -- If 'lora' is present in the model name, loads the base model first, then applies and merges LoRA adapter weights.
  4. Projector-only checkpoint -- If a model_base is provided without LoRA indicators, loads the base model and replaces the multimodal projector weights.
  5. Quantization settings -- Supports 4-bit NF4 quantization and 8-bit quantization via BitsAndBytesConfig.
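The detection dimensions above can be sketched as a small pure-Python helper. This is an illustration only: in LLaVA the equivalent branching lives inside the `load_pretrained_model` builder, and `detect_model_variant` is a hypothetical stand-in, not part of the real API.

```python
def detect_model_variant(model_name, model_base=None):
    """Classify a checkpoint by substrings in its lowercased name (sketch)."""
    name = model_name.lower()
    return {
        "multimodal": "llava" in name,                # LLaVA vs. plain LM
        "architecture": ("mpt" if "mpt" in name
                         else "mistral" if "mistral" in name
                         else "llama"),               # LLaMA is the default
        "lora": "lora" in name,                       # LoRA adapter checkpoint
        # projector-only: a base model is given but the name has no LoRA marker
        "projector_only": model_base is not None and "lora" not in name,
    }
```

For example, `detect_model_variant("llava-v1.5-7b")` classifies the checkpoint as a multimodal, LLaMA-based model with no adapter, while a name containing both 'llava' and 'lora' (with a `model_base` supplied) triggers the adapter-merge path.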

After loading the language model and any adapters, the procedure:

  • Initializes the vision tower (CLIP ViT-L/14) if not already loaded
  • Returns the complete inference stack: tokenizer, model, image_processor, context_len

Usage

Use as the primary model loading function for any LLaVA inference or evaluation task. This single function handles all model variants automatically, eliminating the need for variant-specific loading code.

Common scenarios:

  • Standard model -- Provide only model_path
  • LoRA model -- Provide model_path (adapter) + model_base (base model)
  • Quantized inference -- Add load_4bit=True or load_8bit=True
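The three scenarios map onto one entry point as sketched below. `pick_loading_mode` is a hypothetical helper written only to make the dispatch explicit; in LLaVA this selection happens inside `load_pretrained_model`, which returns the `(tokenizer, model, image_processor, context_len)` stack directly.

```python
def pick_loading_mode(model_path, model_base=None, load_4bit=False, load_8bit=False):
    """Sketch of how one entry point dispatches the three usage scenarios."""
    name = model_path.lower()
    if "lora" in name and model_base is not None:
        mode = "lora-merge"        # load base, apply and merge adapter weights
    elif model_base is not None:
        mode = "projector-swap"    # load base, replace multimodal projector
    else:
        mode = "standard"          # load the full checkpoint directly
    quant = "4bit" if load_4bit else "8bit" if load_8bit else "full"
    return mode, quant
```

The design point the sketch illustrates: callers never name a variant explicitly; the combination of `model_path`, `model_base`, and the quantization flags is enough to pick the loading path.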

Theoretical Basis

Model type detection uses string matching on model_name:

  • 'llava' in name -- multimodal model path
  • 'lora' in name -- LoRA adapter mode
  • 'mpt' in name -- MPT architecture selection
  • 'mistral' in name -- Mistral architecture selection

LoRA loading sequence:

  1. Load the base model with full precision (or quantized)
  2. Load non_lora_trainables.bin (additional non-LoRA weights like the projector)
  3. Apply LoRA adapters via PeftModel.from_pretrained()
  4. Merge and unload LoRA weights via merge_and_unload() for inference efficiency
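The payoff of step 4 can be shown with a small numeric sketch (plain Python lists, a rank-1 adapter; the real PEFT merge also scales the update by alpha/r, omitted here). Merging folds the low-rank product B @ A into the base weight, so inference needs one matmul per layer instead of three:

```python
def matmul(a, b):
    """Naive matrix multiply over nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a, b):
    """Elementwise matrix addition over nested lists."""
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]

W = [[1.0, 0.0], [0.0, 1.0]]      # frozen base weight (2 x 2)
B = [[0.5], [0.0]]                # LoRA "down" factor (2 x 1, rank 1)
A = [[0.0, 2.0]]                  # LoRA "up" factor (1 x 2)
x = [[3.0], [4.0]]                # a column input vector

# Unmerged forward pass: W @ x + B @ (A @ x) -- two extra matmuls per layer
unmerged = add(matmul(W, x), matmul(B, matmul(A, x)))

# Merged forward pass: (W + B @ A) @ x -- adapter folded in once, up front
W_merged = add(W, matmul(B, A))
merged = matmul(W_merged, x)

assert merged == unmerged         # identical outputs, cheaper inference
```

This is why `merge_and_unload()` is the final step: after the fold, the adapter matrices can be discarded and the model serves requests at plain base-model cost.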

Quantization: 4-bit uses NF4 (Normal Float 4) with double quantization and a bfloat16 compute dtype. 8-bit uses the default BitsAndBytesConfig 8-bit configuration.
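These settings can be written out as the keyword arguments one would pass when building a quantization config. The field names below mirror transformers' `BitsAndBytesConfig`, but the dict is an illustration: the real compute dtype is `torch.bfloat16`, shown here as a string to keep the sketch dependency-free.

```python
def quantization_kwargs(load_4bit=False, load_8bit=False):
    """Sketch of the 4-bit / 8-bit settings described above."""
    if load_4bit:
        return {
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",          # Normal Float 4
            "bnb_4bit_use_double_quant": True,     # double quantization
            "bnb_4bit_compute_dtype": "bfloat16",  # torch.bfloat16 in practice
        }
    if load_8bit:
        return {"load_in_8bit": True}              # default 8-bit settings
    return {}                                      # full precision, no quantization
```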

Metadata

Field              Value
Knowledge Sources  Repo - LLaVA - https://github.com/haotian-liu/LLaVA
Domains            Model_Management, Inference
Last Updated       2026-02-13 14:00 GMT
