Implementation:OpenGVLab InternVL Load Model And Tokenizer

From Leeroopedia


Knowledge Sources
Domains Inference, Model_Deployment
Last Updated 2026-02-07 00:00 GMT

Overview

A concrete tool, provided by the InternVL evaluation framework, for loading InternVL models for evaluation and inference with multi-GPU device mapping.

Description

The load_model_and_tokenizer function loads an InternVLChatModel and its tokenizer for inference. It supports:

  • Multi-GPU device mapping via split_model for distributing layers across GPUs
  • 4-bit and 8-bit quantization via BitsAndBytes
  • Auto device mapping for single-GPU inference
  • Standard single-GPU loading on the current rank's device

The companion split_model function computes the per-GPU layer allocation.
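The allocation strategy described above can be sketched as follows. This is a hedged, self-contained approximation of the logic (the function name, the explicit world_size parameter, and the exact rounding are assumptions; the real function queries torch.cuda.device_count() and assigns additional modules such as embeddings and the LM head):

```python
import math

def split_model_sketch(num_layers, world_size, vit_alpha=0.5):
    """Approximate sketch of InternVL-style layer allocation.

    GPU 0 also hosts the vision tower, so a fraction vit_alpha of it
    is treated as reserved for the ViT when dividing LLM layers.
    """
    device_map = {}
    # Effective GPU count for LLM layers: GPU 0 contributes only partially.
    base = math.ceil(num_layers / (world_size - vit_alpha))
    per_gpu = [base] * world_size
    per_gpu[0] = math.ceil(base * (1 - vit_alpha))  # fewer LLM layers on GPU 0
    layer_cnt = 0
    for gpu_idx, count in enumerate(per_gpu):
        for _ in range(count):
            if layer_cnt >= num_layers:
                break
            device_map[f'language_model.model.layers.{layer_cnt}'] = gpu_idx
            layer_cnt += 1
    device_map['vision_model'] = 0  # the ViT always stays on GPU 0
    return device_map

dm = split_model_sketch(num_layers=32, world_size=4)
```

With 32 LLM layers on 4 GPUs and the default vit_alpha of 0.5, GPU 0 receives roughly half as many LLM layers as the other GPUs, leaving room for the vision tower.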

Usage

Used by all evaluation scripts (evaluate_vqa.py, evaluate_mantis.py, etc.) to load the model before running inference.
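A plausible sketch of how an evaluation script would define the flags that load_model_and_tokenizer consumes (the exact flag spellings here are assumptions; consult the individual scripts for their actual argument lists):

```python
import argparse

# Hypothetical flag definitions mirroring the args attributes documented below.
parser = argparse.ArgumentParser()
parser.add_argument('--checkpoint', type=str, required=True)
parser.add_argument('--auto', action='store_true')          # auto device mapping
parser.add_argument('--load-in-8bit', action='store_true')  # -> args.load_in_8bit
parser.add_argument('--load-in-4bit', action='store_true')  # -> args.load_in_4bit

# argparse converts dashes to underscores, matching the attribute names.
args = parser.parse_args(['--checkpoint', 'OpenGVLab/InternVL2_5-8B', '--auto'])
```

The resulting Namespace can be passed directly to load_model_and_tokenizer.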

Code Reference

Source Location

  • Repository: InternVL
  • File: internvl_chat/internvl/model/__init__.py
  • Lines: L14-51

Signature

def split_model(num_layers, vit_alpha=0.5):
    """
    Compute device_map distributing LLM layers across GPUs.

    Args:
        num_layers: int - Total number of LLM transformer layers
        vit_alpha: float - Fraction of GPU 0 reserved for ViT (default 0.5)

    Returns:
        dict - Device map mapping module names to GPU indices
    """

def load_model_and_tokenizer(args):
    """
    Load InternVLChatModel and tokenizer for inference.

    Args:
        args: Namespace with attributes:
            checkpoint: str - Model path
            auto: bool - Use auto device mapping (single GPU)
            load_in_8bit: bool - 8-bit quantization
            load_in_4bit: bool - 4-bit quantization

    Returns:
        Tuple[InternVLChatModel, AutoTokenizer]
    """

Import

from internvl.model import load_model_and_tokenizer, split_model

I/O Contract

Inputs

Name               Type  Required  Description
args.checkpoint    str   Yes       Path to the model checkpoint
args.auto          bool  No        Enable auto device mapping for single-GPU inference
args.load_in_8bit  bool  No        Enable 8-bit quantization
args.load_in_4bit  bool  No        Enable 4-bit quantization

Outputs

Name       Type               Description
model      InternVLChatModel  Model in eval mode, distributed across GPUs
tokenizer  AutoTokenizer      Tokenizer loaded from the checkpoint

Usage Examples

Multi-GPU Inference Loading

import argparse
from internvl.model import load_model_and_tokenizer

args = argparse.Namespace(
    checkpoint='OpenGVLab/InternVL2_5-8B',
    auto=False,
    load_in_8bit=False,
    load_in_4bit=False,
)

model, tokenizer = load_model_and_tokenizer(args)
# Model distributed across available GPUs in eval mode

Single-GPU with Auto Mapping

args = argparse.Namespace(
    checkpoint='OpenGVLab/InternVL2_5-8B',
    auto=True,
    load_in_8bit=False,
    load_in_4bit=False,
)

model, tokenizer = load_model_and_tokenizer(args)
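Both examples funnel into the same flag-driven selection among the loading modes listed in the Description. A hedged summary of that dispatch (the function name and return labels are hypothetical, used only to illustrate the precedence):

```python
from argparse import Namespace

def choose_loading_strategy(args):
    """Hypothetical summary of how the flags select a loading path."""
    if args.load_in_4bit or args.load_in_8bit:
        # Quantized loading via BitsAndBytes
        return 'quantized'
    if args.auto:
        # device_map='auto' for single-GPU inference
        return 'auto'
    # Default: standard loading on the current rank's device
    # (with split_model used for multi-GPU distribution)
    return 'rank'
```

Quantization flags take precedence in this sketch; when neither quantization nor auto mapping is requested, the loader falls back to rank-local placement.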
