# Implementation: OpenGVLab InternVL Load Model And Tokenizer
| Knowledge Sources | |
|---|---|
| Domains | Inference, Model_Deployment |
| Last Updated | 2026-02-07 00:00 GMT |
## Overview
A concrete utility from the InternVL evaluation framework for loading InternVL models for evaluation and inference, with support for multi-GPU device mapping.
## Description
The load_model_and_tokenizer function loads an InternVLChatModel and its tokenizer for inference. It supports:
- Multi-GPU device mapping via split_model for distributing layers across GPUs
- 4-bit and 8-bit quantization via BitsAndBytes
- Auto device mapping for single-GPU inference
- Standard single-GPU loading on the current rank's device
The companion split_model function computes the per-GPU layer allocation.
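The allocation logic can be sketched as follows. This is a minimal, illustrative reimplementation, not the actual source: `sketch_split_model` is a hypothetical name, it takes the GPU count explicitly so it runs without CUDA, and it assumes GPU 0 keeps only a `1 - vit_alpha` share of LLM layers because the ViT occupies the rest.

```python
import math

def sketch_split_model(num_layers, world_size, vit_alpha=0.5):
    """Illustrative sketch of split_model's allocation: GPU 0 reserves a
    vit_alpha fraction of its capacity for the ViT, so it receives fewer
    LLM layers than the other GPUs."""
    # GPU 0 only contributes (1 - vit_alpha) of a full GPU's capacity,
    # so spread the layers over an effective (world_size - vit_alpha) GPUs.
    per_gpu = math.ceil(num_layers / (world_size - vit_alpha))
    counts = [per_gpu] * world_size
    counts[0] = math.ceil(per_gpu * (1 - vit_alpha))  # reduced share for GPU 0
    device_map = {}
    layer = 0
    for gpu, n in enumerate(counts):
        for _ in range(n):
            if layer == num_layers:
                break
            device_map[f'language_model.model.layers.{layer}'] = gpu
            layer += 1
    device_map['vision_model'] = 0  # the ViT is pinned to GPU 0
    return device_map
```

For example, with 32 layers on 4 GPUs and the default `vit_alpha=0.5`, GPU 0 receives 5 layers and the remaining GPUs receive up to 10 each.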
## Usage
Used by all evaluation scripts (evaluate_vqa.py, evaluate_mantis.py, etc.) to load the model before running inference.
## Code Reference
### Source Location
- Repository: InternVL
- File: internvl_chat/internvl/model/__init__.py
- Lines: L14-51
### Signature
```python
def split_model(num_layers, vit_alpha=0.5):
    """
    Compute device_map distributing LLM layers across GPUs.

    Args:
        num_layers: int - Total number of LLM transformer layers
        vit_alpha: float - Fraction of GPU 0 reserved for ViT (default 0.5)

    Returns:
        dict - Device map mapping module names to GPU indices
    """

def load_model_and_tokenizer(args):
    """
    Load InternVLChatModel and tokenizer for inference.

    Args:
        args: Namespace with attributes:
            checkpoint: str - Model path
            auto: bool - Use auto device mapping (single GPU)
            load_in_8bit: bool - 8-bit quantization
            load_in_4bit: bool - 4-bit quantization

    Returns:
        Tuple[InternVLChatModel, AutoTokenizer]
    """
```
### Import
```python
from internvl.model import load_model_and_tokenizer, split_model
```
## I/O Contract
### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| args.checkpoint | str | Yes | Path to model checkpoint |
| args.auto | bool | No | Enable auto device mapping for single GPU |
| args.load_in_8bit | bool | No | 8-bit quantization |
| args.load_in_4bit | bool | No | 4-bit quantization |
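To make the contract concrete, here is a hypothetical dispatch helper, invented purely for illustration (`choose_strategy` does not exist in InternVL), showing how the flags above could select one of the loading paths described in this page:

```python
def choose_strategy(auto, load_in_8bit, load_in_4bit, num_gpus):
    """Hypothetical helper (not part of InternVL): map the I/O-contract
    flags to a loading path, in an assumed order of precedence."""
    if load_in_8bit or load_in_4bit:
        return 'bitsandbytes-quantized'   # 8-bit / 4-bit quantized load
    if auto:
        return 'auto-device-map'          # single-GPU auto device mapping
    if num_gpus > 1:
        return 'split-model-device-map'   # per-layer multi-GPU device map
    return 'current-rank-device'          # standard single-GPU load
```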
### Outputs
| Name | Type | Description |
|---|---|---|
| model | InternVLChatModel | Model in eval mode, distributed across GPUs |
| tokenizer | AutoTokenizer | Tokenizer loaded from checkpoint |
## Usage Examples
### Multi-GPU Inference Loading
```python
import argparse
from internvl.model import load_model_and_tokenizer

args = argparse.Namespace(
    checkpoint='OpenGVLab/InternVL2_5-8B',
    auto=False,
    load_in_8bit=False,
    load_in_4bit=False,
)
model, tokenizer = load_model_and_tokenizer(args)
# Model distributed across available GPUs in eval mode
```
### Single-GPU with Auto Mapping
```python
args = argparse.Namespace(
    checkpoint='OpenGVLab/InternVL2_5-8B',
    auto=True,
    load_in_8bit=False,
    load_in_4bit=False,
)
model, tokenizer = load_model_and_tokenizer(args)
```
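Quantized loading follows the same pattern, only the flags change. A hedged sketch, assuming the 4-bit and 8-bit flags are mutually exclusive and passed straight through to BitsAndBytes:

```python
import argparse

args = argparse.Namespace(
    checkpoint='OpenGVLab/InternVL2_5-8B',
    auto=True,                # auto device mapping on a single GPU
    load_in_8bit=False,       # assumed: do not combine 8-bit and 4-bit modes
    load_in_4bit=True,        # 4-bit quantization via BitsAndBytes
)
# model, tokenizer = load_model_and_tokenizer(args)  # requires internvl + a GPU
```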
## Related Pages
- Implements Principle
- Requires Environment
- Uses Heuristic