Implementation:OpenGVLab InternVL Pretrain Main
| Field | Value |
|---|---|
| Knowledge Sources | |
| Domains | Pretraining, Training |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool from the InternVL training framework for executing multi-stage pretraining of InternVL models.
Description
The internvl_chat_pretrain.py script implements all three pretraining stages. It differs from the finetune script by:
- Supporting Path B model loading (separate vision + LLM components)
- Implementing rank-aware data distribution optimized for large-scale training
- Always using packed training mode for efficient GPU utilization
- Supporting stage-specific freeze configurations via command-line arguments
The same script is invoked for all three stages with different configuration arguments.
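The stage-specific freeze configurations can be sketched as a small lookup table. This is an illustrative sketch only; the names (`STAGE_CONFIGS`, `trainable_modules`) are hypothetical and not taken from the InternVL source.

```python
# Hypothetical sketch of the per-stage freeze configuration described above.
# Flag values mirror the command-line arguments shown in the Signature section.

STAGE_CONFIGS = {
    "stage1":   {"freeze_backbone": True,  "freeze_llm": True,  "freeze_mlp": False,
                 "drop_path_rate": 0.0, "learning_rate": 2e-4},  # MLP warmup
    "stage1_5": {"freeze_backbone": False, "freeze_llm": True,  "freeze_mlp": False,
                 "drop_path_rate": 0.1, "learning_rate": 1e-5},  # ViT incremental
    "stage2":   {"freeze_backbone": False, "freeze_llm": False, "freeze_mlp": False,
                 "drop_path_rate": 0.1, "learning_rate": 4e-5},  # full tuning
}

def trainable_modules(stage: str) -> list:
    """Return the module names left trainable under a stage's freeze flags."""
    cfg = STAGE_CONFIGS[stage]
    flag_to_module = {"freeze_backbone": "vision", "freeze_llm": "llm",
                      "freeze_mlp": "mlp"}
    return [name for flag, name in flag_to_module.items() if not cfg[flag]]
```

Under this mapping, Stage 1 trains only the MLP projector, Stage 1.5 unfreezes the vision backbone as well, and Stage 2 trains all three components.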
Usage
The script is launched three times sequentially, with a different shell script for each of Stages 1, 1.5, and 2. Each stage loads the previous stage's output checkpoint as its starting point.
Code Reference
Source Location
- Repository: InternVL
- File: internvl_chat/internvl/train/internvl_chat_pretrain.py
- Lines: L842-1116
Signature
```shell
# Stage 1: MLP warmup
python internvl_chat_pretrain.py \
    --vision_path ./pretrained/InternViT-300M \
    --llm_path ./pretrained/internlm2_5-7b-chat \
    --freeze_backbone True --freeze_llm True --freeze_mlp False \
    --drop_path_rate 0.0 --learning_rate 2e-4 --max_steps 100000

# Stage 1.5: ViT incremental
python internvl_chat_pretrain.py \
    --model_name_or_path ./output/stage1 \
    --freeze_backbone False --freeze_llm True --freeze_mlp False \
    --drop_path_rate 0.1 --learning_rate 1e-5 --max_steps 100000

# Stage 2: Full instruction tuning
python internvl_chat_pretrain.py \
    --model_name_or_path ./output/stage1_5 \
    --freeze_backbone False --freeze_llm False --freeze_mlp False \
    --drop_path_rate 0.1 --learning_rate 4e-5 --max_steps 5500
```
Import
```shell
cd internvl_chat
# Launched via deepspeed or torchrun with stage-specific shell scripts
bash shell/internvl2.5/stage1/internvl2_5_8b_internlm2_5_7b_dynamic_res_stage1.sh
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --vision_path | str | Stage 1 only | Path to pretrained InternViT |
| --llm_path | str | Stage 1 only | Path to pretrained LLM |
| --model_name_or_path | str | Stages 1.5, 2 | Previous stage output checkpoint |
| --freeze_backbone / --freeze_llm / --freeze_mlp | bool | Yes | Per-stage freeze configuration |
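The input contract above can be mirrored with a minimal argument parser. This is a hedged sketch using Python's standard `argparse`; the real script parses arguments through HuggingFace's training-argument machinery, so the `str2bool` helper and flag handling here are illustrative assumptions.

```python
import argparse

def str2bool(s: str) -> bool:
    # The shell invocations pass booleans as the literal strings "True"/"False".
    return s.lower() in ("true", "1", "yes")

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Sketch of the pretrain I/O contract")
    p.add_argument("--vision_path")         # Stage 1 only: pretrained InternViT
    p.add_argument("--llm_path")            # Stage 1 only: pretrained LLM
    p.add_argument("--model_name_or_path")  # Stages 1.5 and 2: previous checkpoint
    for flag in ("freeze_backbone", "freeze_llm", "freeze_mlp"):
        p.add_argument(f"--{flag}", type=str2bool, required=True)
    return p

# Example: the Stage 1 invocation from the Signature section.
args = build_parser().parse_args(
    ["--vision_path", "./pretrained/InternViT-300M",
     "--llm_path", "./pretrained/internlm2_5-7b-chat",
     "--freeze_backbone", "True", "--freeze_llm", "True",
     "--freeze_mlp", "False"]
)
```

Note that `--vision_path`/`--llm_path` and `--model_name_or_path` are mutually exclusive in practice: the former pair is used only for the Path B cold start in Stage 1, the latter for resuming from a previous stage.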
Outputs
| Name | Type | Description |
|---|---|---|
| Checkpoint | Directory | Trained model checkpoint for the next stage or deployment |
Usage Examples
Complete 3-Stage Pipeline
```shell
cd internvl_chat

# Stage 1: MLP warmup (512 GPUs, ~100K steps)
bash shell/internvl2.5/stage1/internvl2_5_8b_internlm2_5_7b_dynamic_res_stage1.sh

# Stage 1.5: ViT incremental (512 GPUs, ~100K steps)
bash shell/internvl2.5/stage1.5/internvl2_5_8b_internlm2_5_7b_dynamic_res_stage1_5.sh

# Stage 2: Instruction tuning (512 GPUs, ~5.5K steps)
bash shell/internvl2.5/stage2/internvl2_5_8b_internlm2_5_7b_dynamic_res_stage2.sh
```
Related Pages
Requires Environment
- Environment:OpenGVLab_InternVL_PyTorch_CUDA
- Environment:OpenGVLab_InternVL_DeepSpeed
- Environment:OpenGVLab_InternVL_Flash_Attention_2