Implementation:OpenGVLab InternVL Pretrain Main
| Field | Value |
|---|---|
| Knowledge Sources | |
| Domains | Pretraining, Training |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete tool from the InternVL training framework for executing multi-stage pretraining of InternVL models.
Description
The internvl_chat_pretrain.py script implements all three pretraining stages. It differs from the finetune script by:
- Supporting Path B model loading (separate vision + LLM components)
- Implementing rank-aware data distribution optimized for large-scale training
- Always using packed training mode for efficient GPU utilization
- Supporting stage-specific freeze configurations via command-line arguments
The same script is invoked for all three stages with different configuration arguments.
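The stage-specific freeze configurations can be sketched as a small lookup table. This is an illustrative sketch only; the names (`STAGE_CONFIGS`, `trainable_modules`) are hypothetical and not taken from the InternVL source.

```python
# Hypothetical sketch of the per-stage freeze configuration described above.
# Flag values mirror the command-line arguments shown in the Signature section.

STAGE_CONFIGS = {
    "stage1":   {"freeze_backbone": True,  "freeze_llm": True,  "freeze_mlp": False,
                 "drop_path_rate": 0.0, "learning_rate": 2e-4},  # MLP warmup
    "stage1_5": {"freeze_backbone": False, "freeze_llm": True,  "freeze_mlp": False,
                 "drop_path_rate": 0.1, "learning_rate": 1e-5},  # ViT incremental
    "stage2":   {"freeze_backbone": False, "freeze_llm": False, "freeze_mlp": False,
                 "drop_path_rate": 0.1, "learning_rate": 4e-5},  # full tuning
}

def trainable_modules(stage: str) -> list:
    """Return the module names left trainable under a stage's freeze flags."""
    cfg = STAGE_CONFIGS[stage]
    flag_to_module = {"freeze_backbone": "vision", "freeze_llm": "llm",
                      "freeze_mlp": "mlp"}
    return [name for flag, name in flag_to_module.items() if not cfg[flag]]
```

Under this mapping, Stage 1 trains only the MLP projector, Stage 1.5 unfreezes the vision backbone as well, and Stage 2 trains all three components.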
Usage
The script is launched three times sequentially, with a different shell script for each of Stages 1, 1.5, and 2. Each stage loads the previous stage's output checkpoint as its starting point.
Code Reference
Source Location
- Repository: InternVL
- File: internvl_chat/internvl/train/internvl_chat_pretrain.py
- Lines: L842-1116
Signature
```shell
# Stage 1: MLP warmup
python internvl_chat_pretrain.py \
    --vision_path ./pretrained/InternViT-300M \
    --llm_path ./pretrained/internlm2_5-7b-chat \
    --freeze_backbone True --freeze_llm True --freeze_mlp False \
    --drop_path_rate 0.0 --learning_rate 2e-4 --max_steps 100000

# Stage 1.5: ViT incremental
python internvl_chat_pretrain.py \
    --model_name_or_path ./output/stage1 \
    --freeze_backbone False --freeze_llm True --freeze_mlp False \
    --drop_path_rate 0.1 --learning_rate 1e-5 --max_steps 100000

# Stage 2: Full instruction tuning
python internvl_chat_pretrain.py \
    --model_name_or_path ./output/stage1_5 \
    --freeze_backbone False --freeze_llm False --freeze_mlp False \
    --drop_path_rate 0.1 --learning_rate 4e-5 --max_steps 5500
```
Import
```shell
cd internvl_chat
# Launched via deepspeed or torchrun with stage-specific shell scripts
bash shell/internvl2.5/stage1/internvl2_5_8b_internlm2_5_7b_dynamic_res_stage1.sh
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --vision_path | str | Stage 1 only | Path to pretrained InternViT |
| --llm_path | str | Stage 1 only | Path to pretrained LLM |
| --model_name_or_path | str | Stages 1.5, 2 | Previous stage output checkpoint |
| --freeze_backbone / --freeze_llm / --freeze_mlp | bool | Yes | Per-stage freeze configuration |
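The input contract above can be mirrored with a minimal argument parser. This is a hedged sketch using Python's standard `argparse`; the real script parses arguments through HuggingFace's training-argument machinery, so the `str2bool` helper and flag handling here are illustrative assumptions.

```python
import argparse

def str2bool(s: str) -> bool:
    # The shell invocations pass booleans as the literal strings "True"/"False".
    return s.lower() in ("true", "1", "yes")

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Sketch of the pretrain I/O contract")
    p.add_argument("--vision_path")         # Stage 1 only: pretrained InternViT
    p.add_argument("--llm_path")            # Stage 1 only: pretrained LLM
    p.add_argument("--model_name_or_path")  # Stages 1.5 and 2: previous checkpoint
    for flag in ("freeze_backbone", "freeze_llm", "freeze_mlp"):
        p.add_argument(f"--{flag}", type=str2bool, required=True)
    return p

# Example: the Stage 1 invocation from the Signature section.
args = build_parser().parse_args(
    ["--vision_path", "./pretrained/InternViT-300M",
     "--llm_path", "./pretrained/internlm2_5-7b-chat",
     "--freeze_backbone", "True", "--freeze_llm", "True",
     "--freeze_mlp", "False"]
)
```

Note that `--vision_path`/`--llm_path` and `--model_name_or_path` are mutually exclusive in practice: the former pair is used only for the Path B cold start in Stage 1, the latter for resuming from a previous stage.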
Outputs
| Name | Type | Description |
|---|---|---|
| Checkpoint | Directory | Trained model checkpoint for the next stage or deployment |
Usage Examples
Complete 3-Stage Pipeline
```shell
cd internvl_chat

# Stage 1: MLP warmup (512 GPUs, ~100K steps)
bash shell/internvl2.5/stage1/internvl2_5_8b_internlm2_5_7b_dynamic_res_stage1.sh

# Stage 1.5: ViT incremental (512 GPUs, ~100K steps)
bash shell/internvl2.5/stage1.5/internvl2_5_8b_internlm2_5_7b_dynamic_res_stage1_5.sh

# Stage 2: Instruction tuning (512 GPUs, ~5.5K steps)
bash shell/internvl2.5/stage2/internvl2_5_8b_internlm2_5_7b_dynamic_res_stage2.sh
```
Related Pages
Requires Environment
- Environment:OpenGVLab_InternVL_PyTorch_CUDA
- Environment:OpenGVLab_InternVL_DeepSpeed
- Environment:OpenGVLab_InternVL_Flash_Attention_2