Implementation: OpenGVLab InternVL ModelArguments and DataTrainingArguments
| Knowledge Sources | Value |
|---|---|
| Domains | Training, Configuration |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Typed dataclasses that define and validate the training configuration surface of the InternVL training framework.
Description
The ModelArguments and DataTrainingArguments dataclasses define the full configuration surface for InternVL training. They are parsed from command-line arguments using HuggingFace's HfArgumentParser and control model architecture, data loading, and training behavior.
Usage
These dataclasses are instantiated automatically by the training entry points (internvl_chat_finetune.py, internvl_chat_pretrain.py, internvl_chat_mpo.py). Configure them via shell script arguments.
Code Reference
Source Location
- Repository: InternVL
- File: internvl_chat/internvl/train/internvl_chat_finetune.py
- Lines: L87-266
Signature
```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class ModelArguments:
    model_name_or_path: Optional[str] = None
    vision_path: Optional[str] = None
    llm_path: Optional[str] = None
    mlp_path: Optional[str] = None
    freeze_llm: bool = False
    freeze_backbone: bool = False
    freeze_mlp: bool = False
    unfreeze_vit_layers: int = 0
    vision_select_layer: int = -1
    use_backbone_lora: int = 0
    use_llm_lora: int = 0
    unfreeze_lm_head: bool = False
    grad_checkpoint: bool = True
    drop_path_rate: float = 0.0
    ps_version: Literal['v1', 'v2'] = 'v2'
    use_fast_tokenizer: bool = False
    use_liger: bool = False

@dataclass
class DataTrainingArguments:
    max_seq_length: int = 8192
    force_image_size: int = 448
    down_sample_ratio: float = 0.5
    pad2square: bool = False
    conv_style: str = 'internlm2-chat'
    meta_path: Optional[str] = None
    use_data_resampling: bool = False
    dynamic_image_size: bool = False
    use_thumbnail: bool = False
    min_dynamic_patch: int = 1
    max_dynamic_patch: int = 12
    min_num_frame: int = 8
    max_num_frame: int = 32
    normalize_type: Literal['imagenet', 'clip', 'siglip'] = 'imagenet'
    use_packed_ds: bool = False
    num_images_expected: int = 40
    max_packed_tokens: int = 8192
    max_buffer_size: int = 20
    log_freq: int = 1000
    strict_mode: bool = True
    replacement: bool = False
    allow_overflow: bool = False
    loss_reduction: str = 'token'
    loss_reduction_all_gather: bool = False
```
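The `freeze_llm`, `freeze_backbone`, and `freeze_mlp` flags control which parameter groups receive gradient updates. A minimal sketch of how such flags map to trainable groups (illustrative only; the group names `vision_model`, `language_model`, and `mlp1` are assumptions, not verified InternVL module paths):

```python
from dataclasses import dataclass

@dataclass
class FreezeFlags:
    freeze_backbone: bool = False  # vision encoder (ViT)
    freeze_llm: bool = False       # language model
    freeze_mlp: bool = False       # vision-to-LLM projector

def trainable_groups(flags: FreezeFlags) -> list:
    """Return the names of the parameter groups that stay trainable.

    Group names are hypothetical placeholders for this sketch.
    """
    groups = []
    if not flags.freeze_backbone:
        groups.append("vision_model")
    if not flags.freeze_llm:
        groups.append("language_model")
    if not flags.freeze_mlp:
        groups.append("mlp1")
    return groups
```

With all three flags set (the LoRA-style configuration shown later), no base group trains and only separately added adapter weights would update.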
Import
```python
from transformers import HfArgumentParser, TrainingArguments

# Parsed in the training entry point:
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
```
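HfArgumentParser generates one `--flag` per dataclass field and coerces CLI strings such as `"True"`/`"False"` to booleans, which is why the shell scripts below pass booleans as literal words. A stdlib approximation of that behavior (a sketch using `argparse`, not the HuggingFace implementation, with a reduced hypothetical `MiniDataArgs` in place of the real dataclasses):

```python
import argparse
from dataclasses import dataclass, fields

@dataclass
class MiniDataArgs:
    meta_path: str = ""
    max_seq_length: int = 8192
    dynamic_image_size: bool = False

def str2bool(v: str) -> bool:
    # Coerce CLI strings like "True"/"False" to booleans,
    # mirroring how the shell scripts pass boolean flags.
    return v.lower() in ("true", "1", "yes")

def parse(argv) -> MiniDataArgs:
    # Build one --flag per dataclass field, typed from its annotation.
    parser = argparse.ArgumentParser()
    for f in fields(MiniDataArgs):
        coerce = str2bool if f.type is bool else f.type
        parser.add_argument(f"--{f.name}", type=coerce, default=f.default)
    return MiniDataArgs(**vars(parser.parse_args(argv)))

args = parse(["--dynamic_image_size", "True"])
# args.dynamic_image_size is True; other fields keep their defaults
```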
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| Command-line args | str[] | Yes | Shell script arguments matching dataclass field names |
| --model_name_or_path | str | Yes | Path to pretrained model or HuggingFace model ID |
| --meta_path | str | Yes | Path to dataset mixture JSON meta-file |
| --conv_style | str | No | Conversation template name (default 'internlm2-chat') |
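The file referenced by `--meta_path` maps dataset names to their annotation files and sampling settings. The exact schema is defined by InternVL's dataset loader; the keys below (`root`, `annotation`, `data_augment`, `repeat_time`, `length`) follow the pattern used in the project's published example configs, but treat them as illustrative and check them against your InternVL version:

```python
import json

# Illustrative meta-file structure; key names are assumptions based on
# InternVL's example configs, not a verified schema.
meta = {
    "my_custom_dataset": {
        "root": "data/images/",            # image root directory
        "annotation": "data/train.jsonl",  # conversation annotations
        "data_augment": False,
        "repeat_time": 1,                  # oversampling factor
        "length": 12000                    # number of samples
    }
}

with open("custom_finetune.json", "w") as f:
    json.dump(meta, f, indent=2)
```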
Outputs
| Name | Type | Description |
|---|---|---|
| model_args | ModelArguments | Parsed model architecture configuration |
| data_args | DataTrainingArguments | Parsed data processing configuration |
| training_args | TrainingArguments | Parsed HuggingFace training hyperparameters |
Usage Examples
Full Finetune Configuration (Shell Script)
```shell
torchrun --nproc_per_node=8 internvl_chat_finetune.py \
  --model_name_or_path "OpenGVLab/InternVL2_5-8B" \
  --conv_style "internvl2_5" \
  --meta_path "shell/data/custom_finetune.json" \
  --freeze_llm False \
  --freeze_backbone False \
  --freeze_mlp False \
  --dynamic_image_size True \
  --use_thumbnail True \
  --max_dynamic_patch 12 \
  --max_seq_length 8192 \
  --learning_rate 4e-5 \
  --weight_decay 0.05 \
  --warmup_ratio 0.03 \
  --bf16 True \
  --deepspeed zero_stage1_config.json \
  --output_dir ./output/finetune
```
LoRA Finetune Configuration
```shell
torchrun --nproc_per_node=8 internvl_chat_finetune.py \
  --model_name_or_path "OpenGVLab/InternVL2_5-8B" \
  --use_llm_lora 16 \
  --freeze_llm True \
  --freeze_backbone True \
  --freeze_mlp True \
  --learning_rate 4e-5 \
  --deepspeed zero_stage1_config.json \
  --output_dir ./output/lora
```
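In the LoRA example, the base weights are frozen and `use_llm_lora` sets the adapter rank (16 here). A small sanity check of that convention, written as a standalone helper (an assumption inferred from the example above, not a function from the InternVL codebase):

```python
def check_lora_config(use_llm_lora: int, freeze_llm: bool) -> list:
    """Return warnings for configurations that mix LoRA and full finetuning.

    Heuristic checks based on the pattern in the LoRA example; not an
    official InternVL validation step.
    """
    warnings = []
    if use_llm_lora > 0 and not freeze_llm:
        warnings.append("use_llm_lora > 0 but freeze_llm is False: "
                        "LoRA adapters and full LLM weights would both train.")
    if use_llm_lora == 0 and freeze_llm:
        warnings.append("freeze_llm is True with no LoRA rank set: "
                        "the LLM receives no updates at all.")
    return warnings
```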