Implementation: OpenGVLab InternVL ModelArguments and DataTrainingArguments
| Knowledge Sources | Value |
|---|---|
| Domains | Training, Configuration |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Typed dataclasses that define and validate the training configuration surface of the InternVL training framework.
Description
The ModelArguments and DataTrainingArguments dataclasses define the full configuration surface for InternVL training. They are parsed from command-line arguments using HuggingFace's HfArgumentParser and control model architecture, data loading, and training behavior.
Usage
These dataclasses are instantiated automatically by the training entry points (internvl_chat_finetune.py, internvl_chat_pretrain.py, internvl_chat_mpo.py). Configure them via shell script arguments.
Code Reference
Source Location
- Repository: InternVL
- File: internvl_chat/internvl/train/internvl_chat_finetune.py
- Lines: L87-266
Signature
```python
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class ModelArguments:
    model_name_or_path: Optional[str] = None
    vision_path: Optional[str] = None
    llm_path: Optional[str] = None
    mlp_path: Optional[str] = None
    freeze_llm: bool = False
    freeze_backbone: bool = False
    freeze_mlp: bool = False
    unfreeze_vit_layers: int = 0
    vision_select_layer: int = -1
    use_backbone_lora: int = 0
    use_llm_lora: int = 0
    unfreeze_lm_head: bool = False
    grad_checkpoint: bool = True
    drop_path_rate: float = 0.0
    ps_version: Literal['v1', 'v2'] = 'v2'
    use_fast_tokenizer: bool = False
    use_liger: bool = False

@dataclass
class DataTrainingArguments:
    max_seq_length: int = 8192
    force_image_size: int = 448
    down_sample_ratio: float = 0.5
    pad2square: bool = False
    conv_style: str = 'internlm2-chat'
    meta_path: Optional[str] = None
    use_data_resampling: bool = False
    dynamic_image_size: bool = False
    use_thumbnail: bool = False
    min_dynamic_patch: int = 1
    max_dynamic_patch: int = 12
    min_num_frame: int = 8
    max_num_frame: int = 32
    normalize_type: Literal['imagenet', 'clip', 'siglip'] = 'imagenet'
    use_packed_ds: bool = False
    num_images_expected: int = 40
    max_packed_tokens: int = 8192
    max_buffer_size: int = 20
    log_freq: int = 1000
    strict_mode: bool = True
    replacement: bool = False
    allow_overflow: bool = False
    loss_reduction: str = 'token'
    loss_reduction_all_gather: bool = False
```
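The `freeze_llm`, `freeze_backbone`, and `freeze_mlp` flags control which parameter groups receive gradient updates. A minimal sketch of how such flags map to trainable groups (illustrative only; the group names `vision_model`, `language_model`, and `mlp1` are assumptions, not verified InternVL module paths):

```python
from dataclasses import dataclass

@dataclass
class FreezeFlags:
    freeze_backbone: bool = False  # vision encoder (ViT)
    freeze_llm: bool = False       # language model
    freeze_mlp: bool = False       # vision-to-LLM projector

def trainable_groups(flags: FreezeFlags) -> list:
    """Return the names of the parameter groups that stay trainable.

    Group names are hypothetical placeholders for this sketch.
    """
    groups = []
    if not flags.freeze_backbone:
        groups.append("vision_model")
    if not flags.freeze_llm:
        groups.append("language_model")
    if not flags.freeze_mlp:
        groups.append("mlp1")
    return groups
```

With all three flags set (the LoRA-style configuration shown later), no base group trains and only separately added adapter weights would update.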
Import
```python
from transformers import HfArgumentParser, TrainingArguments

# Parsed in the training entry point:
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
```
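HfArgumentParser generates one `--flag` per dataclass field and coerces CLI strings such as `"True"`/`"False"` to booleans, which is why the shell scripts below pass booleans as literal words. A stdlib approximation of that behavior (a sketch using `argparse`, not the HuggingFace implementation, with a reduced hypothetical `MiniDataArgs` in place of the real dataclasses):

```python
import argparse
from dataclasses import dataclass, fields

@dataclass
class MiniDataArgs:
    meta_path: str = ""
    max_seq_length: int = 8192
    dynamic_image_size: bool = False

def str2bool(v: str) -> bool:
    # Coerce CLI strings like "True"/"False" to booleans,
    # mirroring how the shell scripts pass boolean flags.
    return v.lower() in ("true", "1", "yes")

def parse(argv) -> MiniDataArgs:
    # Build one --flag per dataclass field, typed from its annotation.
    parser = argparse.ArgumentParser()
    for f in fields(MiniDataArgs):
        coerce = str2bool if f.type is bool else f.type
        parser.add_argument(f"--{f.name}", type=coerce, default=f.default)
    return MiniDataArgs(**vars(parser.parse_args(argv)))

args = parse(["--dynamic_image_size", "True"])
# args.dynamic_image_size is True; other fields keep their defaults
```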
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| Command-line args | str[] | Yes | Shell script arguments matching dataclass field names |
| --model_name_or_path | str | Yes | Path to pretrained model or HuggingFace model ID |
| --meta_path | str | Yes | Path to dataset mixture JSON meta-file |
| --conv_style | str | No | Conversation template name (default 'internlm2-chat') |
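The file referenced by `--meta_path` maps dataset names to their annotation files and sampling settings. The exact schema is defined by InternVL's dataset loader; the keys below (`root`, `annotation`, `data_augment`, `repeat_time`, `length`) follow the pattern used in the project's published example configs, but treat them as illustrative and check them against your InternVL version:

```python
import json

# Illustrative meta-file structure; key names are assumptions based on
# InternVL's example configs, not a verified schema.
meta = {
    "my_custom_dataset": {
        "root": "data/images/",            # image root directory
        "annotation": "data/train.jsonl",  # conversation annotations
        "data_augment": False,
        "repeat_time": 1,                  # oversampling factor
        "length": 12000                    # number of samples
    }
}

with open("custom_finetune.json", "w") as f:
    json.dump(meta, f, indent=2)
```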
Outputs
| Name | Type | Description |
|---|---|---|
| model_args | ModelArguments | Parsed model architecture configuration |
| data_args | DataTrainingArguments | Parsed data processing configuration |
| training_args | TrainingArguments | Parsed HuggingFace training hyperparameters |
Usage Examples
Full Finetune Configuration (Shell Script)
```shell
torchrun --nproc_per_node=8 internvl_chat_finetune.py \
  --model_name_or_path "OpenGVLab/InternVL2_5-8B" \
  --conv_style "internvl2_5" \
  --meta_path "shell/data/custom_finetune.json" \
  --freeze_llm False \
  --freeze_backbone False \
  --freeze_mlp False \
  --dynamic_image_size True \
  --use_thumbnail True \
  --max_dynamic_patch 12 \
  --max_seq_length 8192 \
  --learning_rate 4e-5 \
  --weight_decay 0.05 \
  --warmup_ratio 0.03 \
  --bf16 True \
  --deepspeed zero_stage1_config.json \
  --output_dir ./output/finetune
```
LoRA Finetune Configuration
```shell
torchrun --nproc_per_node=8 internvl_chat_finetune.py \
  --model_name_or_path "OpenGVLab/InternVL2_5-8B" \
  --use_llm_lora 16 \
  --freeze_llm True \
  --freeze_backbone True \
  --freeze_mlp True \
  --learning_rate 4e-5 \
  --deepspeed zero_stage1_config.json \
  --output_dir ./output/lora
```
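In the LoRA example, the base weights are frozen and `use_llm_lora` sets the adapter rank (16 here). A small sanity check of that convention, written as a standalone helper (an assumption inferred from the example above, not a function from the InternVL codebase):

```python
def check_lora_config(use_llm_lora: int, freeze_llm: bool) -> list:
    """Return warnings for configurations that mix LoRA and full finetuning.

    Heuristic checks based on the pattern in the LoRA example; not an
    official InternVL validation step.
    """
    warnings = []
    if use_llm_lora > 0 and not freeze_llm:
        warnings.append("use_llm_lora > 0 but freeze_llm is False: "
                        "LoRA adapters and full LLM weights would both train.")
    if use_llm_lora == 0 and freeze_llm:
        warnings.append("freeze_llm is True with no LoRA rank set: "
                        "the LLM receives no updates at all.")
    return warnings
```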