Implementation:OpenGVLab InternVL ModelArguments DataTrainingArguments

From Leeroopedia


Knowledge Sources

Domains: Training, Configuration
Last Updated: 2026-02-07 00:00 GMT

Overview

Typed dataclasses provided by the InternVL training framework for parsing and validating the full training configuration.

Description

The ModelArguments and DataTrainingArguments dataclasses define the full configuration surface for InternVL training. They are parsed from command-line arguments using HuggingFace's HfArgumentParser and control model architecture, data loading, and training behavior.

Usage

These dataclasses are instantiated automatically by the training entry points (internvl_chat_finetune.py, internvl_chat_pretrain.py, internvl_chat_mpo.py). Configure them via shell script arguments.

Code Reference

Source Location

  • Repository: InternVL
  • File: internvl_chat/internvl/train/internvl_chat_finetune.py
  • Lines: L87-266

Signature

@dataclass
class ModelArguments:
    model_name_or_path: Optional[str] = None
    vision_path: Optional[str] = None
    llm_path: Optional[str] = None
    mlp_path: Optional[str] = None
    freeze_llm: bool = False
    freeze_backbone: bool = False
    freeze_mlp: bool = False
    unfreeze_vit_layers: int = 0
    vision_select_layer: int = -1
    use_backbone_lora: int = 0
    use_llm_lora: int = 0
    unfreeze_lm_head: bool = False
    grad_checkpoint: bool = True
    drop_path_rate: float = 0.0
    ps_version: Literal['v1', 'v2'] = 'v2'
    use_fast_tokenizer: bool = False
    use_liger: bool = False

@dataclass
class DataTrainingArguments:
    max_seq_length: int = 8192
    force_image_size: int = 448
    down_sample_ratio: float = 0.5
    pad2square: bool = False
    conv_style: str = 'internlm2-chat'
    meta_path: Optional[str] = None
    use_data_resampling: bool = False
    dynamic_image_size: bool = False
    use_thumbnail: bool = False
    min_dynamic_patch: int = 1
    max_dynamic_patch: int = 12
    min_num_frame: int = 8
    max_num_frame: int = 32
    normalize_type: Literal['imagenet', 'clip', 'siglip'] = 'imagenet'
    use_packed_ds: bool = False
    num_images_expected: int = 40
    max_packed_tokens: int = 8192
    max_buffer_size: int = 20
    log_freq: int = 1000
    strict_mode: bool = True
    replacement: bool = False
    allow_overflow: bool = False
    loss_reduction: str = 'token'
    loss_reduction_all_gather: bool = False
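
The dynamic tiling fields (dynamic_image_size, min_dynamic_patch, max_dynamic_patch, use_thumbnail) control how many 448×448 tiles an image is split into. The sketch below illustrates the tile-count selection, modeled on the idea behind InternVL's dynamic preprocessing; it is an illustrative reconstruction, not the exact library code:

```python
from itertools import product

def pick_grid(width, height, min_patch=1, max_patch=12):
    """Choose a (cols, rows) tiling whose aspect ratio best matches the image.

    Enumerates all grids with min_patch <= cols*rows <= max_patch and keeps
    the one whose aspect ratio is closest to the input image's.
    """
    aspect = width / height
    grids = {(c, r) for c, r in product(range(1, max_patch + 1), repeat=2)
             if min_patch <= c * r <= max_patch}
    best, best_diff = (1, 1), float('inf')
    for c, r in sorted(grids):
        diff = abs(aspect - c / r)
        if diff < best_diff:
            best, best_diff = (c, r), diff
    return best

cols, rows = pick_grid(1344, 448)    # wide 3:1 image -> (3, 1) grid
n_tiles = cols * rows
use_thumbnail = True
if use_thumbnail and n_tiles > 1:    # use_thumbnail appends one global-view tile
    n_tiles += 1
print(cols, rows, n_tiles)           # 3 1 4
```

With max_dynamic_patch=12 (the default), a single image can therefore contribute up to 13 tiles when use_thumbnail is enabled.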

Import

from transformers import HfArgumentParser, TrainingArguments

# Parsed in training entry point:
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
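
Note that the shell scripts pass booleans as literal strings (e.g. --freeze_llm False). HfArgumentParser coerces these with a string-to-bool conversion; the following stdlib sketch mimics that behavior for illustration (it is not the transformers implementation):

```python
import argparse

def str2bool(v: str) -> bool:
    """Coerce common truthy/falsy strings to bool, as HfArgumentParser
    does for boolean dataclass fields."""
    if v.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    if v.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    raise argparse.ArgumentTypeError(f'Not a boolean: {v!r}')

parser = argparse.ArgumentParser()
parser.add_argument('--freeze_llm', type=str2bool, default=False)
parser.add_argument('--use_llm_lora', type=int, default=0)

args = parser.parse_args(['--freeze_llm', 'True', '--use_llm_lora', '16'])
print(args.freeze_llm, args.use_llm_lora)   # True 16
```

This is why `--freeze_llm False` in the shell scripts yields a Python False rather than a truthy non-empty string.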

I/O Contract

Inputs

Name                  Type   Required  Description
Command-line args     str[]  Yes       Shell script arguments matching dataclass field names
--model_name_or_path  str    Yes       Path to pretrained model or HuggingFace model ID
--meta_path           str    Yes       Path to dataset mixture JSON meta-file
--conv_style          str    No        Conversation template name (default 'internlm2-chat')
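
The file passed to --meta_path describes the dataset mixture. The entry below follows the field names documented for InternVL's custom-finetune meta format; the dataset name, paths, and length are placeholders:

```json
{
  "my_custom_dataset": {
    "root": "path/to/images/",
    "annotation": "path/to/annotations.jsonl",
    "data_augment": false,
    "repeat_time": 1,
    "length": 12000
  }
}
```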

Outputs

Name           Type                   Description
model_args     ModelArguments         Parsed model architecture configuration
data_args      DataTrainingArguments  Parsed data processing configuration
training_args  TrainingArguments      Parsed HuggingFace training hyperparameters

Usage Examples

Full Finetune Configuration (Shell Script)

torchrun --nproc_per_node=8 internvl_chat_finetune.py \
    --model_name_or_path "OpenGVLab/InternVL2_5-8B" \
    --conv_style "internvl2_5" \
    --meta_path "shell/data/custom_finetune.json" \
    --freeze_llm False \
    --freeze_backbone False \
    --freeze_mlp False \
    --dynamic_image_size True \
    --use_thumbnail True \
    --max_dynamic_patch 12 \
    --max_seq_length 8192 \
    --learning_rate 4e-5 \
    --weight_decay 0.05 \
    --warmup_ratio 0.03 \
    --bf16 True \
    --deepspeed zero_stage1_config.json \
    --output_dir ./output/finetune

LoRA Finetune Configuration

torchrun --nproc_per_node=8 internvl_chat_finetune.py \
    --model_name_or_path "OpenGVLab/InternVL2_5-8B" \
    --use_llm_lora 16 \
    --freeze_llm True \
    --freeze_backbone True \
    --freeze_mlp True \
    --learning_rate 4e-5 \
    --deepspeed zero_stage1_config.json \
    --output_dir ./output/lora
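
In these scripts, --use_llm_lora 16 sets the LoRA rank r (0 disables LoRA). A back-of-envelope sketch of why rank-16 adapters are cheap, assuming a square d×d projection in a 7B-class LLM (illustrative arithmetic only):

```python
def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two low-rank factors: A (d_in x r) and B (r x d_out)."""
    return rank * (d_in + d_out)

d = 4096          # typical hidden size for a 7B-class LLM
full = d * d      # params in one frozen d x d projection
lora = lora_extra_params(d, d, rank=16)
print(full, lora, f'{lora / full:.2%}')   # 16777216 131072 0.78%
```

The adapter adds well under 1% of the frozen layer's parameters, which is why the LoRA recipe also freezes the LLM, backbone, and MLP.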

Related Pages

Implements Principle

Requires Environment
