Principle: OpenGVLab InternVL Training Configuration
| Knowledge Sources | |
|---|---|
| Domains | Training, Configuration, Distributed_Computing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A structured configuration system that controls model architecture choices, data processing parameters, and training hyperparameters through dataclass-based argument parsing.
Description
Training configuration in InternVL uses HuggingFace's HfArgumentParser to parse command-line arguments into typed dataclasses. The system separates concerns into three argument groups:
- ModelArguments: Controls model architecture (freeze flags, LoRA ranks, checkpoint paths, stochastic depth)
- DataTrainingArguments: Controls data processing (dataset paths, image resolution, sequence length, packed training settings)
- TrainingArguments: Standard HuggingFace training hyperparameters (learning rate, batch size, scheduler, DeepSpeed config)
This separation allows shell scripts to define complete training recipes by specifying arguments for each group, making experiments reproducible and configurable.
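As a rough illustration, the parse-into-dataclasses pattern can be sketched with only the standard library. This is a minimal analogue, not HfArgumentParser itself (which additionally handles optional types, enums, and nested parsing); the DataTrainingArguments fields and their defaults here are hypothetical placeholders.

```python
import argparse
from dataclasses import dataclass, fields

@dataclass
class ModelArguments:
    freeze_llm: bool = False
    use_llm_lora: int = 0

@dataclass
class DataTrainingArguments:
    # Hypothetical fields for illustration only
    force_image_size: int = 448
    max_seq_length: int = 4096

def parse_into_dataclasses(arg_list, dataclass_types):
    """Minimal stdlib analogue of HfArgumentParser.parse_args_into_dataclasses()."""
    parser = argparse.ArgumentParser()
    for dc in dataclass_types:
        for f in fields(dc):
            if f.type is bool:
                # Accept explicit "True"/"False" strings, as shell recipes pass them
                parser.add_argument(f"--{f.name}", type=lambda s: s == "True",
                                    default=f.default)
            else:
                parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    ns = parser.parse_args(arg_list)
    return tuple(dc(**{f.name: getattr(ns, f.name) for f in fields(dc)})
                 for dc in dataclass_types)

# A shell script's flags become typed dataclass fields:
model_args, data_args = parse_into_dataclasses(
    ["--freeze_llm", "True", "--use_llm_lora", "16"],
    (ModelArguments, DataTrainingArguments),
)
```

Unspecified arguments keep their dataclass defaults, which is what makes a shell recipe a sparse diff against a sensible baseline rather than an exhaustive configuration.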
Usage
Use this configuration system when launching InternVL training. The arguments are typically specified in shell scripts that launch distributed training via torchrun or deepspeed.
Theoretical Basis
The configuration follows a layered defaults pattern:
```python
# Pseudo-code: configuration hierarchy
from dataclasses import dataclass

@dataclass
class ModelArguments:
    # Architecture choices with sensible defaults
    freeze_llm: bool = False       # unfrozen by default for full finetune
    freeze_backbone: bool = False  # unfrozen by default
    use_llm_lora: int = 0          # LoRA disabled by default (0 = off)

# Shell-script flags override these defaults:
# --freeze_llm True  --use_llm_lora 16 → LoRA finetune with frozen base LLM
# --freeze_llm False --use_llm_lora 0  → full-parameter finetune
```
Key design decisions:
- Freeze flags control which model components update during training
- LoRA rank (use_llm_lora, use_backbone_lora) is specified as an integer where 0 means disabled
- LoRA alpha follows the convention alpha = 2 * rank
- Packed training settings control the greedy bin-packing algorithm for efficient GPU utilization
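The greedy bin-packing idea behind packed training can be sketched as a first-fit pass over sample lengths. This is an illustrative sketch under the assumption that each sample fits within the token budget; the actual InternVL packed-training code additionally manages attention masks, position ids, and loss weighting.

```python
def greedy_pack(sample_lengths, max_packed_tokens):
    """First-fit greedy bin packing: place each sample into the first
    pack with enough free tokens, opening a new pack when none fits.
    Assumes every length <= max_packed_tokens."""
    packs = []      # each pack: list of sample indices
    remaining = []  # free token budget per pack
    for idx, length in enumerate(sample_lengths):
        for p, free in enumerate(remaining):
            if length <= free:
                packs[p].append(idx)
                remaining[p] -= length
                break
        else:
            packs.append([idx])
            remaining.append(max_packed_tokens - length)
    return packs

packed = greedy_pack([3000, 1500, 2500, 500, 1000], max_packed_tokens=4096)
# → [[0, 3], [1, 2], [4]]
```

Packing short samples together this way keeps each forward pass close to the maximum sequence length, so GPU utilization does not collapse on datasets dominated by short conversations.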