Principle: OpenGVLab InternVL Training Configuration
| Knowledge Sources | |
|---|---|
| Domains | Training, Configuration, Distributed_Computing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A structured configuration system that controls model architecture choices, data processing parameters, and training hyperparameters through dataclass-based argument parsing.
Description
Training configuration in InternVL uses HuggingFace's HfArgumentParser to parse command-line arguments into typed dataclasses. The system separates concerns into three argument groups:
- ModelArguments: Controls model architecture (freeze flags, LoRA ranks, checkpoint paths, stochastic depth)
- DataTrainingArguments: Controls data processing (dataset paths, image resolution, sequence length, packed training settings)
- TrainingArguments: Standard HuggingFace training hyperparameters (learning rate, batch size, scheduler, DeepSpeed config)
This separation allows shell scripts to define complete training recipes by specifying arguments for each group, making experiments reproducible and configurable.
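As a rough illustration, the parse-into-dataclasses pattern can be sketched with only the standard library. This is a minimal analogue, not HfArgumentParser itself (which additionally handles optional types, enums, and nested parsing); the DataTrainingArguments fields and their defaults here are hypothetical placeholders.

```python
import argparse
from dataclasses import dataclass, fields

@dataclass
class ModelArguments:
    freeze_llm: bool = False
    use_llm_lora: int = 0

@dataclass
class DataTrainingArguments:
    # Hypothetical fields for illustration only
    force_image_size: int = 448
    max_seq_length: int = 4096

def parse_into_dataclasses(arg_list, dataclass_types):
    """Minimal stdlib analogue of HfArgumentParser.parse_args_into_dataclasses()."""
    parser = argparse.ArgumentParser()
    for dc in dataclass_types:
        for f in fields(dc):
            if f.type is bool:
                # Accept explicit "True"/"False" strings, as shell recipes pass them
                parser.add_argument(f"--{f.name}", type=lambda s: s == "True",
                                    default=f.default)
            else:
                parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    ns = parser.parse_args(arg_list)
    return tuple(dc(**{f.name: getattr(ns, f.name) for f in fields(dc)})
                 for dc in dataclass_types)

# A shell script's flags become typed dataclass fields:
model_args, data_args = parse_into_dataclasses(
    ["--freeze_llm", "True", "--use_llm_lora", "16"],
    (ModelArguments, DataTrainingArguments),
)
```

Unspecified arguments keep their dataclass defaults, which is what makes a shell recipe a sparse diff against a sensible baseline rather than an exhaustive configuration.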
Usage
Use this configuration system when launching InternVL training. The arguments are typically specified in shell scripts that launch distributed training via torchrun or deepspeed.
Theoretical Basis
The configuration follows a layered defaults pattern:
```python
# Pseudo-code: configuration hierarchy
from dataclasses import dataclass

@dataclass
class ModelArguments:
    # Architecture choices with sensible defaults
    freeze_llm: bool = False       # unfrozen by default for full finetune
    freeze_backbone: bool = False  # unfrozen by default
    use_llm_lora: int = 0          # LoRA disabled by default (0 = off)

# Shell-script flags override these defaults:
# --freeze_llm True  --use_llm_lora 16 → LoRA finetune with frozen base LLM
# --freeze_llm False --use_llm_lora 0  → full-parameter finetune
```
Key design decisions:
- Freeze flags control which model components update during training
- LoRA rank (use_llm_lora, use_backbone_lora) is specified as an integer where 0 means disabled
- LoRA alpha follows the convention alpha = 2 * rank
- Packed training settings control the greedy bin-packing algorithm for efficient GPU utilization
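The greedy bin-packing idea behind packed training can be sketched as a first-fit pass over sample lengths. This is an illustrative sketch under the assumption that each sample fits within the token budget; the actual InternVL packed-training code additionally manages attention masks, position ids, and loss weighting.

```python
def greedy_pack(sample_lengths, max_packed_tokens):
    """First-fit greedy bin packing: place each sample into the first
    pack with enough free tokens, opening a new pack when none fits.
    Assumes every length <= max_packed_tokens."""
    packs = []      # each pack: list of sample indices
    remaining = []  # free token budget per pack
    for idx, length in enumerate(sample_lengths):
        for p, free in enumerate(remaining):
            if length <= free:
                packs[p].append(idx)
                remaining[p] -= length
                break
        else:
            packs.append([idx])
            remaining.append(max_packed_tokens - length)
    return packs

packed = greedy_pack([3000, 1500, 2500, 500, 1000], max_packed_tokens=4096)
# → [[0, 3], [1, 2], [4]]
```

Packing short samples together this way keeps each forward pass close to the maximum sequence length, so GPU utilization does not collapse on datasets dominated by short conversations.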