Principle:OpenGVLab InternVL Training Configuration

From Leeroopedia


Knowledge Sources
Domains Training, Configuration, Distributed_Computing
Last Updated 2026-02-07 00:00 GMT

Overview

A structured configuration system that controls model architecture choices, data processing parameters, and training hyperparameters through dataclass-based argument parsing.

Description

Training configuration in InternVL uses HuggingFace's HfArgumentParser to parse command-line arguments into typed dataclasses. The system separates concerns into three argument groups:

  • ModelArguments: Controls model architecture (freeze flags, LoRA ranks, checkpoint paths, stochastic depth)
  • DataTrainingArguments: Controls data processing (dataset paths, image resolution, sequence length, packed training settings)
  • TrainingArguments: Standard HuggingFace training hyperparameters (learning rate, batch size, scheduler, DeepSpeed config)

This separation allows shell scripts to define complete training recipes by specifying arguments for each group, making experiments reproducible and configurable.
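The parsing pattern can be sketched without pulling in `transformers`: below is a toy stand-in for `HfArgumentParser` that maps `--name value` pairs onto dataclass fields. The helper `parse_into` and the reduced field set are illustrative assumptions; the real InternVL dataclasses carry many more fields and use HuggingFace's parser directly.

```python
from dataclasses import dataclass, fields

@dataclass
class ModelArguments:
    # Reduced illustration of the real ModelArguments dataclass
    freeze_llm: bool = False
    freeze_backbone: bool = False
    use_llm_lora: int = 0

def parse_into(cls, argv):
    """Toy stand-in for HfArgumentParser: map --name value pairs to fields."""
    kwargs = {}
    types = {f.name: f.type for f in fields(cls)}
    it = iter(argv)
    for tok in it:
        name = tok.lstrip("-")
        raw = next(it)
        t = types[name]
        # bool("False") is truthy, so booleans need string comparison
        kwargs[name] = (raw.lower() == "true") if t is bool else t(raw)
    return cls(**kwargs)

args = parse_into(ModelArguments, ["--freeze_llm", "True", "--use_llm_lora", "16"])
```

Unspecified flags keep their dataclass defaults, which is exactly what lets shell scripts override only the handful of values that define a given recipe.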

Usage

Use this configuration system when launching InternVL training. The arguments are typically specified in shell scripts that launch distributed training via torchrun or deepspeed.

Theoretical Basis

The configuration follows a layered defaults pattern:

# Pseudo-code: Configuration hierarchy
@dataclass
class ModelArguments:
    # Architecture choices with sensible defaults
    freeze_llm: bool = False      # Unfrozen by default for full finetune
    freeze_backbone: bool = False  # Unfrozen by default
    use_llm_lora: int = 0         # LoRA disabled by default (0 = off)

# Shell script overrides defaults:
# --freeze_llm True --use_llm_lora 16  → LoRA finetune with frozen base LLM
# --freeze_llm False --use_llm_lora 0  → Full parameter finetune

Key design decisions:

  • Freeze flags control which model components update during training
  • LoRA rank (use_llm_lora, use_backbone_lora) is specified as an integer where 0 means disabled
  • LoRA alpha follows the convention alpha = 2 * rank
  • Packed training settings control the greedy bin-packing algorithm for efficient GPU utilization
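The greedy bin-packing mentioned in the last point can be sketched as follows: each sample's token count goes into the first bin with remaining room, capped at `max_seq_length`. The function name and list-of-lists representation are assumptions; InternVL's packed-dataset implementation is more involved (streaming, attention masking across packed samples).

```python
def pack_greedy(lengths, max_seq_length):
    """Place each sample length into the first bin that still has room."""
    bins = []  # each bin: sample lengths whose sum stays <= max_seq_length
    for n in lengths:
        for b in bins:
            if sum(b) + n <= max_seq_length:
                b.append(n)
                break
        else:
            bins.append([n])  # no bin fits: open a new one
    return bins

# pack_greedy([300, 200, 150, 400], 512) -> [[300, 200], [150], [400]]
```

Packing short samples together this way reduces padding waste, so each forward pass processes closer to `max_seq_length` real tokens.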

Related Pages

Implemented By
