
Implementation:OpenGVLab InternVL Trainer Train

From Leeroopedia


Knowledge Sources
Domains Training, Distributed_Computing
Last Updated 2026-02-07 00:00 GMT

Overview

The HuggingFace Trainer is used for supervised fine-tuning of InternVL models, integrated with DeepSpeed and custom data collators.

Description

InternVL uses the standard HuggingFace Trainer class for supervised fine-tuning workflows. The Trainer is configured with:

  • An InternVLChatModel instance (with freeze configuration already applied)
  • TrainingArguments controlling hyperparameters and DeepSpeed integration
  • A custom data collator that handles multimodal batching (padding pixel_values and concatenating image_flags)
  • The training dataset (ConcatDataset or PackedDataset)

This is a Wrapper Doc: the Trainer class comes from HuggingFace Transformers, but it is configured specifically for InternVL's multimodal training.
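
The collator behaviour described above (padding text fields while concatenating per-sample image tensors) can be sketched as follows. This is a simplified illustration, not InternVL's actual `concat_pad_data_collator`; the function name, default values, and feature keys other than `pixel_values` and `image_flags` are assumptions.

```python
import torch

def multimodal_pad_collator(features, pad_token_id=0, ignore_index=-100):
    # Hypothetical sketch of a multimodal collator: pad token-level fields to
    # the longest sequence in the batch, then concatenate image tensors.
    max_len = max(len(f["input_ids"]) for f in features)
    input_ids, labels, attention_mask = [], [], []
    for f in features:
        pad = max_len - len(f["input_ids"])
        input_ids.append(torch.cat(
            [f["input_ids"], torch.full((pad,), pad_token_id, dtype=torch.long)]))
        labels.append(torch.cat(
            [f["labels"], torch.full((pad,), ignore_index, dtype=torch.long)]))
        attention_mask.append(torch.cat(
            [torch.ones(len(f["input_ids"]), dtype=torch.long),
             torch.zeros(pad, dtype=torch.long)]))
    return {
        "input_ids": torch.stack(input_ids),
        "labels": torch.stack(labels),
        "attention_mask": torch.stack(attention_mask),
        # Each sample may contribute a different number of image tiles, so
        # pixel_values are concatenated along the tile dimension, not stacked.
        "pixel_values": torch.cat([f["pixel_values"] for f in features], dim=0),
        "image_flags": torch.cat([f["image_flags"] for f in features], dim=0),
    }
```

Concatenating rather than stacking `pixel_values` is what lets samples with different numbers of image tiles share a batch; `image_flags` then marks which tiles carry real image content.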

Usage

The Trainer is instantiated and invoked in the training entry point scripts. Users configure it indirectly through shell script arguments that control TrainingArguments.
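
The flow from shell-script flags to training hyperparameters can be sketched with a small stand-in dataclass. The real entry point parses flags into `transformers.TrainingArguments` (typically via `HfArgumentParser`); the `MiniTrainingArguments` class and `parse_args` helper below are hypothetical simplifications that just make the flag-to-argument flow visible.

```python
import argparse
from dataclasses import dataclass

@dataclass
class MiniTrainingArguments:
    # Hypothetical stand-in for transformers.TrainingArguments.
    output_dir: str
    per_device_train_batch_size: int = 2
    learning_rate: float = 4e-5

def parse_args(argv):
    # Shell-script flags arrive as command-line arguments and are mapped
    # onto the dataclass fields, mirroring how HfArgumentParser fills
    # TrainingArguments in the actual finetune script.
    p = argparse.ArgumentParser()
    p.add_argument("--output_dir", required=True)
    p.add_argument("--per_device_train_batch_size", type=int, default=2)
    p.add_argument("--learning_rate", type=float, default=4e-5)
    return MiniTrainingArguments(**vars(p.parse_args(argv)))

args = parse_args(["--output_dir", "./output/finetune",
                   "--learning_rate", "2e-5"])
```

Any flag omitted on the command line falls back to the dataclass default, which is why the shell scripts only need to override the hyperparameters they change.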

Code Reference

Source Location

  • Repository: InternVL
  • File: internvl_chat/internvl/train/internvl_chat_finetune.py
  • Lines: L1041-1057

Signature

# Trainer instantiation in InternVL finetune script
trainer = Trainer(
    model=model,                    # InternVLChatModel with freeze config applied
    args=training_args,             # HuggingFace TrainingArguments
    train_dataset=train_dataset,    # ConcatDataset or PackedDataset
    eval_dataset=None,
    tokenizer=tokenizer,
    data_collator=collator,         # concat_pad_data_collator or packed_collate_fn
)

# Launch training with optional checkpoint resume
train_result = trainer.train(resume_from_checkpoint=checkpoint)
trainer.save_model()

Import

from transformers import Trainer, TrainingArguments

I/O Contract

Inputs

| Name          | Type                | Required | Description                                        |
|---------------|---------------------|----------|----------------------------------------------------|
| model         | InternVLChatModel   | Yes      | Model with freeze/LoRA configuration applied       |
| args          | TrainingArguments   | Yes      | Training hyperparameters and DeepSpeed config      |
| train_dataset | Dataset             | Yes      | ConcatDataset or PackedDataset of training samples |
| tokenizer     | PreTrainedTokenizer | Yes      | Tokenizer for padding operations                   |
| data_collator | Callable            | Yes      | concat_pad_data_collator or packed_collate_fn      |

Outputs

| Name         | Type        | Description                                         |
|--------------|-------------|-----------------------------------------------------|
| train_result | TrainOutput | Training metrics (loss, runtime, samples_per_second)|
| checkpoints  | Files       | Model checkpoints saved to output_dir               |
| logs         | Dict        | Training logs to TensorBoard/WandB                  |
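
The `train_result` row above refers to the small named tuple that `trainer.train()` returns (`transformers.trainer_utils.TrainOutput`). The sketch below mirrors its shape with a plain `namedtuple` so the fields are concrete; the numeric values are made-up placeholders, not measured results.

```python
from collections import namedtuple

# Mirror of transformers.trainer_utils.TrainOutput: global step reached,
# average training loss, and a metrics dict as logged by the Trainer.
TrainOutput = namedtuple("TrainOutput", ["global_step", "training_loss", "metrics"])

train_result = TrainOutput(
    global_step=1000,                    # placeholder values for illustration
    training_loss=0.8421,
    metrics={
        "train_runtime": 3600.0,         # seconds
        "train_samples_per_second": 8.9,
        "train_loss": 0.8421,
    },
)
```

Scripts commonly pass `train_result.metrics` to `trainer.log_metrics` and `trainer.save_metrics` so the same numbers end up in the console logs and in a JSON file under `output_dir`.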

Usage Examples

Standard Fine-tuning

from transformers import Trainer, TrainingArguments
from internvl.patch.pad_data_collator import concat_pad_data_collator

training_args = TrainingArguments(
    output_dir='./output/finetune',
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=4e-5,
    weight_decay=0.05,
    warmup_ratio=0.03,
    bf16=True,
    deepspeed='zero_stage1_config.json',
    save_strategy='steps',
    save_steps=500,
    logging_steps=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    data_collator=concat_pad_data_collator,
)
trainer.train()
trainer.save_model()

