Implementation:OpenGVLab InternVL Trainer Train
| Knowledge Sources | |
|---|---|
| Domains | Training, Distributed_Computing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
HuggingFace Trainer used for supervised fine-tuning of InternVL models, integrated with DeepSpeed and custom data collators.
Description
InternVL uses the standard HuggingFace Trainer class for supervised fine-tuning workflows. The Trainer is configured with:
- An InternVLChatModel instance (with freeze configuration already applied)
- TrainingArguments controlling hyperparameters and DeepSpeed integration
- A custom data collator that handles multimodal batching (padding pixel_values and concatenating image_flags; see the sketch after this section)
- The training dataset (ConcatDataset or PackedDataset)
This is a Wrapper Doc — the Trainer class comes from HuggingFace Transformers, but is configured specifically for InternVL's multimodal training.
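To make the collator's job concrete, here is a minimal sketch of the two multimodal batching steps named above. It is illustrative only, not the actual concat_pad_data_collator; it assumes each sample is a dict of tensors with keys input_ids, labels, pixel_values, and image_flags.

```python
import torch
import torch.nn.functional as F

def sketch_multimodal_collator(features, pad_token_id=0, ignore_index=-100):
    # Hypothetical sketch, not InternVL's concat_pad_data_collator.
    max_len = max(f["input_ids"].size(0) for f in features)
    input_ids, labels, attention_mask = [], [], []
    for f in features:
        pad = max_len - f["input_ids"].size(0)
        # Right-pad token ids; mask out padded label positions.
        input_ids.append(F.pad(f["input_ids"], (0, pad), value=pad_token_id))
        labels.append(F.pad(f["labels"], (0, pad), value=ignore_index))
        attention_mask.append(F.pad(torch.ones_like(f["input_ids"]), (0, pad), value=0))
    return {
        "input_ids": torch.stack(input_ids),
        "labels": torch.stack(labels),
        "attention_mask": torch.stack(attention_mask),
        # Samples carry variable numbers of image tiles, so image tensors
        # are concatenated along the first dimension rather than stacked.
        "pixel_values": torch.cat([f["pixel_values"] for f in features], dim=0),
        "image_flags": torch.cat([f["image_flags"] for f in features], dim=0),
    }
```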
Usage
The Trainer is instantiated and invoked in the training entry point scripts. Users configure it indirectly through shell script arguments that control TrainingArguments.
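The mapping from shell-script flags to TrainingArguments follows the standard HuggingFace pattern. Below is a hedged sketch of that parsing step; HfArgumentParser is a real transformers utility, but the actual InternVL script also parses its own model and data argument dataclasses alongside TrainingArguments.

```python
from transformers import HfArgumentParser, TrainingArguments

# CLI flags (forwarded by the launch shell script) become a
# TrainingArguments dataclass inside the entry point.
parser = HfArgumentParser(TrainingArguments)
(training_args,) = parser.parse_args_into_dataclasses()
# e.g. launched as:
#   torchrun --nproc_per_node=8 internvl_chat_finetune.py \
#       --output_dir ./output/finetune --bf16 True --learning_rate 4e-5
```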
Code Reference
Source Location
- Repository: InternVL
- File: internvl_chat/internvl/train/internvl_chat_finetune.py
- Lines: L1041-1057
Signature
```python
# Trainer instantiation in the InternVL finetune script
trainer = Trainer(
    model=model,                  # InternVLChatModel with freeze config applied
    args=training_args,           # HuggingFace TrainingArguments
    train_dataset=train_dataset,  # ConcatDataset or PackedDataset
    eval_dataset=None,
    tokenizer=tokenizer,
    data_collator=collator,       # concat_pad_data_collator or packed_collate_fn
)

# Launch training with optional checkpoint resume
train_result = trainer.train(resume_from_checkpoint=checkpoint)
trainer.save_model()
```
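The `checkpoint` value passed to trainer.train() is resolved before this point. A hedged sketch of the conventional HuggingFace pattern follows; get_last_checkpoint is a real transformers utility, though whether InternVL's script uses exactly this logic should be checked against the source lines cited above.

```python
import os
from transformers.trainer_utils import get_last_checkpoint

# Resume from the newest checkpoint in output_dir unless the user
# explicitly overrides the resume path or asks to overwrite.
checkpoint = None
if os.path.isdir(training_args.output_dir) and not training_args.overwrite_output_dir:
    checkpoint = get_last_checkpoint(training_args.output_dir)
if training_args.resume_from_checkpoint is not None:
    checkpoint = training_args.resume_from_checkpoint
```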
Import
```python
from transformers import Trainer, TrainingArguments
```
External Reference
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | InternVLChatModel | Yes | Model with freeze/LoRA configuration applied |
| args | TrainingArguments | Yes | Training hyperparameters and DeepSpeed config |
| train_dataset | Dataset | Yes | ConcatDataset or PackedDataset of training samples |
| tokenizer | PreTrainedTokenizer | Yes | Tokenizer for padding operations |
| data_collator | Callable | Yes | concat_pad_data_collator or packed_collate_fn |
Outputs
| Name | Type | Description |
|---|---|---|
| train_result | TrainOutput | Training metrics (loss, runtime, samples_per_second) |
| checkpoints | Files | Model checkpoints saved to output_dir |
| logs | Dict | Training logs to TensorBoard/WandB |
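The TrainOutput returned by trainer.train() is typically persisted with the Trainer's built-in helpers. The pattern below uses real Trainer methods and mirrors standard HuggingFace example scripts rather than quoting InternVL's code.

```python
# Persist training metrics and trainer state to output_dir.
metrics = train_result.metrics
trainer.log_metrics("train", metrics)   # echoes loss, runtime, samples_per_second
trainer.save_metrics("train", metrics)  # writes train_results.json
trainer.save_state()                    # writes trainer_state.json
```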
Usage Examples
Standard Fine-tuning
```python
from transformers import Trainer, TrainingArguments
from internvl.patch.pad_data_collator import concat_pad_data_collator

# `model`, `tokenizer`, and `train_dataset` are assumed to have been
# built earlier in the script (see the Signature section above).
training_args = TrainingArguments(
    output_dir='./output/finetune',
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=4e-5,
    weight_decay=0.05,
    warmup_ratio=0.03,
    bf16=True,
    deepspeed='zero_stage1_config.json',  # DeepSpeed config file must exist
    save_strategy='steps',
    save_steps=500,
    logging_steps=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    data_collator=concat_pad_data_collator,
)
trainer.train()
trainer.save_model()
```
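For the packed-training path, the same construction applies with train_dataset set to a PackedDataset and data_collator set to packed_collate_fn (see the I/O Contract above); the remaining Trainer arguments are unchanged.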
Related Pages
Implements Principle
Requires Environment
- Environment:OpenGVLab_InternVL_PyTorch_CUDA
- Environment:OpenGVLab_InternVL_DeepSpeed
- Environment:OpenGVLab_InternVL_Flash_Attention_2
Uses Heuristic