Implementation:OpenGVLab InternVL Trainer Train
| Knowledge Sources | |
|---|---|
| Domains | Training, Distributed_Computing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
HuggingFace Trainer used for supervised fine-tuning of InternVL models, integrated with DeepSpeed and custom data collators.
Description
InternVL uses the standard HuggingFace Trainer class for supervised fine-tuning workflows. The Trainer is configured with:
- An InternVLChatModel instance (with freeze configuration already applied)
- TrainingArguments controlling hyperparameters and DeepSpeed integration
- A custom data collator that handles multimodal batching (padding pixel_values and concatenating image_flags; see the sketch after this section)
- The training dataset (ConcatDataset or PackedDataset)
This is a Wrapper Doc — the Trainer class comes from HuggingFace Transformers, but is configured specifically for InternVL's multimodal training.
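To make the collator's job concrete, here is a minimal sketch of the two multimodal batching steps named above. It is illustrative only, not the actual concat_pad_data_collator; it assumes each sample is a dict of tensors with keys input_ids, labels, pixel_values, and image_flags.

```python
import torch
import torch.nn.functional as F

def sketch_multimodal_collator(features, pad_token_id=0, ignore_index=-100):
    # Hypothetical sketch, not InternVL's concat_pad_data_collator.
    max_len = max(f["input_ids"].size(0) for f in features)
    input_ids, labels, attention_mask = [], [], []
    for f in features:
        pad = max_len - f["input_ids"].size(0)
        # Right-pad token ids; mask out padded label positions.
        input_ids.append(F.pad(f["input_ids"], (0, pad), value=pad_token_id))
        labels.append(F.pad(f["labels"], (0, pad), value=ignore_index))
        attention_mask.append(F.pad(torch.ones_like(f["input_ids"]), (0, pad), value=0))
    return {
        "input_ids": torch.stack(input_ids),
        "labels": torch.stack(labels),
        "attention_mask": torch.stack(attention_mask),
        # Samples carry variable numbers of image tiles, so image tensors
        # are concatenated along the first dimension rather than stacked.
        "pixel_values": torch.cat([f["pixel_values"] for f in features], dim=0),
        "image_flags": torch.cat([f["image_flags"] for f in features], dim=0),
    }
```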
Usage
The Trainer is instantiated and invoked in the training entry point scripts. Users configure it indirectly through shell script arguments that control TrainingArguments.
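The mapping from shell-script flags to TrainingArguments follows the standard HuggingFace pattern. Below is a hedged sketch of that parsing step; HfArgumentParser is a real transformers utility, but the actual InternVL script also parses its own model and data argument dataclasses alongside TrainingArguments.

```python
from transformers import HfArgumentParser, TrainingArguments

# CLI flags (forwarded by the launch shell script) become a
# TrainingArguments dataclass inside the entry point.
parser = HfArgumentParser(TrainingArguments)
(training_args,) = parser.parse_args_into_dataclasses()
# e.g. launched as:
#   torchrun --nproc_per_node=8 internvl_chat_finetune.py \
#       --output_dir ./output/finetune --bf16 True --learning_rate 4e-5
```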
Code Reference
Source Location
- Repository: InternVL
- File: internvl_chat/internvl/train/internvl_chat_finetune.py
- Lines: L1041-1057
Signature
```python
# Trainer instantiation in the InternVL finetune script
trainer = Trainer(
    model=model,                  # InternVLChatModel with freeze config applied
    args=training_args,           # HuggingFace TrainingArguments
    train_dataset=train_dataset,  # ConcatDataset or PackedDataset
    eval_dataset=None,
    tokenizer=tokenizer,
    data_collator=collator,       # concat_pad_data_collator or packed_collate_fn
)

# Launch training with optional checkpoint resume
train_result = trainer.train(resume_from_checkpoint=checkpoint)
trainer.save_model()
```
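The `checkpoint` value passed to trainer.train() is resolved before this point. A hedged sketch of the conventional HuggingFace pattern follows; get_last_checkpoint is a real transformers utility, though whether InternVL's script uses exactly this logic should be checked against the source lines cited above.

```python
import os
from transformers.trainer_utils import get_last_checkpoint

# Resume from the newest checkpoint in output_dir unless the user
# explicitly overrides the resume path or asks to overwrite.
checkpoint = None
if os.path.isdir(training_args.output_dir) and not training_args.overwrite_output_dir:
    checkpoint = get_last_checkpoint(training_args.output_dir)
if training_args.resume_from_checkpoint is not None:
    checkpoint = training_args.resume_from_checkpoint
```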
Import
```python
from transformers import Trainer, TrainingArguments
```
External Reference
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | InternVLChatModel | Yes | Model with freeze/LoRA configuration applied |
| args | TrainingArguments | Yes | Training hyperparameters and DeepSpeed config |
| train_dataset | Dataset | Yes | ConcatDataset or PackedDataset of training samples |
| tokenizer | PreTrainedTokenizer | Yes | Tokenizer for padding operations |
| data_collator | Callable | Yes | concat_pad_data_collator or packed_collate_fn |
Outputs
| Name | Type | Description |
|---|---|---|
| train_result | TrainOutput | Training metrics (loss, runtime, samples_per_second) |
| checkpoints | Files | Model checkpoints saved to output_dir |
| logs | Dict | Training logs to TensorBoard/WandB |
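The TrainOutput returned by trainer.train() is typically persisted with the Trainer's built-in helpers. The pattern below uses real Trainer methods and mirrors standard HuggingFace example scripts rather than quoting InternVL's code.

```python
# Persist training metrics and trainer state to output_dir.
metrics = train_result.metrics
trainer.log_metrics("train", metrics)   # echoes loss, runtime, samples_per_second
trainer.save_metrics("train", metrics)  # writes train_results.json
trainer.save_state()                    # writes trainer_state.json
```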
Usage Examples
Standard Fine-tuning
```python
from transformers import Trainer, TrainingArguments
from internvl.patch.pad_data_collator import concat_pad_data_collator

# `model`, `tokenizer`, and `train_dataset` are assumed to have been
# built earlier in the script (see the Signature section above).
training_args = TrainingArguments(
    output_dir='./output/finetune',
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=4e-5,
    weight_decay=0.05,
    warmup_ratio=0.03,
    bf16=True,
    deepspeed='zero_stage1_config.json',  # DeepSpeed config file must exist
    save_strategy='steps',
    save_steps=500,
    logging_steps=1,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    data_collator=concat_pad_data_collator,
)
trainer.train()
trainer.save_model()
```
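For the packed-training path, the same construction applies with train_dataset set to a PackedDataset and data_collator set to packed_collate_fn (see the I/O Contract above); the remaining Trainer arguments are unchanged.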
Related Pages
Implements Principle
Requires Environment
- Environment:OpenGVLab_InternVL_PyTorch_CUDA
- Environment:OpenGVLab_InternVL_DeepSpeed
- Environment:OpenGVLab_InternVL_Flash_Attention_2
Uses Heuristic