Principle: OpenGVLab InternVL Supervised Training Loop
| Knowledge Sources | |
|---|---|
| Domains | Training, Deep_Learning, Distributed_Computing |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A managed training loop that handles gradient computation, optimization, distributed training, checkpointing, and logging for supervised fine-tuning of vision-language models.
Description
The supervised training loop abstracts away the boilerplate of training large models in distributed settings. Rather than requiring a custom PyTorch training loop, the framework delegates to HuggingFace's Trainer class, which provides:
- Gradient accumulation: Simulates larger batch sizes across multiple forward passes
- Distributed training: Integration with DeepSpeed ZeRO for memory-efficient multi-GPU training
- Mixed precision: BF16/FP16 training for reduced memory and faster computation
- Checkpointing: Periodic model saving with configurable strategies
- Logging: Training metrics tracked via TensorBoard or Weights & Biases
- Resume from checkpoint: Seamless training continuation after interruptions
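The gradient-accumulation behaviour listed above can be illustrated in plain PyTorch: accumulating losses scaled by 1/k over k micro-batches produces the same gradient as a single pass over the combined batch. This is a minimal sketch of the idea, not the Trainer's internal implementation:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
data = torch.randn(8, 4)
target = torch.randn(8, 1)
loss_fn = torch.nn.MSELoss()

# Single pass over the full batch of 8
model.zero_grad()
loss_fn(model(data), target).backward()
full_grad = model.weight.grad.clone()

# Two micro-batches of 4; each loss is scaled by 1/2 before backward,
# so the accumulated gradient matches the full-batch gradient
model.zero_grad()
for chunk_x, chunk_y in zip(data.split(4), target.split(4)):
    (loss_fn(model(chunk_x), chunk_y) / 2).backward()
accum_grad = model.weight.grad.clone()
```

The two gradients agree to floating-point tolerance, which is why gradient accumulation lets small-memory devices simulate large effective batch sizes.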
The training loop operates on data produced by the data collator, which batches and pads variable-length multimodal sequences.
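A collator for variable-length multimodal sequences typically right-pads input ids with the tokenizer's pad id and pads labels with -100 so padding never contributes to the loss. The sketch below is illustrative only, not InternVL's actual collator:

```python
import torch

def collate(batch, pad_token_id=0):
    """Right-pad variable-length examples; labels are padded with -100."""
    max_len = max(len(ex["input_ids"]) for ex in batch)
    input_ids, labels, attention_mask = [], [], []
    for ex in batch:
        pad = max_len - len(ex["input_ids"])
        input_ids.append(ex["input_ids"] + [pad_token_id] * pad)
        labels.append(ex["labels"] + [-100] * pad)
        attention_mask.append([1] * len(ex["input_ids"]) + [0] * pad)
    return {
        "input_ids": torch.tensor(input_ids),
        "labels": torch.tensor(labels),
        "attention_mask": torch.tensor(attention_mask),
    }

batch = collate([
    {"input_ids": [5, 6, 7], "labels": [-100, 6, 7]},
    {"input_ids": [8, 9], "labels": [8, 9]},
])
```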
Usage
Use this principle when performing supervised fine-tuning (full parameter or LoRA) on InternVL models. The Trainer handles all aspects of the training loop; the user only needs to configure the model, dataset, and training arguments.
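A typical configuration might look like the following sketch. The argument values are assumptions chosen for illustration, and `model`, `train_dataset`, and `collate_fn` are placeholders for objects built earlier in the pipeline:

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="./checkpoints",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,     # effective batch = 4 * 8 * num_gpus
    bf16=True,                         # mixed-precision training
    learning_rate=2e-5,
    num_train_epochs=1,
    save_steps=500,                    # periodic checkpointing
    logging_steps=10,
    report_to="tensorboard",
    deepspeed="ds_zero2_config.json",  # DeepSpeed ZeRO integration
)

trainer = Trainer(
    model=model,                  # placeholder: the InternVL model
    args=args,
    train_dataset=train_dataset,  # placeholder: the SFT dataset
    data_collator=collate_fn,     # placeholder: the multimodal collator
)
trainer.train()  # pass resume_from_checkpoint=True to continue a run
```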
Theoretical Basis
The supervised training objective minimizes the cross-entropy loss over the assistant's response tokens: L(θ) = -Σ_{t ∈ assistant} log p_θ(y_t | y_<t, images). Human-turn tokens and image tokens are masked (label = -100) and excluded from the loss computation.
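PyTorch implements this masking convention directly: cross-entropy with `ignore_index=-100` skips every masked position, which is equivalent to computing the loss only over the unmasked tokens. A minimal illustration:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 6, 32)                        # (batch, seq_len, vocab)
labels = torch.tensor([[-100, -100, 4, 7, -100, 2]])  # human/image tokens masked

# Loss over the whole sequence; -100 positions contribute nothing
loss_all = F.cross_entropy(
    logits.view(-1, 32), labels.view(-1), ignore_index=-100
)

# Equivalent loss computed only on the unmasked positions
keep = labels.view(-1) != -100
loss_kept = F.cross_entropy(logits.view(-1, 32)[keep], labels.view(-1)[keep])
```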
The training loop with DeepSpeed ZeRO:
# Pseudo-code: Managed training loop
for step, batch in enumerate(dataloader):
    # Forward pass under bf16 mixed precision
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(input_ids, labels, pixel_values, image_flags).loss

    # Scale the loss so accumulated gradients match the large-batch gradient
    loss = loss / gradient_accumulation_steps
    loss.backward()

    # Optimizer step once every gradient_accumulation_steps micro-batches
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

    # Periodic checkpointing
    if (step + 1) % save_steps == 0:
        save_checkpoint()