Implementation:OpenGVLab InternVL Classification Main
| Knowledge Sources | |
|---|---|
| Domains | Image Classification, Distributed Training, Vision Transformer |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Main entry point for training and evaluating InternViT-6B on image classification tasks, orchestrating the full distributed training pipeline including model construction, optimization, mixed-precision training, checkpointing, and validation.
Description
This module is the central training and evaluation script for InternViT-6B classification. It initializes distributed training (supporting both the PyTorch launcher and SLURM), scales learning rates linearly with the total batch size, and builds all components through the config system. The main function assembles the complete pipeline: data loaders, the model wrapped in DistributedDataParallel, the optimizer, AMP setup (native or Apex), the LR scheduler, and the loss criterion (SoftTargetCrossEntropy when mixup is used, LabelSmoothingCrossEntropy, or standard CrossEntropy).
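The linear learning-rate scaling and the loss selection described above can be sketched as follows. This is a minimal, framework-free illustration rather than the script's actual code: the reference batch size of 512, the inclusion of accumulation steps in the total, and all function and parameter names are assumptions, and the loss classes are returned by name instead of being instantiated.

```python
def scale_learning_rate(base_lr, per_gpu_batch_size, world_size,
                        accumulation_steps=1, reference_batch_size=512):
    """Scale a base learning rate linearly with the effective total batch size.

    reference_batch_size=512 is an assumed convention, not taken from the script.
    """
    total_batch_size = per_gpu_batch_size * world_size * accumulation_steps
    return base_lr * total_batch_size / reference_batch_size


def choose_criterion(mixup_active, label_smoothing):
    """Return the name of the loss that matches the augmentation settings."""
    if mixup_active:
        # Mixup produces soft targets, so a soft-target loss is required.
        return "SoftTargetCrossEntropy"
    if label_smoothing > 0.0:
        return "LabelSmoothingCrossEntropy"
    return "CrossEntropyLoss"
```

For instance, under these assumptions `scale_learning_rate(5e-4, per_gpu_batch_size=128, world_size=8)` doubles the base rate, because the total batch size of 1024 is twice the assumed reference.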
The train_one_epoch function handles mixed-precision forward/backward passes, gradient accumulation, gradient clipping, loss scaling, and EMA (Exponential Moving Average) model updates. Validation supports standard ImageNet (validate), ImageNet-ReaL relabeled evaluation (validate_real), and ImageNet-A/R subset masking via index-based output filtering. The script includes ImageNet-22K to 1K class mapping for models pretrained on the larger label set.
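To make the training-step mechanics concrete, here is a toy sketch of gradient accumulation, clipping, and an EMA update on a single scalar parameter, followed by the index-based output filtering used for subset evaluation. It is an illustration under stated assumptions, not the script's code: a plain SGD step stands in for the real optimizer, loss scaling is omitted, and the names `run_toy_training` and `mask_logits` are hypothetical.

```python
def run_toy_training(grads, lr, accumulation_steps, clip_norm, ema_decay):
    """Single-parameter loop: accumulate, clip, step, then update the EMA copy."""
    param = 0.0
    ema = param  # the EMA copy starts from the current weights
    accumulated = 0.0
    for idx, grad in enumerate(grads):
        # Divide by accumulation_steps so the accumulated gradient is an average.
        accumulated += grad / accumulation_steps
        if (idx + 1) % accumulation_steps == 0:
            # For a scalar, clipping the gradient norm is clipping its magnitude.
            accumulated = max(-clip_norm, min(clip_norm, accumulated))
            param -= lr * accumulated  # plain SGD step stands in for the optimizer
            accumulated = 0.0
            # The EMA copy is refreshed only after a real optimizer step.
            ema = ema_decay * ema + (1.0 - ema_decay) * param
    return param, ema


def mask_logits(logits, class_indices):
    """Keep only the columns for the classes present in an evaluation subset."""
    return [[row[i] for i in class_indices] for row in logits]
```

After masking, an argmax over each row yields a position within the subset, so targets must be expressed in the same subset indexing before accuracy is computed.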
Checkpoints are saved periodically and whenever validation accuracy reaches a new best; the utility module automatically removes older checkpoints. The script also supports a throughput benchmarking mode that measures inference speed.
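One way such checkpoint cleanup could work is sketched below. The file naming scheme `ckpt_epoch_<N>.pth`, the `keep_last` parameter, and the function name are all hypothetical; the actual retention policy lives in the utility module and may differ.

```python
import re

# Hypothetical naming scheme for periodic checkpoints.
CKPT_PATTERN = re.compile(r"^ckpt_epoch_(\d+)\.pth$")


def stale_checkpoints(filenames, keep_last=1):
    """Return periodic checkpoints that can be deleted, oldest first.

    Files that do not match the pattern (for example a best-accuracy
    checkpoint) are never selected for deletion.
    """
    epochs = []
    for name in filenames:
        match = CKPT_PATTERN.match(name)
        if match:
            epochs.append((int(match.group(1)), name))
    epochs.sort()  # sort numerically by epoch, not lexically by filename
    if keep_last <= 0:
        return [name for _, name in epochs]
    return [name for _, name in epochs[:-keep_last]]
```

Sorting on the parsed epoch number matters here: a lexical sort would place epoch 10 before epoch 2 and could delete the newest checkpoint.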
Usage
Use this module as the primary entry point for training or evaluating InternViT-6B classification models. Invoke it through a distributed launcher (e.g., torchrun or SLURM) with a YAML config file that specifies the model architecture, data paths, and training hyperparameters. Pass --eval for evaluation-only mode or --throughput for performance benchmarking.
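Throughput benchmarking generally amounts to timing repeated forward passes after a warm-up phase. The sketch below shows that pattern without any framework dependency; `measure_throughput`, `run_batch`, and the default iteration counts are hypothetical, and a real GPU benchmark would also need to synchronize the device before reading the clock.

```python
import time


def measure_throughput(run_batch, batch_size, num_batches=30, warmup=5):
    """Return images per second for a callable that processes one batch."""
    # Warm-up iterations are excluded from timing (caches, lazy initialization).
    for _ in range(warmup):
        run_batch()
    start = time.perf_counter()
    for _ in range(num_batches):
        run_batch()
    # Guard against a zero interval for extremely fast callables.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return batch_size * num_batches / elapsed
```

Here `run_batch` would wrap a single forward pass of the model on a fixed batch.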
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: classification/main.py
- Lines: 1-756
Signature
def parse_option():
    ...

def main(config):
    ...

# amp_autocast defaults to contextlib.suppress, which acts as a no-op context manager
def train_one_epoch(config, model, criterion, data_loader, optimizer, epoch,
                    mixup_fn, lr_scheduler, amp_autocast=suppress,
                    loss_scaler=None, model_ema=None):
    ...

@torch.no_grad()
def validate(config, data_loader, model, epoch=None, amp_autocast=suppress):
    ...

@torch.no_grad()
def validate_real(config, data_loader, model, real_labels, amp_autocast=suppress):
    ...

@torch.no_grad()
def throughput(data_loader, model, logger):
    ...
Import
# This is a standalone script, not typically imported
# Run via: torchrun --nproc_per_node=8 classification/main.py --cfg <config.yaml> --local-rank 0
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --cfg | str | Yes | Path to YAML config file defining model, data, and training parameters |
| --local-rank | int | Yes | Local rank for DistributedDataParallel |
| --batch-size | int | No | Per-GPU batch size; overrides the value in the config |
| --eval | flag | No | Perform evaluation only |
| --resume | str | No | Path to checkpoint for resuming training |
| --pretrained | str | No | Path to pretrained weights for fine-tuning |
| --throughput | flag | No | Run throughput benchmark only |
| --launcher | str | No | Distributed launcher type: pytorch or slurm |
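The table above maps directly onto an argparse definition. The following sketch mirrors only the documented flags; the help strings, the defaults, and the `build_parser` name are assumptions, and the real `parse_option` may accept further arguments.

```python
import argparse


def build_parser():
    """Build a parser covering only the flags listed in the Inputs table."""
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI")
    parser.add_argument("--cfg", type=str, required=True,
                        help="path to YAML config file")
    parser.add_argument("--local-rank", type=int, required=True,
                        help="local rank for DistributedDataParallel")
    parser.add_argument("--batch-size", type=int, default=None,
                        help="per-GPU batch size override")
    parser.add_argument("--eval", action="store_true",
                        help="perform evaluation only")
    parser.add_argument("--resume", type=str, default=None,
                        help="checkpoint to resume from")
    parser.add_argument("--pretrained", type=str, default=None,
                        help="pretrained weights for fine-tuning")
    parser.add_argument("--throughput", action="store_true",
                        help="run throughput benchmark only")
    # The default launcher value is an assumption.
    parser.add_argument("--launcher", choices=["pytorch", "slurm"],
                        default="pytorch", help="distributed launcher type")
    return parser
```

Note that argparse exposes `--local-rank` and `--batch-size` as the attributes `local_rank` and `batch_size` on the parsed namespace.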
Outputs
| Name | Type | Description |
|---|---|---|
| checkpoints | .pth files | Saved model checkpoints at config.OUTPUT directory |
| config.json | JSON file | Saved full training configuration |
| logs | stdout | Training/validation metrics logged per epoch and per print frequency |
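As an illustration of the config.json output, the sketch below dumps a configuration to the output directory. It assumes the configuration has already been converted to a plain dictionary; the `save_config` name and the example keys are hypothetical, and in a distributed run such a file would typically be written by a single process only.

```python
import json
import os


def save_config(config, output_dir, filename="config.json"):
    """Write a plain-dict configuration as JSON and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "w") as f:
        # sort_keys keeps the file stable across runs, which eases diffing.
        json.dump(config, f, indent=2, sort_keys=True)
    return path
```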
Usage Examples
Basic Usage
# Training with PyTorch distributed launcher
# torchrun --nproc_per_node=8 classification/main.py \
# --cfg configs/intern_vit_6b_224px.yaml \
# --local-rank 0 \
# --batch-size 128 \
# --data-path /path/to/imagenet
# Evaluation only
# torchrun --nproc_per_node=1 classification/main.py \
# --cfg configs/intern_vit_6b_224px.yaml \
# --local-rank 0 \
# --eval \
# --resume /path/to/checkpoint.pth