Implementation:OpenGVLab InternVL Classification Main

From Leeroopedia


Knowledge Sources
Domains: Image Classification, Distributed Training, Vision Transformer
Last Updated: 2026-02-07 14:00 GMT

Overview

Main entry point for training and evaluating InternViT-6B on image classification tasks, orchestrating the full distributed training pipeline including model construction, optimization, mixed-precision training, checkpointing, and validation.

Description

This module is the central training and evaluation script for InternViT-6B classification. It initializes distributed training (supporting both the PyTorch launcher and SLURM), scales the learning rates linearly with the total batch size, and builds all components from the config system. The main function assembles the complete pipeline: data loaders, the model wrapped in DistributedDataParallel, the optimizer, AMP setup (native or Apex), the LR scheduler, and the loss criterion (SoftTargetCrossEntropy with mixup, LabelSmoothingCrossEntropy, or standard CrossEntropy).
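The linear learning-rate scaling mentioned above can be sketched as follows. This is a minimal illustration, not the script's code: the function name, its parameters, and the reference batch size of 512 are assumptions chosen for the example, and the actual constants live in the script and its config.

```python
def scale_learning_rate(base_lr: float, per_gpu_batch_size: int,
                        world_size: int, reference_batch_size: int = 512) -> float:
    """Scale a base learning rate linearly with the total (global) batch size.

    reference_batch_size is an assumed constant for this sketch; it is the
    batch size at which base_lr is meant to be used unchanged.
    """
    total_batch_size = per_gpu_batch_size * world_size
    return base_lr * total_batch_size / reference_batch_size


# 8 GPUs x 128 images each = 1024 images per step, twice the assumed
# reference batch size, so the learning rate doubles.
scaled = scale_learning_rate(base_lr=1e-3, per_gpu_batch_size=128, world_size=8)
print(scaled)  # 0.002
```

The same rule would typically be applied to the warmup and minimum learning rates so that the whole schedule keeps its shape when the number of GPUs changes.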

The train_one_epoch function handles mixed-precision forward/backward passes, gradient accumulation, gradient clipping, loss scaling, and EMA (Exponential Moving Average) model updates. Validation supports standard ImageNet (validate), ImageNet-ReaL relabeled evaluation (validate_real), and ImageNet-A/R subset masking via index-based output filtering. The script includes ImageNet-22K to 1K class mapping for models pretrained on the larger label set.
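The index-based output filtering used for subset benchmarks such as ImageNet-A/R can be illustrated with a small sketch. It uses plain Python lists instead of tensors, and the helper names, the six-class toy logits, and the kept indices are all hypothetical; the idea is only that predictions are restricted to the classes present in the subset before accuracy is computed.

```python
from typing import List, Sequence


def mask_logits(logits: Sequence[float], keep_indices: Sequence[int]) -> List[float]:
    """Keep only the logits whose class index belongs to the evaluation subset."""
    return [logits[i] for i in keep_indices]


def top1_correct_in_subset(logits: Sequence[float], keep_indices: Sequence[int],
                           target: int) -> bool:
    """Check top-1 correctness after filtering.

    target is expressed in the subset's label space, i.e. it is a position
    in keep_indices rather than an index into the full class list.
    """
    subset_logits = mask_logits(logits, keep_indices)
    predicted = max(range(len(subset_logits)), key=subset_logits.__getitem__)
    return predicted == target


# Toy example: a 6-class model evaluated on a 3-class subset (classes 0, 3, 5).
logits = [0.1, 2.0, 0.3, 1.5, 0.2, 0.9]
keep = [0, 3, 5]

# Without filtering the argmax would be class 1, which is not in the subset.
# After filtering, the best remaining class is 3, at position 1 of the subset.
print(top1_correct_in_subset(logits, keep, target=1))  # True
```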

Checkpoints are saved periodically and whenever a new best accuracy is reached; the utility module automatically cleans up old checkpoints. The script also supports a throughput benchmarking mode for measuring inference speed.
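A throughput benchmark of this kind usually follows a warm-up-then-time pattern. The sketch below shows that pattern in a framework-agnostic way; the function name, the iteration counts, and the `run_batch` and `timer` parameters are assumptions made for illustration and do not describe the script's throughput function.

```python
import time
from typing import Callable


def measure_throughput(run_batch: Callable[[], None], batch_size: int,
                       warmup_iters: int = 5, timed_iters: int = 30,
                       timer: Callable[[], float] = time.perf_counter) -> float:
    """Return images per second for repeated calls to run_batch.

    run_batch is a hypothetical callable that runs one forward pass on a
    batch of batch_size images. On a GPU it should synchronize the device
    before returning, otherwise the timer only measures kernel launches.
    """
    # Warm-up iterations are excluded from timing (caches, lazy initialization).
    for _ in range(warmup_iters):
        run_batch()

    start = timer()
    for _ in range(timed_iters):
        run_batch()
    elapsed = timer() - start

    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return timed_iters * batch_size / elapsed


# Stand-in workload so the sketch runs without a model or GPU.
images_per_second = measure_throughput(lambda: sum(range(10_000)), batch_size=64)
print(f"{images_per_second:.1f} images/s")
```

Passing the timer as a parameter is a design choice for this sketch: it keeps the function testable with a fake clock.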

Usage

Use this module as the primary entry point for training or evaluating InternViT-6B classification models. Invoke it through a distributed launcher (e.g., torchrun or SLURM) with a YAML config file that specifies the model architecture, data paths, and training hyperparameters. Pass --eval for evaluation-only mode or --throughput for performance benchmarking.

Code Reference

Source Location

Signature

def parse_option():
    ...

def main(config):
    ...

def train_one_epoch(config, model, criterion, data_loader, optimizer, epoch,
                    mixup_fn, lr_scheduler, amp_autocast=suppress,
                    loss_scaler=None, model_ema=None):
    ...

@torch.no_grad()
def validate(config, data_loader, model, epoch=None, amp_autocast=suppress):
    ...

@torch.no_grad()
def validate_real(config, data_loader, model, real_labels, amp_autocast=suppress):
    ...

@torch.no_grad()
def throughput(data_loader, model, logger):
    ...

Import

# This is a standalone script, not typically imported
# Run via: torchrun --nproc_per_node=8 classification/main.py --cfg <config.yaml> --local-rank 0

I/O Contract

Inputs

Name          Type  Required  Description
--cfg         str   Yes       Path to YAML config file defining model, data, and training parameters
--local-rank  int   Yes       Local rank for DistributedDataParallel
--batch-size  int   No        Override batch size for single GPU
--eval        flag  No        Perform evaluation only
--resume      str   No        Path to checkpoint for resuming training
--pretrained  str   No        Path to pretrained weights for fine-tuning
--throughput  flag  No        Run throughput benchmark only
--launcher    str   No        Distributed launcher type: pytorch or slurm
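The inputs listed above could be declared with argparse roughly as follows. This is a sketch reconstructed from the table only, not the script's parse_option: the function name build_parser, the help strings, and every default value are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; defaults here are assumed, not confirmed."""
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI")
    parser.add_argument("--cfg", type=str, required=True,
                        help="path to YAML config file")
    parser.add_argument("--local-rank", type=int, required=True,
                        help="local rank for DistributedDataParallel")
    parser.add_argument("--batch-size", type=int, default=None,
                        help="override batch size for single GPU")
    parser.add_argument("--eval", action="store_true",
                        help="perform evaluation only")
    parser.add_argument("--resume", type=str, default=None,
                        help="checkpoint to resume from")
    parser.add_argument("--pretrained", type=str, default=None,
                        help="pretrained weights for fine-tuning")
    parser.add_argument("--throughput", action="store_true",
                        help="run throughput benchmark only")
    parser.add_argument("--launcher", choices=["pytorch", "slurm"],
                        default="pytorch",  # assumed default
                        help="distributed launcher type")
    return parser


# Parse an evaluation-only command line; argparse exposes --local-rank
# as the attribute local_rank.
args = build_parser().parse_args([
    "--cfg", "configs/intern_vit_6b_224px.yaml",
    "--local-rank", "0",
    "--eval",
    "--resume", "/path/to/checkpoint.pth",
])
print(args.local_rank, args.eval, args.launcher)  # 0 True pytorch
```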

Outputs

Name         Type        Description
checkpoints  .pth files  Saved model checkpoints at config.OUTPUT directory
config.json  JSON file   Saved full training configuration
logs         stdout      Training/validation metrics logged per epoch and per print frequency

Usage Examples

Basic Usage

# Training with PyTorch distributed launcher
# torchrun --nproc_per_node=8 classification/main.py \
#     --cfg configs/intern_vit_6b_224px.yaml \
#     --local-rank 0 \
#     --batch-size 128 \
#     --data-path /path/to/imagenet

# Evaluation only
# torchrun --nproc_per_node=1 classification/main.py \
#     --cfg configs/intern_vit_6b_224px.yaml \
#     --local-rank 0 \
#     --eval \
#     --resume /path/to/checkpoint.pth
