Implementation:OpenGVLab InternVL Classification Utils

Knowledge Sources	OpenGVLab_InternVL
Domains	Checkpoint Management, Mixed Precision Training, Model Loading
Last Updated	2026-02-07 14:00 GMT

Overview

This training utility module provides checkpoint save/load/resume logic, pretrained weight adaptation (resolution and label-set mapping), mixed-precision gradient scaling, distributed tensor reduction, and custom metrics tracking.

Description

This module provides the core utility functions that the classification training script depends on. Key components include:

Checkpoint management: load_checkpoint, load_ema_checkpoint, and save_checkpoint handle full training state serialization, including model weights, optimizer state, LR scheduler, AMP scaler, EMA model, epoch counter, and maximum accuracy. Old checkpoints are cleaned up automatically based on a configurable retention count.
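The retention behaviour described above can be illustrated with a minimal sketch. The helper name `prune_old_checkpoints`, the `keep_last` parameter, and the choice to order files by modification time are assumptions made for illustration; they are not taken from classification/utils.py, which may select files differently (for example, by epoch number, or by excluding a "best" checkpoint).

```python
import os
import tempfile
from pathlib import Path


def prune_old_checkpoints(output_dir, keep_last=3, suffix=".pth"):
    """Delete all but the `keep_last` most recently modified checkpoint files.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    checkpoints = sorted(
        Path(output_dir).glob(f"*{suffix}"),
        key=lambda p: p.stat().st_mtime,
    )
    # Everything except the newest `keep_last` files is considered stale.
    stale = checkpoints[:-keep_last] if keep_last > 0 else checkpoints
    for path in stale:
        path.unlink()
    return [p.name for p in stale]


with tempfile.TemporaryDirectory() as tmp:
    for epoch in range(5):
        path = Path(tmp) / f"ckpt_epoch_{epoch}.pth"
        path.write_bytes(b"")
        # Explicit, increasing modification times keep the ordering deterministic.
        os.utime(path, (1000 + epoch, 1000 + epoch))
    removed = prune_old_checkpoints(tmp, keep_last=2)
    remaining = sorted(p.name for p in Path(tmp).glob("*.pth"))
```

In this toy run the three oldest files are removed and the two newest remain.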

Pretrained weight adaptation: load_pretrained adapts upstream weights to the target model. It strips student/teacher prefixes from distillation checkpoints, remaps SIM-format keys, applies bicubic interpolation to relative position bias tables and absolute position embeddings when the pretrained and target resolutions differ, and maps an ImageNet-22K classifier head to ImageNet-1K.
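Two of these adaptation steps, prefix stripping and classifier-head remapping, can be sketched on plain Python data. The key prefixes ("teacher.", "student."), the function names, and the index map below are hypothetical and chosen only for illustration; the actual key layout and mapping used by load_pretrained are not shown on this page. The interpolation step is omitted here because it depends on tensor operations.

```python
def strip_prefix(state_dict, prefix):
    """Keep only keys that start with `prefix`, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


def select_head_rows(weight_rows, class_map):
    """Pick the rows of a larger classifier head that match the target labels.

    `class_map[i]` is the source-label index for target class `i`.
    """
    return [weight_rows[src] for src in class_map]


# Toy checkpoint with an assumed "teacher." / "student." key layout.
checkpoint = {
    "teacher.patch_embed.weight": "T0",
    "student.patch_embed.weight": "S0",
    "student.head.bias": "S1",
}
student_weights = strip_prefix(checkpoint, "student.")

# Toy 5-class head reduced to 3 classes through an illustrative index map.
head_22k = [[0.0], [0.1], [0.2], [0.3], [0.4]]
head_1k = select_head_rows(head_22k, class_map=[4, 0, 2])
```

The same row-selection idea applies to the head bias, which would be indexed with the same map.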

Mixed-precision support: NativeScalerWithGradNormCount wraps torch.cuda.amp.GradScaler with gradient clipping and norm computation integrated into the scale-unscale-step cycle.
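The norm computation and clipping that sit between the unscale and step phases can be illustrated without any tensor library. The sketch below works on nested lists of floats and mirrors the common "norm of per-tensor norms" formulation with a single shared clipping coefficient; the function names are hypothetical, and it is an arithmetic illustration under those assumptions rather than the module's implementation.

```python
def total_grad_norm(grads, norm_type=2.0):
    """Norm over all gradient values, computed as a norm of per-tensor norms."""
    per_tensor = [
        sum(abs(g) ** norm_type for g in grad) ** (1.0 / norm_type)
        for grad in grads
    ]
    return sum(n ** norm_type for n in per_tensor) ** (1.0 / norm_type)


def clip_by_total_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients by one shared factor so the total norm is at most max_norm."""
    norm = total_grad_norm(grads)
    # The coefficient is capped at 1.0 so small gradients are left unchanged.
    coef = min(1.0, max_norm / (norm + eps))
    return [[g * coef for g in grad] for grad in grads], norm


# Two toy "parameter gradients" whose combined L2 norm is 5.
grads = [[3.0, 0.0], [0.0, 4.0]]
clipped, norm_before = clip_by_total_norm(grads, max_norm=1.0)
norm_after = total_grad_norm(clipped)
```

Returning the pre-clipping norm alongside the clipped gradients is what makes it possible to log gradient norms during training while still bounding the update size.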

Distributed and resume utilities: reduce_tensor averages a tensor across GPUs with an all-reduce, and auto_resume_helper finds the latest checkpoint by file modification time.
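Selecting the latest checkpoint by modification time can be sketched as follows. The function name `find_latest_checkpoint` and the `.pth` suffix filter are assumptions for illustration; the actual auto_resume_helper may filter or log differently.

```python
import os
import tempfile


def find_latest_checkpoint(output_dir, suffix=".pth"):
    """Return the path of the most recently modified checkpoint, or None."""
    candidates = [
        os.path.join(output_dir, name)
        for name in os.listdir(output_dir)
        if name.endswith(suffix)
    ]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


with tempfile.TemporaryDirectory() as tmp:
    empty_result = find_latest_checkpoint(tmp)
    for i, name in enumerate(["ckpt_epoch_9.pth", "ckpt_epoch_10.pth", "log.txt"]):
        path = os.path.join(tmp, name)
        open(path, "wb").close()
        # Explicit modification times keep the example deterministic.
        os.utime(path, (2000 + i, 2000 + i))
    latest_name = os.path.basename(find_latest_checkpoint(tmp))
```

Ordering by modification time avoids a pitfall of sorting by filename: lexically, "ckpt_epoch_9.pth" sorts after "ckpt_epoch_10.pth" even though it is the older checkpoint.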

Metrics: MyAverageMeter tracks running mean and standard deviation over a configurable sliding window, used for gradient norm monitoring during training.
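A sliding-window meter of this kind can be sketched with a bounded deque. The class below is a hypothetical stand-in, not the repository's MyAverageMeter: treating a non-positive `max_len` as "unbounded" and using the population standard deviation are both assumptions, since this page does not specify either.

```python
from collections import deque
from statistics import fmean, pstdev


class SlidingWindowMeter:
    """Hypothetical stand-in for MyAverageMeter; not the repository's code."""

    def __init__(self, max_len=-1):
        # Assumption: a non-positive max_len means "keep every value".
        self.values = deque(maxlen=max_len if max_len > 0 else None)

    def update(self, val):
        # Once the window is full, the oldest value is dropped automatically.
        self.values.append(float(val))

    @property
    def avg(self):
        return fmean(self.values) if self.values else 0.0

    @property
    def std(self):
        # Population standard deviation over the current window (assumed form).
        return pstdev(self.values) if self.values else 0.0


meter = SlidingWindowMeter(max_len=3)
for grad_norm in [10.0, 1.0, 2.0, 3.0]:
    meter.update(grad_norm)
```

With a window of three, the initial outlier (10.0) has already been evicted by the fourth update, so the statistics reflect only the recent values 1.0, 2.0, and 3.0.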

Usage

Use the functions in this module from the main classification training script. Import save_checkpoint and load_checkpoint for training state persistence, load_pretrained for fine-tuning from various upstream weight sources, and NativeScalerWithGradNormCount for mixed-precision training with gradient norm tracking.

Code Reference

Source Location

Repository: OpenGVLab_InternVL
File: classification/utils.py
Lines: 1-408

Signature

def load_checkpoint(config, model, optimizer, lr_scheduler, scaler, logger):
    ...

def load_ema_checkpoint(config, model_ema, logger):
    ...

def load_pretrained(config, model, logger):
    ...

def save_checkpoint(config, epoch, model, max_accuracy, optimizer,
                    lr_scheduler, scaler, logger, model_ema=None,
                    max_accuracy_ema=None, ema_decay=None, best=None):
    ...

class NativeScalerWithGradNormCount:
    def __call__(self, loss, optimizer, clip_grad=None, parameters=None,
                 create_graph=False, update_grad=True):
        ...

def auto_resume_helper(output_dir):
    ...

def reduce_tensor(tensor):
    ...

def get_grad_norm(parameters, norm_type=2):
    ...

class MyAverageMeter:
    def __init__(self, max_len=-1):
        ...
    def update(self, val):
        ...

Import

from utils import MyAverageMeter
from utils import NativeScalerWithGradNormCount as NativeScaler
from utils import (auto_resume_helper, get_grad_norm, load_checkpoint,
                   load_ema_checkpoint, load_pretrained, reduce_tensor,
                   save_checkpoint)

I/O Contract

Inputs

Name	Type	Required	Description
config	CfgNode	Yes	Configuration object with model/training parameters
model	nn.Module	Yes	The model whose state is being saved/loaded
optimizer	Optimizer	Yes	Optimizer whose state is serialized with checkpoints
lr_scheduler	LRScheduler	Yes	Learning rate scheduler state
scaler	GradScaler	No	AMP loss scaler state (when AMP is enabled)
logger	Logger	Yes	Logger for status messages

Outputs

Name	Type	Description
max_accuracy	float	Best validation accuracy loaded from checkpoint (from load_checkpoint)
checkpoint file	.pth	Serialized training state dictionary (from save_checkpoint)
resume_file	str or None	Path to latest checkpoint file (from auto_resume_helper)
grad_norm	float	L2 norm of gradients (from NativeScalerWithGradNormCount or get_grad_norm)

Usage Examples

Basic Usage

from utils import save_checkpoint, load_checkpoint, NativeScalerWithGradNormCount

# config, model, optimizer, lr_scheduler, logger, model_ema, and epoch are
# assumed to be created earlier by the training script.

# Initialize loss scaler for mixed-precision training
loss_scaler = NativeScalerWithGradNormCount()

# Resume from checkpoint; returns the best accuracy stored in the checkpoint
max_accuracy = load_checkpoint(config, model, optimizer, lr_scheduler, loss_scaler, logger)

# Save checkpoint after training epoch
save_checkpoint(config, epoch, model, max_accuracy, optimizer, lr_scheduler,
                loss_scaler, logger, model_ema=model_ema, best='best')
