Implementation:OpenGVLab InternVL Classification Utils
| Knowledge Sources | |
|---|---|
| Domains | Checkpoint Management, Mixed Precision Training, Model Loading |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Training utility module providing checkpoint save/load/resume logic, pretrained weight adaptation with resolution and label-set mapping, mixed-precision gradient scaling, distributed tensor reduction, and custom metrics tracking.
Description
This module provides the core utility functions that the classification training script depends on. Key components include:
- Checkpoint management: load_checkpoint, load_ema_checkpoint, and save_checkpoint serialize and restore the full training state: model weights, optimizer state, LR scheduler, AMP scaler, EMA model, epoch counter, and maximum accuracy. Old checkpoints are cleaned up automatically according to a configurable retention count.
- Pretrained weight adaptation: load_pretrained adapts upstream weights to the target model. It strips student/teacher prefixes from distillation checkpoints, remaps SIM-format keys, bicubically interpolates relative position bias tables and absolute position embeddings when the pretrained and target resolutions differ, and maps an ImageNet-22K classifier head to the 1K label set.
- Mixed-precision support: NativeScalerWithGradNormCount wraps torch.cuda.amp.GradScaler and integrates gradient clipping and gradient norm computation into the scale-unscale-step cycle.
- Distributed utilities: reduce_tensor averages a tensor across GPUs with an all-reduce, and auto_resume_helper finds the latest checkpoint by file modification time.
- Metrics: MyAverageMeter tracks the running mean and standard deviation over a configurable sliding window and is used to monitor gradient norms during training.
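The sliding-window statistics described for MyAverageMeter can be sketched with the standard library alone. This is a minimal illustration, not the module's implementation: the class name SlidingWindowMeter, the attribute names, the treatment of max_len <= 0 as an unbounded window, and the skipping of non-finite values are all assumptions.

```python
import math
from collections import deque


class SlidingWindowMeter:
    """Hypothetical stand-in for MyAverageMeter: mean/std over the last max_len values."""

    def __init__(self, max_len=-1):
        # Assumption: max_len <= 0 means "keep every value" (unbounded window).
        self.values = deque(maxlen=max_len if max_len > 0 else None)
        self.avg = 0.0
        self.std = 0.0

    def update(self, val):
        val = float(val)
        # Assumption: NaN/inf values (e.g. from an overflowed AMP step) are skipped.
        if math.isfinite(val):
            self.values.append(val)
        if not self.values:
            return
        n = len(self.values)
        self.avg = sum(self.values) / n
        # Population standard deviation over the current window.
        self.std = math.sqrt(sum((v - self.avg) ** 2 for v in self.values) / n)


meter = SlidingWindowMeter(max_len=3)
for grad_norm in [1.0, 2.0, float("nan"), 3.0, 4.0]:
    meter.update(grad_norm)
print(list(meter.values), meter.avg)
```

With a window of 3, only the last three finite values contribute, so a spike early in training stops influencing the reported statistics once it leaves the window.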
Usage
Use the functions in this module from the main classification training script. Import save_checkpoint and load_checkpoint for training state persistence, load_pretrained for fine-tuning from various upstream weight sources, and NativeScalerWithGradNormCount for mixed-precision training with gradient norm tracking.
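Two of the adaptations that load_pretrained is described as performing, prefix stripping and classifier head remapping, can be illustrated on plain dictionaries. The sketch below is framework-free and makes several assumptions: the helper names strip_prefix and remap_head, the "student." prefix string, the key names, and the index map are hypothetical, and nested lists stand in for weight tensors.

```python
def strip_prefix(state_dict, prefix):
    """Return a copy in which keys starting with prefix have it removed."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


def remap_head(head_rows, class_map):
    """Keep only the classifier rows listed in class_map, in target-class order."""
    return [head_rows[source_index] for source_index in class_map]


# Toy checkpoint: plain lists stand in for weight tensors.
checkpoint = {
    "student.patch_embed.weight": [0.1, 0.2],
    "student.head.weight": [[1.0], [2.0], [3.0], [4.0]],  # 4 source classes
}
adapted = strip_prefix(checkpoint, "student.")
# Hypothetical mapping: target class i takes source row class_map[i].
adapted["head.weight"] = remap_head(adapted["head.weight"], class_map=[2, 0])
print(sorted(adapted))
print(adapted["head.weight"])
```

The same row-selection idea applies when a larger label set (such as ImageNet-22K) is reduced to a subset: each target class picks the row of its corresponding source class.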
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: classification/utils.py
- Lines: 1-408
Signature
def load_checkpoint(config, model, optimizer, lr_scheduler, scaler, logger):
    ...

def load_ema_checkpoint(config, model_ema, logger):
    ...

def load_pretrained(config, model, logger):
    ...

def save_checkpoint(config, epoch, model, max_accuracy, optimizer,
                    lr_scheduler, scaler, logger, model_ema=None,
                    max_accuracy_ema=None, ema_decay=None, best=None):
    ...

class NativeScalerWithGradNormCount:
    def __call__(self, loss, optimizer, clip_grad=None, parameters=None,
                 create_graph=False, update_grad=True):
        ...

def auto_resume_helper(output_dir):
    ...

def reduce_tensor(tensor):
    ...

def get_grad_norm(parameters, norm_type=2):
    ...

class MyAverageMeter:
    def __init__(self, max_len=-1):
        ...

    def update(self, val):
        ...
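To make the norm_type and clip_grad parameters above concrete, here is a framework-free sketch of a global gradient p-norm and norm-based clipping. It is an illustration under stated assumptions, not the module's code: the function names grad_norm and clip_by_norm are hypothetical, and flat lists of floats stand in for gradient tensors. The scaling rule (factor max_norm / (total + 1e-6), capped at 1) follows the convention used by torch.nn.utils.clip_grad_norm_.

```python
import math


def grad_norm(grads, norm_type=2.0):
    """p-norm over all gradient entries; grads is a list of flat float lists."""
    flat = [abs(g) for grad in grads if grad is not None for g in grad]
    if not flat:
        return 0.0
    if math.isinf(norm_type):
        return max(flat)
    return sum(g ** norm_type for g in flat) ** (1.0 / norm_type)


def clip_by_norm(grads, max_norm, norm_type=2.0):
    """Return rescaled copies of grads plus the norm measured before clipping."""
    total = grad_norm(grads, norm_type)
    scale = min(1.0, max_norm / (total + 1e-6))
    return [[g * scale for g in grad] for grad in grads], total


grads = [[3.0], [4.0]]  # two "parameters" with one gradient entry each
clipped, total = clip_by_norm(grads, max_norm=1.0)
print(total, grad_norm(clipped))
```

Returning the pre-clip norm is what makes it useful for monitoring: the logged value reflects the raw gradient magnitude rather than the clipping threshold.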
Import
from utils import MyAverageMeter
from utils import NativeScalerWithGradNormCount as NativeScaler
from utils import (auto_resume_helper, get_grad_norm, load_checkpoint,
                   load_ema_checkpoint, load_pretrained, reduce_tensor,
                   save_checkpoint)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | CfgNode | Yes | Configuration object with model/training parameters |
| model | nn.Module | Yes | The model whose state is being saved/loaded |
| optimizer | Optimizer | Yes | Optimizer whose state is serialized with checkpoints |
| lr_scheduler | LRScheduler | Yes | Learning rate scheduler state |
| scaler | GradScaler | No | AMP loss scaler whose state is saved and restored when AMP is enabled; the usage example passes a NativeScalerWithGradNormCount instance |
| logger | Logger | Yes | Logger for status messages |
Outputs
| Name | Type | Description |
|---|---|---|
| max_accuracy | float | Best validation accuracy loaded from checkpoint (from load_checkpoint) |
| checkpoint file | .pth | Serialized training state dictionary (from save_checkpoint) |
| resume_file | str or None | Path to latest checkpoint file (from auto_resume_helper) |
| grad_norm | float | L2 norm of gradients (from NativeScalerWithGradNormCount or get_grad_norm) |
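The resume_file contract (a path, or None when nothing is found) can be sketched as a search for the most recently modified checkpoint in a directory. This is a minimal sketch, assuming checkpoints are identified by a .pth suffix; the function name find_latest_checkpoint and the ckpt_epoch_N.pth file names are hypothetical.

```python
import os
import tempfile


def find_latest_checkpoint(output_dir, suffix=".pth"):
    """Return the most recently modified checkpoint path, or None if there is none."""
    candidates = [
        os.path.join(output_dir, name)
        for name in os.listdir(output_dir)
        if name.endswith(suffix)
    ]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


with tempfile.TemporaryDirectory() as output_dir:
    empty_result = find_latest_checkpoint(output_dir)
    for epoch, name in enumerate(["ckpt_epoch_0.pth", "ckpt_epoch_1.pth"]):
        path = os.path.join(output_dir, name)
        with open(path, "wb") as handle:
            handle.write(b"placeholder")
        # Explicit modification times keep the ordering independent of timer resolution.
        os.utime(path, (1_000_000 + epoch, 1_000_000 + epoch))
    latest = os.path.basename(find_latest_checkpoint(output_dir))

print(empty_result, latest)
```

Selecting by modification time rather than by parsing the epoch out of the file name means the most recently written file wins even when naming schemes differ between runs.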
Usage Examples
Basic Usage
from utils import save_checkpoint, load_checkpoint, NativeScalerWithGradNormCount

# Initialize loss scaler for mixed-precision training
loss_scaler = NativeScalerWithGradNormCount()

# Resume from checkpoint
max_accuracy = load_checkpoint(config, model, optimizer, lr_scheduler, loss_scaler, logger)

# Save checkpoint after training epoch
save_checkpoint(config, epoch, model, max_accuracy, optimizer, lr_scheduler,
                loss_scaler, logger, model_ema=model_ema, best='best')
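The retention behaviour mentioned in the description (old checkpoints removed according to a configurable count) might look like the following. This is a hedged sketch rather than what save_checkpoint actually does: the function name prune_checkpoints, the keep_last parameter, the protected best.pth file, and the file naming are all assumptions.

```python
import os
import tempfile


def prune_checkpoints(output_dir, keep_last, suffix=".pth", protected=("best.pth",)):
    """Delete all but the keep_last newest checkpoints; never touch protected names."""
    paths = [
        os.path.join(output_dir, name)
        for name in os.listdir(output_dir)
        if name.endswith(suffix) and name not in protected
    ]
    paths.sort(key=os.path.getmtime, reverse=True)  # newest first
    removed = paths[keep_last:]
    for path in removed:
        os.remove(path)
    return sorted(os.path.basename(path) for path in removed)


with tempfile.TemporaryDirectory() as output_dir:
    names = ["ckpt_epoch_0.pth", "ckpt_epoch_1.pth", "ckpt_epoch_2.pth",
             "ckpt_epoch_3.pth", "best.pth"]
    for index, name in enumerate(names):
        path = os.path.join(output_dir, name)
        with open(path, "wb") as handle:
            handle.write(b"placeholder")
        # Explicit modification times make the newest-first ordering deterministic.
        os.utime(path, (1_000_000 + index, 1_000_000 + index))
    removed = prune_checkpoints(output_dir, keep_last=2)
    remaining = sorted(os.listdir(output_dir))

print(removed, remaining)
```

Keeping the best-accuracy file outside the retention count is a design choice in this sketch: it ensures the best model survives even after many later epochs have been pruned.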