

Implementation:OpenGVLab InternVL Classification Utils

From Leeroopedia
Revision as of 16:14, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/OpenGVLab_InternVL_Classification_Utils.md)


Knowledge Sources
Domains: Checkpoint Management, Mixed Precision Training, Model Loading
Last Updated: 2026-02-07 14:00 GMT

Overview

Training utility module providing checkpoint save/load/resume logic, pretrained weight adaptation with resolution and label-set mapping, mixed-precision gradient scaling, distributed tensor reduction, and custom metrics tracking.

Description

This module provides the backbone utility functions that the classification training script depends on. Key components include:

Checkpoint management: load_checkpoint, load_ema_checkpoint, and save_checkpoint handle serialization of the full training state, including model weights, optimizer state, LR scheduler, AMP scaler, EMA model, epoch counter, and maximum accuracy. Old checkpoints are cleaned up automatically according to a configurable retention count.

Pretrained weight adaptation: load_pretrained adapts upstream weights to the target model by stripping student/teacher prefixes from distillation checkpoints, remapping SIM-format keys, bicubically interpolating relative position bias tables and absolute position embeddings when the pretrained and target resolutions differ, and mapping an ImageNet-22K classifier head to ImageNet-1K.

Mixed-precision support: NativeScalerWithGradNormCount wraps torch.cuda.amp.GradScaler with gradient clipping and norm computation integrated into the scale-unscale-step cycle.

Distributed utilities: reduce_tensor performs all-reduce averaging across GPUs, and auto_resume_helper finds the latest checkpoint by file modification time.

Metrics: MyAverageMeter tracks running mean and standard deviation over a configurable sliding window, used for gradient norm monitoring during training.
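The sliding-window statistic described for MyAverageMeter can be sketched in plain Python. This is a minimal illustration rather than the module's code: the class name WindowMeter and its attributes are hypothetical, and it assumes that max_len=-1 means an unbounded window and that the standard deviation is the population form.

```python
import math
from collections import deque


class WindowMeter:
    """Hypothetical stand-in for a sliding-window mean/std tracker."""

    def __init__(self, max_len=-1):
        # Assumption: max_len <= 0 means "keep every value".
        self.values = deque(maxlen=max_len if max_len > 0 else None)
        self.avg = 0.0
        self.std = 0.0

    def update(self, val):
        # Appending to a bounded deque drops the oldest value automatically.
        self.values.append(float(val))
        n = len(self.values)
        self.avg = sum(self.values) / n
        # Population standard deviation over the current window.
        self.std = math.sqrt(sum((v - self.avg) ** 2 for v in self.values) / n)


# Track four gradient norms with a window of three: the first value falls out.
meter = WindowMeter(max_len=3)
for grad_norm in [1.0, 2.0, 3.0, 4.0]:
    meter.update(grad_norm)
```

A bounded window keeps the statistic responsive to recent training behaviour, which is useful when gradient norms drift over the course of a run.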

Usage

Use the functions in this module from the main classification training script. Import save_checkpoint and load_checkpoint for training state persistence, load_pretrained for fine-tuning from various upstream weight sources, and NativeScalerWithGradNormCount for mixed-precision training with gradient norm tracking.
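Part of what makes weights from different upstream sources loadable is key renaming, such as the prefix stripping mentioned in the description. The sketch below shows the general idea on a plain dictionary. The helper name strip_prefix, the "teacher." prefix string, and the toy keys are assumptions for illustration; the actual prefixes handled by load_pretrained are not given on this page.

```python
def strip_prefix(state_dict, prefix):
    """Return a new dict with `prefix` removed from every key that starts with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Toy "checkpoint": plain floats stand in for weight tensors (hypothetical keys).
ckpt = {"teacher.patch_embed.weight": 0.1, "teacher.head.bias": 0.2, "epoch": 5}
cleaned = strip_prefix(ckpt, "teacher.")
```

Keys without the prefix pass through unchanged, so the same helper can be applied safely to checkpoints that were never wrapped.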

Code Reference

Source Location

Signature

def load_checkpoint(config, model, optimizer, lr_scheduler, scaler, logger):
    ...

def load_ema_checkpoint(config, model_ema, logger):
    ...

def load_pretrained(config, model, logger):
    ...

def save_checkpoint(config, epoch, model, max_accuracy, optimizer,
                    lr_scheduler, scaler, logger, model_ema=None,
                    max_accuracy_ema=None, ema_decay=None, best=None):
    ...

class NativeScalerWithGradNormCount:
    def __call__(self, loss, optimizer, clip_grad=None, parameters=None,
                 create_graph=False, update_grad=True):
        ...

def auto_resume_helper(output_dir):
    ...

def reduce_tensor(tensor):
    ...

def get_grad_norm(parameters, norm_type=2):
    ...

class MyAverageMeter:
    def __init__(self, max_len=-1):
        ...
    def update(self, val):
        ...
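auto_resume_helper, listed among the signatures above, is described as finding the latest checkpoint by file modification time. A minimal sketch of that selection rule follows, using only the standard library. The function name latest_checkpoint, the suffix parameter, and the file names in the demonstration are hypothetical.

```python
import os
import tempfile
import time


def latest_checkpoint(output_dir, suffix=".pth"):
    """Return the most recently modified checkpoint path, or None if none exist."""
    candidates = [
        os.path.join(output_dir, name)
        for name in os.listdir(output_dir)
        if name.endswith(suffix)
    ]
    if not candidates:
        return None
    return max(candidates, key=os.path.getmtime)


# Demonstration in a throwaway directory with explicit modification times.
with tempfile.TemporaryDirectory() as tmp:
    empty_result = latest_checkpoint(tmp)  # no checkpoints yet
    now = time.time()
    for offset, name in enumerate(["run_a.pth", "run_b.pth"]):
        path = os.path.join(tmp, name)
        open(path, "wb").close()
        os.utime(path, (now + offset, now + offset))
    newest = os.path.basename(latest_checkpoint(tmp))
```

Selecting by modification time rather than by file name means the result does not depend on any particular naming scheme for epochs.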

Import

from utils import MyAverageMeter
from utils import NativeScalerWithGradNormCount as NativeScaler
from utils import (auto_resume_helper, get_grad_norm, load_checkpoint,
                   load_ema_checkpoint, load_pretrained, reduce_tensor,
                   save_checkpoint)

I/O Contract

Inputs

Name | Type | Required | Description
config | CfgNode | Yes | Configuration object with model/training parameters
model | nn.Module | Yes | The model whose state is being saved/loaded
optimizer | Optimizer | Yes | Optimizer whose state is serialized with checkpoints
lr_scheduler | LRScheduler | Yes | Learning rate scheduler state
scaler | GradScaler | No | AMP loss scaler state (when AMP is enabled)
logger | Logger | Yes | Logger for status messages

Outputs

Name | Type | Description
max_accuracy | float | Best validation accuracy loaded from checkpoint (from load_checkpoint)
checkpoint file | .pth | Serialized training state dictionary (from save_checkpoint)
resume_file | str or None | Path to latest checkpoint file (from auto_resume_helper)
grad_norm | float | L2 norm of gradients (from NativeScalerWithGradNormCount or get_grad_norm)
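To make the grad_norm output concrete, the sketch below computes a combined p-norm over several gradient vectors. It is an illustration under stated assumptions: plain Python lists stand in for gradient tensors, the name total_grad_norm is hypothetical, and skipping entries without a gradient is assumed behaviour rather than something this page confirms for get_grad_norm.

```python
def total_grad_norm(grads, norm_type=2.0):
    """Combined p-norm over several gradient vectors (lists of floats)."""
    norm_type = float(norm_type)
    total = 0.0
    for grad in grads:
        if grad is None:  # assumption: parameters without gradients are skipped
            continue
        total += sum(abs(g) ** norm_type for g in grad)
    return total ** (1.0 / norm_type)


# Two toy "parameters" with gradients [3] and [0, 4]; the third has no gradient.
norm = total_grad_norm([[3.0], [0.0, 4.0], None])
```

With the default norm_type of 2 this is the Euclidean length of all gradient entries taken together, which is the quantity typically compared against a clipping threshold.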

Usage Examples

Basic Usage

from utils import save_checkpoint, load_checkpoint, NativeScalerWithGradNormCount

# config, model, optimizer, lr_scheduler, logger, model_ema and epoch are
# assumed to have been created earlier by the training script.

# Initialize the loss scaler for mixed-precision training
loss_scaler = NativeScalerWithGradNormCount()

# Resume training state; the return value is the best accuracy stored in the checkpoint
max_accuracy = load_checkpoint(config, model, optimizer, lr_scheduler, loss_scaler, logger)

# Save a checkpoint after a training epoch
save_checkpoint(config, epoch, model, max_accuracy, optimizer, lr_scheduler,
                loss_scaler, logger, model_ema=model_ema, best='best')
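The description notes that old checkpoints are cleaned up according to a configurable retention count, which is not visible in the calls above. One way such pruning could work is sketched here with the standard library only. The names prune_checkpoints and keep, the suffix filter, and the file names are hypothetical, and ordering by modification time is an assumption about how "old" is decided.

```python
import os
import tempfile
import time


def prune_checkpoints(output_dir, keep, suffix=".pth"):
    """Delete all but the `keep` most recently modified checkpoint files."""
    keep = max(keep, 0)
    paths = sorted(
        (os.path.join(output_dir, n) for n in os.listdir(output_dir) if n.endswith(suffix)),
        key=os.path.getmtime,
        reverse=True,  # newest first
    )
    removed = paths[keep:]
    for path in removed:
        os.remove(path)
    return [os.path.basename(p) for p in removed]


# Demonstration: four files with increasing modification times, keep the newest two.
with tempfile.TemporaryDirectory() as tmp:
    now = time.time()
    for i in range(4):
        path = os.path.join(tmp, f"epoch_{i}.pth")
        open(path, "wb").close()
        os.utime(path, (now + i, now + i))
    removed = prune_checkpoints(tmp, keep=2)
    remaining = sorted(os.listdir(tmp))
```

Returning the removed file names makes the cleanup easy to log, which helps when a checkpoint expected for resuming turns out to be missing.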

Related Pages

Page Connections
