Implementation:OpenGVLab InternVL Optimizer Builder

Knowledge Sources	OpenGVLab_InternVL
Domains	Optimization, Training, Classification
Last Updated	2026-02-07 14:00 GMT

Overview

Builds a PyTorch optimizer (AdamW, SGD, or an SGD variant for linear probing) over parameter groups that each carry their own weight decay and learning rate. Layer-wise LR decay and ZeRO (ZeroRedundancyOptimizer) sharding of optimizer state are optional.

Description

The build_optimizer function first calls set_weight_decay_and_lr to create a list of parameter groups with customized weight decay and learning rates:

Weight decay exclusions -- Parameters that are one-dimensional (such as normalization weights), whose names end with .bias, or that appear in the model's no_weight_decay() skip list receive zero weight decay.
Layer-wise LR decay -- When enabled, the model's lr_decay_keywords method returns per-layer LR ratios; each parameter's learning rate is the base LR multiplied by the ratio of its layer.
DCN LR multiplier -- Deformable convolution parameters (offset, attention_weights, center_feature_scale_proj, alpha_beta) can receive a separate LR multiplier.
Backbone freezing -- Parameters in the specified backbone levels can have requires_grad set to False.
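
The grouping rules above can be sketched in a few lines. This is a minimal illustration, not the code in classification/optimizer.py: group_parameters and FakeParam are hypothetical names, one group per parameter is a simplification, substring matching for the LR ratios and skipping of frozen parameters are assumptions, and the DCN LR multiplier is left out.

```python
from dataclasses import dataclass


@dataclass
class FakeParam:
    """Hypothetical stand-in for torch.nn.Parameter (only the fields used below)."""
    shape: tuple
    requires_grad: bool = True


def group_parameters(named_params, weight_decay, base_lr, skip_list=(), lr_ratios=None):
    """Return one parameter group per trainable parameter.

    named_params: iterable of (name, param) pairs, e.g. model.named_parameters().
    lr_ratios: optional {keyword: ratio} mapping (assumed substring match on the name).
    """
    lr_ratios = lr_ratios or {}
    groups = []
    for name, param in named_params:
        if not param.requires_grad:
            continue  # assumption: frozen parameters are not handed to the optimizer
        # Zero weight decay for 1D tensors, biases, and names in the skip list.
        no_decay = len(param.shape) == 1 or name.endswith(".bias") or name in skip_list
        # First matching keyword wins; parameters without a match keep the base LR.
        ratio = next((r for key, r in lr_ratios.items() if key in name), 1.0)
        groups.append({
            "params": [param],
            "name": name,
            "weight_decay": 0.0 if no_decay else weight_decay,
            "lr": base_lr * ratio,
        })
    return groups


params = [
    ("layers.0.weight", FakeParam((8, 4))),
    ("layers.0.bias", FakeParam((8,))),
    ("norm.weight", FakeParam((8,))),
    ("pos_embed", FakeParam((1, 16, 8))),
    ("head.weight", FakeParam((2, 8), requires_grad=False)),
]
groups = group_parameters(
    params, weight_decay=0.05, base_lr=1e-3,
    skip_list={"pos_embed"}, lr_ratios={"layers.0": 0.5},
)
for g in groups:
    print(g["name"], g["weight_decay"], g["lr"])
```

Because the function only reads name, shape, and requires_grad, the same sketch should also accept the pairs yielded by a real model's named_parameters().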

After parameter grouping, the function constructs the optimizer:

AdamW (with the configured eps and betas) or SGD (with the configured momentum)
sgd_linear_probing variant: SGD with momentum=0.9, no Nesterov momentum, and zero weight decay
ZeroRedundancyOptimizer wrapping for memory-efficient distributed training, with a workaround for a pre-PyTorch-1.12 API limitation
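
The selection step can be illustrated with plain torch.optim classes. This is a hedged sketch rather than the repository's code: make_optimizer and its argument names are hypothetical, the Nesterov setting for plain SGD is not stated on this page and is left at the PyTorch default, and ZeRO wrapping is omitted because it needs an initialized distributed process group.

```python
from torch import nn, optim


def make_optimizer(name, params, base_lr, weight_decay,
                   eps=1e-8, betas=(0.9, 0.999), momentum=0.9):
    """Pick an optimizer by name (hypothetical helper, not the repository's API)."""
    name = name.lower()
    if name == "adamw":
        return optim.AdamW(params, lr=base_lr, weight_decay=weight_decay,
                           eps=eps, betas=betas)
    if name == "sgd":
        # Nesterov is left at the PyTorch default here (an assumption).
        return optim.SGD(params, lr=base_lr, weight_decay=weight_decay,
                         momentum=momentum)
    if name == "sgd_linear_probing":
        # Fixed settings described above: momentum 0.9, no Nesterov, no weight decay.
        return optim.SGD(params, lr=base_lr, momentum=0.9,
                         nesterov=False, weight_decay=0.0)
    raise ValueError(f"unknown optimizer: {name}")


model = nn.Linear(4, 2)
# Per-group keys override the optimizer-wide defaults passed to the constructor.
param_groups = [
    {"params": [model.weight]},
    {"params": [model.bias], "weight_decay": 0.0},
]
adamw = make_optimizer("adamw", param_groups, base_lr=1e-3, weight_decay=0.05)
probe = make_optimizer("sgd_linear_probing", model.parameters(),
                       base_lr=0.1, weight_decay=0.05)
```

The first group inherits the default weight decay of 0.05, while the bias group keeps its explicit 0.0; this is the mechanism that lets per-parameter settings from the grouping step survive optimizer construction.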

Usage

Use this module when building the optimizer for classification training. Call build_optimizer(config, model) to obtain an optimizer with properly configured per-parameter weight decay and LR settings.

Code Reference

Source Location

Repository: OpenGVLab_InternVL
File: classification/optimizer.py
Lines: 1-164

Signature

def build_optimizer(config, model) -> torch.optim.Optimizer: ...

def set_weight_decay_and_lr(
    model, weight_decay, base_lr, skip_list=(), skip_keywords=(),
    lr_layer_decay=None, lr_layer_decay_ratio=None,
    freeze_backbone=None, dcn_lr_mul=None, layerwise_lr=True
) -> list: ...

def check_keywords_in_name(name, keywords=()) -> bool: ...
def check_keywords_in_dict(name, keywords_dict) -> Optional[float]: ...
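
The two helper signatures suggest simple name-matching utilities. A plausible reading, assuming substring matching (the bodies below are inferred from the signatures and are not copied from the source file):

```python
from typing import Optional


def check_keywords_in_name(name, keywords=()) -> bool:
    """Return True if any keyword occurs as a substring of the parameter name."""
    return any(keyword in name for keyword in keywords)


def check_keywords_in_dict(name, keywords_dict) -> Optional[float]:
    """Return the value of the first keyword found in the name, else None."""
    for keyword, value in keywords_dict.items():
        if keyword in name:
            return value
    return None


in_skip = check_keywords_in_name("blocks.3.norm1.weight", ("norm", "pos_embed"))
ratio = check_keywords_in_dict("levels.1.blocks.0.mlp.fc1.weight", {"levels.1": 0.8})
missing = check_keywords_in_dict("head.weight", {"levels.1": 0.8})
```

Under this reading, the first helper would decide weight-decay exclusions by keyword and the second would look up a per-layer LR ratio, with None meaning that no ratio applies.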

Import

from classification.optimizer import build_optimizer

I/O Contract

Inputs

Name	Type	Required	Description
config	CfgNode	Yes	Configuration object providing TRAIN.WEIGHT_DECAY, TRAIN.BASE_LR, the TRAIN.OPTIMIZER settings, and the TRAIN.LR_LAYER_DECAY settings
model	nn.Module	Yes	The model whose parameters will be optimized; may provide no_weight_decay() and lr_decay_keywords() methods

Outputs

Name	Type	Description
optimizer	torch.optim.Optimizer	An optimizer instance (AdamW, SGD, or ZeroRedundancyOptimizer) with per-parameter groups

Usage Examples

Basic Usage

from classification.optimizer import build_optimizer

optimizer = build_optimizer(config, model)

With ZeRO

# Enable ZeRO in config
# (ZeroRedundancyOptimizer requires torch.distributed to be initialized first)
config.TRAIN.OPTIMIZER.USE_ZERO = True
optimizer = build_optimizer(config, model)
# optimizer is a ZeroRedundancyOptimizer wrapping AdamW or SGD

Related Pages

Principle:OpenGVLab_InternVL_Optimizer_Construction
