Implementation:OpenGVLab InternVL Optimizer Builder
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Training, Classification |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Builds an SGD, AdamW, or linear-probing SGD optimizer with per-parameter weight decay and learning-rate settings. ZeRO sharding of optimizer state and layer-wise LR decay are optional.
Description
The build_optimizer function first calls set_weight_decay_and_lr to create a list of parameter groups with customized weight decay and learning rates:
- Weight decay exclusions -- One-dimensional parameters (such as normalization weights), parameters whose names end with .bias, and parameters in the model's no_weight_decay() skip list receive zero weight decay.
- Layer-wise LR decay -- When enabled, the model's lr_decay_keywords method returns per-layer LR ratios, and each parameter's learning rate is the base LR multiplied by the ratio for its layer.
- DCN LR multiplier -- Deformable convolution parameters (offset, attention_weights, center_feature_scale_proj, alpha_beta) can receive a separate LR multiplier.
- Backbone freezing -- Specified backbone levels can have their requires_grad set to False.
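The grouping rules above can be sketched in plain Python. This is a simplified illustration, not the code in classification/optimizer.py: parameters are represented as hypothetical (name, shape) pairs instead of tensors, build_groups and DCN_KEYWORDS are made-up names, one group is emitted per parameter, and stacking the DCN multiplier on top of the layer-wise ratio is an assumption.

```python
# Hypothetical stand-in: each parameter is a (name, shape) pair, not a tensor.
DCN_KEYWORDS = ("offset", "attention_weights",
                "center_feature_scale_proj", "alpha_beta")


def build_groups(named_shapes, weight_decay, base_lr,
                 skip_list=(), lr_ratios=None, dcn_lr_mul=None):
    """Return one optimizer-style group dict per parameter."""
    groups = []
    for name, shape in named_shapes:
        # 1D parameters, biases, and skip-listed names get no weight decay.
        no_decay = (len(shape) == 1 or name.endswith(".bias")
                    or name in skip_list)
        lr = base_lr
        # Layer-wise decay: scale by the ratio of the first matching keyword.
        for keyword, ratio in (lr_ratios or {}).items():
            if keyword in name:
                lr *= ratio
                break
        # Assumption: the DCN multiplier stacks on the layer-wise ratio.
        if dcn_lr_mul is not None and any(k in name for k in DCN_KEYWORDS):
            lr *= dcn_lr_mul
        groups.append({
            "name": name,
            "weight_decay": 0.0 if no_decay else weight_decay,
            "lr": lr,
        })
    return groups


params = [
    ("levels.0.blocks.0.mlp.fc1.weight", (256, 64)),
    ("levels.0.blocks.0.mlp.fc1.bias", (256,)),
    ("levels.0.blocks.0.norm1.weight", (64,)),
    ("levels.0.blocks.0.dcn.offset.weight", (18, 64)),
]
groups = build_groups(params, weight_decay=0.05, base_lr=1.0,
                      lr_ratios={"levels.0.blocks.0.": 0.5}, dcn_lr_mul=0.1)
for group in groups:
    print(group)
```

In this toy run only the two 2D weights keep the 0.05 weight decay, and the DCN offset weight ends up with the smallest learning rate because both multipliers apply to it.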
After parameter grouping, the function constructs the optimizer:
- AdamW or SGD with the configured hyperparameters (eps and betas for AdamW, momentum for SGD)
- sgd_linear_probing variant: SGD with momentum=0.9, nesterov disabled, and zero weight decay
- ZeroRedundancyOptimizer wrapping for memory-efficient distributed training, with a workaround for PyTorch versions before 1.12, in which the ZeroRedundancyOptimizer constructor does not accept parameter groups
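The choice between the optimizer variants can be pictured as a small dispatch on the configured name. The sketch below is an assumption-laden illustration rather than the actual function body: optimizer_spec is a hypothetical helper, it returns the class name and keyword arguments instead of constructing a torch optimizer, the eps and betas defaults are placeholders, and rejecting unknown names is a guess at the behavior.

```python
def optimizer_spec(name, base_lr, weight_decay,
                   eps=1e-8, betas=(0.9, 0.999), momentum=0.9):
    """Return (optimizer class name, keyword arguments) for a config name."""
    name = name.lower()
    if name == "adamw":
        return "AdamW", {"lr": base_lr, "weight_decay": weight_decay,
                         "eps": eps, "betas": betas}
    if name == "sgd":
        return "SGD", {"lr": base_lr, "weight_decay": weight_decay,
                       "momentum": momentum}
    if name == "sgd_linear_probing":
        # Fixed settings from the description above:
        # momentum 0.9, no Nesterov, zero weight decay.
        return "SGD", {"lr": base_lr, "weight_decay": 0.0,
                       "momentum": 0.9, "nesterov": False}
    # Assumption: unknown names are rejected; the real function may differ.
    raise ValueError(f"unsupported optimizer: {name}")


cls_name, kwargs = optimizer_spec("sgd_linear_probing",
                                  base_lr=0.1, weight_decay=0.05)
print(cls_name, kwargs)
```

With PyTorch installed, such a pair would map onto getattr(torch.optim, cls_name)(param_groups, **kwargs), where param_groups is the list produced by the grouping step.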
Usage
Use this module when building the optimizer for classification training. Call build_optimizer(config, model) to obtain an optimizer with properly configured per-parameter weight decay and LR settings.
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: classification/optimizer.py
- Lines: 1-164
Signature
def build_optimizer(config, model) -> torch.optim.Optimizer: ...
def set_weight_decay_and_lr(
    model, weight_decay, base_lr, skip_list=(), skip_keywords=(),
    lr_layer_decay=None, lr_layer_decay_ratio=None,
    freeze_backbone=None, dcn_lr_mul=None, layerwise_lr=True
) -> list: ...
def check_keywords_in_name(name, keywords=()) -> bool: ...
def check_keywords_in_dict(name, keywords_dict) -> Optional[float]: ...
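The two helper signatures suggest simple name-matching utilities. A minimal sketch consistent with those signatures follows; it assumes that matching means substring containment and that the first matching key wins, neither of which is confirmed by this page.

```python
from typing import Dict, Optional, Sequence


def check_keywords_in_name(name: str, keywords: Sequence[str] = ()) -> bool:
    """True if any keyword occurs as a substring of the parameter name."""
    return any(keyword in name for keyword in keywords)


def check_keywords_in_dict(name: str,
                           keywords_dict: Dict[str, float]) -> Optional[float]:
    """Return the value of the first key found in the name, else None."""
    for keyword, value in keywords_dict.items():
        if keyword in name:
            return value
    return None


print(check_keywords_in_name("blocks.0.norm1.weight", ("norm",)))  # True
print(check_keywords_in_dict("levels.1.blocks.0.mlp.fc1.weight",
                             {"levels.1.": 0.8}))                  # 0.8
print(check_keywords_in_dict("head.weight", {"levels.1.": 0.8}))   # None
```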
Import
from classification.optimizer import build_optimizer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | CfgNode | Yes | Configuration object; the function reads TRAIN.WEIGHT_DECAY, TRAIN.BASE_LR, the TRAIN.LR_LAYER_DECAY settings, and the TRAIN.OPTIMIZER settings (including USE_ZERO) |
| model | nn.Module | Yes | The model whose parameters will be optimized; may provide no_weight_decay() and lr_decay_keywords() methods |
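To make the optional model hooks concrete, here is a hypothetical stand-in that exposes both methods. TinyBackbone, the parameter names in the skip list, the "blocks.{i}." keyword format, the decay_ratio argument, and the geometric decay scheme are all assumptions for illustration; the real models may use different keys and ratios.

```python
class TinyBackbone:
    """Hypothetical model stand-in exposing the two optional hooks."""

    def __init__(self, depth=3):
        self.depth = depth

    def no_weight_decay(self):
        # Parameter names that should never receive weight decay.
        return {"pos_embed", "cls_token"}

    def lr_decay_keywords(self, decay_ratio=0.9):
        # Assumed scheme: the last block keeps the full LR and each earlier
        # block is scaled by one more factor of decay_ratio.
        return {
            f"blocks.{i}.": decay_ratio ** (self.depth - 1 - i)
            for i in range(self.depth)
        }


model = TinyBackbone(depth=3)
# Both hooks are optional, so a caller would typically probe for them first.
skip_list = model.no_weight_decay() if hasattr(model, "no_weight_decay") else set()
ratios = model.lr_decay_keywords(decay_ratio=0.5)
print(skip_list)
print(ratios)
```

Under this scheme, earlier blocks train with smaller learning rates than later ones, which is the usual motivation for layer-wise decay when fine-tuning.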
Outputs
| Name | Type | Description |
|---|---|---|
| optimizer | torch.optim.Optimizer | An optimizer instance (AdamW, SGD, or ZeroRedundancyOptimizer) with per-parameter groups |
Usage Examples
Basic Usage
from classification.optimizer import build_optimizer
optimizer = build_optimizer(config, model)
With ZeRO
# Enable ZeRO in config
config.TRAIN.OPTIMIZER.USE_ZERO = True
optimizer = build_optimizer(config, model)
# optimizer is a ZeroRedundancyOptimizer wrapping AdamW or SGD