Implementation:OpenGVLab InternVL Optimizer Builder

From Leeroopedia
Revision as of 16:15, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/OpenGVLab_InternVL_Optimizer_Builder.md)


Knowledge Sources
Domains: Optimization, Training, Classification
Last Updated: 2026-02-07 14:00 GMT

Overview

Builds an SGD, AdamW, or linear-probing SGD optimizer with per-parameter weight decay and learning rate settings. Layer-wise LR decay and ZeRO (ZeroRedundancyOptimizer) memory optimization are optional.

Description

The build_optimizer function first calls set_weight_decay_and_lr to create a list of parameter groups with customized weight decay and learning rates:

  • Weight decay exclusions -- Parameters with 1D shape (norms), names ending with .bias, or those in the model's no_weight_decay() skip list receive zero weight decay.
  • Layer-wise LR decay -- When enabled, the model's lr_decay_keywords method returns per-layer LR ratios, and each parameter's learning rate is the base LR multiplied by the ratio of its layer.
  • DCN LR multiplier -- Deformable convolution parameters (offset, attention_weights, center_feature_scale_proj, alpha_beta) can receive a separate LR multiplier.
  • Backbone freezing -- Specified backbone levels can have their requires_grad set to False.
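The grouping rules above can be sketched in plain Python. This is a minimal illustration and not the repository's code: `sketch_param_groups` is a hypothetical name, parameters are represented as `(name, ndim)` pairs instead of tensors, and one entry is emitted per parameter. How the layer-wise ratio and the DCN multiplier combine is not stated on this page, so the sketch assumes they are multiplied together.

```python
# Keywords that mark deformable-convolution parameters (taken from the list above).
DCN_KEYWORDS = ("offset", "attention_weights", "center_feature_scale_proj", "alpha_beta")


def sketch_param_groups(named_dims, weight_decay, base_lr,
                        skip_list=(), lr_ratios=None, dcn_lr_mul=None):
    """Hypothetical helper: map (name, ndim) pairs to per-parameter settings."""
    groups = []
    for name, ndim in named_dims:
        # 1-D tensors (norm scales), biases, and skip-listed names get zero decay.
        no_decay = ndim == 1 or name.endswith(".bias") or name in skip_list

        # Layer-wise LR decay: the first keyword found in the name sets the ratio.
        lr_scale = 1.0
        for keyword, ratio in (lr_ratios or {}).items():
            if keyword in name:
                lr_scale = ratio
                break

        # Assumption: the DCN multiplier is applied on top of the layer ratio.
        if dcn_lr_mul is not None and any(k in name for k in DCN_KEYWORDS):
            lr_scale *= dcn_lr_mul

        groups.append({
            "name": name,
            "weight_decay": 0.0 if no_decay else weight_decay,
            "lr": base_lr * lr_scale,
        })
    return groups


demo = sketch_param_groups(
    [("levels.0.blocks.0.mlp.fc1.weight", 2),
     ("levels.0.blocks.0.mlp.fc1.bias", 1),
     ("levels.1.blocks.0.dcn.offset.weight", 2)],
    weight_decay=0.05, base_lr=1e-3,
    lr_ratios={"levels.0.": 0.5, "levels.1.": 1.0},
    dcn_lr_mul=0.1,
)
for group in demo:
    print(group)
```

With real tensors, `ndim` would come from `len(param.shape)` while iterating `model.named_parameters()`, and parameters with `requires_grad` set to False would be skipped.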

After parameter grouping, the function constructs the optimizer:

  • AdamW or SGD with the configured hyperparameters (eps, betas, momentum)
  • sgd_linear_probing variant with momentum=0.9, no nesterov, and zero weight decay
  • ZeroRedundancyOptimizer wrapping for memory-efficient distributed training, with a workaround for a pre-PyTorch-1.12 API limitation
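The optimizer choice described above might be dispatched roughly as follows. To stay runnable without PyTorch, this sketch returns a class name and keyword arguments instead of constructing an optimizer. `optimizer_spec` is a hypothetical helper, and the default `eps`, `betas`, and `momentum` values are placeholders, not values confirmed by the source. ZeRO wrapping is left out.

```python
def optimizer_spec(name, base_lr, weight_decay,
                   momentum=0.9, eps=1e-8, betas=(0.9, 0.999)):
    """Hypothetical helper: return (optimizer class name, constructor kwargs)."""
    name = name.lower()
    if name == "adamw":
        return "AdamW", {"lr": base_lr, "weight_decay": weight_decay,
                         "eps": eps, "betas": betas}
    if name == "sgd":
        return "SGD", {"lr": base_lr, "weight_decay": weight_decay,
                       "momentum": momentum}
    if name == "sgd_linear_probing":
        # Linear probing: fixed momentum, no Nesterov, and no weight decay.
        return "SGD", {"lr": base_lr, "weight_decay": 0.0,
                       "momentum": 0.9, "nesterov": False}
    raise ValueError(f"Unknown optimizer: {name}")


print(optimizer_spec("sgd_linear_probing", base_lr=0.1, weight_decay=0.05))
```

The returned keyword names match the arguments of torch.optim.SGD and torch.optim.AdamW, so the dictionary could be unpacked into the corresponding constructor together with the parameter groups.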

Usage

Use this module when building the optimizer for classification training. Call build_optimizer(config, model) to obtain an optimizer with properly configured per-parameter weight decay and LR settings.

Code Reference

Source Location

Signature

def build_optimizer(config, model) -> torch.optim.Optimizer: ...

def set_weight_decay_and_lr(
    model, weight_decay, base_lr, skip_list=(), skip_keywords=(),
    lr_layer_decay=None, lr_layer_decay_ratio=None,
    freeze_backbone=None, dcn_lr_mul=None, layerwise_lr=True
) -> list: ...

def check_keywords_in_name(name, keywords=()) -> bool: ...
def check_keywords_in_dict(name, keywords_dict) -> Optional[float]: ...
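The bodies of the two helpers are not shown on this page. A plausible sketch, inferred only from their names and return types, is substring matching against the parameter name. Treat the matching rule (substring, first match wins) as an assumption.

```python
from typing import Mapping, Optional, Sequence


def check_keywords_in_name(name: str, keywords: Sequence[str] = ()) -> bool:
    """Return True if any keyword occurs as a substring of the parameter name."""
    return any(keyword in name for keyword in keywords)


def check_keywords_in_dict(name: str, keywords_dict: Mapping[str, float]) -> Optional[float]:
    """Return the value of the first key found in the name, or None if none match."""
    for keyword, value in keywords_dict.items():
        if keyword in name:
            return value
    return None


print(check_keywords_in_name("levels.0.blocks.1.norm1.weight", ("norm",)))
print(check_keywords_in_dict("levels.2.blocks.0.mlp.fc1.weight", {"levels.2.": 0.9}))
```

Returning None from `check_keywords_in_dict` lets the caller fall back to the base LR when a parameter belongs to no listed layer.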

Import

from classification.optimizer import build_optimizer

I/O Contract

Inputs

Name | Type | Required | Description
config | CfgNode | Yes | Configuration object providing the TRAIN.WEIGHT_DECAY, TRAIN.BASE_LR, TRAIN.OPTIMIZER, and TRAIN.LR_LAYER_DECAY settings
model | nn.Module | Yes | The model whose parameters will be optimized; may provide no_weight_decay() and lr_decay_keywords() methods
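As an illustration of the two optional model hooks, the stand-in class below shows one shape they could take. Everything here is hypothetical: `TinyBackbone` is not an nn.Module, the `decay_ratio` argument and the `levels.{i}.` naming are assumptions, and the real model's methods may differ.

```python
class TinyBackbone:
    """Hypothetical stand-in for a model exposing the two optional hooks."""

    def __init__(self, num_levels=4):
        self.num_levels = num_levels

    def no_weight_decay(self):
        # Parameter names that should be excluded from weight decay.
        return {"pos_embed", "cls_token"}

    def lr_decay_keywords(self, decay_ratio=0.9):
        # Earlier levels get smaller ratios; the last level keeps the base LR.
        last = self.num_levels - 1
        return {f"levels.{i}.": decay_ratio ** (last - i)
                for i in range(self.num_levels)}


model = TinyBackbone()
print(sorted(model.no_weight_decay()))
print(model.lr_decay_keywords(decay_ratio=0.5))
```

A mapping from name keywords to ratios fits the `check_keywords_in_dict` signature, which returns an optional float for a given parameter name.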

Outputs

Name | Type | Description
optimizer | torch.optim.Optimizer | An optimizer instance (AdamW, SGD, or ZeroRedundancyOptimizer) with per-parameter groups

Usage Examples

Basic Usage

from classification.optimizer import build_optimizer

optimizer = build_optimizer(config, model)

With ZeRO

# Enable ZeRO in config
config.TRAIN.OPTIMIZER.USE_ZERO = True
optimizer = build_optimizer(config, model)
# optimizer is a ZeroRedundancyOptimizer wrapping AdamW or SGD
