Implementation:OpenGVLab InternVL CustomLayerDecayOptimizerConstructor
| Knowledge Sources | |
|---|---|
| Domains | Training, Optimization, Segmentation |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Custom optimizer constructor that applies layer-wise learning rate decay for Vision Transformer backbones, ensuring earlier (lower) layers receive smaller learning rates during fine-tuning.
Description
CustomLayerDecayOptimizerConstructor extends MMCV's DefaultOptimizerConstructor and is registered with the OPTIMIZER_BUILDERS registry. Its add_params() method iterates over all trainable parameters and assigns each one to a layer group using the helper function get_num_layer_for_vit(), which maps parameter names to layer IDs:
- Embedding-related parameters (cls_token, pos_embed, patch_embed) are assigned to layer 0.
- Transformer block parameters (backbone.blocks.N or backbone.layers.N) are assigned to layer N+1.
- All other parameters (e.g., the decode head) are assigned to the last layer (num_max_layer - 1).
The function also handles cb_modules prefixes and the levels/layers naming variations.
Each layer group's learning rate is scaled by layer_decay_rate^(num_layers - layer_id - 1), so the scale shrinks as layer_id decreases and the earliest layers get the smallest LR (strongest decay). Within each layer, parameters are further split into a decay group (weight decay applied) and a no_decay group (biases, 1D parameters, special tokens). On rank 0, a JSON summary of all parameter groups is logged.
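The name-to-layer mapping and the scaling rule described above can be sketched in plain Python. This is a simplified illustration, not the repository's code: the function names `layer_id_for_vit` and `lr_scale` are hypothetical, the cb_modules and levels/layers variants are omitted, and it assumes the total number of layer groups equals the backbone depth plus two (one group for embeddings, one for everything outside the blocks).

```python
def layer_id_for_vit(var_name: str, num_max_layer: int) -> int:
    """Map a parameter name to a layer ID (simplified sketch).

    Assumption: parameter names carry a 'backbone.' prefix, and the
    cb_modules / levels naming variants are not handled here.
    """
    # Embedding-related parameters go to layer 0.
    if var_name in ("backbone.cls_token", "backbone.pos_embed"):
        return 0
    if var_name.startswith("backbone.patch_embed"):
        return 0
    # Transformer block N goes to layer N + 1.
    if var_name.startswith(("backbone.blocks.", "backbone.layers.")):
        return int(var_name.split(".")[2]) + 1
    # Everything else (e.g. the decode head) goes to the last layer.
    return num_max_layer - 1


def lr_scale(layer_id: int, num_layers: int, layer_decay_rate: float) -> float:
    """Scale factor layer_decay_rate ** (num_layers - layer_id - 1)."""
    return layer_decay_rate ** (num_layers - layer_id - 1)


if __name__ == "__main__":
    depth = 4                # hypothetical 4-block backbone
    num_groups = depth + 2   # assumption: embeddings + blocks + the rest
    for name in (
        "backbone.pos_embed",
        "backbone.blocks.0.attn.qkv.weight",
        "backbone.blocks.3.mlp.fc2.weight",
        "decode_head.conv_seg.weight",
    ):
        lid = layer_id_for_vit(name, num_groups)
        print(name, lid, lr_scale(lid, num_groups, 0.9))
```

Under these assumptions the last group has exponent 0 and so keeps the base learning rate, while each step toward the input multiplies the rate by layer_decay_rate once more.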
Usage
Use this optimizer constructor in MMSegmentation configs when fine-tuning deep ViT backbones (such as InternViT-6B) for semantic segmentation. The smaller learning rates in early layers help limit catastrophic forgetting of pretrained features there.
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: segmentation/mmcv_custom/layer_decay_optimizer_constructor.py
- Lines: 1-98
Signature
def get_num_layer_for_vit(var_name, num_max_layer): ...

@OPTIMIZER_BUILDERS.register_module()
class CustomLayerDecayOptimizerConstructor(DefaultOptimizerConstructor):
    def add_params(self, params, module, prefix='', is_dcn_module=None): ...
Import
from mmcv_custom.layer_decay_optimizer_constructor import CustomLayerDecayOptimizerConstructor
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| params | list[dict] | Yes | List of param groups to extend (modified in place) |
| module | nn.Module | Yes | The model module whose parameters to process |
| paramwise_cfg.num_layers | int | Yes | Number of transformer layers in the backbone |
| paramwise_cfg.layer_decay_rate | float | Yes | Multiplicative LR decay per layer (e.g., 0.9 means each layer's LR is 0.9 times that of the layer above it) |
Outputs
| Name | Type | Description |
|---|---|---|
| params | list[dict] | Extended with per-layer parameter groups containing scaled lr, weight_decay, and params |
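To make the shape of the output concrete, here is a minimal sketch of how per-layer decay and no_decay groups might be assembled. It is an illustration under stated assumptions, not the constructor's actual code: `build_param_groups`, `SPECIAL_TOKENS`, and the `layer_{id}_{kind}` key format are hypothetical, layer IDs are passed in precomputed, and parameter names stand in for the tensors that a real optimizer would receive.

```python
from collections import OrderedDict

# Hypothetical list of special-token parameter names exempt from weight decay.
SPECIAL_TOKENS = ("cls_token", "pos_embed")


def build_param_groups(entries, base_lr, weight_decay, num_groups, layer_decay_rate):
    """Group (name, shape, layer_id) entries into per-layer param groups.

    Each returned dict carries 'params', 'weight_decay', and a scaled 'lr'.
    'params' holds names here; a real constructor would hold tensors.
    """
    groups = OrderedDict()
    for name, shape, layer_id in entries:
        # Biases, 1D parameters, and special tokens skip weight decay.
        is_no_decay = (
            len(shape) == 1
            or name.endswith(".bias")
            or name.split(".")[-1] in SPECIAL_TOKENS
        )
        kind = "no_decay" if is_no_decay else "decay"
        key = f"layer_{layer_id}_{kind}"
        if key not in groups:
            scale = layer_decay_rate ** (num_groups - layer_id - 1)
            groups[key] = {
                "params": [],
                "weight_decay": 0.0 if is_no_decay else weight_decay,
                "lr": base_lr * scale,
            }
        groups[key]["params"].append(name)
    return list(groups.values())


if __name__ == "__main__":
    demo = [
        ("backbone.pos_embed", (1, 197, 768), 0),
        ("backbone.blocks.0.attn.qkv.weight", (2304, 768), 1),
        ("backbone.blocks.0.attn.qkv.bias", (2304,), 1),
        ("decode_head.conv_seg.weight", (150, 768, 1, 1), 3),
    ]
    for group in build_param_groups(demo, 1e-4, 0.05, 4, 0.5):
        print(group)
```

Keying groups by both layer ID and decay kind means each layer contributes at most two groups, which keeps the group count bounded regardless of how many parameters the model has.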
Usage Examples
Basic Usage
# In an MMSegmentation config:
optimizer = dict(
    constructor='CustomLayerDecayOptimizerConstructor',
    type='AdamW',
    lr=2e-5,
    weight_decay=0.05,
    paramwise_cfg=dict(
        num_layers=48,         # Number of ViT layers in the backbone
        layer_decay_rate=0.9,  # LR is multiplied by 0.9 per layer toward the input
    ),
)
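To get a feel for what these settings imply, the following sketch computes the per-layer learning rates for lr=2e-5, 48 layers, and a decay rate of 0.9. The helper `layer_lrs` is hypothetical, and the group count of depth plus two (embeddings, 48 blocks, everything else) is an assumption about how the constructor counts layers.

```python
def layer_lrs(base_lr: float, depth: int, layer_decay_rate: float) -> list:
    """Return the learning rate of every layer group, from input to output.

    Assumption: there are depth + 2 groups in total, namely the embeddings
    (index 0), one group per transformer block, and a final group for
    everything outside the backbone blocks.
    """
    num_groups = depth + 2
    return [
        base_lr * layer_decay_rate ** (num_groups - layer_id - 1)
        for layer_id in range(num_groups)
    ]


lrs = layer_lrs(base_lr=2e-5, depth=48, layer_decay_rate=0.9)

print(f"embeddings: {lrs[0]:.3e}")
print(f"block 0:    {lrs[1]:.3e}")
print(f"block 47:   {lrs[48]:.3e}")
print(f"head/rest:  {lrs[-1]:.3e}")
```

With a backbone this deep the compounding is substantial: under the stated assumption the embedding group trains at well under 1% of the base learning rate, so layer_decay_rate is worth tuning together with lr.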