Principle: VainF Torch-Pruning Growing Regularization
Metadata
| Field | Value |
|---|---|
| Papers | Neural Pruning via Growing Regularization (Wang et al., 2021); DepGraph: Towards Any Structural Pruning (Fang et al., 2023) |
| Domains | Deep_Learning, Regularization, Pruning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A progressive regularization strategy that gradually increases the regularization penalty on low-importance channels over training epochs. Rather than applying a fixed regularization coefficient uniformly, Growing Regularization adapts per-channel penalties incrementally, allowing the network to smoothly transition from a dense to a sparse configuration without abrupt capacity loss.
Description
Growing Regularization starts with a small regularization coefficient and incrementally increases it by delta_reg each epoch, scaled by the standardized importance of each channel. Unlike fixed regularization, this approach allows the model to adapt gradually, avoiding abrupt capacity loss. Each channel maintains its own regularization state that grows over time:
reg_c += delta_reg * normalized_importance_c
The standardized importance is computed by normalizing raw importance scores to the range [0, 1]:
standardized_imp = (imp.max() - imp) / (imp.max() - imp.min() + 1e-8)
This means that channels with lower importance receive higher regularization increments, accelerating their decay toward zero. Channels with higher importance receive smaller penalties, preserving their contribution to the network. Over successive epochs, the accumulated regularization drives unimportant channels to near-zero magnitude, making them safe to prune with minimal accuracy loss.
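The standardization and per-channel accumulation described above can be sketched in a few lines of NumPy (the channel count and raw importance values below are illustrative, not from the library):

```python
import numpy as np

def standardize_importance(imp: np.ndarray) -> np.ndarray:
    """Map raw importance scores to [0, 1] so the LEAST important
    channel receives the LARGEST value (and thus the largest penalty)."""
    return (imp.max() - imp) / (imp.max() - imp.min() + 1e-8)

# Hypothetical raw importance scores for 4 channels.
imp = np.array([0.9, 0.1, 0.5, 0.3])
std_imp = standardize_importance(imp)   # ~[0.0, 1.0, 0.5, 0.75]

# Per-channel regularization state, grown once per epoch.
delta_reg = 1e-4
reg = np.zeros_like(imp)
for epoch in range(10):
    reg += delta_reg * std_imp          # low-importance channels grow fastest
```

After ten epochs the most important channel (index 0) still carries zero penalty, while the least important channel (index 1) has accumulated the full `10 * delta_reg`.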
The key distinction from static L2 regularization is that the penalty is:
- Per-channel -- each channel has its own regularization coefficient
- Growing -- the coefficient increases monotonically over training
- Importance-weighted -- low-importance channels are penalized more aggressively
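All three properties show up in a toy shrinkage loop. This is a NumPy sketch with made-up values, using the standard multiplicative form `w *= 1 - lr * reg` of an L2 decay step, not the library's actual training code:

```python
import numpy as np

# Two channels: one important (zero increment), one unimportant (full increment).
lr, delta_reg, epochs = 0.1, 1e-2, 50
std_imp = np.array([0.0, 1.0])   # standardized importance increments
w = np.array([1.0, 1.0])         # channel weight magnitudes
reg = np.zeros(2)                # per-channel coefficients (property 1)

for _ in range(epochs):
    reg += delta_reg * std_imp   # monotone growth, importance-weighted (2, 3)
    w *= 1.0 - lr * reg          # weighted-L2 shrinkage step

# The unimportant channel decays toward zero; the important one is untouched.
```

A static L2 penalty would shrink both channels at the same fixed rate; here the penalty is zero where importance is high and compounds where it is low.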
Usage
Growing Regularization is preferred in the following scenarios:
- When gradual sparsification is preferred over sudden regularization
- For larger models, where applying aggressive regularization all at once can cause training instability
- When the practitioner wants to maintain training accuracy during the sparsification process
- For iterative pruning pipelines where multiple rounds of regularization and pruning are applied
- When fine-grained control over the regularization schedule is needed (via the `reg` and `delta_reg` hyperparameters)
Theoretical Basis
The per-channel growing regularization follows this update rule:
Per-channel regularization update:
reg_c += delta_reg * standardized_imp_c
where standardized_imp_c is the standardized importance of channel c, computed as:
standardized_imp_c = (imp.max() - imp_c) / (imp.max() - imp.min() + 1e-8)
Gradient modification during training:
grad_w += reg_c * w, for every weight w belonging to channel c
This is equivalent to adding a weighted L2 penalty to the loss function, but with a per-channel coefficient that grows over time. The growing schedule ensures a smooth transition from dense to sparse:
- At early epochs, reg_c is small for all channels, so the model trains nearly unregularized
- Over time, unimportant channels accumulate large penalties, driving their weights toward zero
- Important channels accumulate small penalties, preserving model capacity where it matters most
The standardization to [0, 1] ensures that the relative importance ordering determines the regularization strength, independent of the absolute scale of the importance scores.
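The scale-independence can also be verified directly: rescaling all raw importance scores by a constant leaves the standardized values, and hence the regularization increments, essentially unchanged (up to the 1e-8 stabilizer). A minimal NumPy check with illustrative scores:

```python
import numpy as np

def standardize_importance(imp: np.ndarray) -> np.ndarray:
    # Standardization to [0, 1]; the epsilon guards against zero spread.
    return (imp.max() - imp) / (imp.max() - imp.min() + 1e-8)

imp = np.array([2.0, 7.0, 4.0])
a = standardize_importance(imp)
b = standardize_importance(1000.0 * imp)   # same ordering, 1000x the scale
```

Because the formula is a ratio of differences, only the relative ordering and spread of the scores matter, which is what makes the schedule robust to the choice of importance criterion.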