Implementation: VainF Torch-Pruning GroupMagnitudeImportance
Overview
Concrete tool for magnitude-based group importance estimation provided by Torch-Pruning.
Description
GroupMagnitudeImportance computes per-channel importance scores using Lp norms of the weight tensors associated with each channel. It is the primary magnitude-based importance estimator in the Torch-Pruning library, designed to work with the framework's DependencyGraph and Group abstractions for fully automated structural pruning.
The class supports multiple variants of magnitude importance:
- Standard L1/L2 norms: Controlled by the `p` parameter. Setting `p=1` yields L1-norm importance; `p=2` (the default) yields L2-norm importance.
- Batch normalization scaling factor extraction: By restricting `target_types` to only `_BatchNorm` layers, the estimator extracts BN gamma values as importance scores, implementing the network slimming approach.
- LAMP normalization: Setting `normalizer='lamp'` enables Layer-Adaptive Magnitude-based Pruning, which uses a cumulative-sum normalization scheme that adapts per-layer sparsity ratios automatically.
- Group reduction strategies: The `group_reduction` parameter controls how importance scores from multiple coupled layers within a dependency group are aggregated into a single per-channel score. Supported strategies include `"mean"`, `"sum"`, `"max"`, `"prod"`, `"first"`, and `"gate"`.
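The per-layer quantities behind these variants can be sketched in plain PyTorch (illustrative only; the class itself adds grouping, reduction, and normalization on top of these local scores):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 8, 3)
bn = nn.BatchNorm2d(8)

W = conv.weight.detach()  # (8, 3, 3, 3): one slice per output channel
flat = W.flatten(1)       # (8, 27)

# p=1 and p=2: sum of |w|^p per output channel (the p-th root is monotone
# and can be omitted without changing the channel ranking)
l1 = flat.abs().sum(dim=1)
l2 = flat.abs().pow(2).sum(dim=1)

# Network-slimming style: BN gamma magnitudes as importance
bn_scale = bn.weight.detach().abs()

print(l1.shape, l2.shape, bn_scale.shape)  # each: torch.Size([8])
```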
The class operates on Group objects produced by the DependencyGraph. Each Group represents a set of coupled pruning operations that must be executed together to maintain architectural consistency. When called, GroupMagnitudeImportance iterates over every dependency in the group, computes a local importance score for each parameterized layer (Conv, Linear, BatchNorm, LayerNorm), and then reduces and normalizes the scores to produce a single 1-D importance tensor.
Internally, the computation proceeds as follows:
- For each layer in the group, the weight slice corresponding to the pruning indices is extracted and flattened.
- The element-wise absolute value is raised to the power `p`, and the result is summed across the non-channel dimensions to yield a local per-channel importance score.
- All local scores are collected and aggregated via the chosen group reduction strategy (e.g., scatter-add for `"mean"`/`"sum"`).
- The aggregated scores are normalized according to the chosen normalization scheme.
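These steps can be approximated in plain PyTorch for a pair of coupled conv layers. This is a simplified sketch under the `p=2` default, not the library's actual code, which also covers the special cases described below:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv1 = nn.Conv2d(3, 8, 3)   # root layer: its output channels are pruned
conv2 = nn.Conv2d(8, 16, 3)  # coupled layer: its input channels must follow

idxs = [0, 1, 2, 3]  # pruning indices under consideration

# Steps 1-2: slice out the candidate channels, flatten, sum |w|^p (p=2)
w_out = conv1.weight.detach()[idxs].flatten(1)       # (4, 3*3*3)
imp_out = w_out.abs().pow(2).sum(dim=1)              # local score, shape (4,)

w_in = conv2.weight.detach()[:, idxs].transpose(0, 1).flatten(1)  # (4, 16*3*3)
imp_in = w_in.abs().pow(2).sum(dim=1)                # local score, shape (4,)

# Step 3: aggregate coupled scores, here with the "mean" reduction
group_imp = torch.stack([imp_out, imp_in]).mean(dim=0)

# Step 4: apply the default "mean" normalizer
group_imp = group_imp / group_imp.mean()
print(group_imp)  # 1-D tensor of length 4
```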
The class also handles special cases such as transposed convolutions, group convolutions (where importance must be repeated across groups), and layers without affine parameters (which are silently skipped).
Usage
Use GroupMagnitudeImportance when you need a gradient-free importance criterion for structural pruning. It is the recommended default importance estimator for most pruning tasks in Torch-Pruning because:
- It does not require a forward or backward pass through the model.
- It is fast to compute, even for large models.
- It produces reasonable pruning decisions for moderate pruning ratios.
Typical workflow:
- Build a `DependencyGraph` from the model and example inputs.
- Obtain a `Group` by specifying a layer and pruning function.
- Instantiate `GroupMagnitudeImportance` with the desired configuration.
- Call the instance with the group to obtain per-channel importance scores.
- Use the scores to select which channels to prune.
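The final selection step amounts to sorting the scores and keeping the smallest; a minimal sketch with stand-in scores:

```python
# Stand-in scores for illustration; in practice these come from imp(group).
scores = [0.9, 0.1, 0.5, 0.3]
n_prune = 2

# Channels with the smallest importance are the pruning candidates.
prune_idxs = sorted(range(len(scores)), key=lambda i: scores[i])[:n_prune]
print(prune_idxs)  # -> [1, 3]
```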
Code Reference
Source
File: torch_pruning/pruner/importance.py, Lines 58-269
Class Signature
```python
class GroupMagnitudeImportance(Importance):
    def __init__(self,
                 p: int = 2,
                 group_reduction: str = "mean",
                 normalizer: str = 'mean',
                 bias: bool = False,
                 target_types: list = [nn.modules.conv._ConvNd, nn.Linear,
                                       nn.modules.batchnorm._BatchNorm, nn.LayerNorm]):
```
Import
```python
from torch_pruning.pruner.importance import GroupMagnitudeImportance
```
or equivalently:
```python
import torch_pruning as tp
tp.importance.GroupMagnitudeImportance
```
I/O Contract
Constructor Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `p` | `int` | No | `2` | Norm degree for importance calculation. Use `1` for L1-norm, `2` for L2-norm, etc. |
| `group_reduction` | `str` | No | `"mean"` | Reduction strategy for aggregating importance across coupled layers in a group. Options: `"mean"`, `"sum"`, `"max"`, `"prod"`, `"first"`, `"gate"`. |
| `normalizer` | `str` | No | `"mean"` | Normalization scheme applied after group reduction. Options: `"mean"`, `"sum"`, `"standarization"`, `"max"`, `"gaussian"`, `"lamp"`. |
| `bias` | `bool` | No | `False` | Whether to include bias parameters in importance computation. |
| `target_types` | `list` | No | `[_ConvNd, Linear, _BatchNorm, LayerNorm]` | Layer types to consider when computing importance. Layers not matching any type in this list are skipped. |
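For illustration, the `"mean"` and `"lamp"` normalizers might be re-implemented roughly as follows. This is a hypothetical sketch based on the descriptions above, not the library source:

```python
import torch

def normalize(scores, mode):
    # Hypothetical re-implementation of two normalizer options, for
    # illustration only; names mirror the constructor's `normalizer` values.
    if mode == "mean":
        return scores / scores.mean()
    if mode == "lamp":
        # LAMP-style: each score divided by the cumulative sum of all
        # scores greater than or equal to it
        order = torch.argsort(scores, descending=True)
        s = scores[order]
        s = s / torch.cumsum(s, dim=0)
        out = torch.empty_like(s)
        out[order] = s
        return out
    raise ValueError(mode)

scores = torch.tensor([4.0, 1.0, 3.0, 2.0])
print(normalize(scores, "mean"))  # mean of the result is 1.0
print(normalize(scores, "lamp"))  # the largest score maps to 1.0
```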
__call__ Input
| Parameter | Type | Required | Description |
|---|---|---|---|
| `group` | `Group` | Yes | A dependency group obtained from the `DependencyGraph`, representing a set of coupled pruning operations. |
Output
Returns: A 1-D `torch.Tensor` of per-channel importance scores. The length of the tensor equals the number of channels in the root pruning operation of the group. Returns `None` if no parameterized layers are found in the group.
Usage Examples
Example 1: Basic Usage with DependencyGraph
```python
import torch
import torch.nn as nn
import torch_pruning as tp

# Build a simple model
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1),
)

# Build the dependency graph
DG = tp.DependencyGraph().build_dependency(
    model, example_inputs=torch.randn(1, 3, 224, 224)
)

# Get a pruning group for output channels of the first conv layer
group = DG.get_pruning_group(
    model[0], tp.prune_conv_out_channels, idxs=[0, 1, 2, 3]
)

# Compute importance using default L2-norm
imp = tp.importance.GroupMagnitudeImportance(p=2)
scores = imp(group)

# scores is a 1-D tensor of length 4, one score per channel
print(scores)
```
Example 2: L1-Norm Variant
```python
import torch_pruning as tp

# L1-norm importance with no normalization, using only the first
# layer in the group
imp = tp.importance.GroupMagnitudeImportance(
    p=1,
    normalizer=None,
    group_reduction="first"
)
scores = imp(group)
```
Example 3: BN Scaling Factor Variant
```python
import torch.nn as nn
import torch_pruning as tp

# Use only BatchNorm scaling factors as importance
imp = tp.importance.GroupMagnitudeImportance(
    p=1,
    normalizer=None,
    group_reduction="mean",
    target_types=[nn.modules.batchnorm._BatchNorm]
)
scores = imp(group)
```
This variant is equivalent to the BNScaleImportance class, which implements the Network Slimming approach from Liu et al., 2017.