Heuristic: VainF Torch-Pruning Over-Pruning Prevention
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Model_Compression, Debugging |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Built-in safeguards prevent layers from being pruned below a minimum channel threshold or to a single channel, which would collapse the network.
Description
Structural pruning can remove so many channels from a layer that the network becomes degenerate or crashes outright. Torch-Pruning implements two safety guards in BasePruner:
- max_pruning_ratio guard: Before pruning a group, the pruner checks whether the target layer's current channel count has already dropped below initial_channels * (1 - max_pruning_ratio). If so, the group is skipped entirely.
- Single-channel guard: The pruner never prunes a layer down to 1 channel (layer_out_ch == 1), regardless of the importance scores.
Additionally, when iterative_steps is used, the pruner caps execution at the specified step count and emits a warning if the user tries to prune beyond that.
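A minimal sketch of how these guards are configured in practice, assuming a recent Torch-Pruning release; the class names tp.pruner.MagnitudePruner and tp.importance.MagnitudeImportance and the keyword spellings follow the project's examples and may differ across versions:
```python
import torch
import torch.nn as nn
import torch_pruning as tp  # https://github.com/VainF/Torch-Pruning

# Toy model with a deliberately narrow middle layer (4 channels).
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 4, 3, padding=1), nn.ReLU(),
    nn.Conv2d(4, 16, 3, padding=1),
)
example_inputs = torch.randn(1, 3, 32, 32)

pruner = tp.pruner.MagnitudePruner(
    model,
    example_inputs,
    importance=tp.importance.MagnitudeImportance(p=2),
    pruning_ratio=0.5,       # target fraction of channels to remove
    max_pruning_ratio=0.75,  # per-layer ceiling enforced by the guard above
)
pruner.step()

# Layers already below init_channels * (1 - max_pruning_ratio), or already
# down to a single channel, are skipped rather than pruned further.
print([m.out_channels for m in model.modules() if isinstance(m, nn.Conv2d)])
```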
Usage
Use this heuristic to understand why some layers are skipped during pruning. If you observe that the pruner is not reaching your target pruning ratio, the over-pruning guards may be the cause. You can (see the sketch after this list):
- Increase max_pruning_ratio (default is 1.0, meaning no per-layer cap)
- Use global_pruning=True to redistribute pruning across layers
- Use isomorphic pruning to avoid over-pruning specific structural blocks
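A rough way to check whether the guards are limiting compression is to compare parameter counts before and after a pruning step, then loosen the cap or enable global pruning. The sketch below assumes a recent release; tp.utils.count_ops_and_params and the keyword names follow the project's README and may differ in older versions:
```python
import torch
import torch.nn as nn
import torch_pruning as tp

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1),
)
example_inputs = torch.randn(1, 3, 32, 32)
base_macs, base_params = tp.utils.count_ops_and_params(model, example_inputs)

pruner = tp.pruner.MagnitudePruner(
    model,
    example_inputs,
    importance=tp.importance.MagnitudeImportance(p=2),
    pruning_ratio=0.7,
    max_pruning_ratio=0.9,  # loosened per-layer cap
    global_pruning=True,    # rank channels across layers, not per layer
)
pruner.step()

macs, params = tp.utils.count_ops_and_params(model, example_inputs)
# If the achieved reduction falls far short of the requested pruning_ratio,
# the over-pruning guards (or ignored layers) are likely the reason.
print(f"params: {base_params} -> {params}")
```
Isomorphic pruning is likewise configured on the pruner in recent releases; check the project's examples for the exact argument in your installed version.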
The Insight (Rule of Thumb)
- Action: Configure max_pruning_ratio to control the maximum fraction of channels any single layer can lose. Set iterative_steps to spread pruning over multiple rounds (see the sketch after this list).
- Value: Default max_pruning_ratio=1.0 (no per-layer limit); the single-channel guard is always active. iterative_steps=1 by default.
- Trade-off: Setting max_pruning_ratio too low prevents aggressive pruning of unimportant layers; setting it too high risks degenerate layers with very few channels.
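A sketch of spreading the same target over several rounds via iterative_steps, assuming the same Torch-Pruning entry points as above; the finetune call is a placeholder, not part of the library:
```python
import torch
import torch.nn as nn
import torch_pruning as tp

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1),
)
example_inputs = torch.randn(1, 3, 32, 32)

pruner = tp.pruner.MagnitudePruner(
    model,
    example_inputs,
    importance=tp.importance.MagnitudeImportance(p=2),
    pruning_ratio=0.5,  # final target
    iterative_steps=5,  # reach it gradually: 0.1, 0.2, ..., 0.5
)

for _ in range(5):
    pruner.step()       # prune up to this round's cumulative ratio
    # finetune(model)   # placeholder: recover accuracy between rounds
# A sixth call to pruner.step() would only trigger the overflow warning
# shown under Code Evidence below and perform no pruning.
```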
Reasoning
A layer with 0 channels cannot process data, and a layer with 1 channel produces a single-channel feature map that often breaks downstream operations such as batch normalization or grouped convolutions. The single-channel guard avoids this unconditionally. The max_pruning_ratio guard provides a configurable ceiling that can be tightened for safety or loosened for aggressive compression.
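A concrete illustration of the grouped-convolution failure mode, in plain PyTorch and independent of Torch-Pruning:
```python
import torch.nn as nn

# A layer pruned down to a single channel cannot feed a grouped convolution
# whose group count no longer divides its channel count: construction fails.
try:
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, groups=4)
except ValueError as err:
    print(err)  # in_channels must be divisible by groups
```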
Isomorphic pruning (ECCV 2024) provides an additional safeguard by grouping structurally identical layers and pruning them uniformly, preventing the global ranking mode from over-pruning specific layers while under-pruning others.
Code Evidence
Over-pruning check from torch_pruning/pruner/algorithms/base_pruner.py:357-374:
```python
if self.DG.is_out_channel_pruning_fn(pruning_fn):
    layer_out_ch = self.DG.get_out_channels(module)
    if layer_out_ch is None:
        continue
    if layer_out_ch < self.layer_init_out_ch[module] * (
        1 - self.max_pruning_ratio
    ) or layer_out_ch == 1:
        return False
elif self.DG.is_in_channel_pruning_fn(pruning_fn):
    layer_in_ch = self.DG.get_in_channels(module)
    if layer_in_ch is None:
        continue
    if layer_in_ch < self.layer_init_in_ch[module] * (
        1 - self.max_pruning_ratio
    ) or layer_in_ch == 1:
        return False
```
Iterative step overflow warning from torch_pruning/pruner/algorithms/base_pruner.py:437-440:
```python
if self.current_step > self.iterative_steps:
    warnings.warn(
        "Pruning exceed the maximum iterative steps, no pruning will be performed.")
    return
```
Linear iterative schedule from torch_pruning/pruner/algorithms/base_pruner.py:149-157:
```python
# The pruner will prune the model iteratively for several steps to achieve the target pruning ratio
# E.g., if iterative_steps=5, pruning_ratio=0.5, the pruning ratio of each step will be [0.1, 0.2, 0.3, 0.4, 0.5]
self.iterative_steps = iterative_steps
self.iterative_pruning_ratio_scheduler = iterative_pruning_ratio_scheduler
self.current_step = 0
self.per_step_pruning_ratio = self.iterative_pruning_ratio_scheduler(
    self.pruning_ratio, self.iterative_steps
)
```
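For illustration only, the schedule described in that comment can be reproduced with a small stand-alone function (a toy sketch, not the library's own scheduler):
```python
def linear_schedule(pruning_ratio: float, steps: int) -> list[float]:
    # Each round targets an equal increment of the final pruning ratio.
    return [pruning_ratio * (i + 1) / steps for i in range(steps)]

print(linear_schedule(0.5, 5))  # ~[0.1, 0.2, 0.3, 0.4, 0.5]
```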