
Heuristic: VainF Torch-Pruning Pruning Ratio vs. Parameter Ratio

From Leeroopedia



Knowledge Sources
Domains Deep_Learning, Model_Compression
Last Updated 2026-02-08 12:00 GMT

Overview

The channel pruning ratio is NOT the same as the parameter reduction ratio: removing a fraction p of channels reduces the parameter count by roughly 1 - (1 - p)^2.

Description

In structural pruning, removing output channels from a layer simultaneously removes the corresponding output dimensions from that layer's weight tensor AND the corresponding input dimensions from downstream layers. This means both the input and output channel counts of affected weight tensors are reduced. As a result, the actual parameter reduction is much larger than the channel pruning ratio suggests.

This quadratic relationship is a frequent source of confusion: setting pruning_ratio=0.5 does NOT remove 50% of parameters -- it removes approximately 75%.
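The relationship is plain arithmetic and easy to check without any pruning library (the function name here is illustrative, not part of Torch-Pruning):

```python
def parameter_reduction(channel_ratio: float) -> float:
    """Parameter reduction implied by a channel pruning ratio p: 1 - (1 - p)^2."""
    return 1 - (1 - channel_ratio) ** 2

# Pruning 50% of channels removes ~75% of parameters, not 50%.
print(parameter_reduction(0.5))  # 0.75
```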

Usage

Use this heuristic when choosing a target pruning ratio for any pruning workflow. Without accounting for the quadratic effect, users commonly over-prune their models, leading to unacceptable accuracy loss.

The Insight (Rule of Thumb)

  • Action: Convert between channel pruning ratio and parameter pruning ratio using the formula: parameter_reduction = 1 - (1 - channel_ratio)^2
  • Value: To remove ~50% of parameters, use pruning_ratio=0.30 (since 1 - (1 - 0.3)^2 = 0.51)
  • Trade-off: None -- this is a mathematical relationship, not a configurable choice. But ignoring it leads to accidentally removing far more parameters than intended.

Quick reference table:

Channel Ratio (p)   Approx. Parameter Reduction
0.10                ~19%
0.20                ~36%
0.30                ~51%
0.40                ~64%
0.50                ~75%
0.60                ~84%
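The table follows directly from the formula, and the inverse direction (picking a channel ratio for a target parameter budget) is a square root rather than a division. A minimal sketch; `channel_ratio_for` is an illustrative helper, not a Torch-Pruning API:

```python
import math

def channel_ratio_for(target_param_reduction: float) -> float:
    """Invert parameter_reduction = 1 - (1 - p)^2 to recover the channel ratio p."""
    return 1 - math.sqrt(1 - target_param_reduction)

# Reproduce the quick reference table:
for p in (0.10, 0.20, 0.30, 0.40, 0.50, 0.60):
    print(f"p = {p:.2f} -> ~{1 - (1 - p) ** 2:.0%} of parameters removed")

# To remove ~50% of parameters, prune roughly 29% of channels
# (the README rounds this up to pruning_ratio=0.30):
print(round(channel_ratio_for(0.50), 2))  # 0.29
```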

Reasoning

Consider a convolutional layer with weight shape [C_out, C_in, K, K] in a network pruned uniformly with ratio p. Pruning removes a fraction p of the layer's own output channels, reducing C_out to C_out * (1-p); and because the upstream layer's output channels are pruned by the same fraction, the layer's input channel count is also reduced, from C_in to C_in * (1-p). The new parameter count is therefore proportional to (1-p)^2 times the original, so the parameter reduction is 1 - (1-p)^2.
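Checking the arithmetic on a single weight tensor (the shapes below are chosen purely for illustration):

```python
# A middle-of-network conv weight [C_out, C_in, K, K]: with a uniform
# pruning ratio p, the layer loses a fraction p of its own output
# channels and, because the upstream layer was pruned too, a fraction p
# of its input channels.
p = 0.5
C_out, C_in, K = 64, 32, 3

before = C_out * C_in * K * K                                  # 64*32*9 = 18432
after = int(C_out * (1 - p)) * int(C_in * (1 - p)) * K * K     # 32*16*9 = 4608

print(1 - after / before)  # 0.75, matching 1 - (1 - 0.5)^2
```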

This is explicitly documented in the README as an important note for users.

Code Evidence

From README.md (lines 160-164):

# The pruning ratio refers to the pruning ratio of channels/dims.
# Since both in & out dims will be removed by p,
# the actual parameter_pruning_ratio will be roughly 1-(1-p)^2.
# To remove 50% of parameters, you may use pruning_ratio=0.30 instead,
# which leads to the actual parameter pruning ratio of 1-(1-0.3)^2=0.51
