Workflow:VainF Torch Pruning Object Detection Pruning

Knowledge Sources	Torch-Pruning DepGraph: Towards Any Structural Pruning
Domains	Model_Compression, Structural_Pruning, Object_Detection
Last Updated	2026-02-07 23:30 GMT

Overview

End-to-end iterative pruning and fine-tuning pipeline for YOLO object detection models (YOLOv5, YOLOv7, YOLOv8), progressively compressing the model while preserving detection accuracy.

Description

This workflow implements an iterative prune-then-finetune loop designed for object detection models. Unlike the single-shot pruning often applied to classifiers, detection models generally need iterative pruning with fine-tuning between steps to maintain detection quality (mAP). The workflow handles YOLO-specific challenges: replacing C2f modules with pruning-compatible variants, excluding detection heads from pruning, deriving per-step pruning ratios from a target total pruning rate, and fine-tuning through the Ultralytics training pipeline. Each iteration prunes a fraction of channels and evaluates mAP before and after fine-tuning. The loop stops early if mAP drops by more than the maximum allowed amount.

Usage

Execute this workflow when you need to compress a YOLO detection model for deployment on edge devices, mobile platforms, or real-time inference scenarios. This is appropriate when you have a trained YOLOv5/v7/v8 model and need to reduce its computational cost while maintaining acceptable detection accuracy.

Execution Steps

Step 1: Load trained YOLO model and prepare architecture

Load the pretrained YOLO model from a checkpoint. For YOLOv8, replace C2f modules with a pruning-compatible C2f_v2 variant that splits the initial convolution into two separate convolutions, making the architecture amenable to structural pruning. Re-initialize batch normalization parameters and enable gradients for all parameters.

Key considerations:

The C2f module uses chunk operations that are difficult to prune; C2f_v2 replaces these with explicit separate convolutions
Weight transfer from C2f to C2f_v2 must correctly split the first convolution's weights by channel
Initialize BN epsilon, momentum, and ReLU inplace settings after module replacement
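The channel-wise weight split can be sketched in PyTorch as follows. This is a minimal illustration, not the workflow's actual C2f_v2 transfer code: it assumes the first convolution is a bias-free 1x1 nn.Conv2d whose output is chunked into two equal halves, and the helper name split_conv_out_channels is hypothetical. Real blocks also carry batch-norm statistics, which would need to be split along the same channel boundary.

```python
import torch
import torch.nn as nn


def split_conv_out_channels(conv: nn.Conv2d) -> tuple[nn.Conv2d, nn.Conv2d]:
    """Split one conv into two convs, each producing half of its output channels.

    Hypothetical helper for illustration; assumes an even channel count and no bias.
    """
    assert conv.out_channels % 2 == 0, "output channels must split evenly"
    assert conv.bias is None, "this sketch only handles bias-free convolutions"
    half = conv.out_channels // 2

    def make_half() -> nn.Conv2d:
        return nn.Conv2d(
            conv.in_channels, half,
            kernel_size=conv.kernel_size, stride=conv.stride,
            padding=conv.padding, bias=False,
        )

    first, second = make_half(), make_half()
    with torch.no_grad():
        # Conv2d weights are laid out as (out_channels, in_channels, kH, kW),
        # so slicing dimension 0 selects output channels.
        first.weight.copy_(conv.weight[:half])
        second.weight.copy_(conv.weight[half:])
    return first, second


torch.manual_seed(0)
original = nn.Conv2d(8, 16, kernel_size=1, bias=False)
conv_a, conv_b = split_conv_out_channels(original)

x = torch.randn(2, 8, 4, 4)
with torch.no_grad():
    chunk_a, chunk_b = original(x).chunk(2, dim=1)  # chunk-based computation
    out_a, out_b = conv_a(x), conv_b(x)             # split-conv computation
```

If the split is correct, out_a and out_b match chunk_a and chunk_b up to floating-point tolerance, which is a useful sanity check to run after any module replacement.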

Step 2: Establish baseline metrics

Run validation on the full (unpruned) model to establish baseline mAP, MACs, and parameter count. These serve as reference points for measuring compression progress and quality degradation across pruning iterations.

Key considerations:

Use the same validation dataset and settings that will be used for post-pruning evaluation
Record baseline_macs and baseline_nparams for computing compression ratios
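One way to keep the baseline and later measurements comparable is to store them in a small record and derive the ratios from it. The sketch below is an assumption about how this bookkeeping could look: ModelMetrics and compression_summary are hypothetical names, and the numbers are made up for illustration. In practice the MACs and parameter counts would come from a profiler such as Torch-Pruning's tp.utils.count_ops_and_params.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelMetrics:
    map_score: float  # validation mAP
    macs: float       # multiply-accumulate operations per forward pass
    nparams: int      # number of parameters


def compression_summary(baseline: ModelMetrics, current: ModelMetrics) -> dict:
    """Compare a pruned model against the unpruned baseline."""
    return {
        "speedup": baseline.macs / current.macs,
        "param_reduction": 1.0 - current.nparams / baseline.nparams,
        "map_drop": baseline.map_score - current.map_score,
    }


# Illustrative values only, not measured results.
baseline = ModelMetrics(map_score=0.50, macs=8.0e9, nparams=3_000_000)
pruned = ModelMetrics(map_score=0.48, macs=4.0e9, nparams=1_500_000)
summary = compression_summary(baseline, pruned)
```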

Step 3: Compute per-iteration pruning ratio

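Calculate the pruning ratio for each iteration so that, after all iterations, the cumulative pruning matches the target rate. Each step removes the same fraction of the channels that remain, so the kept fraction compounds as (1 - ratio_per_step)^num_steps = 1 - target_rate. Solving for the per-step ratio gives ratio_per_step = 1 - (1 - target_rate)^(1/num_steps).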

Pseudocode:

# Each step prunes the same fraction of the channels that remain.
per_step_ratio = 1 - (1 - target_pruning_rate) ** (1 / iterative_steps)
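A short worked example makes the compounding explicit. The function names below are hypothetical, and the target rate of 0.5 over 16 steps is an illustrative choice rather than a setting taken from the workflow.

```python
def per_step_ratio(target_pruning_rate: float, iterative_steps: int) -> float:
    """Per-iteration pruning ratio that compounds to the target total rate."""
    if not 0.0 <= target_pruning_rate < 1.0:
        raise ValueError("target_pruning_rate must be in [0, 1)")
    if iterative_steps < 1:
        raise ValueError("iterative_steps must be at least 1")
    return 1.0 - (1.0 - target_pruning_rate) ** (1.0 / iterative_steps)


def remaining_fraction(ratio: float, steps: int) -> float:
    """Fraction of channels left after applying the same ratio `steps` times."""
    return (1.0 - ratio) ** steps


ratio = per_step_ratio(0.5, 16)            # roughly 4.2% pruned per step
remaining = remaining_fraction(ratio, 16)  # compounds back to half the channels
```

Pruning about 4.2% of the remaining channels sixteen times leaves half of them, whereas naively dividing the target by the step count (0.5 / 16 per step) would leave more than the target allows.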

Step 4: Execute iterative prune-finetune loop

For each iteration: create a GroupNormPruner with the per-step pruning ratio, ignore detection head layers (Detect modules), execute pruning, validate mAP on the pruned (not yet fine-tuned) model, fine-tune for a configured number of epochs using the Ultralytics training pipeline, then validate again to measure recovered mAP. Track all metrics across iterations.

Key considerations:

Ignore Detect modules to preserve the detection output structure
Fine-tuning epochs per iteration are typically shorter than original training (e.g., 10 epochs)
Delete the pruner after each iteration to free memory
Stop early when the mAP drop exceeds the maximum allowed threshold
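The control flow of the loop can be sketched independently of any particular model. In the code below, prune_fn, validate_fn, and finetune_fn are hypothetical callables standing in for the pruner step, the validation run, and the fine-tuning run. ToyModel and its numbers exist only to exercise the loop. The sketch also assumes the early-stopping check compares the fine-tuned mAP against the baseline.

```python
from typing import Callable, Dict, List


def iterative_prune(
    prune_fn: Callable[[float], None],
    validate_fn: Callable[[], float],
    finetune_fn: Callable[[], None],
    per_step_ratio: float,
    iterative_steps: int,
    max_map_drop: float,
) -> List[Dict[str, float]]:
    """Run prune -> validate -> fine-tune -> validate, stopping early on large mAP loss."""
    baseline_map = validate_fn()
    history: List[Dict[str, float]] = []
    for step in range(iterative_steps):
        # prune_fn is expected to build the pruner, apply it, and release it.
        prune_fn(per_step_ratio)
        pruned_map = validate_fn()      # mAP before fine-tuning
        finetune_fn()
        recovered_map = validate_fn()   # mAP after fine-tuning
        history.append(
            {"step": step, "pruned_map": pruned_map, "recovered_map": recovered_map}
        )
        if baseline_map - recovered_map > max_map_drop:
            break  # early stop: accuracy loss exceeds the allowed budget
    return history


class ToyModel:
    """Stand-in model: pruning costs 0.05 mAP, fine-tuning recovers 0.03."""

    def __init__(self) -> None:
        self.map_score = 0.50

    def prune(self, ratio: float) -> None:
        self.map_score -= 0.05

    def finetune(self) -> None:
        self.map_score += 0.03

    def validate(self) -> float:
        return self.map_score


toy = ToyModel()
history = iterative_prune(
    toy.prune, toy.validate, toy.finetune,
    per_step_ratio=0.04, iterative_steps=5, max_map_drop=0.05,
)
```

With these toy numbers the net loss is 0.02 mAP per iteration, so the loop stops after the third iteration instead of running all five.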

Step 5: Export and visualize results

After completing all pruning iterations (or early stopping), export the final pruned model to ONNX format for deployment. Generate a performance visualization graph showing mAP recovery, pruned mAP, and MACs reduction across all pruning steps.

Key considerations:

The performance graph plots recovered mAP, pruned mAP (before fine-tuning), and MACs on a dual-axis chart
ONNX export enables deployment on various inference runtimes
Compare final model size and speed against the baseline
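A dual-axis chart of this kind could be produced with matplotlib roughly as follows. This is a sketch under assumptions: the function name plot_pruning_history and the sample data are hypothetical, and the workflow's own plotting code may differ in layout and labels.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_pruning_history(macs, pruned_map, recovered_map, out_path=None):
    """Plot mAP (left axis) and MACs (right axis) against the pruning step."""
    steps = list(range(len(macs)))
    fig, ax_map = plt.subplots(figsize=(8, 5))

    (line_rec,) = ax_map.plot(steps, recovered_map, marker="o", label="recovered mAP")
    (line_pruned,) = ax_map.plot(
        steps, pruned_map, marker="x", linestyle="--", label="pruned mAP"
    )
    ax_map.set_xlabel("pruning step")
    ax_map.set_ylabel("mAP")

    # A second y-axis sharing the same x-axis holds the MACs curve.
    ax_macs = ax_map.twinx()
    (line_macs,) = ax_macs.plot(steps, macs, color="tab:red", marker="s", label="MACs")
    ax_macs.set_ylabel("MACs")

    # Combine handles from both axes into a single legend.
    handles = [line_rec, line_pruned, line_macs]
    ax_map.legend(handles, [h.get_label() for h in handles], loc="best")
    fig.tight_layout()
    if out_path is not None:
        fig.savefig(out_path)
    return fig


# Illustrative values only, not measured results.
fig = plot_pruning_history(
    macs=[8.0e9, 6.5e9, 5.2e9, 4.1e9],
    pruned_map=[0.50, 0.41, 0.39, 0.36],
    recovered_map=[0.50, 0.49, 0.48, 0.46],
)
```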
