Implementation:VainF Torch Pruning Measure Latency

From Leeroopedia


Metadata

Field         Value
Source        Torch-Pruning
Domains       Deep_Learning, Benchmarking
Last Updated  2026-02-08 00:00 GMT

Overview

measure_latency is a utility provided by Torch-Pruning for measuring the GPU inference latency of a model.

Description

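measure_latency puts the model in eval mode, runs warmup untimed iterations (default 50), then times repeat forward passes (default 300) with torch.cuda.Event, which takes timestamps on the GPU rather than on the host clock. It returns the mean and standard deviation of the per-iteration latency in milliseconds. Because it relies on CUDA events, the model and inputs must be on a CUDA device.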
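The warmup-then-time pattern described above can be sketched as follows. This is an illustrative re-creation under stated assumptions, not the library source: the name sketch_measure_latency, the torch.no_grad() wrapper, the CPU fallback via time.perf_counter, and the calling convention run_fn(model, example_inputs) are all assumptions made for this example.

```python
import time

import torch
import torch.nn as nn


def sketch_measure_latency(model, example_inputs, repeat=300, warmup=50, run_fn=None):
    """Illustrative warmup + timed-loop pattern (hypothetical, not the library code)."""
    model.eval()
    # Assumption: a custom run_fn is called as run_fn(model, example_inputs).
    forward = run_fn if run_fn is not None else (lambda m, x: m(x))
    # Use CUDA events only when the input actually lives on a CUDA device.
    use_cuda = torch.cuda.is_available() and example_inputs.is_cuda

    timings_ms = []
    with torch.no_grad():
        # Warmup: untimed passes so one-time setup costs do not skew the result.
        for _ in range(warmup):
            forward(model, example_inputs)

        for _ in range(repeat):
            if use_cuda:
                start = torch.cuda.Event(enable_timing=True)
                end = torch.cuda.Event(enable_timing=True)
                start.record()
                forward(model, example_inputs)
                end.record()
                # Wait for the GPU to finish before reading the elapsed time.
                torch.cuda.synchronize()
                timings_ms.append(start.elapsed_time(end))  # milliseconds
            else:
                # CPU fallback (an addition for this sketch) using the host clock.
                t0 = time.perf_counter()
                forward(model, example_inputs)
                timings_ms.append((time.perf_counter() - t0) * 1000.0)

    t = torch.tensor(timings_ms)
    return t.mean().item(), t.std().item()
```

The synchronize call matters because CUDA kernels launch asynchronously: without it, the elapsed time could be read before the timed work has finished.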

Code Reference

  • Source: torch_pruning/utils/benchmark.py, Lines 6-43
  • Signature:
def measure_latency(model, example_inputs, repeat=300, warmup=50, run_fn=None):
    """Measure model inference latency.

    Returns:
        Tuple of (mean_latency_ms, std_latency_ms).
    """
  • Import:
import torch_pruning as tp
measure_latency = tp.utils.benchmark.measure_latency
# or, equivalently:
from torch_pruning.utils.benchmark import measure_latency

I/O Contract

Inputs

Parameter       Type       Required  Default
model           nn.Module  Yes
example_inputs  Tensor     Yes
repeat          int        No        300
warmup          int        No        50
run_fn          Callable   No        None

Outputs

  • (mean_latency_ms: float, std_latency_ms: float)
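The run_fn parameter is not described further on this page. The sketch below assumes it is invoked as run_fn(model, example_inputs) in place of the default model(example_inputs), which would allow benchmarking a model whose forward takes more than one argument. TwoInputNet, run_two_inputs, and benchmark_on_gpu are hypothetical names introduced only for this illustration, and passing a tuple as example_inputs relies on the same assumption.

```python
import torch
import torch.nn as nn


class TwoInputNet(nn.Module):
    """Hypothetical model whose forward takes two tensors."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)

    def forward(self, a, b):
        return self.fc(a + b)


def run_two_inputs(model, example_inputs):
    # Assumption: measure_latency calls run_fn(model, example_inputs).
    a, b = example_inputs
    with torch.no_grad():
        return model(a, b)


def benchmark_on_gpu():
    """Not called here; needs a CUDA device and torch_pruning installed."""
    from torch_pruning.utils.benchmark import measure_latency

    model = TwoInputNet().cuda().eval()
    inputs = (torch.randn(1, 8).cuda(), torch.randn(1, 8).cuda())
    return measure_latency(model, inputs, repeat=100, warmup=10, run_fn=run_two_inputs)
```

Checking run_two_inputs on its own before handing it to the benchmark is a cheap way to catch unpacking mistakes outside the timed loop.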

Usage Examples

import torch
import torch.nn as nn
import torch_pruning as tp
from torch_pruning.utils.benchmark import measure_latency

# Build a simple model and move to GPU
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=1),
).cuda().eval()

example_inputs = torch.randn(1, 3, 224, 224).cuda()

# Measure latency BEFORE pruning
mean_before, std_before = measure_latency(model, example_inputs)
print(f"Before pruning: {mean_before:.2f} +/- {std_before:.2f} ms")

# ... apply pruning ...

# Measure latency AFTER pruning
mean_after, std_after = measure_latency(model, example_inputs)
print(f"After pruning:  {mean_after:.2f} +/- {std_after:.2f} ms")
print(f"Speedup: {mean_before / mean_after:.2f}x")
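The speedup printed at the end of the example is the ratio of the two mean latencies. A small helper, sketched here with the hypothetical name summarize_speedup, can report the relative latency reduction alongside it. The input values are made-up placeholders, not measured results.

```python
def summarize_speedup(mean_before_ms, mean_after_ms):
    """Return (speedup factor, latency reduction in percent) from two mean latencies."""
    if mean_before_ms <= 0 or mean_after_ms <= 0:
        raise ValueError("latencies must be positive")
    speedup = mean_before_ms / mean_after_ms
    reduction_pct = 100.0 * (mean_before_ms - mean_after_ms) / mean_before_ms
    return speedup, reduction_pct


# Placeholder inputs for illustration only (not benchmark data).
speedup, reduction_pct = summarize_speedup(12.0, 8.0)
print(f"Speedup: {speedup:.2f}x, latency reduced by {reduction_pct:.1f}%")
```

When the standard deviations are large relative to the difference between the means, a larger repeat value gives a more stable comparison.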
