Implementation:Alibaba MNN BenchmarkExprModels

From Leeroopedia


| Property | Value |
| --- | --- |
| Page Type | Implementation |
| Repository | Alibaba MNN |
| Source File | benchmark/benchmarkExprModels.cpp (232 lines) |
| Language | C++ |
| Domains | Benchmarking, Inference |
| Date | 2026-02-10 |

Overview

BenchmarkExprModels is a benchmark tool that measures inference latency on multiple hardware backends for neural network models built with MNN's expression API (VARP-based model construction). It covers well-known architectures, including MobileNet, ResNet, GoogLeNet, SqueezeNet, and ShuffleNet.

The tool runs each model through a configurable number of inference loops, collecting per-iteration timing data to compute minimum, maximum, and average latency statistics. It supports selection of different compute backends (CPU, Vulkan, OpenCL, Metal) and thread count configuration, making it suitable for cross-platform performance evaluation and regression testing.

Code Reference

Key Functions

| Function | Location | Description |
| --- | --- | --- |
| main() | benchmark/benchmarkExprModels.cpp:L149+ | Entry point that parses CLI arguments, configures the backend and scheduling, iterates over model architectures, and reports results. |
| runNet(VARP, ScheduleConfig&, int) | benchmark/benchmarkExprModels.cpp:L84-126 | Executes a single model benchmark: creates a session from the expression variable, runs the specified number of inference loops, and records per-iteration timing. |
| displayStats() | benchmark/benchmarkExprModels.cpp:L73-82 | Formats and prints the collected timing statistics (min, max, average) for a completed benchmark run. |
| getTimeInUs() | benchmark/benchmarkExprModels.cpp:L28-43 | Platform-abstracted microsecond timer using gettimeofday or equivalent, providing the timing primitive for latency measurements. |

Function Signatures

// benchmark/benchmarkExprModels.cpp:L84
static void runNet(VARP input, ScheduleConfig& config, int loopCount);

// benchmark/benchmarkExprModels.cpp:L73
static void displayStats(const std::string& name,
                          const std::vector<float>& times);

// benchmark/benchmarkExprModels.cpp:L28
static inline uint64_t getTimeInUs();
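
As a rough illustration of the timing primitive, the sketch below shows one portable way to obtain a microsecond timestamp and turn two of them into a millisecond latency. It is not the code from benchmarkExprModels.cpp: the names nowInUs and elapsedMs are hypothetical, and std::chrono::steady_clock is assumed here in place of the gettimeofday-based path the source file is described as using.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for getTimeInUs(): a monotonic timestamp in microseconds.
// Assumption: steady_clock replaces the platform-specific gettimeofday path.
static inline uint64_t nowInUs() {
    using namespace std::chrono;
    return static_cast<uint64_t>(
        duration_cast<microseconds>(steady_clock::now().time_since_epoch()).count());
}

// Hypothetical helper: elapsed time between two microsecond timestamps, in milliseconds.
static inline float elapsedMs(uint64_t startUs, uint64_t endUs) {
    assert(endUs >= startUs);  // timestamps must be taken in order
    return static_cast<float>(endUs - startUs) / 1000.0f;
}
```

A caller would take one timestamp before an inference call and one after it, then pass both to elapsedMs to get the latency of that iteration.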

I/O Contract

Inputs

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string (CLI arg 1) | (required) | Name of the model to benchmark (e.g., MobileNet or ResNet), or all to run every supported model. |
| loop | int (CLI arg 2) | 10 | Number of inference iterations for timing. |
| forward_type | int (CLI arg 3) | 0 (CPU) | Backend selection: 0 = CPU, 1 = Metal, 3 = OpenCL, 7 = Vulkan. |
| threads | int (CLI arg 4) | 4 | Number of threads for CPU backend execution. |
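
The positional arguments and their defaults can be sketched as follows. This is a minimal illustration under stated assumptions, not the parsing code from main(): BenchArgs, parseArgs, and backendName are hypothetical names, and only the defaults and backend codes listed in the table above are taken from this page.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical container for the four positional arguments.
struct BenchArgs {
    std::string model;    // CLI arg 1, required
    int loop = 10;        // CLI arg 2
    int forwardType = 0;  // CLI arg 3, 0 = CPU
    int threads = 4;      // CLI arg 4
};

// Fills `out` from argv; returns false when the required model name is missing.
// Note: std::atoi yields 0 for non-numeric text, so real code may want stricter checks.
static bool parseArgs(int argc, const char* const* argv, BenchArgs& out) {
    if (argc < 2) return false;
    out.model = argv[1];
    if (argc > 2) out.loop = std::atoi(argv[2]);
    if (argc > 3) out.forwardType = std::atoi(argv[3]);
    if (argc > 4) out.threads = std::atoi(argv[4]);
    return true;
}

// Maps the backend codes documented above to readable names.
static const char* backendName(int forwardType) {
    switch (forwardType) {
        case 0: return "CPU";
        case 1: return "Metal";
        case 3: return "OpenCL";
        case 7: return "Vulkan";
        default: return "unknown";
    }
}
```

Omitted trailing arguments keep their defaults, so an invocation that passes only a model name would run 10 loops on CPU with 4 threads under this sketch.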

Outputs

| Output | Format | Description |
| --- | --- | --- |
| Latency statistics | stdout text | Per-model timing report showing min, max, and avg inference time in milliseconds. |

Example Output

MobileNetV2:
  Min:  12.34 ms
  Max:  15.67 ms
  Avg:  13.21 ms
ResNet18:
  Min:  28.91 ms
  Max:  31.44 ms
  Avg:  29.87 ms
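
A report in the layout shown above could be produced along these lines. This is a sketch, not the body of displayStats(): computeStats and formatStats are hypothetical helpers, the exact spacing is inferred from the example output, and the function returns a string rather than printing so that it is easy to check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical summary of one benchmark run, all values in milliseconds.
struct Stats {
    float min;
    float max;
    float avg;
};

// Computes min, max, and arithmetic mean over the per-iteration times.
static Stats computeStats(const std::vector<float>& timesMs) {
    assert(!timesMs.empty());  // at least one timed iteration is required
    auto mm = std::minmax_element(timesMs.begin(), timesMs.end());
    float sum = std::accumulate(timesMs.begin(), timesMs.end(), 0.0f);
    return Stats{*mm.first, *mm.second, sum / static_cast<float>(timesMs.size())};
}

// Formats the statistics in the layout of the example output above.
// Very long model names are truncated by the fixed-size buffer.
static std::string formatStats(const std::string& name, const std::vector<float>& timesMs) {
    Stats s = computeStats(timesMs);
    char buf[256];
    std::snprintf(buf, sizeof(buf), "%s:\n  Min:  %.2f ms\n  Max:  %.2f ms\n  Avg:  %.2f ms\n",
                  name.c_str(), s.min, s.max, s.avg);
    return std::string(buf);
}
```

Printing the result of formatStats("MobileNetV2", times) to stdout would give one block per model, matching the report structure above.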

Usage Examples

Benchmark All Models on CPU

./benchmarkExprModels all 100 0 4

Runs all supported models for 100 iterations each on CPU with 4 threads.

Benchmark MobileNet on Vulkan

./benchmarkExprModels MobileNet 50 7 1

Runs MobileNet for 50 iterations on the Vulkan backend.

Benchmark on OpenCL

./benchmarkExprModels ResNet 100 3 1

Runs ResNet for 100 iterations on the OpenCL backend.

Supported Models

| Model | Architecture | Typical Use Case |
| --- | --- | --- |
| MobileNetV1/V2 | Depthwise separable convolutions | Mobile/edge inference |
| ResNet-18/50 | Residual connections | General-purpose classification |
| GoogLeNet | Inception modules | Image classification |
| SqueezeNet | Fire modules | Compact models |
| ShuffleNet | Channel shuffle operations | Efficient mobile inference |

Internal Workflow

  1. Parse command-line arguments to determine model, loop count, backend, and thread count.
  2. Configure ScheduleConfig with the selected backend type and thread count.
  3. For each selected model, construct the network graph using MNN's expression API (VARP).
  4. Call runNet(), which creates a session, performs warmup iterations, and then executes the timed inference loop.
  5. Record a microsecond timestamp before and after each timed iteration via getTimeInUs() to obtain per-iteration latency.
  6. Call displayStats() to compute and print min/max/avg latency in milliseconds.
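
The warmup-then-measure pattern in steps 4 and 5 can be sketched independently of MNN. In the snippet below the inference call is abstracted as a callable, so nothing here reflects the actual session API used by runNet(); timeLoop is a hypothetical helper, and the warmup count is an assumed parameter because this page does not state how many warmup iterations the tool performs.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: runs `infer` `warmup` times untimed, then `loopCount`
// times while recording each iteration's latency in milliseconds.
static std::vector<float> timeLoop(const std::function<void()>& infer,
                                   int warmup, int loopCount) {
    assert(warmup >= 0 && loopCount >= 0);

    // Warmup: lets caches, lazy allocations, and backend setup settle before timing.
    for (int i = 0; i < warmup; ++i) {
        infer();
    }

    std::vector<float> timesMs;
    timesMs.reserve(static_cast<std::size_t>(loopCount));
    for (int i = 0; i < loopCount; ++i) {
        auto start = std::chrono::steady_clock::now();
        infer();
        auto end = std::chrono::steady_clock::now();
        // duration<float, std::milli> converts the interval to fractional milliseconds.
        timesMs.push_back(std::chrono::duration<float, std::milli>(end - start).count());
    }
    return timesMs;
}
```

Keeping warmup runs out of the recorded vector matters because the first executions often include one-time costs that would otherwise inflate the maximum and the average.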
