Implementation: Alibaba MNN BenchmarkExprModels
| Property | Value |
|---|---|
| Page Type | Implementation |
| Repository | Alibaba MNN |
| Source File | benchmark/benchmarkExprModels.cpp (232 lines) |
| Language | C++ |
| Domains | Benchmarking, Inference |
| Date | 2026-02-10 |
Overview
BenchmarkExprModels is a benchmark tool that measures inference latency for expression-based neural network models across multiple hardware backends. It supports benchmarking well-known architectures including MobileNet, ResNet, GoogLeNet, SqueezeNet, and ShuffleNet using MNN's expression API (VARP-based model construction).
The tool runs each model through a configurable number of inference loops, collecting per-iteration timing data to compute minimum, maximum, and average latency statistics. It supports selection of different compute backends (CPU, Vulkan, OpenCL, Metal) and thread count configuration, making it suitable for cross-platform performance evaluation and regression testing.
Code Reference
Key Functions
| Function | Location | Description |
|---|---|---|
| main() | benchmark/benchmarkExprModels.cpp:L149+ | Entry point that parses CLI arguments, configures the backend and scheduling, iterates over model architectures, and reports results. |
| runNet(VARP, ScheduleConfig&, int) | benchmark/benchmarkExprModels.cpp:L84-126 | Executes a single model benchmark: creates a session from the expression variable, runs the specified number of inference loops, and records per-iteration timing. |
| displayStats() | benchmark/benchmarkExprModels.cpp:L73-82 | Formats and prints the collected timing statistics (min, max, average) for a completed benchmark run. |
| getTimeInUs() | benchmark/benchmarkExprModels.cpp:L28-43 | Platform-abstracted microsecond timer using gettimeofday or an equivalent, providing the timing primitive for latency measurements. |
Function Signatures
```cpp
// benchmark/benchmarkExprModels.cpp:L84
static void runNet(VARP input, ScheduleConfig& config, int loopCount);

// benchmark/benchmarkExprModels.cpp:L73
static void displayStats(const std::string& name,
                         const std::vector<float>& times);

// benchmark/benchmarkExprModels.cpp:L28
static inline uint64_t getTimeInUs();
```
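The timer signature above admits a straightforward POSIX implementation. The sketch below is a plausible gettimeofday-based version, not the file's verbatim code: the name getTimeInUsSketch is illustrative, and the real getTimeInUs() also covers platforms where gettimeofday is unavailable.

```cpp
#include <cstdint>
#include <sys/time.h>

// Microsecond wall-clock timestamp built from gettimeofday(); seconds
// are scaled to microseconds and the sub-second remainder is added.
static inline uint64_t getTimeInUsSketch() {
    struct timeval tv;
    gettimeofday(&tv, nullptr);
    return static_cast<uint64_t>(tv.tv_sec) * 1000000ull +
           static_cast<uint64_t>(tv.tv_usec);
}
```

Latency for one iteration is then simply the difference of two such timestamps, divided by 1000 to report milliseconds.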
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string (CLI arg 1) | (required) | Name of the model to benchmark (e.g., MobileNet, ResNet, all). |
| loop | int (CLI arg 2) | 10 | Number of inference iterations for timing. |
| forward_type | int (CLI arg 3) | 0 (CPU) | Backend selection: 0 = CPU, 1 = Metal, 3 = OpenCL, 7 = Vulkan. |
| threads | int (CLI arg 4) | 4 | Number of threads for CPU backend execution. |
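The defaults in the table suggest positional-argument handling along the following lines. BenchArgs and parseArgs are hypothetical names chosen for illustration, not identifiers from the source file; only the defaults (loop=10, forward_type=0/CPU, threads=4) are taken from the table above.

```cpp
#include <cstdlib>
#include <string>

// Illustrative container for the four positional CLI arguments.
struct BenchArgs {
    std::string model;        // CLI arg 1, required
    int loop        = 10;     // CLI arg 2
    int forwardType = 0;      // CLI arg 3, 0 = CPU
    int threads     = 4;      // CLI arg 4
};

// Fill in whichever positional arguments were supplied; anything
// omitted keeps its default from the struct initializers.
static BenchArgs parseArgs(int argc, const char* argv[]) {
    BenchArgs a;
    if (argc > 1) a.model = argv[1];
    if (argc > 2) a.loop = std::atoi(argv[2]);
    if (argc > 3) a.forwardType = std::atoi(argv[3]);
    if (argc > 4) a.threads = std::atoi(argv[4]);
    return a;
}
```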
Outputs
| Output | Format | Description |
|---|---|---|
| Latency statistics | stdout text | Per-model timing report showing min, max, and avg inference time in milliseconds. |
Example Output
```
MobileNetV2:
    Min: 12.34 ms
    Max: 15.67 ms
    Avg: 13.21 ms
ResNet18:
    Min: 28.91 ms
    Max: 31.44 ms
    Avg: 29.87 ms
```
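The min/max/avg figures in a report like the one above can be derived from the per-iteration samples as in this sketch. Stats, computeStats, and printStats are illustrative names, not the file's actual displayStats() code, but the aggregation they perform matches the described output.

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <string>
#include <vector>

// Illustrative aggregate of the three reported figures.
struct Stats {
    float minMs, maxMs, avgMs;
};

// Reduce per-iteration latency samples (already in milliseconds)
// to minimum, maximum, and arithmetic mean.
static Stats computeStats(const std::vector<float>& times) {
    Stats s;
    s.minMs = *std::min_element(times.begin(), times.end());
    s.maxMs = *std::max_element(times.begin(), times.end());
    s.avgMs = std::accumulate(times.begin(), times.end(), 0.0f)
              / static_cast<float>(times.size());
    return s;
}

// Print one per-model block in the format shown above.
static void printStats(const std::string& name,
                       const std::vector<float>& times) {
    Stats s = computeStats(times);
    printf("%s:\n    Min: %.2f ms\n    Max: %.2f ms\n    Avg: %.2f ms\n",
           name.c_str(), s.minMs, s.maxMs, s.avgMs);
}
```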
Usage Examples
Benchmark All Models on CPU

```
./benchmarkExprModels all 100 0 4
```

Runs all supported models for 100 iterations each on CPU with 4 threads.

Benchmark MobileNet on Vulkan

```
./benchmarkExprModels MobileNet 50 7 1
```

Runs MobileNet for 50 iterations on the Vulkan backend.

Benchmark ResNet on OpenCL

```
./benchmarkExprModels ResNet 100 3 1
```

Runs ResNet for 100 iterations on the OpenCL backend.
Supported Models
| Model | Architecture | Typical Use Case |
|---|---|---|
| MobileNetV1/V2 | Depthwise separable convolutions | Mobile/edge inference |
| ResNet-18/50 | Residual connections | General-purpose classification |
| GoogLeNet | Inception modules | Image classification |
| SqueezeNet | Fire modules | Compact models |
| ShuffleNet | Channel shuffle operations | Efficient mobile inference |
Internal Workflow
- Parse command-line arguments to determine model, loop count, backend, and thread count.
- Configure
ScheduleConfigwith the selected backend type and thread count. - For each selected model, construct the network graph using MNN's expression API (
VARP). - Call
runNet()which creates a session, performs warmup iterations, then executes the timed inference loop. - Collect per-iteration microsecond timestamps via
getTimeInUs(). - Call
displayStats()to compute and print min/max/avg latency.
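The workflow above boils down to a warmup-then-measure loop. The skeleton below sketches that structure under stated assumptions: std::chrono stands in for the file's getTimeInUs(), and the caller-supplied runOnce callable stands in for the MNN session inference call.

```cpp
#include <chrono>
#include <vector>

// Run `warmup` untimed passes, then `loops` timed passes, returning
// one latency sample per timed pass in milliseconds.
template <typename Fn>
static std::vector<float> benchmarkLoop(Fn runOnce, int warmup, int loops) {
    for (int i = 0; i < warmup; ++i) {
        runOnce();  // warmup passes are executed but not recorded
    }
    std::vector<float> timesMs;
    timesMs.reserve(loops);
    for (int i = 0; i < loops; ++i) {
        auto begin = std::chrono::steady_clock::now();
        runOnce();  // one timed inference pass
        auto end = std::chrono::steady_clock::now();
        timesMs.push_back(
            std::chrono::duration<float, std::milli>(end - begin).count());
    }
    return timesMs;
}
```

The returned vector is exactly what a displayStats-style reducer needs to produce the min/max/avg report.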
Related Pages
- Principle: Alibaba_MNN_Neural_Network_Inference — Foundational principle covering MNN's inference execution model, session management, and backend scheduling.