Implementation: Alibaba MNN BenchmarkExprModels
| Property | Value |
|---|---|
| Page Type | Implementation |
| Repository | Alibaba MNN |
| Source File | benchmark/benchmarkExprModels.cpp (232 lines) |
| Language | C++ |
| Domains | Benchmarking, Inference |
| Date | 2026-02-10 |
Overview
BenchmarkExprModels is a benchmark tool that measures inference latency for expression-based neural network models across multiple hardware backends. It supports benchmarking well-known architectures including MobileNet, ResNet, GoogLeNet, SqueezeNet, and ShuffleNet using MNN's expression API (VARP-based model construction).
The tool runs each model through a configurable number of inference loops, collecting per-iteration timing data to compute minimum, maximum, and average latency statistics. It supports selection of different compute backends (CPU, Vulkan, OpenCL, Metal) and thread count configuration, making it suitable for cross-platform performance evaluation and regression testing.
Code Reference
Key Functions
| Function | Location | Description |
|---|---|---|
| main() | benchmark/benchmarkExprModels.cpp:L149+ | Entry point that parses CLI arguments, configures the backend and scheduling, iterates over model architectures, and reports results. |
| runNet(VARP, ScheduleConfig&, int) | benchmark/benchmarkExprModels.cpp:L84-126 | Executes a single model benchmark: creates a session from the expression variable, runs the specified number of inference loops, and records per-iteration timing. |
| displayStats() | benchmark/benchmarkExprModels.cpp:L73-82 | Formats and prints the collected timing statistics (min, max, average) for a completed benchmark run. |
| getTimeInUs() | benchmark/benchmarkExprModels.cpp:L28-43 | Platform-abstracted microsecond timer using gettimeofday or an equivalent, providing the timing primitive for latency measurements. |
Function Signatures
```cpp
// benchmark/benchmarkExprModels.cpp:L84
static void runNet(VARP input, ScheduleConfig& config, int loopCount);

// benchmark/benchmarkExprModels.cpp:L73
static void displayStats(const std::string& name,
                         const std::vector<float>& times);

// benchmark/benchmarkExprModels.cpp:L28
static inline uint64_t getTimeInUs();
```
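The timer signature above admits a straightforward POSIX implementation. The sketch below is a plausible gettimeofday-based version, not the file's verbatim code: the name getTimeInUsSketch is illustrative, and the real getTimeInUs() also covers platforms where gettimeofday is unavailable.

```cpp
#include <cstdint>
#include <sys/time.h>

// Microsecond wall-clock timestamp built from gettimeofday(); seconds
// are scaled to microseconds and the sub-second remainder is added.
static inline uint64_t getTimeInUsSketch() {
    struct timeval tv;
    gettimeofday(&tv, nullptr);
    return static_cast<uint64_t>(tv.tv_sec) * 1000000ull +
           static_cast<uint64_t>(tv.tv_usec);
}
```

Latency for one iteration is then simply the difference of two such timestamps, divided by 1000 to report milliseconds.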
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | string (CLI arg 1) | (required) | Name of the model to benchmark (e.g., MobileNet, ResNet, all). |
| loop | int (CLI arg 2) | 10 | Number of inference iterations for timing. |
| forward_type | int (CLI arg 3) | 0 (CPU) | Backend selection: 0 = CPU, 1 = Metal, 3 = OpenCL, 7 = Vulkan. |
| threads | int (CLI arg 4) | 4 | Number of threads for CPU backend execution. |
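The defaults in the table suggest positional-argument handling along the following lines. BenchArgs and parseArgs are hypothetical names chosen for illustration, not identifiers from the source file; only the defaults (loop=10, forward_type=0/CPU, threads=4) are taken from the table above.

```cpp
#include <cstdlib>
#include <string>

// Illustrative container for the four positional CLI arguments.
struct BenchArgs {
    std::string model;        // CLI arg 1, required
    int loop        = 10;     // CLI arg 2
    int forwardType = 0;      // CLI arg 3, 0 = CPU
    int threads     = 4;      // CLI arg 4
};

// Fill in whichever positional arguments were supplied; anything
// omitted keeps its default from the struct initializers.
static BenchArgs parseArgs(int argc, const char* argv[]) {
    BenchArgs a;
    if (argc > 1) a.model = argv[1];
    if (argc > 2) a.loop = std::atoi(argv[2]);
    if (argc > 3) a.forwardType = std::atoi(argv[3]);
    if (argc > 4) a.threads = std::atoi(argv[4]);
    return a;
}
```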
Outputs
| Output | Format | Description |
|---|---|---|
| Latency statistics | stdout text | Per-model timing report showing min, max, and avg inference time in milliseconds. |
Example Output
```
MobileNetV2:
    Min: 12.34 ms
    Max: 15.67 ms
    Avg: 13.21 ms
ResNet18:
    Min: 28.91 ms
    Max: 31.44 ms
    Avg: 29.87 ms
```
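The min/max/avg figures in a report like the one above can be derived from the per-iteration samples as in this sketch. Stats, computeStats, and printStats are illustrative names, not the file's actual displayStats() code, but the aggregation they perform matches the described output.

```cpp
#include <algorithm>
#include <cstdio>
#include <numeric>
#include <string>
#include <vector>

// Illustrative aggregate of the three reported figures.
struct Stats {
    float minMs, maxMs, avgMs;
};

// Reduce per-iteration latency samples (already in milliseconds)
// to minimum, maximum, and arithmetic mean.
static Stats computeStats(const std::vector<float>& times) {
    Stats s;
    s.minMs = *std::min_element(times.begin(), times.end());
    s.maxMs = *std::max_element(times.begin(), times.end());
    s.avgMs = std::accumulate(times.begin(), times.end(), 0.0f)
              / static_cast<float>(times.size());
    return s;
}

// Print one per-model block in the format shown above.
static void printStats(const std::string& name,
                       const std::vector<float>& times) {
    Stats s = computeStats(times);
    printf("%s:\n    Min: %.2f ms\n    Max: %.2f ms\n    Avg: %.2f ms\n",
           name.c_str(), s.minMs, s.maxMs, s.avgMs);
}
```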
Usage Examples
Benchmark All Models on CPU

```
./benchmarkExprModels all 100 0 4
```

Runs all supported models for 100 iterations each on CPU with 4 threads.

Benchmark MobileNet on Vulkan

```
./benchmarkExprModels MobileNet 50 7 1
```

Runs MobileNet for 50 iterations on the Vulkan backend.

Benchmark ResNet on OpenCL

```
./benchmarkExprModels ResNet 100 3 1
```

Runs ResNet for 100 iterations on the OpenCL backend.
Supported Models
| Model | Architecture | Typical Use Case |
|---|---|---|
| MobileNetV1/V2 | Depthwise separable convolutions | Mobile/edge inference |
| ResNet-18/50 | Residual connections | General-purpose classification |
| GoogLeNet | Inception modules | Image classification |
| SqueezeNet | Fire modules | Compact models |
| ShuffleNet | Channel shuffle operations | Efficient mobile inference |
Internal Workflow
- Parse command-line arguments to determine model, loop count, backend, and thread count.
- Configure
ScheduleConfigwith the selected backend type and thread count. - For each selected model, construct the network graph using MNN's expression API (
VARP). - Call
runNet()which creates a session, performs warmup iterations, then executes the timed inference loop. - Collect per-iteration microsecond timestamps via
getTimeInUs(). - Call
displayStats()to compute and print min/max/avg latency.
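The workflow above boils down to a warmup-then-measure loop. The skeleton below sketches that structure under stated assumptions: std::chrono stands in for the file's getTimeInUs(), and the caller-supplied runOnce callable stands in for the MNN session inference call.

```cpp
#include <chrono>
#include <vector>

// Run `warmup` untimed passes, then `loops` timed passes, returning
// one latency sample per timed pass in milliseconds.
template <typename Fn>
static std::vector<float> benchmarkLoop(Fn runOnce, int warmup, int loops) {
    for (int i = 0; i < warmup; ++i) {
        runOnce();  // warmup passes are executed but not recorded
    }
    std::vector<float> timesMs;
    timesMs.reserve(loops);
    for (int i = 0; i < loops; ++i) {
        auto begin = std::chrono::steady_clock::now();
        runOnce();  // one timed inference pass
        auto end = std::chrono::steady_clock::now();
        timesMs.push_back(
            std::chrono::duration<float, std::milli>(end - begin).count());
    }
    return timesMs;
}
```

The returned vector is exactly what a displayStats-style reducer needs to produce the min/max/avg report.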
Related Pages
- Principle: Alibaba_MNN_Neural_Network_Inference — Foundational principle covering MNN's inference execution model, session management, and backend scheduling.