Heuristic:Alibaba MNN GPU Tuning Modes
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Inference, GPU |
| Last Updated | 2026-02-10 12:00 GMT |
Overview
Guide for selecting OpenCL/Vulkan GPU tuning modes and memory configurations to optimize MNN inference on GPU.
Description
MNN provides five GPU tuning levels that control how aggressively the runtime searches for optimal kernel configurations. Additionally, OpenCL supports buffer vs image memory modes that can differ significantly in performance depending on the GPU vendor. Qualcomm GPUs also support kernel recording for batched dispatch.
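Because the buffer/image difference is hardware dependent, the choice is best made by measurement. The following is a minimal sketch of such a comparison, not MNN code: `MemoryMode`, `steadyStateMs`, and `pickFasterMemoryMode` are hypothetical names, and the caller is assumed to supply a callback that builds a session with the given memory mode, runs one inference, and returns the latency in milliseconds. Warm-up runs are discarded because the first runs include kernel compilation and tuning.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for the two OpenCL memory modes discussed above.
enum class MemoryMode { Buffer, Image };

// Median of `iters` timed runs after discarding `warmup` runs.
inline double steadyStateMs(const std::function<double()>& runOnceMs,
                            int warmup, int iters) {
    assert(warmup >= 0 && iters > 0);
    for (int i = 0; i < warmup; ++i) runOnceMs();  // results ignored
    std::vector<double> samples;
    for (int i = 0; i < iters; ++i) samples.push_back(runOnceMs());
    std::nth_element(samples.begin(), samples.begin() + iters / 2, samples.end());
    return samples[iters / 2];
}

// Returns the mode with the lower steady-state latency (ties go to Buffer).
inline MemoryMode pickFasterMemoryMode(
        const std::function<double(MemoryMode)>& runOnceMs,
        int warmup = 2, int iters = 9) {
    const double bufMs = steadyStateMs(
        [&] { return runOnceMs(MemoryMode::Buffer); }, warmup, iters);
    const double imgMs = steadyStateMs(
        [&] { return runOnceMs(MemoryMode::Image); }, warmup, iters);
    return imgMs < bufMs ? MemoryMode::Image : MemoryMode::Buffer;
}
```

Using the median rather than the mean keeps a single slow outlier (for example a thermal throttle) from deciding the result.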
Usage
Use when first deploying a model on a new GPU to find optimal settings, or when GPU inference is slower than expected.
The Insight (Rule of Thumb)
- Tuning Level: Pick exactly one tuning flag. Start with MNN_GPU_TUNING_WIDE (the default, a good balance of startup cost and kernel quality). Try MNN_GPU_TUNING_HEAVY for production deployment (slow init, but the most thorough kernel search).
- Memory Mode: Test both MNN_GPU_MEMORY_BUFFER and MNN_GPU_MEMORY_IMAGE on your target hardware. Performance varies by GPU vendor.
- Kernel Recording: MNN_GPU_RECORD_BATCH shares one commandBuffer for all ops (Vulkan). MNN_GPU_RECORD_OP records per-op (OpenCL, Qualcomm only).
- Cache: Save GPU tuning results via the RuntimeManager cache to avoid re-tuning on subsequent runs (startup speedup of 2x or more).
- Trade-off: Heavy tuning dramatically increases first-run time but produces optimal kernel selection. Caching eliminates this on subsequent runs.
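The cache rule above follows a generic load-or-tune pattern, sketched here without any MNN dependency. `loadOrTune` and `invalidateCache` are hypothetical helpers, the cache is assumed to be a single opaque file, and `tune` stands for whatever slow first-run search produces the settings. In MNN itself this bookkeeping is handled by the RuntimeManager cache rather than by user code.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>

// Returns tuned settings: read from `cachePath` when a non-empty cache exists,
// otherwise produced by `tune()` (the slow path) and written back to the file.
// `fromCache`, if non-null, reports which path was taken.
inline std::string loadOrTune(const std::string& cachePath,
                              const std::function<std::string()>& tune,
                              bool* fromCache = nullptr) {
    std::ifstream in(cachePath, std::ios::binary);
    if (in) {
        std::string cached((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
        if (!cached.empty()) {
            if (fromCache) *fromCache = true;
            return cached;
        }
    }
    std::string fresh = tune();
    std::ofstream out(cachePath, std::ios::binary | std::ios::trunc);
    out << fresh;
    if (fromCache) *fromCache = false;
    return fresh;
}

// Drops the cache, e.g. after a model, driver, or device change.
inline void invalidateCache(const std::string& cachePath) {
    std::remove(cachePath.c_str());
}
```

A cache produced on one device or driver version should not be assumed valid on another, which is why the sketch includes an explicit invalidation helper.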
Reasoning
Different GPU architectures prefer different kernel configurations. Auto-tuning finds the best config empirically. Image memory mode uses GPU texture units for faster access on some GPUs, while buffer mode is more predictable.
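To make "finds the best config empirically" concrete, here is an illustrative sketch of an auto-tuning search over work-group sizes. It is not MNN's actual search: `candidateLocalSizes`, `pickBest`, and the `stride` parameter are assumptions introduced for illustration, with a larger stride standing in for a lighter tuning level that tries fewer candidates.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

using LocalSize = std::pair<int, int>;  // (x, y) work-group dimensions

// Enumerates power-of-two (x, y) pairs with x * y <= maxThreads.
// stride = 1 tries every exponent; stride = 2 tries every other one, and so on.
inline std::vector<LocalSize> candidateLocalSizes(int maxThreads, int stride) {
    assert(maxThreads >= 1 && maxThreads <= 4096);
    assert(stride >= 1 && stride <= 4);
    std::vector<LocalSize> out;
    for (int x = 1; x <= maxThreads; x <<= stride) {
        for (int y = 1; x * y <= maxThreads; y <<= stride) {
            out.emplace_back(x, y);
        }
    }
    return out;
}

// Measures every candidate with `costMs` and returns the cheapest one.
inline LocalSize pickBest(const std::vector<LocalSize>& candidates,
                          const std::function<double(LocalSize)>& costMs) {
    assert(!candidates.empty());
    LocalSize best = candidates.front();
    double bestMs = std::numeric_limits<double>::infinity();
    for (const LocalSize& c : candidates) {
        const double ms = costMs(c);
        if (ms < bestMs) {
            bestMs = ms;
            best = c;
        }
    }
    return best;
}
```

A thinner candidate set finishes sooner but can miss the best configuration, which is the same trade-off the tuning levels expose.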
Code Evidence
Tuning, memory, and recording mode enum definitions from `MNNForwardType.h:62-78`:

```cpp
typedef enum {
    // GPU tuning levels: control kernel search intensity (choose exactly one)
    MNN_GPU_TUNING_NONE   = 1 << 0, // No tuning, use defaults
    MNN_GPU_TUNING_HEAVY  = 1 << 1, // Exhaustive search, best for production
    MNN_GPU_TUNING_WIDE   = 1 << 2, // Wide search (default, good balance)
    MNN_GPU_TUNING_NORMAL = 1 << 3, // Moderate search
    MNN_GPU_TUNING_FAST   = 1 << 4, // Quick search, good for development

    // GPU memory modes (choose exactly one)
    MNN_GPU_MEMORY_BUFFER = 1 << 6, // Use buffer memory (OpenCL cl_mem buffer)
    MNN_GPU_MEMORY_IMAGE  = 1 << 7, // Use image memory (OpenCL cl_mem image2d)

    // GPU kernel recording modes (choose at most one)
    MNN_GPU_RECORD_OP     = 1 << 8, // Record per-op (OpenCL, Qualcomm only)
    MNN_GPU_RECORD_BATCH  = 1 << 9, // Record batch (Vulkan, shared commandBuffer)
} MNNGpuMode;
```
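Each flag above is a single bit, so one flag from each group is OR-ed into a single mode value (in MNN this value is typically passed through the `mode` field of `ScheduleConfig`). The sketch below shows one way to validate such a combination before use. Everything in the `sketch` namespace is a hypothetical stand-in: the memory and record constants mirror the bits listed above, and the assumption that bits 0-5 are reserved for the tuning level is mine. Real code should include the MNN header rather than redefine the values.

```cpp
#include <cassert>

namespace sketch {
// Assumption: the low six bits carry the tuning level.
constexpr int kTuningMask   = (1 << 6) - 1;
constexpr int kMemoryBuffer = 1 << 6;
constexpr int kMemoryImage  = 1 << 7;
constexpr int kRecordOp     = 1 << 8;
constexpr int kRecordBatch  = 1 << 9;

// True when exactly one bit is set.
inline bool isSingleBit(int v) { return v > 0 && (v & (v - 1)) == 0; }

// Combines one tuning flag, one memory flag, and an optional record flag
// (0 means "no recording"). Returns false and leaves *out untouched when a
// group is empty, ambiguous, or outside its expected bit range.
inline bool composeGpuMode(int tuning, int memory, int record, int* out) {
    assert(out != nullptr);
    if (!isSingleBit(tuning) || (tuning & kTuningMask) == 0) return false;
    if (memory != kMemoryBuffer && memory != kMemoryImage) return false;
    if (record != 0 && record != kRecordOp && record != kRecordBatch) return false;
    *out = tuning | memory | record;
    return true;
}
}  // namespace sketch
```

Rejecting ambiguous combinations up front is a design choice of this sketch: setting two flags from the same group would otherwise leave the effective mode dependent on the order in which the runtime checks the bits.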