
Environment:Alibaba MNN GPU CUDA Environment

From Leeroopedia


Field Value
environment_name GPU_CUDA_Environment
environment_type GPU Acceleration
repository Alibaba_MNN
platform Linux (primary), Windows (limited)
source_file CMakeLists.txt (L267-280), include/MNN/MNNForwardType.h (L28)
last_updated 2026-02-10 14:00 GMT

Overview

NVIDIA CUDA GPU acceleration environment for MNN. This environment enables high-performance inference on NVIDIA GPUs using CUDA compute kernels, with optional TensorRT integration for further optimized execution. Requires a CUDA-capable NVIDIA GPU and the NVIDIA CUDA toolkit installed on a Linux host.

Description

The CUDA backend (MNN_FORWARD_CUDA) offloads neural network operations to NVIDIA GPUs via CUDA. MNN compiles dedicated CUDA kernels for supported operations (convolution, matrix multiplication, element-wise ops, etc.) and falls back to CPU for unsupported ops. When combined with MNN_TENSORRT=ON, MNN can additionally leverage NVIDIA TensorRT for graph-level optimizations including layer fusion, kernel auto-tuning, and precision calibration.

CUDA profiling (MNN_CUDA_PROFILE) is forcibly disabled on non-Linux platforms, making Linux the only fully supported CUDA environment.
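Backend selection happens at session creation time. A minimal sketch of requesting the CUDA backend with CPU fallback through MNN's C++ session API, assuming an MNN build with MNN_CUDA=ON; the model path "model.mnn" is a placeholder:

```cpp
#include <memory>

#include <MNN/Interpreter.hpp>
#include <MNN/MNNForwardType.h>

int main() {
    // Load a serialized MNN model ("model.mnn" is a placeholder path).
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"));

    MNN::ScheduleConfig config;
    config.type       = MNN_FORWARD_CUDA;  // run supported ops on the GPU
    config.backupType = MNN_FORWARD_CPU;   // fall back to CPU for the rest

    MNN::BackendConfig backendConfig;
    backendConfig.precision = MNN::BackendConfig::Precision_Low;
    config.backendConfig = &backendConfig;

    // If the CUDA backend was not compiled in or no device is available,
    // session creation falls back to the backup type.
    auto session = net->createSession(config);
    net->runSession(session);
    return 0;
}
```

If CUDA was requested but is unavailable, MNN logs a fallback rather than failing, so checking the log output is the quickest way to confirm which backend actually ran.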

Usage

Use this environment when deploying MNN models on Linux or Windows workstations equipped with NVIDIA GPUs. Typical use cases include server-side LLM inference, diffusion model generation, and batch processing workloads that benefit from GPU parallelism.

System Requirements

  • Operating System: Linux (required for full support including CUDA profiling); Windows supported with restrictions
  • GPU: NVIDIA GPU with CUDA Compute Capability 3.5 or higher (recommended: Compute Capability 7.0+ for Tensor Core acceleration)
  • CUDA Toolkit: NVIDIA CUDA Toolkit (version compatible with the installed GPU driver)
  • Driver: NVIDIA GPU driver compatible with the installed CUDA Toolkit version
  • Compiler: nvcc (ships with CUDA Toolkit), plus a host C/C++ compiler (GCC on Linux, MSVC on Windows)
  • CMake: Version 3.6 or later
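The requirement list above can be sanity-checked with a small pre-flight script. It only confirms that the tools are present; version compatibility against NVIDIA's matrix still needs manual verification:

```shell
# Pre-flight check: report which required build tools are on PATH.
for tool in nvcc nvidia-smi cmake gcc; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool"
    else
        echo "MISSING: $tool"
    fi
done
```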

Dependencies

  • NVIDIA CUDA Toolkit (required): provides the nvcc compiler, CUDA runtime libraries, and headers
  • CMake (required): must be able to locate CUDA via FindCUDA or native CUDA language support
  • NVIDIA GPU Driver (required): must be compatible with the installed CUDA Toolkit version (check NVIDIA's compatibility matrix)
  • TensorRT (optional): enable with -DMNN_TENSORRT=ON for graph-level optimizations
  • cuDNN (optional): may be required by TensorRT, depending on the TensorRT version

Credentials

No credentials, API keys, or tokens are required for this environment. All software is locally installed.

Quick Install

# 1. Install NVIDIA CUDA Toolkit (Linux, e.g., Ubuntu)
#    Follow: https://developer.nvidia.com/cuda-downloads

# 2. Verify CUDA installation
nvcc --version
nvidia-smi

# 3. Clone and build MNN with CUDA enabled
git clone https://github.com/alibaba/MNN.git
cd MNN
mkdir build && cd build

# Basic CUDA build
cmake .. \
    -DMNN_CUDA=ON \
    -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# CUDA + TensorRT build
cmake .. \
    -DMNN_CUDA=ON \
    -DMNN_TENSORRT=ON \
    -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# CUDA + LLM build with profiling
cmake .. \
    -DMNN_CUDA=ON \
    -DMNN_CUDA_PROFILE=ON \
    -DMNN_BUILD_LLM=ON \
    -DMNN_LOW_MEMORY=ON \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
    -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

Code Evidence

CMakeLists.txt (Lines 267-280): CUDA and TensorRT options

option(MNN_CUDA "Enable CUDA" OFF)
option(MNN_TENSORRT "Enable TensorRT" OFF)
...
option(MNN_CUDA_PROFILE "Enable CUDA profile" OFF)

if (NOT MNN_CUDA OR NOT CMAKE_SYSTEM_NAME MATCHES "^Linux")
  set(MNN_CUDA_PROFILE OFF)
endif()

This shows that MNN_CUDA and MNN_TENSORRT are both OFF by default and must be explicitly enabled. The MNN_CUDA_PROFILE option is forcibly set to OFF unless the build target is Linux and CUDA is enabled.

CMakeLists.txt (Lines 741-747): CUDA backend linkage

IF(MNN_CUDA)
  add_subdirectory(${CMAKE_CURRENT_LIST_DIR}/source/backend/cuda)
  list(APPEND MNN_TARGETS MNN_CUDA)
  ...
  list(APPEND MNN_EXTRA_DEPENDS ${MNN_CUDA_LIBS})

When CUDA is enabled, the CUDA backend sources are compiled and linked into MNN.

CMakeLists.txt (Line 352): CUDA + Transformer Fuse requires C++17

if((MNN_CUDA AND MNN_SUPPORT_TRANSFORMER_FUSE) OR (CMAKE_CXX_STANDARD EQUAL 17))

When both CUDA and the transformer fuse ops are enabled (or the build has already selected C++17), the build requires the C++17 standard.
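If the host toolchain does not already default to C++17, the standard can be requested explicitly at configure time with the standard CMake variable (combine with the other options above as needed):

```shell
cmake .. \
    -DMNN_CUDA=ON \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
    -DCMAKE_CXX_STANDARD=17 \
    -DCMAKE_BUILD_TYPE=Release
```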

MNNForwardType.h (Line 28): Forward type enum

/*NVIDIA GPU API*/
MNN_FORWARD_CUDA = 2,

The CUDA backend is identified by the forward type constant MNN_FORWARD_CUDA = 2.

Common Errors

  • "CUDA not found" or "No CUDA toolkits found". Cause: the CUDA Toolkit is not installed or not on the system PATH. Resolution: install the NVIDIA CUDA Toolkit, ensure nvcc is on PATH, and set CUDA_TOOLKIT_ROOT_DIR if CMake still cannot locate it.
  • "CUDA out of memory". Cause: GPU VRAM is insufficient for the model and its intermediate tensors. Resolution: reduce the batch size, use lower precision (Precision_Low), build with MNN_LOW_MEMORY=ON, or use a GPU with more VRAM.
  • "CUDA driver version is insufficient for CUDA runtime version". Cause: mismatch between the GPU driver and the CUDA Toolkit version. Resolution: update the NVIDIA GPU driver to a version compatible with the installed CUDA Toolkit.
  • MNN_CUDA_PROFILE has no effect. Cause: building on a non-Linux platform. Resolution: none; CUDA profiling is Linux-only, and the CMake script forcibly disables it elsewhere (CMakeLists.txt:278-280).
  • Linker errors with CUDA symbols on Windows. Cause: incorrect MSVC runtime configuration. Resolution: set -DMNN_WIN_RUNTIME_MT=ON and -DMNN_BUILD_SHARED_LIBS=ON when building on Windows.
  • TensorRT initialization failure. Cause: TensorRT libraries not found or a version mismatch. Resolution: ensure TensorRT is installed and its library directory is on LD_LIBRARY_PATH.
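For the driver/toolkit mismatch above, the maximum CUDA version reported by nvidia-smi (top right of its header) can be compared against the toolkit version from nvcc --version with a small helper; the two version values below are illustrative placeholders to replace with the actual readings:

```shell
# Returns success (0) when version $1 >= version $2, compared numerically
# component by component via sort -V.
ver_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

driver_cuda="12.4"   # placeholder: the CUDA version nvidia-smi reports
toolkit_cuda="12.2"  # placeholder: the CUDA version nvcc reports

if ver_ge "$driver_cuda" "$toolkit_cuda"; then
    echo "driver supports this toolkit"
else
    echo "driver too old for this toolkit"
fi
```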

Compatibility Notes

  • Linux: Full support including CUDA profiling via MNN_CUDA_PROFILE=ON. This is the primary and recommended platform for CUDA acceleration.
  • Windows: CUDA backend compiles but CUDA profiling is forcibly disabled. Requires MNN_WIN_RUNTIME_MT=ON for static MSVC runtime linking and MNN_BUILD_SHARED_LIBS=ON for proper DLL export/import.
  • macOS: Not supported. NVIDIA has discontinued CUDA support on macOS.
  • C++17 requirement: When both MNN_CUDA=ON and MNN_SUPPORT_TRANSFORMER_FUSE=ON are set, the build requires C++17 standard.
  • TensorRT: Optional add-on; can be enabled alongside CUDA for graph optimization. Requires separate TensorRT installation from NVIDIA.
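The Windows flags called out above can be combined into a single configure step (a sketch; adapt the generator and paths to your setup):

```shell
cmake .. -DMNN_CUDA=ON -DMNN_WIN_RUNTIME_MT=ON -DMNN_BUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release
```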
