
Environment:Alibaba MNN GPU CUDA Environment

From Leeroopedia


Field Value
environment_name GPU_CUDA_Environment
environment_type GPU Acceleration
repository Alibaba_MNN
platform Linux (primary), Windows (limited)
source_file CMakeLists.txt (L267-280), include/MNN/MNNForwardType.h (L28)
last_updated 2026-02-10 14:00 GMT

Overview

NVIDIA CUDA GPU acceleration environment for MNN. This environment enables high-performance inference on NVIDIA GPUs using CUDA compute kernels, with optional TensorRT integration for further optimized execution. Requires a CUDA-capable NVIDIA GPU and the NVIDIA CUDA toolkit installed on a Linux host.

Description

The CUDA backend (MNN_FORWARD_CUDA) offloads neural network operations to NVIDIA GPUs via CUDA. MNN compiles dedicated CUDA kernels for supported operations (convolution, matrix multiplication, element-wise ops, etc.) and falls back to CPU for unsupported ops. When combined with MNN_TENSORRT=ON, MNN can additionally leverage NVIDIA TensorRT for graph-level optimizations including layer fusion, kernel auto-tuning, and precision calibration.

CUDA profiling (MNN_CUDA_PROFILE) is forcibly disabled on non-Linux platforms, making Linux the only fully supported CUDA environment.
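Backend selection happens at session creation time. A minimal sketch of requesting the CUDA backend with CPU fallback through MNN's C++ session API, assuming an MNN build with MNN_CUDA=ON; the model path "model.mnn" is a placeholder:

```cpp
#include <memory>

#include <MNN/Interpreter.hpp>
#include <MNN/MNNForwardType.h>

int main() {
    // Load a serialized MNN model ("model.mnn" is a placeholder path).
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"));

    MNN::ScheduleConfig config;
    config.type       = MNN_FORWARD_CUDA;  // run supported ops on the GPU
    config.backupType = MNN_FORWARD_CPU;   // fall back to CPU for the rest

    MNN::BackendConfig backendConfig;
    backendConfig.precision = MNN::BackendConfig::Precision_Low;
    config.backendConfig = &backendConfig;

    // If the CUDA backend was not compiled in or no device is available,
    // session creation falls back to the backup type.
    auto session = net->createSession(config);
    net->runSession(session);
    return 0;
}
```

If CUDA was requested but is unavailable, MNN logs a fallback rather than failing, so checking the log output is the quickest way to confirm which backend actually ran.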

Usage

Use this environment when deploying MNN models on Linux or Windows workstations equipped with NVIDIA GPUs. Typical use cases include server-side LLM inference, diffusion model generation, and batch processing workloads that benefit from GPU parallelism.

System Requirements

  • Operating System: Linux (required for full support including CUDA profiling); Windows supported with restrictions
  • GPU: NVIDIA GPU with CUDA Compute Capability 3.5 or higher (recommended: Compute Capability 7.0+ for Tensor Core acceleration)
  • CUDA Toolkit: NVIDIA CUDA Toolkit (version compatible with the installed GPU driver)
  • Driver: NVIDIA GPU driver compatible with the installed CUDA Toolkit version
  • Compiler: nvcc (ships with CUDA Toolkit), plus a host C/C++ compiler (GCC on Linux, MSVC on Windows)
  • CMake: Version 3.6 or later
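The requirement list above can be sanity-checked with a small pre-flight script. It only confirms that the tools are present; version compatibility against NVIDIA's matrix still needs manual verification:

```shell
# Pre-flight check: report which required build tools are on PATH.
for tool in nvcc nvidia-smi cmake gcc; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found: $tool"
    else
        echo "MISSING: $tool"
    fi
done
```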

Dependencies

  • NVIDIA CUDA Toolkit (required): provides the nvcc compiler, CUDA runtime libraries, and headers
  • CMake (required): must be able to locate CUDA via FindCUDA or native CUDA language support
  • NVIDIA GPU Driver (required): must be compatible with the installed CUDA Toolkit version (check NVIDIA's compatibility matrix)
  • TensorRT (optional): enable with -DMNN_TENSORRT=ON for graph-level optimizations
  • cuDNN (optional): may be required by TensorRT, depending on the TensorRT version

Credentials

No credentials, API keys, or tokens are required for this environment. All software is locally installed.

Quick Install

# 1. Install NVIDIA CUDA Toolkit (Linux, e.g., Ubuntu)
#    Follow: https://developer.nvidia.com/cuda-downloads

# 2. Verify CUDA installation
nvcc --version
nvidia-smi

# 3. Clone and build MNN with CUDA enabled
git clone https://github.com/alibaba/MNN.git
cd MNN
mkdir build && cd build

# Basic CUDA build
cmake .. \
    -DMNN_CUDA=ON \
    -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# CUDA + TensorRT build
cmake .. \
    -DMNN_CUDA=ON \
    -DMNN_TENSORRT=ON \
    -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# CUDA + LLM build with profiling
cmake .. \
    -DMNN_CUDA=ON \
    -DMNN_CUDA_PROFILE=ON \
    -DMNN_BUILD_LLM=ON \
    -DMNN_LOW_MEMORY=ON \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
    -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

Code Evidence

CMakeLists.txt (Lines 267-280): CUDA and TensorRT options

option(MNN_CUDA "Enable CUDA" OFF)
option(MNN_TENSORRT "Enable TensorRT" OFF)
...
option(MNN_CUDA_PROFILE "Enable CUDA profile" OFF)

if (NOT MNN_CUDA OR NOT CMAKE_SYSTEM_NAME MATCHES "^Linux")
  set(MNN_CUDA_PROFILE OFF)
endif()

This shows that MNN_CUDA and MNN_TENSORRT are both OFF by default and must be explicitly enabled. The MNN_CUDA_PROFILE option is forcibly set to OFF unless the build target is Linux and CUDA is enabled.

CMakeLists.txt (Lines 741-747): CUDA backend linkage

IF(MNN_CUDA)
  add_subdirectory(${CMAKE_CURRENT_LIST_DIR}/source/backend/cuda)
  list(APPEND MNN_TARGETS MNN_CUDA)
  ...
  list(APPEND MNN_EXTRA_DEPENDS ${MNN_CUDA_LIBS})

When CUDA is enabled, the CUDA backend sources are compiled and linked into MNN.

CMakeLists.txt (Line 352): CUDA + Transformer Fuse requires C++17

if((MNN_CUDA AND MNN_SUPPORT_TRANSFORMER_FUSE) OR (CMAKE_CXX_STANDARD EQUAL 17))

When both CUDA and the transformer fuse ops are enabled (or the build has already selected C++17), the build requires the C++17 standard.
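If the host toolchain does not already default to C++17, the standard can be requested explicitly at configure time with the standard CMake variable (combine with the other options above as needed):

```shell
cmake .. \
    -DMNN_CUDA=ON \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
    -DCMAKE_CXX_STANDARD=17 \
    -DCMAKE_BUILD_TYPE=Release
```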

MNNForwardType.h (Line 28): Forward type enum

/*NVIDIA GPU API*/
MNN_FORWARD_CUDA = 2,

The CUDA backend is identified by the forward type constant MNN_FORWARD_CUDA = 2.

Common Errors

  • "CUDA not found" or "No CUDA toolkits found". Cause: the CUDA Toolkit is not installed or not on the system PATH. Resolution: install the NVIDIA CUDA Toolkit, ensure nvcc is on PATH, and set CUDA_TOOLKIT_ROOT_DIR if CMake still cannot locate it.
  • "CUDA out of memory". Cause: GPU VRAM is insufficient for the model and its intermediate tensors. Resolution: reduce the batch size, use lower precision (Precision_Low), build with MNN_LOW_MEMORY=ON, or use a GPU with more VRAM.
  • "CUDA driver version is insufficient for CUDA runtime version". Cause: mismatch between the GPU driver and the CUDA Toolkit version. Resolution: update the NVIDIA GPU driver to a version compatible with the installed CUDA Toolkit.
  • MNN_CUDA_PROFILE has no effect. Cause: building on a non-Linux platform. Resolution: none; CUDA profiling is Linux-only, and the CMake script forcibly disables it elsewhere (CMakeLists.txt:278-280).
  • Linker errors with CUDA symbols on Windows. Cause: incorrect MSVC runtime configuration. Resolution: set -DMNN_WIN_RUNTIME_MT=ON and -DMNN_BUILD_SHARED_LIBS=ON when building on Windows.
  • TensorRT initialization failure. Cause: TensorRT libraries not found or a version mismatch. Resolution: ensure TensorRT is installed and its library directory is on LD_LIBRARY_PATH.
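For the driver/toolkit mismatch above, the maximum CUDA version reported by nvidia-smi (top right of its header) can be compared against the toolkit version from nvcc --version with a small helper; the two version values below are illustrative placeholders to replace with the actual readings:

```shell
# Returns success (0) when version $1 >= version $2, compared numerically
# component by component via sort -V.
ver_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

driver_cuda="12.4"   # placeholder: the CUDA version nvidia-smi reports
toolkit_cuda="12.2"  # placeholder: the CUDA version nvcc reports

if ver_ge "$driver_cuda" "$toolkit_cuda"; then
    echo "driver supports this toolkit"
else
    echo "driver too old for this toolkit"
fi
```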

Compatibility Notes

  • Linux: Full support including CUDA profiling via MNN_CUDA_PROFILE=ON. This is the primary and recommended platform for CUDA acceleration.
  • Windows: CUDA backend compiles but CUDA profiling is forcibly disabled. Requires MNN_WIN_RUNTIME_MT=ON for static MSVC runtime linking and MNN_BUILD_SHARED_LIBS=ON for proper DLL export/import.
  • macOS: Not supported. NVIDIA has discontinued CUDA support on macOS.
  • C++17 requirement: When both MNN_CUDA=ON and MNN_SUPPORT_TRANSFORMER_FUSE=ON are set, the build requires C++17 standard.
  • TensorRT: Optional add-on; can be enabled alongside CUDA for graph optimization. Requires separate TensorRT installation from NVIDIA.
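The Windows flags called out above can be combined into a single configure step (a sketch; adapt the generator and paths to your setup):

```shell
cmake .. -DMNN_CUDA=ON -DMNN_WIN_RUNTIME_MT=ON -DMNN_BUILD_SHARED_LIBS=ON -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release
```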
