Implementation:Alibaba MNN CMake Build LLM
| Field | Value |
|---|---|
| implementation_name | CMake_Build_LLM |
| implementation_type | API Doc |
| repository | Alibaba_MNN |
| workflow | LLM_Deployment_Pipeline |
| pipeline_stage | Engine Compilation |
| source_file | CMakeLists.txt (L73), build_lib.sh (L337-376) |
| last_updated | 2026-02-10 14:00 GMT |
Summary
This implementation documents the CMake-based build process for compiling the MNN inference engine with LLM support. The primary CMake option MNN_BUILD_LLM (defined at line 73 of CMakeLists.txt) enables the LLM library, and additional flags control hardware backends and transformer-specific optimizations. The build_lib.sh script provides reference configurations for Android cross-compilation.
API Signature
mkdir build && cd build
cmake .. -DMNN_BUILD_LLM=true [options]
make -j16
Source Reference
CMakeLists.txt (Lines 60-83)
option(MNN_SUPPORT_TRANSFORMER_FUSE "Enable MNN transformer Fuse Ops" OFF)
option(MNN_SEP_BUILD "Build MNN Backends and expression separately." ON)
option(MNN_BUILD_LLM "Build llm library based MNN." OFF)
option(MNN_BUILD_LLM_OMNI "If build llm library, build it with omni (support image / audio)" OFF)
option(MNN_BUILD_DIFFUSION "Build diffusion demo based MNN." OFF)
option(MNN_SUPPORT_BF16 "Enable MNN's bf16 op" OFF)
option(MNN_LOW_MEMORY "Build MNN support low memory for weight quant model." OFF)
option(MNN_CPU_WEIGHT_DEQUANT_GEMM "Build MNN CPU weight dequant related gemm kernels." OFF)
option(MNN_SME2 "Use Arm sme2 instructions" ON)
option(MNN_METAL_TENSOR "Use Metal4 tensor instructions" ON)
build_lib.sh (Lines 337-378, Android arm64-v8a reference)
cmake ../../../ \
-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-DCMAKE_BUILD_TYPE=Release \
-DANDROID_ABI="arm64-v8a" \
-DANDROID_STL=c++_static \
-DANDROID_NATIVE_API_LEVEL=android-21 \
-DANDROID_TOOLCHAIN=clang \
-DMNN_USE_LOGCAT=false \
-DMNN_BUILD_BENCHMARK=ON \
-DMNN_USE_SSE=OFF \
-DMNN_BUILD_TEST=ON \
-DMNN_ARM82=ON \
-DMNN_LOW_MEMORY=ON \
-DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
-DMNN_BUILD_LLM=ON \
-DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON \
-DMNN_BUILD_DIFFUSION=ON \
-DMNN_OPENCL=OFF \
-DMNN_SEP_BUILD=OFF \
-DLLM_SUPPORT_AUDIO=ON \
-DMNN_BUILD_AUDIO=ON \
-DLLM_SUPPORT_VISION=ON \
-DMNN_BUILD_OPENCV=ON \
-DMNN_IMGCODECS=ON \
-DMNN_BUILD_FOR_ANDROID_COMMAND=true
Key Parameters
| CMake Option | Default | Description |
|---|---|---|
| -DMNN_BUILD_LLM | OFF | Required. Enables the LLM inference library build (llm_demo, llm_bench, libllm) |
| -DMNN_BUILD_LLM_OMNI | OFF | Enables multimodal (image/audio) support for Omni models |
| -DMNN_LOW_MEMORY | OFF | Enables runtime weight dequantization for low-memory inference of quantized models |
| -DMNN_CPU_WEIGHT_DEQUANT_GEMM | OFF | Enables fused weight-dequantization GEMM kernels for improved performance |
| -DMNN_SUPPORT_TRANSFORMER_FUSE | OFF | Enables fused transformer operations (fused attention kernels) |
| -DMNN_ARM82 | OFF | Enables ARMv8.2 fp16 instructions on ARM targets |
| -DMNN_SME2 | ON | Enables ARM Scalable Matrix Extension 2 (SME2) instructions |
| -DMNN_OPENCL | OFF | Enables the OpenCL GPU backend (Android GPU acceleration) |
| -DMNN_METAL | OFF | Enables the Metal GPU backend (iOS/macOS GPU acceleration) |
| -DMNN_METAL_TENSOR | ON | Enables Metal4 tensor instructions on Apple GPUs |
| -DMNN_AVX512 | OFF | Enables AVX-512 SIMD instructions on x86 platforms |
| -DMNN_USE_SSE | ON | Enables SSE optimization on x86 (disable for ARM targets) |
| -DMNN_BUILD_SHARED_LIBS | ON | Builds shared libraries (.so/.dylib) instead of static archives (.a) |
| -DMNN_SEP_BUILD | ON | Builds backends and expression modules as separate libraries |
| -DMNN_FORBID_MULTI_THREAD | OFF | Disables multi-threading (required for WASM builds) |
| -DLLM_SUPPORT_AUDIO | OFF | Enables audio input support in the LLM library |
| -DLLM_SUPPORT_VISION | OFF | Enables vision input support in the LLM library |
| -DMNN_BUILD_AUDIO | OFF | Builds the MNN audio processing library |
| -DMNN_BUILD_OPENCV | OFF | Builds the MNN OpenCV-compatible API (needed for vision models) |
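After configuring, the effective values of these options can be read back from the CMake cache. A quick check, assuming the build directory is named build:

```shell
# Read the configured option values back from the CMake cache.
# -N: view mode only (do not re-run configure); -L: list cached variables.
cmake -N -L build | grep -E 'MNN_(BUILD_LLM|LOW_MEMORY|SUPPORT_TRANSFORMER_FUSE)'
```

This is useful for confirming that a flag passed on the command line was not overridden by a cached value from an earlier configure.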
Inputs
- MNN source tree (the cloned repository)
- CMake 3.6 or later
- C/C++ compiler (Clang or GCC; MSVC on Windows)
- Platform-specific toolchain (Android NDK for Android, Xcode for iOS)
Outputs
| Artifact | Description |
|---|---|
| llm_demo | Interactive LLM inference CLI tool |
| llm_bench | LLM benchmarking tool for performance measurement |
| libllm.so / libllm.a | LLM inference library for integration into applications |
| libMNN.so / libMNN.a | Core MNN inference engine library |
| libMNN_Express.so | MNN expression API library (built when MNN_SEP_BUILD=ON) |
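A quick sanity check that the expected executables landed in the build tree (artifact names from the table above; library extensions differ per platform):

```shell
# Check for the demo and benchmark executables after make completes.
for a in llm_demo llm_bench; do
  if [ -x "build/$a" ]; then echo "found $a"; else echo "missing $a"; fi
done
```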
Usage Examples
Linux/macOS Desktop Build
mkdir build && cd build
cmake .. \
-DMNN_BUILD_LLM=true \
-DMNN_LOW_MEMORY=ON \
-DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
-DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON \
-DMNN_AVX512=true \
-DCMAKE_BUILD_TYPE=Release
make -j16
Android arm64-v8a Cross-Compilation
mkdir build_android && cd build_android
cmake .. \
-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-DCMAKE_BUILD_TYPE=Release \
-DANDROID_ABI="arm64-v8a" \
-DANDROID_STL=c++_static \
-DANDROID_NATIVE_API_LEVEL=android-21 \
-DANDROID_TOOLCHAIN=clang \
-DMNN_BUILD_LLM=ON \
-DMNN_ARM82=ON \
-DMNN_LOW_MEMORY=ON \
-DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
-DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON \
-DMNN_SEP_BUILD=OFF \
-DMNN_USE_SSE=OFF
make -j16
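To run the cross-compiled binaries, they can be pushed to a device with adb. The staging step below is a hypothetical helper; the artifact list and the on-device model path are illustrative, not part of the build scripts:

```shell
# Stage the Android artifacts, then push and run them on a device.
mkdir -p dist
for f in llm_demo llm_bench libMNN.so; do
  if [ -f "$f" ]; then cp "$f" dist/; fi
done
# adb push dist/. /data/local/tmp/mnn
# adb shell "cd /data/local/tmp/mnn && LD_LIBRARY_PATH=. ./llm_demo config.json"
```

/data/local/tmp is used because it is writable and allows executing binaries without root on most devices.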
iOS Build with Metal GPU Support
mkdir build_ios && cd build_ios
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_SYSTEM_NAME=iOS \
-DMNN_BUILD_LLM=ON \
-DMNN_METAL=ON \
-DMNN_LOW_MEMORY=ON \
-DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
-DMNN_AAPL_FMWK=ON
make -j16
Minimal CPU-Only Build
mkdir build && cd build
cmake .. \
-DMNN_BUILD_LLM=ON \
-DMNN_LOW_MEMORY=ON \
-DCMAKE_BUILD_TYPE=Release
make -j16
Notes
- The MNN project version is extracted from include/MNN/MNNDefine.h at CMake configure time.
- The C++ standard is set to C++11 by default unless CMAKE_CXX_STANDARD is explicitly set to 17.
- For the OpenCL backend, the first run performs kernel tuning and generates a cache file; take performance measurements on the second run.
- For the Android reference build in build_lib.sh, both armeabi-v7a (32-bit, without ARM82) and arm64-v8a (64-bit, with ARM82) configurations are provided.
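The C++ standard override mentioned in the notes is passed at configure time; a configure-only fragment:

```shell
# Opt in to C++17 instead of the default C++11.
cmake .. -DMNN_BUILD_LLM=ON -DCMAKE_CXX_STANDARD=17 -DCMAKE_BUILD_TYPE=Release
```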
Related Pages
- Principle:Alibaba_MNN_LLM_Engine_Compilation
- Environment:Alibaba_MNN_CPU_Build_Environment
- Environment:Alibaba_MNN_GPU_CUDA_Environment
- Implementation:Alibaba_MNN_Llmexport_Script - Previous step: exporting the model
- Implementation:Alibaba_MNN_LLM_Config_JSON - Next step: configuring runtime parameters