Implementation:Alibaba MNN CMake Build LLM
| Field | Value |
|---|---|
| implementation_name | CMake_Build_LLM |
| implementation_type | API Doc |
| repository | Alibaba_MNN |
| workflow | LLM_Deployment_Pipeline |
| pipeline_stage | Engine Compilation |
| source_file | CMakeLists.txt (L73), build_lib.sh (L337-376) |
| last_updated | 2026-02-10 14:00 GMT |
Summary
This implementation documents the CMake-based build process for compiling the MNN inference engine with LLM support. The primary CMake option MNN_BUILD_LLM (defined at line 73 of CMakeLists.txt) enables the LLM library, and additional flags control hardware backends and transformer-specific optimizations. The build_lib.sh script provides reference configurations for Android cross-compilation.
API Signature
mkdir build && cd build
cmake .. -DMNN_BUILD_LLM=true [options]
make -j16
Source Reference
CMakeLists.txt (Lines 60-83)
option(MNN_SUPPORT_TRANSFORMER_FUSE "Enable MNN transformer Fuse Ops" OFF)
option(MNN_SEP_BUILD "Build MNN Backends and expression separately." ON)
option(MNN_BUILD_LLM "Build llm library based MNN." OFF)
option(MNN_BUILD_LLM_OMNI "If build llm library, build it with omni (support image / audio)" OFF)
option(MNN_BUILD_DIFFUSION "Build diffusion demo based MNN." OFF)
option(MNN_SUPPORT_BF16 "Enable MNN's bf16 op" OFF)
option(MNN_LOW_MEMORY "Build MNN support low memory for weight quant model." OFF)
option(MNN_CPU_WEIGHT_DEQUANT_GEMM "Build MNN CPU weight dequant related gemm kernels." OFF)
option(MNN_SME2 "Use Arm sme2 instructions" ON)
option(MNN_METAL_TENSOR "Use Metal4 tensor instructions" ON)
build_lib.sh (Lines 337-378, Android arm64-v8a reference)
cmake ../../../ \
-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-DCMAKE_BUILD_TYPE=Release \
-DANDROID_ABI="arm64-v8a" \
-DANDROID_STL=c++_static \
-DANDROID_NATIVE_API_LEVEL=android-21 \
-DANDROID_TOOLCHAIN=clang \
-DMNN_USE_LOGCAT=false \
-DMNN_BUILD_BENCHMARK=ON \
-DMNN_USE_SSE=OFF \
-DMNN_BUILD_TEST=ON \
-DMNN_ARM82=ON \
-DMNN_LOW_MEMORY=ON \
-DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
-DMNN_BUILD_LLM=ON \
-DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON \
-DMNN_BUILD_DIFFUSION=ON \
-DMNN_OPENCL=OFF \
-DMNN_SEP_BUILD=OFF \
-DLLM_SUPPORT_AUDIO=ON \
-DMNN_BUILD_AUDIO=ON \
-DLLM_SUPPORT_VISION=ON \
-DMNN_BUILD_OPENCV=ON \
-DMNN_IMGCODECS=ON \
-DMNN_BUILD_FOR_ANDROID_COMMAND=true
Key Parameters
| CMake Option | Default | Description |
|---|---|---|
| -DMNN_BUILD_LLM | OFF | Required. Enables the LLM inference library build (llm_demo, llm_bench, libllm) |
| -DMNN_BUILD_LLM_OMNI | OFF | Enables multimodal (image/audio) support for Omni models |
| -DMNN_LOW_MEMORY | OFF | Enables runtime weight dequantization for low-memory inference of quantized models |
| -DMNN_CPU_WEIGHT_DEQUANT_GEMM | OFF | Enables fused weight-dequantization GEMM kernels for improved performance |
| -DMNN_SUPPORT_TRANSFORMER_FUSE | OFF | Enables fused transformer operations (fused attention kernels) |
| -DMNN_ARM82 | OFF | Enables ARMv8.2 fp16 instructions on ARM targets |
| -DMNN_SME2 | ON | Enables ARM Scalable Matrix Extension 2 (SME2) instructions |
| -DMNN_OPENCL | OFF | Enables the OpenCL GPU backend (Android GPU acceleration) |
| -DMNN_METAL | OFF | Enables the Metal GPU backend (iOS/macOS GPU acceleration) |
| -DMNN_METAL_TENSOR | ON | Enables Metal4 tensor instructions on Apple GPUs |
| -DMNN_AVX512 | OFF | Enables AVX-512 SIMD instructions on x86 platforms |
| -DMNN_USE_SSE | ON | Enables SSE optimization on x86 (disable for ARM targets) |
| -DMNN_BUILD_SHARED_LIBS | ON | Builds shared libraries (.so/.dylib) instead of static archives (.a) |
| -DMNN_SEP_BUILD | ON | Builds backends and expression modules as separate libraries |
| -DMNN_FORBID_MULTI_THREAD | OFF | Disables multi-threading (required for WASM builds) |
| -DLLM_SUPPORT_AUDIO | OFF | Enables audio input support in the LLM library |
| -DLLM_SUPPORT_VISION | OFF | Enables vision input support in the LLM library |
| -DMNN_BUILD_AUDIO | OFF | Builds the MNN audio processing library |
| -DMNN_BUILD_OPENCV | OFF | Builds the MNN OpenCV-compatible API (needed for vision models) |
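After configuring, the effective values of these options can be read back from the CMake cache. A quick check, assuming the build directory is named build:

```shell
# Read the configured option values back from the CMake cache.
# -N: view mode only (do not re-run configure); -L: list cached variables.
cmake -N -L build | grep -E 'MNN_(BUILD_LLM|LOW_MEMORY|SUPPORT_TRANSFORMER_FUSE)'
```

This is useful for confirming that a flag passed on the command line was not overridden by a cached value from an earlier configure.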
Inputs
- MNN source tree (the cloned repository)
- CMake 3.6 or later
- C/C++ compiler (Clang or GCC; MSVC on Windows)
- Platform-specific toolchain (Android NDK for Android, Xcode for iOS)
Outputs
| Artifact | Description |
|---|---|
| llm_demo | Interactive LLM inference CLI tool |
| llm_bench | LLM benchmarking tool for performance measurement |
| libllm.so / libllm.a | LLM inference library for integration into applications |
| libMNN.so / libMNN.a | Core MNN inference engine library |
| libMNN_Express.so | MNN expression API library (built when MNN_SEP_BUILD=ON) |
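A quick sanity check that the expected executables landed in the build tree (artifact names from the table above; library extensions differ per platform):

```shell
# Check for the demo and benchmark executables after make completes.
for a in llm_demo llm_bench; do
  if [ -x "build/$a" ]; then echo "found $a"; else echo "missing $a"; fi
done
```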
Usage Examples
Linux/macOS Desktop Build
mkdir build && cd build
cmake .. \
-DMNN_BUILD_LLM=true \
-DMNN_LOW_MEMORY=ON \
-DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
-DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON \
-DMNN_AVX512=true \
-DCMAKE_BUILD_TYPE=Release
make -j16
Android arm64-v8a Cross-Compilation
mkdir build_android && cd build_android
cmake .. \
-DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
-DCMAKE_BUILD_TYPE=Release \
-DANDROID_ABI="arm64-v8a" \
-DANDROID_STL=c++_static \
-DANDROID_NATIVE_API_LEVEL=android-21 \
-DANDROID_TOOLCHAIN=clang \
-DMNN_BUILD_LLM=ON \
-DMNN_ARM82=ON \
-DMNN_LOW_MEMORY=ON \
-DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
-DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON \
-DMNN_SEP_BUILD=OFF \
-DMNN_USE_SSE=OFF
make -j16
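To run the cross-compiled binaries, they can be pushed to a device with adb. The staging step below is a hypothetical helper; the artifact list and the on-device model path are illustrative, not part of the build scripts:

```shell
# Stage the Android artifacts, then push and run them on a device.
mkdir -p dist
for f in llm_demo llm_bench libMNN.so; do
  if [ -f "$f" ]; then cp "$f" dist/; fi
done
# adb push dist/. /data/local/tmp/mnn
# adb shell "cd /data/local/tmp/mnn && LD_LIBRARY_PATH=. ./llm_demo config.json"
```

/data/local/tmp is used because it is writable and allows executing binaries without root on most devices.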
iOS Build with Metal GPU Support
mkdir build_ios && cd build_ios
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_SYSTEM_NAME=iOS \
-DMNN_BUILD_LLM=ON \
-DMNN_METAL=ON \
-DMNN_LOW_MEMORY=ON \
-DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
-DMNN_AAPL_FMWK=ON
make -j16
Minimal CPU-Only Build
mkdir build && cd build
cmake .. \
-DMNN_BUILD_LLM=ON \
-DMNN_LOW_MEMORY=ON \
-DCMAKE_BUILD_TYPE=Release
make -j16
Notes
- The MNN project version is extracted from include/MNN/MNNDefine.h at CMake configure time.
- The C++ standard is set to C++11 by default unless CMAKE_CXX_STANDARD is explicitly set to 17.
- For the OpenCL backend, the first run performs kernel tuning and generates a cache file; take performance measurements on the second run.
- For the Android reference build in build_lib.sh, both armeabi-v7a (32-bit, without ARM82) and arm64-v8a (64-bit, with ARM82) configurations are provided.
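The C++ standard override mentioned in the notes is passed at configure time; a configure-only fragment:

```shell
# Opt in to C++17 instead of the default C++11.
cmake .. -DMNN_BUILD_LLM=ON -DCMAKE_CXX_STANDARD=17 -DCMAKE_BUILD_TYPE=Release
```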
Related Pages
- Principle:Alibaba_MNN_LLM_Engine_Compilation
- Environment:Alibaba_MNN_CPU_Build_Environment
- Environment:Alibaba_MNN_GPU_CUDA_Environment
- Implementation:Alibaba_MNN_Llmexport_Script - Previous step: exporting the model
- Implementation:Alibaba_MNN_LLM_Config_JSON - Next step: configuring runtime parameters