Implementation:Alibaba MNN CMake Build LLM

From Leeroopedia


Field Value
implementation_name CMake_Build_LLM
implementation_type API Doc
repository Alibaba_MNN
workflow LLM_Deployment_Pipeline
pipeline_stage Engine Compilation
source_file CMakeLists.txt (L73), build_lib.sh (L337-376)
last_updated 2026-02-10 14:00 GMT

Summary

This implementation documents the CMake-based build process for compiling the MNN inference engine with LLM support. The primary CMake option MNN_BUILD_LLM (defined at line 73 of CMakeLists.txt) enables the LLM library, and additional flags control hardware backends and transformer-specific optimizations. The build_lib.sh script provides reference configurations for Android cross-compilation.

API Signature

mkdir build && cd build
cmake .. -DMNN_BUILD_LLM=true [options]
make -j16
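The configure flags can also be assembled programmatically. The helper below is a hypothetical sketch (not part of the MNN repository) that echoes the cmake command it would run, so platform-specific options can be appended and inspected before executing:

```shell
# Hypothetical helper (not in the MNN repo): composes a baseline LLM
# configure command and appends any caller-supplied flags. It echoes the
# command instead of executing it, so the composition is easy to inspect.
build_llm_configure() {
    base="-DMNN_BUILD_LLM=true -DMNN_LOW_MEMORY=ON -DCMAKE_BUILD_TYPE=Release"
    echo cmake .. $base "$@"
}

build_llm_configure -DMNN_SUPPORT_TRANSFORMER_FUSE=ON
```

Dropping the `echo` (or piping the printed command through `sh`) would run the actual configure step.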

Source Reference

CMakeLists.txt (Lines 60-83)

option(MNN_SUPPORT_TRANSFORMER_FUSE "Enable MNN transformer Fuse Ops" OFF)
option(MNN_SEP_BUILD "Build MNN Backends and expression separately." ON)
option(MNN_BUILD_LLM "Build llm library based MNN." OFF)
option(MNN_BUILD_LLM_OMNI "If build llm library, build it with omni (support image / audio)" OFF)
option(MNN_BUILD_DIFFUSION "Build diffusion demo based MNN." OFF)
option(MNN_SUPPORT_BF16 "Enable MNN's bf16 op" OFF)
option(MNN_LOW_MEMORY "Build MNN support low memory for weight quant model." OFF)
option(MNN_CPU_WEIGHT_DEQUANT_GEMM "Build MNN CPU weight dequant related gemm kernels." OFF)
option(MNN_SME2 "Use Arm sme2 instructions" ON)
option(MNN_METAL_TENSOR "Use Metal4 tensor instructions" ON)

build_lib.sh (Lines 337-378, Android arm64-v8a reference)

cmake ../../../ \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DANDROID_ABI="arm64-v8a" \
    -DANDROID_STL=c++_static \
    -DANDROID_NATIVE_API_LEVEL=android-21 \
    -DANDROID_TOOLCHAIN=clang \
    -DMNN_USE_LOGCAT=false \
    -DMNN_BUILD_BENCHMARK=ON \
    -DMNN_USE_SSE=OFF \
    -DMNN_BUILD_TEST=ON \
    -DMNN_ARM82=ON \
    -DMNN_LOW_MEMORY=ON \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
    -DMNN_BUILD_LLM=ON \
    -DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON \
    -DMNN_BUILD_DIFFUSION=ON \
    -DMNN_OPENCL=OFF \
    -DMNN_SEP_BUILD=OFF \
    -DLLM_SUPPORT_AUDIO=ON \
    -DMNN_BUILD_AUDIO=ON \
    -DLLM_SUPPORT_VISION=ON \
    -DMNN_BUILD_OPENCV=ON \
    -DMNN_IMGCODECS=ON \
    -DMNN_BUILD_FOR_ANDROID_COMMAND=true

Key Parameters

-DMNN_BUILD_LLM (default OFF): Required. Enables the LLM inference library build (llm_demo, llm_bench, libllm)
-DMNN_BUILD_LLM_OMNI (default OFF): Enables multimodal (image/audio) support for Omni models
-DMNN_LOW_MEMORY (default OFF): Enables runtime weight dequantization for low-memory inference of quantized models
-DMNN_CPU_WEIGHT_DEQUANT_GEMM (default OFF): Enables fused weight-dequantization GEMM kernels for improved performance
-DMNN_SUPPORT_TRANSFORMER_FUSE (default OFF): Enables fused transformer operations (fused attention kernels)
-DMNN_ARM82 (default OFF): Enables ARMv8.2 fp16 instructions on ARM targets
-DMNN_SME2 (default ON): Enables ARM Scalable Matrix Extension 2 (SME2) instructions
-DMNN_OPENCL (default OFF): Enables the OpenCL GPU backend (Android GPU acceleration)
-DMNN_METAL (default OFF): Enables the Metal GPU backend (iOS/macOS GPU acceleration)
-DMNN_METAL_TENSOR (default ON): Enables Metal4 tensor instructions on Apple GPUs
-DMNN_AVX512 (default OFF): Enables AVX-512 SIMD instructions on x86 platforms
-DMNN_USE_SSE (default ON): Enables SSE optimizations on x86 (disable for ARM targets)
-DMNN_BUILD_SHARED_LIBS (default ON): Builds shared libraries (.so/.dylib) instead of static (.a)
-DMNN_SEP_BUILD (default ON): Builds backends and expression modules as separate libraries
-DMNN_FORBID_MULTI_THREAD (default OFF): Disables multi-threading (required for WASM builds)
-DLLM_SUPPORT_AUDIO (default OFF): Enables audio input support in the LLM library
-DLLM_SUPPORT_VISION (default OFF): Enables vision input support in the LLM library
-DMNN_BUILD_AUDIO (default OFF): Builds the MNN audio processing library
-DMNN_BUILD_OPENCV (default OFF): Builds the MNN OpenCV-compatible API (needed for vision models)
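After configuring, the effective value of each option is recorded in the build directory's CMakeCache.txt. The sketch below fabricates a small cache fragment to show the lookup; in a real tree you would grep build/CMakeCache.txt instead:

```shell
# Inspecting option values from a CMake cache. A fabricated fragment
# stands in for a real build/CMakeCache.txt here.
cache=$(mktemp)
cat > "$cache" <<'EOF'
MNN_BUILD_LLM:BOOL=ON
MNN_LOW_MEMORY:BOOL=ON
MNN_SUPPORT_TRANSFORMER_FUSE:BOOL=OFF
EOF

# Real usage: grep -E '^MNN_(BUILD_LLM|LOW_MEMORY)' build/CMakeCache.txt
grep -E '^MNN_(BUILD_LLM|LOW_MEMORY|SUPPORT_TRANSFORMER_FUSE):' "$cache"
```

This is a quick way to confirm that a flag passed on the command line actually took effect, since a typoed `-D` option is silently stored as an unused cache variable rather than rejected.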

Inputs

  • MNN source tree (the cloned repository)
  • CMake 3.6 or later
  • C/C++ compiler (Clang or GCC; MSVC on Windows)
  • Platform-specific toolchain (Android NDK for Android, Xcode for iOS)
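The CMake 3.6 minimum can be checked up front. The `version_ge` helper below is an illustrative sketch (it assumes a `sort` with version-sort support, i.e. `sort -V`):

```shell
# version_ge A B: succeeds when dotted version A >= B (relies on sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

required="3.6"
have=$(cmake --version 2>/dev/null | awk 'NR==1 {print $3}')
if [ -n "$have" ] && version_ge "$have" "$required"; then
    echo "cmake $have satisfies >= $required"
else
    echo "cmake >= $required required (found: ${have:-none})" >&2
fi
```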

Outputs

llm_demo: Interactive LLM inference CLI tool
llm_bench: LLM benchmarking tool for performance measurement
libllm.so / libllm.a: LLM inference library for integration into applications
libMNN.so / libMNN.a: Core MNN inference engine library
libMNN_Express.so: MNN expression API library (produced when MNN_SEP_BUILD=ON)
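A post-build sanity check can confirm these artifacts landed. The loop below is a sketch that assumes a shared-library Linux build; stand-in files take the place of a real build directory:

```shell
# Post-build artifact check (sketch). Stand-in files simulate a finished
# build; in practice, set builddir to your real build directory and drop
# the touch line.
builddir=$(mktemp -d)
touch "$builddir/llm_demo" "$builddir/libMNN.so"

status=0
for artifact in llm_demo libMNN.so; do
    if [ -e "$builddir/$artifact" ]; then
        echo "found: $artifact"
    else
        echo "missing: $artifact" >&2
        status=1
    fi
done
[ "$status" -eq 0 ] && echo "all expected artifacts present"
```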

Usage Examples

Linux/macOS Desktop Build

mkdir build && cd build
cmake .. \
    -DMNN_BUILD_LLM=true \
    -DMNN_LOW_MEMORY=ON \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
    -DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON \
    -DMNN_AVX512=true \
    -DCMAKE_BUILD_TYPE=Release
make -j16

Android arm64-v8a Cross-Compilation

mkdir build_android && cd build_android
cmake .. \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DCMAKE_BUILD_TYPE=Release \
    -DANDROID_ABI="arm64-v8a" \
    -DANDROID_STL=c++_static \
    -DANDROID_NATIVE_API_LEVEL=android-21 \
    -DANDROID_TOOLCHAIN=clang \
    -DMNN_BUILD_LLM=ON \
    -DMNN_ARM82=ON \
    -DMNN_LOW_MEMORY=ON \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
    -DMNN_CPU_WEIGHT_DEQUANT_GEMM=ON \
    -DMNN_SEP_BUILD=OFF \
    -DMNN_USE_SSE=OFF
make -j16
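The toolchain file path above depends on $ANDROID_NDK. A small guard (a hypothetical helper, not part of build_lib.sh) fails fast with a clear message when the variable is unset:

```shell
# Guard for the Android cross-compile configure step: verify that
# ANDROID_NDK is exported before composing the toolchain file path.
check_ndk() {
    if [ -z "${ANDROID_NDK:-}" ]; then
        echo "error: ANDROID_NDK is not set; export it to your NDK root" >&2
        return 1
    fi
    echo "using NDK at $ANDROID_NDK"
}

# Example: run the guard before cmake, aborting the script on failure:
# check_ndk || exit 1
```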

iOS Build with Metal GPU Support

mkdir build_ios && cd build_ios
cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_SYSTEM_NAME=iOS \
    -DMNN_BUILD_LLM=ON \
    -DMNN_METAL=ON \
    -DMNN_LOW_MEMORY=ON \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
    -DMNN_AAPL_FMWK=ON
make -j16

Minimal CPU-Only Build

mkdir build && cd build
cmake .. \
    -DMNN_BUILD_LLM=ON \
    -DMNN_LOW_MEMORY=ON \
    -DCMAKE_BUILD_TYPE=Release
make -j16

Notes

  • The MNN project version is extracted from include/MNN/MNNDefine.h at CMake configure time.
  • The C++ standard defaults to C++11 unless CMAKE_CXX_STANDARD is explicitly set to 17.
  • For the OpenCL backend, the first run performs kernel tuning and generates a cache file; take performance measurements from the second run onward.
  • For the Android reference build in build_lib.sh, both armeabi-v7a (32-bit, without ARM82) and arm64-v8a (64-bit, with ARM82) configurations are provided.
