Workflow: Alibaba MNN Model Conversion Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Model_Deployment, Model_Conversion |
| Last Updated | 2026-02-10 08:00 GMT |
Overview
End-to-end process for converting deep learning models from external frameworks (ONNX, TensorFlow, Caffe, TorchScript, TFLite) into the MNN format for efficient on-device inference.
Description
This workflow covers the standard procedure for transforming models trained in popular frameworks into MNN's native format (.mnn). The MNN model format uses FlatBuffers serialization for fast, zero-copy loading. The conversion pipeline handles graph optimization, operator mapping, optional weight quantization, and correctness verification. Supported inputs are TensorFlow frozen graphs (.pb), TensorFlow Lite (.tflite), Caffe (.caffemodel plus .prototxt), ONNX (.onnx), and TorchScript (.pt) models.
Key outputs:
- A .mnn model file ready for on-device inference
- Optionally, a separated .mnn.weight file for reduced peak memory
- Optional weight quantization (2-8 bit) for model size reduction
Usage
Execute this workflow when you have a trained deep learning model in TensorFlow, PyTorch/ONNX, Caffe, or TFLite format and need to deploy it on mobile devices (iOS/Android), embedded systems, or edge hardware using the MNN inference engine. This is the prerequisite step before any MNN inference workflow.
Execution Steps
Step 1: Prepare the source model
Ensure the source model is in a supported format. For PyTorch models, export to either ONNX (using torch.onnx.export) or TorchScript (using torch.jit.trace or torch.jit.script). For TensorFlow models, ensure the model is a frozen graph (.pb), not a SavedModel directory. For models with dynamic input shapes, specify dynamic_axes during ONNX export.
Key considerations:
- PyTorch models must be exported via torch.jit or torch.onnx, not raw .pth weight files
- TorchScript export requires the model to be in eval mode
- If input dimensions vary at runtime, mark them as dynamic during ONNX export
Step 2: Install or compile MNNConvert
Obtain the MNNConvert tool either by installing the PyMNN Python package (pip install MNN, which provides the mnnconvert CLI) or by compiling from source with the CMake option -DMNN_BUILD_CONVERTER=ON. The Python package is recommended for quick experimentation; the compiled binary is preferred for production.
Key considerations:
- Python route: pip install MNN provides mnnconvert command
- Source route: cmake .. -DMNN_BUILD_CONVERTER=ON && make -j8 produces MNNConvert binary
- Ensure the tool version matches the target MNN runtime version
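When wiring conversion into a build script, it can help to detect which route is installed. A small sketch; the two binary names checked here follow the pip and source routes above:

```python
import shutil
from typing import Optional

def find_converter() -> Optional[str]:
    """Return the path of an available converter CLI, or None.

    Checks the pip-installed `mnnconvert` first, then a
    source-built `MNNConvert` on PATH.
    """
    for name in ("mnnconvert", "MNNConvert"):
        path = shutil.which(name)
        if path:
            return path
    return None
```

If neither is found, fall back to installing the PyMNN package or compiling with -DMNN_BUILD_CONVERTER=ON as described above.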
Step 3: Execute model conversion
Run MNNConvert, specifying the source framework (-f), the input model file (--modelFile), and the output MNN model path (--MNNModel). Some frameworks need extra flags: Caffe, for example, also requires --prototxt for the network definition. Optionally set the graph optimization level (--optimizeLevel), enable weight quantization (--weightQuantBits), FP16 weight storage (--fp16), or external weight separation (--saveExternalData).
What happens:
- The converter parses the source model's graph and operators
- Operators are mapped to MNN's internal representation using FlatBuffers schema
- Graph optimization passes simplify the computation graph (fusing BatchNorm, removing Identity nodes, etc.)
- If weight quantization is enabled, float32 weights are compressed to the specified bit width
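The flag set above can be assembled programmatically; a sketch that builds the argument list for a conversion (flag names are taken from this step, the "CAFFE" framework string and defaults are illustrative assumptions):

```python
from typing import List, Optional

def build_convert_cmd(
    src: str,
    dst: str,
    framework: str = "ONNX",
    prototxt: Optional[str] = None,        # Caffe network definition
    weight_quant_bits: Optional[int] = None,  # 2-8 bit weight quantization
    fp16: bool = False,
) -> List[str]:
    """Assemble an MNNConvert command line from the options above."""
    cmd = ["mnnconvert", "-f", framework,
           "--modelFile", src, "--MNNModel", dst]
    if framework == "CAFFE" and prototxt:
        cmd += ["--prototxt", prototxt]
    if weight_quant_bits is not None:
        cmd += ["--weightQuantBits", str(weight_quant_bits)]
    if fp16:
        cmd += ["--fp16"]
    return cmd

# Example: ONNX conversion with 8-bit weight quantization.
print(" ".join(build_convert_cmd("net.onnx", "net.mnn", weight_quant_bits=8)))
```

The resulting list can be passed directly to subprocess.run for scripted batch conversion.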
Step 4: Verify conversion correctness
Run the provided verification scripts (testMNNFromOnnx.py, testMNNFromTf.py, testMNNFromTflite.py, or testMNNFromTorch.py) to compare MNN inference results against the original framework's outputs. The scripts generate random inputs, run inference in both the original framework and MNN, and compare outputs within a configurable threshold (default 0.01).
Key considerations:
- Requires the original framework's runtime installed (e.g., onnxruntime for ONNX)
- If TEST_SUCCESS is reported, the conversion is correct
- For errors, use DEBUG mode, which binary-searches the graph to locate the first problematic layer
- Identity op removal during optimization may require specifying an alternative output layer name
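At its core, the check these scripts perform is an element-wise comparison within a tolerance. A simplified sketch of that comparison (the 0.01 default is taken from this step; the function name is illustrative):

```python
def outputs_match(reference, candidate, threshold=0.01):
    """Return True if every element differs by at most `threshold`.

    `reference` holds the original framework's outputs and `candidate`
    the MNN outputs, both flattened to plain floats.
    """
    if len(reference) != len(candidate):
        return False
    return all(abs(r - c) <= threshold
               for r, c in zip(reference, candidate))

status = "TEST_SUCCESS" if outputs_match([0.12, 0.88], [0.121, 0.879]) else "TEST_FAILED"
print(status)  # prints "TEST_SUCCESS"
```

A failing comparison usually means a mis-mapped operator or a layout mismatch rather than numerical noise, which is where the DEBUG binary search helps.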
Step 5: Inspect and validate the MNN model
Use MNNConvert with the --info flag to print model metadata (input/output names, shapes, data types, version). Optionally export to JSON (--JsonFile) for human-readable inspection of the full model structure, including operator parameters and weight statistics.
Key considerations:
- Verify input/output names and shapes match expectations
- JSON export allows manual editing and re-conversion back to MNN format
- Check dimensionFormat (NCHW vs NHWC) matches your intended input pipeline
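The JSON export lends itself to scripted inspection; a sketch that counts operator types in an exported model. The key names used here ("oplists", "type") are assumptions about the exported schema and may differ across MNN versions:

```python
import json
from collections import Counter

def summarize_ops(json_path: str) -> Counter:
    """Count operator types in an MNN model exported via --JsonFile.

    Assumes the export contains an "oplists" array whose entries carry
    a "type" field; adjust the keys to match your MNN version.
    """
    with open(json_path) as f:
        net = json.load(f)
    return Counter(op.get("type", "?") for op in net.get("oplists", []))
```

A quick operator histogram makes it easy to confirm that optimization passes ran as expected, e.g. that no standalone BatchNorm or Identity ops remain after fusion.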