Principle: Alibaba MNN Diffusion MNN Conversion
| Field | Value |
|---|---|
| principle_name | Diffusion_MNN_Conversion |
| schema_version | 0.3.0 |
| principle_type | Workflow Step |
| domain | Stable Diffusion Deployment |
| stage | Model Conversion |
| scope | Batch conversion of Stable Diffusion ONNX components to MNN format with optimization flags |
| last_updated | 2026-02-10 14:00 GMT |
Overview
Diffusion MNN Conversion is the third step in the Stable Diffusion deployment workflow. After the pipeline components have been exported to ONNX, each component must be converted from ONNX to the MNN (Mobile Neural Network) format. MNN is Alibaba's open-source inference framework and model format, optimized for mobile and edge devices, with support for quantization, operator fusion, and hardware-specific optimizations.
Theory
The conversion from ONNX to MNN is performed by the MNNConvert tool (either as a compiled binary or via the pymnn Python package). The conversion involves several key transformations:
- Graph optimization: MNNConvert applies graph-level optimizations such as constant folding, dead code elimination, and operator merging to produce a more efficient execution graph.
- External data storage: The `--saveExternalData=1` flag is used for all components. It separates the model graph structure from the weight data, enabling memory-mapped loading for faster model initialization and reduced peak memory.
- Weight quantization (optional): The `--weightQuantBits=8` flag quantizes model weights from float32 to 8-bit integers, reducing model size by approximately 4x with minimal quality degradation. This is particularly important for mobile deployment, where storage and memory are limited.
- Transformer fusion (optional): The `--transformerFuse` flag enables fusion of multi-head attention patterns (Q/K/V projection, scaled dot-product attention, output projection) into a single optimized kernel. This is critical for UNet performance, since the UNet contains numerous self-attention and cross-attention layers.
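Taken together, these flags form one MNNConvert invocation per component. A minimal sketch of assembling such a command as an argument list (the helper name `build_convert_cmd` is an assumption for illustration; the flag names are the ones described above):

```python
def build_convert_cmd(converter, onnx_path, mnn_path, quantize=True, fuse=True):
    """Assemble an ONNX-to-MNN conversion command as an argv list."""
    cmd = [
        converter,
        "-f", "ONNX",                 # source framework
        "--modelFile", onnx_path,     # input ONNX graph
        "--MNNModel", mnn_path,       # output MNN model
        "--saveExternalData=1",       # split weights from graph for mmap loading
    ]
    if quantize:
        cmd.append("--weightQuantBits=8")  # float32 -> int8 weights (~4x smaller)
    if fuse:
        cmd.append("--transformerFuse")    # fuse attention patterns into one kernel
    return cmd

print(" ".join(build_convert_cmd("MNNConvert", "unet/model.onnx", "unet.mnn")))
```

Building an argument list (rather than a shell string) avoids quoting issues when the command is later passed to `subprocess.run`.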
Converted Components
The conversion script processes three of the four pipeline components:
- text_encoder -- Converts `text_encoder/model.onnx` to `text_encoder.mnn`
- unet -- Converts `unet/model.onnx` (with external `weights.pb`) to `unet.mnn`
- vae_decoder -- Converts `vae_decoder/model.onnx` to `vae_decoder.mnn`
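The batch step can be sketched as a loop over this component table, producing one converter command per entry (the component mapping mirrors the list above; the function name `plan_conversions` and the command layout are assumptions, not the actual script):

```python
# Mapping of component name -> (input ONNX path, output MNN path),
# as described in the list above.
COMPONENTS = {
    "text_encoder": ("text_encoder/model.onnx", "text_encoder.mnn"),
    "unet": ("unet/model.onnx", "unet.mnn"),  # external weights.pb sits next to the ONNX file
    "vae_decoder": ("vae_decoder/model.onnx", "vae_decoder.mnn"),
}

def plan_conversions(converter, extra_flags=()):
    """Build one ONNX-to-MNN conversion command per pipeline component."""
    cmds = []
    for name, (onnx_path, mnn_path) in COMPONENTS.items():
        cmds.append([
            converter, "-f", "ONNX",
            "--modelFile", onnx_path,
            "--MNNModel", mnn_path,
            "--saveExternalData=1",   # applied to all components
            *extra_flags,             # e.g. quantization / fusion flags
        ])
    return cmds
```

Each planned command could then be executed with `subprocess.run(cmd, check=True)` so a failed component conversion stops the batch immediately.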
The VAE encoder is not converted by default in the script since it is only needed for image-to-image workflows and can be converted separately if required.
Optimization Flags
Common extra flags passed during conversion:
- `--weightQuantBits=8` -- 8-bit weight quantization for reduced model size
- `--transformerFuse` -- Fuse transformer attention patterns for accelerated inference
- These flags are passed as a single quoted string argument to the conversion script
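Since the extra flags arrive as one quoted string, the script must split them into individual argv entries before appending them to the converter command. A sketch of how that splitting could work (using the standard library's `shlex`, which respects shell-style quoting; the variable names are illustrative):

```python
import shlex

# The flags as received: a single quoted string argument.
extra = "--weightQuantBits=8 --transformerFuse"

# Split into separate argv entries, honoring any embedded quotes.
flags = shlex.split(extra)
print(flags)  # ['--weightQuantBits=8', '--transformerFuse']
```

Using `shlex.split` instead of `str.split` keeps the behavior correct if a future flag value ever contains spaces inside quotes.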
MNNConvert Tool Resolution
The conversion script attempts to locate the MNNConvert binary at the relative path `../../../build/MNNConvert`. If this compiled binary is not found, it falls back to using `mnnconvert` from the pymnn Python package, which must be installed via pip.
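This fallback order could be sketched as follows (the function name `find_converter` is an assumption; the resolution order mirrors the description above):

```python
import os
import shutil

def find_converter(build_binary="../../../build/MNNConvert"):
    """Prefer the compiled MNNConvert binary; fall back to pymnn's CLI."""
    if os.path.isfile(build_binary):
        return build_binary
    # pymnn installs an `mnnconvert` entry point on PATH via pip.
    cli = shutil.which("mnnconvert")
    if cli is None:
        raise FileNotFoundError(
            "Neither the compiled MNNConvert nor pymnn's mnnconvert was found")
    return cli
```

Resolving the tool once up front gives a clear error before any per-component work begins, rather than failing partway through the batch.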