
Principle:Alibaba MNN Diffusion MNN Conversion

From Leeroopedia


Field Value
principle_name Diffusion_MNN_Conversion
schema_version 0.3.0
principle_type Workflow Step
domain Stable Diffusion Deployment
stage Model Conversion
scope Batch conversion of Stable Diffusion ONNX components to MNN format with optimization flags
last_updated 2026-02-10 14:00 GMT

Overview

Diffusion MNN Conversion is the third step in the Stable Diffusion deployment workflow. After the pipeline components have been exported to ONNX, each component must be converted from ONNX to the MNN (Mobile Neural Network) format. MNN is Alibaba's open-source inference framework; its model format is optimized for mobile and edge devices, supporting quantization, operator fusion, and hardware-specific optimizations.

Theory

The conversion from ONNX to MNN is performed by the MNNConvert tool (either as a compiled binary or via the pymnn Python package). The conversion involves several key transformations:

  • Graph optimization: MNNConvert applies graph-level optimizations such as constant folding, dead code elimination, and operator merging to produce a more efficient execution graph.
  • External data storage: The --saveExternalData=1 flag is used for all components. This separates the model graph structure from the weight data, enabling memory-mapped loading for faster model initialization and reduced peak memory.
  • Weight quantization (optional): The --weightQuantBits=8 flag quantizes model weights from float32 to 8-bit integers, reducing model size by approximately 4x with minimal quality degradation. This is particularly important for mobile deployment where storage and memory are limited.
  • Transformer fusion (optional): The --transformerFuse flag enables fusion of multi-head attention patterns (Q/K/V projection, scaled dot-product attention, output projection) into a single optimized kernel. This is critical for UNet performance since the UNet contains numerous self-attention and cross-attention layers.
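Taken together, these flags map onto a single MNNConvert invocation per component. A minimal sketch for one component follows; the flag names come from the list above, but the basic `-f`/`--modelFile`/`--MNNModel` CLI shape is an assumption about the tool's interface, so check `MNNConvert --help` against your build:

```shell
# Hypothetical single-component conversion. Flag names are taken from the
# text above; the overall CLI shape is assumed. Nothing is executed here.
build_convert_cmd() {
  # Assemble (but do not run) the conversion command for one ONNX model.
  onnx=$1
  mnn=$2
  echo "MNNConvert -f ONNX --modelFile $onnx --MNNModel $mnn --saveExternalData=1 --weightQuantBits=8 --transformerFuse"
}

build_convert_cmd unet/model.onnx unet.mnn
```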

Converted Components

The conversion script processes three of the four pipeline components:

  • text_encoder -- Converts text_encoder/model.onnx to text_encoder.mnn
  • unet -- Converts unet/model.onnx (with external weights.pb) to unet.mnn
  • vae_decoder -- Converts vae_decoder/model.onnx to vae_decoder.mnn

The VAE encoder is not converted by default in the script since it is only needed for image-to-image workflows and can be converted separately if required.
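The three conversions above can be sketched as a small batch loop. This is a dry run that only prints each command; the input/output paths follow the component list above, and the flag spelling is an assumption:

```shell
# Dry-run sketch: print the ONNX-to-MNN conversion command for each of
# the three components the script converts. Nothing is executed here.
convert_all() {
  for name in text_encoder unet vae_decoder; do
    echo "MNNConvert -f ONNX --modelFile $name/model.onnx --MNNModel $name.mnn --saveExternalData=1"
  done
}

convert_all
```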

Optimization Flags

Common extra flags passed during conversion:

  • --weightQuantBits=8 -- 8-bit weight quantization for reduced model size
  • --transformerFuse -- Fuse transformer attention patterns for accelerated inference

Both flags are passed to the conversion script as a single quoted string argument.
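A hypothetical illustration of that quoted-string convention: the wrapper script (its name is not given in the source, so the invocation in the comment is made up) receives all extra flags in one argument and word-splits it when building the converter command line:

```shell
# Caller side (script name hypothetical):
#   ./convert_all.sh "--weightQuantBits=8 --transformerFuse"
pass_extra_flags() {
  extra=$1
  # Deliberately unquoted expansion: word-split the string back into flags.
  set -- $extra
  echo "extra flag count: $#"
  echo "flags: $*"
}

pass_extra_flags "--weightQuantBits=8 --transformerFuse"
```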

MNNConvert Tool Resolution

The conversion script attempts to locate the MNNConvert binary at the relative path ../../../build/MNNConvert. If this compiled binary is not found, it falls back to using mnnconvert from the pymnn Python package, which must be installed via pip.
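That fallback can be sketched as a small POSIX-shell helper. The search order (compiled binary first, then pymnn's mnnconvert on PATH) comes from the text; the parameterized path and the error message are my own additions:

```shell
# Resolve the converter: prefer the compiled binary at the relative
# build path, else fall back to mnnconvert from the pymnn package.
resolve_converter() {
  built=${1:-../../../build/MNNConvert}
  if [ -x "$built" ]; then
    echo "$built"
  elif command -v mnnconvert >/dev/null 2>&1; then
    echo "mnnconvert"
  else
    echo "error: no MNNConvert binary found and pymnn is not installed" >&2
    return 1
  fi
}
```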
