Workflow: Alibaba MNN Stable Diffusion Deployment
| Knowledge Sources | |
|---|---|
| Domains | Generative_AI, Model_Deployment, On_Device_AI |
| Last Updated | 2026-02-10 08:00 GMT |
Overview
End-to-end process for deploying Stable Diffusion models for on-device inference using MNN-Diffusion, covering ONNX export from Hugging Face, MNN conversion with optional quantization, engine compilation, and text-to-image generation.
Description
This workflow covers converting Hugging Face Stable Diffusion models (such as stable-diffusion-v1-5, ChilloutMix, or Taiyi Chinese Stable Diffusion) into MNN format and running text-to-image generation on mobile and desktop devices. The pipeline involves exporting the diffusion pipeline components (text encoder, UNet, VAE decoder) to ONNX, converting them to MNN with optional weight quantization and transformer fusion for GPU acceleration, compiling the MNN engine with diffusion support, and running the diffusion_demo tool with configurable iteration count, memory mode, and hardware backend.
Key outputs:
- MNN-format diffusion model files (text encoder, UNet, VAE decoder)
- Compiled diffusion_demo executable
- Generated images from text prompts
Usage
Execute this workflow when you need to run text-to-image generation locally on mobile devices or PCs using Stable Diffusion models, without requiring cloud-based GPU inference. Supports both English (stable-diffusion-v1-5, ChilloutMix) and Chinese (Taiyi) prompt inputs.
Execution Steps
Step 1: Download the Stable Diffusion model
Clone or download a supported Stable Diffusion model from Hugging Face or ModelScope. Supported models include stable-diffusion-v1-5, ChilloutMix, and IDEA-CCNL Taiyi-Stable-Diffusion-1B-Chinese. Ensure all model components (text encoder, UNet, VAE, tokenizer files) are present.
Key considerations:
- Use git-lfs for complete model weight downloads
- Verify the tokenizer directory contains merges.txt and vocab.json (for English models) or vocab.txt (for Taiyi Chinese model)
- These tokenizer files must be copied to the final resource directory
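The download step can be sketched as follows. The repository URL below is illustrative (Hugging Face model repositories occasionally move); substitute the ChilloutMix or Taiyi repository as needed:

```shell
# Install the git-lfs hooks once per machine, then clone the model repo
# so the large weight files are fetched in full rather than as pointers.
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5

# Sanity-check that the tokenizer files needed in Step 5 are present
# (for the Taiyi Chinese model, check for vocab.txt instead).
ls stable-diffusion-v1-5/tokenizer/merges.txt \
   stable-diffusion-v1-5/tokenizer/vocab.json
```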
Step 2: Export model to ONNX
Run the onnx_export.py script from transformers/diffusion/export to convert the Hugging Face model into ONNX format. The script requires the torch, onnx, and diffusers libraries, which can be installed via the provided conda environment file (env.yaml).
What happens:
- Each pipeline component (text encoder, UNet, VAE) is exported as a separate ONNX model
- ONNX opset version 18 is used for compatibility
- The export script traces each model component with appropriate dummy inputs
Key considerations:
- Install conda environment first: conda env create -f env.yaml && conda activate ldm
- The --opset 18 flag ensures modern operator support
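Assuming the script accepts a model path and an output directory (the flag names below are assumptions; check `python onnx_export.py --help` for the actual interface), the export step looks roughly like:

```shell
# Create and activate the conda environment shipped with the export scripts.
conda env create -f env.yaml
conda activate ldm

# Export text encoder, UNet, and VAE decoder as separate ONNX models.
# --model_path, --output_path, and --opset are assumed flag names.
python onnx_export.py \
    --model_path ../stable-diffusion-v1-5 \
    --output_path ./onnx \
    --opset 18
```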
Step 3: Convert ONNX models to MNN
Run the convert_mnn.py script to convert the exported ONNX models to MNN format. Apply weight quantization (--weightQuantBits=8) to reduce model size. For GPU inference acceleration on OpenCL or Metal backends, add the --transformerFuse flag to enable transformer-specific operator fusion.
Key considerations:
- Weight quantization to 8-bit reduces model size by approximately 75%
- The --transformerFuse flag is essential for GPU performance but only works with OpenCL and Metal backends
- For CPU-only or non-OpenCL/Metal inference, omit --transformerFuse to avoid unsupported operator errors
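A sketch of the conversion, assuming convert_mnn.py takes the ONNX directory, an MNN output directory, and extra converter flags as a quoted string (verify against the script's actual interface):

```shell
# CPU or non-OpenCL/Metal target: 8-bit weight quantization only.
python convert_mnn.py ./onnx ./mnn "--weightQuantBits=8"

# OpenCL/Metal GPU target: additionally enable transformer operator fusion.
python convert_mnn.py ./onnx ./mnn "--weightQuantBits=8 --transformerFuse"
```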
Step 4: Compile MNN with diffusion support
Build the MNN engine from source with diffusion-specific CMake flags: -DMNN_BUILD_DIFFUSION=ON for the diffusion runtime, -DMNN_BUILD_OPENCV=ON and -DMNN_IMGCODECS=ON for image I/O, and -DMNN_SUPPORT_TRANSFORMER_FUSE=ON for transformer operator fusion. Add GPU backend flags as needed (-DMNN_OPENCL=ON for Android/PC GPU).
Key considerations:
- All diffusion builds require -DMNN_LOW_MEMORY=ON and -DMNN_SEP_BUILD=OFF
- Android builds use the project/android/build_64.sh script with the same flags
- The compiled output includes diffusion_demo executable
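The flags above combine into a host build invocation along these lines (Linux/macOS shown; Android builds pass the same flags through project/android/build_64.sh):

```shell
cd MNN
mkdir -p build && cd build
cmake .. \
    -DMNN_BUILD_DIFFUSION=ON \
    -DMNN_BUILD_OPENCV=ON \
    -DMNN_IMGCODECS=ON \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
    -DMNN_LOW_MEMORY=ON \
    -DMNN_SEP_BUILD=OFF \
    -DMNN_OPENCL=ON   # GPU backend flag; drop for CPU-only builds
make -j"$(nproc)"
```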
Step 5: Prepare resources and run inference
Copy the tokenizer files (merges.txt/vocab.json or vocab.txt) from the original Hugging Face model into the MNN model directory. Run diffusion_demo specifying the resource path, model type (0 for English SD models, 1 for Taiyi Chinese), memory mode (0=memory-saving, 1=performance, 2=balanced), backend type, iteration count (10-20 recommended), random seed, output filename, and prompt text.
Key considerations:
- Model type 0 for stable-diffusion-v1-5 and ChilloutMix; type 1 for Taiyi
- Memory mode 0 loads/unloads models per component to save memory; mode 1 keeps all in memory for speed
- Iteration count of 10-20 provides good quality; fewer iterations produce rougher images
- Negative random seed (-1) generates a new seed each run; positive values enable reproducibility
- OpenCL FP16 devices need at least 2GB GPU memory; non-FP16 devices need 4GB+
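Putting the parameters together, a run might look like the following. The positional-argument order mirrors the list above, and the backend code (3 for OpenCL, per MNN's forward-type enum) is an assumption to verify against your build:

```shell
# Copy tokenizer files from the original HF model into the MNN resource dir.
cp ../stable-diffusion-v1-5/tokenizer/merges.txt ./mnn/
cp ../stable-diffusion-v1-5/tokenizer/vocab.json ./mnn/

# <resource_path> <model_type> <memory_mode> <backend> <iterations> <seed> <output> <prompt>
# model_type 0 = English SD, memory_mode 0 = memory-saving, seed 42 for
# reproducibility (use -1 for a fresh seed each run).
./diffusion_demo ./mnn 0 0 3 20 42 demo.png "a photo of a cat on a wooden table"
```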