Workflow: Alibaba MNN Stable Diffusion Deployment
| Knowledge Sources | |
|---|---|
| Domains | Generative_AI, Model_Deployment, On_Device_AI |
| Last Updated | 2026-02-10 08:00 GMT |
Overview
End-to-end process for deploying Stable Diffusion models for on-device inference using MNN-Diffusion, covering ONNX export from Hugging Face, MNN conversion with optional quantization, engine compilation, and text-to-image generation.
Description
This workflow covers converting Hugging Face Stable Diffusion models (such as stable-diffusion-v1-5, ChilloutMix, or Taiyi Chinese Stable Diffusion) into MNN format and running text-to-image generation on mobile and desktop devices. The pipeline involves exporting the diffusion pipeline components (text encoder, UNet, VAE decoder) to ONNX, converting them to MNN with optional weight quantization and transformer fusion for GPU acceleration, compiling the MNN engine with diffusion support, and running the diffusion_demo tool with configurable iteration count, memory mode, and hardware backend.
Key outputs:
- MNN-format diffusion model files (text encoder, UNet, VAE decoder)
- Compiled diffusion_demo executable
- Generated images from text prompts
Usage
Execute this workflow when you need to run text-to-image generation locally on mobile devices or PCs using Stable Diffusion models, without requiring cloud-based GPU inference. Supports both English (stable-diffusion-v1-5, ChilloutMix) and Chinese (Taiyi) prompt inputs.
Execution Steps
Step 1: Download the Stable Diffusion model
Clone or download a supported Stable Diffusion model from Hugging Face or ModelScope. Supported models include stable-diffusion-v1-5, ChilloutMix, and IDEA-CCNL Taiyi-Stable-Diffusion-1B-Chinese. Ensure all model components (text encoder, UNet, VAE, tokenizer files) are present.
Key considerations:
- Use git-lfs for complete model weight downloads
- Verify the tokenizer directory contains merges.txt and vocab.json (for English models) or vocab.txt (for Taiyi Chinese model)
- These tokenizer files must be copied to the final resource directory
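The download step can be sketched as follows. The repository URL below is illustrative (Hugging Face model repositories occasionally move); substitute the ChilloutMix or Taiyi repository as needed:

```shell
# Install the git-lfs hooks once per machine, then clone the model repo
# so the large weight files are fetched in full rather than as pointers.
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5

# Sanity-check that the tokenizer files needed in Step 5 are present
# (for the Taiyi Chinese model, check for vocab.txt instead).
ls stable-diffusion-v1-5/tokenizer/merges.txt \
   stable-diffusion-v1-5/tokenizer/vocab.json
```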
Step 2: Export model to ONNX
Run the onnx_export.py script from transformers/diffusion/export to convert the Hugging Face model into ONNX format. The script requires the torch, onnx, and diffusers libraries, which can be installed via the provided conda environment file (env.yaml).
What happens:
- Each pipeline component (text encoder, UNet, VAE) is exported as a separate ONNX model
- ONNX opset version 18 is used for compatibility
- The export script traces each model component with appropriate dummy inputs
Key considerations:
- Install conda environment first: conda env create -f env.yaml && conda activate ldm
- The --opset 18 flag ensures modern operator support
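Assuming the script accepts a model path and an output directory (the flag names below are assumptions; check `python onnx_export.py --help` for the actual interface), the export step looks roughly like:

```shell
# Create and activate the conda environment shipped with the export scripts.
conda env create -f env.yaml
conda activate ldm

# Export text encoder, UNet, and VAE decoder as separate ONNX models.
# --model_path, --output_path, and --opset are assumed flag names.
python onnx_export.py \
    --model_path ../stable-diffusion-v1-5 \
    --output_path ./onnx \
    --opset 18
```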
Step 3: Convert ONNX models to MNN
Run the convert_mnn.py script to convert the exported ONNX models to MNN format. Apply weight quantization (--weightQuantBits=8) to reduce model size. For GPU inference acceleration on OpenCL or Metal backends, add the --transformerFuse flag to enable transformer-specific operator fusion.
Key considerations:
- Weight quantization to 8-bit reduces model size by approximately 75%
- The --transformerFuse flag is essential for GPU performance but only works with OpenCL and Metal backends
- For CPU-only or non-OpenCL/Metal inference, omit --transformerFuse to avoid unsupported operator errors
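A sketch of the conversion, assuming convert_mnn.py takes the ONNX directory, an MNN output directory, and extra converter flags as a quoted string (verify against the script's actual interface):

```shell
# CPU or non-OpenCL/Metal target: 8-bit weight quantization only.
python convert_mnn.py ./onnx ./mnn "--weightQuantBits=8"

# OpenCL/Metal GPU target: additionally enable transformer operator fusion.
python convert_mnn.py ./onnx ./mnn "--weightQuantBits=8 --transformerFuse"
```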
Step 4: Compile MNN with diffusion support
Build the MNN engine from source with diffusion-specific CMake flags: -DMNN_BUILD_DIFFUSION=ON for the diffusion runtime, -DMNN_BUILD_OPENCV=ON and -DMNN_IMGCODECS=ON for image I/O, and -DMNN_SUPPORT_TRANSFORMER_FUSE=ON for transformer operator fusion. Add GPU backend flags as needed (-DMNN_OPENCL=ON for Android/PC GPU).
Key considerations:
- All diffusion builds require -DMNN_LOW_MEMORY=ON and -DMNN_SEP_BUILD=OFF
- Android builds use the project/android/build_64.sh script with the same flags
- The compiled output includes diffusion_demo executable
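The flags above combine into a host build invocation along these lines (Linux/macOS shown; Android builds pass the same flags through project/android/build_64.sh):

```shell
cd MNN
mkdir -p build && cd build
cmake .. \
    -DMNN_BUILD_DIFFUSION=ON \
    -DMNN_BUILD_OPENCV=ON \
    -DMNN_IMGCODECS=ON \
    -DMNN_SUPPORT_TRANSFORMER_FUSE=ON \
    -DMNN_LOW_MEMORY=ON \
    -DMNN_SEP_BUILD=OFF \
    -DMNN_OPENCL=ON   # GPU backend flag; drop for CPU-only builds
make -j"$(nproc)"
```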
Step 5: Prepare resources and run inference
Copy the tokenizer files (merges.txt/vocab.json or vocab.txt) from the original Hugging Face model into the MNN model directory. Run diffusion_demo specifying the resource path, model type (0 for English SD models, 1 for Taiyi Chinese), memory mode (0=memory-saving, 1=performance, 2=balanced), backend type, iteration count (10-20 recommended), random seed, output filename, and prompt text.
Key considerations:
- Model type 0 for stable-diffusion-v1-5 and ChilloutMix; type 1 for Taiyi
- Memory mode 0 loads/unloads models per component to save memory; mode 1 keeps all in memory for speed
- Iteration count of 10-20 provides good quality; fewer iterations produce rougher images
- Negative random seed (-1) generates a new seed each run; positive values enable reproducibility
- OpenCL FP16 devices need at least 2GB GPU memory; non-FP16 devices need 4GB+
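Putting the parameters together, a run might look like the following. The positional-argument order mirrors the list above, and the backend code (3 for OpenCL, per MNN's forward-type enum) is an assumption to verify against your build:

```shell
# Copy tokenizer files from the original HF model into the MNN resource dir.
cp ../stable-diffusion-v1-5/tokenizer/merges.txt ./mnn/
cp ../stable-diffusion-v1-5/tokenizer/vocab.json ./mnn/

# <resource_path> <model_type> <memory_mode> <backend> <iterations> <seed> <output> <prompt>
# model_type 0 = English SD, memory_mode 0 = memory-saving, seed 42 for
# reproducibility (use -1 for a fresh seed each run).
./diffusion_demo ./mnn 0 0 3 20 42 demo.png "a photo of a cat on a wooden table"
```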