Implementation: Alibaba MNN Diffusion Demo CLI
| Field | Value |
|---|---|
| implementation_name | Diffusion_Demo_CLI |
| schema_version | 0.3.0 |
| impl_type | API Doc |
| domain | Stable Diffusion Deployment |
| stage | Inference Execution |
| source_file | transformers/diffusion/engine/diffusion_demo.cpp (L8-74) |
| external_deps | libMNN (compiled with diffusion support) |
| last_updated | 2026-02-10 14:00 GMT |
Summary
This implementation documents the diffusion_demo CLI binary, which executes the full Stable Diffusion text-to-image pipeline using converted MNN models. The binary accepts 8 positional arguments specifying the model path, model type, memory mode, backend type, iteration count, random seed, output image path, and prompt text.
API
```shell
./diffusion_demo <resource_path> <model_type> <memory_mode> <backend_type> <iteration_num> <random_seed> <output_image> <prompt_text>
```
Key Parameters
| Parameter | Position | Type | Description | Valid Values |
|---|---|---|---|---|
| resource_path | argv[1] | string | Path to directory containing MNN model files and tokenizer | Directory path |
| model_type | argv[2] | int | Diffusion model variant (cast to DiffusionModelType) | 0 = STABLE_DIFFUSION_1_5, 1 = STABLE_DIFFUSION_TAIYI_CHINESE, 2 = SANA_DIFFUSION |
| memory_mode | argv[3] | int | Memory management strategy | 0 = memory saving (2GB+), 1 = memory enough (fast), 2 = balance |
| backend_type | argv[4] | int | Hardware backend (cast to MNNForwardType) | 0 = CPU, 3 = OpenCL, 4 = Metal |
| iteration_num | argv[5] | int | Number of denoising iterations | Typically 10-20 |
| random_seed | argv[6] | int | Random seed for noise initialization | -1 for random, or any positive integer for reproducibility |
| output_image | argv[7] | string | Output image file path | e.g., output.jpg, result.png |
| prompt_text | argv[8+] | string(s) | Text prompt for image generation (multiple words are joined with spaces) | Any text string |
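The prompt-joining behavior in the last row can be mimicked in plain shell. The `join_prompt` helper below is hypothetical (not part of the MNN tooling); it drops the first seven positional arguments, then joins the rest with spaces, just as the demo joins argv[8] onward:

```shell
# Hypothetical helper mimicking the CLI's prompt handling:
# skip the first seven positional arguments, join the remainder with spaces.
join_prompt() {
  shift 7
  echo "$*"
}

# Unquoted words land in argv[8], argv[9], ... and are re-joined:
join_prompt ./mnn_sd15 0 0 0 20 42 output.jpg a beautiful sunset
# -> a beautiful sunset
```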
Inputs
An MNN model directory (the `resource_path`) containing:
- `text_encoder.mnn` -- converted CLIP text encoder
- `unet.mnn` -- converted UNet denoising network
- `vae_encoder.mnn` -- converted VAE encoder (for img2img)
- `vae_decoder.mnn` -- converted VAE decoder
- Tokenizer vocabulary files (e.g., `vocab.json`, `merges.txt`)
Outputs
- A generated image file (JPEG or PNG) at the specified `output_image` path
- Progress percentage printed to stdout via the progress callback
Core Code Flow
```cpp
int main(int argc, const char* argv[]) {
    if (argc < 9) {
        MNN_PRINT("Usage: ./diffusion_demo <resource_path> <model_type> <memory_mode> "
                  "<backend_type> <iteration_num> <random_seed> <output_image_name> <prompt_text>\n");
        return 0;
    }
    auto resource_path = argv[1];
    auto model_type    = (DiffusionModelType)atoi(argv[2]);
    auto memory_mode   = atoi(argv[3]);
    auto backend_type  = (MNNForwardType)atoi(argv[4]);
    auto iteration_num = atoi(argv[5]);
    auto random_seed   = atoi(argv[6]);
    auto img_name      = argv[7];

    // Join remaining arguments as prompt text
    std::string input_text;
    for (int i = 8; i < argc; ++i) {
        input_text += argv[i];
        if (i < argc - 1) input_text += " ";
    }

    // Create diffusion pipeline via factory method
    std::unique_ptr<Diffusion> diffusion(
        Diffusion::createDiffusion(resource_path, model_type, backend_type, memory_mode));

    // Load model components
    diffusion->load();

    // Run inference with progress callback
    auto progressDisplay = [](int progress) {
        std::cout << "Progress: " << progress << "%" << std::endl;
    };
    diffusion->run(input_text, img_name, iteration_num, random_seed, progressDisplay);
    return 0;
}
```
Factory Method and Class Hierarchy
The Diffusion::createDiffusion static factory method selects the appropriate implementation based on model_type:
```cpp
// From diffusion.hpp
static Diffusion* createDiffusion(std::string modelPath, DiffusionModelType modelType,
                                  MNNForwardType backendType, int memoryMode);

// From diffusion.cpp -- returns StableDiffusion or SanaDiffusion
Diffusion* Diffusion::createDiffusion(std::string modelPath, DiffusionModelType modelType,
                                      MNNForwardType backendType, int memoryMode) {
    if (modelType == SANA_DIFFUSION) {
        return new SanaDiffusion(modelPath, modelType, backendType, memoryMode);
    } else {
        return new StableDiffusion(modelPath, modelType, backendType, memoryMode);
    }
}
```
The DiffusionModelType enum is defined as:
```cpp
typedef enum {
    STABLE_DIFFUSION_1_5           = 0,
    STABLE_DIFFUSION_TAIYI_CHINESE = 1,
    SANA_DIFFUSION                 = 2,
    DIFFUSION_MODEL_USER
} DiffusionModelType;
```
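When scripting invocations, the enum values above can be mapped from readable names. The `model_type_id` helper below is illustrative only (an assumption, not part of the MNN distribution):

```shell
# Hypothetical mapping of human-readable model names to DiffusionModelType values
model_type_id() {
  case "$1" in
    sd15)  echo 0 ;;  # STABLE_DIFFUSION_1_5
    taiyi) echo 1 ;;  # STABLE_DIFFUSION_TAIYI_CHINESE
    sana)  echo 2 ;;  # SANA_DIFFUSION
    *)     echo "unknown model: $1" >&2; return 1 ;;
  esac
}

model_type_id taiyi
# -> 1
```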
Usage Examples
Generate an image with SD v1.5 on CPU:

```shell
./diffusion_demo ./mnn_sd15 0 0 0 20 42 output.jpg "a beautiful sunset over the ocean"
```

Generate with the Taiyi Chinese model on an OpenCL GPU (the prompt means "a cute kitten playing in a garden"):

```shell
./diffusion_demo ./mnn_taiyi 1 1 3 15 -1 result.png "一只可爱的猫咪在花园里玩耍"
```

Generate with SD v1.5 on Metal GPU (macOS) in balanced memory mode:

```shell
./diffusion_demo ./mnn_sd15 0 2 4 20 123 photo.jpg "a photorealistic portrait of a woman"
```
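To sweep several seeds, the demo can be wrapped in a plain shell loop. The sketch below is a dry run that only echoes each command (remove the leading `echo` to actually generate images) and assumes the `./mnn_sd15` model directory from the examples above:

```shell
# Dry run: print the command for each seed instead of executing it.
for seed in 1 2 3; do
  echo ./diffusion_demo ./mnn_sd15 0 1 0 20 "$seed" "out_${seed}.jpg" "a beautiful sunset"
done
```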
Multi-Generation Pattern
The source code includes a disabled loop (guarded by `while (0)`) demonstrating how to generate multiple images in sequence:

```cpp
// For multiple generations:
// - Memory saving mode (0): call diffusion->load() before each run
// - Memory enough mode (1): only load once, then call run() repeatedly
while (0) {
    if (memory_mode != 1) {
        diffusion->load();
    }
    diffusion->run("a big horse", "demo_2.jpg", 20, 42, progressDisplay);
}
```
Notes
- The prompt text is constructed by joining all arguments from position 8 onward with spaces, so the prompt does not need to be quoted on the command line (though quoting is recommended for special characters).
- The `progressDisplay` lambda callback prints percentage progress to stdout during the denoising loop.
- The binary requires at least 9 arguments (the `argc < 9` check); if fewer are provided, it prints a usage message and exits.
- Memory mode 0 is recommended for devices with limited RAM (as little as 2 GB); mode 1 is recommended for desktop/server environments with ample memory.
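The memory-mode choice above can be scripted with a small heuristic. The 4 GB threshold below is an assumption for illustration, not an MNN recommendation:

```shell
# Hypothetical heuristic: pick a memory mode from free RAM in MB.
# Under 4096 MB -> mode 0 (memory saving); otherwise -> mode 1 (memory enough).
pick_memory_mode() {
  if [ "$1" -lt 4096 ]; then echo 0; else echo 1; fi
}

pick_memory_mode 2048   # -> 0
pick_memory_mode 16384  # -> 1
```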