
Implementation:Alibaba MNN Diffusion Demo CLI

From Leeroopedia


Field Value
implementation_name Diffusion_Demo_CLI
schema_version 0.3.0
impl_type API Doc
domain Stable Diffusion Deployment
stage Inference Execution
source_file transformers/diffusion/engine/diffusion_demo.cpp (L8-74)
external_deps libMNN (compiled with diffusion support)
last_updated 2026-02-10 14:00 GMT

Summary

This implementation documents the diffusion_demo CLI binary, which executes the full Stable Diffusion text-to-image pipeline using converted MNN models. The binary accepts 8 positional arguments specifying the model path, model type, memory mode, backend type, iteration count, random seed, output image path, and prompt text.

API

./diffusion_demo <resource_path> <model_type> <memory_mode> <backend_type> <iteration_num> <random_seed> <output_image> <prompt_text>

Key Parameters

  • resource_path (argv[1], string) -- Path to a directory containing the MNN model files and tokenizer. Valid values: any directory path.
  • model_type (argv[2], int) -- Diffusion model variant (cast to DiffusionModelType). Valid values: 0 = STABLE_DIFFUSION_1_5, 1 = STABLE_DIFFUSION_TAIYI_CHINESE, 2 = SANA_DIFFUSION.
  • memory_mode (argv[3], int) -- Memory management strategy. Valid values: 0 = memory saving (2 GB+ devices), 1 = memory enough (fast), 2 = balance.
  • backend_type (argv[4], int) -- Hardware backend (cast to MNNForwardType). Valid values: 0 = CPU, 3 = OpenCL, 4 = Metal.
  • iteration_num (argv[5], int) -- Number of denoising iterations. Typically 10-20.
  • random_seed (argv[6], int) -- Random seed for noise initialization. -1 for a random seed; any positive integer for reproducibility.
  • output_image (argv[7], string) -- Output image file path, e.g. output.jpg or result.png.
  • prompt_text (argv[8+], string(s)) -- Text prompt for image generation; multiple words are joined with spaces. Any text string.

Inputs

An MNN model directory (the resource_path) containing:

  • text_encoder.mnn -- Converted CLIP text encoder
  • unet.mnn -- Converted UNet denoising network
  • vae_encoder.mnn -- Converted VAE encoder (for img2img)
  • vae_decoder.mnn -- Converted VAE decoder
  • Tokenizer vocabulary files (e.g., vocab.json, merges.txt)

Outputs

  • A generated image file (JPEG or PNG) at the specified output_image path
  • Progress percentage printed to stdout via the progress callback

Core Code Flow

int main(int argc, const char* argv[]) {
    if (argc < 9) {
        MNN_PRINT("Usage: ./diffusion_demo <resource_path> <model_type> <memory_mode> "
                   "<backend_type> <iteration_num> <random_seed> <output_image_name> <prompt_text>\n");
        return 0;
    }

    auto resource_path = argv[1];
    auto model_type = (DiffusionModelType)atoi(argv[2]);
    auto memory_mode = atoi(argv[3]);
    auto backend_type = (MNNForwardType)atoi(argv[4]);
    auto iteration_num = atoi(argv[5]);
    auto random_seed = atoi(argv[6]);
    auto img_name = argv[7];

    // Join remaining arguments as prompt text
    std::string input_text;
    for (int i = 8; i < argc; ++i) {
        input_text += argv[i];
        if (i < argc - 1) input_text += " ";
    }

    // Create diffusion pipeline via factory method
    std::unique_ptr<Diffusion> diffusion(
        Diffusion::createDiffusion(resource_path, model_type, backend_type, memory_mode));

    // Load model components
    diffusion->load();

    // Run inference with progress callback
    auto progressDisplay = [](int progress) {
        std::cout << "Progress: " << progress << "%" << std::endl;
    };
    diffusion->run(input_text, img_name, iteration_num, random_seed, progressDisplay);

    return 0;
}

Factory Method and Class Hierarchy

The Diffusion::createDiffusion static factory method selects the appropriate implementation based on model_type:

// From diffusion.hpp
static Diffusion* createDiffusion(std::string modelPath, DiffusionModelType modelType,
                                   MNNForwardType backendType, int memoryMode);

// From diffusion.cpp -- returns StableDiffusion or SanaDiffusion
Diffusion* Diffusion::createDiffusion(std::string modelPath, DiffusionModelType modelType,
                                       MNNForwardType backendType, int memoryMode) {
    if (modelType == SANA_DIFFUSION) {
        return new SanaDiffusion(modelPath, modelType, backendType, memoryMode);
    } else {
        return new StableDiffusion(modelPath, modelType, backendType, memoryMode);
    }
}

The DiffusionModelType enum is defined as:

typedef enum {
    STABLE_DIFFUSION_1_5 = 0,
    STABLE_DIFFUSION_TAIYI_CHINESE = 1,
    SANA_DIFFUSION = 2,
    DIFFUSION_MODEL_USER
} DiffusionModelType;

Usage Examples

Generate an image with SD v1.5 on CPU:

./diffusion_demo ./mnn_sd15 0 0 0 20 42 output.jpg "a beautiful sunset over the ocean"

Generate with Taiyi Chinese model on OpenCL GPU:

./diffusion_demo ./mnn_taiyi 1 1 3 15 -1 result.png "一只可爱的猫咪在花园里玩耍"
(The Chinese prompt means "a cute cat playing in the garden"; the Taiyi model expects Chinese-language prompts.)

Generate with SD v1.5 on Metal GPU (macOS) in balanced memory mode:

./diffusion_demo ./mnn_sd15 0 2 4 20 123 photo.jpg "a photorealistic portrait of a woman"

Multi-Generation Pattern

The source code includes a disabled loop (guarded by while(0)) demonstrating how to generate multiple images in sequence:

// For multiple generations:
// - Memory saving mode (0): call diffusion->load() before each run
// - Memory enough mode (1): only load once, then call run() repeatedly
while(0) { // disabled in the shipped demo; replace with a real condition to enable
    if(memory_mode != 1) {
        diffusion->load();
    }
    diffusion->run("a big horse", "demo_2.jpg", 20, 42, progressDisplay);
}

Notes

  • The prompt text is constructed by joining all arguments from position 8 onward with spaces, so the prompt does not need to be quoted on the command line (though quoting is recommended for special characters).
  • The progressDisplay lambda callback prints percentage progress to stdout during the denoising loop.
  • The binary requires argc >= 9 (the program name plus at least 8 arguments); if fewer are provided, it prints a usage message and returns 0.
  • Memory mode 0 is recommended for memory-constrained devices (roughly 2 GB of RAM and up); mode 1 is recommended for desktop/server environments with ample memory.
