
Implementation:Alibaba MNN Onnx Export Script



Field Value
implementation_name Onnx_Export_Script
schema_version 0.3.0
impl_type API Doc
domain Stable Diffusion Deployment
stage Model Export
source_file transformers/diffusion/export/onnx_export.py (L179-200)
external_deps torch, onnx, diffusers, packaging
last_updated 2026-02-10 14:00 GMT

Summary

This implementation exports all four Stable Diffusion pipeline components from a HuggingFace diffusers checkpoint to ONNX format. The script loads the StableDiffusionPipeline from diffusers, then iterates through each component (text_encoder, unet, vae_encoder, vae_decoder), traces it with representative inputs, and writes the corresponding ONNX graph to disk.

API

python onnx_export.py --model_path <hf_path> --output_path <onnx_dir> [--opset 14] [--fp16]

Key Parameters

Parameter Type Required Default Description
--model_path str Yes -- Path to the diffusers checkpoint (local directory or HuggingFace Hub identifier)
--output_path str Yes -- Directory where ONNX models will be written
--opset int No 14 ONNX operator set version to use
--fp16 flag No False Export models in float16 precision (requires CUDA GPU)
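
Hypothetical argparse wiring matching the table above (a sketch; the real script's help strings and argument order may differ):

import argparse

parser = argparse.ArgumentParser(description="Export a diffusers Stable Diffusion checkpoint to ONNX")
parser.add_argument("--model_path", type=str, required=True,
                    help="Local directory or HuggingFace Hub id of the diffusers checkpoint")
parser.add_argument("--output_path", type=str, required=True,
                    help="Directory where the ONNX sub-models are written")
parser.add_argument("--opset", type=int, default=14,
                    help="ONNX operator set version")
parser.add_argument("--fp16", action="store_true", default=False,
                    help="Export models in float16 precision (requires a CUDA GPU)")
args = parser.parse_args()

convert_models(args.model_path, args.output_path, args.opset, args.fp16)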

Inputs

  • A HuggingFace Stable Diffusion model directory (the output of the model acquisition step), containing PyTorch weights for all pipeline components.

Outputs

An ONNX directory with four sub-model directories:

<output_path>/
  text_encoder/
    model.onnx              # CLIP text encoder
  unet/
    model.onnx              # UNet denoising network
    weights.pb              # External weight data (UNet > 2 GB)
  vae_encoder/
    model.onnx              # VAE encoder
  vae_decoder/
    model.onnx              # VAE decoder
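
The exported graphs can be sanity-checked with the ONNX checker. This step is not part of the script; the directory name below assumes the ./onnx_sd15 output from the usage example further down:

import onnx

for sub in ("text_encoder", "unet", "vae_encoder", "vae_decoder"):
    # Pass a path rather than a loaded proto so the UNet's external
    # weights.pb is resolved and the >2 GB graph validates correctly.
    onnx.checker.check_model(f"./onnx_sd15/{sub}/model.onnx")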

Core Function Signature

The main conversion logic is in the convert_models function:

@torch.no_grad()
def convert_models(model_path: str, output_path: str, opset: int, fp16: bool = False):

The low-level export helper used for each component:

def onnx_export(
    model,
    model_args: tuple,
    output_path: Path,
    ordered_input_names,
    output_names,
    dynamic_axes,
    opset,
    use_external_data_format=False,
):
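
A plausible body for this helper, sketched after the upstream diffusers conversion script it resembles (this also explains the packaging dependency: torch older than 1.11 must be passed use_external_data_format explicitly, while newer torch spills >2 GB models to external files on its own):

from pathlib import Path

import torch
from packaging import version
from torch.onnx import export

is_torch_less_than_1_11 = version.parse(
    version.parse(torch.__version__).base_version
) < version.parse("1.11")

def onnx_export(
    model,
    model_args: tuple,
    output_path: Path,
    ordered_input_names,
    output_names,
    dynamic_axes,
    opset,
    use_external_data_format=False,
):
    output_path.parent.mkdir(parents=True, exist_ok=True)
    if is_torch_less_than_1_11:
        # Older torch: the flag must be passed to spill >2 GB models to disk
        export(
            model,
            model_args,
            f=output_path.as_posix(),
            input_names=ordered_input_names,
            output_names=output_names,
            dynamic_axes=dynamic_axes,
            do_constant_folding=True,
            use_external_data_format=use_external_data_format,
            enable_onnx_checker=True,
            opset_version=opset,
        )
    else:
        # torch >= 1.11 handles external data automatically
        export(
            model,
            model_args,
            f=output_path.as_posix(),
            input_names=ordered_input_names,
            output_names=output_names,
            dynamic_axes=dynamic_axes,
            do_constant_folding=True,
            opset_version=opset,
        )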

Component Export Details

Text Encoder:

  • Input names: ["input_ids"]
  • Output names: ["last_hidden_state", "pooler_output"]
  • Dynamic axes: None (static shape)
  • Input is cast to torch.int32 for CLIP compatibility
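
A sketch of the corresponding export call, assuming pipeline, device, output_path, and opset are in scope (the prompt is arbitrary tracing input; CLIP's model_max_length is typically 77):

text_input = pipeline.tokenizer(
    "a photo of an astronaut",                       # any prompt works for tracing
    padding="max_length",
    max_length=pipeline.tokenizer.model_max_length,  # typically 77 for CLIP
    truncation=True,
    return_tensors="pt",
)
onnx_export(
    pipeline.text_encoder,
    # cast to int32 for CLIP compatibility, as noted above
    model_args=(text_input.input_ids.to(device=device, dtype=torch.int32),),
    output_path=Path(output_path) / "text_encoder" / "model.onnx",
    ordered_input_names=["input_ids"],
    output_names=["last_hidden_state", "pooler_output"],
    dynamic_axes=None,  # static shape
    opset=opset,
)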

UNet:

  • Input names: ["sample", "timestep", "encoder_hidden_states"]
  • Output names: ["out_sample"]
  • Dynamic axes: None (static shape)
  • Uses use_external_data_format=True because UNet exceeds 2 GB
  • External weights are collated into a single weights.pb via onnx.save_model
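
A sketch of the UNet export and the subsequent collation pass (tensor shapes match SD 1.x at 512x512 and are illustrative; dtype is float16 or float32 depending on --fp16):

import shutil
import onnx

unet_path = Path(output_path) / "unet" / "model.onnx"
onnx_export(
    pipeline.unet,
    model_args=(
        torch.randn(2, 4, 64, 64).to(device=device, dtype=dtype),   # sample
        torch.randn(2).to(device=device, dtype=dtype),              # timestep
        torch.randn(2, 77, 768).to(device=device, dtype=dtype),     # encoder_hidden_states
    ),
    output_path=unet_path,
    ordered_input_names=["sample", "timestep", "encoder_hidden_states"],
    output_names=["out_sample"],
    dynamic_axes=None,
    opset=opset,
    use_external_data_format=True,  # UNet exceeds the 2 GB protobuf limit
)

# Reload and re-save so fragmented tensor files collapse into one weights.pb
unet_dir = unet_path.parent
unet_model = onnx.load(unet_path.as_posix())
shutil.rmtree(unet_dir)
unet_dir.mkdir(parents=True)
onnx.save_model(
    unet_model,
    unet_path.as_posix(),
    save_as_external_data=True,
    all_tensors_to_one_file=True,
    location="weights.pb",
    convert_attribute=False,
)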

VAE Encoder:

  • Input names: ["sample", "return_dict"]
  • Output names: ["latent_sample"]
  • Forward is monkey-patched to call vae_encoder.encode(sample, return_dict)[0].mode()
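
A sketch of the encoder-side patch and export (input shape illustrative for 512x512; pipeline.vae serves as the module for both VAE halves):

vae_encoder = pipeline.vae
# route forward() through encode() and take the latent distribution's mode
vae_encoder.forward = lambda sample, return_dict: vae_encoder.encode(sample, return_dict)[0].mode()
onnx_export(
    vae_encoder,
    model_args=(
        torch.randn(1, 3, 512, 512).to(device=device, dtype=dtype),  # sample
        False,                                                       # return_dict, traced as a constant
    ),
    output_path=Path(output_path) / "vae_encoder" / "model.onnx",
    ordered_input_names=["sample", "return_dict"],
    output_names=["latent_sample"],
    dynamic_axes=None,
    opset=opset,
)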

VAE Decoder:

  • Input names: ["latent_sample"]
  • Output names: ["sample"]
  • Forward is monkey-patched to call vae_decoder.decode(latent, return_dict=False)[0]
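
And the decoder side, re-patching the same VAE module (latent shape illustrative; 512x512 images map to 64x64 latents):

vae_decoder = pipeline.vae
# route forward() through decode(), unwrapping the returned tuple
vae_decoder.forward = lambda latent_sample: vae_decoder.decode(latent_sample, return_dict=False)[0]
onnx_export(
    vae_decoder,
    model_args=(torch.randn(1, 4, 64, 64).to(device=device, dtype=dtype),),
    output_path=Path(output_path) / "vae_decoder" / "model.onnx",
    ordered_input_names=["latent_sample"],
    output_names=["sample"],
    dynamic_axes=None,
    opset=opset,
)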

Usage Example

# Export stable-diffusion-v1-5 to ONNX with default opset 14
python onnx_export.py \
    --model_path ./stable-diffusion-v1-5 \
    --output_path ./onnx_sd15

# Export with float16 precision (requires CUDA)
python onnx_export.py \
    --model_path ./stable-diffusion-v1-5 \
    --output_path ./onnx_sd15_fp16 \
    --fp16

Notes

  • The script deletes each component from memory after export (del pipeline.text_encoder, etc.) to manage peak memory usage.
  • Float16 export will raise a ValueError if CUDA is not available.
  • The UNet external data collation step (shutil.rmtree then onnx.save_model) cleans up fragmented tensor files into a single weights.pb.
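
The float16 guard likely follows the common diffusers pattern; a minimal sketch (an assumption, not the verified MNN code):

import torch
from diffusers import StableDiffusionPipeline

dtype = torch.float16 if fp16 else torch.float32
if fp16 and not torch.cuda.is_available():
    raise ValueError("`float16` model export is only supported on GPUs with CUDA")
device = "cuda" if fp16 else "cpu"
pipeline = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=dtype).to(device)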
