Implementation: Alibaba MNN ONNX Export Script
| Field | Value |
|---|---|
| implementation_name | Onnx_Export_Script |
| schema_version | 0.3.0 |
| impl_type | API Doc |
| domain | Stable Diffusion Deployment |
| stage | Model Export |
| source_file | transformers/diffusion/export/onnx_export.py (L179-200) |
| external_deps | torch, onnx, diffusers, packaging |
| last_updated | 2026-02-10 14:00 GMT |
Summary
This implementation exports all four Stable Diffusion pipeline components from a HuggingFace diffusers checkpoint to ONNX format. The script loads the StableDiffusionPipeline from diffusers, then iterates through each component (text_encoder, unet, vae_encoder, vae_decoder), traces it with representative inputs, and writes the corresponding ONNX graph to disk.
API
python onnx_export.py --model_path <hf_path> --output_path <onnx_dir> [--opset 14] [--fp16]
Key Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --model_path | str | Yes | -- | Path to the diffusers checkpoint (local directory or HuggingFace Hub identifier) |
| --output_path | str | Yes | -- | Directory where ONNX models will be written |
| --opset | int | No | 14 | ONNX operator set version to use |
| --fp16 | flag | No | False | Export models in float16 precision (requires CUDA GPU) |
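The CLI above can be reproduced with a minimal argparse setup. This is an illustrative reconstruction matching the parameter table, not the script's actual parser code; the `build_parser` name is an assumption.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the script's CLI, matching the
    # parameter table above (names, defaults, and required flags).
    parser = argparse.ArgumentParser(
        description="Export a diffusers Stable Diffusion checkpoint to ONNX")
    parser.add_argument("--model_path", type=str, required=True,
                        help="Local directory or HuggingFace Hub identifier")
    parser.add_argument("--output_path", type=str, required=True,
                        help="Directory where ONNX models will be written")
    parser.add_argument("--opset", type=int, default=14,
                        help="ONNX operator set version")
    parser.add_argument("--fp16", action="store_true",
                        help="Export in float16 precision (requires CUDA)")
    return parser
```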
Inputs
- A HuggingFace Stable Diffusion model directory (the output of the model acquisition step), containing PyTorch weights for all pipeline components.
Outputs
An ONNX directory with four sub-model directories:
<output_path>/
text_encoder/
model.onnx # CLIP text encoder
unet/
model.onnx # UNet denoising network
weights.pb # External weight data (UNet > 2GB)
vae_encoder/
model.onnx # VAE encoder
vae_decoder/
model.onnx # VAE decoder
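A quick post-export sanity check is to confirm each of the four sub-model directories contains its `model.onnx`. The helper below is illustrative and not part of the script; `missing_onnx_components` is a hypothetical name.

```python
from pathlib import Path

# The four sub-model directories the export script writes.
EXPECTED_COMPONENTS = ("text_encoder", "unet", "vae_encoder", "vae_decoder")

def missing_onnx_components(output_path: str) -> list:
    # Return the names of component directories that lack a model.onnx file.
    root = Path(output_path)
    return [name for name in EXPECTED_COMPONENTS
            if not (root / name / "model.onnx").is_file()]
```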
Core Function Signature
The main conversion logic is in the convert_models function:
@torch.no_grad()
def convert_models(model_path: str, output_path: str, opset: int, fp16: bool = False):
The low-level export helper used for each component:
def onnx_export(
model,
model_args: tuple,
output_path: Path,
ordered_input_names,
output_names,
dynamic_axes,
opset,
use_external_data_format=False,
):
Component Export Details
Text Encoder:
- Input names: `["input_ids"]`
- Output names: `["last_hidden_state", "pooler_output"]`
- Dynamic axes: `None` (static shape)
- Input is cast to `torch.int32` for CLIP compatibility
UNet:
- Input names: `["sample", "timestep", "encoder_hidden_states"]`
- Output names: `["out_sample"]`
- Dynamic axes: `None` (static shape)
- Uses `use_external_data_format=True` because the UNet exceeds 2 GB
- External weights are collated into a single `weights.pb` via `onnx.save_model`
VAE Encoder:
- Input names: `["sample", "return_dict"]`
- Output names: `["latent_sample"]`
- Forward is monkey-patched to call `vae_encoder.encode(sample, return_dict)[0].mode()`
VAE Decoder:
- Input names: `["latent_sample"]`
- Output names: `["sample"]`
- Forward is monkey-patched to call `vae_decoder.decode(latent, return_dict=False)[0]`
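The monkey-patching used for both VAE halves follows a common tracing pattern: replace the module's forward with a small lambda so the exporter captures only the intended sub-call. The torch-free sketch below illustrates the idea with stand-in classes (the `Dummy*`/`LatentDist` names are illustrative, loosely modeled on diffusers' `AutoencoderKL` and its latent distribution).

```python
class LatentDist:
    # Stand-in for diffusers' DiagonalGaussianDistribution.
    def __init__(self, sample):
        self.sample = sample
    def mode(self):
        return f"mode({self.sample})"

class DummyVAE:
    # Stand-in for AutoencoderKL; models only the methods the
    # export script's monkey-patching touches.
    def encode(self, sample, return_dict=True):
        return (LatentDist(sample),)
    def decode(self, latent, return_dict=True):
        return (f"decoded({latent})",)

vae_encoder = DummyVAE()
vae_decoder = DummyVAE()
# Mirror the source's patches: tracing now captures only the sub-call the
# export needs (encode + mode of the latent distribution, or decode),
# rather than a full autoencoder round trip.
vae_encoder.forward = lambda sample, return_dict: (
    vae_encoder.encode(sample, return_dict)[0].mode())
vae_decoder.forward = lambda latent: (
    vae_decoder.decode(latent, return_dict=False)[0])
```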
Usage Example
# Export stable-diffusion-v1-5 to ONNX with default opset 14
python onnx_export.py \
--model_path ./stable-diffusion-v1-5 \
--output_path ./onnx_sd15
# Export with float16 precision (requires CUDA)
python onnx_export.py \
--model_path ./stable-diffusion-v1-5 \
--output_path ./onnx_sd15_fp16 \
--fp16
Notes
- The script deletes each component from memory after export (`del pipeline.text_encoder`, etc.) to manage peak memory usage.
- Float16 export raises a `ValueError` if CUDA is not available.
- The UNet external-data collation step (`shutil.rmtree` then `onnx.save_model`) cleans up fragmented tensor files into a single `weights.pb`.
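The memory-management note can be sketched as a small helper. `free_component` is a hypothetical name; per the note above, the script itself inlines `del` statements rather than using a function.

```python
import gc

def free_component(pipeline, name: str):
    # Drop the reference to an already-exported component (mirroring the
    # script's `del pipeline.text_encoder` pattern) so its weights can be
    # reclaimed before the next, larger component is exported.
    delattr(pipeline, name)
    gc.collect()
```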