Implementation: Alibaba MNN ONNX Export Script
| Field | Value |
|---|---|
| implementation_name | Onnx_Export_Script |
| schema_version | 0.3.0 |
| impl_type | API Doc |
| domain | Stable Diffusion Deployment |
| stage | Model Export |
| source_file | transformers/diffusion/export/onnx_export.py (L179-200) |
| external_deps | torch, onnx, diffusers, packaging |
| last_updated | 2026-02-10 14:00 GMT |
Summary
This implementation exports all four Stable Diffusion pipeline components from a HuggingFace diffusers checkpoint to ONNX format. The script loads the StableDiffusionPipeline from diffusers, then iterates through each component (text_encoder, unet, vae_encoder, vae_decoder), traces it with representative inputs, and writes the corresponding ONNX graph to disk.
API
python onnx_export.py --model_path <hf_path> --output_path <onnx_dir> [--opset 14] [--fp16]
Key Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| --model_path | str | Yes | -- | Path to the diffusers checkpoint (local directory or HuggingFace Hub identifier) |
| --output_path | str | Yes | -- | Directory where ONNX models will be written |
| --opset | int | No | 14 | ONNX operator set version to use |
| --fp16 | flag | No | False | Export models in float16 precision (requires CUDA GPU) |
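The CLI above can be reproduced with a minimal argparse setup. This is an illustrative reconstruction matching the parameter table, not the script's actual parser code; the `build_parser` name is an assumption.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the script's CLI, matching the
    # parameter table above (names, defaults, and required flags).
    parser = argparse.ArgumentParser(
        description="Export a diffusers Stable Diffusion checkpoint to ONNX")
    parser.add_argument("--model_path", type=str, required=True,
                        help="Local directory or HuggingFace Hub identifier")
    parser.add_argument("--output_path", type=str, required=True,
                        help="Directory where ONNX models will be written")
    parser.add_argument("--opset", type=int, default=14,
                        help="ONNX operator set version")
    parser.add_argument("--fp16", action="store_true",
                        help="Export in float16 precision (requires CUDA)")
    return parser
```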
Inputs
- A HuggingFace Stable Diffusion model directory (the output of the model acquisition step), containing PyTorch weights for all pipeline components.
Outputs
An ONNX directory with four sub-model directories:
<output_path>/
text_encoder/
model.onnx # CLIP text encoder
unet/
model.onnx # UNet denoising network
weights.pb # External weight data (UNet > 2GB)
vae_encoder/
model.onnx # VAE encoder
vae_decoder/
model.onnx # VAE decoder
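A quick post-export sanity check is to confirm each of the four sub-model directories contains its `model.onnx`. The helper below is illustrative and not part of the script; `missing_onnx_components` is a hypothetical name.

```python
from pathlib import Path

# The four sub-model directories the export script writes.
EXPECTED_COMPONENTS = ("text_encoder", "unet", "vae_encoder", "vae_decoder")

def missing_onnx_components(output_path: str) -> list:
    # Return the names of component directories that lack a model.onnx file.
    root = Path(output_path)
    return [name for name in EXPECTED_COMPONENTS
            if not (root / name / "model.onnx").is_file()]
```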
Core Function Signature
The main conversion logic is in the convert_models function:
@torch.no_grad()
def convert_models(model_path: str, output_path: str, opset: int, fp16: bool = False):
The low-level export helper used for each component:
def onnx_export(
model,
model_args: tuple,
output_path: Path,
ordered_input_names,
output_names,
dynamic_axes,
opset,
use_external_data_format=False,
):
Component Export Details
Text Encoder:
- Input names: `["input_ids"]`
- Output names: `["last_hidden_state", "pooler_output"]`
- Dynamic axes: `None` (static shape)
- Input is cast to `torch.int32` for CLIP compatibility
UNet:
- Input names: `["sample", "timestep", "encoder_hidden_states"]`
- Output names: `["out_sample"]`
- Dynamic axes: `None` (static shape)
- Uses `use_external_data_format=True` because the UNet exceeds 2 GB
- External weights are collated into a single `weights.pb` via `onnx.save_model`
VAE Encoder:
- Input names: `["sample", "return_dict"]`
- Output names: `["latent_sample"]`
- Forward is monkey-patched to call `vae_encoder.encode(sample, return_dict)[0].mode()`
VAE Decoder:
- Input names: `["latent_sample"]`
- Output names: `["sample"]`
- Forward is monkey-patched to call `vae_decoder.decode(latent, return_dict=False)[0]`
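The monkey-patching used for both VAE halves follows a common tracing pattern: replace the module's forward with a small lambda so the exporter captures only the intended sub-call. The torch-free sketch below illustrates the idea with stand-in classes (the `Dummy*`/`LatentDist` names are illustrative, loosely modeled on diffusers' `AutoencoderKL` and its latent distribution).

```python
class LatentDist:
    # Stand-in for diffusers' DiagonalGaussianDistribution.
    def __init__(self, sample):
        self.sample = sample
    def mode(self):
        return f"mode({self.sample})"

class DummyVAE:
    # Stand-in for AutoencoderKL; models only the methods the
    # export script's monkey-patching touches.
    def encode(self, sample, return_dict=True):
        return (LatentDist(sample),)
    def decode(self, latent, return_dict=True):
        return (f"decoded({latent})",)

vae_encoder = DummyVAE()
vae_decoder = DummyVAE()
# Mirror the source's patches: tracing now captures only the sub-call the
# export needs (encode + mode of the latent distribution, or decode),
# rather than a full autoencoder round trip.
vae_encoder.forward = lambda sample, return_dict: (
    vae_encoder.encode(sample, return_dict)[0].mode())
vae_decoder.forward = lambda latent: (
    vae_decoder.decode(latent, return_dict=False)[0])
```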
Usage Example
# Export stable-diffusion-v1-5 to ONNX with default opset 14
python onnx_export.py \
--model_path ./stable-diffusion-v1-5 \
--output_path ./onnx_sd15
# Export with float16 precision (requires CUDA)
python onnx_export.py \
--model_path ./stable-diffusion-v1-5 \
--output_path ./onnx_sd15_fp16 \
--fp16
Notes
- The script deletes each component from memory after export (`del pipeline.text_encoder`, etc.) to manage peak memory usage.
- Float16 export raises a `ValueError` if CUDA is not available.
- The UNet external-data collation step (`shutil.rmtree` then `onnx.save_model`) cleans up fragmented tensor files into a single `weights.pb`.
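The memory-management note can be sketched as a small helper. `free_component` is a hypothetical name; per the note above, the script itself inlines `del` statements rather than using a function.

```python
import gc

def free_component(pipeline, name: str):
    # Drop the reference to an already-exported component (mirroring the
    # script's `del pipeline.text_encoder` pattern) so its weights can be
    # reclaimed before the next, larger component is exported.
    delattr(pipeline, name)
    gc.collect()
```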