Environment: Alibaba MNN Python Export Environment
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Model_Export |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Python environment for exporting LLM and diffusion models from HuggingFace format to MNN format. Requires Python 2.7 or 3.5+ (3.7+ recommended), PyTorch, ONNX, and transformers.
Description
This environment provides the Python runtime and package dependencies needed to convert models from HuggingFace/PyTorch formats into MNN-compatible formats. The export pipeline spans two primary workflows: (1) LLM export via llmexport.py, which converts transformer-based language models into MNN format with optional quantization (AWQ, smooth quantization), and (2) ONNX export via onnx_export.py, which converts PyTorch models to ONNX intermediate representation before final MNN conversion. FP16 export requires a CUDA-capable GPU. The environment also supports diffusion model export, which requires the diffusers library and ONNX opset 18.
Usage
Use this environment for all model export and conversion workflows including LLM export to MNN, ONNX intermediate export, diffusion model export, and quantization-aware export. This environment is the prerequisite for running any of the MNN model conversion scripts.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Python | 2.7 or >= 3.5 (3.7+ recommended) | setup.py specifies python_requires='>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*', which excludes 3.0-3.4 |
| OS | Linux, macOS, Windows | Cross-platform Python environment |
| GPU | Optional CUDA GPU | Required only for FP16 export; CPU-only export is supported for other precision levels |
| Disk | 5-50GB | Depends on model size for intermediate ONNX files |
Dependencies
Python Packages (Core)
- torch (required; >= 1.11 recommended, as older versions fall back to the legacy ONNX export API)
- onnx (required; ONNX model representation and serialization)
- transformers (required; HuggingFace model loading and tokenizer support)
- packaging (required; version comparison utilities)
- numpy (required; numerical operations)
Python Packages (Diffusion Export)
- diffusers (required for diffusion model export)
- onnx with opset 18 support (required for diffusion model ONNX export)
Python Packages (Quantization)
- awq module (optional; for AWQ quantization export)
- Smooth quantization modules (optional; for smooth quantization workflows)
System Packages
- git-lfs (required for downloading model weights from HuggingFace)
- pip (required for installing Python dependencies)
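Before running an export, it can save time to verify that the packages above are actually installed. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); the helper name check_packages is ours, not part of the MNN tooling.

```python
# Preflight check for the export environment (stdlib only, Python 3.8+).
# `check_packages` is a local helper, not part of the MNN scripts.
from importlib import metadata

CORE = ["torch", "onnx", "transformers", "packaging", "numpy"]
OPTIONAL = {"diffusers": "diffusion model export"}

def check_packages(names):
    """Return the subset of `names` that is not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    missing = check_packages(CORE)
    if missing:
        print("Missing core packages:", ", ".join(missing))
    for name, purpose in OPTIONAL.items():
        if check_packages([name]):
            print(f"Optional package {name!r} not installed (needed for {purpose})")
```

Running the script before an export surfaces missing dependencies up front instead of mid-conversion.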
Environment Variables and Credentials
- TOKENIZERS_PARALLELISM: Set to false by llmexport.py at startup to avoid deadlocks in tokenizer parallelism.
- PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION: Set to python by llmexport.py to force pure-Python protobuf implementation for compatibility.
- HF_TOKEN: Optional; required for downloading gated models from HuggingFace Hub.
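llmexport.py sets the first two variables itself, but if you drive the conversion from your own wrapper script, they must be set the same way before importing transformers or any protobuf-based module. A minimal sketch:

```python
import os

# Must be set BEFORE importing transformers / onnx / protobuf-based modules,
# mirroring what llmexport.py does at startup.
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # avoid tokenizer fork deadlocks
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"  # pure-Python protobuf

# HF_TOKEN is only needed for gated models; leave it unset otherwise.
# os.environ["HF_TOKEN"] = "<your token>"
```

Setting the variables after the imports has no effect on modules that read them at import time, which is why the ordering matters.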
Quick Install
# Install core export dependencies
pip install torch onnx transformers packaging numpy
# Install diffusion export dependencies
pip install diffusers
# Install git-lfs for model downloads
apt-get install git-lfs  # Debian/Ubuntu
# or
brew install git-lfs     # macOS
git lfs install          # initialize git-lfs after installation
# Run LLM export
python transformers/llm/export/llmexport.py \
--model_path /path/to/huggingface/model \
--output_dir /path/to/output
# Run ONNX export
python tools/mnncompress/mnncompress/pytorch/onnx_export.py \
--model_path /path/to/model
Code Evidence
Environment variable setup at startup from llmexport.py:8-9:
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
Core import requirements from llmexport.py:11-12:
import onnx
import torch
Torch version check for ONNX export API from onnx_export.py:28:
if version.parse(torch.__version__) >= version.parse("1.11"):
# Use updated ONNX export API
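The script performs this gate with packaging's version.parse. A dependency-free sketch of the same comparison is below; the helper names (_ver_tuple, use_new_export_api) are ours, chosen so the gate's behavior is easy to verify without torch installed.

```python
def _ver_tuple(version_string):
    """Parse the leading numeric components of a version string,
    e.g. '1.11.0+cu117' -> (1, 11, 0). Local build tags are ignored."""
    parts = []
    for piece in version_string.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def use_new_export_api(torch_version):
    """True when torch >= 1.11, i.e. the updated ONNX export API applies."""
    return _ver_tuple(torch_version) >= (1, 11)
```

When packaging is installed (it is a core dependency here), prefer the real check, `version.parse(torch.__version__) >= version.parse("1.11")`, which also handles pre-release tags.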
FP16 export requires CUDA availability from onnx_export.py:73-76:
if fp16:
if not torch.cuda.is_available():
raise ValueError("FP16 export requires CUDA GPU")
model = model.cuda().half()
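A wrapper around this check can fall back to FP32 on CPU-only systems instead of raising; select_precision below is a hypothetical helper illustrating that pattern, not part of onnx_export.py.

```python
def select_precision(fp16_requested, cuda_available, strict=True):
    """Decide the export precision.

    Mirrors the onnx_export.py rule: FP16 requires CUDA. With strict=False
    the helper degrades to FP32 on CPU-only systems instead of raising.
    """
    if fp16_requested and not cuda_available:
        if strict:
            raise ValueError("FP16 export requires CUDA GPU")
        return "fp32"  # graceful fallback for CPU-only hosts
    return "fp16" if fp16_requested else "fp32"
```

In practice you would call it as `select_precision(args.fp16, torch.cuda.is_available())` and cast the model with `.cuda().half()` only when it returns "fp16".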
Python version constraint from setup.py:466:
python_requires='>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*'
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| ValueError: FP16 export requires CUDA GPU | Attempting FP16 precision export on a CPU-only system | Install CUDA toolkit and a CUDA-capable GPU, or export in FP32 precision instead |
| ImportError: No module named 'diffusers' | Missing diffusers package required for diffusion model export | Install diffusers: pip install diffusers |
| Torch version API mismatch | PyTorch version < 1.11 uses a different ONNX export API | Upgrade PyTorch to >= 1.11: pip install "torch>=1.11" (quote the specifier so the shell does not treat > as redirection) |
| ImportError: No module named 'onnx' | ONNX package not installed | Install onnx: pip install onnx |
| ImportError: No module named 'transformers' | Transformers package not installed | Install transformers: pip install transformers |
Compatibility Notes
- Python 2.7: Technically supported per setup.py constraints, but Python 3.7+ is strongly recommended for full compatibility with modern PyTorch and transformers versions.
- Python 3.0-3.4: Explicitly excluded by the python_requires constraint.
- PyTorch < 1.11: Older ONNX export API is used; some features may be unavailable or behave differently.
- FP16 Export: Strictly requires torch.cuda.is_available() to return True. CPU-only systems cannot perform FP16 export.
- ONNX Opset 18: Required for diffusion model export. Older ONNX versions may not support this opset.
- TOKENIZERS_PARALLELISM: Automatically set to false to prevent potential deadlocks when running tokenizer operations in multi-threaded contexts.
- Protobuf: Pure-Python protobuf implementation is forced for compatibility; the C++ implementation may cause import conflicts with certain ONNX versions.