Environment: Alibaba MNN Python Export Environment
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Model_Export |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
Python environment for exporting LLM and diffusion models from HuggingFace format to MNN format. Requires Python 2.7 or 3.5+ (3.7+ recommended), PyTorch, ONNX, and transformers.
Description
This environment provides the Python runtime and package dependencies needed to convert models from HuggingFace/PyTorch formats into MNN-compatible formats. The export pipeline spans two primary workflows: (1) LLM export via llmexport.py, which converts transformer-based language models into MNN format with optional quantization (AWQ, smooth quantization), and (2) ONNX export via onnx_export.py, which converts PyTorch models to ONNX intermediate representation before final MNN conversion. FP16 export requires a CUDA-capable GPU. The environment also supports diffusion model export, which requires the diffusers library and ONNX opset 18.
Usage
Use this environment for all model export and conversion workflows including LLM export to MNN, ONNX intermediate export, diffusion model export, and quantization-aware export. This environment is the prerequisite for running any of the MNN model conversion scripts.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Python | 2.7 or >= 3.5 (3.7+ recommended) | setup.py specifies python_requires='>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*', which excludes 3.0-3.4 |
| OS | Linux, macOS, Windows | Cross-platform Python environment |
| GPU | Optional CUDA GPU | Required only for FP16 export; CPU-only export is supported for other precision levels |
| Disk | 5-50GB | Depends on model size for intermediate ONNX files |
Dependencies
Python Packages (Core)
- torch (required; >= 1.11 recommended, as older versions fall back to the legacy ONNX export API)
- onnx (required; ONNX model representation and serialization)
- transformers (required; HuggingFace model loading and tokenizer support)
- packaging (required; version comparison utilities)
- numpy (required; numerical operations)
Python Packages (Diffusion Export)
- diffusers (required for diffusion model export)
- onnx with opset 18 support (required for diffusion model ONNX export)
Python Packages (Quantization)
- awq module (optional; for AWQ quantization export)
- Smooth quantization modules (optional; for smooth quantization workflows)
System Packages
- git-lfs (required for downloading model weights from HuggingFace)
- pip (required for installing Python dependencies)
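Before running an export, it can save time to verify that the packages above are actually installed. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); the helper name check_packages is ours, not part of the MNN tooling.

```python
# Preflight check for the export environment (stdlib only, Python 3.8+).
# `check_packages` is a local helper, not part of the MNN scripts.
from importlib import metadata

CORE = ["torch", "onnx", "transformers", "packaging", "numpy"]
OPTIONAL = {"diffusers": "diffusion model export"}

def check_packages(names):
    """Return the subset of `names` that is not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

if __name__ == "__main__":
    missing = check_packages(CORE)
    if missing:
        print("Missing core packages:", ", ".join(missing))
    for name, purpose in OPTIONAL.items():
        if check_packages([name]):
            print(f"Optional package {name!r} not installed (needed for {purpose})")
```

Running the script before an export surfaces missing dependencies up front instead of mid-conversion.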
Environment Variables and Credentials
- TOKENIZERS_PARALLELISM: Set to false by llmexport.py at startup to avoid deadlocks in tokenizer parallelism.
- PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION: Set to python by llmexport.py to force pure-Python protobuf implementation for compatibility.
- HF_TOKEN: Optional; required for downloading gated models from HuggingFace Hub.
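llmexport.py sets the first two variables itself, but if you drive the conversion from your own wrapper script, they must be set the same way before importing transformers or any protobuf-based module. A minimal sketch:

```python
import os

# Must be set BEFORE importing transformers / onnx / protobuf-based modules,
# mirroring what llmexport.py does at startup.
os.environ["TOKENIZERS_PARALLELISM"] = "false"  # avoid tokenizer fork deadlocks
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"  # pure-Python protobuf

# HF_TOKEN is only needed for gated models; leave it unset otherwise.
# os.environ["HF_TOKEN"] = "<your token>"
```

Setting the variables after the imports has no effect on modules that read them at import time, which is why the ordering matters.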
Quick Install
# Install core export dependencies
pip install torch onnx transformers packaging numpy
# Install diffusion export dependencies
pip install diffusers
# Install git-lfs for model downloads
apt-get install git-lfs  # Debian/Ubuntu
# or
brew install git-lfs     # macOS
git lfs install          # initialize git-lfs after installation
# Run LLM export
python transformers/llm/export/llmexport.py \
--model_path /path/to/huggingface/model \
--output_dir /path/to/output
# Run ONNX export
python tools/mnncompress/mnncompress/pytorch/onnx_export.py \
--model_path /path/to/model
Code Evidence
Environment variable setup at startup from llmexport.py:8-9:
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
Core import requirements from llmexport.py:11-12:
import onnx
import torch
Torch version check for ONNX export API from onnx_export.py:28:
if version.parse(torch.__version__) >= version.parse("1.11"):
# Use updated ONNX export API
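The script performs this gate with packaging's version.parse. A dependency-free sketch of the same comparison is below; the helper names (_ver_tuple, use_new_export_api) are ours, chosen so the gate's behavior is easy to verify without torch installed.

```python
def _ver_tuple(version_string):
    """Parse the leading numeric components of a version string,
    e.g. '1.11.0+cu117' -> (1, 11, 0). Local build tags are ignored."""
    parts = []
    for piece in version_string.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def use_new_export_api(torch_version):
    """True when torch >= 1.11, i.e. the updated ONNX export API applies."""
    return _ver_tuple(torch_version) >= (1, 11)
```

When packaging is installed (it is a core dependency here), prefer the real check, `version.parse(torch.__version__) >= version.parse("1.11")`, which also handles pre-release tags.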
FP16 export requires CUDA availability from onnx_export.py:73-76:
if fp16:
if not torch.cuda.is_available():
raise ValueError("FP16 export requires CUDA GPU")
model = model.cuda().half()
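A wrapper around this check can fall back to FP32 on CPU-only systems instead of raising; select_precision below is a hypothetical helper illustrating that pattern, not part of onnx_export.py.

```python
def select_precision(fp16_requested, cuda_available, strict=True):
    """Decide the export precision.

    Mirrors the onnx_export.py rule: FP16 requires CUDA. With strict=False
    the helper degrades to FP32 on CPU-only systems instead of raising.
    """
    if fp16_requested and not cuda_available:
        if strict:
            raise ValueError("FP16 export requires CUDA GPU")
        return "fp32"  # graceful fallback for CPU-only hosts
    return "fp16" if fp16_requested else "fp32"
```

In practice you would call it as `select_precision(args.fp16, torch.cuda.is_available())` and cast the model with `.cuda().half()` only when it returns "fp16".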
Python version constraint from setup.py:466:
python_requires='>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*'
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| ValueError: FP16 export requires CUDA GPU | Attempting FP16 precision export on a CPU-only system | Install CUDA toolkit and a CUDA-capable GPU, or export in FP32 precision instead |
| ImportError: No module named 'diffusers' | Missing diffusers package required for diffusion model export | Install diffusers: pip install diffusers |
| Torch version API mismatch | PyTorch version < 1.11 uses a different ONNX export API | Upgrade PyTorch to >= 1.11: pip install "torch>=1.11" (quote the specifier so the shell does not treat > as redirection) |
| ImportError: No module named 'onnx' | ONNX package not installed | Install onnx: pip install onnx |
| ImportError: No module named 'transformers' | Transformers package not installed | Install transformers: pip install transformers |
Compatibility Notes
- Python 2.7: Technically supported per setup.py constraints, but Python 3.7+ is strongly recommended for full compatibility with modern PyTorch and transformers versions.
- Python 3.0-3.4: Explicitly excluded by the python_requires constraint.
- PyTorch < 1.11: Older ONNX export API is used; some features may be unavailable or behave differently.
- FP16 Export: Strictly requires torch.cuda.is_available() to return True. CPU-only systems cannot perform FP16 export.
- ONNX Opset 18: Required for diffusion model export. Older ONNX versions may not support this opset.
- TOKENIZERS_PARALLELISM: Automatically set to false to prevent potential deadlocks when running tokenizer operations in multi-threaded contexts.
- Protobuf: Pure-Python protobuf implementation is forced for compatibility; the C++ implementation may cause import conflicts with certain ONNX versions.