Workflow:Tencent Ncnn PyTorch Model Conversion and Inference
| Knowledge Sources | |
|---|---|
| Domains | Model_Conversion, Inference, Edge_Deployment |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
End-to-end process for converting a PyTorch model to ncnn format using PNNX and running inference on edge devices.
Description
This workflow covers the complete pipeline from a trained PyTorch model to deployed ncnn inference. It uses PNNX (PyTorch Neural Network eXchange), the recommended conversion tool that directly converts PyTorch models to ncnn format, bypassing the unstable ONNX intermediate representation. The process includes exporting the PyTorch model to TorchScript, converting it via PNNX to produce ncnn .param and .bin files, loading the model into the ncnn runtime, preprocessing input data, executing inference, and extracting output results.
Key outcomes:
- A pair of ncnn model files (.param graph definition + .bin weights) suitable for mobile and embedded deployment
- Working inference code using the ncnn C++ API
Usage
Execute this workflow when you have a trained PyTorch model (.pt or nn.Module) and need to deploy it for inference on mobile devices, embedded systems, or any platform where ncnn runs. This is the primary and recommended path for all PyTorch-to-ncnn deployments.
Execution Steps
Step 1: Export PyTorch Model to TorchScript
Serialize the trained PyTorch model into TorchScript format using torch.jit.trace (or torch.jit.script for models with control flow). The model must be set to evaluation mode before tracing. A dummy input tensor with the correct shape is required for tracing.
Key considerations:
- Set model to eval() mode before export to disable dropout and batch normalization training behavior
- Use torch.jit.trace when the forward pass has no data-dependent control flow (tracing records only the path taken for the dummy input); use torch.jit.script when branches or loops depend on the input
- The dummy input shape determines the default input dimensions
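The export step might look like the following minimal sketch. `TinyNet`, the 1x3x32x32 input shape, the helper name `export_torchscript`, and the output file name are placeholders chosen for illustration; substitute your own trained model and input shape.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a trained model."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.drop = nn.Dropout(0.5)

    def forward(self, x):
        return self.drop(torch.relu(self.bn(self.conv(x))))


def export_torchscript(model, dummy_input, path):
    """Trace `model` in eval mode and save the TorchScript file to `path`."""
    model.eval()  # dropout becomes a no-op, batch norm uses running statistics
    with torch.no_grad():
        traced = torch.jit.trace(model, dummy_input)
    traced.save(path)
    return traced


model = TinyNet()
dummy_input = torch.rand(1, 3, 32, 32)  # assumed input shape
ts_path = os.path.join(tempfile.mkdtemp(), "tinynet.pt")
export_torchscript(model, dummy_input, ts_path)

# Sanity check: the reloaded TorchScript module reproduces the eager output.
reloaded = torch.jit.load(ts_path)
with torch.no_grad():
    same = torch.allclose(model(dummy_input), reloaded(dummy_input), atol=1e-6)
print("outputs match:", same)
```

Comparing the reloaded module against the eager model before conversion is a cheap way to catch a model that was traced in training mode by mistake.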
Step 2: Convert TorchScript to ncnn via PNNX
Run the PNNX converter on the exported TorchScript file. PNNX parses the TorchScript graph, applies operator fusion and optimization passes, and produces ncnn-compatible .param (graph definition) and .bin (model weights) files. PNNX preserves high-level PyTorch semantics and generates cleaner graphs than the traditional ONNX-based pipeline.
Key considerations:
- Install PNNX via pip install pnnx or use the standalone binary
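- Specify inputshape to define the expected input dimensions, e.g. inputshape=[1,3,224,224]
- For dynamic shape support, add inputshape2 with a second example input shape in the same format; PNNX uses it together with inputshape to resolve dynamic dimensions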
- Alternatively, use pnnx.export() directly in Python for a single-step conversion
- PNNX also generates a Python inference script and an ONNX file as additional outputs
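As a rough sketch of the command-line form, the helper below assembles a PNNX invocation from a TorchScript path and one or two shapes. The helper names, the file name `model.pt`, and the example shapes are hypothetical; only the `inputshape=` and `inputshape2=` arguments come from the steps above.

```python
import shlex


def format_shape(shape):
    """Render a shape in PNNX's bracketed form, e.g. [1,3,224,224] (no spaces)."""
    return "[" + ",".join(str(d) for d in shape) + "]"


def build_pnnx_command(torchscript_path, inputshape, inputshape2=None):
    """Return the argv list for a PNNX conversion (hypothetical helper)."""
    cmd = ["pnnx", torchscript_path, "inputshape=" + format_shape(inputshape)]
    if inputshape2 is not None:
        # Second example shape, used by PNNX to resolve dynamic dimensions.
        cmd.append("inputshape2=" + format_shape(inputshape2))
    return cmd


cmd = build_pnnx_command("model.pt", [1, 3, 224, 224], [1, 3, 320, 320])
print(shlex.join(cmd))  # shell-quoted, since [ and ] are glob characters

# To run the conversion for real (requires pnnx on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list to `subprocess.run` avoids shell quoting problems with the square brackets.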
Step 3: Optimize the ncnn Model (Optional)
Run ncnnoptimize on the generated ncnn model to apply graph-level optimizations. This fuses operators (e.g., Convolution+BatchNorm, Convolution+ReLU), eliminates no-op layers, and can optionally store weights as fp16 to reduce model size.
Key considerations:
- If the model was converted via PNNX, most optimizations are already applied; this step is optional
- Pass 65536 as the last argument to convert weights to fp16 storage
- Pass 0 to keep fp32 weights
- The tool produces a new optimized .param/.bin pair
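The argument order is easy to get wrong, so here is a small sketch that builds the ncnnoptimize call: input .param, input .bin, output .param, output .bin, then the precision flag. The helper name and file names are made up for illustration.

```python
FP32_FLAG = "0"      # keep fp32 weights
FP16_FLAG = "65536"  # store weights as fp16


def build_ncnnoptimize_command(param_in, bin_in, param_out, bin_out, fp16=False):
    """Return the argv list for ncnnoptimize (hypothetical helper)."""
    flag = FP16_FLAG if fp16 else FP32_FLAG
    return ["ncnnoptimize", param_in, bin_in, param_out, bin_out, flag]


opt_cmd = build_ncnnoptimize_command(
    "model.ncnn.param", "model.ncnn.bin",
    "model-opt.param", "model-opt.bin",
    fp16=True,
)
print(" ".join(opt_cmd))

# To run it for real (requires ncnnoptimize on PATH):
#   import subprocess
#   subprocess.run(opt_cmd, check=True)
```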
Step 4: Load Model into ncnn Runtime
Instantiate an ncnn::Net object and load the optimized .param and .bin files. Configure inference options such as thread count, Vulkan GPU compute, and precision settings before loading the model.
Key considerations:
- Set net.opt options before calling load_param and load_model
- For production deployment, use ncnn2mem to embed models as C arrays and load from memory
- Binary param format (load_param_bin) hides network architecture strings for distribution
Step 5: Preprocess Input Data
Convert raw input data (typically images) into ncnn::Mat format. Apply the same normalization (mean subtraction and scaling) that was used during model training. Use ncnn's built-in pixel conversion and resize functions for efficient preprocessing.
Key considerations:
- Use ncnn::Mat::from_pixels or from_pixels_resize for image inputs
- Specify the correct pixel format (PIXEL_BGR, PIXEL_RGB, PIXEL_GRAY, etc.)
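- Apply substract_mean_normalize (the ncnn API spells it this way) with mean and norm values that reproduce the training normalization; it computes (value - mean) * norm per channel on the Mat's values, which are raw 0-255 pixels after from_pixels, so norm is a multiplier rather than a divisor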
- These pixel operations are SIMD-optimized within ncnn
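A common stumbling block is translating a PyTorch-style normalization, which works on inputs scaled to [0, 1], into ncnn's mean and norm values, which work on raw 0-255 pixels. The sketch below shows the arithmetic in plain Python. The mean and std numbers are the commonly used ImageNet values and are only an assumption here; use whatever your training pipeline used. The function names are illustrative.

```python
def to_ncnn_norm(mean, std, scale=255.0):
    """Convert per-channel (mean, std) defined on [0, 1] inputs into ncnn
    mean_vals / norm_vals, applied to 0-255 pixels as (pixel - mean_val) * norm_val."""
    mean_vals = [m * scale for m in mean]
    norm_vals = [1.0 / (s * scale) for s in std]
    return mean_vals, norm_vals


def torch_style(pixel, mean, std, scale=255.0):
    """Normalization as typically done in training: (pixel / 255 - mean) / std."""
    return (pixel / scale - mean) / std


def ncnn_style(pixel, mean_val, norm_val):
    """What substract_mean_normalize computes for one value."""
    return (pixel - mean_val) * norm_val


mean = [0.485, 0.456, 0.406]  # assumed training statistics
std = [0.229, 0.224, 0.225]
mean_vals, norm_vals = to_ncnn_norm(mean, std)

# Both formulations agree for an arbitrary pixel value in every channel.
pixel = 128.0
max_diff = max(
    abs(torch_style(pixel, mean[c], std[c]) - ncnn_style(pixel, mean_vals[c], norm_vals[c]))
    for c in range(3)
)
print("mean_vals:", mean_vals)
print("norm_vals:", norm_vals)
print("max difference:", max_diff)
```

If the training pipeline only divided by 255 without a mean or std, the equivalent is a mean of 0 and a norm of 1/255 for each channel.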
Step 6: Execute Inference and Extract Results
Create an ncnn::Extractor from the loaded Net, feed input data by blob name, and extract output tensors. The Extractor manages the computation graph execution and memory allocation for the forward pass.
Key considerations:
- Use ex.input("blob_name", mat) to set input data
- Use ex.extract("output_blob_name", out) to retrieve output tensors
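- Blob names are the input and output blob identifiers listed on each layer line in the .param file, after the layer name and the input/output counts; they are distinct from layer names, and PNNX-converted models typically use in0, out0, and so on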
- Multiple outputs can be extracted from a single forward pass
- The Extractor is lightweight; create a new one for each inference, since an Extractor caches the blobs it has already computed