Workflow:Tencent Ncnn PyTorch Model Conversion and Inference

Knowledge Sources	ncnn, PNNX, PNNX Documentation, PyTorch/ONNX Conversion Guide
Domains	Model_Conversion, Inference, Edge_Deployment
Last Updated	2026-02-09 19:00 GMT

Overview

End-to-end process for converting a PyTorch model to ncnn format using PNNX and running inference on edge devices.

Description

This workflow covers the complete pipeline from a trained PyTorch model to deployed ncnn inference. It uses PNNX (PyTorch Neural Network eXchange), the recommended conversion tool that directly converts PyTorch models to ncnn format, bypassing the unstable ONNX intermediate representation. The process includes exporting the PyTorch model to TorchScript, converting it via PNNX to produce ncnn .param and .bin files, loading the model into the ncnn runtime, preprocessing input data, executing inference, and extracting output results.

Key outcomes:

A pair of ncnn model files (.param graph definition + .bin weights) suitable for mobile and embedded deployment
Working inference code using the ncnn C++ API

Usage

Execute this workflow when you have a trained PyTorch model (.pt or nn.Module) and need to deploy it for inference on mobile devices, embedded systems, or any platform where ncnn runs. This is the primary and recommended path for all PyTorch-to-ncnn deployments.

Execution Steps

Step 1: Export PyTorch Model to TorchScript

Serialize the trained PyTorch model into TorchScript format using torch.jit.trace (or torch.jit.script for models with data-dependent control flow). The model must be set to evaluation mode before tracing, and tracing requires a dummy input tensor with the correct shape.

Key considerations:

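Call model.eval() before export so that dropout is disabled and batch normalization uses its stored running statistics
Use torch.jit.trace when control flow does not depend on the input data, because tracing records only the path taken for the dummy input; use torch.jit.script for data-dependent control flow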
The dummy input shape determines the default input dimensions
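The export step can be sketched as follows. This is a minimal illustration, not a canonical recipe: the helper name export_torchscript and the output path are hypothetical, and it uses only torch.jit.trace, so it assumes a model without data-dependent control flow.

```python
def export_torchscript(model, example_input, out_path):
    """Trace `model` with `example_input` and save the TorchScript file.

    `export_torchscript` is a hypothetical helper name, not part of PyTorch.
    """
    # Imported lazily so this helper can be defined on machines that only
    # run the later (deployment-side) steps and have no PyTorch installed.
    import torch

    # Evaluation mode: dropout off, batch norm uses running statistics.
    model.eval()

    # No gradients are needed for tracing a forward pass.
    with torch.no_grad():
        traced = torch.jit.trace(model, example_input)

    # The saved file is what the PNNX converter consumes in Step 2.
    traced.save(out_path)
    return traced
```

A call such as export_torchscript(model, torch.rand(1, 3, 224, 224), "model.pt") would write model.pt; the shape shown is only an example and should match the input the model was trained with.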

Step 2: Convert TorchScript to ncnn via PNNX

Run the PNNX converter on the exported TorchScript file. PNNX parses the TorchScript graph, applies operator fusion and optimization passes, and produces ncnn-compatible .param (graph definition) and .bin (model weights) files. PNNX preserves high-level PyTorch semantics and generates cleaner graphs than the traditional ONNX-based pipeline.

Key considerations:

Install PNNX via pip install pnnx or use the standalone binary
Specify inputshape to define the expected input dimensions
Add inputshape2, a second input shape that differs from inputshape, to enable dynamic shape support
Alternatively, use pnnx.export() directly in Python for a single-step conversion
PNNX also generates a Python inference script and ONNX file as bonus outputs
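As an illustration of the command-line form, the sketch below assembles a pnnx invocation from Python. It assumes the pnnx executable is on PATH; the helper names (format_shapes, build_pnnx_command, run_pnnx) are hypothetical, and only the inputshape and inputshape2 arguments mentioned above are used.

```python
import subprocess


def format_shapes(shapes):
    """Render shapes as PNNX expects, e.g. [(1, 3, 224, 224)] -> "[1,3,224,224]".

    Several inputs are joined with commas: "[1,3,224,224],[1,16]".
    """
    return ",".join("[" + ",".join(str(d) for d in shape) + "]" for shape in shapes)


def build_pnnx_command(pt_path, inputshape, inputshape2=None):
    """Build the argument list for the pnnx command-line tool."""
    cmd = ["pnnx", pt_path, "inputshape=" + format_shapes(inputshape)]
    if inputshape2 is not None:
        # A second, different shape lets PNNX detect dynamic dimensions.
        cmd.append("inputshape2=" + format_shapes(inputshape2))
    return cmd


def run_pnnx(pt_path, inputshape, inputshape2=None):
    """Run the converter; assumes the `pnnx` executable is on PATH."""
    subprocess.run(build_pnnx_command(pt_path, inputshape, inputshape2), check=True)
```

For a file named model.pt, the converter writes its ncnn outputs next to it as model.ncnn.param and model.ncnn.bin.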

Step 3: Optimize the ncnn Model (Optional)

Run ncnnoptimize on the generated ncnn model to apply graph-level optimizations. It fuses operators (e.g., Convolution+BatchNorm, Convolution+ReLU), eliminates no-op layers, and can optionally store weights as fp16 to reduce model size.

Key considerations:

If the model was converted via PNNX, most optimizations are already applied; this step is optional
Pass 65536 as the last argument to convert weights to fp16 storage
Pass 0 to keep fp32 weights
The tool produces a new optimized .param/.bin pair
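The argument order can be sketched as below, assuming the ncnnoptimize executable built from the ncnn tools directory is on PATH. The helper name build_ncnnoptimize_command and the file names in the usage note are placeholders.

```python
def build_ncnnoptimize_command(in_param, in_bin, out_param, out_bin, fp16=True):
    """Build the argument list for ncnnoptimize.

    The last argument selects weight storage: 65536 for fp16, 0 for fp32.
    """
    flag = "65536" if fp16 else "0"
    return ["ncnnoptimize", in_param, in_bin, out_param, out_bin, flag]
```

The resulting list can be passed to subprocess.run(cmd, check=True), for example with model.ncnn.param and model.ncnn.bin as inputs and model-opt.param and model-opt.bin as outputs.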

Step 4: Load Model into ncnn Runtime

Instantiate an ncnn::Net object and load the optimized .param and .bin files. Configure inference options such as thread count, Vulkan GPU compute, and precision settings before loading the model.

Key considerations:

Set net.opt options before calling load_param and load_model
For production deployment, use ncnn2mem to embed models as C arrays and load from memory
Binary param format (load_param_bin) hides network architecture strings for distribution
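The ordering constraint (options first, then load) can be sketched with ncnn's Python bindings, used here as a stand-in for the C++ API this workflow targets; the method and option names mirror the C++ ones. The helper names configure_options and load_net and the default option values are illustrative assumptions, not recommendations.

```python
def configure_options(net, num_threads=4, use_vulkan=False, use_fp16=True):
    """Set inference options on `net.opt`; call this before loading the model."""
    net.opt.num_threads = num_threads
    net.opt.use_vulkan_compute = use_vulkan
    # Precision settings: packed layout, storage, and arithmetic in fp16.
    net.opt.use_fp16_packed = use_fp16
    net.opt.use_fp16_storage = use_fp16
    net.opt.use_fp16_arithmetic = use_fp16
    return net


def load_net(param_path, bin_path, **options):
    """Create an ncnn Net, configure it, then load the .param and .bin files."""
    import ncnn  # lazy import; assumes the `ncnn` Python package is installed

    net = ncnn.Net()
    configure_options(net, **options)  # options must be set before loading

    # The C++ API returns 0 on success; this sketch assumes the Python
    # bindings follow the same convention.
    if net.load_param(param_path) != 0:
        raise RuntimeError("failed to load " + param_path)
    if net.load_model(bin_path) != 0:
        raise RuntimeError("failed to load " + bin_path)
    return net
```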

Step 5: Preprocess Input Data

Convert raw input data (typically images) into ncnn::Mat format. Apply the same normalization (mean subtraction and scaling) that was used during model training. Use ncnn's built-in pixel conversion and resize functions for efficient preprocessing.

Key considerations:

Use ncnn::Mat::from_pixels or from_pixels_resize for image inputs
Specify the correct pixel format (PIXEL_BGR, PIXEL_RGB, PIXEL_GRAY, etc.)
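Apply substract_mean_normalize (spelled this way in the ncnn API) with mean and norm values that match training; it computes (x - mean) * norm per channel, so each norm value is the reciprocal of the corresponding standard deviation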
These pixel operations are SIMD-optimized within ncnn
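A common source of mismatched results is the scale of the normalization constants: PyTorch pipelines usually normalize pixels already scaled to [0, 1], whereas ncnn pixel data stays in [0, 255]. The sketch below converts one convention to the other. The function names are hypothetical, and it assumes training used (x / 255 - mean) / std; if training used a different scheme, the conversion differs.

```python
def to_ncnn_mean_norm(mean, std, pixel_scale=255.0):
    """Convert per-channel mean/std defined on [0, 1] pixels to the
    mean_vals/norm_vals expected for [0, 255] pixels.

    (x / 255 - mean) / std  ==  (x - 255 * mean) * (1 / (255 * std))
    """
    mean_vals = [m * pixel_scale for m in mean]
    norm_vals = [1.0 / (s * pixel_scale) for s in std]
    return mean_vals, norm_vals


def normalize_pixel(value, mean_val, norm_val):
    """Reference for what substract_mean_normalize does to one value."""
    return (value - mean_val) * norm_val
```

For example, the widely used ImageNet constants mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225] would be passed through to_ncnn_mean_norm, and the two resulting lists given to substract_mean_normalize.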

Step 6: Execute Inference and Extract Results

Create an ncnn::Extractor from the loaded Net, feed input data by blob name, and extract output tensors. The Extractor manages the computation graph execution and memory allocation for the forward pass.

Key considerations:

Use ex.input("blob_name", mat) to set input data
Use ex.extract("output_blob_name", out) to retrieve output tensors
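Blob names are the input and output blob names listed on each layer line of the .param file, which can differ from the layer names; PNNX-converted models typically use names such as in0 and out0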
Multiple outputs can be extracted from a single forward pass
The Extractor is lightweight and can be created per-inference
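To find the names to pass to ex.input and ex.extract, the text-format .param file can be inspected directly. The sketch below relies on the text layout (a magic number line, a count line, then one line per layer with type, name, input count, output count, input blobs, output blobs, and parameters). The helper names are hypothetical, and run_once uses the ncnn Python bindings as a stand-in for the C++ API, assuming extract returns a (status, Mat) pair.

```python
def find_io_blob_names(param_text):
    """Return (input_blobs, output_blobs) from the text of a .param file."""
    lines = [ln for ln in param_text.splitlines() if ln.strip()]
    if lines[0].strip() != "7767517":
        raise ValueError("not a text-format ncnn .param file")

    inputs, produced, consumed = [], [], set()
    for line in lines[2:]:  # skip the magic number and the count line
        tok = line.split()
        layer_type = tok[0]
        n_in, n_out = int(tok[2]), int(tok[3])
        bottoms = tok[4:4 + n_in]
        tops = tok[4 + n_in:4 + n_in + n_out]
        consumed.update(bottoms)
        produced.extend(tops)
        if layer_type == "Input":
            inputs.extend(tops)

    # Blobs that no later layer consumes are the graph outputs.
    outputs = [name for name in produced if name not in consumed]
    return inputs, outputs


def run_once(net, input_name, input_mat, output_name):
    """Create a fresh Extractor, feed one input, and extract one output."""
    ex = net.create_extractor()
    ex.input(input_name, input_mat)
    ret, out = ex.extract(output_name)
    if ret != 0:
        raise RuntimeError("extract failed for blob " + output_name)
    return out
```

Because a new Extractor is created on each call, run_once can be invoked once per input; to read several outputs from the same forward pass, call extract repeatedly on one Extractor instead.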
