Principle: Tencent ncnn Graph Optimization
| Knowledge Sources | |
|---|---|
| Domains | Model_Optimization, Model_Deployment |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
The process of applying graph-level transformations to a neural network model to reduce computational cost and memory usage without changing the model's mathematical behavior.
Description
Graph optimization performs structural transformations on the network DAG to improve inference efficiency. Key transformations include:
- Operator fusion: Merging adjacent operators into a single kernel (e.g., Convolution + BatchNorm, Convolution + ReLU, Convolution + BatchNorm + ReLU)
- Dead code elimination: Removing unreachable layers and unused blobs
- Shape inference: Pre-computing tensor shapes to avoid runtime overhead
- Storage optimization: Converting weights to fp16 for reduced model size and memory bandwidth
Apart from fp16 storage, which introduces a small rounding error in the stored weights, these transformations are mathematically equivalent: for any given input, the optimized model produces the same outputs as the original.
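The fp16 storage trade-off can be illustrated with a small sketch (illustrative only; the array here is random data, not an actual ncnn weight blob):

```python
import numpy as np

rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal(1000).astype(np.float32)

# Storing weights as fp16 halves the model size and memory bandwidth...
weights_fp16 = weights_fp32.astype(np.float16)
print(weights_fp32.nbytes)  # 4000
print(weights_fp16.nbytes)  # 2000

# ...at the cost of a small representation error (fp16 carries ~11
# significand bits, so the error is tiny relative to each weight).
max_err = np.abs(weights_fp32 - weights_fp16.astype(np.float32)).max()
print(max_err < 1e-2)  # True
```

In practice the weights are converted back to the compute precision at load time, so this trades a small accuracy loss for halved storage and bandwidth.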
Usage
Use graph optimization after initial model conversion (via PNNX or other converters) and before deployment or quantization. It is a prerequisite for post-training quantization (the ncnn2table tool requires an optimized model).
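As a sketch of where this sits in a pipeline, the helper below builds an `ncnnoptimize` command line. The argument order (input param/bin, output param/bin, storage flag) and the flag values (0 for fp32, 65536 for fp16) follow ncnn's documentation; the `mobilenet` paths are placeholders, and the helper name is mine:

```python
import subprocess
from typing import List

def ncnnoptimize_cmd(in_prefix: str, out_prefix: str, fp16: bool = True) -> List[str]:
    """Build an ncnnoptimize command line.

    The trailing flag selects weight storage: 0 keeps fp32,
    65536 stores weights as fp16 (per ncnn's documentation).
    """
    flag = "65536" if fp16 else "0"
    return [
        "ncnnoptimize",
        f"{in_prefix}.param", f"{in_prefix}.bin",
        f"{out_prefix}.param", f"{out_prefix}.bin",
        flag,
    ]

cmd = ncnnoptimize_cmd("mobilenet", "mobilenet-opt")
print(" ".join(cmd))
# To actually run it (requires the ncnnoptimize binary on PATH):
# subprocess.run(cmd, check=True)
```

The optimized `mobilenet-opt.param`/`.bin` pair is then what you feed to deployment or to `ncnn2table` for quantization.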
Theoretical Basis
BatchNorm fusion into Convolution:
For each output channel, BatchNorm applies:
y = γ · (x − μ) / sqrt(σ² + ε) + β
where μ and σ² are the running mean and variance, γ and β are the learned scale and shift, and ε is a small constant for numerical stability.
Because this is an affine map per channel, it can be folded into the preceding convolution's weights W and bias b. With s = γ / sqrt(σ² + ε):
W' = s · W
b' = s · (b − μ) + β
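The folding can be verified numerically. In this sketch a plain linear map y = W·x + b stands in for the convolution, since the algebra is identical per output channel; all parameter values are random:

```python
import numpy as np

rng = np.random.default_rng(42)
cout, cin = 4, 3

# "Convolution" parameters (linear layer as a stand-in)
W = rng.standard_normal((cout, cin))
b = rng.standard_normal(cout)

# BatchNorm parameters, one per output channel
gamma = rng.standard_normal(cout)
beta = rng.standard_normal(cout)
mu = rng.standard_normal(cout)
var = rng.random(cout) + 0.5
eps = 1e-5

x = rng.standard_normal(cin)

# Reference: convolution followed by BatchNorm
y_ref = gamma * ((W @ x + b) - mu) / np.sqrt(var + eps) + beta

# Folded: scale each output channel's weights, rebuild the bias
s = gamma / np.sqrt(var + eps)
W_fused = s[:, None] * W
b_fused = s * (b - mu) + beta
y_fused = W_fused @ x + b_fused

print(np.allclose(y_ref, y_fused))  # True
```

After folding, the BatchNorm layer is deleted from the graph entirely; inference runs one layer instead of two.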
Activation fusion: When a convolution is immediately followed by ReLU, the activation can be applied within the convolution kernel, saving one memory pass.
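A minimal sketch of why this fusion is safe (again using a linear map in place of the convolution; the saving in the real fused kernel is that the activation runs before the result is written to memory, which plain NumPy cannot show):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)

# Unfused: convolution output written out, then ReLU reads it back
y = W @ x + b
y_relu = np.maximum(y, 0.0)

# Fused: the activation is applied to the accumulator in the same pass
y_fused = np.maximum(W @ x + b, 0.0)

print(np.allclose(y_relu, y_fused))  # True
```

Because ReLU is elementwise, applying it inside the convolution kernel changes nothing mathematically while eliminating one full read/write pass over the output tensor.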