Principle: Tencent ncnn Graph Optimization
| Knowledge Sources | |
|---|---|
| Domains | Model_Optimization, Model_Deployment |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
The process of applying graph-level transformations to a neural network model to reduce computational cost and memory usage without changing the model's mathematical behavior.
Description
Graph optimization performs structural transformations on the network DAG to improve inference efficiency. Key transformations include:
- Operator fusion: Merging adjacent operators into a single kernel (e.g., Convolution + BatchNorm, Convolution + ReLU, Convolution + BatchNorm + ReLU)
- Dead code elimination: Removing unreachable layers and unused blobs
- Shape inference: Pre-computing tensor shapes to avoid runtime overhead
- Storage optimization: Converting weights to fp16 for reduced model size and memory bandwidth
Apart from fp16 storage, which introduces a small rounding error in the stored weights, these transformations are mathematically equivalent: for any given input, the optimized model produces the same outputs as the original.
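The fp16 storage trade-off can be illustrated with a small sketch (illustrative only; the array here is random data, not an actual ncnn weight blob):

```python
import numpy as np

rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal(1000).astype(np.float32)

# Storing weights as fp16 halves the model size and memory bandwidth...
weights_fp16 = weights_fp32.astype(np.float16)
print(weights_fp32.nbytes)  # 4000
print(weights_fp16.nbytes)  # 2000

# ...at the cost of a small representation error (fp16 carries ~11
# significand bits, so the error is tiny relative to each weight).
max_err = np.abs(weights_fp32 - weights_fp16.astype(np.float32)).max()
print(max_err < 1e-2)  # True
```

In practice the weights are converted back to the compute precision at load time, so this trades a small accuracy loss for halved storage and bandwidth.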
Usage
Use graph optimization after initial model conversion (via PNNX or other converters) and before deployment or quantization. It is a prerequisite for post-training quantization (the ncnn2table tool requires an optimized model).
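As a sketch of where this sits in a pipeline, the helper below builds an `ncnnoptimize` command line. The argument order (input param/bin, output param/bin, storage flag) and the flag values (0 for fp32, 65536 for fp16) follow ncnn's documentation; the `mobilenet` paths are placeholders, and the helper name is mine:

```python
import subprocess
from typing import List

def ncnnoptimize_cmd(in_prefix: str, out_prefix: str, fp16: bool = True) -> List[str]:
    """Build an ncnnoptimize command line.

    The trailing flag selects weight storage: 0 keeps fp32,
    65536 stores weights as fp16 (per ncnn's documentation).
    """
    flag = "65536" if fp16 else "0"
    return [
        "ncnnoptimize",
        f"{in_prefix}.param", f"{in_prefix}.bin",
        f"{out_prefix}.param", f"{out_prefix}.bin",
        flag,
    ]

cmd = ncnnoptimize_cmd("mobilenet", "mobilenet-opt")
print(" ".join(cmd))
# To actually run it (requires the ncnnoptimize binary on PATH):
# subprocess.run(cmd, check=True)
```

The optimized `mobilenet-opt.param`/`.bin` pair is then what you feed to deployment or to `ncnn2table` for quantization.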
Theoretical Basis
BatchNorm fusion into Convolution:
For each output channel, BatchNorm applies:
y = γ · (x − μ) / sqrt(σ² + ε) + β
where μ and σ² are the running mean and variance, γ and β are the learned scale and shift, and ε is a small constant for numerical stability.
Because this is an affine map per channel, it can be folded into the preceding convolution's weights W and bias b. With s = γ / sqrt(σ² + ε):
W' = s · W
b' = s · (b − μ) + β
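The folding can be verified numerically. In this sketch a plain linear map y = W·x + b stands in for the convolution, since the algebra is identical per output channel; all parameter values are random:

```python
import numpy as np

rng = np.random.default_rng(42)
cout, cin = 4, 3

# "Convolution" parameters (linear layer as a stand-in)
W = rng.standard_normal((cout, cin))
b = rng.standard_normal(cout)

# BatchNorm parameters, one per output channel
gamma = rng.standard_normal(cout)
beta = rng.standard_normal(cout)
mu = rng.standard_normal(cout)
var = rng.random(cout) + 0.5
eps = 1e-5

x = rng.standard_normal(cin)

# Reference: convolution followed by BatchNorm
y_ref = gamma * ((W @ x + b) - mu) / np.sqrt(var + eps) + beta

# Folded: scale each output channel's weights, rebuild the bias
s = gamma / np.sqrt(var + eps)
W_fused = s[:, None] * W
b_fused = s * (b - mu) + beta
y_fused = W_fused @ x + b_fused

print(np.allclose(y_ref, y_fused))  # True
```

After folding, the BatchNorm layer is deleted from the graph entirely; inference runs one layer instead of two.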
Activation fusion: When a convolution is immediately followed by ReLU, the activation can be applied within the convolution kernel, saving one memory pass.
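A minimal sketch of why this fusion is safe (again using a linear map in place of the convolution; the saving in the real fused kernel is that the activation runs before the result is written to memory, which plain NumPy cannot show):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 3))
b = rng.standard_normal(4)
x = rng.standard_normal(3)

# Unfused: convolution output written out, then ReLU reads it back
y = W @ x + b
y_relu = np.maximum(y, 0.0)

# Fused: the activation is applied to the accumulator in the same pass
y_fused = np.maximum(W @ x + b, 0.0)

print(np.allclose(y_relu, y_fused))  # True
```

Because ReLU is elementwise, applying it inside the convolution kernel changes nothing mathematically while eliminating one full read/write pass over the output tensor.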