Implementation:Tencent Ncnn Ncnn2int8
| Knowledge Sources | |
|---|---|
| Domains | Quantization, Model_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
External CLI tool for applying int8 quantization to an ncnn model using a pre-computed calibration table.
Description
ncnn2int8 reads an optimized float32 ncnn model and a calibration table, then produces a quantized model with int8 weights for all supported layers. It is implemented as the NetQuantize class (extending ModelWriter) and processes each layer, deciding whether it can be quantized based on whether the calibration table contains an entry for it.
The tool preserves the model's network structure while converting weight storage from float32 to int8. Non-quantizable layers are stored in fp16 for reduced size. The quantized model is directly loadable by ncnn::Net with no API changes — the runtime automatically selects the int8 execution path.
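Conceptually, converting a float32 weight to int8 is a symmetric scale-and-round mapping into [-127, 127], driven by a scale derived from calibration. The sketch below illustrates this idea in Python; it is a simplified per-tensor version for intuition, not the actual NetQuantize code (ncnn uses per-channel scales read from the table).

```python
import numpy as np

def quantize_int8(weights, scale):
    """Map float32 weights to int8 using a calibration scale.

    The scale is assumed to be chosen so that weight * scale fits the
    observed value range into [-127, 127] (symmetric quantization).
    """
    q = np.round(np.asarray(weights, dtype=np.float32) * scale)
    return np.clip(q, -127, 127).astype(np.int8)

def dequantize_int8(q, scale):
    # Inverse mapping used at inference time to recover approximate floats.
    return q.astype(np.float32) / scale

# Tiny illustrative tensor (hypothetical values, not from a real model)
w = np.array([0.5, -1.2, 0.03], dtype=np.float32)
scale = 127.0 / np.abs(w).max()       # per-tensor symmetric scale
qw = quantize_int8(w, scale)
print(qw)                             # int8 codes in [-127, 127]
print(dequantize_int8(qw, scale))     # close to the original weights
```

The rounding step is where the accuracy loss comes from; the calibration table exists to pick scales that keep this loss small on representative data.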
Usage
Use after generating the calibration table with ncnn2table. The input model must have been optimized with ncnnoptimize first.
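Since quantization is gated on calibration-table entries, a layer without a table entry stays in float. The sketch below shows a hypothetical lookup in Python, assuming each table line is a layer (or weight) name followed by whitespace-separated float scales; the exact ncnn table layout may differ, so treat this as an illustration only.

```python
# Hypothetical reader for a ncn2table-style text file. Assumption: each
# line is "name scale1 [scale2 ...]"; names like "conv1_param_0" below
# are invented for the example.
def load_calibration_table(path):
    scales = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            scales[parts[0]] = [float(v) for v in parts[1:]]
    return scales

def is_quantizable(layer_name, scales):
    # A layer is quantized only if the table provides a scale for it.
    return layer_name in scales

# Demo with a tiny synthetic table file
with open("demo.table", "w") as f:
    f.write("conv1_param_0 120.5 98.2\nconv1 45.7\n")
scales = load_calibration_table("demo.table")
print(is_quantizable("conv1", scales))   # entry present
print(is_quantizable("fc9", scales))     # no entry: stays float
```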
Code Reference
Source Location
- Repository: ncnn
- File: tools/quantize/ncnn2int8.cpp
- Lines: L108 (class NetQuantize : public ModelWriter), L1068-1123 (main function)
Signature
ncnn2int8 [inparam] [inbin] [outparam] [outbin] [calibration_table]
Import
# Built from ncnn source tree
# Located in build/tools/quantize/ncnn2int8 after cmake build
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| inparam | file path | Yes | Optimized .param file (from ncnnoptimize) |
| inbin | file path | Yes | Optimized .bin file |
| calibration_table | file path | Yes | .table file from ncnn2table |
Outputs
| Name | Type | Description |
|---|---|---|
| outparam | File | Quantized .param with int8 layer annotations |
| outbin | File | Quantized .bin with int8 weights (~4x smaller than fp32) |
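The ~4x figure follows directly from element width: fp32 weights take 4 bytes each and int8 weights take 1, while fp16 storage for non-quantized layers gives 2x. A back-of-envelope check (hypothetical 10M-parameter model; the real .bin also carries small headers and scales, so the measured ratio lands slightly under 4):

```python
# Storage arithmetic behind the "~4x smaller" claim.
n_weights = 10_000_000            # hypothetical parameter count
fp32_bytes = n_weights * 4        # 4 bytes per float32 weight
fp16_bytes = n_weights * 2        # 2 bytes per float16 weight
int8_bytes = n_weights * 1        # 1 byte per int8 weight

print(fp32_bytes / int8_bytes)    # → 4.0 (fully int8-quantized)
print(fp32_bytes / fp16_bytes)    # → 2.0 (fp16 fallback layers)
```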
Usage Examples
Apply Quantization
ncnn2int8 \
model-opt.ncnn.param \
model-opt.ncnn.bin \
model-int8.ncnn.param \
model-int8.ncnn.bin \
model.table
Complete Quantization Pipeline
# Step 1: Optimize (prerequisite)
ncnnoptimize model.param model.bin model-opt.param model-opt.bin 0
# Step 2: Prepare calibration data
find calibration_images/ -type f > imagelist.txt
# Step 3: Generate calibration table
ncnn2table model-opt.param model-opt.bin \
imagelist.txt model.table \
mean=[104,117,123] norm=[1,1,1] \
shape=[227,227,3] pixel=BGR thread=4
# Step 4: Apply quantization
ncnn2int8 model-opt.param model-opt.bin \
model-int8.param model-int8.bin model.table
# Step 5: Use in inference (same API as float32)
# ncnn::Net net;
# net.load_param("model-int8.param");
# net.load_model("model-int8.bin");
# // opt.use_int8_inference is true by default