Implementation:Tencent Ncnn Ncnn2int8
| Knowledge Sources | |
|---|---|
| Domains | Quantization, Model_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
External CLI tool for applying int8 quantization to an ncnn model using a pre-computed calibration table.
Description
ncnn2int8 reads an optimized float32 ncnn model and a calibration table, then produces a quantized model with int8 weights for all supported layers. It is implemented as the NetQuantize class (extending ModelWriter) and processes each layer, deciding whether it can be quantized based on whether the calibration table contains an entry for it.
The tool preserves the model's network structure while converting weight storage from float32 to int8. Non-quantizable layers are stored in fp16 for reduced size. The quantized model is directly loadable by ncnn::Net with no API changes — the runtime automatically selects the int8 execution path.
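Conceptually, converting a float32 weight to int8 is a symmetric scale-and-round mapping into [-127, 127], driven by a scale derived from calibration. The sketch below illustrates this idea in Python; it is a simplified per-tensor version for intuition, not the actual NetQuantize code (ncnn uses per-channel scales read from the table).

```python
import numpy as np

def quantize_int8(weights, scale):
    """Map float32 weights to int8 using a calibration scale.

    The scale is assumed to be chosen so that weight * scale fits the
    observed value range into [-127, 127] (symmetric quantization).
    """
    q = np.round(np.asarray(weights, dtype=np.float32) * scale)
    return np.clip(q, -127, 127).astype(np.int8)

def dequantize_int8(q, scale):
    # Inverse mapping used at inference time to recover approximate floats.
    return q.astype(np.float32) / scale

# Tiny illustrative tensor (hypothetical values, not from a real model)
w = np.array([0.5, -1.2, 0.03], dtype=np.float32)
scale = 127.0 / np.abs(w).max()       # per-tensor symmetric scale
qw = quantize_int8(w, scale)
print(qw)                             # int8 codes in [-127, 127]
print(dequantize_int8(qw, scale))     # close to the original weights
```

The rounding step is where the accuracy loss comes from; the calibration table exists to pick scales that keep this loss small on representative data.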
Usage
Use after generating the calibration table with ncnn2table. The input model must have been optimized with ncnnoptimize first.
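Since quantization is gated on calibration-table entries, a layer without a table entry stays in float. The sketch below shows a hypothetical lookup in Python, assuming each table line is a layer (or weight) name followed by whitespace-separated float scales; the exact ncnn table layout may differ, so treat this as an illustration only.

```python
# Hypothetical reader for a ncn2table-style text file. Assumption: each
# line is "name scale1 [scale2 ...]"; names like "conv1_param_0" below
# are invented for the example.
def load_calibration_table(path):
    scales = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            scales[parts[0]] = [float(v) for v in parts[1:]]
    return scales

def is_quantizable(layer_name, scales):
    # A layer is quantized only if the table provides a scale for it.
    return layer_name in scales

# Demo with a tiny synthetic table file
with open("demo.table", "w") as f:
    f.write("conv1_param_0 120.5 98.2\nconv1 45.7\n")
scales = load_calibration_table("demo.table")
print(is_quantizable("conv1", scales))   # entry present
print(is_quantizable("fc9", scales))     # no entry: stays float
```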
Code Reference
Source Location
- Repository: ncnn
- File: tools/quantize/ncnn2int8.cpp
- Lines: L108 (class NetQuantize : public ModelWriter), L1068-1123 (main function)
Signature
ncnn2int8 [inparam] [inbin] [outparam] [outbin] [calibration_table]
Import
# Built from ncnn source tree
# Located in build/tools/quantize/ncnn2int8 after cmake build
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| inparam | file path | Yes | Optimized .param file (from ncnnoptimize) |
| inbin | file path | Yes | Optimized .bin file |
| calibration_table | file path | Yes | .table file from ncnn2table |
Outputs
| Name | Type | Description |
|---|---|---|
| outparam | File | Quantized .param with int8 layer annotations |
| outbin | File | Quantized .bin with int8 weights (~4x smaller than fp32) |
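The ~4x figure follows directly from element width: fp32 weights take 4 bytes each and int8 weights take 1, while fp16 storage for non-quantized layers gives 2x. A back-of-envelope check (hypothetical 10M-parameter model; the real .bin also carries small headers and scales, so the measured ratio lands slightly under 4):

```python
# Storage arithmetic behind the "~4x smaller" claim.
n_weights = 10_000_000            # hypothetical parameter count
fp32_bytes = n_weights * 4        # 4 bytes per float32 weight
fp16_bytes = n_weights * 2        # 2 bytes per float16 weight
int8_bytes = n_weights * 1        # 1 byte per int8 weight

print(fp32_bytes / int8_bytes)    # → 4.0 (fully int8-quantized)
print(fp32_bytes / fp16_bytes)    # → 2.0 (fp16 fallback layers)
```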
Usage Examples
Apply Quantization
ncnn2int8 \
model-opt.ncnn.param \
model-opt.ncnn.bin \
model-int8.ncnn.param \
model-int8.ncnn.bin \
model.table
Complete Quantization Pipeline
# Step 1: Optimize (prerequisite)
ncnnoptimize model.param model.bin model-opt.param model-opt.bin 0
# Step 2: Prepare calibration data
find calibration_images/ -type f > imagelist.txt
# Step 3: Generate calibration table
ncnn2table model-opt.param model-opt.bin \
imagelist.txt model.table \
mean=[104,117,123] norm=[1,1,1] \
shape=[227,227,3] pixel=BGR thread=4
# Step 4: Apply quantization
ncnn2int8 model-opt.param model-opt.bin \
model-int8.param model-int8.bin model.table
# Step 5: Use in inference (same API as float32)
# ncnn::Net net;
# net.load_param("model-int8.param");
# net.load_model("model-int8.bin");
# // opt.use_int8_inference is true by default