Implementation:Tencent Ncnn Ncnn2int8

From Leeroopedia


Knowledge Sources
Domains: Quantization, Model_Optimization
Last Updated: 2026-02-09 00:00 GMT

Overview

External CLI tool for applying int8 quantization to an ncnn model using a pre-computed calibration table.

Description

ncnn2int8 reads an optimized float32 ncnn model and a calibration table, then produces a quantized model with int8 weights for all supported layers. It is implemented as the NetQuantize class (extending ModelWriter) and processes each layer, using the calibration table entries to determine whether it can be quantized.
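
That per-layer decision can be pictured with a simplified sketch; the LayerInfo struct, the ScaleTable alias, and the helper functions below are hypothetical illustrations, not code from tools/quantize/ncnn2int8.cpp. The idea is that a layer is eligible only when the calibration table records scales for it, and eligible weights are mapped into the signed 8-bit range.

// Simplified sketch of the per-layer decision; LayerInfo, ScaleTable and the
// helpers are hypothetical illustrations, not code from ncnn2int8.cpp.
#include <map>
#include <string>
#include <vector>

struct LayerInfo
{
    std::string name;            // layer name as it appears in the .param file
    std::vector<float> weights;  // float32 weights loaded from the .bin file
};

// scales parsed from the calibration .table: layer name -> per-channel scales
typedef std::map<std::string, std::vector<float> > ScaleTable;

bool can_quantize(const LayerInfo& layer,
                  const ScaleTable& weight_scales,
                  const ScaleTable& blob_scales)
{
    // quantize only if the table records both a weight scale and an
    // activation (blob) scale for this layer; otherwise keep fp16 storage
    return weight_scales.count(layer.name) != 0 && blob_scales.count(layer.name) != 0;
}

// int8 quantization of a single weight: q = round(w * scale), clamped to [-127, 127]
signed char quantize_weight(float w, float scale)
{
    float q = w * scale;
    int qi = static_cast<int>(q >= 0 ? q + 0.5f : q - 0.5f);
    if (qi > 127) qi = 127;
    if (qi < -127) qi = -127;
    return static_cast<signed char>(qi);
}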

The tool preserves the model's network structure while converting weight storage from float32 to int8. Weights of layers that cannot be quantized are stored as fp16 to still reduce size. The quantized model is directly loadable by ncnn::Net with no API changes; the runtime automatically selects the int8 execution path.

Usage

Use after generating the calibration table with ncnn2table. The input model must have been optimized with ncnnoptimize first.

Code Reference

Source Location

  • Repository: ncnn
  • File: tools/quantize/ncnn2int8.cpp
  • Lines: L108 (class NetQuantize : public ModelWriter), L1068-1123 (main function)

Signature

ncnn2int8 [inparam] [inbin] [outparam] [outbin] [calibration_table]

Import

# Built from ncnn source tree
# Located in build/tools/quantize/ncnn2int8 after cmake build

I/O Contract

Inputs

Name               Type       Required  Description
inparam            file path  Yes       Optimized .param file (from ncnnoptimize)
inbin              file path  Yes       Optimized .bin file
calibration_table  file path  Yes       .table file from ncnn2table

Outputs

Name      Type  Description
outparam  file  Quantized .param with int8 layer annotations
outbin    file  Quantized .bin with int8 weights (~4x smaller than fp32)

Usage Examples

Apply Quantization

ncnn2int8 \
    model-opt.ncnn.param \
    model-opt.ncnn.bin \
    model-int8.ncnn.param \
    model-int8.ncnn.bin \
    model.table

Complete Quantization Pipeline

# Step 1: Optimize (prerequisite)
ncnnoptimize model.param model.bin model-opt.param model-opt.bin 0

# Step 2: Prepare calibration data
find calibration_images/ -type f > imagelist.txt

# Step 3: Generate calibration table
ncnn2table model-opt.param model-opt.bin \
    imagelist.txt model.table \
    mean=[104,117,123] norm=[1,1,1] \
    shape=[227,227,3] pixel=BGR thread=4

# Step 4: Apply quantization
ncnn2int8 model-opt.param model-opt.bin \
    model-int8.param model-int8.bin model.table

# Step 5: Use in inference (same API as float32)
# ncnn::Net net;
# net.load_param("model-int8.param");
# net.load_model("model-int8.bin");
# // opt.use_int8_inference is true by default
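
Expanded into a minimal C++ program, the commented Step 5 looks like the sketch below. The blob names "data" and "prob" and the 227x227 input are assumptions chosen to match the calibration settings in Step 3; real models use their own blob names and input shapes.

#include "net.h"   // ncnn
#include <vector>

int main()
{
    ncnn::Net net;
    net.opt.use_int8_inference = true;  // already the default, shown for clarity

    // same loading API as for the float32 model
    net.load_param("model-int8.param");
    net.load_model("model-int8.bin");

    // dummy BGR image matching shape=[227,227,3] used during calibration
    std::vector<unsigned char> bgr(227 * 227 * 3, 0);
    ncnn::Mat in = ncnn::Mat::from_pixels(bgr.data(), ncnn::Mat::PIXEL_BGR, 227, 227);

    // apply the same mean/norm preprocessing that was passed to ncnn2table
    const float mean_vals[3] = {104.f, 117.f, 123.f};
    const float norm_vals[3] = {1.f, 1.f, 1.f};
    in.substract_mean_normalize(mean_vals, norm_vals);

    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);    // input blob name is model-specific (assumed here)

    ncnn::Mat out;
    ex.extract("prob", out); // output blob name is model-specific (assumed here)
    return 0;
}

The program links against the ncnn library produced by the same cmake build that created the quantization tools.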

