Principle:Tencent Ncnn Calibration Dataset Preparation

Knowledge Sources	ncnn, ncnn Quantization Guide
Domains	Quantization, Model_Optimization
Last Updated	2026-02-09 00:00 GMT

Overview

The process of assembling a representative set of input samples used to calibrate activation ranges for post-training quantization.

Description

Post-training quantization requires a calibration dataset: a set of representative inputs that drives the model through its typical activation distributions. The calibration tool runs forward passes with these samples to collect per-layer activation statistics, which are then used to compute the quantization scale factor for each layer.
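To make the link between activation statistics and scale factors concrete, here is a minimal sketch of the simplest possible statistic: the largest absolute activation seen during calibration, mapped onto the signed 8-bit range. This is an illustrative assumption, not a description of what ncnn2table does internally; the function names absmax_scale and quantize and the sample values are hypothetical.

```python
def absmax_scale(activations, qmax=127):
    """Return a symmetric int8 scale from observed activation values."""
    peak = max(abs(v) for v in activations)
    if peak == 0.0:
        return 1.0  # degenerate case: every observed activation was zero
    return qmax / peak


def quantize(value, scale, qmax=127):
    """Map a float to a signed 8-bit integer, clamping to [-qmax, qmax]."""
    q = round(value * scale)
    return max(-qmax, min(qmax, q))


# Hypothetical activations gathered from a few calibration samples.
observed = [0.5, -2.0, 1.25, 0.0]
scale = absmax_scale(observed)
```

If the calibration samples never produce the large activations that occur in production, the peak is underestimated and real inputs are clamped, which is why the samples need to be representative.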

The calibration dataset should be a subset (typically 100-1000 samples) of the training or validation dataset. It must be representative of the data distribution the model will encounter in production. With too few samples, the collected activation ranges are unreliable and the quantized model loses accuracy; with too many, calibration takes longer without a meaningful accuracy gain.
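Drawing the subset can be sketched as a seeded random sample over an image directory, so that the same calibration set can be regenerated later. The function name sample_calibration_images, the default count of 500, and the list of accepted file suffixes are assumptions made for illustration.

```python
import random
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def sample_calibration_images(image_dir, count=500, seed=0):
    """Pick up to `count` image paths from `image_dir` reproducibly."""
    candidates = sorted(
        p for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    rng = random.Random(seed)  # fixed seed makes the selection repeatable
    chosen = rng.sample(candidates, min(count, len(candidates)))
    return sorted(chosen)
```

Sorting the candidates before sampling keeps the result independent of filesystem ordering.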

For image models, samples are provided as image files listed in a text file. For non-image models (e.g., NLP), samples can be provided as NumPy .npy files.

Usage

Use this principle as the preparation step before generating a calibration table with ncnn2table. Select samples that cover the diversity of inputs the model will encounter (for vision models: different lighting, scales, and orientations).

Theoretical Basis

Calibration dataset selection criteria:
1. Representative: covers the model's expected input distribution
2. Diverse: includes edge cases and variety
3. Sized appropriately: 100-1000 samples (diminishing returns beyond)
4. Matching preprocessing: same normalization as inference
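One way to address the representative and diverse criteria together is to sample evenly across groups of images rather than from a single pool. The sketch below assumes, purely for illustration, that images are organised into one folder per condition or class (for example day/, night/, fog/); the function name stratified_sample is hypothetical.

```python
import random
from collections import defaultdict
from pathlib import Path


def stratified_sample(paths, count, seed=0):
    """Draw `count` paths spread as evenly as possible across parent folders."""
    groups = defaultdict(list)
    for path in sorted(Path(p) for p in paths):
        groups[path.parent].append(path)

    rng = random.Random(seed)
    for members in groups.values():
        rng.shuffle(members)

    chosen = []
    # Round-robin over groups so that small groups are not crowded out.
    while len(chosen) < count and any(groups.values()):
        for key in sorted(groups):
            if groups[key] and len(chosen) < count:
                chosen.append(groups[key].pop())
    return sorted(chosen)
```

Equal representation per folder is a design choice, not a requirement: if production traffic is dominated by one condition, weighting the groups to match that mix may be more representative.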

File format:

# imagelist.txt — one image path per line
images/sample_001.jpg
images/sample_002.jpg
...
images/sample_500.jpg
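A list file in this format can be produced with a few lines of Python. The sketch below writes one path per line and drops duplicates and paths that do not exist on disk. The function name write_image_list is hypothetical, and the "# imagelist.txt" line in the example above is treated as an annotation and not written, on the assumption that every line in the file is read as an image path.

```python
from pathlib import Path


def write_image_list(paths, list_file="imagelist.txt"):
    """Write one image path per line; return the number of paths written."""
    seen = set()
    kept = []
    for path in map(Path, paths):
        # Skip repeated entries and files that are missing on disk.
        if path in seen or not path.is_file():
            continue
        seen.add(path)
        kept.append(path)
    text = "".join(f"{path.as_posix()}\n" for path in kept)
    Path(list_file).write_text(text, encoding="utf-8")
    return len(kept)
```

Comparing the returned count against the intended sample size is a quick check that no files went missing before calibration is run.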

Related Pages

Implemented By

Implementation:Tencent_Ncnn_Calibration_Image_List
