Principle:Tencent Ncnn Calibration Dataset Preparation
| Knowledge Sources | |
|---|---|
| Domains | Quantization, Model_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Process of assembling a representative dataset of input samples used to calibrate activation ranges for post-training quantization.
Description
Post-training quantization requires a calibration dataset: a set of representative inputs that captures the typical activation distributions of the model. The calibration tool runs forward passes with these samples to collect per-layer activation statistics. It then uses those statistics to compute, for each layer, a quantization scale factor that maps the observed activation range onto the int8 range.
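The statistics-then-scales step described above can be sketched as follows. This minimal illustration uses the simple symmetric max-abs rule. It is not the method ncnn2table implements, which offers more elaborate threshold searches such as KL-divergence minimization. The activation dictionaries stand in for whatever a real forward pass would expose. `collect_max_abs`, `int8_scales`, and `quantize` are hypothetical helpers, and the convention q = round(x * scale) is an assumption of this sketch.

```python
from typing import Dict, Iterable, List


def collect_max_abs(batches: Iterable[Dict[str, List[float]]]) -> Dict[str, float]:
    """Track the largest absolute activation seen per layer across all calibration samples."""
    stats: Dict[str, float] = {}
    for activations in batches:  # one dict per sample: layer name -> flattened activations
        for layer, values in activations.items():
            peak = max((abs(v) for v in values), default=0.0)
            stats[layer] = max(stats.get(layer, 0.0), peak)
    return stats


def int8_scales(stats: Dict[str, float]) -> Dict[str, float]:
    """Symmetric scale mapping [-max_abs, max_abs] onto [-127, 127]; 1.0 for all-zero layers."""
    return {layer: (127.0 / m if m > 0 else 1.0) for layer, m in stats.items()}


def quantize(x: float, scale: float) -> int:
    """Quantize one value as round(x * scale), saturating to the int8 range [-127, 127]."""
    return max(-127, min(127, round(x * scale)))
```

Max-abs is sensitive to a single outlier. This is one reason threshold-search methods tend to calibrate better, and why a representative sample set matters more than a merely large one.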
The calibration dataset should be a subset (typically 100-1000 samples) of the training or validation dataset. It must be representative of the data distribution the model will encounter in production. Too few samples can yield activation ranges that miss typical inputs, which leads to poor scale factors. Beyond that range, extra samples mostly add calibration time with little accuracy gain.
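A reproducible random subset of the validation images can be drawn with a few lines of standard-library Python. This is a sketch. The directory layout, the extension set, and the helper names `sample_calibration_set` and `write_imagelist` are assumptions for illustration. A fixed seed is used so the same calibration table can be regenerated later.

```python
import random
from pathlib import Path
from typing import List, Sequence

# Assumed set of image extensions to pick up; adjust to your dataset
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def sample_calibration_set(image_dir: str, num_samples: int = 500, seed: int = 0) -> List[str]:
    """Return a reproducible random subset of image paths found recursively under image_dir."""
    # Sort first so the same seed yields the same subset regardless of filesystem order
    paths = sorted(
        str(p) for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )
    if len(paths) <= num_samples:
        return paths  # fewer images than requested: use them all
    rng = random.Random(seed)
    return sorted(rng.sample(paths, num_samples))


def write_imagelist(paths: Sequence[str], list_file: str) -> None:
    """Write one path per line, the layout the image list file uses."""
    Path(list_file).write_text("".join(p + "\n" for p in paths))
```

For example, `write_imagelist(sample_calibration_set("val_images"), "imagelist.txt")` would write up to 500 paths. Here `val_images` is a placeholder directory name.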
For image models, samples are provided as image files listed in a text file. For non-image models (e.g., NLP), samples can be provided as NumPy .npy files.
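For non-image inputs, one way to produce the .npy files is sketched below. It assumes NumPy is available and that each sample is already a preprocessed array with the model's input shape. The float32 dtype, the `sample_XXXX.npy` naming, and the helper `export_npy_samples` are illustrative assumptions. Check which array layout and list-file options your ncnn2table build expects before relying on this.

```python
from pathlib import Path
from typing import Iterable, List

import numpy as np


def export_npy_samples(samples: Iterable[np.ndarray], out_dir: str, list_file: str) -> List[str]:
    """Save each calibration sample as a float32 .npy file and list the paths one per line."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[str] = []
    for i, sample in enumerate(samples):
        path = out / f"sample_{i:04d}.npy"
        # float32 is assumed here to match a float model input
        np.save(path, np.asarray(sample, dtype=np.float32))
        paths.append(str(path))
    Path(list_file).write_text("".join(p + "\n" for p in paths))
    return paths
```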
Usage
Use this as the preparation step before generating a calibration table with ncnn2table. Select samples that cover the diversity of inputs the model will encounter (for vision models, e.g., varied lighting, object scales, and orientations).
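Instead of relying on a purely random draw, you can draw samples per group to make sure every input condition is represented. In this hypothetical sketch, the groups are supplied as a dictionary of path lists, for example one group per subdirectory such as `day/` and `night/`. `stratified_sample` is an illustrative helper, not part of ncnn.

```python
import random
from typing import Dict, List


def stratified_sample(groups: Dict[str, List[str]], total: int, seed: int = 0) -> List[str]:
    """Draw about total / len(groups) paths from each group so every condition is represented."""
    rng = random.Random(seed)
    per_group = max(1, total // max(1, len(groups)))
    chosen: List[str] = []
    for name in sorted(groups):  # fixed group order keeps the result reproducible
        paths = sorted(groups[name])
        # Small groups contribute everything they have, so the result may be shorter than total
        chosen.extend(rng.sample(paths, min(per_group, len(paths))))
    return chosen
```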
Theoretical Basis
Calibration dataset selection criteria:
1. Representative: covers the model's expected input distribution
2. Diverse: includes edge cases and variety
3. Sized appropriately: 100-1000 samples (diminishing returns beyond that)
4. Matching preprocessing: the same resizing, channel order, mean subtraction, and normalization as at inference time
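One way to enforce the matching-preprocessing criterion is to keep a single preprocessing description and derive both the inference-time normalization and the ncnn2table arguments from it. In the sketch below, `Preprocess` is a hypothetical class. The per-channel (value - mean) * norm form mirrors ncnn's `Mat::substract_mean_normalize`. The `mean=`/`norm=`/`shape=`/`pixel=` argument style follows the ncnn2table example in the ncnn documentation. The `[w,h,c]` order for `shape` is an assumption; verify it against your ncnn version.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def _fmt(values: Sequence[float]) -> str:
    # Render values as "[104,117,123]" with up to 12 significant digits and no trailing ".0"
    return "[" + ",".join(format(float(v), ".12g") for v in values) + "]"


@dataclass(frozen=True)
class Preprocess:
    """Hypothetical single source of truth for preprocessing shared by calibration and inference."""
    mean: Tuple[float, float, float]
    norm: Tuple[float, float, float]
    width: int
    height: int
    pixel: str = "BGR"

    def apply(self, pixel_values: Sequence[float]) -> List[float]:
        # Per-channel (value - mean) * norm, the same form as ncnn's Mat::substract_mean_normalize
        return [(v - m) * n for v, m, n in zip(pixel_values, self.mean, self.norm)]

    def ncnn2table_args(self) -> List[str]:
        # key=value arguments in the style of the ncnn2table docs example;
        # the [w,h,c] order for shape is an assumption to check against your ncnn version
        return [
            f"mean={_fmt(self.mean)}",
            f"norm={_fmt(self.norm)}",
            f"shape=[{self.width},{self.height},3]",
            f"pixel={self.pixel}",
        ]
```

Because calibration and inference both read from one frozen object, changing the normalization in one place cannot silently leave the other out of date.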
File format (`imagelist.txt`, one image path per line):
```text
images/sample_001.jpg
images/sample_002.jpg
...
images/sample_500.jpg
```
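Before running ncnn2table, it can be worth checking that every entry in the list file points to an existing file. Below is a minimal sketch, with `validate_imagelist` as a hypothetical helper. It resolves relative paths against the current working directory, so run it from the directory you will launch ncnn2table from.

```python
from pathlib import Path
from typing import List, Tuple


def validate_imagelist(list_file: str) -> Tuple[List[str], List[str]]:
    """Split entries of a one-path-per-line list into (existing, missing), skipping blank lines."""
    existing: List[str] = []
    missing: List[str] = []
    for line in Path(list_file).read_text().splitlines():
        path = line.strip()
        if not path:
            continue  # ignore blank lines
        (existing if Path(path).is_file() else missing).append(path)
    return existing, missing
```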