Heuristic: Tencent ncnn Letterbox vs. Direct Resize
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Preprocessing |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Preprocessing strategy guide for choosing between letterbox (aspect-ratio-preserving) resize and direct resize based on the target model architecture.
Description
Image preprocessing for neural network inference requires resizing input images to the model's expected input dimensions. There are two approaches: direct resize (stretching to exact dimensions, potentially distorting aspect ratio) and letterbox resize (scaling to fit within dimensions while preserving aspect ratio, then padding the remaining area). The choice depends entirely on how the training data was preprocessed. Classification models (SqueezeNet, ResNet, MobileNet) typically use direct resize. Detection models (YOLO family, NanoDet) typically use letterbox resize because bounding box coordinates must account for aspect ratio preservation.
Usage
Use this heuristic when implementing image preprocessing for ncnn inference. Always match the preprocessing to the original training pipeline. Using the wrong resize method will produce incorrect results even with a correctly converted model.
The Insight (Rule of Thumb)
- Action: For classification models, use `Mat::from_pixels_resize()` (direct resize). For YOLO-family and anchor-free detection models (e.g. NanoDet), use letterbox resize with 114-valued constant padding.
- Value: Letterbox pad value = 114 (standard for YOLO family). Scale factor must be tracked for decoding bounding box coordinates back to original image space.
- Trade-off: Direct resize is simpler and faster but distorts objects. Letterbox preserves aspect ratio but requires coordinate rescaling during post-processing.
- Key Insight: Interpolation differences between frameworks (OpenCV vs ncnn vs PIL) can cause subtle accuracy drops. For exact validation, use BMP images at the exact input size.
Reasoning
Detection models predict bounding box coordinates relative to the input image. If the input is distorted by non-uniform scaling, the predicted boxes will be distorted too, requiring complex correction. Letterbox resize avoids this by ensuring uniform scaling. The padding value (114 for YOLO, 0 for some other models) fills the empty space with a neutral value. Classification models only predict class probabilities, so aspect ratio distortion has minimal impact on accuracy. The ncnn examples demonstrate both patterns: `squeezenet.cpp` uses `from_pixels_resize` (direct), while `yolov8.cpp` implements custom letterbox logic.
Letterbox implementation pattern from `examples/yolov8.cpp:298-338`:
```cpp
// Calculate scale to fit within the target size while preserving aspect ratio
int w = img.cols;
int h = img.rows;
float scale = std::min((float)target_size / w, (float)target_size / h);
int wpad = target_size - int(w * scale);
int hpad = target_size - int(h * scale);

// Resize with preserved aspect ratio, then pad the borders to target_size
ncnn::Mat in = ncnn::Mat::from_pixels_resize(img.data, ncnn::Mat::PIXEL_BGR,
                                             w, h, int(w * scale), int(h * scale));
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2,
                       wpad / 2, wpad - wpad / 2,
                       ncnn::BORDER_CONSTANT, 114.f);
```
Direct resize pattern from `examples/squeezenet.cpp:30-34`:
```cpp
// Stretch directly to the 227x227 network input, ignoring aspect ratio
ncnn::Mat in = ncnn::Mat::from_pixels_resize(img.data, ncnn::Mat::PIXEL_BGR,
                                             img.cols, img.rows, 227, 227);
```