
Heuristic:Tencent Ncnn Letterbox Vs Direct Resize

From Leeroopedia



Knowledge Sources
Domains Computer_Vision, Preprocessing
Last Updated 2026-02-09 19:00 GMT

Overview

Preprocessing strategy guide for choosing between letterbox (aspect-ratio-preserving) resize and direct resize based on the target model architecture.

Description

Neural network inference requires resizing each input image to the dimensions the model expects. There are two common approaches. Direct resize stretches the image to the exact target dimensions, which distorts the aspect ratio whenever the source and target shapes differ. Letterbox resize scales the image by a single factor so that it fits inside the target dimensions, then pads the remaining area with a constant value. The right choice is whichever one the training pipeline used. Classification models (SqueezeNet, ResNet, MobileNet) typically use direct resize. Detection models (YOLO family, NanoDet) typically use letterbox resize, so predicted box coordinates have to be mapped back to the original image through the same scale factor and padding offsets.
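To make the difference concrete, the two strategies can be compared by the geometry they produce for the same input. The following is a minimal sketch, not code from ncnn: `ResizeGeometry`, `direct_geometry`, and `letterbox_geometry` are hypothetical names introduced here, and a square target is assumed.

```cpp
#include <algorithm>

// Hypothetical helper type (not part of ncnn): describes how an image of
// size w x h ends up inside a square target of side `target`.
struct ResizeGeometry {
    float scale_x;  // factor applied to widths
    float scale_y;  // factor applied to heights
    int resized_w;  // image width after scaling, before any padding
    int resized_h;  // image height after scaling, before any padding
    int pad_w;      // total horizontal padding needed to reach `target`
    int pad_h;      // total vertical padding needed to reach `target`
};

// Direct resize: each axis is stretched independently, so no padding is
// needed but the two scale factors differ for non-square inputs.
inline ResizeGeometry direct_geometry(int w, int h, int target) {
    return {(float)target / w, (float)target / h, target, target, 0, 0};
}

// Letterbox resize: one shared factor (the smaller of the two ratios) keeps
// the aspect ratio; the leftover area is filled with padding.
inline ResizeGeometry letterbox_geometry(int w, int h, int target) {
    float s = std::min((float)target / w, (float)target / h);
    int rw = int(w * s);
    int rh = int(h * s);
    return {s, s, rw, rh, target - rw, target - rh};
}
```

For a 1280x720 image and a target side of 640, `direct_geometry` gives a horizontal factor of 0.5 and a vertical factor of about 0.89, while `letterbox_geometry` gives 0.5 on both axes, a 640x360 resized image, and 280 rows of padding to split between top and bottom.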

Usage

Use this heuristic when implementing image preprocessing for ncnn inference. Always match the preprocessing to the original training pipeline. A resize method that differs from the one used in training will produce incorrect results (lower accuracy or misplaced boxes) even when the model itself was converted correctly.

The Insight (Rule of Thumb)

  • Action: For classification models, use `ncnn::Mat::from_pixels_resize()` (direct resize). For YOLO-family and similar detection models such as NanoDet, use letterbox resize with a padding value of 114.
  • Value: Letterbox pad value = 114 (standard for the YOLO family). Both the scale factor and the padding offsets must be tracked so that bounding box coordinates can be decoded back to original image space.
  • Trade-off: Direct resize is simpler and faster but distorts objects. Letterbox preserves aspect ratio but requires coordinate rescaling during post-processing.
  • Key Insight: Interpolation differences between frameworks (OpenCV vs ncnn vs PIL) can cause subtle accuracy drops. When validating a converted model against the original framework, use BMP images that are already at the exact input size, so that no resize step is involved and outputs can be compared directly.
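The coordinate rescaling mentioned in the list above can be sketched as follows. This assumes a centered letterbox, where `pad_left` and `pad_top` are the left and top padding in pixels (the `wpad / 2` and `hpad / 2` of a centered pad) and `scale` is the single resize factor. `Box` and `unletterbox` are hypothetical names used for illustration and are not part of ncnn.

```cpp
#include <algorithm>

// Hypothetical box type: corner coordinates in pixels.
struct Box {
    float x0, y0, x1, y1;
};

// Map a box from letterboxed input space back to original image space:
// subtract the left/top padding, divide by the scale factor, then clamp
// to the original image bounds.
inline Box unletterbox(const Box& b, float scale, int pad_left, int pad_top,
                       int img_w, int img_h) {
    auto clampf = [](float v, float lo, float hi) {
        return std::max(lo, std::min(v, hi));
    };
    Box out;
    out.x0 = clampf((b.x0 - pad_left) / scale, 0.f, (float)(img_w - 1));
    out.y0 = clampf((b.y0 - pad_top) / scale, 0.f, (float)(img_h - 1));
    out.x1 = clampf((b.x1 - pad_left) / scale, 0.f, (float)(img_w - 1));
    out.y1 = clampf((b.y1 - pad_top) / scale, 0.f, (float)(img_h - 1));
    return out;
}
```

For example, with a 1280x720 source letterboxed into 640x640 (scale 0.5, no left padding, 140 rows of top padding), a predicted box of (100, 200, 300, 400) maps back to (200, 120, 600, 520) in the original image. With direct resize the same step would need separate horizontal and vertical factors and no padding offsets.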

Reasoning

Detection models predict bounding box coordinates relative to the input image. If the input is distorted by non-uniform scaling, objects appear stretched and the predicted boxes are distorted too, so mapping them back requires separate horizontal and vertical corrections. Letterbox resize avoids this by scaling both axes by the same factor. The padding value (114 for YOLO, 0 for some other models) fills the empty space with a neutral value. Classification models only predict class probabilities, so aspect ratio distortion typically has much less impact on their output. The ncnn examples demonstrate both patterns: `squeezenet.cpp` uses `from_pixels_resize` (direct), while `yolov8.cpp` implements custom letterbox logic.

Letterbox implementation pattern from `examples/yolov8.cpp:298-338`:

// img is assumed to be a cv::Mat in BGR order; target_size is the model input side
// Calculate the single scale factor that fits the image inside target_size x target_size
int w = img.cols;
int h = img.rows;
float scale = std::min((float)target_size / w, (float)target_size / h);
int resized_w = int(w * scale);
int resized_h = int(h * scale);
int wpad = target_size - resized_w;
int hpad = target_size - resized_h;

// Resize with preserved aspect ratio
ncnn::Mat in = ncnn::Mat::from_pixels_resize(img.data, ncnn::Mat::PIXEL_BGR,
    w, h, resized_w, resized_h);

// Pad top, bottom, left, right with 114 so the image is centered in the target square
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2,
    wpad / 2, wpad - wpad / 2,
    ncnn::BORDER_CONSTANT, 114.f);

Direct resize pattern from `examples/squeezenet.cpp:30-34`:

ncnn::Mat in = ncnn::Mat::from_pixels_resize(img.data, ncnn::Mat::PIXEL_BGR,
    img.cols, img.rows, 227, 227);
