Workflow: Tencent ncnn Object Detection Inference

Knowledge Sources	ncnn, ncnn Examples, PNNX Documentation
Domains	Object_Detection, Inference, Computer_Vision
Last Updated	2026-02-09 19:00 GMT

Overview

End-to-end process for running object detection inference using ncnn with YOLO-family models (YOLOv5, YOLOv7, YOLOv8, YOLO11) and other detection architectures.

Description

This workflow covers running object detection models through the ncnn inference framework. It spans the complete pipeline: loading a detection model, preprocessing input images with letterbox padding to preserve aspect ratio, running inference to obtain predictions at multiple feature-map scales, decoding bounding boxes from anchor-based or anchor-free detection heads, applying Non-Maximum Suppression (NMS) to filter overlapping detections, and rendering the final results.

Key outcomes:

A list of detected objects with bounding box coordinates, class labels, and confidence scores
Filtered results after NMS with configurable confidence and IoU thresholds

Usage

Execute this workflow when you have a detection model (such as YOLOv5, YOLOv8, NanoDet, SSD, or similar) converted to ncnn format and need to detect objects in images or video frames on edge devices.

Execution Steps

Step 1: Load the Detection Model

Create an ncnn::Net instance and load the detection model's .param and .bin files. For models requiring custom layers (e.g., YOLOv5 Focus layer in older versions), register custom layer implementations before loading the model.

Key considerations:

Some YOLO versions require custom layer registration via register_custom_layer
PNNX-converted models generally do not need custom layers
Set net.opt fields (e.g., num_threads, use_vulkan_compute) before calling load_param and load_model

Step 2: Preprocess Input Image with Letterbox Padding

Resize the input image to the model's expected input size while maintaining aspect ratio by adding padding (letterboxing). Convert the padded image to an ncnn::Mat and apply mean subtraction and normalization. The scale factor and padding offsets must be tracked for later coordinate mapping.

Key considerations:

Common input sizes are 640x640 (YOLO) or 320x320 (NanoDet)
Letterbox padding prevents distortion from non-square images
Normalization is typically 1/255.0 for YOLO models (pixel values to 0-1 range)
Record the scale ratio and padding offset for decoding output coordinates back to original image space
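The letterbox arithmetic can be sketched in plain C++. This is a minimal sketch under stated assumptions, not ncnn's API: LetterboxInfo and compute_letterbox are hypothetical names, and the code assumes a square target with the padding split evenly between opposite edges. Some of ncnn's example programs instead pad only up to a multiple of the largest stride, so check which convention a given model expects.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Everything needed later to undo the letterbox transform (hypothetical struct).
struct LetterboxInfo {
    float scale;     // resize factor applied to the original image
    int new_w;       // resized width before padding
    int new_h;       // resized height before padding
    int pad_left;    // padding added on each edge
    int pad_right;
    int pad_top;
    int pad_bottom;
};

// Fit a src_w x src_h image into a target x target canvas without distortion,
// centring it and splitting the leftover space into padding.
LetterboxInfo compute_letterbox(int src_w, int src_h, int target) {
    assert(src_w > 0 && src_h > 0 && target > 0);
    LetterboxInfo info{};
    info.scale = std::min(static_cast<float>(target) / src_w,
                          static_cast<float>(target) / src_h);
    info.new_w = static_cast<int>(std::round(src_w * info.scale));
    info.new_h = static_cast<int>(std::round(src_h * info.scale));
    const int pad_w = target - info.new_w;
    const int pad_h = target - info.new_h;
    info.pad_left = pad_w / 2;
    info.pad_right = pad_w - info.pad_left;  // odd remainder goes to the right
    info.pad_top = pad_h / 2;
    info.pad_bottom = pad_h - info.pad_top;  // odd remainder goes to the bottom
    return info;
}
```

For a 1280x720 frame and a 640 target this gives scale 0.5, a 640x360 resize, and 140 rows of padding above and below. In an ncnn pipeline, new_w/new_h would feed ncnn::Mat::from_pixels_resize, the four padding values would feed ncnn::copy_make_border, and YOLO-style normalization would typically be substract_mean_normalize with no mean and norm values of 1/255.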

Step 3: Execute Multi-Scale Detection Inference

Run the forward pass through the detection model. Detection models typically produce outputs at multiple scales (e.g., stride 8, 16, 32 feature maps) that capture objects of different sizes. Extract all output blobs from the network.

Key considerations:

PNNX-converted models may produce a single concatenated output tensor
Older conversion paths may require extracting separate output blobs per scale
Anchor-based models (YOLOv5, YOLOv7) use predefined anchor sizes per scale
Anchor-free models (YOLOv8, YOLO11, NanoDet) predict each box edge as a discrete distribution over regression bins, trained with DFL (Distribution Focal Loss)
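To see how multi-scale outputs line up, the sketch below enumerates the grid cells an anchor-free head predicts from. It assumes strides 8, 16, and 32, cells ordered stride by stride in row-major order, and a YOLOv8-style +0.5 center offset; GridPoint and make_grid_points are hypothetical names, and other architectures may order or offset cells differently.

```cpp
#include <cassert>
#include <vector>

// One grid cell of an anchor-free head: cell centre in input pixels plus its stride.
struct GridPoint {
    float cx;
    float cy;
    int stride;
};

// Enumerate cells for each stride in the given order (assumed: 8, then 16, then 32),
// row-major within each scale. Assumes the input size is divisible by every stride.
std::vector<GridPoint> make_grid_points(int input_w, int input_h,
                                        const std::vector<int>& strides) {
    std::vector<GridPoint> points;
    for (int stride : strides) {
        assert(input_w % stride == 0 && input_h % stride == 0);
        const int grid_w = input_w / stride;
        const int grid_h = input_h / stride;
        for (int gy = 0; gy < grid_h; ++gy) {
            for (int gx = 0; gx < grid_w; ++gx) {
                // +0.5 places the point at the cell centre (YOLOv8-style convention).
                points.push_back({(gx + 0.5f) * stride, (gy + 0.5f) * stride, stride});
            }
        }
    }
    return points;
}
```

For a 640x640 input this yields 80x80 + 40x40 + 20x20 = 8400 cells, which is why a single concatenated YOLOv8-style output typically has 8400 entries along its proposal axis.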

Step 4: Decode Bounding Box Predictions

Parse the raw network output tensors to extract bounding box coordinates and class scores. For anchor-based models, apply the grid-cell offsets and per-scale anchor sizes. For anchor-free models, decode the DFL distributions into box-edge distances. Apply sigmoid activation to class scores (unless the exported model already includes it) and filter proposals by a confidence threshold.
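For anchor-based heads, a minimal sketch of the YOLOv5-style decode of a single prediction looks like this: sigmoid-activated offsets are added to the grid cell index and scaled by the stride, while the activated width and height scale the anchor size. Function and struct names are hypothetical, anchors are assumed to be in input pixels, and other anchor-based models (e.g., SSD) use different formulas.

```cpp
#include <cassert>
#include <cmath>

static float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

struct BoxCXCYWH {
    float cx, cy, w, h;  // centre and size in letterboxed input pixels
};

// YOLOv5-style decode of one raw prediction (tx, ty, tw, th) at grid cell (gx, gy).
// anchor_w / anchor_h are in input pixels, e.g. YOLOv5's default stride-8 anchors
// (10,13), (16,30), (33,23).
BoxCXCYWH decode_yolov5_box(float tx, float ty, float tw, float th,
                            int gx, int gy, int stride,
                            float anchor_w, float anchor_h) {
    assert(stride > 0 && anchor_w > 0.0f && anchor_h > 0.0f);
    BoxCXCYWH b;
    // Centre offset lies in (-0.5, 1.5) cells around the grid cell's corner.
    b.cx = (sigmoid(tx) * 2.0f - 0.5f + gx) * stride;
    b.cy = (sigmoid(ty) * 2.0f - 0.5f + gy) * stride;
    const float pw = sigmoid(tw) * 2.0f;
    const float ph = sigmoid(th) * 2.0f;
    // (2 * sigmoid)^2 keeps each side between 0 and 4 times the anchor size.
    b.w = pw * pw * anchor_w;
    b.h = ph * ph * anchor_h;
    return b;
}

// YOLOv5-style confidence: objectness probability times class probability.
float yolov5_score(float obj_logit, float cls_logit) {
    return sigmoid(obj_logit) * sigmoid(cls_logit);
}
```

With all-zero logits every sigmoid is 0.5, so the box sits at the center of its cell with exactly the anchor's size and a score of 0.25.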

Key considerations:

Box format varies: some models output center-x/y/w/h, others output corner coordinates, and DFL heads output distances from the grid-cell center to each box edge
DFL decoding involves computing expected values from discrete probability distributions over regression bins
Apply confidence threshold early to reduce the number of candidates for NMS
Map coordinates back to original image space using the letterbox scale and padding offsets
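DFL decoding and early confidence filtering can be sketched as follows. This assumes a YOLOv8-style layout with reg_max = 16 bins per edge, edges ordered left, top, right, bottom, and distances expressed in stride units; the function names are hypothetical, and the actual tensor layout of a converted model should be checked before relying on it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// DFL integral decoding: softmax over n bins, then the expected bin index.
float dfl_expectation(const float* logits, int n) {
    assert(n > 0);
    const float max_logit = *std::max_element(logits, logits + n);  // numerical stability
    float denom = 0.0f;
    float weighted = 0.0f;
    for (int i = 0; i < n; ++i) {
        const float e = std::exp(logits[i] - max_logit);
        denom += e;
        weighted += e * static_cast<float>(i);
    }
    return weighted / denom;
}

struct BoxXYXY {
    float x0, y0, x1, y1;  // corners in letterboxed input pixels
};

// Decode one box from 4 * reg_max logits ordered left, top, right, bottom.
// Each expectation is a distance in stride units from the cell centre (cx, cy).
BoxXYXY decode_dfl_box(const std::vector<float>& reg, int reg_max,
                       float cx, float cy, int stride) {
    assert(static_cast<int>(reg.size()) == 4 * reg_max);
    float d[4];
    for (int k = 0; k < 4; ++k) {
        d[k] = dfl_expectation(reg.data() + k * reg_max, reg_max) * static_cast<float>(stride);
    }
    return {cx - d[0], cy - d[1], cx + d[2], cy + d[3]};
}

// Early confidence filter: argmax over raw logits (valid because sigmoid is
// monotonic), then a single sigmoid for the winner. Returns -1 if below threshold.
int best_class(const float* cls_logits, int num_classes, float conf_thresh, float& score) {
    assert(num_classes > 0);
    const float* best = std::max_element(cls_logits, cls_logits + num_classes);
    score = 1.0f / (1.0f + std::exp(-*best));
    return score >= conf_thresh ? static_cast<int>(best - cls_logits) : -1;
}
```

Running best_class first and only decoding boxes for cells that pass the threshold keeps the expensive softmax work proportional to the number of surviving candidates.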

Step 5: Apply Non-Maximum Suppression

Filter overlapping detections using Non-Maximum Suppression (NMS). Sort candidates by confidence, then iteratively suppress any detection whose IoU (Intersection over Union) with an already-kept, higher-confidence detection of the same class exceeds the IoU threshold.

Key considerations:

Typical NMS IoU threshold is 0.45-0.65
Typical confidence threshold is 0.25-0.5
Class-aware NMS treats each class independently; class-agnostic NMS suppresses overlapping boxes regardless of class
The output is the final set of non-overlapping detections
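A greedy, class-aware NMS fits in a few lines of standard C++. This is a minimal O(n^2) sketch with hypothetical names (Detection, iou, nms_class_aware); ncnn's example programs ship their own NMS helpers, which this does not reproduce. Dropping the label comparison turns it into class-agnostic NMS.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Detection {
    float x0, y0, x1, y1;  // corner coordinates
    int label;
    float score;
};

// Intersection over Union of two corner-format boxes.
float iou(const Detection& a, const Detection& b) {
    const float iw = std::max(0.0f, std::min(a.x1, b.x1) - std::max(a.x0, b.x0));
    const float ih = std::max(0.0f, std::min(a.y1, b.y1) - std::max(a.y0, b.y0));
    const float inter = iw * ih;
    const float area_a = (a.x1 - a.x0) * (a.y1 - a.y0);
    const float area_b = (b.x1 - b.x0) * (b.y1 - b.y0);
    const float uni = area_a + area_b - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy class-aware NMS: a box is dropped only if a higher-scoring kept box
// with the same label overlaps it by more than iou_thresh.
std::vector<Detection> nms_class_aware(std::vector<Detection> dets, float iou_thresh) {
    assert(iou_thresh >= 0.0f && iou_thresh <= 1.0f);
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) { return a.score > b.score; });
    std::vector<Detection> kept;
    for (const Detection& d : dets) {
        bool suppressed = false;
        for (const Detection& k : kept) {
            if (k.label == d.label && iou(k, d) > iou_thresh) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}
```

Two 10x10 boxes offset by one pixel have IoU 81/119 (about 0.68), so with a 0.45 threshold the lower-scoring one is removed when both share a label and kept when the labels differ.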

Step 6: Render Detection Results

Draw bounding boxes, class labels, and confidence scores on the original image. If the decoded coordinates are still in letterboxed input space, first map them back to the original image dimensions by subtracting the padding offsets and dividing by the scale factor.

Key considerations:

Use OpenCV or ncnn's built-in drawing functions for visualization
ncnn provides ncnn::draw_rectangle_c3, ncnn::draw_text_c3, and similar functions
Color-code different object classes for visual clarity
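Mapping boxes back to the original image and choosing a stable color per class can be sketched as below. It assumes corner-format boxes plus the letterbox scale and left/top padding recorded during preprocessing; unletterbox, class_color, and the palette values are hypothetical, and the results would then be passed to OpenCV or ncnn's drawing helpers.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

struct Rect {
    float x0, y0, x1, y1;  // corner coordinates
};

// Clamp v into [lo, hi] (written out because std::clamp needs C++17).
static float clamp_to(float v, float lo, float hi) {
    return std::min(std::max(v, lo), hi);
}

// Undo the letterbox transform: subtract the padding offsets, divide by the
// resize scale, then clamp to the original image bounds.
Rect unletterbox(const Rect& r, float scale, int pad_left, int pad_top,
                 int orig_w, int orig_h) {
    assert(scale > 0.0f && orig_w > 0 && orig_h > 0);
    const float max_x = static_cast<float>(orig_w - 1);
    const float max_y = static_cast<float>(orig_h - 1);
    Rect out;
    out.x0 = clamp_to((r.x0 - pad_left) / scale, 0.0f, max_x);
    out.y0 = clamp_to((r.y0 - pad_top) / scale, 0.0f, max_y);
    out.x1 = clamp_to((r.x1 - pad_left) / scale, 0.0f, max_x);
    out.y1 = clamp_to((r.y1 - pad_top) / scale, 0.0f, max_y);
    return out;
}

// Deterministic RGB colour per class index from a small hypothetical palette,
// so the same class is always drawn in the same colour.
std::array<unsigned char, 3> class_color(int label) {
    static const unsigned char palette[][3] = {
        {230, 25, 75}, {60, 180, 75}, {255, 225, 25}, {0, 130, 200},
        {245, 130, 48}, {145, 30, 180}, {70, 240, 240}, {240, 50, 230},
    };
    const int n = static_cast<int>(sizeof(palette) / sizeof(palette[0]));
    const int i = ((label % n) + n) % n;  // wraps large and negative labels
    return {{palette[i][0], palette[i][1], palette[i][2]}};
}
```

Clamping matters because boxes near the image border can extend into the padded region, which maps to coordinates outside the original frame.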
