Workflow:Tencent Ncnn Object Detection Inference
| Knowledge Sources | |
|---|---|
| Domains | Object_Detection, Inference, Computer_Vision |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
End-to-end process for running object detection inference using ncnn with YOLO-family models (YOLOv5, YOLOv7, YOLOv8, YOLO11) and other detection architectures.
Description
This workflow covers running object detection models through the ncnn inference framework. The pipeline has six stages: loading a detection model, preprocessing input images with letterbox padding to maintain aspect ratio, executing multi-output inference to extract feature maps at different scales, decoding bounding box predictions from anchor-based or anchor-free detection heads, applying Non-Maximum Suppression (NMS) to filter overlapping detections, and rendering the final detection results.
Key outcomes:
- A list of detected objects with bounding box coordinates, class labels, and confidence scores
- Filtered results after NMS with configurable confidence and IoU thresholds
Usage
Execute this workflow when you have a detection model (such as YOLOv5, YOLOv8, NanoDet, SSD, or similar) converted to ncnn format and need to detect objects in images or video frames on edge devices.
Execution Steps
Step 1: Load the Detection Model
Create an ncnn::Net instance and load the detection model's .param and .bin files. For models requiring custom layers (e.g., YOLOv5 Focus layer in older versions), register custom layer implementations before loading the model.
Key considerations:
- Some YOLO versions require custom layer registration via register_custom_layer
- PNNX-converted models generally do not need custom layers
- Set net.opt fields such as num_threads and use_vulkan_compute before loading the model, so the options are in effect when the network is built
Step 2: Preprocess Input Image with Letterbox Padding
Resize the input image to the model's expected input size while maintaining aspect ratio by adding padding (letterboxing). Convert the padded image to an ncnn::Mat and apply mean subtraction and normalization. The scale factor and padding offsets must be tracked for later coordinate mapping.
Key considerations:
- Common input sizes are 640x640 (YOLO) or 320x320 (NanoDet)
- Letterbox padding prevents distortion from non-square images
- Normalization for YOLO models typically multiplies pixel values by 1/255.0, mapping them into the 0-1 range
- Record the scale ratio and padding offset for decoding output coordinates back to original image space
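The letterbox geometry described above can be sketched as follows. This is a minimal illustration that assumes a square target size and centered padding; the `LetterboxParams` struct and `compute_letterbox` function are hypothetical names, not part of ncnn. Some pipelines instead pad only up to a multiple of the largest stride, which changes the padding values but not the idea.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper type: the geometry needed to undo the letterbox later.
struct LetterboxParams {
    float scale;     // resize ratio applied to the source image
    int new_w;       // resized width before padding
    int new_h;       // resized height before padding
    int pad_left;    // padding columns added on the left
    int pad_top;     // padding rows added on the top
    int pad_right;   // padding columns added on the right
    int pad_bottom;  // padding rows added on the bottom
};

// Compute the resize ratio and centered padding for a square target
// (assumption: target x target input, e.g. 640 for many YOLO models).
LetterboxParams compute_letterbox(int src_w, int src_h, int target) {
    assert(src_w > 0 && src_h > 0 && target > 0);

    // Use the smaller ratio so the whole image fits inside the target.
    const float scale = std::min(static_cast<float>(target) / src_w,
                                 static_cast<float>(target) / src_h);

    LetterboxParams p;
    p.scale = scale;
    p.new_w = std::max(1, std::min(target, static_cast<int>(std::round(src_w * scale))));
    p.new_h = std::max(1, std::min(target, static_cast<int>(std::round(src_h * scale))));

    // Split the leftover space evenly; any odd pixel goes to right/bottom.
    const int pad_w = target - p.new_w;
    const int pad_h = target - p.new_h;
    p.pad_left = pad_w / 2;
    p.pad_right = pad_w - p.pad_left;
    p.pad_top = pad_h / 2;
    p.pad_bottom = pad_h - p.pad_top;
    return p;
}
```

For a 1280x720 frame and a 640 target, this gives a scale of 0.5, a resized image of 640x360, and 140 rows of padding above and below. The `scale`, `pad_left`, and `pad_top` values are the ones to keep for Step 4.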
Step 3: Execute Multi-Scale Detection Inference
Run the forward pass through the detection model. Detection models typically produce outputs at multiple scales (e.g., stride 8, 16, 32 feature maps) that capture objects of different sizes. Extract all output blobs from the network.
Key considerations:
- PNNX-converted models may produce a single concatenated output tensor
- Older conversion paths may require extracting separate output blobs per scale
- Anchor-based models (YOLOv5, YOLOv7) use predefined anchor sizes per scale
- Anchor-free models (YOLOv8, YOLO11, NanoDet) regress each box side as a discrete distribution over bins, a head design trained with DFL (Distribution Focal Loss)
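To make the multi-scale layout concrete, the sketch below enumerates the grid cells that each stride contributes. It assumes the input dimensions are divisible by every stride and that predictions are ordered stride by stride, row by row; the actual ordering depends on how the model was exported. `GridPoint` and `make_grid_points` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type: one prediction location on one feature map.
struct GridPoint {
    int gx;      // column index on the feature map
    int gy;      // row index on the feature map
    int stride;  // downsampling factor of that feature map
};

// Enumerate grid cells for every stride, in stride-major, row-major order
// (assumption: this matches the layout of the exported model's output).
std::vector<GridPoint> make_grid_points(int input_w, int input_h,
                                        const std::vector<int>& strides) {
    assert(input_w > 0 && input_h > 0);
    std::vector<GridPoint> points;
    for (int stride : strides) {
        assert(stride > 0);
        const int num_x = input_w / stride;
        const int num_y = input_h / stride;
        for (int y = 0; y < num_y; ++y) {
            for (int x = 0; x < num_x; ++x) {
                points.push_back({x, y, stride});
            }
        }
    }
    return points;
}
```

With a 640x640 input and strides 8, 16, and 32, the feature maps are 80x80, 40x40, and 20x20, for 8400 candidate locations in total. A single concatenated output tensor from an anchor-free model would typically have one row or column per such location.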
Step 4: Decode Bounding Box Predictions
Parse the raw network output tensors to extract bounding box coordinates and class scores. For anchor-based models, apply anchor offsets and scaling. For anchor-free models, decode the DFL distribution into box coordinates. Apply sigmoid activation to class scores and filter proposals by a confidence threshold.
Key considerations:
- Box format varies: some models output center-x/y/w/h, others output corner coordinates
- DFL decoding involves computing expected values from discrete probability distributions over regression bins
- Apply confidence threshold early to reduce the number of candidates for NMS
- Map coordinates back to original image space using the letterbox scale and padding offsets
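A minimal sketch of anchor-free decoding and coordinate mapping follows. It assumes a YOLOv8-style head: per location, 4 x `reg_max` logits ordered left, top, right, bottom, with anchor points at cell centers (offset 0.5). Other models, NanoDet included, may use a different offset or ordering. All function and struct names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Box { float x0, y0, x1, y1; };

// Logistic sigmoid, used to turn raw class logits into scores.
inline float sigmoid(float x) { return 1.f / (1.f + std::exp(-x)); }

// Expected bin index of a softmax distribution over bins 0..n-1.
float dfl_expectation(const float* logits, int n) {
    assert(n > 0);
    const float max_logit = *std::max_element(logits, logits + n);
    float sum = 0.f, weighted = 0.f;
    for (int i = 0; i < n; ++i) {
        const float e = std::exp(logits[i] - max_logit);  // stable softmax
        sum += e;
        weighted += e * static_cast<float>(i);
    }
    return weighted / sum;
}

// Decode one location into a box in letterboxed input coordinates.
Box decode_dfl_box(const std::vector<float>& logits, int reg_max,
                   int gx, int gy, int stride) {
    assert(static_cast<int>(logits.size()) == 4 * reg_max);
    float dist[4];  // left, top, right, bottom distances in pixels
    for (int k = 0; k < 4; ++k) {
        dist[k] = dfl_expectation(logits.data() + k * reg_max, reg_max) * stride;
    }
    const float cx = (gx + 0.5f) * stride;  // assumed cell-center anchor
    const float cy = (gy + 0.5f) * stride;
    return {cx - dist[0], cy - dist[1], cx + dist[2], cy + dist[3]};
}

// Undo letterboxing: remove the padding, divide by the scale, clamp to image.
Box unletterbox(const Box& b, float scale, int pad_left, int pad_top,
                int src_w, int src_h) {
    assert(scale > 0.f && src_w > 0 && src_h > 0);
    const float max_x = static_cast<float>(src_w - 1);
    const float max_y = static_cast<float>(src_h - 1);
    Box out;
    out.x0 = std::max(0.f, std::min((b.x0 - pad_left) / scale, max_x));
    out.y0 = std::max(0.f, std::min((b.y0 - pad_top) / scale, max_y));
    out.x1 = std::max(0.f, std::min((b.x1 - pad_left) / scale, max_x));
    out.y1 = std::max(0.f, std::min((b.y1 - pad_top) / scale, max_y));
    return out;
}
```

In practice, compute the class score with `sigmoid` first and call `decode_dfl_box` only for locations that pass the confidence threshold, since the softmax over regression bins is the more expensive part.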
Step 5: Apply Non-Maximum Suppression
Filter overlapping detections using Non-Maximum Suppression (NMS). Sort candidates by confidence in descending order, then iteratively suppress any detection whose IoU (Intersection over Union) with an already-kept, higher-confidence detection of the same class exceeds the threshold.
Key considerations:
- Typical NMS IoU threshold is 0.45-0.65
- Typical confidence threshold is 0.25-0.5
- Class-aware NMS treats each class independently, so boxes of different classes never suppress each other
- The output is the final set of non-overlapping detections
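The greedy procedure above can be sketched as a straightforward O(n²) class-aware NMS. The `Detection` struct and function names are hypothetical, and boxes are assumed to be in corner format (x0, y0, x1, y1).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical detection record in corner format.
struct Detection {
    float x0, y0, x1, y1;
    int label;
    float score;
};

// Intersection over Union of two axis-aligned boxes.
float iou(const Detection& a, const Detection& b) {
    const float inter_w = std::max(0.f, std::min(a.x1, b.x1) - std::max(a.x0, b.x0));
    const float inter_h = std::max(0.f, std::min(a.y1, b.y1) - std::max(a.y0, b.y0));
    const float inter = inter_w * inter_h;
    const float area_a = (a.x1 - a.x0) * (a.y1 - a.y0);
    const float area_b = (b.x1 - b.x0) * (b.y1 - b.y0);
    const float uni = area_a + area_b - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Greedy class-aware NMS: keep a detection unless it overlaps an
// already-kept detection of the same class by more than iou_threshold.
std::vector<Detection> nms_class_aware(std::vector<Detection> dets,
                                       float iou_threshold) {
    assert(iou_threshold >= 0.f && iou_threshold <= 1.f);

    // Highest confidence first.
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) { return a.score > b.score; });

    std::vector<Detection> kept;
    for (const Detection& d : dets) {
        bool keep = true;
        for (const Detection& k : kept) {
            if (k.label == d.label && iou(k, d) > iou_threshold) {
                keep = false;
                break;
            }
        }
        if (keep) kept.push_back(d);
    }
    return kept;
}
```

Dropping the `k.label == d.label` check turns this into class-agnostic NMS, which can be preferable when near-duplicate boxes with different labels are unwanted.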
Step 6: Render Detection Results
Draw bounding boxes, class labels, and confidence scores on the original image. The detections should already be in original-image coordinates from Step 4; if they are still in letterboxed input space, undo the scale and padding first, then clamp the boxes to the image bounds.
Key considerations:
- Use OpenCV or ncnn's built-in drawing functions for visualization
- ncnn provides ncnn::draw_rectangle_c3, ncnn::draw_text_c3 and similar functions
- Color-code different object classes for visual clarity
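As a small illustration of color-coding and labeling, the sketch below picks a stable color per class and formats the label text. The palette values and the `class_color` and `format_label` helpers are hypothetical choices for this example; the actual drawing calls (OpenCV or ncnn's drawing functions) are left out.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

struct Color { unsigned char r, g, b; };

// Return a stable color for a class id by cycling through a small palette.
Color class_color(int class_id) {
    assert(class_id >= 0);
    static const Color palette[] = {
        {230, 25, 75},  {60, 180, 75},  {255, 225, 25}, {0, 130, 200},
        {245, 130, 48}, {145, 30, 180}, {70, 240, 240}, {240, 50, 230},
    };
    const int count = static_cast<int>(sizeof(palette) / sizeof(palette[0]));
    return palette[class_id % count];
}

// Build label text such as "person 87.5%" from a class name and a score in [0, 1].
std::string format_label(const std::string& class_name, float score) {
    char buffer[128];
    std::snprintf(buffer, sizeof(buffer), "%s %.1f%%", class_name.c_str(), score * 100.f);
    return std::string(buffer);
}
```

Because the color depends only on the class id, the same class is drawn in the same color across frames, which helps when reviewing video output.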