Principle:Tencent Ncnn Bounding Box Decoding
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Object_Detection |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Algorithm for converting raw neural network detection output tensors into spatial bounding box coordinates with class predictions, using Distribution Focal Loss (DFL) decoding for anchor-free detectors.
Description
Modern anchor-free object detectors (YOLOv8, YOLO11, NanoDet-Plus) encode each bounding box edge as a distribution over discrete offset bins rather than regressing coordinates directly. This representation is trained with Distribution Focal Loss (DFL): each box edge distance is predicted as a probability distribution over 16 bins (reg_max=16), and the decoded distance is the expected value of that distribution, measured in grid cells (multiples of the stride).
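The expected-value step can be illustrated with a minimal sketch in plain Python. The function name `dfl_expected_distance` is hypothetical and not taken from ncnn; the sketch assumes the 16 raw bin values for one edge are already available as a list of floats.

```python
import math

def dfl_expected_distance(logits):
    """Softmax over the bin logits, then the expected bin index."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    # Expected value: sum of (bin index * probability of that bin)
    return sum(i * e / total for i, e in enumerate(exps))

# Uniform logits over 16 bins decode to the midpoint of 0..15.
uniform = dfl_expected_distance([0.0] * 16)

# A distribution sharply peaked at bin 5 decodes to roughly 5.
peaked_logits = [0.0] * 16
peaked_logits[5] = 50.0
peaked = dfl_expected_distance(peaked_logits)
```

Because the result is an expectation rather than an argmax, the decoded distance can fall between bins, which is what gives the representation sub-bin resolution.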
The decoding process involves: (1) applying sigmoid activation to class scores, (2) filtering by confidence threshold, (3) computing DFL softmax over 16 bins per box edge, (4) computing expected distance values, and (5) converting grid-relative offsets to absolute pixel coordinates using stride-specific grid positions.
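Steps (1) and (2) can be sketched as follows. This is an illustrative assumption about how the filtering might be written, not the ncnn implementation; `best_class` and the threshold value 0.25 are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def best_class(raw_classes, threshold):
    """Return (class_index, score) for the top class, or None if it
    falls below the confidence threshold."""
    scores = [sigmoid(v) for v in raw_classes]
    idx = max(range(len(scores)), key=scores.__getitem__)
    if scores[idx] < threshold:
        return None  # skip box decoding for this grid cell
    return idx, scores[idx]

kept = best_class([-4.0, 2.0, -1.0], threshold=0.25)
dropped = best_class([-4.0, -3.0, -5.0], threshold=0.25)
```

Filtering before box decoding avoids running the softmax over the bin values for grid cells that would be discarded anyway.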
Older anchor-based detectors (YOLOv5) use a simpler decoding with predefined anchor boxes, where predictions are offsets relative to anchor positions.
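For comparison, an anchor-based decode might look like the sketch below. The specific formulas (`2 * sigmoid(t) - 0.5` for the center, `(2 * sigmoid(t)) ** 2` for the size) are an assumption based on the parameterization used by later YOLOv5 releases; the text above only states that predictions are offsets relative to anchors, and YOLOv3 uses a different parameterization. The function name and anchor sizes are hypothetical.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def decode_anchor_box(tx, ty, tw, th, gx, gy, stride, anchor_w, anchor_h):
    """Decode one anchor-based prediction into (x0, y0, x1, y1) pixels.

    Assumes the YOLOv5-style parameterization described in the lead-in.
    """
    cx = (2.0 * sigmoid(tx) - 0.5 + gx) * stride  # box center x
    cy = (2.0 * sigmoid(ty) - 0.5 + gy) * stride  # box center y
    w = (2.0 * sigmoid(tw)) ** 2 * anchor_w       # width scales the anchor
    h = (2.0 * sigmoid(th)) ** 2 * anchor_h       # height scales the anchor
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# Zero raw outputs place the box at the cell center with the anchor's size.
box = decode_anchor_box(0.0, 0.0, 0.0, 0.0, gx=3, gy=2, stride=8,
                        anchor_w=10.0, anchor_h=13.0)
```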
Usage
Use DFL decoding for anchor-free detectors (YOLOv8, YOLO11, NanoDet-Plus). Use anchor-based decoding for older models (YOLOv5, YOLOv3). The output tensor format determines which decoding method to apply.
Theoretical Basis
DFL (Distribution Focal Loss) decoding:
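For each detection at grid position (x, y) with stride s, the distance to each box edge e ∈ {left, top, right, bottom} is the expected bin index under the softmax of that edge's 16 raw bin values b_{e,0}, ..., b_{e,15}:

$$d_e = \sum_{i=0}^{15} i \cdot \frac{\exp(b_{e,i})}{\sum_{j=0}^{15} \exp(b_{e,j})}$$

Then convert to box coordinates:

$$x_0 = (x + 0.5 - d_{left}) \cdot s, \qquad y_0 = (y + 0.5 - d_{top}) \cdot s$$

$$x_1 = (x + 0.5 + d_{right}) \cdot s, \qquad y_1 = (y + 0.5 + d_{bottom}) \cdot s$$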
Pseudo-code:
// Abstract DFL decode algorithm
for each grid_position (gx, gy) at each stride:
    class_scores = sigmoid(raw_classes)
    if max(class_scores) < threshold: continue
    for each edge in {left, top, right, bottom}:      // edge index 0..3
        bins = raw_pred[edge * 16 : (edge+1) * 16]
        d_edge = sum(softmax(bins) * range(16))       // expected bin index
    // distances are in grid cells; the cell center is at (gx + 0.5, gy + 0.5)
    x0 = (gx + 0.5 - d_left) * stride
    y0 = (gy + 0.5 - d_top) * stride
    x1 = (gx + 0.5 + d_right) * stride
    y1 = (gy + 0.5 + d_bottom) * stride
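The final coordinate conversion can be checked with a small worked example. This is a sketch under the assumption that the four edge distances have already been decoded and are expressed in grid cells; `distances_to_box` is a hypothetical helper name.

```python
def distances_to_box(gx, gy, stride, d_left, d_top, d_right, d_bottom):
    """Convert decoded edge distances (in grid cells) to pixel coordinates."""
    cx = gx + 0.5  # cell center, in grid cells
    cy = gy + 0.5
    x0 = (cx - d_left) * stride
    y0 = (cy - d_top) * stride
    x1 = (cx + d_right) * stride
    y1 = (cy + d_bottom) * stride
    return (x0, y0, x1, y1)

# Cell (3, 2) on the stride-8 feature map, center at pixel (28, 20).
box = distances_to_box(3, 2, 8, d_left=1.5, d_top=2.0,
                       d_right=2.5, d_bottom=1.0)
# x0 = (3.5 - 1.5) * 8 = 16, y0 = (2.5 - 2.0) * 8 = 4
# x1 = (3.5 + 2.5) * 8 = 48, y1 = (2.5 + 1.0) * 8 = 28
```

With 16 bins the largest decodable distance is 15 grid cells, so a single stride level can represent boxes extending at most 15 * stride pixels from the cell center in each direction.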