Principle:Tencent Ncnn Bounding Box Decoding

Knowledge Sources	ncnn Generalized Focal Loss V2 YOLOv8
Domains	Computer_Vision, Object_Detection
Last Updated	2026-02-09 00:00 GMT

Overview

Algorithm for converting raw neural network detection output tensors into spatial bounding box coordinates with class predictions, using Distribution Focal Loss (DFL) decoding for anchor-free detectors.

Description

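Modern anchor-free object detectors (YOLOv8, YOLO11, NanoDet-Plus) encode each bounding box edge as a probability distribution over discrete offset bins instead of regressing coordinates directly. The representation is commonly named after the loss used to train it, Distribution Focal Loss (DFL). Each of the four edge distances (left, top, right, bottom) is predicted as logits over 16 bins (reg_max=16), and the decoded distance is the expected bin index under the softmax of those logits, measured in units of the feature-map stride. The bin count is a model hyperparameter; 16 is the value assumed throughout this page.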
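As an illustration of the expected-value step, the following minimal sketch decodes one edge from 16 bin logits. The function names are hypothetical and chosen for this example; they are not taken from ncnn.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_expected_distance(bin_logits):
    """Expected bin index under softmax(bin_logits), in stride units."""
    probs = softmax(bin_logits)
    return sum(j * p for j, p in enumerate(probs))


# Uniform logits spread the probability evenly over bins 0..15.
uniform = dfl_expected_distance([0.0] * 16)

# A sharp peak at bin 3 decodes to a distance very close to 3.
peaked_logits = [-20.0] * 16
peaked_logits[3] = 20.0
peaked = dfl_expected_distance(peaked_logits)
```

Because the result is a weighted average, the decoded distance can fall between bins, which is what lets a 16-bin head express sub-bin precision.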

The decoding process involves: (1) applying sigmoid activation to class scores, (2) filtering by confidence threshold, (3) computing DFL softmax over 16 bins per box edge, (4) computing expected distance values, and (5) converting grid-relative offsets to absolute pixel coordinates using stride-specific grid positions.
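Steps (1) and (2) can be sketched as below. This is an illustrative helper, not ncnn code: the name `best_class` and the choice to keep only the top-scoring class per grid cell are assumptions made for the example.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def best_class(raw_class_logits, threshold):
    """Return (label, score) for the top class, or None if it is below threshold."""
    scores = [sigmoid(v) for v in raw_class_logits]
    label = max(range(len(scores)), key=scores.__getitem__)
    if scores[label] < threshold:
        return None
    return label, scores[label]


kept = best_class([-2.0, 3.0, 0.5], threshold=0.25)
dropped = best_class([-5.0, -4.0], threshold=0.25)
```

Since sigmoid is monotonic, the same filter can be applied to the raw logit by comparing it against log(t / (1 - t)) for threshold t, which avoids evaluating sigmoid for cells that are rejected.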

Older anchor-based detectors (YOLOv5) use a simpler decoding built on predefined anchor boxes: the network predicts a center offset relative to the grid cell and a width and height scale relative to the anchor's size.
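For contrast, here is a minimal sketch of anchor-based decoding. It assumes the YOLOv5-style parameterization, in which the sigmoid outputs are mapped to a center offset of 2σ - 0.5 and a size factor of (2σ)²; other anchor-based models (for example YOLOv3) use different formulas. The function name and the anchor size in the example are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_anchor_box(raw, gx, gy, stride, anchor_w, anchor_h):
    """Decode raw = (tx, ty, tw, th) logits into (x0, y0, x1, y1) pixels.

    Assumes the YOLOv5-style mapping; anchor_w and anchor_h are in pixels.
    """
    sx, sy, sw, sh = (sigmoid(v) for v in raw)
    cx = (sx * 2.0 - 0.5 + gx) * stride   # box center x
    cy = (sy * 2.0 - 0.5 + gy) * stride   # box center y
    w = (sw * 2.0) ** 2 * anchor_w        # width scaled from the anchor
    h = (sh * 2.0) ** 2 * anchor_h        # height scaled from the anchor
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# All-zero logits place the box at the cell center with the anchor's own size.
box = decode_anchor_box((0.0, 0.0, 0.0, 0.0), gx=4, gy=2, stride=8,
                        anchor_w=10.0, anchor_h=13.0)
```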

Usage

Use DFL decoding for anchor-free detectors (YOLOv8, YOLO11, NanoDet-Plus). Use anchor-based decoding for older models (YOLOv5, YOLOv3). The output tensor format determines which decoding method to apply.

Theoretical Basis

DFL (Distribution Focal Loss) decoding:

For each detection at grid position (x, y) with stride s:

$d_{i} = \sum_{j=0}^{15} j \cdot \operatorname{softmax}(\mathrm{pred}_{i,\,0:16})[j], \quad i \in \{\mathrm{left}, \mathrm{top}, \mathrm{right}, \mathrm{bottom}\}$

Then convert to box coordinates:

$x_{0} = (x + 0.5 - d_{\mathrm{left}}) \times s$

$y_{0} = (y + 0.5 - d_{\mathrm{top}}) \times s$

$x_{1} = (x + 0.5 + d_{\mathrm{right}}) \times s$

$y_{1} = (y + 0.5 + d_{\mathrm{bottom}}) \times s$
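A small worked example of the coordinate conversion, with made-up distances, may help check the arithmetic. The function name is hypothetical.

```python
def distances_to_box(gx, gy, stride, d_left, d_top, d_right, d_bottom):
    """Convert decoded edge distances (in stride units) to pixel corners."""
    cx = gx + 0.5  # cell center, in grid units
    cy = gy + 0.5
    x0 = (cx - d_left) * stride
    y0 = (cy - d_top) * stride
    x1 = (cx + d_right) * stride
    y1 = (cy + d_bottom) * stride
    return x0, y0, x1, y1


# Grid cell (10, 5) on the stride-8 level, with illustrative distances.
box = distances_to_box(10, 5, 8, d_left=2.0, d_top=1.5, d_right=3.0, d_bottom=2.5)
# x0 = (10.5 - 2.0) * 8 = 68, y0 = (5.5 - 1.5) * 8 = 32
# x1 = (10.5 + 3.0) * 8 = 108, y1 = (5.5 + 2.5) * 8 = 64
```

The resulting coordinates are in the pixel space of the network input, so any resize or padding applied before inference still has to be undone to reach original-image coordinates.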

Pseudo-code:

// Abstract DFL decode algorithm
for each grid_position (gx, gy) at each stride:
    class_scores = sigmoid(raw_classes)
    if max(class_scores) < threshold: continue

    // one expected distance per edge (edge index 0..3), in stride units
    for each edge in {left, top, right, bottom}:
        bins = raw_pred[edge * 16 : (edge+1) * 16]
        d[edge] = sum(softmax(bins) * range(16))

    x0 = (gx + 0.5 - d[left]) * stride
    y0 = (gy + 0.5 - d[top]) * stride
    x1 = (gx + 0.5 + d[right]) * stride
    y1 = (gy + 0.5 + d[bottom]) * stride

    emit box (x0, y0, x1, y1) with label argmax(class_scores) and score max(class_scores)
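The pseudo-code above can be turned into a runnable sketch for a single feature-map level. The memory layout is an assumption made for this example: each grid cell is a flat list holding 4 × 16 box logits followed by the class logits, with cells stored in row-major order. Real ncnn output blobs may be laid out differently, and the names `decode_level` and `peak_at` are hypothetical.

```python
import math

REG_MAX = 16  # bins per edge, as stated in the text


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_level(rows, grid_w, stride, threshold):
    """Decode one level; rows[i] is the assumed per-cell layout (row-major)."""
    detections = []
    for i, row in enumerate(rows):
        gy, gx = divmod(i, grid_w)
        scores = [sigmoid(v) for v in row[4 * REG_MAX:]]
        label = max(range(len(scores)), key=scores.__getitem__)
        if scores[label] < threshold:
            continue  # skip low-confidence cells before the DFL work
        d = []
        for edge in range(4):  # left, top, right, bottom
            bins = row[edge * REG_MAX:(edge + 1) * REG_MAX]
            d.append(sum(j * p for j, p in enumerate(softmax(bins))))
        cx, cy = gx + 0.5, gy + 0.5
        box = ((cx - d[0]) * stride, (cy - d[1]) * stride,
               (cx + d[2]) * stride, (cy + d[3]) * stride)
        detections.append((box, label, scores[label]))
    return detections


def peak_at(bin_index):
    """Test helper: logits sharply peaked at one bin."""
    logits = [-20.0] * REG_MAX
    logits[bin_index] = 20.0
    return logits


# 2x2 grid at stride 8; only the last cell, (gx=1, gy=1), has a confident class.
background = [0.0] * (4 * REG_MAX) + [-10.0, -10.0]
confident = peak_at(1) + peak_at(1) + peak_at(2) + peak_at(2) + [-10.0, 5.0]
detections = decode_level([background] * 3 + [confident], grid_w=2,
                          stride=8, threshold=0.25)
```

The sketch stops at per-cell decoding; merging the levels and suppressing overlapping boxes are separate downstream steps.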

Related Pages

Implemented By

Implementation:Tencent_Ncnn_DFL_Anchor_Free_Decode
