Principle:Tencent Ncnn Oriented Bounding Box Decoding

Knowledge Sources	Tencent_Ncnn
Domains	Computer_Vision, Object_Detection
Last Updated	2026-02-09 19:00 GMT

Overview

Decodes rotated bounding boxes, each carrying an angle component, from neural network output tensors so that objects can be detected at arbitrary orientations.

Description

Oriented bounding box (OBB) decoding extends standard axis-aligned bounding box detection to handle objects that appear at arbitrary rotation angles in the image. This is essential for domains where objects are not aligned with the image axes: aerial/satellite imagery (vehicles, ships, buildings viewed from above), document analysis (rotated text regions), and industrial inspection (parts at arbitrary orientations on a conveyor belt).

The network produces two output tensors. The first holds per-anchor detection data in the standard format: DFL bounding box regression (4 sides x 16 bins) followed by per-class confidence scores. The second holds a single angular value per anchor that encodes the rotation of the bounding box. The angle is emitted as a raw value; after activation and scaling it gives the rotation in radians, in a range that depends on the model's angle convention, such as [0, pi/2) or [-pi/4, 3*pi/4).

Post-processing proceeds in three steps: DFL bins are decoded into axis-aligned box coordinates, confidence thresholding discards low-scoring detections, and the angular value is combined with the box center and dimensions to form a rotated rectangle (center_x, center_y, width, height, angle). NMS for oriented boxes then needs the intersection area of two rotated rectangles, which is harder to compute than axis-aligned IoU: the intersection polygon of the two rectangles must be found first, and its area is then computed from its vertices.
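
The DFL decoding step can be sketched as follows. This is a minimal illustration, not the ncnn implementation: the function names, the left/top/right/bottom side order, the `stride` scaling, and the anchor-center arguments are all assumptions made for the example.

```python
import math

def dfl_expectation(logits):
    """Softmax over the bin logits, then return the expected bin index."""
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return sum(i * e for i, e in enumerate(exps)) / total

def decode_dfl_box(dfl, anchor_x, anchor_y, stride, reg_max=16):
    """Turn 4 * reg_max DFL logits into (cx, cy, w, h) in input-image pixels.

    Assumption: sides are ordered left, top, right, bottom and distances are
    expressed in units of `stride`, measured from the anchor center.
    """
    left, top, right, bottom = (
        dfl_expectation(dfl[k * reg_max:(k + 1) * reg_max]) * stride
        for k in range(4)
    )
    x1, y1 = anchor_x - left, anchor_y - top
    x2, y2 = anchor_x + right, anchor_y + bottom
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)
```

The returned center and size are the axis-aligned quantities that are later paired with the decoded angle to form the rotated rectangle.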

In the YOLO-OBB angle representation, the angular output is passed through a sigmoid, scaled to [0, pi/2), and then shifted by subtracting pi/4, giving an effective range of [-pi/4, pi/4). Width and height may be swapped when the angle exceeds certain thresholds, so that each box keeps a canonical representation.

Usage

Apply this principle for detection tasks where objects have significant rotational variation and axis-aligned boxes would include excessive background. Common applications include remote sensing, scene text detection, face detection at extreme head poses, and any manufacturing or logistics scenario with arbitrarily oriented parts.

Theoretical Basis

Oriented bounding box representation: $\mathrm{OBB} = (c_x, c_y, w, h, \theta)$

where $(c_x, c_y)$ is the box center, $(w, h)$ are the width and height, and $\theta$ is the rotation angle in radians.

Output tensor layout (YOLO11-OBB):

Detection tensor: [N_boxes, 79]   (e.g., 15 classes)
  Columns  0..63 : bbox DFL regression (4 sides x 16 bins)
  Columns 64..78 : per-class confidence scores

Angle tensor: [N_boxes, 1]
  Column 0 : raw angular prediction
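
Given the layout above, one detection row can be split and thresholded roughly as follows. This is a sketch only: the function names and the 0.25 threshold are hypothetical, and the class columns are assumed to already hold confidence scores rather than raw logits.

```python
def split_detection_row(row, num_classes=15, reg_max=16):
    """Split one detection row into its DFL logits and class scores."""
    n_dfl = 4 * reg_max                    # 64 columns for 4 sides x 16 bins
    if len(row) != n_dfl + num_classes:
        raise ValueError("unexpected row length")
    return row[:n_dfl], row[n_dfl:]

def best_class(scores, threshold=0.25):
    """Return (class_index, score) for the top class, or None if below threshold."""
    idx = max(range(len(scores)), key=scores.__getitem__)
    return (idx, scores[idx]) if scores[idx] >= threshold else None
```

Rows for which `best_class` returns `None` can be skipped before any box or angle decoding is done, which keeps the more expensive steps off low-scoring anchors.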

Angle decoding:

angle_raw = sigmoid(angle_output) * (pi / 2)  -- map to [0, pi/2)
angle = angle_raw - (pi / 4)                  -- shift to [-pi/4, pi/4)
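
The two lines above translate directly into a small function. In this sketch the `scale` and `shift` parameters are an addition for illustration: the defaults reproduce the formula as written, while other values would cover the alternative ranges mentioned in the Description (for example, `scale=math.pi` with the same shift spans -pi/4 to 3*pi/4).

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def decode_angle(angle_output, scale=math.pi / 2, shift=math.pi / 4):
    """Map a raw angular prediction to radians: sigmoid, scale, then shift."""
    return sigmoid(angle_output) * scale - shift
```

A raw output of 0 decodes to an angle of 0 with the default parameters, since the sigmoid of 0 is 0.5 and half of pi/2 is exactly the pi/4 shift.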

Rotated rectangle IoU for NMS:

1. Compute the 4 corner vertices of each rotated rectangle:
   corners = rotate( [(-w/2,-h/2), (w/2,-h/2),
                      (w/2,h/2), (-w/2,h/2)], theta ) + (cx, cy)
2. Find intersection polygon using Sutherland-Hodgman clipping
3. Compute intersection area via shoelace formula
4. IoU = intersection_area / (area_A + area_B - intersection_area)
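
The four steps above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the code used by ncnn: boxes are `(cx, cy, w, h, theta)` tuples with positive width and height, all function names are made up for the example, and no special handling of near-degenerate overlaps is attempted.

```python
import math

def corners(cx, cy, w, h, theta):
    """Step 1: rotate the four local corners by theta and translate to the center."""
    c, s = math.cos(theta), math.sin(theta)
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in local]

def intersect(p, q, sp, sq):
    """Point on segment pq where the signed edge distance crosses zero."""
    t = sp / (sp - sq)
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

def clip(subject, clipper):
    """Step 2: Sutherland-Hodgman clipping of `subject` by a convex `clipper`.

    The clipper vertices must have positive signed area, which holds for
    the order produced by `corners`.
    """
    output = subject
    for i in range(len(clipper)):
        if not output:
            break
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]

        def side(p):
            # Positive when p lies on the inner side of edge a -> b.
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

        inp, output = output, []
        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]
            sp, sq = side(p), side(q)
            if sq >= 0:
                if sp < 0:
                    output.append(intersect(p, q, sp, sq))
                output.append(q)
            elif sp >= 0:
                output.append(intersect(p, q, sp, sq))
    return output

def polygon_area(poly):
    """Step 3: shoelace formula, returned as an absolute area."""
    acc = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        acc += x1 * y2 - x2 * y1
    return abs(acc) / 2

def rotated_iou(box_a, box_b):
    """Step 4: IoU of two rotated rectangles."""
    poly = clip(corners(*box_a), corners(*box_b))
    inter = polygon_area(poly) if len(poly) >= 3 else 0.0
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```

Sutherland-Hodgman is sufficient here because both polygons are convex. In a greedy NMS loop, `rotated_iou` would take the place of the axis-aligned IoU test.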

Corner vertex computation: $\begin{bmatrix} x_i' \\ y_i' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} + \begin{bmatrix} c_x \\ c_y \end{bmatrix}$
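
As a quick check of the formula, take a hypothetical box with center $(10, 5)$, $w = 4$, $h = 2$ and $\theta = \pi/2$ (values chosen only to keep the arithmetic exact). The local corner $(w/2, h/2) = (2, 1)$ maps to:

$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 10 \\ 5 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix} + \begin{bmatrix} 10 \\ 5 \end{bmatrix} = \begin{bmatrix} 9 \\ 7 \end{bmatrix}$

The quarter turn exchanges the roles of width and height, so the rotated box spans 2 units along x and 4 units along y around its center.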

Related Pages

Implemented By
