Principle:Tencent Ncnn Oriented Bounding Box Decoding
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Object_Detection |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Decoding rotated bounding boxes with an angular component from neural network output tensors for detecting objects at arbitrary orientations.
Description
Oriented bounding box (OBB) decoding extends standard axis-aligned bounding box detection to handle objects that appear at arbitrary rotation angles in the image. This is essential for domains where objects are not aligned with the image axes: aerial/satellite imagery (vehicles, ships, buildings viewed from above), document analysis (rotated text regions), and industrial inspection (parts at arbitrary orientations on a conveyor belt).
The network produces two output tensors. The first contains per-anchor detection information in the standard format: DFL bounding box regression (4 sides x 16 bins) and per-class confidence scores. The second contains a single value per anchor that encodes the rotation angle of the box. This value is a raw logit. After a sigmoid and a linear rescaling, it gives the rotation in radians, within [0, pi/2) or [-pi/4, 3*pi/4) depending on the scaling convention.
Post-processing proceeds as follows. The DFL bins are decoded into four side distances from each anchor point. Confidence thresholding filters out low-scoring detections. The side distances are then combined with the angular value to form a rotated rectangle (center_x, center_y, width, height, angle). NMS for oriented boxes needs the intersection area of two rotated rectangles, which is more complex than axis-aligned IoU: it requires finding the polygon where the two rectangles overlap and then computing that polygon's area.
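The combination of DFL distances, anchor point, and angle can be sketched for a single anchor. This minimal sketch assumes the convention of the Ultralytics OBB head. In that convention, the four distances are measured along the box's own rotated axes, so the center offset is rotated by theta before it is added to the anchor point, and the result is scaled by the feature-map stride. The function `decode_rotated_box` and its parameters are hypothetical.

```python
import math

def decode_rotated_box(l, t, r, b, theta, anchor_x, anchor_y, stride):
    """Turn DFL side distances plus an angle into (cx, cy, w, h, theta).

    Assumption (hypothetical helper): l, t, r, b are distances in grid-cell
    units from the anchor point to the left, top, right and bottom sides,
    measured along the box's own (rotated) axes; anchors are in grid units.
    """
    # Center offset from the anchor, expressed in the box's local frame.
    dx = (r - l) / 2.0
    dy = (b - t) / 2.0
    # Rotate that offset by theta into image axes.
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx = anchor_x + dx * cos_t - dy * sin_t
    cy = anchor_y + dx * sin_t + dy * cos_t
    # Opposite distances sum to the side lengths; scale everything to pixels.
    w, h = l + r, t + b
    return cx * stride, cy * stride, w * stride, h * stride, theta
```

With theta = 0, this reduces to ordinary axis-aligned DFL decoding (anchor plus offset, times stride).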
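In the Ultralytics YOLOv8-OBB and YOLO11-OBB heads, the angle logit x is decoded as theta = (sigmoid(x) - 0.25) * pi. This gives an effective range of [-pi/4, 3*pi/4). A variant that uses sigmoid(x) * (pi / 2) gives [0, pi/2) instead. Width and height may be swapped when the angle exceeds certain thresholds, so that each box has a single canonical representation. This works because a rectangle is unchanged by a rotation of pi, and (w, h, theta) describes the same rectangle as (h, w, theta - pi/2).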
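The width/height swap can be sketched as follows. This assumes one common convention that folds theta into [0, pi/2). Other pipelines fold into a different range. The function name `canonicalize_obb` is hypothetical.

```python
import math

def canonicalize_obb(cx, cy, w, h, theta):
    """Fold theta into [0, pi/2), swapping w and h when needed (hypothetical helper).

    A rectangle is unchanged by a rotation of pi, and (w, h, theta) describes
    the same rectangle as (h, w, theta - pi/2).
    """
    t = theta % math.pi  # Python's % gives a non-negative result for a positive modulus
    if t >= math.pi / 2:
        # Swap the sides and subtract pi/2 so the angle lands below pi/2.
        w, h = h, w
        t -= math.pi / 2
    return cx, cy, w, h, t
```

For example, a 10 x 4 box at 3*pi/4 becomes a 4 x 10 box at pi/4, which covers the same pixels.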
Usage
Apply this principle for detection tasks where objects have significant rotational variation and axis-aligned boxes would include excessive background. Common applications include remote sensing, scene text detection, face detection at extreme head poses, and any manufacturing or logistics scenario with arbitrarily oriented parts.
Theoretical Basis
Oriented bounding box representation:
$$B = (c_x,\ c_y,\ w,\ h,\ \theta)$$
where $(c_x, c_y)$ is the box center, $w$ and $h$ are the width and height, and $\theta$ is the rotation angle in radians.
Output tensor layout (YOLO11-OBB):
Detection tensor: [N_boxes, 79] (64 DFL columns + 15 class scores, e.g. the 15 DOTA classes)
Columns 0..63 : bbox DFL regression (4 sides x 16 bins)
Columns 64..78 : per-class confidence scores
Angle tensor: [N_boxes, 1]
Column 0 : raw angular prediction
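One row of the detection tensor can be read with a short sketch. It makes two assumptions that should be checked against the exported model. First, the DFL columns are side-major: all 16 bins for the left side, then top, right, and bottom. Second, the class columns are raw logits that still need a sigmoid. If the exported graph already applies the sigmoid, drop it. The helper names are hypothetical.

```python
import math

REG_MAX = 16      # DFL bins per side (columns 0..63 = 4 x 16)
NUM_CLASSES = 15  # class columns 64..78 in the 15-class example

def dfl_expectation(bins):
    """Softmax over the bins, then return the expected bin index (a distance)."""
    m = max(bins)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in bins]
    return sum(i * e for i, e in enumerate(exps)) / sum(exps)

def parse_row(row, conf_threshold=0.25):
    """Return ([l, t, r, b], class_id, score), or None if below threshold.

    Assumption: class columns are logits; distances are in grid-cell units.
    """
    assert len(row) == 4 * REG_MAX + NUM_CLASSES
    cls_logits = row[4 * REG_MAX:]
    class_id = max(range(NUM_CLASSES), key=lambda c: cls_logits[c])
    score = 1.0 / (1.0 + math.exp(-cls_logits[class_id]))
    if score < conf_threshold:
        return None  # skip the DFL decode for rejected anchors
    distances = [dfl_expectation(row[s * REG_MAX:(s + 1) * REG_MAX]) for s in range(4)]
    return distances, class_id, score
```

Thresholding before the DFL decode avoids the softmax work for the many anchors that are rejected.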
Angle decoding (YOLOv8-OBB / YOLO11-OBB head):
angle = (sigmoid(angle_output) - 0.25) * pi   -- range [-pi/4, 3*pi/4)
Alternative scaling: angle = sigmoid(angle_output) * (pi / 2)   -- range [0, pi/2)
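The two angle scalings can be compared in a short Python sketch. The function names are hypothetical. Which scaling an exported ncnn model needs depends on the head it was trained with, so treat the choice as an assumption to verify.

```python
import math

def sigmoid(x):
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def decode_angle_shifted(logit):
    """(sigmoid(x) - 0.25) * pi: range [-pi/4, 3*pi/4)."""
    return (sigmoid(logit) - 0.25) * math.pi

def decode_angle_quarter(logit):
    """sigmoid(x) * pi / 2: range [0, pi/2)."""
    return sigmoid(logit) * (math.pi / 2)
```

Both scalings map a zero logit to pi/4, so a check at x = 0 cannot tell them apart. Compare outputs at a large |x| instead.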
Rotated rectangle IoU for NMS:
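1. Compute the 4 corner vertices of each rotated rectangle:
corners = rotate( [(-w/2,-h/2), (w/2,-h/2), (w/2,h/2), (-w/2,h/2)], theta ) + (cx, cy)
where rotate maps (x, y) to (x*cos(theta) - y*sin(theta), x*sin(theta) + y*cos(theta))
2. Find the intersection polygon by clipping one rectangle against each edge of the other (Sutherland-Hodgman clipping, which applies because both rectangles are convex)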
3. Compute intersection area via shoelace formula
4. IoU = intersection_area / (area_A + area_B - intersection_area)
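The four steps above can be sketched in plain Python, followed by a greedy NMS loop that uses the resulting IoU. All function names are hypothetical. The greedy loop and its 0.45 default threshold are illustrative assumptions. The clipping step assumes both corner lists have the same orientation (positive signed area), and `obb_corners` produces corners in that order.

```python
import math

def obb_corners(cx, cy, w, h, theta):
    """Step 1: corners of a rotated rectangle, ordered with positive signed area."""
    c, s = math.cos(theta), math.sin(theta)
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each local corner by theta, then translate it to the center.
    return [(cx + x * c - y * s, cy + x * s + y * c) for x, y in local]

def clip_polygon(subject, a, b):
    """Step 2 (one edge): keep the part of `subject` left of directed edge a->b."""
    def side(p):
        # Cross product: >= 0 when p is on or to the left of the edge (inside).
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    out = []
    for i, p in enumerate(subject):
        q = subject[(i + 1) % len(subject)]
        sp, sq = side(p), side(q)
        if sp >= 0:
            out.append(p)
        if (sp >= 0) != (sq >= 0):
            # Segment p->q crosses the clip line; add the crossing point.
            t = sp / (sp - sq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def polygon_area(poly):
    """Step 3: shoelace formula (returns 0 for fewer than 3 vertices)."""
    total = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def rotated_iou(box_a, box_b):
    """Step 4: IoU of two (cx, cy, w, h, theta) boxes."""
    inter = obb_corners(*box_a)
    edges = obb_corners(*box_b)
    # Clip box A against each of the 4 edges of box B (both are convex).
    for i in range(4):
        if not inter:
            break
        inter = clip_polygon(inter, edges[i], edges[(i + 1) % 4])
    inter_area = polygon_area(inter)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter_area
    return inter_area / union if union > 0 else 0.0

def rotated_nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: keep boxes in score order, dropping heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(rotated_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, a 2 x 2 square and the same square rotated by pi/4 intersect in a regular octagon of area 8(sqrt(2) - 1). That gives IoU = 1/sqrt(2), about 0.707, where the axis-aligned IoU of the two enclosing boxes would be misleading.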
Corner vertex computation: