Principle: LaurentMazare tch-rs YOLO Object Detection
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Computer Vision, Object Detection |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Single-pass object detection divides an image into a grid and simultaneously predicts bounding boxes, objectness scores, and class probabilities for all cells in one forward pass.
Description
YOLO (You Only Look Once) reformulates object detection as a single regression problem rather than the traditional two-stage approach of region proposal followed by classification. The input image is divided into an $S \times S$ grid. Each grid cell is responsible for predicting objects whose center falls within that cell.
For each grid cell, the network predicts B bounding boxes, each consisting of:
- Center coordinates (x, y) relative to the grid cell
- Width and height (w, h) relative to the full image, often predicted as offsets from anchor boxes (pre-defined aspect ratios)
- An objectness score indicating confidence that the box contains an object
- Class probabilities for each of the C object categories
The predictions are made in a single forward pass through the network, making YOLO significantly faster than two-stage detectors. The output tensor has shape $S \times S \times B \times (5 + C)$, where 5 accounts for the four box coordinates plus objectness.
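As a concrete check of the output shape, here is a minimal sketch in plain Rust (the function names are hypothetical, and the values $S = 13$, $B = 3$, $C = 80$ are illustrative, matching a COCO-style YOLOv3 head; they are not taken from the source):

```rust
/// Number of output channels one grid cell carries:
/// B boxes, each with 4 coordinates + 1 objectness + C class scores.
fn per_cell_channels(b: usize, c: usize) -> usize {
    b * (4 + 1 + c)
}

/// Total number of scalar predictions for an s x s grid.
fn total_outputs(s: usize, b: usize, c: usize) -> usize {
    s * s * per_cell_channels(b, c)
}

fn main() {
    // Illustrative values: 13x13 grid, 3 anchors per cell, 80 COCO classes.
    let (s, b, c) = (13, 3, 80);
    println!("channels per cell: {}", per_cell_channels(b, c)); // 255
    println!("total outputs: {}", total_outputs(s, b, c)); // 43095
}
```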
Non-maximum suppression (NMS) is applied as a post-processing step to remove duplicate detections. When multiple bounding boxes overlap significantly (measured by intersection over union), only the box with the highest confidence score is retained.
Usage
Apply the YOLO detection principle when:
- Real-time object detection is required (video streams, robotics, autonomous driving)
- Speed is prioritized over maximum accuracy on small or overlapping objects
- Detecting objects across multiple scales using feature pyramid approaches
- A single unified architecture is preferred over multi-stage pipelines
Theoretical Basis
Grid-Based Prediction
The image is divided into an $S \times S$ grid. Each cell predicts $B$ bounding boxes. Each box prediction includes the raw network outputs $(t_x, t_y, t_w, t_h, t_o)$.

These raw predictions are transformed using anchor box priors $(p_w, p_h)$:

$$b_x = \sigma(t_x) + c_x$$
$$b_y = \sigma(t_y) + c_y$$
$$b_w = p_w \, e^{t_w}$$
$$b_h = p_h \, e^{t_h}$$

where $(c_x, c_y)$ is the top-left corner of the grid cell and $\sigma$ is the sigmoid function.
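The decode step can be sketched in plain Rust as follows (a minimal sketch with hypothetical function names, not the tch-rs API; the sigmoid keeps the predicted center inside its cell, and the exponential scales the anchor dimensions):

```rust
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Decode raw predictions (t_x, t_y, t_w, t_h) into a box (b_x, b_y, b_w, b_h)
/// using the grid-cell offset (c_x, c_y) and anchor prior (p_w, p_h).
fn decode_box(t: [f32; 4], (cx, cy): (f32, f32), (pw, ph): (f32, f32)) -> [f32; 4] {
    let [tx, ty, tw, th] = t;
    [
        sigmoid(tx) + cx, // b_x: sigmoid bounds the center within the cell
        sigmoid(ty) + cy, // b_y
        pw * tw.exp(),    // b_w: exponential scaling of the anchor width
        ph * th.exp(),    // b_h
    ]
}

fn main() {
    // Zero raw outputs land at the cell center with the anchor's exact size.
    let b = decode_box([0.0; 4], (3.0, 4.0), (2.0, 5.0));
    println!("{b:?}"); // [3.5, 4.5, 2.0, 5.0]
}
```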
Objectness and Class Prediction
The objectness score is:

$$\Pr(\text{object}) = \sigma(t_o)$$

Class probabilities are predicted per cell and combined with objectness:

$$\Pr(\text{class}_i) = \Pr(\text{class}_i \mid \text{object}) \cdot \sigma(t_o)$$
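A minimal sketch of the combination step, assuming independent per-class sigmoids as in YOLOv3 (earlier versions use a softmax over classes; the function names are hypothetical):

```rust
fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Per-class confidence: objectness times the per-class probability.
/// Uses an independent sigmoid per class (YOLOv3 style) rather than a softmax.
fn class_confidences(t_o: f32, class_logits: &[f32]) -> Vec<f32> {
    let objectness = sigmoid(t_o);
    class_logits.iter().map(|&t| objectness * sigmoid(t)).collect()
}

fn main() {
    // With zero logits, objectness and class probability are each 0.5,
    // so the first combined confidence is 0.5 * 0.5 = 0.25.
    let scores = class_confidences(0.0, &[0.0, 2.0, -2.0]);
    println!("{scores:?}");
}
```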
Intersection over Union (IoU)
IoU measures the overlap between predicted box $B_p$ and ground truth box $B_{gt}$:

$$\text{IoU}(B_p, B_{gt}) = \frac{|B_p \cap B_{gt}|}{|B_p \cup B_{gt}|}$$
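For axis-aligned boxes the intersection and union reduce to simple coordinate arithmetic; a minimal sketch in plain Rust (corner-format boxes are an assumption for illustration):

```rust
/// IoU of two boxes given as [x1, y1, x2, y2] corners.
fn iou(a: [f32; 4], b: [f32; 4]) -> f32 {
    // Corners of the intersection rectangle.
    let ix1 = a[0].max(b[0]);
    let iy1 = a[1].max(b[1]);
    let ix2 = a[2].min(b[2]);
    let iy2 = a[3].min(b[3]);
    // Clamp to zero so disjoint boxes give an empty intersection.
    let inter = (ix2 - ix1).max(0.0) * (iy2 - iy1).max(0.0);
    let area_a = (a[2] - a[0]) * (a[3] - a[1]);
    let area_b = (b[2] - b[0]) * (b[3] - b[1]);
    inter / (area_a + area_b - inter)
}

fn main() {
    let a = [0.0, 0.0, 2.0, 2.0];
    let b = [1.0, 1.0, 3.0, 3.0];
    // Intersection area 1, union area 4 + 4 - 1 = 7.
    println!("IoU = {}", iou(a, b)); // ~0.1428
}
```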
Non-Maximum Suppression
After prediction, NMS filters redundant boxes:
- Sort all detections by confidence score
- Select the highest-scoring detection
- Remove all remaining detections whose IoU with the selected detection exceeds a threshold (e.g., 0.5)
- Repeat until no detections remain
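The steps above can be sketched as a greedy loop in plain Rust (a self-contained sketch, not the tch-rs implementation; detections are hypothetical (box, score) pairs with corner-format boxes):

```rust
/// Greedy non-maximum suppression over (box, score) pairs.
/// Boxes are [x1, y1, x2, y2]; `thresh` is the IoU suppression threshold.
fn nms(mut dets: Vec<([f32; 4], f32)>, thresh: f32) -> Vec<([f32; 4], f32)> {
    // Step 1: sort by confidence score, highest first.
    dets.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    let mut keep = Vec::new();
    // Steps 2-4: take the best remaining detection, drop heavy overlaps, repeat.
    while let Some(&best) = dets.first() {
        keep.push(best);
        // This also removes `best` itself, since IoU(best, best) = 1.
        dets.retain(|d| iou(d.0, best.0) < thresh);
    }
    keep
}

/// IoU of two [x1, y1, x2, y2] boxes.
fn iou(a: [f32; 4], b: [f32; 4]) -> f32 {
    let inter = (a[2].min(b[2]) - a[0].max(b[0])).max(0.0)
        * (a[3].min(b[3]) - a[1].max(b[1])).max(0.0);
    let area = |r: &[f32; 4]| (r[2] - r[0]) * (r[3] - r[1]);
    inter / (area(&a) + area(&b) - inter)
}

fn main() {
    let dets = vec![
        ([0.0, 0.0, 10.0, 10.0], 0.9),   // kept: highest score
        ([1.0, 1.0, 11.0, 11.0], 0.8),   // suppressed: IoU with first ~0.68
        ([50.0, 50.0, 60.0, 60.0], 0.7), // kept: no overlap with the first
    ];
    let kept = nms(dets, 0.5);
    println!("{} detections kept", kept.len()); // 2
}
```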
Multi-Scale Detection
YOLOv3 predicts at three different scales by extracting features from different depths of the network. This enables detection of objects at varying sizes, with deeper features detecting larger objects and shallower features detecting smaller ones.