Implementation: Tencent Ncnn YOLOX Example
| Knowledge Sources | |
|---|---|
| Domains | Vision, Object_Detection |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Concrete tool for anchor-free object detection using YOLOX with ncnn.
Description
This example implements YOLOX (from Megvii) anchor-free object detection for 80 COCO classes using the ncnn inference framework. YOLOX bridges the transition from anchor-based (YOLOv5) to anchor-free (YOLOv8+) detection paradigms. The implementation registers a custom YoloV5Focus layer that reuses the same Focus module as YOLOv5 for input downsampling. It generates grid-and-stride proposals across three stride levels (8, 16, 32), decoding center coordinates, dimensions, an objectness score, and per-class scores from each grid cell. Input images are letterbox-resized so the longer side matches the 640-pixel target (configurable to 416 for smaller models) and padded to a multiple of 32. The detection output combines objectness and class scores as box_prob = box_objectness * box_cls_score.
Usage
Use this example when you want anchor-free object detection with the YOLOX architecture, which provides a good balance of speed and accuracy. YOLOX is particularly useful as a reference for understanding the transition from anchor-based to anchor-free detection in the YOLO family.
Code Reference
Source Location
- Repository: Tencent_Ncnn
- File: examples/yolox.cpp
- Lines: 1-413
Signature
class YoloV5Focus : public ncnn::Layer
{
public:
YoloV5Focus();
virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob,
const ncnn::Option& opt) const;
};
struct Object
{
cv::Rect_<float> rect;
int label;
float prob;
};
struct GridAndStride
{
int grid0;
int grid1;
int stride;
};
static int detect_yolox(const cv::Mat& bgr, std::vector<Object>& objects);
static void generate_grids_and_stride(const int target_w, const int target_h,
std::vector<int>& strides,
std::vector<GridAndStride>& grid_strides);
static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides,
const ncnn::Mat& feat_blob,
float prob_threshold,
std::vector<Object>& objects);
static void qsort_descent_inplace(std::vector<Object>& objects);
static void nms_sorted_bboxes(const std::vector<Object>& faceobjects,
std::vector<int>& picked, float nms_threshold,
bool agnostic = false);
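The sorting and NMS helpers above implement greedy suppression over score-sorted boxes. A standalone sketch of that keep-or-suppress loop, with a hypothetical plain Box struct standing in for Object and cv::Rect_<float> (the real nms_sorted_bboxes operates on Object and also supports class-agnostic mode):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for Object; x/y is the top-left corner.
struct Box { float x, y, w, h, prob; };

// Intersection-over-union of two axis-aligned boxes.
static float iou(const Box& a, const Box& b)
{
    float x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    float x2 = std::min(a.x + a.w, b.x + b.w);
    float y2 = std::min(a.y + a.h, b.y + b.h);
    float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    return inter / (a.w * a.h + b.w * b.h - inter);
}

// Greedy NMS: boxes must already be sorted by descending prob; keep a box
// only if it does not overlap an already-kept box above the threshold.
static std::vector<int> nms(const std::vector<Box>& sorted, float thresh)
{
    std::vector<int> picked;
    for (int i = 0; i < (int)sorted.size(); i++)
    {
        bool keep = true;
        for (int j : picked)
            if (iou(sorted[i], sorted[j]) > thresh) { keep = false; break; }
        if (keep) picked.push_back(i);
    }
    return picked;
}
```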
Import
#include "layer.h"
#include "net.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| image_path | const char* | Yes | Path to input image file |
Outputs
| Name | Type | Description |
|---|---|---|
| objects | std::vector<Object> | Detected objects with bounding boxes, class labels, and confidence scores for 80 COCO classes |
Model Files
| File | Description |
|---|---|
| yolox.param | YOLOX ncnn model parameter file |
| yolox.bin | YOLOX ncnn model weight file |
Usage Examples
Running the Example
./yolox image.jpg
Key Code Pattern
ncnn::Net yolox;
yolox.opt.use_vulkan_compute = true;
// Register custom Focus layer
yolox.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator);
yolox.load_param("yolox.param");
yolox.load_model("yolox.bin");
// Letterbox resize (longer side = 640), pad bottom/right to a multiple of 32
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data,
ncnn::Mat::PIXEL_BGR, img_w, img_h, w, h);
ncnn::copy_make_border(in, in_pad, 0, hpad, 0, wpad,
ncnn::BORDER_CONSTANT, 114.f);
ncnn::Extractor ex = yolox.create_extractor();
ex.input("images", in_pad);
ncnn::Mat out;
ex.extract("output", out);
// Generate grid-stride anchors
std::vector<int> strides = {8, 16, 32};
std::vector<GridAndStride> grid_strides;
generate_grids_and_stride(in_pad.w, in_pad.h, strides, grid_strides);
// Decode proposals: prob = objectness * class_score
generate_yolox_proposals(grid_strides, out, YOLOX_CONF_THRESH, proposals);
Implementation Details
Custom YoloV5Focus Layer
YOLOX reuses the Focus module from YOLOv5, which performs space-to-depth transformation. The custom layer rearranges input pixels by taking every other pixel in both spatial dimensions, expanding channels by 4x while halving spatial dimensions. This is registered via yolox.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator).
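The rearrangement can be sketched on plain float buffers, independent of ncnn (the real layer does the same indexing on ncnn::Mat channels; focus_space_to_depth is a hypothetical name):

```cpp
#include <cassert>
#include <vector>

// Sketch of the Focus space-to-depth rearrangement. For each input
// channel c, the four pixel phases (even/even row-col, odd/even,
// even/odd, odd/odd) become four output channels of half the spatial
// size, so channels expand 4x while width and height halve.
static std::vector<float> focus_space_to_depth(const std::vector<float>& in,
                                               int c, int h, int w)
{
    const int outw = w / 2, outh = h / 2, outc = c * 4;
    std::vector<float> out(outc * outh * outw);
    for (int p = 0; p < outc; p++)
    {
        const int sc = p % c;        // source channel
        const int sy = (p / c) % 2;  // row phase (0 or 1)
        const int sx = (p / c) / 2;  // column phase (0 or 1)
        for (int y = 0; y < outh; y++)
            for (int x = 0; x < outw; x++)
                out[(p * outh + y) * outw + x] =
                    in[(sc * h + (y * 2 + sy)) * w + (x * 2 + sx)];
    }
    return out;
}
```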
Preprocessing
Input images are resized while preserving aspect ratio and padded to a multiple of 32. Unlike YOLOv5 which pads symmetrically, YOLOX only pads on the bottom and right side, which means users do not need extra padding info to decode box coordinates. Pixel format is BGR (no RGB conversion). No mean subtraction or normalization is applied by default (newer YOLOX versions removed normalization from the model).
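The sizing arithmetic can be sketched as a small helper (letterbox_640 is a hypothetical name; the actual example computes these values inline before calling from_pixels_resize and copy_make_border):

```cpp
#include <cassert>

// Hypothetical helper bundling the letterbox sizing values.
struct Letterbox { int w, h, wpad, hpad; float scale; };

// Scale by a single factor so the longer side fits target_size, then
// pad the bottom and right edges up to the next multiple of 32.
static Letterbox letterbox_640(int img_w, int img_h, int target_size = 640)
{
    float scale;
    int w = img_w, h = img_h;
    if (w > h)
    {
        scale = (float)target_size / w;
        w = target_size;
        h = (int)(h * scale);
    }
    else
    {
        scale = (float)target_size / h;
        h = target_size;
        w = (int)(w * scale);
    }
    int wpad = (w + 31) / 32 * 32 - w; // right padding only
    int hpad = (h + 31) / 32 * 32 - h; // bottom padding only
    return {w, h, wpad, hpad, scale};
}
```

Because padding is only on the bottom and right, decoding boxes back to the original image needs only a division by scale, with no padding offset to subtract.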
Anchor-Free Decoding
Each grid cell at stride s produces a prediction decoded as:
x_center = (output_x + grid_x) * stride
y_center = (output_y + grid_y) * stride
w = exp(output_w) * stride
h = exp(output_h) * stride
box_prob = objectness * class_score
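These equations can be exercised in a standalone sketch; make_grids and decode are hypothetical names mirroring generate_grids_and_stride and the decode step inside generate_yolox_proposals:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct GridAndStride { int grid0, grid1, stride; };

// Enumerate one (grid, stride) entry per cell of each feature map:
// a target_w x target_h input yields (target/s)^2 cells at stride s.
static void make_grids(int target_w, int target_h,
                       const std::vector<int>& strides,
                       std::vector<GridAndStride>& out)
{
    for (int s : strides)
    {
        int num_grid_w = target_w / s;
        int num_grid_h = target_h / s;
        for (int g1 = 0; g1 < num_grid_h; g1++)
            for (int g0 = 0; g0 < num_grid_w; g0++)
                out.push_back({g0, g1, s});
    }
}

// Decode one raw prediction (tx, ty, tw, th) into a center/size box:
// offsets are added to the cell index and scaled by stride; width and
// height pass through exp() before scaling.
static void decode(float tx, float ty, float tw, float th,
                   const GridAndStride& gs,
                   float& xc, float& yc, float& w, float& h)
{
    xc = (tx + gs.grid0) * gs.stride;
    yc = (ty + gs.grid1) * gs.stride;
    w = std::exp(tw) * gs.stride;
    h = std::exp(th) * gs.stride;
}
```

For a 640x640 input, the three stride levels produce 80x80 + 40x40 + 20x20 = 8400 proposals before thresholding and NMS.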
Default Thresholds
| Parameter | Value |
|---|---|
| YOLOX_NMS_THRESH | 0.45 |
| YOLOX_CONF_THRESH | 0.25 |
| YOLOX_TARGET_SIZE | 640 |