Implementation: Tencent Ncnn YOLOX Example
| Knowledge Sources | |
|---|---|
| Domains | Vision, Object_Detection |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Concrete tool for anchor-free object detection using YOLOX with ncnn.
Description
This example implements YOLOX (from Megvii) anchor-free object detection for 80 COCO classes using the ncnn inference framework. YOLOX bridges the transition from anchor-based (YOLOv5) to anchor-free (YOLOv8+) detection paradigms. The implementation registers a custom YoloV5Focus layer that reuses the same Focus module as YOLOv5 for input downsampling. It generates grid-and-stride proposals across three stride levels (8, 16, 32), decoding center coordinates, dimensions, an objectness score, and per-class scores from each grid cell. Input images are letterbox-resized so the longer side matches the 640-pixel target (configurable to 416 for smaller models) and padded to a multiple of 32. The detection output combines objectness and class scores as box_prob = box_objectness * box_cls_score.
Usage
Use this example when you want anchor-free object detection with the YOLOX architecture, which provides a good balance of speed and accuracy. YOLOX is particularly useful as a reference for understanding the transition from anchor-based to anchor-free detection in the YOLO family.
Code Reference
Source Location
- Repository: Tencent_Ncnn
- File: examples/yolox.cpp
- Lines: 1-413
Signature
class YoloV5Focus : public ncnn::Layer
{
public:
YoloV5Focus();
virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob,
const ncnn::Option& opt) const;
};
struct Object
{
cv::Rect_<float> rect;
int label;
float prob;
};
struct GridAndStride
{
int grid0;
int grid1;
int stride;
};
static int detect_yolox(const cv::Mat& bgr, std::vector<Object>& objects);
static void generate_grids_and_stride(const int target_w, const int target_h,
std::vector<int>& strides,
std::vector<GridAndStride>& grid_strides);
static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides,
const ncnn::Mat& feat_blob,
float prob_threshold,
std::vector<Object>& objects);
static void qsort_descent_inplace(std::vector<Object>& objects);
static void nms_sorted_bboxes(const std::vector<Object>& faceobjects,
std::vector<int>& picked, float nms_threshold,
bool agnostic = false);
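The sorting and NMS helpers above implement greedy suppression over score-sorted boxes. A standalone sketch of that keep-or-suppress loop, with a hypothetical plain Box struct standing in for Object and cv::Rect_<float> (the real nms_sorted_bboxes operates on Object and also supports class-agnostic mode):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for Object; x/y is the top-left corner.
struct Box { float x, y, w, h, prob; };

// Intersection-over-union of two axis-aligned boxes.
static float iou(const Box& a, const Box& b)
{
    float x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    float x2 = std::min(a.x + a.w, b.x + b.w);
    float y2 = std::min(a.y + a.h, b.y + b.h);
    float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    return inter / (a.w * a.h + b.w * b.h - inter);
}

// Greedy NMS: boxes must already be sorted by descending prob; keep a box
// only if it does not overlap an already-kept box above the threshold.
static std::vector<int> nms(const std::vector<Box>& sorted, float thresh)
{
    std::vector<int> picked;
    for (int i = 0; i < (int)sorted.size(); i++)
    {
        bool keep = true;
        for (int j : picked)
            if (iou(sorted[i], sorted[j]) > thresh) { keep = false; break; }
        if (keep) picked.push_back(i);
    }
    return picked;
}
```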
Import
#include "layer.h"
#include "net.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| image_path | const char* | Yes | Path to input image file |
Outputs
| Name | Type | Description |
|---|---|---|
| objects | std::vector<Object> | Detected objects with bounding boxes, class labels, and confidence scores for 80 COCO classes |
Model Files
| File | Description |
|---|---|
| yolox.param | YOLOX ncnn model parameter file |
| yolox.bin | YOLOX ncnn model weight file |
Usage Examples
Running the Example
./yolox image.jpg
Key Code Pattern
ncnn::Net yolox;
yolox.opt.use_vulkan_compute = true;
// Register custom Focus layer
yolox.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator);
yolox.load_param("yolox.param");
yolox.load_model("yolox.bin");
// Letterbox resize (longer side = 640), pad bottom/right to a multiple of 32
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data,
ncnn::Mat::PIXEL_BGR, img_w, img_h, w, h);
ncnn::copy_make_border(in, in_pad, 0, hpad, 0, wpad,
ncnn::BORDER_CONSTANT, 114.f);
ncnn::Extractor ex = yolox.create_extractor();
ex.input("images", in_pad);
ncnn::Mat out;
ex.extract("output", out);
// Generate grid-stride anchors
std::vector<int> strides = {8, 16, 32};
std::vector<GridAndStride> grid_strides;
generate_grids_and_stride(in_pad.w, in_pad.h, strides, grid_strides);
// Decode proposals: prob = objectness * class_score
generate_yolox_proposals(grid_strides, out, YOLOX_CONF_THRESH, proposals);
Implementation Details
Custom YoloV5Focus Layer
YOLOX reuses the Focus module from YOLOv5, which performs space-to-depth transformation. The custom layer rearranges input pixels by taking every other pixel in both spatial dimensions, expanding channels by 4x while halving spatial dimensions. This is registered via yolox.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator).
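The rearrangement can be sketched on plain float buffers, independent of ncnn (the real layer does the same indexing on ncnn::Mat channels; focus_space_to_depth is a hypothetical name):

```cpp
#include <cassert>
#include <vector>

// Sketch of the Focus space-to-depth rearrangement. For each input
// channel c, the four pixel phases (even/even row-col, odd/even,
// even/odd, odd/odd) become four output channels of half the spatial
// size, so channels expand 4x while width and height halve.
static std::vector<float> focus_space_to_depth(const std::vector<float>& in,
                                               int c, int h, int w)
{
    const int outw = w / 2, outh = h / 2, outc = c * 4;
    std::vector<float> out(outc * outh * outw);
    for (int p = 0; p < outc; p++)
    {
        const int sc = p % c;        // source channel
        const int sy = (p / c) % 2;  // row phase (0 or 1)
        const int sx = (p / c) / 2;  // column phase (0 or 1)
        for (int y = 0; y < outh; y++)
            for (int x = 0; x < outw; x++)
                out[(p * outh + y) * outw + x] =
                    in[(sc * h + (y * 2 + sy)) * w + (x * 2 + sx)];
    }
    return out;
}
```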
Preprocessing
Input images are resized while preserving aspect ratio and padded to a multiple of 32. Unlike YOLOv5 which pads symmetrically, YOLOX only pads on the bottom and right side, which means users do not need extra padding info to decode box coordinates. Pixel format is BGR (no RGB conversion). No mean subtraction or normalization is applied by default (newer YOLOX versions removed normalization from the model).
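The sizing arithmetic can be sketched as a small helper (letterbox_640 is a hypothetical name; the actual example computes these values inline before calling from_pixels_resize and copy_make_border):

```cpp
#include <cassert>

// Hypothetical helper bundling the letterbox sizing values.
struct Letterbox { int w, h, wpad, hpad; float scale; };

// Scale by a single factor so the longer side fits target_size, then
// pad the bottom and right edges up to the next multiple of 32.
static Letterbox letterbox_640(int img_w, int img_h, int target_size = 640)
{
    float scale;
    int w = img_w, h = img_h;
    if (w > h)
    {
        scale = (float)target_size / w;
        w = target_size;
        h = (int)(h * scale);
    }
    else
    {
        scale = (float)target_size / h;
        h = target_size;
        w = (int)(w * scale);
    }
    int wpad = (w + 31) / 32 * 32 - w; // right padding only
    int hpad = (h + 31) / 32 * 32 - h; // bottom padding only
    return {w, h, wpad, hpad, scale};
}
```

Because padding is only on the bottom and right, decoding boxes back to the original image needs only a division by scale, with no padding offset to subtract.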
Anchor-Free Decoding
Each grid cell at stride s produces a prediction decoded as:
x_center = (output_x + grid_x) * stride
y_center = (output_y + grid_y) * stride
w = exp(output_w) * stride
h = exp(output_h) * stride
box_prob = objectness * class_score
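These equations can be exercised in a standalone sketch; make_grids and decode are hypothetical names mirroring generate_grids_and_stride and the decode step inside generate_yolox_proposals:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct GridAndStride { int grid0, grid1, stride; };

// Enumerate one (grid, stride) entry per cell of each feature map:
// a target_w x target_h input yields (target/s)^2 cells at stride s.
static void make_grids(int target_w, int target_h,
                       const std::vector<int>& strides,
                       std::vector<GridAndStride>& out)
{
    for (int s : strides)
    {
        int num_grid_w = target_w / s;
        int num_grid_h = target_h / s;
        for (int g1 = 0; g1 < num_grid_h; g1++)
            for (int g0 = 0; g0 < num_grid_w; g0++)
                out.push_back({g0, g1, s});
    }
}

// Decode one raw prediction (tx, ty, tw, th) into a center/size box:
// offsets are added to the cell index and scaled by stride; width and
// height pass through exp() before scaling.
static void decode(float tx, float ty, float tw, float th,
                   const GridAndStride& gs,
                   float& xc, float& yc, float& w, float& h)
{
    xc = (tx + gs.grid0) * gs.stride;
    yc = (ty + gs.grid1) * gs.stride;
    w = std::exp(tw) * gs.stride;
    h = std::exp(th) * gs.stride;
}
```

For a 640x640 input, the three stride levels produce 80x80 + 40x40 + 20x20 = 8400 proposals before thresholding and NMS.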
Default Thresholds
| Parameter | Value |
|---|---|
| YOLOX_NMS_THRESH | 0.45 |
| YOLOX_CONF_THRESH | 0.25 |
| YOLOX_TARGET_SIZE | 640 |