
Implementation:Tencent Ncnn YOLOX Example

From Leeroopedia


Knowledge Sources
Domains Vision, Object_Detection
Last Updated 2026-02-09 19:00 GMT

Overview

Concrete tool for anchor-free object detection using YOLOX with ncnn.

Description

This example implements YOLOX (from Megvii) anchor-free object detection for 80 COCO classes using the ncnn inference framework. YOLOX bridges the transition from anchor-based (YOLOv5) to anchor-free (YOLOv8+) detection paradigms. The implementation registers a custom YoloV5Focus layer that reuses the same Focus module as YOLOv5 for input downsampling. It generates grid-and-stride proposals across three stride levels (8, 16, 32), decoding center coordinates, dimensions, objectness score, and per-class scores from each grid cell. Input images are preprocessed with letterbox padding to 640x640 resolution (configurable to 416 for smaller models). The detection output combines objectness and class scores as box_prob = box_objectness * box_cls_score.

Usage

Use this example when you want anchor-free object detection with the YOLOX architecture, which provides a good balance of speed and accuracy. YOLOX is particularly useful as a reference for understanding the transition from anchor-based to anchor-free detection in the YOLO family.

Code Reference

Source Location

Signature

class YoloV5Focus : public ncnn::Layer
{
public:
    YoloV5Focus();
    virtual int forward(const ncnn::Mat& bottom_blob, ncnn::Mat& top_blob,
                        const ncnn::Option& opt) const;
};

struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
};

struct GridAndStride
{
    int grid0;
    int grid1;
    int stride;
};

static int detect_yolox(const cv::Mat& bgr, std::vector<Object>& objects);

static void generate_grids_and_stride(const int target_w, const int target_h,
                                       std::vector<int>& strides,
                                       std::vector<GridAndStride>& grid_strides);
static void generate_yolox_proposals(std::vector<GridAndStride> grid_strides,
                                      const ncnn::Mat& feat_blob,
                                      float prob_threshold,
                                      std::vector<Object>& objects);
static void qsort_descent_inplace(std::vector<Object>& objects);
static void nms_sorted_bboxes(const std::vector<Object>& faceobjects,
                               std::vector<int>& picked, float nms_threshold,
                               bool agnostic = false);
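
Of the helpers above, generate_grids_and_stride simply enumerates every feature-map cell at each stride level. A minimal stdlib-only sketch of that enumeration is shown below; the function name make_grid_strides and the return-by-value shape are illustrative, not the example's actual signature.

```cpp
#include <vector>

struct GridAndStride
{
    int grid0;  // cell x index
    int grid1;  // cell y index
    int stride; // stride level this cell belongs to (8, 16, or 32)
};

// For each stride level, enumerate every feature-map cell row by row.
// For a 640x640 input and strides {8, 16, 32} this yields
// 80*80 + 40*40 + 20*20 = 8400 entries, matching the length of the
// flattened YOLOX output tensor.
static std::vector<GridAndStride> make_grid_strides(int target_w, int target_h,
                                                    const std::vector<int>& strides)
{
    std::vector<GridAndStride> grid_strides;
    for (int stride : strides)
    {
        const int num_grid_w = target_w / stride;
        const int num_grid_h = target_h / stride;
        for (int g1 = 0; g1 < num_grid_h; g1++)
            for (int g0 = 0; g0 < num_grid_w; g0++)
                grid_strides.push_back({g0, g1, stride});
    }
    return grid_strides;
}
```

Each entry later pairs with one row of the output blob, which is why the proposal count must equal the flattened output length.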

Import

#include "layer.h"
#include "net.h"

I/O Contract

Inputs

Name Type Required Description
image_path const char* Yes Path to input image file

Outputs

Name Type Description
objects std::vector<Object> Detected objects with bounding boxes, class labels, and confidence scores for 80 COCO classes

Model Files

File Description
yolox.param YOLOX ncnn model parameter file
yolox.bin YOLOX ncnn model weight file

Usage Examples

Running the Example

./yolox image.jpg

Key Code Pattern

ncnn::Net yolox;
yolox.opt.use_vulkan_compute = true;

// Register custom Focus layer
yolox.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator);

yolox.load_param("yolox.param");
yolox.load_model("yolox.bin");

// Letterbox pad to 640x640 (bottom-right padding only)
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data,
    ncnn::Mat::PIXEL_BGR, img_w, img_h, w, h);
ncnn::copy_make_border(in, in_pad, 0, hpad, 0, wpad,
    ncnn::BORDER_CONSTANT, 114.f);

ncnn::Extractor ex = yolox.create_extractor();
ex.input("images", in_pad);

ncnn::Mat out;
ex.extract("output", out);

// Generate grid-stride anchors
std::vector<int> strides = {8, 16, 32};
std::vector<GridAndStride> grid_strides;
generate_grids_and_stride(in_pad.w, in_pad.h, strides, grid_strides);

// Decode proposals: prob = objectness * class_score
generate_yolox_proposals(grid_strides, out, YOLOX_CONF_THRESH, proposals);

Implementation Details

Custom YoloV5Focus Layer

YOLOX reuses the Focus module from YOLOv5, which performs space-to-depth transformation. The custom layer rearranges input pixels by taking every other pixel in both spatial dimensions, expanding channels by 4x while halving spatial dimensions. This is registered via yolox.register_custom_layer("YoloV5Focus", YoloV5Focus_layer_creator).

Preprocessing

Input images are resized to preserve aspect ratio and padded to a multiple of 32. Unlike YOLOv5, which pads symmetrically on all sides, YOLOX pads only on the bottom and right, so decoded box coordinates can be mapped back to the original image with the scale factor alone, without tracking padding offsets. Pixel format is BGR (no RGB conversion), and no mean subtraction or normalization is applied by default (newer YOLOX releases removed normalization from the model).
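
The resize and pad arithmetic can be sketched as follows. The struct and function names are hypothetical; the example inlines this logic directly in detect_yolox.

```cpp
#include <algorithm>

struct Letterbox
{
    int w, h;        // resized image dimensions (aspect ratio preserved)
    int wpad, hpad;  // right / bottom padding to the next multiple of 32
    float scale;     // factor used to map boxes back to the original image
};

static Letterbox letterbox(int img_w, int img_h, int target_size)
{
    // Scale so the longer side equals target_size.
    const float scale = (float)target_size / std::max(img_w, img_h);
    const int w = (int)(img_w * scale);
    const int h = (int)(img_h * scale);
    // Pad only on the right and bottom up to the next multiple of 32,
    // filling with the constant value 114 in the actual example.
    const int wpad = (w + 31) / 32 * 32 - w;
    const int hpad = (h + 31) / 32 * 32 - h;
    return {w, h, wpad, hpad, scale};
}
```

Because all padding sits on the bottom-right, undoing the preprocessing is just a division by scale; the top-left origin is unchanged.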

Anchor-Free Decoding

Each grid cell at stride s produces a prediction decoded as:

  • x_center = (output_x + grid_x) * stride
  • y_center = (output_y + grid_y) * stride
  • w = exp(output_w) * stride
  • h = exp(output_h) * stride
  • box_prob = objectness * class_score
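
The decoding rules above can be applied to a single prediction row as in this stdlib-only sketch. The raw-pointer layout (4 box values, then objectness, then per-class scores) matches the description; the decode helper and Box struct are illustrative names, not the example's API.

```cpp
#include <cmath>

struct Box
{
    float x, y;   // box center in input-image pixels
    float w, h;   // box dimensions in input-image pixels
    float prob;   // combined confidence
};

// raw[0..3] = x/y offsets and log-space w/h, raw[4] = objectness,
// raw[5 + k] = score for class k, all for the cell at (grid_x, grid_y).
static Box decode(const float* raw, int grid_x, int grid_y, int stride, int cls)
{
    Box b;
    b.x = (raw[0] + grid_x) * stride;     // x_center
    b.y = (raw[1] + grid_y) * stride;     // y_center
    b.w = std::exp(raw[2]) * stride;      // exp keeps dimensions positive
    b.h = std::exp(raw[3]) * stride;
    b.prob = raw[4] * raw[5 + cls];       // objectness * class_score
    return b;
}
```

In the example, proposals whose combined prob falls below YOLOX_CONF_THRESH are discarded before NMS.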

Default Thresholds

Parameter Value
YOLOX_NMS_THRESH 0.45
YOLOX_CONF_THRESH 0.25
YOLOX_TARGET_SIZE 640
