Implementation:Tencent Ncnn SCRFD CrowdHuman Example
| Knowledge Sources | |
|---|---|
| Domains | Vision, Face_Detection |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Concrete tool for face and person detection in crowded scenes with ncnn, using an SCRFD model trained on the CrowdHuman dataset.
Description
This example runs a variant of the SCRFD detector trained on the CrowdHuman dataset for dense crowd scenarios. It follows the same overall SCRFD design as the standard SCRFD example (anchor-based detection with distance-to-bbox regression), but it loads a different model, scrfd_crowdhuman, whose detection head spans more stride levels. CrowdHuman images contain heavily occluded, tightly packed people, so this model is better suited to such scenes than the standard SCRFD face model.
Preprocessing is identical to the standard SCRFD example. The input image is resized so that its longer side is 640 pixels, converted from BGR to RGB, and zero-padded symmetrically so that both sides are multiples of 32. The padded image is then normalized per channel as (x - 127.5) / 128. Because padding happens before normalization, padded pixels end up at about -0.996 rather than 0.

The main architectural difference is that this model predicts from five stride levels (8, 16, 32, 64, 128) instead of three, which extends the range of detection scales from small faces to large persons. Each level produces one score blob and one bbox blob; no landmark outputs are decoded. For each stride, a single anchor per location is generated with ratio 2.0 and scale 3.0, using a base size equal to the stride (8, 16, 32, 64, 128). Distance-to-bbox decoding, confidence filtering (threshold 0.3), and NMS (IoU threshold 0.45) follow the standard SCRFD example.
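The resize-and-pad arithmetic described above can be sketched as a small standalone function. `InputGeometry`, `compute_input_geometry` and `padded_pixel_value` are hypothetical names, not part of the example. The sketch assumes that the longer side is always scaled to exactly 640 (up or down) and that the resized size is truncated when converted to int.

```cpp
#include <algorithm>

// Hypothetical helper (not part of the example) reproducing the resize-and-pad
// arithmetic: scale the longer side to target_size, then pad up to multiples of 32.
struct InputGeometry
{
    int resized_w; // width passed to from_pixels_resize
    int resized_h; // height passed to from_pixels_resize
    int wpad;      // total horizontal padding
    int hpad;      // total vertical padding
    int pad_left;  // wpad / 2, later subtracted from decoded x coordinates
    int pad_top;   // hpad / 2, later subtracted from decoded y coordinates
    float scale;   // resized size / original size
};

InputGeometry compute_input_geometry(int width, int height, int target_size = 640)
{
    InputGeometry g;
    g.scale = (float)target_size / std::max(width, height);
    if (width > height)
    {
        g.resized_w = target_size;
        g.resized_h = (int)(height * g.scale); // truncates toward zero
    }
    else
    {
        g.resized_h = target_size;
        g.resized_w = (int)(width * g.scale); // truncates toward zero
    }
    // Round each side up to the next multiple of 32; the difference is the padding
    g.wpad = (g.resized_w + 31) / 32 * 32 - g.resized_w;
    g.hpad = (g.resized_h + 31) / 32 * 32 - g.resized_h;
    g.pad_left = g.wpad / 2;
    g.pad_top = g.hpad / 2;
    return g;
}

// Value of a zero-padded pixel after (x - 127.5) / 128 normalization.
constexpr float padded_pixel_value()
{
    return (0.f - 127.5f) / 128.f;
}
```

For a 1920 x 1080 frame this gives a 640 x 360 resize, 24 rows of vertical padding (12 on top, 12 on the bottom), and no horizontal padding.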
Usage
Use this example for face and person detection in crowded scenes, where standard face detectors tend to struggle with occlusion and densely packed targets. The two additional coarse levels (strides 64 and 128) extend detection to larger objects than a three-stride head covers.
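To make the scale coverage concrete, the following sketch computes the anchor size at each stride. It assumes that scrfd_crowdhuman.cpp reuses the anchor arithmetic of the standard ncnn SCRFD example: width = round(base_size / sqrt(ratio)), height = round(width * ratio), both multiplied by the scale, so ratio is read as height/width. `AnchorSize`, `anchor_size` and `crowdhuman_anchor_sizes` are hypothetical helpers, and the exact rounding should be treated as an assumption.

```cpp
#include <cmath>
#include <vector>

// Size of the single anchor generated per location at one stride.
struct AnchorSize
{
    int stride;
    float w;
    float h;
};

// Assumed anchor arithmetic: ratio is height/width, base size equals the stride.
AnchorSize anchor_size(int stride, float ratio = 2.f, float scale = 3.f)
{
    const int base_size = stride;
    int r_w = (int)std::round(base_size / std::sqrt(ratio));
    int r_h = (int)std::round(r_w * ratio);
    return {stride, r_w * scale, r_h * scale};
}

// One anchor size per stride level of the five-level head.
std::vector<AnchorSize> crowdhuman_anchor_sizes()
{
    std::vector<AnchorSize> sizes;
    for (int stride : {8, 16, 32, 64, 128})
        sizes.push_back(anchor_size(stride));
    return sizes;
}
```

Under these assumptions the anchors grow from 18 x 36 px at stride 8 to 273 x 546 px at stride 128 on the 640-pixel network input. Each is a tall 1:2 box.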
Code Reference
Source Location
- Repository: Tencent_Ncnn
- File: examples/scrfd_crowdhuman.cpp
- Lines: 1-462
Signature
```cpp
static ncnn::Mat generate_anchors(int base_size, const ncnn::Mat& ratios, const ncnn::Mat& scales);
static void generate_proposals(const ncnn::Mat& anchors, int feat_stride, const ncnn::Mat& score_blob, const ncnn::Mat& bbox_blob, float prob_threshold, std::vector<FaceObject>& faceobjects);
static int detect_scrfd(const cv::Mat& bgr, std::vector<FaceObject>& faceobjects);
static void draw_faceobjects(const cv::Mat& bgr, const std::vector<FaceObject>& faceobjects);
int main(int argc, char** argv);
```
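The core of generate_proposals() is the distance-to-bbox decode. The sketch below reimplements it on plain std::vector buffers instead of ncnn::Mat so that it compiles on its own. `Box` and `decode_level` are hypothetical names. The sketch assumes one anchor per location (one ratio, one scale), anchor centers at (j * stride, i * stride) as in the SCRFD reference decoding, and bbox outputs that give distances to the left, top, right and bottom edges in stride units.

```cpp
#include <vector>

// Hypothetical stand-in for FaceObject, using corner coordinates.
struct Box
{
    float x0, y0, x1, y1;
    float prob;
};

// score: h*w confidences. bbox: 4 planes of h*w values (left, top, right, bottom
// distances in stride units), stored plane after plane.
std::vector<Box> decode_level(const std::vector<float>& score, const std::vector<float>& bbox,
                              int w, int h, int stride, float prob_threshold)
{
    std::vector<Box> out;
    const int plane = w * h;
    for (int i = 0; i < h; i++)
    {
        for (int j = 0; j < w; j++)
        {
            const int index = i * w + j;
            const float prob = score[index];
            if (prob < prob_threshold)
                continue; // confidence filtering
            // Anchor center of grid cell (row i, column j)
            const float cx = (float)(j * stride);
            const float cy = (float)(i * stride);
            // Distances are predicted in stride units, so scale them back to pixels
            const float left = bbox[0 * plane + index] * stride;
            const float top = bbox[1 * plane + index] * stride;
            const float right = bbox[2 * plane + index] * stride;
            const float bottom = bbox[3 * plane + index] * stride;
            out.push_back({cx - left, cy - top, cx + right, cy + bottom, prob});
        }
    }
    return out;
}
```

In the example's flow, a decode like this runs once per stride on that level's score and bbox blobs, and the results are appended to a single proposal list before sorting and NMS.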
Import
```cpp
#include "net.h"
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
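| imagepath | const char* (argv[1]) | Yes | Path to the input image (typically a crowd scene) |
| scrfd_crowdhuman.param / scrfd_crowdhuman.bin | model files | Yes | ncnn model structure and weights, loaded by relative path from the working directory |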
Outputs
| Name | Type | Description |
|---|---|---|
| faceobjects | std::vector<FaceObject> | Detections, each with a bounding box (cv::Rect_<float>, in original image coordinates) and a confidence score (float) |
| visualization | cv::Mat | Image with green bounding boxes and confidence labels displayed via cv::imshow |
Usage Examples
Running the Example
```shell
./scrfd_crowdhuman crowd_image.jpg
```
Key Code Pattern
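```cpp
ncnn::Net scrfd;
scrfd.opt.use_vulkan_compute = true;
scrfd.load_param("scrfd_crowdhuman.param");
scrfd.load_model("scrfd_crowdhuman.bin");
// Resize so the longer side is 640 px (bgr is the input cv::Mat)
const int target_size = 640;
int width = bgr.cols;
int height = bgr.rows;
int w = width;
int h = height;
float scale = 1.f;
if (w > h) { scale = (float)target_size / w; w = target_size; h = (int)(h * scale); }
else { scale = (float)target_size / h; h = target_size; w = (int)(w * scale); }
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR2RGB, width, height, w, h);
// Pad to multiple of 32 with zeros, split between opposite sides
int wpad = (w + 31) / 32 * 32 - w;
int hpad = (h + 31) / 32 * 32 - h;
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2, wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 0.f);
// Normalize each channel: (x - 127.5) / 128
const float mean_vals[3] = {127.5f, 127.5f, 127.5f};
const float norm_vals[3] = {1 / 128.f, 1 / 128.f, 1 / 128.f};
in_pad.substract_mean_normalize(mean_vals, norm_vals);
ncnn::Extractor ex = scrfd.create_extractor();
ex.input("input.1", in_pad);
// Decode from 5 stride levels (8, 16, 32, 64, 128); output blob names per stride:
// stride 8: "490" (score), "493" (bbox)    stride 16: "510", "513"
// stride 32: "530", "533"    stride 64: "550", "553"    stride 128: "570", "573"
std::vector<FaceObject> faceproposals;
const float prob_threshold = 0.3f;
{
    // stride 8 shown in full; the other strides differ only in blob names and stride
    ncnn::Mat score_blob, bbox_blob;
    ex.extract("490", score_blob);
    ex.extract("493", bbox_blob);
    ncnn::Mat ratios(1);
    ratios[0] = 2.f;
    ncnn::Mat scales(1);
    scales[0] = 3.f;
    ncnn::Mat anchors = generate_anchors(8, ratios, scales); // base size == stride
    std::vector<FaceObject> objects8;
    generate_proposals(anchors, 8, score_blob, bbox_blob, prob_threshold, objects8);
    faceproposals.insert(faceproposals.end(), objects8.begin(), objects8.end());
}
// Sort by confidence, then suppress overlaps (helpers defined in the same source file)
qsort_descent_inplace(faceproposals);
std::vector<int> picked;
nms_sorted_bboxes(faceproposals, picked, 0.45f);
// Adjust for padding offset and scale
```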
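The sort, NMS, and coordinate-adjustment steps at the end of this pattern can be sketched without ncnn or OpenCV. `Det`, `nms` and `to_original` are hypothetical names. The sketch implements standard greedy IoU-based NMS and is not the example's own nms_sorted_bboxes(), which may differ in details such as rounding or the exact overlap test. Here `pad_left` and `pad_top` correspond to wpad / 2 and hpad / 2, and `scale` is the resize factor computed during preprocessing.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical detection record in corner form (stand-in for FaceObject).
struct Det
{
    float x0, y0, x1, y1;
    float prob;
};

static float area(const Det& d)
{
    return std::max(0.f, d.x1 - d.x0) * std::max(0.f, d.y1 - d.y0);
}

static float iou(const Det& a, const Det& b)
{
    const float iw = std::max(0.f, std::min(a.x1, b.x1) - std::max(a.x0, b.x0));
    const float ih = std::max(0.f, std::min(a.y1, b.y1) - std::max(a.y0, b.y0));
    const float inter = iw * ih;
    const float uni = area(a) + area(b) - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Greedy NMS: visit boxes in descending confidence and drop any box whose IoU
// with an already kept box exceeds nms_threshold.
std::vector<Det> nms(std::vector<Det> dets, float nms_threshold)
{
    std::sort(dets.begin(), dets.end(), [](const Det& a, const Det& b) { return a.prob > b.prob; });
    std::vector<Det> kept;
    for (const Det& d : dets)
    {
        bool keep = true;
        for (const Det& k : kept)
        {
            if (iou(d, k) > nms_threshold)
            {
                keep = false;
                break;
            }
        }
        if (keep)
            kept.push_back(d);
    }
    return kept;
}

static float clamp_to(float v, float lo, float hi)
{
    return std::max(lo, std::min(v, hi));
}

// Undo the padding offset and resize scale, then clip to the original image.
Det to_original(Det d, int pad_left, int pad_top, float scale, int width, int height)
{
    d.x0 = clamp_to((d.x0 - pad_left) / scale, 0.f, width - 1.f);
    d.y0 = clamp_to((d.y0 - pad_top) / scale, 0.f, height - 1.f);
    d.x1 = clamp_to((d.x1 - pad_left) / scale, 0.f, width - 1.f);
    d.y1 = clamp_to((d.y1 - pad_top) / scale, 0.f, height - 1.f);
    return d;
}
```

With the example's settings, `nms` would be called with a threshold of 0.45 on the proposals from all five strides combined. `to_original` would then be applied to each kept box.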