Implementation:Tencent Ncnn SCRFD CrowdHuman Example

From Leeroopedia
Revision as of 16:49, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Tencent_Ncnn_SCRFD_CrowdHuman_Example.md)


Knowledge Sources
Domains: Vision, Face_Detection
Last Updated: 2026-02-09 19:00 GMT

Overview

Concrete tool for face and person detection inference in crowded scenes using SCRFD trained on the CrowdHuman dataset with ncnn.

Description

This example runs a variant of the SCRFD face detector trained on the CrowdHuman dataset and aimed at dense crowd scenes. It uses the same SCRFD architecture as the standard SCRFD example but loads a model (scrfd_crowdhuman) trained on CrowdHuman data, which is intended to improve detection in scenes with heavy occlusion and many overlapping faces.

Preprocessing is identical to the standard SCRFD example. The input image is scaled so that its longer side is 640 pixels and converted from BGR to RGB, then zero-padded so that both dimensions are multiples of 32, and finally normalized with mean=127.5 and scale=1/128.

The key architectural difference is that this model uses five stride levels (8, 16, 32, 64, 128) instead of three, which covers a wider range of detection scales, from small faces to large persons. For each stride, a single anchor shape is generated with ratio=2.0 and scale=3.0, with the base size equal to the stride value. Distance-to-bbox decoding, confidence filtering (threshold 0.3), and NMS (threshold 0.45) follow the same approach as the standard SCRFD example.
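The resize and padding arithmetic from the preprocessing step can be sketched without ncnn or OpenCV. The helper below is illustrative only: `LetterboxGeometry` and `compute_letterbox` are hypothetical names, and the sketch assumes the longer side is scaled to exactly 640 pixels with the scaled dimensions truncated to integers.

```cpp
#include <cassert>

// Hypothetical helper type (not part of the ncnn example): the resize and
// padding geometry for one input image.
struct LetterboxGeometry
{
    float scale; // factor applied to the original image
    int w;       // width after resizing
    int h;       // height after resizing
    int wpad;    // total horizontal padding, split between left and right
    int hpad;    // total vertical padding, split between top and bottom
};

// Scale the longer side to target_size, then compute how much padding is
// needed to reach the next multiple of `align` (32 in this example).
inline LetterboxGeometry compute_letterbox(int width, int height,
                                           int target_size = 640, int align = 32)
{
    assert(width > 0 && height > 0 && target_size > 0 && align > 0);

    LetterboxGeometry g;
    if (width > height)
    {
        g.scale = static_cast<float>(target_size) / width;
        g.w = target_size;
        g.h = static_cast<int>(height * g.scale);
    }
    else
    {
        g.scale = static_cast<float>(target_size) / height;
        g.h = target_size;
        g.w = static_cast<int>(width * g.scale);
    }

    // Round each dimension up to a multiple of `align` and keep the difference.
    g.wpad = (g.w + align - 1) / align * align - g.w;
    g.hpad = (g.h + align - 1) / align * align - g.h;
    return g;
}
```

For a 1280x720 input this gives a 640x360 resized image with 24 rows of padding, which would be split as 12 rows above and 12 below.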

Usage

Use this example for face detection in crowded scenes, where standard face detectors may struggle with occlusion and densely packed faces. The five stride levels cover a wider range of object sizes than the three used by the standard SCRFD example.

Code Reference

Source Location

Signature

static ncnn::Mat generate_anchors(int base_size, const ncnn::Mat& ratios, const ncnn::Mat& scales);
static void generate_proposals(const ncnn::Mat& anchors, int feat_stride, const ncnn::Mat& score_blob, const ncnn::Mat& bbox_blob, float prob_threshold, std::vector<FaceObject>& faceobjects);
static int detect_scrfd(const cv::Mat& bgr, std::vector<FaceObject>& faceobjects);
static void draw_faceobjects(const cv::Mat& bgr, const std::vector<FaceObject>& faceobjects);
int main(int argc, char** argv);
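As a rough illustration of what `generate_anchors` and `generate_proposals` compute, the sketch below builds origin-centred anchors and decodes a single grid cell. It is not the example's code: `make_anchors` and `decode_distance` are hypothetical names, `std::vector` stands in for `ncnn::Mat`, and both the `base_size / sqrt(ratio)` width convention and the distance-to-bbox form (distances expressed in stride units from the anchor centre) are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// One box as {x0, y0, x1, y1}.
using Anchor = std::array<float, 4>;

// Hypothetical stand-in for generate_anchors(): one origin-centred anchor per
// (ratio, scale) pair. Assumes width = base_size / sqrt(ratio).
inline std::vector<Anchor> make_anchors(int base_size,
                                        const std::vector<float>& ratios,
                                        const std::vector<float>& scales)
{
    assert(base_size > 0);
    std::vector<Anchor> anchors;
    anchors.reserve(ratios.size() * scales.size());
    for (float ratio : ratios)
    {
        // Base box whose height/width equals `ratio`.
        const float r_w = std::round(base_size / std::sqrt(ratio));
        const float r_h = std::round(r_w * ratio);
        for (float scale : scales)
        {
            const float half_w = r_w * scale * 0.5f;
            const float half_h = r_h * scale * 0.5f;
            const Anchor a = {-half_w, -half_h, half_w, half_h};
            anchors.push_back(a);
        }
    }
    return anchors;
}

// Decode one grid cell. `dist` holds the predicted (left, top, right, bottom)
// distances in stride units, measured from the anchor centre (assumption).
inline Anchor decode_distance(const Anchor& anchor, int grid_x, int grid_y,
                              int feat_stride, const Anchor& dist)
{
    // Centre of the anchor after shifting it to this grid cell.
    const float cx = (anchor[0] + anchor[2]) * 0.5f + grid_x * feat_stride;
    const float cy = (anchor[1] + anchor[3]) * 0.5f + grid_y * feat_stride;
    const Anchor box = {cx - dist[0] * feat_stride, cy - dist[1] * feat_stride,
                        cx + dist[2] * feat_stride, cy + dist[3] * feat_stride};
    return box;
}
```

Under these assumptions, base_size 8 with ratio 2.0 and scale 3.0 yields a single 18x36 anchor per grid cell. Because the decode only uses the anchor centre, the anchor shape itself does not affect the resulting box in this sketch.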

Import

#include "net.h"
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

I/O Contract

Inputs

Name | Type | Required | Description
imagepath | const char* (argv[1]) | Yes | Path to input image (typically a crowd scene)

Outputs

Name | Type | Description
faceobjects | std::vector<FaceObject> | Detected faces with bounding box (Rect_<float>) and confidence (float)
visualization | cv::Mat | Image with green bounding boxes and confidence labels displayed via cv::imshow
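The `faceobjects` list is what survives confidence filtering and NMS. Below is a minimal sketch of that step, assuming greedy IoU-based suppression. `Detection`, `iou`, and `nms` are hypothetical stand-ins written without OpenCV; they are not the helper functions used by the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for FaceObject: top-left corner, size, confidence.
struct Detection
{
    float x, y, w, h;
    float prob;
};

// Intersection over union of two axis-aligned boxes.
inline float iou(const Detection& a, const Detection& b)
{
    const float ix = std::max(0.f, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const float iy = std::max(0.f, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const float inter = ix * iy;
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Drop low-confidence boxes, sort by confidence (descending), then keep a box
// only if it does not overlap an already kept box by more than nms_threshold.
inline std::vector<Detection> nms(std::vector<Detection> dets,
                                  float prob_threshold = 0.3f,
                                  float nms_threshold = 0.45f)
{
    assert(nms_threshold >= 0.f && nms_threshold <= 1.f);
    dets.erase(std::remove_if(dets.begin(), dets.end(),
                              [prob_threshold](const Detection& d) { return d.prob < prob_threshold; }),
               dets.end());
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) { return a.prob > b.prob; });

    std::vector<Detection> kept;
    for (const Detection& d : dets)
    {
        bool keep = true;
        for (const Detection& k : kept)
        {
            if (iou(d, k) > nms_threshold)
            {
                keep = false;
                break;
            }
        }
        if (keep)
            kept.push_back(d);
    }
    return kept;
}
```

The default thresholds mirror the values quoted in the description (0.3 for confidence, 0.45 for NMS).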

Usage Examples

Running the Example

./scrfd_crowdhuman crowd_image.jpg

Key Code Pattern

ncnn::Net scrfd;
scrfd.opt.use_vulkan_compute = true;
scrfd.load_param("scrfd_crowdhuman.param");
scrfd.load_model("scrfd_crowdhuman.bin");

// width/height: original image size; w/h: size after scaling the longer side to 640
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR2RGB, width, height, w, h);

// Pad to multiple of 32
int wpad = (w + 31) / 32 * 32 - w;
int hpad = (h + 31) / 32 * 32 - h;
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2, wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 0.f);

const float mean_vals[3] = {127.5f, 127.5f, 127.5f};
const float norm_vals[3] = {1 / 128.f, 1 / 128.f, 1 / 128.f};
in_pad.substract_mean_normalize(mean_vals, norm_vals);

ncnn::Extractor ex = scrfd.create_extractor();
ex.input("input.1", in_pad);

// Decode from 5 stride levels (8, 16, 32, 64, 128)
std::vector<FaceObject> faceproposals;
// stride 8:   extract("490", score), extract("493", bbox)
// stride 16:  extract("510", score), extract("513", bbox)
// stride 32:  extract("530", score), extract("533", bbox)
// stride 64:  extract("550", score), extract("553", bbox)
// stride 128: extract("570", score), extract("573", bbox)

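// Sort proposals by confidence, then suppress overlapping boxes (NMS threshold 0.45)
qsort_descent_inplace(faceproposals);
std::vector<int> picked;
nms_sorted_bboxes(faceproposals, picked, 0.45f);
// For each picked box: subtract the padding offset, divide by the resize scale, and clip to the original image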
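The last step in the pattern above maps boxes from the padded, resized input back to the original image. One way this could look, assuming the padding was split as `wpad / 2` on the left and `hpad / 2` on top, as in the `copy_make_border` call. `Box` and `to_original` are hypothetical names used only for this sketch.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical box type: corners in pixels.
struct Box
{
    float x0, y0, x1, y1;
};

// Undo the padding offset and the resize scale, then clip to the original
// image bounds [0, width - 1] x [0, height - 1].
inline Box to_original(const Box& b, int wpad, int hpad, float scale, int width, int height)
{
    assert(scale > 0.f);
    const float max_x = static_cast<float>(width - 1);
    const float max_y = static_cast<float>(height - 1);

    Box r;
    r.x0 = std::max(std::min((b.x0 - wpad / 2) / scale, max_x), 0.f);
    r.y0 = std::max(std::min((b.y0 - hpad / 2) / scale, max_y), 0.f);
    r.x1 = std::max(std::min((b.x1 - wpad / 2) / scale, max_x), 0.f);
    r.y1 = std::max(std::min((b.y1 - hpad / 2) / scale, max_y), 0.f);
    return r;
}
```

The integer division `wpad / 2` is deliberate here: it matches the left and top offsets passed to `copy_make_border`, so odd padding amounts are undone consistently.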
