Implementation:Tencent Ncnn YOLACT Example
| Knowledge Sources | |
|---|---|
| Domains | Vision, Instance_Segmentation |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
A concrete example of real-time instance segmentation that runs YOLACT on the ncnn inference framework.
Description
This example implements YOLACT (You Only Look At CoefficienTs) real-time instance segmentation on the ncnn inference framework. The model produces bounding box detections and per-instance segmentation masks by combining learned mask coefficients with prototype masks. The input image is resized to 550x550 pixels and normalized per channel with ImageNet statistics (mean subtraction followed by division by the standard deviation). The example extracts four output blobs from the network: prototype mask maps (138x138x32), bounding box locations (4x19248), mask coefficients (32x19248), and class confidence scores (81x19248, covering 80 COCO classes plus background). Each instance mask is generated by multiplying the detection's mask coefficients with the prototype masks, applying a sigmoid activation, and cropping the result to the bounding box region.
Usage
Use this example when you need per-object pixel-level segmentation masks in addition to bounding boxes. YOLACT provides a good balance between segmentation quality and inference speed, making it suitable for real-time applications on mobile and edge devices.
Code Reference
Source Location
- Repository: Tencent_Ncnn
- File: examples/yolact.cpp
- Lines: 1-538
Signature
struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
    std::vector<float> maskdata;
    cv::Mat mask;
};
static int detect_yolact(const cv::Mat& bgr, std::vector<Object>& objects);
static void qsort_descent_inplace(std::vector<Object>& objects);
static void nms_sorted_bboxes(const std::vector<Object>& faceobjects,
                              std::vector<int>& picked, float nms_threshold,
                              bool agnostic = false);
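The two helpers declared above follow the usual detection post-processing pattern: sort candidates by confidence, then apply greedy non-maximum suppression. The following is a minimal sketch of that pattern, not the code from `examples/yolact.cpp`. It uses a plain `Box` struct instead of OpenCV types, and the names `Box`, `by_prob_desc`, `iou`, `sort_descending`, and `nms_sorted` are hypothetical. The meaning given to `agnostic` here (when false, only boxes with the same label suppress each other) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for Object: an axis-aligned box with a label and a score.
struct Box
{
    float x, y, w, h; // top-left corner, width, height
    int label;
    float prob;
};

// Ordering used for the descending sort.
static bool by_prob_desc(const Box& a, const Box& b)
{
    return a.prob > b.prob;
}

// Intersection-over-union of two boxes.
static float iou(const Box& a, const Box& b)
{
    const float x0 = std::max(a.x, b.x);
    const float y0 = std::max(a.y, b.y);
    const float x1 = std::min(a.x + a.w, b.x + b.w);
    const float y1 = std::min(a.y + a.h, b.y + b.h);
    const float inter = std::max(0.f, x1 - x0) * std::max(0.f, y1 - y0);
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Sort by descending confidence (std::sort stands in for a hand-written quicksort).
static void sort_descending(std::vector<Box>& boxes)
{
    std::sort(boxes.begin(), boxes.end(), by_prob_desc);
}

// Greedy NMS over boxes that are already sorted by descending score.
// Assumption: when agnostic is false, boxes with different labels never suppress each other.
static void nms_sorted(const std::vector<Box>& boxes, std::vector<int>& picked,
                       float nms_threshold, bool agnostic = false)
{
    assert(std::is_sorted(boxes.begin(), boxes.end(), by_prob_desc));
    picked.clear();
    for (int i = 0; i < (int)boxes.size(); i++)
    {
        bool keep = true;
        for (int j : picked)
        {
            if (!agnostic && boxes[i].label != boxes[j].label)
                continue;
            if (iou(boxes[i], boxes[j]) > nms_threshold)
            {
                keep = false;
                break;
            }
        }
        if (keep)
            picked.push_back(i);
    }
}
```

After the call, `picked` holds the indices of the surviving boxes in score order, so the caller can copy those entries into the final result list.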
Import
#include "net.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| image_path | const char* | Yes | Path to input image file |
Outputs
| Name | Type | Description |
|---|---|---|
| objects | std::vector<Object> | Detected objects with bounding boxes, class labels, confidence scores, and instance masks |
Model Files
| File | Description |
|---|---|
| yolact.param | YOLACT ncnn model parameter file (ResNet-50 backbone) |
| yolact.bin | YOLACT ncnn model weight file |
Usage Examples
Running the Example
./yolact image.jpg
Key Code Pattern
// Load the converted YOLACT model and enable Vulkan GPU compute
// (takes effect only in Vulkan-enabled ncnn builds)
ncnn::Net yolact;
yolact.opt.use_vulkan_compute = true;
yolact.load_param("yolact.param");
yolact.load_model("yolact.bin");

// Resize to 550x550 (no letterboxing), convert BGR to RGB, then normalize per channel
const int img_w = bgr.cols;
const int img_h = bgr.rows;
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data,
    ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, 550, 550);
const float mean_vals[3] = {123.68f, 116.78f, 103.94f};
const float norm_vals[3] = {1.0 / 58.40f, 1.0 / 57.12f, 1.0 / 57.38f};
in.substract_mean_normalize(mean_vals, norm_vals); // "substract" is the spelling used by the ncnn API

// Run inference; the blob names below are specific to this converted model
ncnn::Extractor ex = yolact.create_extractor();
ex.input("input.1", in);

ncnn::Mat maskmaps;   // 138x138 x 32 prototype masks
ncnn::Mat location;   // 4 x 19248 bounding boxes
ncnn::Mat mask;       // 32 x 19248 mask coefficients
ncnn::Mat confidence; // 81 x 19248 class scores
ex.extract("619", maskmaps);
ex.extract("816", location);
ex.extract("818", mask);
ex.extract("820", confidence);
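Once the four blobs are extracted, each of the 19248 rows of the confidence blob has to be reduced to a single class decision. The sketch below shows one way to do that for a single row of 81 scores. It is an illustration under stated assumptions rather than the example's actual code: the background class is assumed to sit at index 0, the threshold value is chosen by the caller, and `ClassPick` and `pick_class` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of scanning one row of class scores.
struct ClassPick
{
    int label;   // winning class index, or -1 if no foreground class passed the threshold
    float score; // score of the winning class (0 if label is -1)
};

// Pick the highest-scoring foreground class from one row of scores.
// Assumption: index 0 is the background class and is skipped.
static ClassPick pick_class(const std::vector<float>& scores, float threshold)
{
    assert(!scores.empty());
    ClassPick best = {-1, 0.f};
    for (std::size_t c = 1; c < scores.size(); c++)
    {
        if (scores[c] > threshold && (best.label < 0 || scores[c] > best.score))
        {
            best.label = (int)c;
            best.score = scores[c];
        }
    }
    return best;
}
```

Rows for which `label` is -1 can be discarded before box decoding and NMS, which keeps the candidate list small.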
Implementation Details
Preprocessing
The input image is resized to a fixed 550x550 resolution without letterboxing, so the aspect ratio is not preserved. Pixel values are converted from BGR to RGB and then normalized per channel by subtracting the ImageNet means (123.68, 116.78, 103.94) and dividing by the ImageNet standard deviations (58.40, 57.12, 57.38).
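The per-channel arithmetic can be sketched without ncnn as follows. This is an illustrative stand-in for what the normalization step computes, assuming interleaved 8-bit RGB input and planar float output; `normalize_rgb` is a hypothetical helper, not an ncnn function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Normalize interleaved 8-bit RGB pixels into three planar float channels:
// out[c][i] = (pixel - mean[c]) * norm[c], where norm[c] = 1 / std[c].
static std::array<std::vector<float>, 3> normalize_rgb(const std::vector<std::uint8_t>& rgb)
{
    assert(rgb.size() % 3 == 0);
    const float mean_vals[3] = {123.68f, 116.78f, 103.94f};
    const float norm_vals[3] = {1.f / 58.40f, 1.f / 57.12f, 1.f / 57.38f};

    const std::size_t num_pixels = rgb.size() / 3;
    std::array<std::vector<float>, 3> planes;
    for (int c = 0; c < 3; c++)
    {
        planes[c].resize(num_pixels);
        for (std::size_t i = 0; i < num_pixels; i++)
            planes[c][i] = (rgb[i * 3 + c] - mean_vals[c]) * norm_vals[c];
    }
    return planes;
}
```

Passing the reciprocal of the standard deviation lets the inner loop use a multiplication instead of a division, which is why the example stores `1.0 / 58.40f` rather than `58.40f`.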
Mask Generation
Instance masks are produced through the coefficient-based approach: each detection has 32 mask coefficients that are combined with 32 prototype masks (138x138) via matrix multiplication. The result is passed through a sigmoid activation and cropped to the detection's bounding box to produce the final binary mask.
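The coefficient-based assembly described above can be sketched with plain vectors. This is a simplified illustration, not the example's implementation: it works at prototype resolution and skips any resizing to the original image size, it assumes the binary mask comes from thresholding the sigmoid output at 0.5, and `CropBox` and `assemble_mask` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical box in prototype-map pixel coordinates, covering [x0, x1) x [y0, y1).
struct CropBox
{
    int x0, y0, x1, y1;
};

// Combine prototype masks with one detection's coefficients, apply a sigmoid,
// and leave everything outside the box at zero. protos[p] holds one w*h prototype map.
// Assumption: the binary mask is obtained by thresholding the sigmoid output at 0.5.
static std::vector<unsigned char> assemble_mask(const std::vector<std::vector<float> >& protos,
                                                const std::vector<float>& coeffs,
                                                int w, int h, const CropBox& box)
{
    assert(protos.size() == coeffs.size());
    std::vector<unsigned char> mask((std::size_t)w * h, 0);
    for (int y = std::max(0, box.y0); y < std::min(h, box.y1); y++)
    {
        for (int x = std::max(0, box.x0); x < std::min(w, box.x1); x++)
        {
            const std::size_t i = (std::size_t)y * w + x;
            float sum = 0.f;
            for (std::size_t p = 0; p < protos.size(); p++)
                sum += protos[p][i] * coeffs[p]; // linear combination of prototypes
            const float prob = 1.f / (1.f + std::exp(-sum)); // sigmoid
            mask[i] = prob > 0.5f ? 255 : 0;
        }
    }
    return mask;
}
```

For the real model, `protos` would hold 32 maps of 138x138 values and `coeffs` the 32 coefficients of one detection. Restricting the loops to the box avoids evaluating pixels that the crop would discard anyway.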
Prior Box Generation
YOLACT uses anchor-based detection. Prior boxes are generated on 5 square feature maps with side lengths [69, 35, 18, 9, 5], using 3 aspect ratios [1.0, 0.5, 2.0] at every cell, which yields 19248 anchors in total.
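The anchor total can be checked with a few lines of arithmetic. The sketch below only counts priors and assumes one box per aspect ratio at every cell of each square feature map; it does not compute box coordinates, and `count_priors` is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

// Count prior boxes: one box per aspect ratio at every cell of every square feature map.
static int count_priors(const std::vector<int>& feature_map_sizes, int num_aspect_ratios)
{
    assert(num_aspect_ratios > 0);
    int total = 0;
    for (int size : feature_map_sizes)
        total += size * size * num_aspect_ratios;
    return total;
}
```

Calling `count_priors({69, 35, 18, 9, 5}, 3)` evaluates (4761 + 1225 + 324 + 81 + 25) x 3 = 19248, which matches the second dimension of the location, mask coefficient, and confidence blobs.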