Implementation:Tencent Ncnn YOLACT Example

From Leeroopedia


Knowledge Sources
Domains: Vision, Instance_Segmentation
Last Updated: 2026-02-09 19:00 GMT

Overview

Concrete tool for real-time instance segmentation using YOLACT with ncnn.

Description

This example implements YOLACT (You Only Look At CoefficienTs), a real-time instance segmentation model, using the ncnn inference framework. The model produces both bounding box detections and per-instance segmentation masks by combining learned mask coefficients with prototype masks. The input image is resized to 550x550 pixels and normalized ImageNet-style (per-channel mean subtraction followed by division by the standard deviation). Four output blobs are extracted from the network: prototype mask maps (138x138x32), bounding box locations (4x19248), mask coefficients (32x19248), and class confidence scores (81x19248, covering 80 COCO classes plus background). Each instance mask is a linear combination of the 32 prototype masks weighted by that detection's 32 coefficients (a matrix multiplication), followed by sigmoid activation and cropping to the bounding box region.

Usage

Use this example when you need per-object pixel-level segmentation masks in addition to bounding boxes. YOLACT provides a good balance between segmentation quality and inference speed, making it suitable for real-time applications on mobile and edge devices.

Code Reference

Source Location

Signature

struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
    std::vector<float> maskdata;
    cv::Mat mask;
};

static int detect_yolact(const cv::Mat& bgr, std::vector<Object>& objects);

static void qsort_descent_inplace(std::vector<Object>& objects);
static void nms_sorted_bboxes(const std::vector<Object>& faceobjects,
                               std::vector<int>& picked, float nms_threshold,
                               bool agnostic = false);
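
The two helper signatures above suggest the usual post-processing pair: sort detections by descending score, then apply greedy non-maximum suppression (NMS). The following is a minimal sketch of that technique, not the example's actual code. `Box`, `sort_by_score_desc`, and `nms_sorted` are hypothetical names, and `Box` stands in for `cv::Rect_<float>` so the sketch compiles without OpenCV. The treatment of `agnostic` (boxes with different labels do not suppress each other unless `agnostic` is true) is an assumption based on the parameter name.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for Object: a box, a class label, and a score.
struct Box
{
    float x, y, w, h;
    int label;
    float prob;
};

// Area of the overlap between two boxes (0 if they do not intersect).
static float intersection_area(const Box& a, const Box& b)
{
    const float x0 = std::max(a.x, b.x);
    const float y0 = std::max(a.y, b.y);
    const float x1 = std::min(a.x + a.w, b.x + b.w);
    const float y1 = std::min(a.y + a.h, b.y + b.h);
    return std::max(0.f, x1 - x0) * std::max(0.f, y1 - y0);
}

// Sort detections so the highest score comes first.
static void sort_by_score_desc(std::vector<Box>& boxes)
{
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.prob > b.prob; });
}

// Greedy NMS over boxes already sorted by descending score: a box is kept
// only if its IoU with every previously kept box is <= nms_threshold.
// Assumption: with agnostic == false, only same-label boxes are compared.
static std::vector<int> nms_sorted(const std::vector<Box>& boxes,
                                   float nms_threshold, bool agnostic = false)
{
    std::vector<int> picked;
    for (int i = 0; i < (int)boxes.size(); i++)
    {
        const Box& a = boxes[i];
        bool keep = true;
        for (int j : picked)
        {
            const Box& b = boxes[j];
            if (!agnostic && a.label != b.label)
                continue;
            const float inter = intersection_area(a, b);
            const float uni = a.w * a.h + b.w * b.h - inter;
            if (uni > 0.f && inter / uni > nms_threshold)
            {
                keep = false;
                break;
            }
        }
        if (keep)
            picked.push_back(i);
    }
    return picked;
}
```

The returned indices refer to positions in the sorted vector, so the caller would copy `boxes[picked[k]]` into the final result list.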

Import

#include "net.h"

I/O Contract

Inputs

Name | Type | Required | Description
image_path | const char* | Yes | Path to input image file

Outputs

Name | Type | Description
objects | std::vector<Object> | Detected objects with bounding boxes, class labels, confidence scores, and instance masks

Model Files

File | Description
yolact.param | YOLACT ncnn model parameter file (ResNet-50 backbone)
yolact.bin | YOLACT ncnn model weight file

Usage Examples

Running the Example

./yolact image.jpg

Key Code Pattern

ncnn::Net yolact;
yolact.opt.use_vulkan_compute = true;

yolact.load_param("yolact.param");
yolact.load_model("yolact.bin");

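// Source image dimensions (bgr is the cv::Mat passed to detect_yolact)
const int img_w = bgr.cols;
const int img_h = bgr.rows;

// Convert BGR to RGB and stretch to 550x550 (no letterboxing)
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data,
    ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, 550, 550);

// ImageNet-style normalization per RGB channel: (pixel - mean) * (1 / std)
const float mean_vals[3] = {123.68f, 116.78f, 103.94f};
const float norm_vals[3] = {1.0 / 58.40f, 1.0 / 57.12f, 1.0 / 57.38f};
in.substract_mean_normalize(mean_vals, norm_vals); // "substract" is the ncnn API spelling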

ncnn::Extractor ex = yolact.create_extractor();
ex.input("input.1", in);

ncnn::Mat maskmaps;    // 138x138 x 32 prototype masks
ncnn::Mat location;    // 4 x 19248 bounding boxes
ncnn::Mat mask;        // 32 x 19248 mask coefficients
ncnn::Mat confidence;  // 81 x 19248 class scores

ex.extract("619", maskmaps);
ex.extract("816", location);
ex.extract("818", mask);
ex.extract("820", confidence);
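
Once the confidence blob is extracted, each of the 19248 anchors carries 81 scores. A plausible next step, sketched here under stated assumptions, is to pick the best non-background class per anchor and drop anchors whose best score is too low. The sketch assumes the scores are stored row by row (one row of `num_class` values per anchor) and that index 0 is the background class. `ClassPick`, `best_foreground_class`, `select_anchors`, and the threshold value are hypothetical and not taken from the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: the winning foreground class and its score.
struct ClassPick
{
    int label;   // 1..num_class-1, or 0 if no foreground class scored above 0
    float score;
};

// scores points at one anchor's row of num_class values.
// Assumption: index 0 is the background class and is skipped.
static ClassPick best_foreground_class(const float* scores, int num_class)
{
    ClassPick best = {0, 0.f};
    for (int j = 1; j < num_class; j++)
    {
        if (scores[j] > best.score)
        {
            best.label = j;
            best.score = scores[j];
        }
    }
    return best;
}

// Return the indices of anchors whose best foreground score exceeds
// confidence_thresh. confidence holds num_anchors rows of num_class floats.
static std::vector<int> select_anchors(const std::vector<float>& confidence,
                                       int num_class, float confidence_thresh)
{
    std::vector<int> kept;
    const int num_anchors = (int)(confidence.size() / num_class);
    for (int i = 0; i < num_anchors; i++)
    {
        const float* row = &confidence[(std::size_t)i * num_class];
        const ClassPick best = best_foreground_class(row, num_class);
        if (best.label != 0 && best.score > confidence_thresh)
            kept.push_back(i);
    }
    return kept;
}
```

Filtering before box decoding keeps the later sort and NMS steps cheap, since most anchors typically score as background.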

Implementation Details

Preprocessing

The input image is stretched to a fixed 550x550 resolution without letterboxing, so the original aspect ratio is not preserved. Pixels are converted from BGR to RGB, then each channel is normalized by subtracting its ImageNet mean (123.68, 116.78, 103.94 for R, G, B) and dividing by its standard deviation (58.40, 57.12, 57.38).

Mask Generation

Instance masks are assembled from mask coefficients: each detection has 32 coefficients, which weight the 32 prototype masks (138x138 each) in a linear combination, equivalent to multiplying the prototype tensor by the coefficient vector. The result is passed through a sigmoid activation and cropped to the detection's bounding box to produce the final binary mask.
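
The mask assembly described above can be sketched in plain C++ as follows. This is a simplified illustration under stated assumptions, not the example's code: `assemble_mask` is a hypothetical helper, prototypes are assumed to be stored plane after plane, the box is given in prototype-grid coordinates, cropping happens at prototype resolution (any resize to the source image size is omitted), and the 0.5 binarization threshold is an assumption.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// protos: coeffs.size() planes of size*size floats, stored plane after plane.
// coeffs: one weight per prototype plane for a single detection.
// [x0, x1) x [y0, y1): the detection box in prototype-grid coordinates.
// Returns a size*size mask with 255 inside the object and 0 elsewhere.
static std::vector<std::uint8_t> assemble_mask(
    const std::vector<float>& protos, const std::vector<float>& coeffs,
    int size, int x0, int y0, int x1, int y1)
{
    const int num_protos = (int)coeffs.size();
    const int plane = size * size;
    std::vector<std::uint8_t> mask(plane, 0);

    for (int y = 0; y < size; y++)
    {
        for (int x = 0; x < size; x++)
        {
            // Crop: pixels outside the box stay 0.
            if (x < x0 || x >= x1 || y < y0 || y >= y1)
                continue;

            // Linear combination of prototypes at this pixel.
            const int idx = y * size + x;
            float acc = 0.f;
            for (int p = 0; p < num_protos; p++)
                acc += coeffs[p] * protos[p * plane + idx];

            // Sigmoid, then binarize (threshold is an assumption).
            const float prob = 1.f / (1.f + std::exp(-acc));
            mask[idx] = prob > 0.5f ? 255 : 0;
        }
    }
    return mask;
}
```

With the shapes given on this page, `size` would be 138 and `coeffs` would hold 32 values, so each mask pixel costs 32 multiply-adds.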

Prior Box Generation

YOLACT uses anchor-based detection. Prior boxes are generated on 5 feature maps of size 69, 35, 18, 9, and 5 cells per side, with 3 aspect ratios [1.0, 0.5, 2.0] at every cell, producing (69² + 35² + 18² + 9² + 5²) x 3 = 6416 x 3 = 19248 total anchors.
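
The anchor count can be checked with a few lines of code. The sketch below also enumerates normalized prior centers, assuming each prior is centered on its feature-map cell and that priors are ordered by feature map, then row, then column, then aspect ratio. That ordering, along with the names `count_priors`, `PriorCenter`, and `make_prior_centers`, is an assumption for illustration; prior widths and heights are left out because the page does not specify the anchor scales.

```cpp
#include <vector>

// Total number of priors: one per cell per aspect ratio on each feature map.
static int count_priors(const std::vector<int>& feature_sizes,
                        int num_aspect_ratios)
{
    int total = 0;
    for (int s : feature_sizes)
        total += s * s * num_aspect_ratios;
    return total;
}

// Hypothetical type: a prior's center in normalized [0, 1] image coordinates.
struct PriorCenter
{
    float cx, cy;
};

// Enumerate cell-centered priors (assumed order: map, row, column, ratio).
static std::vector<PriorCenter> make_prior_centers(
    const std::vector<int>& feature_sizes, int num_aspect_ratios)
{
    std::vector<PriorCenter> centers;
    centers.reserve(count_priors(feature_sizes, num_aspect_ratios));
    for (int s : feature_sizes)
    {
        for (int i = 0; i < s; i++)          // row
        {
            for (int j = 0; j < s; j++)      // column
            {
                const float cx = (j + 0.5f) / s;
                const float cy = (i + 0.5f) / s;
                for (int k = 0; k < num_aspect_ratios; k++)
                    centers.push_back({cx, cy});
            }
        }
    }
    return centers;
}
```

Calling `count_priors({69, 35, 18, 9, 5}, 3)` reproduces the 19248 figure, which matches the second dimension of the location, mask, and confidence blobs.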
