Implementation:Tencent Ncnn YOLO11 Seg Example

Knowledge Sources	Tencent_Ncnn
Domains	Vision, Instance_Segmentation
Last Updated	2026-02-09 19:00 GMT

Overview

Concrete tool for instance segmentation using YOLO11 with ncnn.

Description

This example implements YOLO11 instance segmentation using the ncnn inference framework, detecting objects with both bounding boxes and per-instance pixel masks. The model produces three output blobs: a detection blob (w=176, h=8400) whose 176 values per candidate are DFL bbox regression (16x4=64 values), per-class scores (80 COCO classes), and 32 mask coefficients; a mask coefficient blob (w=32, h=8400) with 32 mask coefficients per detection; and prototype masks (32x160x160). Instance masks are generated by matrix multiplication of the mask coefficients with the prototype masks, followed by sigmoid activation and cropping to the bounding box region. Input images are preprocessed with letterbox padding to 640x640 resolution.

Usage

Use this example when you need pixel-level object segmentation in addition to bounding box detection. YOLO11-Seg provides fast instance segmentation suitable for applications like autonomous driving scene understanding, robotic manipulation, or image editing on edge devices.

Code Reference

Source Location

Repository: Tencent_Ncnn
File: examples/yolo11_seg.cpp
Lines: 1-644

Signature

struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
    int gindex;
    cv::Mat mask;
};

static int detect_yolo11_seg(const cv::Mat& bgr, std::vector<Object>& objects);

static void generate_proposals(int stride, const ncnn::Mat& pred,
                               const ncnn::Mat& pred_mask,
                               float prob_threshold, std::vector<Object>& objects);
static void qsort_descent_inplace(std::vector<Object>& objects);
static void nms_sorted_bboxes(const std::vector<Object>& objects,
                               std::vector<int>& picked, float nms_threshold,
                               bool agnostic = false);

Import

#include "layer.h"
#include "net.h"

I/O Contract

Inputs

Name	Type	Required	Description
image_path	const char*	Yes	Path to input image file

Outputs

Name	Type	Description
objects	std::vector<Object>	Detected objects with bounding boxes, class labels, confidence scores, and per-instance binary masks (cv::Mat)

Model Files

File	Description
yolo11n_seg.ncnn.param	YOLO11-Seg nano model parameter file
yolo11n_seg.ncnn.bin	YOLO11-Seg nano model weight file

Usage Examples

Running the Example

./yolo11_seg image.jpg

Key Code Pattern

ncnn::Net yolo11;
yolo11.opt.use_vulkan_compute = true;

yolo11.load_param("yolo11n_seg.ncnn.param");
yolo11.load_model("yolo11n_seg.ncnn.bin");

const int target_size = 640;
const float prob_threshold = 0.25f;
const float nms_threshold = 0.45f;
const float mask_threshold = 0.5f;

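// Resize so the longer side equals target_size, preserving aspect ratio
const int img_w = bgr.cols;
const int img_h = bgr.rows;
const float scale = (float)target_size / std::max(img_w, img_h);
const int w = (int)(img_w * scale);
const int h = (int)(img_h * scale);
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data,
    ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, w, h);

// Letterbox pad to 640x640 with fill value 114
const int wpad = target_size - w;
const int hpad = target_size - h;
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2,
    wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 114.f);

// Scale pixel values to [0, 1] (the ncnn method name is spelled "substract")
const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
in_pad.substract_mean_normalize(0, norm_vals);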

ncnn::Extractor ex = yolo11.create_extractor();
ex.input("in0", in_pad);

ncnn::Mat out0;       // bbox + class scores + mask coefficients (w=176, h=8400)
ncnn::Mat out1;       // mask coefficients (w=32, h=8400)
ncnn::Mat out2;       // prototype masks (32x160x160)
ex.extract("out0", out0);
ex.extract("out1", out1);
ex.extract("out2", out2);

Implementation Details

Preprocessing

Input images are resized while preserving aspect ratio and letterbox padded to 640x640 (a multiple of max_stride=32). Pixel values are converted from BGR to RGB and normalized by dividing by 255. The padding fill value is 114.
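The letterbox geometry described above can be sketched as follows. This is a minimal illustration, not the code from yolo11_seg.cpp: the LetterboxParams struct and the compute_letterbox, unletterbox_x, and unletterbox_y helpers are hypothetical names, and the sketch assumes the image is padded symmetrically to a square target_size canvas.

```cpp
// Hypothetical helper type for illustration; not defined in yolo11_seg.cpp.
struct LetterboxParams
{
    float scale;               // resize factor applied to the source image
    int w, h;                  // resized size before padding
    int pad_left, pad_right;   // horizontal padding in pixels
    int pad_top, pad_bottom;   // vertical padding in pixels
};

// Fit an img_w x img_h image into a target_size x target_size canvas,
// preserving aspect ratio and splitting the padding between both sides.
static LetterboxParams compute_letterbox(int img_w, int img_h, int target_size)
{
    LetterboxParams p;
    if (img_w > img_h)
    {
        p.scale = (float)target_size / img_w;
        p.w = target_size;
        p.h = (int)(img_h * p.scale);
    }
    else
    {
        p.scale = (float)target_size / img_h;
        p.h = target_size;
        p.w = (int)(img_w * p.scale);
    }

    const int wpad = target_size - p.w;
    const int hpad = target_size - p.h;
    p.pad_left = wpad / 2;
    p.pad_right = wpad - wpad / 2;
    p.pad_top = hpad / 2;
    p.pad_bottom = hpad - hpad / 2;
    return p;
}

// Map coordinates from the padded network input back to the source image.
static float unletterbox_x(float x, const LetterboxParams& p)
{
    return (x - p.pad_left) / p.scale;
}

static float unletterbox_y(float y, const LetterboxParams& p)
{
    return (y - p.pad_top) / p.scale;
}
```

For a 1280x720 image and target_size 640, the sketch resizes to 640x360 and adds 140 rows of padding above and below. The same parameters are needed after inference to map boxes and masks back to source-image coordinates.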

Output Tensor Layout

The model produces three output tensors:

out0 (w=176, h=8400): DFL bbox regression (64 values) + 80 class scores + 32 mask coefficients per candidate
out1 (w=32, h=8400): 32 mask coefficients per candidate box
out2 (32x160x160): 32 prototype mask channels at 1/4 input resolution
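The shape figures above can be checked with a little arithmetic, and the DFL regression values can be turned into box distances with a softmax expectation. The sketch below assumes three detection strides of 8, 16, and 32 (only max_stride=32 is stated on this page) and uses a commonly used form of DFL decoding; the helper names count_candidates and dfl_decode_side are hypothetical and are not taken from yolo11_seg.cpp.

```cpp
#include <cmath>

// Layout constants taken from the tensor description on this page.
const int reg_max = 16;         // DFL bins per box side
const int num_class = 80;       // COCO classes
const int num_mask_coeff = 32;  // mask coefficients per candidate

// Number of candidate rows for a square input: one grid cell per stride.
// ASSUMPTION: strides 8, 16, 32 (only max_stride=32 is given in the text).
static int count_candidates(int input_size)
{
    const int strides[3] = {8, 16, 32};
    int n = 0;
    for (int i = 0; i < 3; i++)
    {
        const int cells = input_size / strides[i];
        n += cells * cells;
    }
    return n;
}

// Decode one box side from reg_max DFL logits: softmax over the bins,
// take the expected bin index, then scale by the stride to get pixels.
static float dfl_decode_side(const float* logits, int stride)
{
    // Subtract the maximum logit for numerical stability.
    float max_logit = logits[0];
    for (int i = 1; i < reg_max; i++)
    {
        if (logits[i] > max_logit) max_logit = logits[i];
    }

    float sum = 0.f;
    float weighted = 0.f;
    for (int i = 0; i < reg_max; i++)
    {
        const float e = std::exp(logits[i] - max_logit);
        sum += e;
        weighted += e * i;
    }
    return weighted / sum * stride;
}
```

With a 640x640 input, the three grids contribute 80x80 + 40x40 + 20x20 = 8400 candidates, and 16x4 + 80 + 32 = 176 values per row, matching the stated shape of out0. Each candidate carries four groups of 16 logits, one per box side (left, top, right, bottom).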

Mask Generation Pipeline

Generate detection proposals from out0 using DFL bbox decoding
Apply NMS to filter overlapping detections
For each surviving detection, compute mask = sigmoid(coefficients * prototype_masks)
Crop mask to the detection's bounding box region
Threshold at 0.5 to produce binary mask
Overlay colored masks on the output image
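The mask steps of this pipeline (linear combination of prototypes, sigmoid, crop, threshold) can be sketched without ncnn or OpenCV types. This is an illustrative sketch under stated assumptions rather than the example's actual code: assemble_mask is a hypothetical helper, the prototypes are assumed to be stored channel-major in a flat array, the box is assumed to be already expressed in prototype-mask pixel coordinates, and no upsampling to image resolution is performed.

```cpp
#include <cmath>
#include <vector>

// Build a binary mask (0 or 255) for one detection at prototype resolution.
// coeffs: one mask coefficient per prototype channel
// protos: coeffs.size() * proto_h * proto_w values, channel-major (assumption)
// box:    half-open range [x0, x1) x [y0, y1) in prototype-mask pixels
static std::vector<unsigned char> assemble_mask(const std::vector<float>& coeffs,
                                                const std::vector<float>& protos,
                                                int proto_w, int proto_h,
                                                int x0, int y0, int x1, int y1,
                                                float mask_threshold)
{
    const int num_proto = (int)coeffs.size();
    std::vector<unsigned char> mask(proto_w * proto_h, 0);

    // Clamp the box to the mask so only pixels inside it are evaluated.
    // Pixels outside the box stay 0, which is the crop step.
    if (x0 < 0) x0 = 0;
    if (y0 < 0) y0 = 0;
    if (x1 > proto_w) x1 = proto_w;
    if (y1 > proto_h) y1 = proto_h;

    for (int y = y0; y < y1; y++)
    {
        for (int x = x0; x < x1; x++)
        {
            // Dot product of the coefficients with the prototype values here.
            float acc = 0.f;
            for (int c = 0; c < num_proto; c++)
            {
                acc += coeffs[c] * protos[(c * proto_h + y) * proto_w + x];
            }

            // Sigmoid, then threshold to a binary value.
            const float prob = 1.f / (1.f + std::exp(-acc));
            mask[y * proto_w + x] = prob > mask_threshold ? 255 : 0;
        }
    }
    return mask;
}
```

For the model described here, coeffs would hold 32 values and protos would hold 32x160x160 values. Restricting the loops to the box gives the same result as computing the full mask and cropping afterwards, while doing less work per detection.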

Model Conversion

Models are converted from Ultralytics format using PNNX. To support dynamic shape inference, the export is modified so that the output concatenation and the area attention layers are reshaped in a way that works for variable input sizes.

Related Pages

Environment:Tencent_Ncnn_Build_Environment
