

Implementation:Tencent Ncnn YOLO11 Seg Example

From Leeroopedia


Knowledge Sources
Domains: Vision, Instance_Segmentation
Last Updated: 2026-02-09 19:00 GMT

Overview

Concrete tool for instance segmentation using YOLO11 with ncnn.

Description

This example implements YOLO11 instance segmentation with the ncnn inference framework, detecting objects with both bounding boxes and per-instance pixel masks. The model produces three output blobs: a detection blob (w=176, h=8400) whose rows hold DFL bbox regression (16x4=64 values), per-class scores (80 COCO classes), and 32 mask coefficients (64+80+32=176); a mask coefficient blob (w=32, h=8400) with 32 mask coefficients per detection; and prototype masks (32x160x160). Instance masks are generated by multiplying each detection's mask coefficients with the prototype masks, applying a sigmoid, and cropping the result to the bounding box region. Input images are letterbox-padded to 640x640 before inference.
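The DFL encoding mentioned above stores each of the four box sides as a 16-bin distribution; the side's distance is the softmax-weighted mean bin index times the stride. The following is a minimal sketch of decoding one candidate, assuming the 64 values are laid out as 4 sides x 16 bins in left, top, right, bottom order and that anchor centers sit at (grid index + 0.5) * stride, which matches the Ultralytics YOLOv8/YOLO11 head. `Box`, `dfl_expectation`, and `decode_dfl_box` are hypothetical names, not functions from the ncnn example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical box type for this sketch (corners in letterboxed input pixels)
struct Box
{
    float x0, y0, x1, y1;
};

// Expected bin index under a softmax over `bins` logits
static float dfl_expectation(const float* logits, int bins)
{
    assert(bins > 0);
    const float max_logit = *std::max_element(logits, logits + bins);
    float sum = 0.f;
    float weighted = 0.f;
    for (int i = 0; i < bins; i++)
    {
        const float e = std::exp(logits[i] - max_logit); // subtract max for stability
        sum += e;
        weighted += e * i;
    }
    return weighted / sum;
}

// Decode one candidate's 64 DFL values at grid cell (gx, gy) of the given stride.
// Assumed layout: 4 sides x 16 bins, ordered left, top, right, bottom.
static Box decode_dfl_box(const float* dfl, int gx, int gy, int stride)
{
    const int reg_max = 16;
    float ltrb[4];
    for (int k = 0; k < 4; k++)
        ltrb[k] = dfl_expectation(dfl + k * reg_max, reg_max) * stride;

    const float cx = (gx + 0.5f) * stride; // anchor center
    const float cy = (gy + 0.5f) * stride;
    return Box{cx - ltrb[0], cy - ltrb[1], cx + ltrb[2], cy + ltrb[3]};
}
```

Subtracting the largest logit before exponentiating keeps the softmax numerically stable without changing its result.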

Usage

Use this example when you need pixel-level object segmentation in addition to bounding box detection. YOLO11-Seg provides fast instance segmentation suitable for applications like autonomous driving scene understanding, robotic manipulation, or image editing on edge devices.

Code Reference

Source Location

Signature

struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
    int gindex;
    cv::Mat mask;
};

static int detect_yolo11_seg(const cv::Mat& bgr, std::vector<Object>& objects);

static void generate_proposals(int stride, const ncnn::Mat& pred,
                               const ncnn::Mat& pred_mask,
                               float prob_threshold, std::vector<Object>& objects);
static void qsort_descent_inplace(std::vector<Object>& objects);
static void nms_sorted_bboxes(const std::vector<Object>& objects,
                               std::vector<int>& picked, float nms_threshold,
                               bool agnostic = false);
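Going by these signatures, proposals are first sorted by descending score with qsort_descent_inplace, and nms_sorted_bboxes then keeps a box only if its IoU with every already-picked box does not exceed nms_threshold; the agnostic flag presumably decides whether boxes of different classes may suppress each other. Below is a minimal standalone sketch of that greedy NMS. It uses a hypothetical `Det` struct instead of `Object` so it compiles without OpenCV, and `nms_sorted` is an illustrative name, not the example's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for Object: top-left corner, size, class, score
struct Det
{
    float x, y, w, h;
    int label;
    float prob;
};

// Intersection over union of two axis-aligned boxes
static float iou(const Det& a, const Det& b)
{
    const float ix0 = std::max(a.x, b.x);
    const float iy0 = std::max(a.y, b.y);
    const float ix1 = std::min(a.x + a.w, b.x + b.w);
    const float iy1 = std::min(a.y + a.h, b.y + b.h);
    const float inter = std::max(0.f, ix1 - ix0) * std::max(0.f, iy1 - iy0);
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Greedy NMS over detections already sorted by descending prob.
// Returns indices of kept detections. When agnostic is false,
// boxes of different classes never suppress each other.
static std::vector<int> nms_sorted(const std::vector<Det>& dets, float nms_threshold, bool agnostic)
{
    std::vector<int> picked;
    for (int i = 0; i < (int)dets.size(); i++)
    {
        assert(i == 0 || dets[i - 1].prob >= dets[i].prob); // input must be sorted
        bool keep = true;
        for (int j : picked)
        {
            if (!agnostic && dets[i].label != dets[j].label)
                continue;
            if (iou(dets[i], dets[j]) > nms_threshold)
            {
                keep = false;
                break;
            }
        }
        if (keep)
            picked.push_back(i);
    }
    return picked;
}
```

Because the input is sorted, the first box of each overlapping cluster is always the highest-scoring one, so it is the one that survives.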

Import

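#include "layer.h"
#include "net.h"

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

#include <vector>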

I/O Contract

Inputs

Name | Type | Required | Description
image_path | const char* | Yes | Path to the input image file, passed as the first command-line argument

Outputs

Name | Type | Description
objects | std::vector<Object> | Detected objects, each with a bounding box, class label, confidence score, and per-instance binary mask (cv::Mat)

Model Files

File | Description
yolo11n_seg.ncnn.param | YOLO11-Seg nano model parameter file (network structure)
yolo11n_seg.ncnn.bin | YOLO11-Seg nano model weight file

Usage Examples

Running the Example

./yolo11_seg image.jpg

Key Code Pattern

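// bgr: input image (cv::Mat, 8-bit 3-channel BGR), e.g. loaded with cv::imread
ncnn::Net yolo11;
yolo11.opt.use_vulkan_compute = true;  // GPU inference; needs ncnn built with Vulkan support

yolo11.load_param("yolo11n_seg.ncnn.param");
yolo11.load_model("yolo11n_seg.ncnn.bin");

const int target_size = 640;
const float prob_threshold = 0.25f;
const float nms_threshold = 0.45f;
const float mask_threshold = 0.5f;

// Resize so the longer side equals target_size, preserving aspect ratio
// (scale and the padding offsets are needed later to map boxes back)
const int img_w = bgr.cols;
const int img_h = bgr.rows;
int w = img_w;
int h = img_h;
float scale = 1.f;
if (w > h) { scale = (float)target_size / w; w = target_size; h = (int)(h * scale); }
else { scale = (float)target_size / h; h = target_size; w = (int)(w * scale); }

ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data,
    ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, w, h);

// Letterbox pad to 640x640 with fill value 114, splitting the padding evenly
const int wpad = target_size - w;
const int hpad = target_size - h;
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2,
    wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 114.f);

// Scale pixel values to [0, 1] (ncnn's method name is spelled "substract")
const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
in_pad.substract_mean_normalize(0, norm_vals);

ncnn::Extractor ex = yolo11.create_extractor();
ex.input("in0", in_pad);

ncnn::Mat out0;       // bbox + class scores + mask coefficients (w=176, h=8400)
ncnn::Mat out1;       // mask coefficients (w=32, h=8400)
ncnn::Mat out2;       // prototype masks (32x160x160)
ex.extract("out0", out0);
ex.extract("out1", out1);
ex.extract("out2", out2);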

Implementation Details

Preprocessing

Input images are resized, preserving aspect ratio, so that the longer side is 640 pixels, then letterbox-padded to 640x640 (a multiple of max_stride=32) with a padding fill value of 114. Pixel values are converted from BGR to RGB and normalized to [0, 1] by dividing by 255.
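Mapping detections back to the original image requires undoing this letterbox. A minimal sketch of the geometry, assuming the padding is split evenly between opposite sides (any odd pixel going to the right or bottom); the pad fill value does not affect coordinates. `Letterbox`, `make_letterbox`, and `to_original` are hypothetical names:

```cpp
#include <cassert>

// Hypothetical description of how an image was letterboxed
struct Letterbox
{
    float scale;   // resize factor applied to the original image
    int w, h;      // resized size before padding
    int left, top; // padding added on the left and top
};

// Scale the longer side to target_size, then pad to target_size x target_size,
// splitting the padding evenly (any odd pixel goes to the right/bottom).
static Letterbox make_letterbox(int img_w, int img_h, int target_size)
{
    assert(img_w > 0 && img_h > 0);
    Letterbox lb;
    if (img_w > img_h)
    {
        lb.scale = (float)target_size / img_w;
        lb.w = target_size;
        lb.h = (int)(img_h * lb.scale);
    }
    else
    {
        lb.scale = (float)target_size / img_h;
        lb.h = target_size;
        lb.w = (int)(img_w * lb.scale);
    }
    lb.left = (target_size - lb.w) / 2;
    lb.top = (target_size - lb.h) / 2;
    return lb;
}

// Map a point from padded network-input coordinates back to the original image
static void to_original(const Letterbox& lb, float& x, float& y)
{
    x = (x - lb.left) / lb.scale;
    y = (y - lb.top) / lb.scale;
}
```

For a 1280x720 image, the scale is 0.5, the resized image is 640x360, and 140 rows of padding go above it, so a point at (320, 140) in network-input coordinates maps back to (640, 0).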

Output Tensor Layout

The model produces three output tensors:

  • out0 (w=176, h=8400): DFL bbox regression (64 values) + 80 class scores + 32 mask coefficients per candidate
  • out1 (w=32, h=8400): 32 mask coefficients per candidate box
  • out2 (32x160x160): 32 prototype mask channels at 1/4 input resolution
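The candidate count h=8400 follows from YOLO11's three detection strides (8, 16, 32) at a 640x640 input, and the 176 values per out0 row split into the three fields listed above. A compile-time sketch of that arithmetic, assuming the rows are ordered stride 8 first, then 16, then 32 (a common convention, but an assumption here) and using hypothetical constant and function names:

```cpp
#include <cassert>

// Hypothetical constants for a 640x640 input (names are illustrative)
constexpr int target_size = 640;
constexpr int strides[3] = {8, 16, 32};

// Number of grid cells, and hence candidates, produced at one stride
constexpr int cells(int stride)
{
    return (target_size / stride) * (target_size / stride);
}

// 80x80 + 40x40 + 20x20 = 6400 + 1600 + 400 candidates
static_assert(cells(8) + cells(16) + cells(32) == 8400, "h of out0 and out1");
// Prototype masks are at 1/4 of the input resolution
static_assert(target_size / 4 == 160, "spatial size of out2");

// Field offsets within one w=176 row of out0
constexpr int reg_max = 16;
constexpr int num_class = 80;
constexpr int num_mask = 32;
constexpr int cls_offset = 4 * reg_max;             // DFL bins occupy [0, 64)
constexpr int coef_offset = cls_offset + num_class; // class scores occupy [64, 144)
static_assert(coef_offset + num_mask == 176, "w of out0");

// First row of each stride's block, assuming stride-8 rows come first
static int row_offset(int stride_index)
{
    assert(stride_index >= 0 && stride_index < 3);
    int offset = 0;
    for (int i = 0; i < stride_index; i++)
        offset += cells(strides[i]);
    return offset;
}
```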

Mask Generation Pipeline

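  1. Generate detection proposals from out0 using DFL bbox decoding, discarding candidates that score below prob_threshold (0.25)
  2. Sort proposals by descending score and apply NMS (nms_threshold = 0.45) to remove overlapping detections
  3. For each surviving detection, compute mask = sigmoid(coefficients * prototype_masks): its 32 coefficients weight the 32 prototype channels, giving one 160x160 map
  4. Crop the mask to the detection's bounding box region
  5. Threshold at mask_threshold (0.5) to produce a binary mask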
  6. Overlay colored masks on the output image
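Steps 3 to 5 amount to a dot product per pixel followed by a sigmoid, a box crop, and a threshold. A minimal sketch on plain std::vector buffers, assuming the prototypes are stored channel-major (32 planes of 160x160, as the 32x160x160 shape suggests) and that the box has already been converted to prototype-grid coordinates (input pixels divided by 4). `assemble_mask` is a hypothetical name, and mapping the mask back to the original image size is left out:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Build a binary mask for one detection.
// coeffs: mask coefficients for this detection (32 in YOLO11-Seg)
// protos: coeffs.size() planes of proto_h x proto_w, stored plane after plane
// (bx0, by0, bx1, by1): box in prototype-grid coordinates, max exclusive
static std::vector<std::uint8_t> assemble_mask(const std::vector<float>& coeffs,
                                               const std::vector<float>& protos,
                                               int proto_w, int proto_h,
                                               int bx0, int by0, int bx1, int by1,
                                               float mask_threshold = 0.5f)
{
    const int num_mask = (int)coeffs.size();
    const int area = proto_w * proto_h;
    assert((int)protos.size() == num_mask * area);

    std::vector<std::uint8_t> mask(area, 0); // pixels outside the box stay 0 (the crop)
    for (int y = std::max(by0, 0); y < std::min(by1, proto_h); y++)
    {
        for (int x = std::max(bx0, 0); x < std::min(bx1, proto_w); x++)
        {
            // Weighted sum of the prototype values at (x, y)
            float v = 0.f;
            for (int c = 0; c < num_mask; c++)
                v += coeffs[c] * protos[c * area + y * proto_w + x];
            const float prob = 1.f / (1.f + std::exp(-v)); // sigmoid
            mask[y * proto_w + x] = prob > mask_threshold ? 1 : 0;
        }
    }
    return mask;
}
```

Evaluating the product only inside the box gives the same result as computing the full 160x160 map and cropping it afterwards, while skipping work for pixels that would be discarded.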

Model Conversion

Models are exported from Ultralytics and converted with PNNX, then modified for dynamic-shape inference so that variable input sizes work; the modifications include reshaping the outputs before their concatenation and adjusting the area attention layers.
