Implementation:Tencent Ncnn YOLOv8 Seg Example
| Knowledge Sources | |
|---|---|
| Domains | Vision, Instance_Segmentation |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Concrete tool for instance segmentation using YOLOv8 with ncnn.
Description
This example implements YOLOv8 instance segmentation with the ncnn inference framework. It detects objects from the 80 COCO classes and returns a bounding box and a per-instance pixel mask for each one. The model produces three output blobs: a detection blob (w=176, h=8400) whose 176 columns hold the DFL bbox regression (16x4=64 values), the per-class scores (80 classes), and 32 mask coefficients (64 + 80 + 32 = 176); a mask coefficient blob (w=32, h=8400) with 32 mask coefficients per detection; and the prototype masks (32x160x160) at one-quarter of the input resolution. An instance mask is generated by multiplying a detection's mask coefficients with the prototype masks, applying a sigmoid activation, and thresholding at 0.5 to obtain a binary mask. Input images are preprocessed with letterbox padding to a resolution of 640x640.
Usage
Use this example when you need pixel-level object segmentation with the YOLOv8 architecture. It suits applications on mobile and edge devices that require both object detection and precise shape delineation. The YOLO11 segmentation example is the successor to this YOLOv8 variant.
Code Reference
Source Location
- Repository: Tencent_Ncnn
- File: examples/yolov8_seg.cpp
- Lines: 1-613
Signature
struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
    int gindex;
    cv::Mat mask;
};
static int detect_yolov8_seg(const cv::Mat& bgr, std::vector<Object>& objects);
static void generate_proposals(int stride, const ncnn::Mat& pred,
                               const ncnn::Mat& pred_mask,
                               float prob_threshold, std::vector<Object>& objects);
static void qsort_descent_inplace(std::vector<Object>& objects);
static void nms_sorted_bboxes(const std::vector<Object>& objects,
                              std::vector<int>& picked, float nms_threshold,
                              bool agnostic = false);
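
The last two helpers declared above sort candidates by confidence and apply non-maximum suppression. As a rough illustration of that technique, and not the example's actual code, a greedy class-aware NMS might look like the sketch below. `Box`, `sort_descent`, and `nms_sorted` are hypothetical stand-ins that avoid the OpenCV and ncnn types; the meaning given to `agnostic` (suppress across classes when true) is an assumption.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified stand-in for Object (no cv::Rect_ or cv::Mat).
struct Box
{
    float x, y, w, h; // top-left corner and size
    int label;
    float prob;
};

static float intersection_area(const Box& a, const Box& b)
{
    const float x0 = std::max(a.x, b.x);
    const float y0 = std::max(a.y, b.y);
    const float x1 = std::min(a.x + a.w, b.x + b.w);
    const float y1 = std::min(a.y + a.h, b.y + b.h);
    return std::max(0.f, x1 - x0) * std::max(0.f, y1 - y0);
}

// Sort candidates by descending confidence.
static void sort_descent(std::vector<Box>& boxes)
{
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.prob > b.prob; });
}

// Greedy NMS over boxes already sorted by descending confidence.
// Writes the indices of the kept boxes into `picked`.
static void nms_sorted(const std::vector<Box>& boxes, std::vector<int>& picked,
                       float nms_threshold, bool agnostic = false)
{
    picked.clear();
    for (int i = 0; i < (int)boxes.size(); i++)
    {
        const Box& a = boxes[i];
        bool keep = true;
        for (int j : picked)
        {
            const Box& b = boxes[j];
            // Assumption: unless agnostic, different classes never suppress each other.
            if (!agnostic && a.label != b.label)
                continue;
            const float inter = intersection_area(a, b);
            const float uni = a.w * a.h + b.w * b.h - inter;
            if (uni > 0.f && inter / uni > nms_threshold)
            {
                keep = false;
                break;
            }
        }
        if (keep)
            picked.push_back(i);
    }
}
```

Calling `sort_descent` first and then `nms_sorted` with a threshold of 0.45 would mirror the `nms_threshold` value used in the key code pattern.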
Import
#include "layer.h"
#include "net.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| image_path | const char* | Yes | Path to input image file |
Outputs
| Name | Type | Description |
|---|---|---|
| objects | std::vector<Object> | Detected objects with bounding boxes, class labels, confidence scores, and per-instance binary masks (cv::Mat) |
Model Files
| File | Description |
|---|---|
| yolov8n_seg.ncnn.param | YOLOv8-Seg nano model parameter file |
| yolov8n_seg.ncnn.bin | YOLOv8-Seg nano model weight file |
Usage Examples
Running the Example
./yolov8_seg image.jpg
Key Code Pattern
ncnn::Net yolov8;
yolov8.opt.use_vulkan_compute = true;
yolov8.load_param("yolov8n_seg.ncnn.param");
yolov8.load_model("yolov8n_seg.ncnn.bin");
const int target_size = 640;
const float prob_threshold = 0.25f;
const float nms_threshold = 0.45f;
const float mask_threshold = 0.5f;
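// Resize so the longer side equals target_size, preserving aspect ratio
const int img_w = bgr.cols;
const int img_h = bgr.rows;
int w = img_w;
int h = img_h;
float scale = 1.f;
if (w > h) { scale = (float)target_size / w; w = target_size; h = (int)(h * scale); }
else { scale = (float)target_size / h; h = target_size; w = (int)(w * scale); }
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data,
    ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, w, h);
// Letterbox pad to 640x640 with fill value 114
const int wpad = target_size - w;
const int hpad = target_size - h;
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2,
    wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 114.f);
// Scale pixel values to [0, 1]
const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
in_pad.substract_mean_normalize(0, norm_vals);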
ncnn::Extractor ex = yolov8.create_extractor();
ex.input("in0", in_pad);
ncnn::Mat out0; // bbox + class scores (w=176, h=8400)
ncnn::Mat out1; // mask coefficients (w=32, h=8400)
ncnn::Mat out2; // prototype masks (32x160x160)
ex.extract("out0", out0);
ex.extract("out1", out1);
ex.extract("out2", out2);
Implementation Details
Preprocessing
Input images are resized with their aspect ratio preserved and then letterbox-padded to 640x640 (a multiple of max_stride=32) using a fill value of 114. Pixels are converted from BGR to RGB during the resize, and the padded input is normalized by multiplying each value by 1/255.
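
The letterbox geometry described above can be worked out with plain integer arithmetic. The following is a minimal sketch, assuming the padding is split evenly between the two sides; `Letterbox`, `compute_letterbox`, `unpad_x`, and `unpad_y` are hypothetical names introduced for illustration and do not come from the example.

```cpp
// Hypothetical helper type: resize and padding parameters for one image.
struct Letterbox
{
    float scale;       // resize factor applied to the source image
    int w, h;          // resized size before padding
    int left, right;   // horizontal padding in pixels
    int top, bottom;   // vertical padding in pixels
};

// Resize so the longer side equals target_size, then pad to a square.
static Letterbox compute_letterbox(int img_w, int img_h, int target_size)
{
    Letterbox lb;
    if (img_w > img_h)
    {
        lb.scale = (float)target_size / img_w;
        lb.w = target_size;
        lb.h = (int)(img_h * lb.scale);
    }
    else
    {
        lb.scale = (float)target_size / img_h;
        lb.h = target_size;
        lb.w = (int)(img_w * lb.scale);
    }

    // Assumption: padding is centered, with any odd pixel going right/bottom.
    const int wpad = target_size - lb.w;
    const int hpad = target_size - lb.h;
    lb.left = wpad / 2;
    lb.right = wpad - wpad / 2;
    lb.top = hpad / 2;
    lb.bottom = hpad - hpad / 2;
    return lb;
}

// Map coordinates from the padded network input back to the source image.
static float unpad_x(const Letterbox& lb, float x)
{
    return (x - lb.left) / lb.scale;
}

static float unpad_y(const Letterbox& lb, float y)
{
    return (y - lb.top) / lb.scale;
}
```

For a 1280x720 image and a target size of 640, this gives a scale of 0.5, a resized image of 640x360, and 140 rows of padding above and below. The same scale and offsets are what a caller would need to map detected boxes back onto the original image.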
Output Tensor Layout
The model produces three output tensors:
- out0 (w=176, h=8400): DFL bbox regression (64 values) + 80 class scores + 32 mask coefficients per candidate
- out1 (w=32, h=8400): 32 mask coefficients per candidate box
- out2 (32x160x160): 32 prototype mask channels at 1/4 input resolution
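
To make the out0 row layout concrete, here is a minimal sketch of decoding one candidate row. It assumes the 176 values are ordered as listed above (64 DFL logits, then 80 class scores, then 32 mask coefficients), that the four DFL groups are the left, top, right, and bottom distances, and that class scores are raw logits needing a sigmoid. `Decoded`, `dfl_distance`, and `decode_row` are hypothetical names, and the example's own `generate_proposals` may differ.

```cpp
#include <algorithm>
#include <cmath>

// Assumed layout of one 176-value row:
// [0, 64) DFL logits, [64, 144) class scores, [144, 176) mask coefficients.
static const int kRegMax = 16;
static const int kNumClass = 80;

struct Decoded
{
    float x0, y0, x1, y1; // box corners in padded-input pixels
    int label;            // index of the best-scoring class
    float score;          // sigmoid of the best class logit
    const float* coeffs;  // points at the 32 mask coefficients in the row
};

// DFL: softmax over kRegMax bins, then the expected bin index.
static float dfl_distance(const float* logits)
{
    const float max_logit = *std::max_element(logits, logits + kRegMax);
    float sum = 0.f;
    float weighted = 0.f;
    for (int i = 0; i < kRegMax; i++)
    {
        const float e = std::exp(logits[i] - max_logit);
        sum += e;
        weighted += i * e;
    }
    return weighted / sum;
}

static Decoded decode_row(const float* row, int grid_x, int grid_y, int stride)
{
    // Distances from the cell center to the four box sides, in pixels.
    const float left = dfl_distance(row + 0 * kRegMax) * stride;
    const float top = dfl_distance(row + 1 * kRegMax) * stride;
    const float right = dfl_distance(row + 2 * kRegMax) * stride;
    const float bottom = dfl_distance(row + 3 * kRegMax) * stride;

    const float cx = (grid_x + 0.5f) * stride;
    const float cy = (grid_y + 0.5f) * stride;

    const float* scores = row + 4 * kRegMax;
    const float* best = std::max_element(scores, scores + kNumClass);

    Decoded d;
    d.x0 = cx - left;
    d.y0 = cy - top;
    d.x1 = cx + right;
    d.y1 = cy + bottom;
    d.label = (int)(best - scores);
    d.score = 1.f / (1.f + std::exp(-*best)); // assumes raw logits
    d.coeffs = scores + kNumClass;
    return d;
}
```

The 8400 rows correspond to the 80x80, 40x40, and 20x20 grids for strides 8, 16, and 32 on a 640x640 input (6400 + 1600 + 400), so a caller would derive `grid_x`, `grid_y`, and `stride` from the row index.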
Mask Generation Pipeline
- Generate detection proposals from out0 using DFL bbox decoding
- Apply NMS to filter overlapping detections
- For each surviving detection, compute mask = sigmoid(coefficients * prototype_masks)
- Crop mask to the detection's bounding box region
- Threshold at 0.5 to produce binary mask
- Overlay colored masks on the output image
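
The mask steps in the pipeline above (coefficient and prototype product, sigmoid, crop, threshold) can be sketched on plain arrays as follows. This is a simplified illustration that works at prototype resolution and skips any upsampling back to image size; `make_mask` and its flat plane-after-plane prototype layout are assumptions made for the sketch, not the example's code.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Build a binary mask (0 or 255) for one detection.
// coeffs: one weight per prototype channel.
// protos: coeffs.size() planes of proto_w * proto_h floats, stored plane after plane.
// The box [x0, x1) x [y0, y1) is given in prototype coordinates;
// pixels outside it stay 0, which implements the crop step.
static std::vector<unsigned char> make_mask(const std::vector<float>& coeffs,
                                            const std::vector<float>& protos,
                                            int proto_w, int proto_h,
                                            int x0, int y0, int x1, int y1,
                                            float mask_threshold)
{
    std::vector<unsigned char> mask((size_t)proto_w * proto_h, 0);
    const int plane = proto_w * proto_h;

    // Clamp the box to the prototype grid.
    x0 = std::max(x0, 0);
    y0 = std::max(y0, 0);
    x1 = std::min(x1, proto_w);
    y1 = std::min(y1, proto_h);

    for (int y = y0; y < y1; y++)
    {
        for (int x = x0; x < x1; x++)
        {
            // Linear combination of prototype channels at this pixel.
            float acc = 0.f;
            for (size_t c = 0; c < coeffs.size(); c++)
                acc += coeffs[c] * protos[c * plane + y * proto_w + x];

            // Sigmoid, then threshold to a binary value.
            const float p = 1.f / (1.f + std::exp(-acc));
            mask[y * proto_w + x] = p > mask_threshold ? 255 : 0;
        }
    }
    return mask;
}
```

With the shapes from this page, `coeffs` would hold 32 values and `protos` 32x160x160 values, and a box in 640x640 input coordinates would be divided by 4 before being passed in. Restricting the loops to the box also avoids computing mask values that the crop would discard.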
Model Conversion
Models are converted from Ultralytics format using PNNX. The conversion involves modifying reshape and concatenation operations for dynamic shape support, then re-exporting with dual input shapes (640x640 and 320x320).