Implementation:Tencent Ncnn YOLOv8 Seg Example

Knowledge Sources	Tencent_Ncnn
Domains	Vision, Instance_Segmentation
Last Updated	2026-02-09 19:00 GMT

Overview

A concrete example program for instance segmentation using YOLOv8 with the ncnn inference framework.

Description

This example implements YOLOv8 instance segmentation with the ncnn inference framework, detecting objects with both bounding boxes and per-instance pixel masks for the 80 COCO classes. The model produces three output blobs: a detection blob (w=176, h=8400) whose rows hold DFL bbox regression values (16x4=64), per-class scores (80), and mask coefficients (32), so that 64+80+32=176; a mask coefficient blob (w=32, h=8400) with 32 mask coefficients per candidate; and prototype masks (32x160x160) at one-quarter of the input resolution. Instance masks are generated by multiplying the mask coefficients with the prototype masks, applying a sigmoid, and thresholding at 0.5 to produce binary masks. Input images are preprocessed with letterbox padding to 640x640.

Usage

Use this example when you need pixel-level object segmentation using the YOLOv8 architecture. It is suitable for applications requiring both object detection and precise shape delineation on mobile and edge devices. This is the YOLOv8 predecessor to the YOLO11 segmentation variant.

Code Reference

Source Location

Repository: Tencent_Ncnn
File: examples/yolov8_seg.cpp
Lines: 1-613

Signature

struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
    int gindex;
    cv::Mat mask;
};

static int detect_yolov8_seg(const cv::Mat& bgr, std::vector<Object>& objects);

static void generate_proposals(int stride, const ncnn::Mat& pred,
                               const ncnn::Mat& pred_mask,
                               float prob_threshold, std::vector<Object>& objects);
static void qsort_descent_inplace(std::vector<Object>& objects);
static void nms_sorted_bboxes(const std::vector<Object>& objects,
                               std::vector<int>& picked, float nms_threshold,
                               bool agnostic = false);

Import

#include "layer.h"
#include "net.h"

I/O Contract

Inputs

Name	Type	Required	Description
image_path	const char*	Yes	Path to input image file

Outputs

Name	Type	Description
objects	std::vector<Object>	Detected objects with bounding boxes, class labels, confidence scores, and per-instance binary masks (cv::Mat)

Model Files

File	Description
yolov8n_seg.ncnn.param	YOLOv8-Seg nano model parameter file
yolov8n_seg.ncnn.bin	YOLOv8-Seg nano model weight file

Usage Examples

Running the Example

./yolov8_seg image.jpg

Key Code Pattern

ncnn::Net yolov8;
yolov8.opt.use_vulkan_compute = true;

yolov8.load_param("yolov8n_seg.ncnn.param");
yolov8.load_model("yolov8n_seg.ncnn.bin");

const int target_size = 640;
const float prob_threshold = 0.25f;
const float nms_threshold = 0.45f;
const float mask_threshold = 0.5f;

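// Resize with preserved aspect ratio; w and h are the resized dimensions
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data,
    ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, w, h);

// Letterbox pad to 640x640 with fill value 114; wpad and hpad are the
// total horizontal and vertical padding
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2,
    wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 114.f);

// Scale pixel values from [0, 255] to [0, 1]
const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
in_pad.substract_mean_normalize(0, norm_vals);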

ncnn::Extractor ex = yolov8.create_extractor();
ex.input("in0", in_pad);

ncnn::Mat out0;       // bbox + class scores (w=176, h=8400)
ncnn::Mat out1;       // mask coefficients (w=32, h=8400)
ncnn::Mat out2;       // prototype masks (32x160x160)
ex.extract("out0", out0);
ex.extract("out1", out1);
ex.extract("out2", out2);

Implementation Details

Preprocessing

Input images are resized while preserving aspect ratio and letterbox padded to 640x640 (a multiple of max_stride=32). Pixel values are converted from BGR to RGB and normalized by dividing by 255. The padding fill value is 114.
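The resize and padding geometry described above can be sketched in plain C++ without any ncnn or OpenCV dependency. This is an illustrative sketch, not code taken from the example: `LetterboxPlan`, `plan_letterbox`, and `unletterbox_x` are hypothetical names, and it assumes the longer side is scaled to the target size with the remaining padding split evenly between the two sides.

```cpp
#include <algorithm>

// Hypothetical helper type (not part of the ncnn example): describes how an
// image is resized and padded to a square network input.
struct LetterboxPlan
{
    float scale;                  // resize factor applied to both axes
    int w;                        // resized width
    int h;                        // resized height
    int left, right, top, bottom; // padding in pixels on each side
};

// Assumption: the longer side is scaled to target_size and the remainder is
// padded symmetrically, with any odd pixel going to the right/bottom.
inline LetterboxPlan plan_letterbox(int img_w, int img_h, int target_size)
{
    LetterboxPlan p;
    if (img_w > img_h)
    {
        p.scale = (float)target_size / img_w;
        p.w = target_size;
        p.h = (int)(img_h * p.scale);
    }
    else
    {
        p.scale = (float)target_size / img_h;
        p.h = target_size;
        p.w = (int)(img_w * p.scale);
    }
    const int wpad = target_size - p.w;
    const int hpad = target_size - p.h;
    p.left = wpad / 2;
    p.right = wpad - wpad / 2;
    p.top = hpad / 2;
    p.bottom = hpad - hpad / 2;
    return p;
}

// Maps an x coordinate from the padded network input back to the source
// image, clamped to the valid pixel range.
inline float unletterbox_x(float x, const LetterboxPlan& p, int img_w)
{
    const float v = (x - p.left) / p.scale;
    return std::max(0.f, std::min(v, (float)(img_w - 1)));
}
```

For a 1280x720 image and a target size of 640, this plan gives a scale of 0.5, a resized image of 640x360, and 140 rows of padding above and below. The same scale and offsets are what a detector needs to map boxes and masks back to source-image coordinates.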

Output Tensor Layout

The model produces three output tensors:

out0 (w=176, h=8400): DFL bbox regression (64 values) + 80 class scores + 32 mask coefficients per candidate
out1 (w=32, h=8400): 32 mask coefficients per candidate box
out2 (32x160x160): 32 prototype mask channels at 1/4 input resolution
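To make the DFL portion of each out0 row more concrete, the following sketch decodes the 64 regression values into a box. It is a minimal illustration under stated assumptions rather than the example's own code: `kRegMax`, `dfl_expectation`, `Box`, and `decode_dfl_box` are hypothetical names, the four groups of 16 logits are assumed to be ordered left, top, right, bottom, and the anchor center is assumed to be given in padded-input pixels.

```cpp
#include <algorithm>
#include <cmath>

// Assumed number of DFL bins per box side (16 x 4 = 64 values per row).
const int kRegMax = 16;

// Softmax over kRegMax logits followed by the expected bin index.
inline float dfl_expectation(const float* logits)
{
    float max_logit = logits[0];
    for (int i = 1; i < kRegMax; i++)
        max_logit = std::max(max_logit, logits[i]);

    float sum = 0.f;
    float weighted = 0.f;
    for (int i = 0; i < kRegMax; i++)
    {
        const float e = std::exp(logits[i] - max_logit); // stable softmax
        sum += e;
        weighted += e * i;
    }
    return weighted / sum;
}

struct Box
{
    float x0, y0, x1, y1;
};

// reg points at the first of the 64 regression values of one candidate.
// Assumption: groups are ordered left, top, right, bottom, and each decoded
// distance is expressed in units of the feature-map stride.
inline Box decode_dfl_box(const float* reg, float cx, float cy, int stride)
{
    const float l = dfl_expectation(reg + 0 * kRegMax) * stride;
    const float t = dfl_expectation(reg + 1 * kRegMax) * stride;
    const float r = dfl_expectation(reg + 2 * kRegMax) * stride;
    const float b = dfl_expectation(reg + 3 * kRegMax) * stride;
    return Box{cx - l, cy - t, cx + r, cy + b};
}
```

With all-zero logits the softmax is uniform, so each side decodes to 7.5 bins; at stride 8 that is 60 pixels from the anchor center in every direction. The row count of 8400 is consistent with a 640x640 input evaluated at strides 8, 16, and 32 (80x80 + 40x40 + 20x20 anchor positions).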

Mask Generation Pipeline

Generate detection proposals from out0 using DFL bbox decoding
Apply NMS to filter overlapping detections
For each surviving detection, compute mask = sigmoid(coefficients * prototype_masks)
Crop mask to the detection's bounding box region
Threshold at 0.5 to produce binary mask
Overlay colored masks on the output image
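The coefficient-times-prototype step in the pipeline above can be sketched with standard containers. This is a simplified illustration, not the example's implementation: `make_instance_mask` is a hypothetical name, the prototypes are assumed to be stored plane after plane in one flat array, and NMS, cropping to the bounding box, and resizing back to the source image are left out.

```cpp
#include <cmath>
#include <vector>

// coeffs: one mask coefficient per prototype (32 in this model).
// protos: coeffs.size() planes of proto_w * proto_h floats, stored plane
//         after plane (assumed layout).
// Returns a proto_w * proto_h binary mask with values 0 or 255.
inline std::vector<unsigned char> make_instance_mask(
    const std::vector<float>& coeffs,
    const std::vector<float>& protos,
    int proto_w, int proto_h, float mask_threshold)
{
    const int plane = proto_w * proto_h;
    const int num_protos = (int)coeffs.size();
    std::vector<unsigned char> mask(plane, 0);

    for (int i = 0; i < plane; i++)
    {
        // Linear combination of the prototype values at this pixel.
        float acc = 0.f;
        for (int k = 0; k < num_protos; k++)
            acc += coeffs[k] * protos[k * plane + i];

        // Sigmoid, then threshold to a binary mask value.
        const float prob = 1.f / (1.f + std::exp(-acc));
        mask[i] = prob > mask_threshold ? 255 : 0;
    }
    return mask;
}
```

Because the sigmoid is monotonic, thresholding the probability at 0.5 is equivalent to testing whether the linear combination is positive. The sketch keeps the explicit sigmoid so that other thresholds behave as expected.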

Model Conversion

Models are converted from Ultralytics format using PNNX. The conversion involves modifying reshape and concatenation operations for dynamic shape support, then re-exporting with dual input shapes (640x640 and 320x320).

Related Pages

Environment:Tencent_Ncnn_Build_Environment
