Implementation:Tencent Ncnn YOLOv8 Pose Example

Knowledge Sources	Tencent_Ncnn
Domains	Vision, Pose_Estimation
Last Updated	2026-02-09 19:00 GMT

Overview

Concrete example that runs YOLOv8 human pose estimation (17 COCO keypoints per detected person) with the ncnn inference framework.

Description

This example implements YOLOv8 pose estimation using the ncnn inference framework, combining person detection and keypoint localization in a single forward pass. The model produces two output blobs: a detection blob (w=65, h=8400) containing DFL bbox regression (16x4=64 values) and a person confidence score, and a keypoint blob (w=51, h=8400) containing 17 COCO body keypoints with 3 values each (x coordinate, y coordinate, visibility confidence). Input images are preprocessed with letterbox padding to 640x640 resolution. The output visualization draws skeleton connections between anatomically linked keypoints on detected persons.

Usage

Use this example when you need to detect human poses with the YOLOv8 architecture on mobile and edge devices, for applications such as fitness tracking, gesture recognition, or action recognition. It is the YOLOv8 counterpart of, and predecessor to, the YOLO11 pose estimation example.

Code Reference

Source Location

Repository: Tencent_Ncnn
File: examples/yolov8_pose.cpp
Lines: 1-550

Signature

struct KeyPoint
{
    cv::Point2f p;
    float prob;
};

struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
    std::vector<KeyPoint> keypoints;
};

static int detect_yolov8_pose(const cv::Mat& bgr, std::vector<Object>& objects);

static void generate_proposals(int stride, const ncnn::Mat& pred,
                               const ncnn::Mat& pred_kps,
                               float prob_threshold, std::vector<Object>& objects);
static void qsort_descent_inplace(std::vector<Object>& objects);
static void nms_sorted_bboxes(const std::vector<Object>& objects,
                               std::vector<int>& picked, float nms_threshold,
                               bool agnostic = false);

Import

#include "layer.h"
#include "net.h"

I/O Contract

Inputs

Name	Type	Required	Description
image_path	const char*	Yes	Path to input image file

Outputs

Name	Type	Description
objects	std::vector<Object>	Detected persons with bounding boxes, confidence scores, and 17 keypoints each with (x, y, confidence)

Model Files

File	Description
yolov8n_pose.ncnn.param	YOLOv8-Pose nano model parameter file
yolov8n_pose.ncnn.bin	YOLOv8-Pose nano model weight file

Usage Examples

Running the Example

./yolov8_pose image.jpg

Key Code Pattern

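ncnn::Net yolov8;
yolov8.opt.use_vulkan_compute = true;

yolov8.load_param("yolov8n_pose.ncnn.param");
yolov8.load_model("yolov8n_pose.ncnn.bin");

const int target_size = 640;
const float prob_threshold = 0.25f;
const float nms_threshold = 0.45f;

// Resize so the longer side equals target_size, keeping the aspect ratio
const int img_w = bgr.cols;
const int img_h = bgr.rows;
int w = img_w;
int h = img_h;
float scale = 1.f;
if (w > h)
{
    scale = (float)target_size / w;
    w = target_size;
    h = (int)(h * scale);
}
else
{
    scale = (float)target_size / h;
    h = target_size;
    w = (int)(w * scale);
}
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data,
    ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, w, h);

// Letterbox pad to 640x640 with fill value 114
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, (target_size - h) / 2, (target_size - h) - (target_size - h) / 2,
    (target_size - w) / 2, (target_size - w) - (target_size - w) / 2, ncnn::BORDER_CONSTANT, 114.f);

// Scale pixel values to [0, 1]
const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
in_pad.substract_mean_normalize(0, norm_vals);

ncnn::Extractor ex = yolov8.create_extractor();
ex.input("in0", in_pad);

ncnn::Mat out0;  // bbox + person score (w=65, h=8400)
ncnn::Mat out1;  // keypoints (w=51, h=8400)
ex.extract("out0", out0);
ex.extract("out1", out1);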
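
The pattern above ends at blob extraction. The helper signatures listed earlier (qsort_descent_inplace, nms_sorted_bboxes) suggest the usual follow-up of sorting candidates by score and applying non-maximum suppression. The following is a minimal, self-contained sketch of greedy IoU-based NMS, not the example's own code: the Box struct and the nms function are hypothetical stand-ins, and keypoints are left out for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the example's Object; keypoints are omitted.
struct Box
{
    float x, y, w, h; // top-left corner and size, in pixels
    float prob;       // person confidence
};

// Overlap area of two axis-aligned boxes (0 if they do not intersect).
static float intersection_area(const Box& a, const Box& b)
{
    const float x0 = std::max(a.x, b.x);
    const float y0 = std::max(a.y, b.y);
    const float x1 = std::min(a.x + a.w, b.x + b.w);
    const float y1 = std::min(a.y + a.h, b.y + b.h);
    return std::max(0.f, x1 - x0) * std::max(0.f, y1 - y0);
}

// Greedy NMS. Sorts `boxes` by descending score in place and returns the
// indices (into the sorted vector) of the boxes that survive suppression.
static std::vector<int> nms(std::vector<Box>& boxes, float nms_threshold)
{
    assert(nms_threshold >= 0.f && nms_threshold <= 1.f);
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.prob > b.prob; });

    std::vector<int> picked;
    for (int i = 0; i < (int)boxes.size(); i++)
    {
        bool keep = true;
        for (int j : picked)
        {
            // Intersection over union against an already accepted box
            const float inter = intersection_area(boxes[i], boxes[j]);
            const float uni = boxes[i].w * boxes[i].h + boxes[j].w * boxes[j].h - inter;
            if (uni > 0.f && inter / uni > nms_threshold)
            {
                keep = false;
                break;
            }
        }
        if (keep) picked.push_back(i);
    }
    return picked;
}
```

With nms_threshold = 0.45f as in the pattern above, two boxes covering the same person collapse to the higher-scoring one, while boxes on different people are both kept.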

Implementation Details

Preprocessing

Input images are resized while preserving aspect ratio and letterbox padded to 640x640 (a multiple of max_stride=32). Pixel values are converted from BGR to RGB and normalized by dividing by 255. The padding fill value is 114.
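
Because detection happens in the padded 640x640 frame, box and keypoint coordinates have to be mapped back to the source image before drawing. The sketch below illustrates that geometry under the assumption that padding is split evenly between both sides (any odd pixel going to the bottom or right); the Letterbox struct and the function names are hypothetical and do not come from the example.

```cpp
#include <cassert>

// Hypothetical description of one letterbox transform.
struct Letterbox
{
    float scale;    // resize factor applied to the source image
    int w, h;       // resized size before padding
    int wpad, hpad; // total padding added along each axis
};

// Resize so the longer side equals target_size, then pad to a square.
static Letterbox compute_letterbox(int img_w, int img_h, int target_size)
{
    assert(img_w > 0 && img_h > 0 && target_size > 0);
    Letterbox lb;
    if (img_w > img_h)
    {
        lb.scale = (float)target_size / img_w;
        lb.w = target_size;
        lb.h = (int)(img_h * lb.scale);
    }
    else
    {
        lb.scale = (float)target_size / img_h;
        lb.h = target_size;
        lb.w = (int)(img_w * lb.scale);
    }
    lb.wpad = target_size - lb.w;
    lb.hpad = target_size - lb.h;
    return lb;
}

// Map a coordinate from the padded input back to the source image:
// subtract the left/top padding (integer half of the total), then undo the resize.
static float unletterbox_x(float x, const Letterbox& lb)
{
    return (x - lb.wpad / 2) / lb.scale;
}

static float unletterbox_y(float y, const Letterbox& lb)
{
    return (y - lb.hpad / 2) / lb.scale;
}
```

For a 1280x720 source, for instance, the image is resized to 640x360 and receives 140 rows of padding above and below, so the centre of the padded input (320, 320) maps back to the centre of the source image (640, 360).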

Output Tensor Layout

The model produces two output tensors:

out0 (w=65, h=8400): Contains DFL bbox regression (16x4=64 values) and 1 person confidence score for 8400 candidate boxes across three stride levels (8, 16, 32)
out1 (w=51, h=8400): Contains 17 keypoints x 3 values (x, y, confidence) per candidate box
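
The layout above can be made concrete with a little arithmetic. The sketch below first checks where the 8400 candidates come from (one per grid cell at strides 8, 16, and 32 on a 640x640 input), then decodes one row of 64 DFL values into a box. It follows the standard YOLOv8 DFL scheme and assumes the four groups of 16 logits are ordered left, top, right, bottom; the function names are hypothetical, and the actual example may organise this differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Number of candidate boxes for a square input: one per grid cell per stride.
static int count_candidates(int input_size)
{
    const int strides[3] = {8, 16, 32};
    int n = 0;
    for (int s : strides) n += (input_size / s) * (input_size / s);
    return n;
}

// Decode one box side: softmax over reg_max logits, then the expected bin index.
static float dfl_expectation(const float* logits, int reg_max)
{
    assert(reg_max > 0);
    // Subtract the maximum logit for numerical stability
    const float max_logit = *std::max_element(logits, logits + reg_max);
    float sum = 0.f;
    float weighted = 0.f;
    for (int k = 0; k < reg_max; k++)
    {
        const float e = std::exp(logits[k] - max_logit);
        sum += e;
        weighted += e * k;
    }
    return weighted / sum;
}

struct BoxXYXY
{
    float x0, y0, x1, y1;
};

// Decode 64 DFL logits for the grid cell (grid_x, grid_y) at the given stride.
// Assumed order of the four 16-logit groups: left, top, right, bottom.
static BoxXYXY decode_box(const float* row, int grid_x, int grid_y, int stride)
{
    const int reg_max = 16;
    // Anchor point at the centre of the grid cell, in input pixels
    const float cx = (grid_x + 0.5f) * stride;
    const float cy = (grid_y + 0.5f) * stride;
    // Distances from the anchor point to each side, in input pixels
    const float l = dfl_expectation(row + 0 * reg_max, reg_max) * stride;
    const float t = dfl_expectation(row + 1 * reg_max, reg_max) * stride;
    const float r = dfl_expectation(row + 2 * reg_max, reg_max) * stride;
    const float b = dfl_expectation(row + 3 * reg_max, reg_max) * stride;
    return {cx - l, cy - t, cx + r, cy + b};
}
```

For a 640x640 input the grids are 80x80, 40x40, and 20x20, which gives 6400 + 1600 + 400 = 8400 rows in each output blob.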

Keypoint Skeleton

The 17 COCO keypoints are connected by skeleton edges for visualization: nose-eyes-ears form the head connections, shoulders-elbows-wrists form arm chains, and hips-knees-ankles form leg chains, with shoulder-hip connections forming the torso.
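
The connections described above can be written as an edge table over the standard COCO keypoint order. This is a sketch: the enum names, the kSkeleton table, and the visible_edges helper are hypothetical, and the exact edge list and any visibility threshold used by the example may differ.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Standard COCO keypoint order (indices 0 to 16).
enum CocoKeypoint
{
    NOSE = 0, LEFT_EYE, RIGHT_EYE, LEFT_EAR, RIGHT_EAR,
    LEFT_SHOULDER, RIGHT_SHOULDER, LEFT_ELBOW, RIGHT_ELBOW,
    LEFT_WRIST, RIGHT_WRIST, LEFT_HIP, RIGHT_HIP,
    LEFT_KNEE, RIGHT_KNEE, LEFT_ANKLE, RIGHT_ANKLE,
    NUM_KEYPOINTS
};

// Assumed skeleton edges, grouped as in the description above.
static const int kSkeleton[16][2] = {
    // head: nose-eyes-ears
    {NOSE, LEFT_EYE}, {LEFT_EYE, LEFT_EAR}, {NOSE, RIGHT_EYE}, {RIGHT_EYE, RIGHT_EAR},
    // arms: shoulders-elbows-wrists
    {LEFT_SHOULDER, LEFT_ELBOW}, {LEFT_ELBOW, LEFT_WRIST},
    {RIGHT_SHOULDER, RIGHT_ELBOW}, {RIGHT_ELBOW, RIGHT_WRIST},
    // torso: shoulders and hips
    {LEFT_SHOULDER, RIGHT_SHOULDER}, {LEFT_HIP, RIGHT_HIP},
    {LEFT_SHOULDER, LEFT_HIP}, {RIGHT_SHOULDER, RIGHT_HIP},
    // legs: hips-knees-ankles
    {LEFT_HIP, LEFT_KNEE}, {LEFT_KNEE, LEFT_ANKLE},
    {RIGHT_HIP, RIGHT_KNEE}, {RIGHT_KNEE, RIGHT_ANKLE},
};

// Return the edges whose two endpoints both reach min_prob, so that
// low-confidence (likely occluded) joints are not connected when drawing.
static std::vector<std::pair<int, int> > visible_edges(const std::vector<float>& probs, float min_prob)
{
    assert((int)probs.size() == NUM_KEYPOINTS);
    std::vector<std::pair<int, int> > edges;
    for (const auto& e : kSkeleton)
    {
        if (probs[e[0]] >= min_prob && probs[e[1]] >= min_prob)
            edges.push_back(std::make_pair(e[0], e[1]));
    }
    return edges;
}
```

Each returned pair indexes into Object::keypoints, so drawing the skeleton reduces to one line segment per pair between the two keypoint positions.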

Model Conversion

Models are converted from Ultralytics format using PNNX. The conversion requires modifying reshape operations for dynamic shapes and re-exporting with dual input shapes (640x640 and 320x320).

Related Pages

Environment:Tencent_Ncnn_Build_Environment
