Implementation:Tencent Ncnn YOLOv8 Pose Example
| Knowledge Sources | |
|---|---|
| Domains | Vision, Pose_Estimation |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Concrete tool for human pose estimation with 17 COCO keypoints using YOLOv8 with ncnn.
Description
This example implements YOLOv8 pose estimation using the ncnn inference framework, combining person detection and keypoint localization in a single forward pass. The model produces two output blobs: a detection blob (w=65, h=8400) containing DFL bbox regression (16x4=64 values) and a person confidence score, and a keypoint blob (w=51, h=8400) containing 17 COCO body keypoints with 3 values each (x coordinate, y coordinate, visibility confidence). Input images are preprocessed with letterbox padding to 640x640 resolution. The output visualization draws skeleton connections between anatomically linked keypoints on detected persons.
Usage
Use this example to detect human poses with the YOLOv8 architecture in applications such as fitness tracking, gesture recognition, or action recognition on mobile and edge devices. It is the YOLOv8 predecessor of the YOLO11 pose estimation example.
Code Reference
Source Location
- Repository: Tencent_Ncnn
- File: examples/yolov8_pose.cpp
- Lines: 1-550
Signature
struct KeyPoint
{
    cv::Point2f p;
    float prob;
};
struct Object
{
    cv::Rect_<float> rect;
    int label;
    float prob;
    std::vector<KeyPoint> keypoints;
};
static int detect_yolov8_pose(const cv::Mat& bgr, std::vector<Object>& objects);
static void generate_proposals(int stride, const ncnn::Mat& pred,
                               const ncnn::Mat& pred_kps,
                               float prob_threshold, std::vector<Object>& objects);
static void qsort_descent_inplace(std::vector<Object>& objects);
static void nms_sorted_bboxes(const std::vector<Object>& objects,
                              std::vector<int>& picked, float nms_threshold,
                              bool agnostic = false);
Import
#include "layer.h"
#include "net.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| image_path | const char* | Yes | Path to input image file |
Outputs
| Name | Type | Description |
|---|---|---|
| objects | std::vector<Object> | Detected persons with bounding boxes, confidence scores, and 17 keypoints each with (x, y, confidence) |
Model Files
| File | Description |
|---|---|
| yolov8n_pose.ncnn.param | YOLOv8-Pose nano model parameter file |
| yolov8n_pose.ncnn.bin | YOLOv8-Pose nano model weight file |
Usage Examples
Running the Example
./yolov8_pose image.jpg
Key Code Pattern
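ncnn::Net yolov8;
yolov8.opt.use_vulkan_compute = true;
yolov8.load_param("yolov8n_pose.ncnn.param");
yolov8.load_model("yolov8n_pose.ncnn.bin");
const int target_size = 640;
const float prob_threshold = 0.25f;
const float nms_threshold = 0.45f;
// img_w, img_h: size of the input cv::Mat bgr
// w, h: size after the aspect-preserving resize (longer side == target_size)
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data,
    ncnn::Mat::PIXEL_BGR2RGB, img_w, img_h, w, h);
// Letterbox pad to 640x640 with fill value 114
int wpad = target_size - w;
int hpad = target_size - h;
ncnn::Mat in_pad;
ncnn::copy_make_border(in, in_pad, hpad / 2, hpad - hpad / 2,
    wpad / 2, wpad - wpad / 2, ncnn::BORDER_CONSTANT, 114.f);
// Scale pixel values to [0, 1] (substract_mean_normalize is the ncnn API spelling)
const float norm_vals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
in_pad.substract_mean_normalize(0, norm_vals);
ncnn::Extractor ex = yolov8.create_extractor();
ex.input("in0", in_pad);
ncnn::Mat out0; // bbox + person score (w=65, h=8400)
ncnn::Mat out1; // keypoints (w=51, h=8400)
ex.extract("out0", out0);
ex.extract("out1", out1);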
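After extraction, the candidates are sorted by score and filtered with non-maximum suppression, as the `qsort_descent_inplace` and `nms_sorted_bboxes` signatures suggest. The following is a minimal sketch of greedy NMS under that assumption, not the example's actual code: `Candidate` is a hypothetical stand-in for `Object` (plain floats instead of `cv::Rect_<float>`), and `greedy_nms` is an illustrative name.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for Object: top-left corner, size, and score.
struct Candidate
{
    float x, y, w, h;
    float prob;
};

// Area of the overlap between two axis-aligned boxes (0 if disjoint).
static float intersection_area(const Candidate& a, const Candidate& b)
{
    const float iw = std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x);
    const float ih = std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y);
    return (iw > 0.f && ih > 0.f) ? iw * ih : 0.f;
}

// Sorts candidates by descending score, then keeps a candidate only if its
// IoU with every already-kept candidate is at most nms_threshold.
// Returns indices into the sorted vector.
static std::vector<int> greedy_nms(std::vector<Candidate>& cands, float nms_threshold)
{
    std::sort(cands.begin(), cands.end(),
              [](const Candidate& a, const Candidate& b) { return a.prob > b.prob; });

    std::vector<int> picked;
    for (int i = 0; i < (int)cands.size(); i++)
    {
        bool keep = true;
        for (int j : picked)
        {
            const float inter = intersection_area(cands[i], cands[j]);
            const float uni = cands[i].w * cands[i].h + cands[j].w * cands[j].h - inter;
            if (uni > 0.f && inter / uni > nms_threshold)
            {
                keep = false;
                break;
            }
        }
        if (keep) picked.push_back(i);
    }
    return picked;
}
```

With `nms_threshold = 0.45f`, two heavily overlapping person boxes collapse to the higher-scoring one, while a distant box is kept.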
Implementation Details
Preprocessing
Input images are resized while preserving aspect ratio and letterbox padded to 640x640 (a multiple of max_stride=32). Pixel values are converted from BGR to RGB and normalized by dividing by 255. The padding fill value is 114.
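The letterbox geometry described above can be sketched as follows. This is a minimal illustration that assumes the longer image side is scaled to the target size, the resized dimensions are truncated to integers, and the padding is split evenly between both sides. `LetterboxParams`, `compute_letterbox`, `unletterbox_x`, and `unletterbox_y` are hypothetical names, not functions from the ncnn example.

```cpp
#include <algorithm>

// Hypothetical helper type describing the resize and padding.
struct LetterboxParams
{
    int resized_w;  // width after the aspect-preserving resize
    int resized_h;  // height after the aspect-preserving resize
    int pad_left, pad_right, pad_top, pad_bottom;
    float scale;    // resized size divided by original size
};

// Scale the longer side to target_size, then pad to a target_size square.
static LetterboxParams compute_letterbox(int img_w, int img_h, int target_size)
{
    LetterboxParams p;
    if (img_w > img_h)
    {
        p.scale = (float)target_size / img_w;
        p.resized_w = target_size;
        p.resized_h = (int)(img_h * p.scale);
    }
    else
    {
        p.scale = (float)target_size / img_h;
        p.resized_h = target_size;
        p.resized_w = (int)(img_w * p.scale);
    }
    const int wpad = target_size - p.resized_w;
    const int hpad = target_size - p.resized_h;
    p.pad_left = wpad / 2;
    p.pad_right = wpad - wpad / 2;
    p.pad_top = hpad / 2;
    p.pad_bottom = hpad - hpad / 2;
    return p;
}

// Map coordinates in the padded image back to the original image,
// clamped to the valid pixel range.
static float unletterbox_x(float x, const LetterboxParams& p, int img_w)
{
    const float v = (x - p.pad_left) / p.scale;
    return std::max(std::min(v, (float)(img_w - 1)), 0.f);
}

static float unletterbox_y(float y, const LetterboxParams& p, int img_h)
{
    const float v = (y - p.pad_top) / p.scale;
    return std::max(std::min(v, (float)(img_h - 1)), 0.f);
}
```

For a 1280x720 input and a target size of 640, this yields a 640x360 resized image with 140 rows of padding above and below. The same parameters are needed after inference to map boxes and keypoints back to the original image.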
Output Tensor Layout
The model produces two output tensors:
- out0 (w=65, h=8400): Contains DFL bbox regression (16x4=64 values) and 1 person confidence score for each of 8400 candidate boxes across three stride levels (8, 16, 32). For a 640x640 input these are 80x80 + 40x40 + 20x20 = 8400 grid cells.
- out1 (w=51, h=8400): Contains 17 keypoints x 3 values (x, y, confidence) per candidate box
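To make the out0 layout concrete, here is a hedged sketch of how one row's 64 DFL values could be turned into a box. It assumes the common YOLOv8 convention: four groups of 16 logits ordered left, top, right, bottom, each reduced by a softmax expectation and scaled by the stride, with the anchor point at the grid cell centre. `Box`, `dfl_expectation`, `decode_dfl_box`, and `num_candidates` are illustrative names, not taken from the example source.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical box type: corner coordinates in padded-image pixels.
struct Box
{
    float x0, y0, x1, y1;
};

// Expected bin index under a softmax over `bins` logits.
static float dfl_expectation(const float* logits, int bins)
{
    float max_logit = logits[0];
    for (int i = 1; i < bins; i++) max_logit = std::max(max_logit, logits[i]);

    float sum = 0.f;
    float weighted = 0.f;
    for (int i = 0; i < bins; i++)
    {
        const float e = std::exp(logits[i] - max_logit);
        sum += e;
        weighted += e * i;
    }
    return weighted / sum;
}

// Decode the first 64 values of one out0 row for the grid cell (grid_x, grid_y).
// Assumed order of the four distance groups: left, top, right, bottom.
static Box decode_dfl_box(const float* row, int grid_x, int grid_y, int stride)
{
    const int reg_max = 16;
    float d[4];
    for (int k = 0; k < 4; k++)
        d[k] = dfl_expectation(row + k * reg_max, reg_max) * stride;

    const float cx = (grid_x + 0.5f) * stride;
    const float cy = (grid_y + 0.5f) * stride;
    return Box{cx - d[0], cy - d[1], cx + d[2], cy + d[3]};
}

// Total grid cells over strides 8, 16, 32 for a square input.
static int num_candidates(int input_size)
{
    const int strides[3] = {8, 16, 32};
    int total = 0;
    for (int i = 0; i < 3; i++)
        total += (input_size / strides[i]) * (input_size / strides[i]);
    return total;
}
```

The 65th value of each row (index 64) is the person confidence score from the layout above; how it is activated and thresholded is left out of this sketch.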
Keypoint Skeleton
The 17 COCO keypoints are connected by skeleton edges for visualization: nose-eyes-ears form the head connections, shoulders-elbows-wrists form arm chains, and hips-knees-ankles form leg chains, with shoulder-hip connections forming the torso.
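The skeleton described above can be expressed as an edge table over the standard COCO keypoint order (0 nose, 1-2 eyes, 3-4 ears, 5-6 shoulders, 7-8 elbows, 9-10 wrists, 11-12 hips, 13-14 knees, 15-16 ankles). The sketch below is one plausible edge set matching that description; the exact pairs and the confidence filter used by the example may differ, and `kSkeleton` and `visible_edges` are hypothetical names.

```cpp
#include <utility>
#include <vector>

static const int kNumKeypoints = 17;

// Assumed edge set: head, arms, torso, legs (indices in COCO order).
static const int kSkeleton[][2] = {
    {0, 1}, {0, 2}, {1, 3}, {2, 4},          // nose-eyes, eyes-ears
    {5, 6},                                  // shoulder-shoulder
    {5, 7}, {7, 9}, {6, 8}, {8, 10},         // shoulder-elbow-wrist chains
    {5, 11}, {6, 12}, {11, 12},              // torso: shoulder-hip, hip-hip
    {11, 13}, {13, 15}, {12, 14}, {14, 16}   // hip-knee-ankle chains
};
static const int kNumEdges = sizeof(kSkeleton) / sizeof(kSkeleton[0]);

// Return the edges whose two endpoints both exceed the confidence threshold,
// so that limbs with an occluded joint are not drawn.
static std::vector<std::pair<int, int> > visible_edges(const std::vector<float>& kp_prob,
                                                       float threshold)
{
    std::vector<std::pair<int, int> > edges;
    if ((int)kp_prob.size() != kNumKeypoints) return edges;

    for (int i = 0; i < kNumEdges; i++)
    {
        const int a = kSkeleton[i][0];
        const int b = kSkeleton[i][1];
        if (kp_prob[a] > threshold && kp_prob[b] > threshold)
            edges.push_back(std::make_pair(a, b));
    }
    return edges;
}
```

Each returned pair can then be drawn as a line between the two keypoint positions, for example with `cv::line` on `Object::keypoints[a].p` and `Object::keypoints[b].p`.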
Model Conversion
Models are converted from Ultralytics format using PNNX. The conversion requires modifying reshape operations for dynamic shapes and re-exporting with dual input shapes (640x640 and 320x320).