Implementation:Tencent Ncnn Extractor Vulkan Compute
| Knowledge Sources | |
|---|---|
| Domains | GPU_Computing, Inference |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for executing Vulkan GPU-accelerated inference using the ncnn Extractor API with automatic CPU↔GPU data management.
Description
When use_vulkan_compute is enabled, the standard Extractor::input and Extractor::extract methods automatically manage CPU↔GPU data transfers. The implementation records upload/download operations and compute shader dispatches into Vulkan command buffers (VkCompute), then submits them to the GPU queue.
For advanced usage, the VkMat overloads of input and extract support zero-copy GPU pipelines in which tensors remain in GPU memory. A single VkCompute object can chain multiple inference operations without intermediate CPU round-trips.
The Vulkan execution path is implemented in net.cpp (L2596-2679) alongside the standard CPU path. The runtime automatically dispatches to GPU layers when available, falling back to CPU for unsupported operations.
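The cost of the CPU fallback can be pictured with a toy model. This is a minimal sketch and not ncnn's implementation: `ToyLayer`, `TransferPlan`, `plan_transfers`, and the `supports_vulkan` flag are hypothetical names introduced only for illustration. It assumes each layer runs on the GPU when it can and on the CPU otherwise, and counts the host-device copies needed along the way.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a network layer; not an ncnn type.
struct ToyLayer
{
    std::string name;
    bool supports_vulkan; // assumed flag: true if a GPU implementation exists
};

struct TransferPlan
{
    int uploads = 0;   // CPU -> GPU copies
    int downloads = 0; // GPU -> CPU copies
};

// Walk the layers in order, tracking where the current tensor lives.
// The input starts on the CPU and the final result must end on the CPU.
TransferPlan plan_transfers(const std::vector<ToyLayer>& layers)
{
    TransferPlan plan;
    bool on_gpu = false;
    for (const ToyLayer& layer : layers)
    {
        if (layer.supports_vulkan && !on_gpu)
        {
            plan.uploads++; // move the tensor to the GPU for this layer
            on_gpu = true;
        }
        else if (!layer.supports_vulkan && on_gpu)
        {
            plan.downloads++; // fall back to the CPU for this layer
            on_gpu = false;
        }
    }
    if (on_gpu)
        plan.downloads++; // bring the result back for the caller
    return plan;
}
```

In this model, a sequence of two GPU layers, one CPU-only layer, and one more GPU layer needs two uploads and two downloads, while an all-GPU sequence needs one of each. The point is only that every unsupported layer in the middle of a network adds a round-trip.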
Usage
Use the standard Mat-based API for simple GPU inference (automatic transfers). Use the VkMat-based API for performance-critical pipelines requiring zero-copy GPU execution.
Code Reference
Source Location
- Repository: ncnn
- File: src/net.h (Extractor Vulkan overloads), src/net.cpp (implementation), src/command.h (VkCompute)
- Lines: net.h:L222-240 (VkMat input/extract overloads), net.cpp:L2368-2596 (CPU input/extract), net.cpp:L2596-2679 (Vulkan VkMat overloads), command.h:L22-86 (VkCompute class)
Signature
class Extractor
{
public:
    // Standard API: automatic CPU↔GPU transfer when Vulkan is enabled
    int input(const char* blob_name, const Mat& in);
    int extract(const char* blob_name, Mat& feat, int type = 0);

    // Vulkan zero-copy API (NCNN_VULKAN only)
    int input(const char* blob_name, const VkMat& in);
    int extract(const char* blob_name, VkMat& feat, VkCompute& cmd);

    // Vulkan allocator configuration
    void set_blob_vkallocator(VkAllocator* allocator);
    void set_workspace_vkallocator(VkAllocator* allocator);
    void set_staging_vkallocator(VkAllocator* allocator);
};

// Command buffer for GPU operations
class VkCompute
{
public:
    explicit VkCompute(const VulkanDevice* vkdev);
    void record_upload(const Mat& src, VkMat& dst, const Option& opt);
    void record_download(const VkMat& src, Mat& dst, const Option& opt);
    void record_pipeline(const Pipeline* pipeline, ...);
    int submit_and_wait();
    int reset();
};
Import
#include "net.h"
#include "gpu.h"
#include "command.h" // for VkCompute (zero-copy API only)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| blob_name | const char* | Yes | Named blob from .param file |
| in (Mat) | const ncnn::Mat& | Yes (standard) | CPU tensor — auto-uploaded to GPU |
| in (VkMat) | const ncnn::VkMat& | Yes (zero-copy) | GPU tensor — stays on GPU |
Outputs
| Name | Type | Description |
|---|---|---|
| feat (Mat) | ncnn::Mat& | CPU tensor — auto-downloaded from GPU |
| feat (VkMat) | ncnn::VkMat& | GPU tensor — stays on GPU for chaining |
| return value | int | 0 on success, nonzero on failure |
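Because every call in the contract reports its status through an int, it can help to funnel the checks through one place. The helper below is a hypothetical sketch, not part of ncnn; it assumes that any nonzero return value signals failure and turns it into an exception that names the failed step.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not an ncnn API.
// Assumption: 0 means success and any other value means failure.
inline void check_status(int ret, const std::string& step)
{
    if (ret != 0)
        throw std::runtime_error(step + " failed with code " + std::to_string(ret));
}
```

A call site would then read `check_status(ex.input("data", in), "input");` followed by `check_status(ex.extract("prob", out), "extract");`, so a failed step is not silently ignored.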
Usage Examples
Standard GPU Inference (Transparent)
#include "net.h"
#include "gpu.h"
// Assumes `bgr` is an 8-bit BGR image such as a cv::Mat, loaded by the caller.
ncnn::create_gpu_instance();
{
    ncnn::Net net;
    net.opt.use_vulkan_compute = true; // enable before loading the model
    net.set_vulkan_device(0);
    net.load_param("model.param"); // returns 0 on success
    net.load_model("model.bin");   // returns 0 on success
    // Same API as CPU: transfers happen automatically
    ncnn::Mat in = ncnn::Mat::from_pixels_resize(
        bgr.data, ncnn::Mat::PIXEL_BGR, bgr.cols, bgr.rows, 224, 224);
    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);    // auto-uploaded to GPU
    ncnn::Mat out;
    ex.extract("prob", out); // auto-downloaded from GPU
} // net and ex are destroyed here, while the GPU instance still exists
ncnn::destroy_gpu_instance();
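The paired `create_gpu_instance` / `destroy_gpu_instance` calls above can be tied to a scope so the teardown is not forgotten and runs last. The class below is a hypothetical sketch, not an ncnn type. It relies on the C++ rule that local objects are destroyed in reverse order of declaration, and it assumes that GPU-backed objects should be gone before the runtime is torn down.

```cpp
#include <functional>
#include <utility>

// Hypothetical RAII wrapper: runs `init` on construction and `fini` on destruction.
class ScopedRuntime
{
public:
    ScopedRuntime(const std::function<void()>& init, std::function<void()> fini)
        : fini_(std::move(fini))
    {
        init();
    }

    ~ScopedRuntime()
    {
        fini_(); // runs after every object declared later in the same scope is destroyed
    }

    // Non-copyable so the teardown runs exactly once.
    ScopedRuntime(const ScopedRuntime&) = delete;
    ScopedRuntime& operator=(const ScopedRuntime&) = delete;

private:
    std::function<void()> fini_;
};
```

With ncnn this might be used as `ScopedRuntime gpu([] { ncnn::create_gpu_instance(); }, [] { ncnn::destroy_gpu_instance(); });`, declared before any `ncnn::Net` in the same scope. Lambdas are used because the default arguments of a function do not carry over when it is passed by pointer.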
Zero-Copy VkMat Pipeline
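#include "net.h"
#include "gpu.h"
#include "command.h"
// Assumes `net` is loaded with use_vulkan_compute enabled (see the previous example)
// and `cpu_in` is an ncnn::Mat holding the input tensor.
const ncnn::VulkanDevice* vkdev = ncnn::get_gpu_device(0);
// Manual VkMat transfers need Vulkan allocators set in the Option.
ncnn::VkAllocator* blob_vkallocator = vkdev->acquire_blob_allocator();
ncnn::VkAllocator* staging_vkallocator = vkdev->acquire_staging_allocator();
ncnn::Option opt = net.opt;
opt.blob_vkallocator = blob_vkallocator;
opt.workspace_vkallocator = blob_vkallocator;
opt.staging_vkallocator = staging_vkallocator;
{
    ncnn::VkCompute cmd(vkdev);
    // Upload to GPU once
    ncnn::VkMat vk_in;
    cmd.record_upload(cpu_in, vk_in, opt);
    cmd.submit_and_wait();
    cmd.reset();
    // Run inference entirely on GPU, sharing the same allocators
    ncnn::Extractor ex = net.create_extractor();
    ex.set_blob_vkallocator(blob_vkallocator);
    ex.set_workspace_vkallocator(blob_vkallocator);
    ex.set_staging_vkallocator(staging_vkallocator);
    ex.input("data", vk_in);
    ncnn::VkMat vk_out;
    ex.extract("prob", vk_out, cmd);
    cmd.submit_and_wait();
    cmd.reset(); // reopen the command buffer before recording more work
    // Download only when needed
    ncnn::Mat cpu_out;
    cmd.record_download(vk_out, cpu_out, opt);
    cmd.submit_and_wait();
}
// Return the allocators once every VkMat that used them has been released.
vkdev->reclaim_blob_allocator(blob_vkallocator);
vkdev->reclaim_staging_allocator(staging_vkallocator);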
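The example above calls `cmd.reset()` after a submission before recording further work. The toy class below sketches why that step matters. It is not ncnn's VkCompute: `ToyCommandBuffer` and its members are hypothetical, and the model assumes, as with Vulkan command buffers in general, that a submitted buffer must be reset before it can record again.

```cpp
#include <string>
#include <vector>

// Toy model of a record -> submit -> reset cycle. Not ncnn's VkCompute.
class ToyCommandBuffer
{
public:
    // Queue an operation; only allowed while the buffer is recording.
    // Returns 0 on success, -1 if the buffer was submitted and not yet reset.
    int record(const std::string& op)
    {
        if (!recording_)
            return -1;
        pending_.push_back(op);
        return 0;
    }

    // "Execute" everything recorded so far and close the buffer.
    int submit_and_wait()
    {
        if (!recording_)
            return -1;
        executed_.insert(executed_.end(), pending_.begin(), pending_.end());
        pending_.clear();
        recording_ = false;
        return 0;
    }

    // Reopen the buffer for a new batch of operations.
    int reset()
    {
        pending_.clear();
        recording_ = true;
        return 0;
    }

    const std::vector<std::string>& executed() const { return executed_; }

private:
    bool recording_ = true;
    std::vector<std::string> pending_;
    std::vector<std::string> executed_;
};
```

In this model, recording an operation right after `submit_and_wait()` fails until `reset()` is called. Batching several operations before one submission, for instance the forward pass and the download, avoids an extra submit and reset pair.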