
Implementation:Tencent Ncnn Extractor Vulkan Compute

From Leeroopedia


Knowledge Sources
Domains GPU_Computing, Inference
Last Updated 2026-02-09 00:00 GMT

Overview

Concrete tool for executing Vulkan GPU-accelerated inference using the ncnn Extractor API with automatic CPU↔GPU data management.

Description

When use_vulkan_compute is enabled, the standard Extractor::input and Extractor::extract methods automatically manage CPU↔GPU data transfers. The implementation records upload/download operations and compute shader dispatches into Vulkan command buffers (VkCompute), then submits them to the GPU queue.

For advanced usage, VkMat overloads of input/extract allow zero-copy GPU pipelines where tensors remain on GPU memory. The VkCompute object can be used to chain multiple inference operations without intermediate CPU round-trips.

The Vulkan execution path is implemented in net.cpp (L2596-2679) alongside the standard CPU path. The runtime automatically dispatches to GPU layers when available, falling back to CPU for unsupported operations.

Usage

Use the standard Mat-based API for simple GPU inference (transfers are handled automatically). Use the VkMat-based API for performance-critical pipelines that require zero-copy GPU execution.

Code Reference

Source Location

  • Repository: ncnn
  • File: src/net.h (Extractor Vulkan overloads), src/net.cpp (implementation), src/command.h (VkCompute)
  • Lines: net.h:L222-240 (VkMat input/extract overloads), net.cpp:L2368-2596 (CPU input/extract), net.cpp:L2596-2679 (Vulkan VkMat overloads), command.h:L22-86 (VkCompute class)

Signature

class Extractor
{
public:
    // Standard API — auto CPU↔GPU transfer when Vulkan enabled
    int input(const char* blob_name, const Mat& in);
    int extract(const char* blob_name, Mat& feat, int type = 0);

    // Vulkan zero-copy API (NCNN_VULKAN only)
    int input(const char* blob_name, const VkMat& in);
    int extract(const char* blob_name, VkMat& feat, VkCompute& cmd);

    // Vulkan allocator configuration
    void set_blob_vkallocator(VkAllocator* allocator);
    void set_workspace_vkallocator(VkAllocator* allocator);
    void set_staging_vkallocator(VkAllocator* allocator);
};

// Command buffer for GPU operations
class VkCompute
{
public:
    explicit VkCompute(const VulkanDevice* vkdev);

    void record_upload(const Mat& src, VkMat& dst, const Option& opt);
    void record_download(const VkMat& src, Mat& dst, const Option& opt);
    void record_pipeline(const Pipeline* pipeline, ...);

    int submit_and_wait();
    int reset();
};
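The allocator setters above can be fed from the pooled allocators that ncnn's VulkanDevice provides. A minimal sketch, assuming a loaded Vulkan-enabled `net` (setup omitted) and the `acquire_*`/`reclaim_*` allocator pool helpers on VulkanDevice:

```cpp
const ncnn::VulkanDevice* vkdev = ncnn::get_gpu_device(0);

// Acquire pooled allocators from the device
ncnn::VkAllocator* blob_alloc = vkdev->acquire_blob_allocator();
ncnn::VkAllocator* staging_alloc = vkdev->acquire_staging_allocator();

{
    ncnn::Extractor ex = net.create_extractor();
    ex.set_blob_vkallocator(blob_alloc);
    ex.set_workspace_vkallocator(blob_alloc);
    ex.set_staging_vkallocator(staging_alloc);

    // ... run input()/extract() as usual ...
}

// Return the allocators to the device pool when done
vkdev->reclaim_blob_allocator(blob_alloc);
vkdev->reclaim_staging_allocator(staging_alloc);
```

Reclaiming after the Extractor is destroyed lets subsequent extractors reuse the same GPU memory pools instead of reallocating per inference.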

Import

#include "net.h"
#include "gpu.h"
#include "command.h"  // for VkCompute (zero-copy API only)

I/O Contract

Inputs

Name        Type                 Required         Description
blob_name   const char*          Yes              Named blob from the .param file
in (Mat)    const ncnn::Mat&     Yes (standard)   CPU tensor — auto-uploaded to GPU
in (VkMat)  const ncnn::VkMat&   Yes (zero-copy)  GPU tensor — stays on GPU

Outputs

Name          Type           Description
feat (Mat)    ncnn::Mat&     CPU tensor — auto-downloaded from GPU
feat (VkMat)  ncnn::VkMat&   GPU tensor — stays on GPU for chaining
return value  int            0 on success
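The int return values above are worth checking in practice. A short sketch, assuming a loaded `net` and a prepared `in` tensor (the blob names are illustrative):

```cpp
ncnn::Extractor ex = net.create_extractor();

// input() returns non-zero if the blob name is not found in the .param file
if (ex.input("data", in) != 0)
    fprintf(stderr, "unknown input blob\n");

ncnn::Mat out;
// extract() returns non-zero if the forward pass fails
if (ex.extract("prob", out) != 0)
    fprintf(stderr, "extract failed\n");
```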

Usage Examples

Standard GPU Inference (Transparent)

#include "net.h"
#include "gpu.h"

ncnn::create_gpu_instance();

ncnn::Net net;
net.opt.use_vulkan_compute = true;
net.set_vulkan_device(0);
net.load_param("model.param");
net.load_model("model.bin");

// Same API as CPU — transfers happen automatically
// (`bgr` is a BGR image buffer, e.g. a cv::Mat from OpenCV)
ncnn::Mat in = ncnn::Mat::from_pixels_resize(
    bgr.data, ncnn::Mat::PIXEL_BGR, bgr.cols, bgr.rows, 224, 224);

ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);   // auto-uploaded to GPU

ncnn::Mat out;
ex.extract("prob", out); // auto-downloaded from GPU

ncnn::destroy_gpu_instance();

Zero-Copy VkMat Pipeline

#include "net.h"
#include "gpu.h"
#include "command.h"

// Assumes `net` was loaded with opt.use_vulkan_compute = true (see above)
// and `cpu_in` is a prepared ncnn::Mat
const ncnn::VulkanDevice* vkdev = ncnn::get_gpu_device(0);
ncnn::VkCompute cmd(vkdev);

// Upload to GPU once
ncnn::VkMat vk_in;
cmd.record_upload(cpu_in, vk_in, net.opt);
cmd.submit_and_wait();
cmd.reset(); // the command buffer must be reset before recording again

// Run inference entirely on GPU
ncnn::Extractor ex = net.create_extractor();
ex.input("data", vk_in);

ncnn::VkMat vk_out;
ex.extract("prob", vk_out, cmd); // records GPU work into cmd

// Record the download into the same command buffer, then submit once
ncnn::Mat cpu_out;
cmd.record_download(vk_out, cpu_out, net.opt);
cmd.submit_and_wait();
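As the Description notes, a single VkCompute can chain multiple inference operations without intermediate CPU round-trips. A sketch under the assumption of two loaded Vulkan-enabled nets `net1` and `net2` whose blob layouts are compatible (all names here are illustrative):

```cpp
ncnn::VkCompute cmd(vkdev);

// Upload the input once
ncnn::VkMat vk_in;
cmd.record_upload(cpu_in, vk_in, net1.opt);

// First net: output stays on the GPU
ncnn::Extractor ex1 = net1.create_extractor();
ex1.input("data", vk_in);
ncnn::VkMat vk_feat;
ex1.extract("feat", vk_feat, cmd);

// Second net consumes the GPU tensor directly, no CPU round-trip
ncnn::Extractor ex2 = net2.create_extractor();
ex2.input("data", vk_feat);
ncnn::VkMat vk_out;
ex2.extract("prob", vk_out, cmd);

// Single download and submit at the end of the pipeline
ncnn::Mat cpu_out;
cmd.record_download(vk_out, cpu_out, net2.opt);
cmd.submit_and_wait();
```

Deferring the one submit_and_wait to the end lets the driver execute both networks and the transfers as one batched GPU workload.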

Related Pages

Implements Principle

Uses Heuristic

Requires Environment
