Principle: Tencent ncnn Vulkan GPU Inference
| Knowledge Sources | |
|---|---|
| Domains | GPU_Computing, Inference |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Executing neural network inference on the GPU via Vulkan, using the same API as CPU inference, with automatic CPU↔GPU data transfer and compute shader dispatch.
Description
Vulkan GPU inference in ncnn uses the same Extractor::input / Extractor::extract API as CPU inference. When opt.use_vulkan_compute is enabled, ncnn automatically handles: (1) uploading input Mat data from CPU to GPU memory, (2) dispatching Vulkan compute shaders for each layer in the network, and (3) downloading output data back to CPU Mat tensors.
For advanced zero-copy GPU pipelines, VkMat variants of input and extract allow keeping data on the GPU between operations, avoiding costly CPU↔GPU transfers. The VkCompute class manages command buffer recording and submission for chained GPU operations.
Layers without a Vulkan implementation transparently fall back to CPU execution without user intervention; ncnn inserts the required download and upload steps around such layers automatically.
Usage
Use the standard Extractor API after configuring Vulkan options. For basic usage, no code changes are needed beyond setting opt.use_vulkan_compute = true before loading the model (the flag must be set before load_param / load_model so the Vulkan pipelines are created). For performance-critical pipelines, use the VkMat variants to keep data on the GPU.
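The basic usage above can be sketched as follows. This is a minimal sketch against the public ncnn API; the model file names and the blob names "data" and "prob" are placeholders for whatever your network actually uses.

```cpp
#include "net.h"  // ncnn::Net, ncnn::Mat, ncnn::Extractor
#include "gpu.h"  // ncnn::create_gpu_instance / destroy_gpu_instance

int main()
{
    // Initialize the Vulkan instance before any GPU work.
    ncnn::create_gpu_instance();

    ncnn::Net net;
    net.opt.use_vulkan_compute = true;  // must be set BEFORE loading the model

    // Placeholder model files; substitute your own .param/.bin pair.
    net.load_param("model.param");
    net.load_model("model.bin");

    ncnn::Mat in(224, 224, 3);  // CPU-side input tensor
    in.fill(0.5f);

    ncnn::Mat out;
    {
        // Same API as CPU inference: upload, GPU dispatch, and
        // download all happen automatically inside input()/extract().
        ncnn::Extractor ex = net.create_extractor();
        ex.input("data", in);
        ex.extract("prob", out);
    }

    ncnn::destroy_gpu_instance();
    return 0;
}
```

Note that the only Vulkan-specific lines are the instance setup and the opt.use_vulkan_compute flag; the Extractor calls are identical to the CPU path.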
Theoretical Basis
GPU inference execution flow:

    Extractor::input(name, cpu_mat)
        → Upload: CPU Mat → GPU VkMat (via staging allocator)
    For each layer in topological order:
        If layer has a Vulkan implementation:
            → Record compute shader dispatch to command buffer
            → Execute on GPU
        Else:
            → Download to CPU, execute on CPU, upload back to GPU
    Extractor::extract(name, cpu_mat)
        → Download: GPU VkMat → CPU Mat
        → Return result
VkMat zero-copy pipeline:

    Extractor::input(name, vk_mat)         // already on GPU: no upload
        → Execute all layers on GPU
    Extractor::extract(name, vk_mat, cmd)  // result stays on GPU: no download
        → Chain into next GPU operation