Principle: Tencent ncnn Vulkan GPU Inference
| Knowledge Sources | |
|---|---|
| Domains | GPU_Computing, Inference |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Executing neural network inference on the GPU via Vulkan, using the same API as CPU inference, with automatic CPU↔GPU data transfer and compute shader dispatch.
Description
Vulkan GPU inference in ncnn uses the same Extractor::input / Extractor::extract API as CPU inference. When opt.use_vulkan_compute is enabled, ncnn automatically handles: (1) uploading input Mat data from CPU to GPU memory, (2) dispatching Vulkan compute shaders for each layer in the network, and (3) downloading output data back to CPU Mat tensors.
For advanced zero-copy GPU pipelines, VkMat variants of input and extract allow keeping data on the GPU between operations, avoiding costly CPU↔GPU transfers. The VkCompute class manages command buffer recording and submission for chained GPU operations.
Layers without a Vulkan implementation transparently fall back to CPU execution without user intervention; ncnn inserts the required download and upload steps around such layers automatically.
Usage
Use the standard Extractor API after configuring Vulkan options. For basic usage, no code changes are needed beyond setting opt.use_vulkan_compute = true before loading the model (the flag must be set before load_param / load_model so the Vulkan pipelines are created). For performance-critical pipelines, use the VkMat variants to keep data on the GPU.
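The basic usage above can be sketched as follows. This is a minimal sketch against the public ncnn API; the model file names and the blob names "data" and "prob" are placeholders for whatever your network actually uses.

```cpp
#include "net.h"  // ncnn::Net, ncnn::Mat, ncnn::Extractor
#include "gpu.h"  // ncnn::create_gpu_instance / destroy_gpu_instance

int main()
{
    // Initialize the Vulkan instance before any GPU work.
    ncnn::create_gpu_instance();

    ncnn::Net net;
    net.opt.use_vulkan_compute = true;  // must be set BEFORE loading the model

    // Placeholder model files; substitute your own .param/.bin pair.
    net.load_param("model.param");
    net.load_model("model.bin");

    ncnn::Mat in(224, 224, 3);  // CPU-side input tensor
    in.fill(0.5f);

    ncnn::Mat out;
    {
        // Same API as CPU inference: upload, GPU dispatch, and
        // download all happen automatically inside input()/extract().
        ncnn::Extractor ex = net.create_extractor();
        ex.input("data", in);
        ex.extract("prob", out);
    }

    ncnn::destroy_gpu_instance();
    return 0;
}
```

Note that the only Vulkan-specific lines are the instance setup and the opt.use_vulkan_compute flag; the Extractor calls are identical to the CPU path.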
Theoretical Basis
GPU inference execution flow:

    Extractor::input(name, cpu_mat)
        → Upload: CPU Mat → GPU VkMat (via staging allocator)
    For each layer in topological order:
        If layer has a Vulkan implementation:
            → Record compute shader dispatch to command buffer
            → Execute on GPU
        Else:
            → Download to CPU, execute on CPU, upload back to GPU
    Extractor::extract(name, cpu_mat)
        → Download: GPU VkMat → CPU Mat
        → Return result
VkMat zero-copy pipeline:

    Extractor::input(name, vk_mat)         // already on GPU: no upload
        → Execute all layers on GPU
    Extractor::extract(name, vk_mat, cmd)  // result stays on GPU: no download
        → Chain into next GPU operation