
Principle:Tencent Ncnn Vulkan GPU Inference

From Leeroopedia


Knowledge Sources
Domains GPU_Computing, Inference
Last Updated 2026-02-09 00:00 GMT

Overview

The process of executing neural network inference on a GPU through Vulkan, using the same API as CPU inference, with automatic data transfer and compute-shader dispatch.

Description

Vulkan GPU inference in ncnn uses the same Extractor::input / Extractor::extract API as CPU inference. When opt.use_vulkan_compute is enabled, ncnn automatically handles: (1) uploading input Mat data from CPU to GPU memory, (2) dispatching Vulkan compute shaders for each layer in the network, and (3) downloading output data back to CPU Mat tensors.

For advanced zero-copy GPU pipelines, VkMat variants of input and extract allow keeping data on the GPU between operations, avoiding costly CPU↔GPU transfers. The VkCompute class manages command buffer recording and submission for chained GPU operations.

The transparent CPU↔GPU fallback ensures that layers without Vulkan implementations gracefully fall back to CPU execution without user intervention.

Usage

Use the standard Extractor API after configuring Vulkan options. For basic usage, no code changes are needed beyond setting opt.use_vulkan_compute = true. For performance-critical pipelines, use VkMat variants to keep data on GPU.
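A minimal sketch of the basic usage described above. The model file names ("model.param", "model.bin") and blob names ("data", "prob") are placeholders for illustration; substitute the names from your own network.

```cpp
// Basic Vulkan GPU inference with ncnn's standard Extractor API.
// Assumes ncnn was built with NCNN_VULKAN enabled.
#include "net.h"

int main()
{
    ncnn::Net net;
    net.opt.use_vulkan_compute = true; // the only change needed vs. CPU inference

    if (net.load_param("model.param") || net.load_model("model.bin"))
        return -1;

    // CPU-side input tensor; ncnn uploads it to GPU memory automatically.
    ncnn::Mat in(224, 224, 3);
    in.fill(0.5f);

    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in);    // CPU Mat -> GPU VkMat upload happens here

    ncnn::Mat out;
    ex.extract("prob", out); // GPU compute, then download back to a CPU Mat

    return 0;
}
```

Apart from setting `opt.use_vulkan_compute`, the code is identical to the CPU path; the upload, shader dispatch, and download steps are all handled inside `input` and `extract`.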

Theoretical Basis

GPU inference execution flow:

Extractor::input(name, cpu_mat)
    → Upload: CPU Mat → GPU VkMat (via staging allocator)

For each layer in topological order:
    If layer has Vulkan implementation:
        → Record compute shader dispatch to command buffer
        → Execute on GPU
    Else:
        → Download to CPU, execute on CPU, upload back to GPU

Extractor::extract(name, cpu_mat)
    → Download: GPU VkMat → CPU Mat
    → Return result

VkMat zero-copy pipeline:

Extractor::input(name, vk_mat)     // Already on GPU — no upload
    → Execute all layers on GPU
Extractor::extract(name, vk_mat, cmd)  // Stay on GPU — no download
    → Chain into next GPU operation
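The zero-copy flow above can be sketched in C++ as follows. The allocator setup, blob names, and helper function are assumptions for illustration, not a definitive recipe; check gpu.h and command.h in your ncnn build for the exact signatures.

```cpp
// Zero-copy GPU pipeline using the VkMat overloads of Extractor::input /
// Extractor::extract. Intermediate data stays on the GPU; there is a single
// upload at the start and a single download at the end.
#include "net.h"
#include "gpu.h"
#include "command.h"

void run_gpu_pipeline(ncnn::Net& net, const ncnn::Mat& in_cpu, ncnn::Mat& out_cpu)
{
    const ncnn::VulkanDevice* vkdev = ncnn::get_gpu_device();
    ncnn::VkCompute cmd(vkdev); // records and submits chained GPU commands

    ncnn::VkAllocator* blob_alloc = vkdev->acquire_blob_allocator();
    ncnn::VkAllocator* staging_alloc = vkdev->acquire_staging_allocator();

    ncnn::Option opt = net.opt;
    opt.blob_vkallocator = blob_alloc;
    opt.workspace_vkallocator = blob_alloc;
    opt.staging_vkallocator = staging_alloc;

    // Upload once; all intermediate results then remain in GPU memory.
    ncnn::VkMat in_gpu;
    cmd.record_upload(in_cpu, in_gpu, opt);

    ncnn::Extractor ex = net.create_extractor();
    ex.input("data", in_gpu);         // already on GPU, no upload

    ncnn::VkMat out_gpu;
    ex.extract("prob", out_gpu, cmd); // result stays on GPU, no download

    // Further GPU operations could be chained on `cmd` here.
    cmd.record_download(out_gpu, out_cpu, opt);
    cmd.submit_and_wait();

    vkdev->reclaim_blob_allocator(blob_alloc);
    vkdev->reclaim_staging_allocator(staging_alloc);
}
```

The key difference from the basic path is that `extract` takes the `VkCompute` command object, so the output `VkMat` can feed directly into the next recorded GPU operation instead of round-tripping through the CPU.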

Related Pages

Implemented By
