Implementation: Tencent ncnn Pipeline Cache and Memory
| Knowledge Sources | |
|---|---|
| Domains | GPU_Computing, Performance_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool, provided by the ncnn library, for caching compiled Vulkan compute pipelines and managing pooled GPU memory for optimized inference.
Description
The PipelineCache class caches compiled Vulkan compute pipelines using MurmurHash3-based keys. It stores shader modules, descriptor set layouts, pipeline layouts, and pipeline objects, avoiding expensive recompilation. The cache is populated on first use and subsequent requests for the same pipeline configuration return the cached version.
VkBlobAllocator manages GPU memory in pooled blocks (16 MB by default), sub-allocating individual tensors from within each block. VkCompute manages Vulkan command buffer recording, with operations such as record_upload, record_download, and record_pipeline, plus submit_and_wait for batched GPU execution.
Usage
Pipeline caching is automatic via net.opt.pipeline_cache. Custom memory allocators can be configured for fine-grained control over GPU memory pooling. Use VkCompute::submit_and_wait() to synchronize GPU operations when needed.
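The record-then-submit pattern behind VkCompute can be sketched as a deferred operation queue. This is a conceptual stand-in, not ncnn code: record_* calls only enqueue work, and nothing executes until submit_and_wait() is called.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Minimal sketch (not ncnn's implementation) of VkCompute's pattern:
// recording is cheap and deferred; submit_and_wait() runs the batch.
class VkComputeSketch
{
public:
    // Stands in for record_upload / record_pipeline / record_download.
    void record(std::function<void()> op) { recorded.push_back(std::move(op)); }

    // Plays back every recorded operation in order, then clears the queue,
    // mirroring batched GPU execution followed by a blocking wait.
    int submit_and_wait()
    {
        for (auto& op : recorded) op();
        recorded.clear();
        return 0;
    }

    size_t pending() const { return recorded.size(); }

private:
    std::vector<std::function<void()>> recorded;
};
```

Batching matters because each submission has fixed driver overhead; recording many dispatches and submitting once amortizes that cost.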
Code Reference
Source Location
- Repository: ncnn
- File: src/pipelinecache.h, src/pipelinecache.cpp, src/command.h, src/command.cpp
- Lines: pipelinecache.h:L18-65 (PipelineCache class), pipelinecache.cpp:L1-498 (implementation with MurmurHash3), command.h:L22-86 (VkCompute class), command.h:L67 (submit_and_wait)
Signature
namespace ncnn {

class PipelineCache
{
public:
    explicit PipelineCache(const VulkanDevice* _vkdev);
    virtual ~PipelineCache();

    void clear();

    // Get or create a cached pipeline from SPIR-V data
    int get_pipeline(const uint32_t* spv_data, size_t spv_data_size,
                     const std::vector<vk_specialization_type>& specializations,
                     uint32_t local_size_x, uint32_t local_size_y, uint32_t local_size_z,
                     uint32_t subgroup_size,
                     VkShaderModule* shader_module,
                     VkDescriptorSetLayout* descriptorset_layout,
                     VkPipelineLayout* pipeline_layout,
                     VkPipeline* pipeline,
                     VkDescriptorUpdateTemplateKHR* descriptor_update_template,
                     ShaderInfo& shader_info) const;

    // Get or create from built-in shader index
    int get_pipeline(int shader_type_index, const Option& opt,
                     const std::vector<vk_specialization_type>& specializations,
                     uint32_t local_size_x, uint32_t local_size_y, uint32_t local_size_z,
                     uint32_t subgroup_size,
                     VkShaderModule* shader_module,
                     VkDescriptorSetLayout* descriptorset_layout,
                     VkPipelineLayout* pipeline_layout,
                     VkPipeline* pipeline,
                     VkDescriptorUpdateTemplateKHR* descriptor_update_template,
                     ShaderInfo& shader_info) const;
};

class VkCompute
{
public:
    explicit VkCompute(const VulkanDevice* vkdev);

    void record_upload(const Mat& src, VkMat& dst, const Option& opt);
    void record_download(const VkMat& src, Mat& dst, const Option& opt);
    void record_pipeline(const Pipeline* pipeline, ...);

    int submit_and_wait();
    int reset();
};

} // namespace ncnn
Import
#include "pipelinecache.h"
#include "command.h"
#include "gpu.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| vkdev | const VulkanDevice* | Yes | Vulkan device for pipeline creation |
| spv_data | const uint32_t* | Yes | SPIR-V shader bytecode |
| specializations | std::vector<vk_specialization_type> | Yes | Shader specialization constants |
| local_size_x/y/z | uint32_t | Yes | Workgroup dimensions |
Outputs
| Name | Type | Description |
|---|---|---|
| pipeline | VkPipeline* | Compiled (or cached) Vulkan compute pipeline |
| shader_module | VkShaderModule* | Compiled shader module |
| descriptor_update_template | VkDescriptorUpdateTemplateKHR* | Template for efficient descriptor updates |
Usage Examples
Automatic Pipeline Caching (Default Usage)
// Pipeline caching is automatic when using Net
ncnn::Net net;
net.opt.use_vulkan_compute = true;
net.set_vulkan_device(0);
// First load_param triggers pipeline compilation (slow)
net.load_param("model.param");
net.load_model("model.bin");
// First inference: pipelines compiled and cached
ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);
ex.extract("prob", out); // Compiles + caches pipelines
// Subsequent inferences: pipelines served from cache (fast)
ncnn::Extractor ex2 = net.create_extractor();
ex2.input("data", in2);
ex2.extract("prob", out2); // Uses cached pipelines
Memory Cleanup
// For long-running applications, periodically clear allocators
ncnn::VkBlobAllocator* blob_alloc =
(ncnn::VkBlobAllocator*)net.opt.blob_vkallocator;
if (blob_alloc)
blob_alloc->clear(); // Release all pooled GPU memory
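The pooled-block scheme that clear() tears down can be sketched as follows. This is a hypothetical illustration of the VkBlobAllocator idea, not its actual code: small tensor allocations are carved out of large pooled blocks, and clear() drops every block at once. The class name, alignment, and bookkeeping here are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of pooled sub-allocation (not ncnn's VkBlobAllocator).
class BlockPoolSketch
{
public:
    explicit BlockPoolSketch(size_t block_size = 16 * 1024 * 1024)
        : block_size(block_size) {}

    // Sub-allocate `size` bytes (rounded up to 16-byte alignment) from the
    // newest block, opening a fresh block when the request does not fit.
    size_t allocate(size_t size)
    {
        size = (size + 15) & ~size_t(15);
        if (blocks.empty() || blocks.back() + size > block_size)
        {
            blocks.push_back(0); // a real allocator calls vkAllocateMemory here
        }
        size_t offset = blocks.back(); // offset of this tensor within the block
        blocks.back() += size;
        return offset;
    }

    // Release all pooled blocks at once, like VkBlobAllocator::clear().
    void clear() { blocks.clear(); }

    size_t block_count() const { return blocks.size(); }

private:
    size_t block_size;
    std::vector<size_t> blocks; // bytes used per pooled block
};
```

Pooling trades a little internal fragmentation for far fewer driver-level allocations, which is why periodic clear() (rather than per-tensor frees) is the cleanup model.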