Implementation: Tencent ncnn Pipeline Cache and Memory
| Knowledge Sources | |
|---|---|
| Domains | GPU_Computing, Performance_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool, provided by the ncnn library, for caching compiled Vulkan compute pipelines and managing pooled GPU memory for optimized inference.
Description
The PipelineCache class caches compiled Vulkan compute pipelines using MurmurHash3-based keys. It stores shader modules, descriptor set layouts, pipeline layouts, and pipeline objects, avoiding expensive recompilation. The cache is populated on first use and subsequent requests for the same pipeline configuration return the cached version.
VkBlobAllocator manages GPU memory in pooled blocks (16 MB by default), sub-allocating individual tensors from within each block. VkCompute manages Vulkan command buffer recording, with operations such as record_upload, record_download, and record_pipeline, plus submit_and_wait for batched GPU execution.
Usage
Pipeline caching is automatic via net.opt.pipeline_cache. Custom memory allocators can be configured for fine-grained control over GPU memory pooling. Use VkCompute::submit_and_wait() to synchronize GPU operations when needed.
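The record-then-submit pattern behind VkCompute can be sketched as a deferred operation queue. This is a conceptual stand-in, not ncnn code: record_* calls only enqueue work, and nothing executes until submit_and_wait() is called.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Minimal sketch (not ncnn's implementation) of VkCompute's pattern:
// recording is cheap and deferred; submit_and_wait() runs the batch.
class VkComputeSketch
{
public:
    // Stands in for record_upload / record_pipeline / record_download.
    void record(std::function<void()> op) { recorded.push_back(std::move(op)); }

    // Plays back every recorded operation in order, then clears the queue,
    // mirroring batched GPU execution followed by a blocking wait.
    int submit_and_wait()
    {
        for (auto& op : recorded) op();
        recorded.clear();
        return 0;
    }

    size_t pending() const { return recorded.size(); }

private:
    std::vector<std::function<void()>> recorded;
};
```

Batching matters because each submission has fixed driver overhead; recording many dispatches and submitting once amortizes that cost.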
Code Reference
Source Location
- Repository: ncnn
- File: src/pipelinecache.h, src/pipelinecache.cpp, src/command.h, src/command.cpp
- Lines: pipelinecache.h:L18-65 (PipelineCache class), pipelinecache.cpp:L1-498 (implementation with MurmurHash3), command.h:L22-86 (VkCompute class), command.h:L67 (submit_and_wait)
Signature
namespace ncnn {

class PipelineCache
{
public:
    explicit PipelineCache(const VulkanDevice* _vkdev);
    virtual ~PipelineCache();

    void clear();

    // Get or create a cached pipeline from SPIR-V data
    int get_pipeline(const uint32_t* spv_data, size_t spv_data_size,
                     const std::vector<vk_specialization_type>& specializations,
                     uint32_t local_size_x, uint32_t local_size_y, uint32_t local_size_z,
                     uint32_t subgroup_size,
                     VkShaderModule* shader_module,
                     VkDescriptorSetLayout* descriptorset_layout,
                     VkPipelineLayout* pipeline_layout,
                     VkPipeline* pipeline,
                     VkDescriptorUpdateTemplateKHR* descriptor_update_template,
                     ShaderInfo& shader_info) const;

    // Get or create from built-in shader index
    int get_pipeline(int shader_type_index, const Option& opt,
                     const std::vector<vk_specialization_type>& specializations,
                     uint32_t local_size_x, uint32_t local_size_y, uint32_t local_size_z,
                     uint32_t subgroup_size,
                     VkShaderModule* shader_module,
                     VkDescriptorSetLayout* descriptorset_layout,
                     VkPipelineLayout* pipeline_layout,
                     VkPipeline* pipeline,
                     VkDescriptorUpdateTemplateKHR* descriptor_update_template,
                     ShaderInfo& shader_info) const;
};

class VkCompute
{
public:
    explicit VkCompute(const VulkanDevice* vkdev);

    void record_upload(const Mat& src, VkMat& dst, const Option& opt);
    void record_download(const VkMat& src, Mat& dst, const Option& opt);
    void record_pipeline(const Pipeline* pipeline, ...);

    int submit_and_wait();
    int reset();
};

} // namespace ncnn
Import
#include "pipelinecache.h"
#include "command.h"
#include "gpu.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| vkdev | const VulkanDevice* | Yes | Vulkan device for pipeline creation |
| spv_data | const uint32_t* | Yes | SPIR-V shader bytecode |
| specializations | std::vector<vk_specialization_type> | Yes | Shader specialization constants |
| local_size_x/y/z | uint32_t | Yes | Workgroup dimensions |
Outputs
| Name | Type | Description |
|---|---|---|
| pipeline | VkPipeline* | Compiled (or cached) Vulkan compute pipeline |
| shader_module | VkShaderModule* | Compiled shader module |
| descriptor_update_template | VkDescriptorUpdateTemplateKHR* | Template for efficient descriptor updates |
Usage Examples
Automatic Pipeline Caching (Default Usage)
// Pipeline caching is automatic when using Net
ncnn::Net net;
net.opt.use_vulkan_compute = true;
net.set_vulkan_device(0);
// First load_param triggers pipeline compilation (slow)
net.load_param("model.param");
net.load_model("model.bin");
// First inference: pipelines compiled and cached
ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);
ex.extract("prob", out); // Compiles + caches pipelines
// Subsequent inferences: pipelines served from cache (fast)
ncnn::Extractor ex2 = net.create_extractor();
ex2.input("data", in2);
ex2.extract("prob", out2); // Uses cached pipelines
Memory Cleanup
// For long-running applications, periodically clear allocators
ncnn::VkBlobAllocator* blob_alloc =
(ncnn::VkBlobAllocator*)net.opt.blob_vkallocator;
if (blob_alloc)
blob_alloc->clear(); // Release all pooled GPU memory
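The pooled-block scheme that clear() tears down can be sketched as follows. This is a hypothetical illustration of the VkBlobAllocator idea, not its actual code: small tensor allocations are carved out of large pooled blocks, and clear() drops every block at once. The class name, alignment, and bookkeeping here are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of pooled sub-allocation (not ncnn's VkBlobAllocator).
class BlockPoolSketch
{
public:
    explicit BlockPoolSketch(size_t block_size = 16 * 1024 * 1024)
        : block_size(block_size) {}

    // Sub-allocate `size` bytes (rounded up to 16-byte alignment) from the
    // newest block, opening a fresh block when the request does not fit.
    size_t allocate(size_t size)
    {
        size = (size + 15) & ~size_t(15);
        if (blocks.empty() || blocks.back() + size > block_size)
        {
            blocks.push_back(0); // a real allocator calls vkAllocateMemory here
        }
        size_t offset = blocks.back(); // offset of this tensor within the block
        blocks.back() += size;
        return offset;
    }

    // Release all pooled blocks at once, like VkBlobAllocator::clear().
    void clear() { blocks.clear(); }

    size_t block_count() const { return blocks.size(); }

private:
    size_t block_size;
    std::vector<size_t> blocks; // bytes used per pooled block
};
```

Pooling trades a little internal fragmentation for far fewer driver-level allocations, which is why periodic clear() (rather than per-tensor frees) is the cleanup model.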