
Implementation:Tencent Ncnn Pipeline Cache And Memory

From Leeroopedia


Knowledge Sources
Domains GPU_Computing, Performance_Optimization
Last Updated 2026-02-09 00:00 GMT

Overview

The ncnn library provides a concrete tool for caching compiled Vulkan compute pipelines and for managing pooled GPU memory, both of which reduce inference overhead on Vulkan-capable devices.

Description

The PipelineCache class caches compiled Vulkan compute pipelines using MurmurHash3-based keys. It stores shader modules, descriptor set layouts, pipeline layouts, and pipeline objects, avoiding expensive recompilation. The cache is populated on first use and subsequent requests for the same pipeline configuration return the cached version.
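The hash-keyed lookup described above can be illustrated with a self-contained sketch. This is not ncnn's actual implementation; `FakePipeline` and `SketchPipelineCache` are hypothetical stand-ins, but the mechanism is the same: hash the SPIR-V words and the pipeline configuration with MurmurHash3, use the digest as a map key, and only "compile" on a miss.

```cpp
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <utility>
#include <vector>

// Public-domain MurmurHash3 x86_32 (Austin Appleby); ncnn derives its
// pipeline cache keys from a MurmurHash3 variant of this kind.
static uint32_t murmur3_32(const void* data, size_t len, uint32_t seed)
{
    const uint8_t* p = static_cast<const uint8_t*>(data);
    uint32_t h = seed;
    const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
    size_t nblocks = len / 4;
    for (size_t i = 0; i < nblocks; i++)
    {
        uint32_t k;
        memcpy(&k, p + i * 4, 4);
        k *= c1; k = (k << 15) | (k >> 17); k *= c2;
        h ^= k; h = (h << 13) | (h >> 19); h = h * 5 + 0xe6546b64;
    }
    uint32_t k = 0;
    const uint8_t* tail = p + nblocks * 4;
    switch (len & 3)
    {
    case 3: k ^= (uint32_t)tail[2] << 16; // fall through
    case 2: k ^= (uint32_t)tail[1] << 8;  // fall through
    case 1: k ^= tail[0];
        k *= c1; k = (k << 15) | (k >> 17); k *= c2; h ^= k;
    }
    h ^= (uint32_t)len;
    h ^= h >> 16; h *= 0x85ebca6b; h ^= h >> 13; h *= 0xc2b2ae35; h ^= h >> 16;
    return h;
}

// Hypothetical stand-in for a compiled pipeline handle.
struct FakePipeline { uint32_t id; };

struct SketchPipelineCache
{
    std::unordered_map<uint32_t, FakePipeline> entries;
    uint32_t next_id = 1;
    int compile_count = 0; // counts simulated compilations

    // The key covers the shader bytecode and the workgroup dimensions,
    // the same kind of inputs PipelineCache::get_pipeline() receives.
    FakePipeline get_pipeline(const std::vector<uint32_t>& spv,
                              uint32_t lx, uint32_t ly, uint32_t lz)
    {
        uint32_t key = murmur3_32(spv.data(), spv.size() * 4, 0);
        uint32_t dims[3] = { lx, ly, lz };
        key = murmur3_32(dims, sizeof(dims), key); // chain via the seed

        auto it = entries.find(key);
        if (it != entries.end())
            return it->second; // cache hit: no recompilation

        compile_count++; // stands in for expensive vkCreateComputePipelines
        FakePipeline p = { next_id++ };
        entries.emplace(key, p);
        return p;
    }
};
```

A second request with identical SPIR-V and workgroup dimensions returns the cached entry, while changing any keyed input produces a new compilation.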

VkBlobAllocator manages GPU memory in pooled blocks (16 MB by default), sub-allocating regions within each block for individual tensors. VkCompute manages Vulkan command-buffer recording, exposing operations such as record_upload, record_download, and record_pipeline, plus submit_and_wait for batched GPU execution.
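The pooled-block idea can be modeled with a toy bump allocator. This is an illustration only, not ncnn's VkBlobAllocator code; the 16 MB block size matches the default above, while the 256-byte alignment is an assumption chosen as a typical Vulkan buffer alignment.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy model of pooled sub-allocation: memory is grabbed from the driver
// in large blocks, and tensors are carved out of those blocks, so most
// allocations avoid a round trip to vkAllocateMemory.
struct SketchBlockAllocator
{
    static constexpr size_t block_size = 16 * 1024 * 1024; // 16 MB pool blocks
    static constexpr size_t alignment = 256; // assumed buffer alignment

    struct Block { size_t used = 0; };
    std::vector<Block> blocks;

    static size_t align_up(size_t n)
    {
        return (n + alignment - 1) & ~(alignment - 1);
    }

    // Returns (block index, byte offset) where a tensor of `size` bytes
    // is placed, opening a new block only when nothing fits.
    std::pair<size_t, size_t> allocate(size_t size)
    {
        size = align_up(size);
        for (size_t i = 0; i < blocks.size(); i++)
        {
            if (blocks[i].used + size <= block_size)
            {
                size_t offset = blocks[i].used;
                blocks[i].used += size;
                return { i, offset };
            }
        }
        blocks.push_back(Block()); // stands in for a real device allocation
        blocks.back().used = size;
        return { blocks.size() - 1, 0 };
    }

    void clear() { blocks.clear(); } // release all pooled memory at once
};
```

Small tensors land back-to-back (at aligned offsets) inside one block, and clear() releases the whole pool in one step, mirroring the cleanup pattern shown later in this page.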

Usage

Pipeline caching is automatic via net.opt.pipeline_cache. Custom memory allocators can be configured for fine-grained control over GPU memory pooling. Use VkCompute::submit_and_wait() to synchronize GPU operations when needed.
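The record-then-submit pattern behind VkCompute can be sketched in plain C++ (no real Vulkan calls; `SketchCompute` is a hypothetical stand-in): each record call merely appends work to a command list, and nothing executes until submit_and_wait, mirroring how Vulkan command buffers batch GPU work.

```cpp
#include <functional>
#include <vector>

// Plain-C++ model of VkCompute's deferred-execution pattern: recording
// is cheap and does no GPU work; submit_and_wait() plays the role of a
// queue submit followed by a fence wait.
struct SketchCompute
{
    std::vector<std::function<void()>> commands;

    void record(std::function<void()> op)
    {
        commands.push_back(std::move(op)); // append only, nothing runs yet
    }

    int submit_and_wait()
    {
        for (auto& op : commands)
            op(); // stands in for executing the batched command buffer
        return 0;
    }

    int reset()
    {
        commands.clear(); // reuse the object for the next batch
        return 0;
    }
};
```

Batching uploads, dispatches, and downloads into one submission is what makes a single submit_and_wait() sufficient to synchronize an entire inference step.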

Code Reference

Source Location

  • Repository: ncnn
  • File: src/pipelinecache.h, src/pipelinecache.cpp, src/command.h, src/command.cpp
  • Lines: pipelinecache.h:L18-65 (PipelineCache class), pipelinecache.cpp:L1-498 (implementation with MurmurHash3), command.h:L22-86 (VkCompute class), command.h:L67 (submit_and_wait)

Signature

namespace ncnn {

class PipelineCache
{
public:
    explicit PipelineCache(const VulkanDevice* _vkdev);
    virtual ~PipelineCache();

    void clear();

    // Get or create a cached pipeline from SPIR-V data
    int get_pipeline(
        const uint32_t* spv_data, size_t spv_data_size,
        const std::vector<vk_specialization_type>& specializations,
        uint32_t local_size_x, uint32_t local_size_y, uint32_t local_size_z,
        uint32_t subgroup_size,
        VkShaderModule* shader_module,
        VkDescriptorSetLayout* descriptorset_layout,
        VkPipelineLayout* pipeline_layout,
        VkPipeline* pipeline,
        VkDescriptorUpdateTemplateKHR* descriptor_update_template,
        ShaderInfo& shader_info
    ) const;

    // Get or create from built-in shader index
    int get_pipeline(
        int shader_type_index, const Option& opt,
        const std::vector<vk_specialization_type>& specializations,
        uint32_t local_size_x, uint32_t local_size_y, uint32_t local_size_z,
        uint32_t subgroup_size,
        VkShaderModule* shader_module,
        VkDescriptorSetLayout* descriptorset_layout,
        VkPipelineLayout* pipeline_layout,
        VkPipeline* pipeline,
        VkDescriptorUpdateTemplateKHR* descriptor_update_template,
        ShaderInfo& shader_info
    ) const;
};

class VkCompute
{
public:
    explicit VkCompute(const VulkanDevice* vkdev);

    void record_upload(const Mat& src, VkMat& dst, const Option& opt);
    void record_download(const VkMat& src, Mat& dst, const Option& opt);
    void record_pipeline(const Pipeline* pipeline, ...);
    int submit_and_wait();
    int reset();
};

} // namespace ncnn

Import

#include "pipelinecache.h"
#include "command.h"
#include "gpu.h"

I/O Contract

Inputs

Name              Type                                  Required  Description
vkdev             const VulkanDevice*                   Yes       Vulkan device used for pipeline creation
spv_data          const uint32_t*                       Yes       SPIR-V shader bytecode
specializations   std::vector<vk_specialization_type>   Yes       Shader specialization constants
local_size_x/y/z  uint32_t                              Yes       Workgroup dimensions

Outputs

Name                        Type                            Description
pipeline                    VkPipeline*                     Compiled (or cached) Vulkan compute pipeline
shader_module               VkShaderModule*                 Compiled shader module
descriptor_update_template  VkDescriptorUpdateTemplateKHR*  Template for efficient descriptor updates

Usage Examples

Automatic Pipeline Caching (Default Usage)

// Pipeline caching is automatic when using Net
ncnn::Net net;
net.opt.use_vulkan_compute = true;
net.set_vulkan_device(0);

// First load_param triggers pipeline compilation (slow)
net.load_param("model.param");
net.load_model("model.bin");

// First inference: pipelines compiled and cached
ncnn::Extractor ex = net.create_extractor();
ex.input("data", in);
ex.extract("prob", out);  // Compiles + caches pipelines

// Subsequent inferences: pipelines served from cache (fast)
ncnn::Extractor ex2 = net.create_extractor();
ex2.input("data", in2);
ex2.extract("prob", out2);  // Uses cached pipelines

Memory Cleanup

// For long-running applications, periodically clear allocators
ncnn::VkBlobAllocator* blob_alloc =
    (ncnn::VkBlobAllocator*)net.opt.blob_vkallocator;
if (blob_alloc)
    blob_alloc->clear();  // Release all pooled GPU memory

