

Implementation:Tencent Ncnn Vulkan Option And Allocator

From Leeroopedia


Knowledge Sources
Domains GPU_Computing, Memory_Management
Last Updated 2026-02-09 00:00 GMT

Overview

A concrete tool for configuring Vulkan GPU inference options and memory allocators in the ncnn library.

Description

The ncnn::Option class contains Vulkan-specific configuration flags accessed via net.opt. The key flag is use_vulkan_compute, which enables GPU inference. Additional flags control precision (use_fp16_packed, use_fp16_storage, use_fp16_arithmetic), compute features (use_shader_local_memory, use_cooperative_matrix), and memory management (blob_vkallocator, workspace_vkallocator, staging_vkallocator).

VkBlobAllocator pools GPU memory in large blocks (default 16MB) for intermediate tensor storage. VkStagingAllocator manages host-visible memory for CPU↔GPU data transfer. VkWeightAllocator pools memory for model weights with smaller blocks (default 8MB).
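Beyond setting allocators globally on net.opt, recent ncnn releases also expose per-Extractor setters, which scope pooled memory to a single extraction. The sketch below assumes a loaded Net and a valid VulkanDevice; the blob names "data" and "output" are placeholders for whatever the model actually declares.

```cpp
#include "net.h"
#include "allocator.h"

// Sketch: bind pooled allocators to one Extractor instead of net.opt.
// Assumes net is already loaded and vkdev came from ncnn::get_gpu_device().
void run_with_local_allocators(ncnn::Net& net, const ncnn::VulkanDevice* vkdev,
                               const ncnn::Mat& in, ncnn::Mat& out)
{
    ncnn::VkBlobAllocator blob_alloc(vkdev);       // 16MB blocks by default
    ncnn::VkStagingAllocator staging_alloc(vkdev); // host-visible transfer memory

    ncnn::Extractor ex = net.create_extractor();
    ex.set_blob_vkallocator(&blob_alloc);
    ex.set_staging_vkallocator(&staging_alloc);

    ex.input("data", in);      // placeholder input blob name
    ex.extract("output", out); // placeholder output blob name
} // allocators are destroyed after the extraction completes
```

Keeping the allocators local like this releases their pooled GPU memory as soon as the extraction finishes, at the cost of re-allocating blocks on the next call.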

Usage

Set net.opt.use_vulkan_compute = true before loading the model; options that affect model conversion must be in place before load_param and load_model are called. Optionally create and configure custom allocators for fine-grained memory control.
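A common pattern is to enable Vulkan only when a usable device exists, falling back to CPU inference otherwise. The following minimal sketch uses ncnn::get_gpu_count(), which returns 0 when no Vulkan device was found:

```cpp
#include "net.h"
#include "gpu.h"

// Sketch: enable GPU inference only if a Vulkan device is available;
// otherwise ncnn silently runs the same model on the CPU.
ncnn::create_gpu_instance();

ncnn::Net net;
net.opt.use_vulkan_compute = (ncnn::get_gpu_count() > 0);

net.load_param("model.param");
net.load_model("model.bin");
```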

Code Reference

Source Location

  • Repository: ncnn
  • File: src/option.h (Option class), src/allocator.h (VkAllocator classes)
  • Lines: option.h:L17-155 (Option class, use_vulkan_compute at L84), allocator.h:L263-295 (VkAllocator base), allocator.h:L298-319 (VkBlobAllocator), allocator.h:L372-397 (VkStagingAllocator)

Signature

namespace ncnn {

class Option
{
public:
    Option();

    // Enable Vulkan GPU compute
    bool use_vulkan_compute;

    // Precision options for GPU
    bool use_fp16_packed;      // fp16 packed storage
    bool use_fp16_storage;     // fp16 weight storage
    bool use_fp16_arithmetic;  // fp16 compute operations

    // GPU compute optimizations
    bool use_shader_local_memory;  // local memory optimization
    bool use_cooperative_matrix;   // tensor core usage

    // Vulkan memory allocators
    VkAllocator* blob_vkallocator;      // GPU blob memory
    VkAllocator* workspace_vkallocator; // GPU workspace memory
    VkAllocator* staging_vkallocator;   // CPU-GPU staging memory

    // Pipeline cache
    PipelineCache* pipeline_cache;
};

class VkBlobAllocator : public VkAllocator
{
public:
    explicit VkBlobAllocator(const VulkanDevice* vkdev,
                             size_t preferred_block_size = 16 * 1024 * 1024);
    virtual void clear();
    virtual VkBufferMemory* fastMalloc(size_t size);
    virtual void fastFree(VkBufferMemory* ptr);
};

class VkStagingAllocator : public VkAllocator
{
public:
    explicit VkStagingAllocator(const VulkanDevice* vkdev);
    void set_size_compare_ratio(float scr);
    virtual void clear();
    virtual VkBufferMemory* fastMalloc(size_t size);
    virtual void fastFree(VkBufferMemory* ptr);
};

} // namespace ncnn

Import

#include "net.h"       // Option is included via net.h
#include "gpu.h"       // VulkanDevice
#include "allocator.h" // VkBlobAllocator, VkStagingAllocator

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| use_vulkan_compute | bool | Yes | Enable GPU inference (set to true) |
| use_fp16_packed | bool | No | Enable fp16 packed element storage |
| use_fp16_storage | bool | No | Enable fp16 weight storage |
| use_fp16_arithmetic | bool | No | Enable fp16 compute |
| blob_vkallocator | VkAllocator* | No | Custom GPU blob memory allocator |
| workspace_vkallocator | VkAllocator* | No | Custom GPU workspace allocator |
| staging_vkallocator | VkAllocator* | No | Custom CPU↔GPU staging allocator |

Outputs

| Name | Type | Description |
|---|---|---|
| net.opt | ncnn::Option | Configured option object controlling GPU inference behavior |

Usage Examples

Basic Vulkan Configuration

#include "net.h"
#include "gpu.h"

ncnn::create_gpu_instance();

ncnn::Net net;
net.opt.use_vulkan_compute = true;
net.set_vulkan_device(0);

// Load model (options must be set before this)
net.load_param("model.param");
net.load_model("model.bin");

Advanced Configuration with Custom Allocators

#include "net.h"
#include "gpu.h"
#include "allocator.h"

ncnn::create_gpu_instance();
{
    const ncnn::VulkanDevice* vkdev = ncnn::get_gpu_device(0);

    // Allocators are declared before the Net so they outlive it
    ncnn::VkBlobAllocator blob_alloc(vkdev, 32 * 1024 * 1024); // 32MB blocks
    ncnn::VkStagingAllocator staging_alloc(vkdev);

    ncnn::Net net;
    net.opt.use_vulkan_compute = true;
    net.opt.use_fp16_storage = true;
    net.opt.use_fp16_arithmetic = true;
    net.opt.blob_vkallocator = &blob_alloc;
    net.opt.staging_vkallocator = &staging_alloc;
    net.set_vulkan_device(vkdev);

    net.load_param("model.param");
    net.load_model("model.bin");

    // ... run inference ...
} // Net, then allocators, are destroyed here, before the instance

// Destroy the Vulkan instance only after all Nets and allocators are gone
ncnn::destroy_gpu_instance();
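Tuning the Staging Allocator

The signature above exposes VkStagingAllocator::set_size_compare_ratio, which the page does not otherwise explain. As a hedged sketch: scr is a value in [0, 1] that tunes how aggressively the pool reuses existing buffers for smaller requests (the exact reuse rule stated in the comment is an assumption based on ncnn's pool-allocator convention, not verified here).

```cpp
#include "gpu.h"
#include "allocator.h"

const ncnn::VulkanDevice* vkdev = ncnn::get_gpu_device(0);
ncnn::VkStagingAllocator staging_alloc(vkdev);

// scr in [0, 1]: a pooled buffer is reused only when the requested size is
// close enough to the buffer's capacity. Higher values bound the wasted
// space per reuse; lower values reuse more freely but may hold oversized
// buffers. (Assumed semantics, mirroring ncnn's CPU PoolAllocator.)
staging_alloc.set_size_compare_ratio(0.8f);
```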

Related Pages

Implements Principle

Requires Environment
