

Implementation:Ggml org Ggml Vulkan backend

From Leeroopedia
Revision as of 15:02, 16 February 2026 by Admin (Auto-imported from implementations/Ggml_org_Ggml_Vulkan_backend.md)


Metadata

Page Type: Implementation (API Doc)
Knowledge Sources: GGML
Domains: ML_Infrastructure, Tensor_Computing, GPU_Computing
Last Updated: 2025-05-15 12:00 GMT

Overview

Main implementation of the Vulkan GPU backend for GGML, providing cross-platform GPU acceleration via the Vulkan compute API on any Vulkan-capable GPU.

Description

ggml-vulkan.cpp implements GGML's Vulkan GPU backend, one of its most portable, in a single large source file of approximately 16,000 lines. It provides:

  1. Vulkan initialization: Uses vulkan.hpp (C++ Vulkan bindings) with a dynamic dispatch loader to avoid static linking to the Vulkan runtime. Includes polyfill definitions for VK_KHR_shader_bfloat16 for compatibility with older SDK versions.
  2. Pipeline management: The vk_pipeline_struct manages shader modules, pipeline layouts, push constant sizes, workgroup configurations, and supports lazy parallel compilation. Pipelines can have 64-bit indexing variants linked in a list.
  3. Vendor-specific optimizations: Detects GPU vendor via vendor IDs (VK_VENDOR_ID_AMD = 0x1002, VK_VENDOR_ID_APPLE = 0x106b, VK_VENDOR_ID_INTEL = 0x8086, VK_VENDOR_ID_NVIDIA = 0x10de) to enable vendor-specific code paths.
  4. Operation dispatch: Supports a comprehensive set of GGML operations via push constant structs (vk_mat_mat_push_constants, vk_flash_attn_push_constants, vk_op_rope_push_constants, etc.) that parameterize the compute shaders.
  5. Operation fusion: Supports fusing chains of consecutive add operations into a single dispatch. The maximum chain length (MAX_FUSED_ADDS) is derived from MAX_PARAMETER_COUNT = 12.
  6. Synchronization: Platform-specific yield intrinsics (_mm_pause on x86, __yield on ARM) for efficient spin-wait synchronization during GPU command submission.
  7. Memory management: Full buffer lifecycle including device memory, host-pinned memory, staging buffers for CPU-GPU transfers, and memory logging for debugging.
  8. Shader loading: Pre-compiled SPIR-V shaders are embedded via ggml-vulkan-shaders.hpp, generated by the vulkan-shaders-gen tool.
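The vendor detection in item 3 can be sketched as a simple mapping from the vendorID that Vulkan reports in VkPhysicalDeviceProperties to a vendor class, using the IDs listed above. This is a minimal sketch, not the backend's code: the gpu_vendor enum and the classify_vendor() and vendor_name() helpers are hypothetical, and the actual vendor-specific code paths are not reproduced here.

```c
#include <stdint.h>

/* Vendor IDs as listed above (PCI vendor IDs reported by the driver). */
#define VK_VENDOR_ID_AMD    0x1002
#define VK_VENDOR_ID_APPLE  0x106b
#define VK_VENDOR_ID_INTEL  0x8086
#define VK_VENDOR_ID_NVIDIA 0x10de

/* Hypothetical vendor classes used only in this sketch. */
typedef enum {
    GPU_VENDOR_OTHER,
    GPU_VENDOR_AMD,
    GPU_VENDOR_APPLE,
    GPU_VENDOR_INTEL,
    GPU_VENDOR_NVIDIA
} gpu_vendor;

/* Map VkPhysicalDeviceProperties::vendorID to a vendor class.
   Unknown IDs fall back to the generic (vendor-neutral) path. */
static gpu_vendor classify_vendor(uint32_t vendor_id) {
    switch (vendor_id) {
        case VK_VENDOR_ID_AMD:    return GPU_VENDOR_AMD;
        case VK_VENDOR_ID_APPLE:  return GPU_VENDOR_APPLE;
        case VK_VENDOR_ID_INTEL:  return GPU_VENDOR_INTEL;
        case VK_VENDOR_ID_NVIDIA: return GPU_VENDOR_NVIDIA;
        default:                  return GPU_VENDOR_OTHER;
    }
}

/* Human-readable name, e.g. for logging which code path was chosen. */
static const char *vendor_name(gpu_vendor v) {
    switch (v) {
        case GPU_VENDOR_AMD:    return "AMD";
        case GPU_VENDOR_APPLE:  return "Apple";
        case GPU_VENDOR_INTEL:  return "Intel";
        case GPU_VENDOR_NVIDIA: return "NVIDIA";
        default:                return "other";
    }
}
```

A caller would branch on classify_vendor(props.vendorID) once per physical device when selecting vendor-specific parameters; which tunables the real backend adjusts per vendor is outside the scope of this page.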

The backend runs on any Vulkan-capable GPU across Windows, Linux, macOS (via MoltenVK), and Android.

Usage

Applications initialize the Vulkan backend explicitly by calling ggml_backend_vk_init(dev_num), or let GGML discover it through the backend registry, typically via ggml_backend_load_all(). Up to 16 Vulkan devices can be used simultaneously, each through its own backend instance.
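Because ggml_backend_vk_init() takes a single device index, an application that uses several GPUs must decide which indices to initialize. The C sketch below parses a hypothetical comma-separated selection string such as "0,2", rejecting indices outside the detected range or beyond the 16-device limit and skipping duplicates. The parse_vk_device_list() helper, the string format, and the MAX_VK_DEVICES constant are illustrative assumptions, not part of the ggml-vulkan API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Local stand-in for the 16-device limit (assumed name, not from the header). */
#define MAX_VK_DEVICES 16

/* Hypothetical helper: parse a list such as "0,2" into device indices.
   Each index must be < n_available and < MAX_VK_DEVICES; duplicates are
   skipped. Returns the number of indices written to out, or -1 if the
   list is malformed or contains an out-of-range index. */
static int parse_vk_device_list(const char *list, int n_available,
                                size_t out[MAX_VK_DEVICES]) {
    bool seen[MAX_VK_DEVICES] = { false };
    int count = 0;
    const char *p = list;

    while (*p != '\0') {
        char *end;
        long idx = strtol(p, &end, 10);
        if (end == p || idx < 0) {
            return -1;                      /* not a non-negative number */
        }
        if (*end != ',' && *end != '\0') {
            return -1;                      /* unexpected character */
        }
        if (idx >= n_available || idx >= MAX_VK_DEVICES) {
            return -1;                      /* no such device */
        }
        if (!seen[idx]) {                   /* keep first occurrence only */
            seen[idx] = true;
            out[count++] = (size_t)idx;
        }
        p = (*end == ',') ? end + 1 : end;  /* step past the separator */
    }
    return count;
}
```

In practice n_available would come from ggml_backend_vk_get_device_count(), and each returned index would be passed to its own ggml_backend_vk_init() call.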

Code Reference

Source Location

GGML repo, file: src/ggml-vulkan/ggml-vulkan.cpp (16086 lines).

Signatures

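ggml_backend_t ggml_backend_vk_init(size_t dev_num);
bool ggml_backend_is_vk(ggml_backend_t backend);
int ggml_backend_vk_get_device_count(void);
void ggml_backend_vk_get_device_description(int device, char * description, size_t description_size);
void ggml_backend_vk_get_device_memory(int device, size_t * free, size_t * total);
ggml_backend_buffer_type_t ggml_backend_vk_buffer_type(size_t dev_num);
ggml_backend_buffer_type_t ggml_backend_vk_host_buffer_type(void);
ggml_backend_reg_t ggml_backend_vk_reg(void);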

Import

#include "ggml-vulkan.h"

I/O Contract

Inputs

dev_num (size_t, required): Vulkan device index (0-based). Selects which GPU to use when multiple Vulkan-capable devices are present.

Outputs

Backend handle (ggml_backend_t): Opaque handle to the initialized Vulkan backend for use with the GGML scheduler.
Buffer type (ggml_backend_buffer_type_t): Buffer type for Vulkan device memory or host-pinned memory.
Registration handle (ggml_backend_reg_t): Backend registration for the auto-discovery system.

Usage Examples

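#include <stdio.h>
#include "ggml.h"
#include "ggml-backend.h"
#include "ggml-cpu.h"      // ggml_backend_cpu_init()
#include "ggml-vulkan.h"

// Runs a graph built elsewhere (e.g. with ggml_new_graph() and
// ggml_build_forward_expand()) on the first Vulkan device.
void run_on_vulkan(struct ggml_cgraph * graph) {
    // Query available devices
    int n_devices = ggml_backend_vk_get_device_count();
    if (n_devices < 1) {
        fprintf(stderr, "no Vulkan devices found\n");
        return;
    }
    char desc[256];
    ggml_backend_vk_get_device_description(0, desc, sizeof(desc));
    printf("using Vulkan device 0: %s\n", desc);

    // Initialize the first Vulkan device
    ggml_backend_t vk_backend = ggml_backend_vk_init(0);

    if (vk_backend && ggml_backend_is_vk(vk_backend)) {
        // Query free and total device memory (bytes)
        size_t free_mem, total_mem;
        ggml_backend_vk_get_device_memory(0, &free_mem, &total_mem);

        // The scheduler requires a CPU backend as the last (lowest-priority) entry
        ggml_backend_t cpu_backend = ggml_backend_cpu_init();
        ggml_backend_t backends[2] = { vk_backend, cpu_backend };

        // Trailing arguments: parallel = false, op_offload = false
        // (older GGML versions do not have the op_offload parameter)
        ggml_backend_sched_t sched = ggml_backend_sched_new(
            backends, NULL, 2, GGML_DEFAULT_GRAPH_SIZE, false, false);

        ggml_backend_sched_graph_compute(sched, graph);

        ggml_backend_sched_free(sched);
        ggml_backend_free(cpu_backend);
        ggml_backend_free(vk_backend);
    }
}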
