Implementation:Ggml org Ggml Webgpu shader lib

**Implementation Metadata**
File Name	`src/ggml-webgpu/ggml-webgpu-shader-lib.hpp`
Repository	ggml-org/ggml
Lines	537
Language	C++
Domain Tags	GPU_Computing, Shader_Management, WebGPU
Status	Active
Last Updated	2025-05-15 12:00 GMT
Knowledge Sources	ggml-org/ggml repository

Overview

`ggml-webgpu-shader-lib.hpp` is the shader library for the WebGPU backend. It provides pipeline key types, shader processing infrastructure, and workgroup configuration logic for the supported operations. It acts as the shader specialization layer, selecting a shader variant for each operation based on operation parameters and hardware capabilities.

Description

The file defines a pipeline key struct for each operation category, together with a corresponding hash function so that the keys can be used in `std::unordered_map` containers. Each key captures the parameters that differentiate shader variants:

Flash Attention -- ggml_webgpu_flash_attn_pipeline_key encodes KV type, head dimensions (QK and V), KV direct access, mask presence, sink tokens, and logit softcap
Generic Operations -- ggml_webgpu_generic_shader_lib_context for standard operations
Pad, Argsort, Set-Rows, Unary, Binary -- Specialized key types for each operation category
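
The key-plus-hash pattern described above can be sketched as follows. This is a minimal illustration, not the header's actual code: `example_pipeline_key`, `example_hash_combine`, and `example_pipeline_key_hash` are hypothetical names, the field set is reduced, and the Boost-style mixing constant is an assumption about how a `hash_combine` helper is commonly written.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical, reduced stand-in for a pipeline key (not the real struct).
struct example_pipeline_key {
    int           kv_type;
    std::uint32_t head_dim_qk;
    std::uint32_t head_dim_v;
    bool          has_mask;

    // std::unordered_map needs equality in addition to a hash.
    bool operator==(const example_pipeline_key & other) const {
        return kv_type == other.kv_type && head_dim_qk == other.head_dim_qk &&
               head_dim_v == other.head_dim_v && has_mask == other.has_mask;
    }
};

// Boost-style combiner: folds the hash of one field into a running seed.
// Assumption: the header's ggml_webgpu_hash_combine may mix bits differently.
template <typename T> inline void example_hash_combine(std::size_t & seed, const T & value) {
    seed ^= std::hash<T>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hash functor that combines every field that participates in operator==.
struct example_pipeline_key_hash {
    std::size_t operator()(const example_pipeline_key & key) const {
        std::size_t seed = 0;
        example_hash_combine(seed, key.kv_type);
        example_hash_combine(seed, key.head_dim_qk);
        example_hash_combine(seed, key.head_dim_v);
        example_hash_combine(seed, key.has_mask);
        return seed;
    }
};
```

Hashing exactly the fields compared in `operator==` ensures that two keys describing the same shader variant always land on the same map entry.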

The ggml_webgpu_processed_shader struct holds the processed WGSL code, variant name, and decision parameters. Decision structs encode runtime choices like tile sizes (q_tile, kv_tile), workgroup sizes, and subgroup matrix dimensions.

Key constants include:

GGML_WEBGPU_FLASH_ATTN_PREFERRED_KV_SG_TILES = 8
GGML_WEBGPU_FLASH_ATTN_PREFERRED_WG_SIZE = 128
GGML_WEBGPU_KV_SEQ_PAD = 256 (matches GGML_PAD in llama-context.cpp)
GGML_WEBGPU_ARGSORT_MERGE_MAX_WG_SIZE = 512
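
To illustrate what a padding constant such as GGML_WEBGPU_KV_SEQ_PAD implies, the sketch below rounds a sequence length up to a multiple of 256 using the same power-of-two round-up arithmetic as the `GGML_PAD` macro. The names `EXAMPLE_KV_SEQ_PAD` and `example_pad` are hypothetical, and how the backend actually applies the padding is an assumption.

```cpp
#include <cstdint>

// Value quoted from the constant list above; the name is illustrative.
constexpr std::uint32_t EXAMPLE_KV_SEQ_PAD = 256;

// Round x up to the next multiple of n. Only valid when n is a power of two,
// because it relies on masking off the low bits.
constexpr std::uint32_t example_pad(std::uint32_t x, std::uint32_t n) {
    return (x + n - 1) & ~(n - 1);
}
```

For example, `example_pad(257, EXAMPLE_KV_SEQ_PAD)` evaluates to 512, while a length that is already a multiple of 256 is returned unchanged.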

Usage

This library is used internally by ggml-webgpu.cpp to configure and cache shader pipelines.

#include "ggml-webgpu-shader-lib.hpp"

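// Create a flash attention pipeline key (designated initializers require C++20)
ggml_webgpu_flash_attn_pipeline_key key = {
    .kv_type = GGML_TYPE_F16,
    .head_dim_qk = 128,
    .head_dim_v = 128,
    .kv_direct = false,
    .has_mask = true,
    .has_sinks = false,
    .uses_logit_softcap = false,
};

// Illustrative tile sizes (placeholder values, not taken from the source)
uint32_t q_tile  = 8;
uint32_t kv_tile = 16;

// Compute workgroup memory requirements
size_t wg_mem = ggml_webgpu_flash_attn_wg_mem_bytes(q_tile, kv_tile,
    key.head_dim_qk, key.head_dim_v, key.has_mask, key.kv_direct);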

Code Reference

Source Location

Repository	File	Lines
ggml-org/ggml	`src/ggml-webgpu/ggml-webgpu-shader-lib.hpp`	537

Key Signatures

struct ggml_webgpu_processed_shader {
    std::string wgsl;
    std::string variant;
    void *      decisions;
};

struct ggml_webgpu_flash_attn_pipeline_key {
    ggml_type kv_type;
    uint32_t  head_dim_qk;
    uint32_t  head_dim_v;
    bool      kv_direct, has_mask, has_sinks, uses_logit_softcap;
};

struct ggml_webgpu_flash_attn_shader_decisions {
    uint32_t q_tile  = 0;
    uint32_t kv_tile = 0;
    uint32_t wg_size = 0;
};

inline size_t ggml_webgpu_flash_attn_wg_mem_bytes(uint32_t q_tile, uint32_t kv_tile,
    uint32_t head_dim_qk, uint32_t head_dim_v, bool has_mask, bool kv_direct);

template <typename T> inline void ggml_webgpu_hash_combine(size_t & seed, const T & value);

I/O Contract

Inputs

Pipeline keys -- Operation parameters that determine shader variant selection
Hardware capabilities -- Subgroup size, workgroup memory limits, max subgroup size

Outputs

Processed shaders -- WGSL code with appropriate macro substitutions and tiling decisions
Shader decisions -- Optimal tile sizes and workgroup configurations
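
One way the contract above could play out is sketched below: given a workgroup memory limit, shrink the KV tile until an estimated memory footprint fits. This is a hedged illustration only. `example_tile_decision` and `example_pick_kv_tile` are hypothetical names, the halving strategy is an assumption, and the memory estimator is passed in by the caller rather than reproducing `ggml_webgpu_flash_attn_wg_mem_bytes`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical result type, loosely modeled on a "decisions" struct.
struct example_tile_decision {
    std::uint32_t q_tile  = 0;
    std::uint32_t kv_tile = 0;
    bool          fits    = false;  // false if no candidate satisfied the limit
};

// Try kv_tile = max_kv_tile, max_kv_tile / 2, ... down to min_kv_tile and
// return the first (largest) candidate whose estimated workgroup memory fits.
inline example_tile_decision example_pick_kv_tile(
        std::uint32_t q_tile,
        std::uint32_t max_kv_tile,
        std::uint32_t min_kv_tile,
        std::size_t   wg_mem_limit,
        const std::function<std::size_t(std::uint32_t, std::uint32_t)> & mem_bytes) {
    example_tile_decision decision;
    decision.q_tile = q_tile;
    for (std::uint32_t kv = max_kv_tile; kv >= min_kv_tile && kv > 0; kv /= 2) {
        if (mem_bytes(q_tile, kv) <= wg_mem_limit) {
            decision.kv_tile = kv;
            decision.fits    = true;
            break;
        }
    }
    return decision;
}
```

Searching from the largest tile downward favors bigger tiles, which generally means fewer passes over the KV data, while still respecting the device limit.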

Usage Examples

Hash-based pipeline caching:

// Pipeline keys are hashable for unordered_map caching
std::unordered_map<ggml_webgpu_flash_attn_pipeline_key,
    wgpu::ComputePipeline,
    ggml_webgpu_flash_attn_pipeline_key_hash> fa_pipeline_cache;

// Lookup or create pipeline for given key
auto it = fa_pipeline_cache.find(key);
if (it == fa_pipeline_cache.end()) {
    // Create and cache new pipeline variant
}
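
The lookup-or-create step left open in the snippet above can be sketched as a small generic helper. This is an assumption-laden illustration rather than the backend's code: `example_cache` is a hypothetical class, and the value type stands in for `wgpu::ComputePipeline` so that the example compiles without WebGPU headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical get-or-create cache. Key must be hashable by Hash and
// equality-comparable; Value is whatever is expensive to build.
template <typename Key, typename Value, typename Hash = std::hash<Key>>
class example_cache {
  public:
    // Returns the cached value for key, invoking build() only on a miss.
    template <typename Factory> Value & get_or_create(const Key & key, Factory && build) {
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            // Miss: build the value once and store it under this key.
            it = entries_.emplace(key, std::forward<Factory>(build)()).first;
        }
        // References to unordered_map elements stay valid across rehashing.
        return it->second;
    }

    std::size_t size() const { return entries_.size(); }

  private:
    std::unordered_map<Key, Value, Hash> entries_;
};
```

Calling `find` before building avoids compiling a shader variant that is already cached, which is the point of keying pipelines by their specialization parameters.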

Related Pages

Implements Principle

Principle:Ggml_org_Ggml_WebGPU_Computation

Related Implementations

Implementation:Ggml_org_Ggml_Webgpu_backend -- Main backend using this shader library
Implementation:Ggml_org_Ggml_Webgpu_wgsl_preprocessor -- WGSL preprocessor for shader compilation
