Implementation:Ggml org Ggml Webgpu shader lib

From Leeroopedia


Implementation Metadata

  • File Name -- src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
  • Repository -- ggml-org/ggml
  • Lines -- 537
  • Language -- C++
  • Domain Tags -- GPU_Computing, Shader_Management, WebGPU
  • Status -- Active
  • Last Updated -- 2025-05-15 12:00 GMT
  • Knowledge Sources -- ggml-org/ggml repository

Overview

ggml-webgpu-shader-lib.hpp is the shader library for the WebGPU backend. It provides pipeline key types, shader processing infrastructure, and workgroup configuration logic for the supported operations. It acts as the shader specialization layer: for each operation it selects a shader variant based on the operation's parameters and the hardware's capabilities.

Description

The file defines a pipeline key struct for each operation category, each with a corresponding hash function so that it can serve as a key in a std::unordered_map. Each key captures the parameters that differentiate shader variants:

  • Flash Attention -- ggml_webgpu_flash_attn_pipeline_key encodes KV type, head dimensions (QK and V), KV direct access, mask presence, sink tokens, and logit softcap
  • Generic Operations -- ggml_webgpu_generic_shader_lib_context for standard operations
  • Pad, Argsort, Set-Rows, Unary, Binary -- Specialized key types for each operation category
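
As a rough illustration of how a key of this kind can be made usable in a std::unordered_map, the sketch below defines a simplified key with an equality operator and a hash functor. The names example_pipeline_key, example_pipeline_key_hash, and example_hash_combine are hypothetical stand-ins, the key carries only a subset of fields, and the mixing step is the widely used Boost-style formula, which may differ from what the header actually does.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical helper: folds the hash of `value` into `seed` (Boost-style mixing).
template <typename T>
inline void example_hash_combine(std::size_t & seed, const T & value) {
    seed ^= std::hash<T>{}(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hypothetical, simplified key: a real key would carry more fields.
struct example_pipeline_key {
    uint32_t head_dim_qk;
    uint32_t head_dim_v;
    bool     has_mask;

    // std::unordered_map needs equality to resolve hash collisions.
    bool operator==(const example_pipeline_key & other) const {
        return head_dim_qk == other.head_dim_qk &&
               head_dim_v == other.head_dim_v &&
               has_mask == other.has_mask;
    }
};

// Hash functor passed as the third template argument of std::unordered_map.
struct example_pipeline_key_hash {
    std::size_t operator()(const example_pipeline_key & key) const {
        std::size_t seed = 0;
        example_hash_combine(seed, key.head_dim_qk);
        example_hash_combine(seed, key.head_dim_v);
        example_hash_combine(seed, key.has_mask);
        return seed;
    }
};

// Cache keyed by the struct; the mapped int stands in for a pipeline handle.
using example_pipeline_cache =
    std::unordered_map<example_pipeline_key, int, example_pipeline_key_hash>;
```

Every field that distinguishes one shader variant from another should take part in both the equality operator and the hash; otherwise two different variants could collapse onto the same cache entry.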

The ggml_webgpu_processed_shader struct holds the processed WGSL code, variant name, and decision parameters. Decision structs encode runtime choices like tile sizes (q_tile, kv_tile), workgroup sizes, and subgroup matrix dimensions.
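
To make the idea of a processed shader more concrete, here is a minimal sketch that fills a placeholder in a WGSL template and records a variant name alongside one decision. Everything in it is an assumption for illustration: the {{WG_SIZE}} placeholder syntax, the example_processed_shader type, and the variant naming scheme are not taken from the header, whose actual preprocessing may work differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type: processed WGSL, a variant name, and one decision.
struct example_processed_shader {
    std::string wgsl;
    std::string variant;
    uint32_t    wg_size;
};

// Replace every occurrence of `from` in `text` with `to`.
inline std::string example_replace_all(std::string text, const std::string & from, const std::string & to) {
    if (from.empty()) {
        return text;
    }
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // skip past the inserted text
    }
    return text;
}

// Specialize a template for one workgroup size and derive a variant name.
inline example_processed_shader example_process_shader(const std::string & wgsl_template,
                                                       uint32_t wg_size, bool has_mask) {
    example_processed_shader out;
    out.wgsl    = example_replace_all(wgsl_template, "{{WG_SIZE}}", std::to_string(wg_size));
    out.variant = std::string("example") + (has_mask ? "_mask" : "") + "_wg" + std::to_string(wg_size);
    out.wg_size = wg_size;
    return out;
}
```

Calling example_process_shader with a template containing @workgroup_size({{WG_SIZE}}), a size of 128, and has_mask set would yield WGSL with the literal 128 substituted and the variant name example_mask_wg128.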

Key constants include:

  • GGML_WEBGPU_FLASH_ATTN_PREFERRED_KV_SG_TILES = 8
  • GGML_WEBGPU_FLASH_ATTN_PREFERRED_WG_SIZE = 128
  • GGML_WEBGPU_KV_SEQ_PAD = 256 (matches GGML_PAD in llama-context.cpp)
  • GGML_WEBGPU_ARGSORT_MERGE_MAX_WG_SIZE = 512
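
The padding constant can be read as a rounding granularity. The sketch below assumes it is used to round a KV sequence length up to the next multiple of 256, using the same bit trick as the GGML_PAD macro in ggml.h. The names EXAMPLE_KV_SEQ_PAD and example_pad are hypothetical, and how the backend actually applies the constant is not shown in this page.

```cpp
#include <cstdint>

// Value copied from the constant listed above; the name is a stand-in.
constexpr uint32_t EXAMPLE_KV_SEQ_PAD = 256;

// Round x up to the next multiple of n. n must be a power of two,
// because the mask ~(n - 1) only clears the low bits in that case.
constexpr uint32_t example_pad(uint32_t x, uint32_t n) {
    return (x + n - 1) & ~(n - 1);
}
```

Under this assumption, a sequence length of 257 would be padded to 512, while a length of exactly 256 stays at 256.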

Usage

This library is used internally by ggml-webgpu.cpp to configure and cache shader pipelines.

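#include "ggml-webgpu-shader-lib.hpp"

// Create a flash attention pipeline key
// (designated initializers require C++20 or a compiler extension)
ggml_webgpu_flash_attn_pipeline_key key = {
    .kv_type            = GGML_TYPE_F16,
    .head_dim_qk        = 128,
    .head_dim_v         = 128,
    .kv_direct          = false,
    .has_mask           = true,
    .has_sinks          = false,
    .uses_logit_softcap = false,
};

// Illustrative tile sizes (assumed here); in practice they come from the shader decisions
uint32_t q_tile  = 16;
uint32_t kv_tile = 32;

// Compute workgroup memory requirements
size_t wg_mem = ggml_webgpu_flash_attn_wg_mem_bytes(q_tile, kv_tile,
    key.head_dim_qk, key.head_dim_v, key.has_mask, key.kv_direct);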

Code Reference

Source Location

  • Repository -- ggml-org/ggml
  • File -- src/ggml-webgpu/ggml-webgpu-shader-lib.hpp
  • Lines -- 537

Key Signatures

struct ggml_webgpu_processed_shader {
    std::string wgsl;
    std::string variant;
    void *      decisions;
};

struct ggml_webgpu_flash_attn_pipeline_key {
    ggml_type kv_type;
    uint32_t  head_dim_qk;
    uint32_t  head_dim_v;
    bool      kv_direct, has_mask, has_sinks, uses_logit_softcap;
};

struct ggml_webgpu_flash_attn_shader_decisions {
    uint32_t q_tile  = 0;
    uint32_t kv_tile = 0;
    uint32_t wg_size = 0;
};

inline size_t ggml_webgpu_flash_attn_wg_mem_bytes(uint32_t q_tile, uint32_t kv_tile,
    uint32_t head_dim_qk, uint32_t head_dim_v, bool has_mask, bool kv_direct);

template <typename T> inline void ggml_webgpu_hash_combine(size_t & seed, const T & value);

I/O Contract

Inputs

  • Pipeline keys -- Operation parameters that determine shader variant selection
  • Hardware capabilities -- Subgroup size, workgroup memory limits, max subgroup size

Outputs

  • Processed shaders -- WGSL code with appropriate macro substitutions and tiling decisions
  • Shader decisions -- Optimal tile sizes and workgroup configurations
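
One way to picture how hardware limits feed into shader decisions is a search that shrinks a tile until its workgroup memory estimate fits the device limit. The sketch below is hypothetical throughout: example_wg_mem_bytes is a deliberately simplified estimate and is not the formula used by ggml_webgpu_flash_attn_wg_mem_bytes, and the halving strategy in example_pick_kv_tile is an assumption rather than the library's selection logic.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical estimate: one f16 Q tile plus one f16 KV tile held in workgroup memory.
// Illustrative only; NOT the library's formula.
inline std::size_t example_wg_mem_bytes(uint32_t q_tile, uint32_t kv_tile, uint32_t head_dim) {
    const std::size_t f16_bytes = 2;
    const std::size_t q_elems   = static_cast<std::size_t>(q_tile) * head_dim;
    const std::size_t kv_elems  = static_cast<std::size_t>(kv_tile) * head_dim;
    return (q_elems + kv_elems) * f16_bytes;
}

// Halve kv_tile, starting from max_kv_tile, until the estimate fits limit_bytes.
// Returns 0 if no candidate fits.
inline uint32_t example_pick_kv_tile(uint32_t q_tile, uint32_t max_kv_tile,
                                     uint32_t head_dim, std::size_t limit_bytes) {
    for (uint32_t kv_tile = max_kv_tile; kv_tile > 0; kv_tile /= 2) {
        if (example_wg_mem_bytes(q_tile, kv_tile, head_dim) <= limit_bytes) {
            return kv_tile;
        }
    }
    return 0;
}
```

With q_tile = 16, head_dim = 128, and a 16384-byte limit, a KV tile of 64 needs 20480 bytes under this estimate and is rejected, while a KV tile of 32 needs 12288 bytes and is accepted.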

Usage Examples

Hash-based pipeline caching:

// Pipeline keys are hashable for unordered_map caching
std::unordered_map<ggml_webgpu_flash_attn_pipeline_key,
    wgpu::ComputePipeline,
    ggml_webgpu_flash_attn_pipeline_key_hash> fa_pipeline_cache;

// Lookup or create pipeline for given key
auto it = fa_pipeline_cache.find(key);
if (it == fa_pipeline_cache.end()) {
    // Create and cache new pipeline variant
}
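
The lookup-or-create step above can be factored into a small helper. The following sketch is generic over the map type and uses a caller-supplied factory in place of real pipeline creation; example_get_or_create and example_cache are hypothetical names, and a std::string key with an int value stands in for the pipeline key and wgpu::ComputePipeline.

```cpp
#include <string>
#include <unordered_map>

// Find `key` in `cache`; on a miss, call `make(key)` once and store the result.
// `Map` is any std::unordered_map-like type; `Factory` is any callable that
// takes the key and returns a value convertible to the mapped type.
template <typename Map, typename Factory>
typename Map::mapped_type & example_get_or_create(Map & cache,
                                                  const typename Map::key_type & key,
                                                  Factory && make) {
    auto it = cache.find(key);
    if (it == cache.end()) {
        // Miss: build the value and insert it; emplace returns {iterator, inserted}.
        it = cache.emplace(key, make(key)).first;
    }
    return it->second;
}

// Stand-in cache type for illustration: string key, int "pipeline handle".
using example_cache = std::unordered_map<std::string, int>;
```

Because the factory runs only on a miss, repeated requests for the same key reuse the cached value instead of rebuilding it, which is the point of caching an expensive object such as a compiled pipeline.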
