Implementation:Ggml org Ggml Webgpu backend

**Implementation Metadata**
File Name	`src/ggml-webgpu/ggml-webgpu.cpp`
Repository	ggml-org/ggml
Lines	3452
Language	C++
Domain Tags	GPU_Computing, ML_Infrastructure, WebGPU
Status	Active
Last Updated	2025-05-15 12:00 GMT
Knowledge Sources	ggml-org/ggml repository

Overview

ggml-webgpu.cpp is the main implementation of the WebGPU backend, which enables GPU-accelerated ML inference in browsers and native applications through the WebGPU API. The 3,452-line file includes optimized matrix-multiplication kernels and flash attention, both aimed at practical LLM inference in web environments.

Description

Because WebGPU buffers do not expose direct memory addresses, the backend reports a fake base pointer (0x1000) and computes tensor offsets relative to that base. Key architectural features include:

Parameter buffer pool -- webgpu_buf_pool manages parameter buffers for passing kernel arguments, with mutex-based synchronization for multi-threaded access
Command batching -- Commands are batched (WEBGPU_COMMAND_SUBMIT_BATCH_SIZE=8) with in-flight submission limits per thread
Matrix multiplication -- Register tiling (8x8 tiles, 32-element K dimension), subgroup matrix configurations, and vector multiplication paths
Profiling -- CPU profiling via high-resolution timers and GPU profiling via timestamp queries
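
The fake base pointer scheme described above can be illustrated with a small host-side sketch. It is not the backend's code: the helper names (fake_tensor_ptr, buffer_offset, split_offset) and the 256-byte alignment used in the usage note are assumptions made for illustration; only the 0x1000 base comes from the description. A nonzero base presumably keeps a tensor at offset 0 from looking like a null pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fake base address reported for a buffer (value taken from the description).
constexpr std::uintptr_t kFakeBase = 0x1000;

// Hypothetical helper: the "pointer" stored in a tensor whose data lives at
// byte_offset inside a GPU buffer. It is a tag only and must never be dereferenced.
inline void * fake_tensor_ptr(std::size_t byte_offset) {
    return reinterpret_cast<void *>(kFakeBase + byte_offset);
}

// Recover the byte offset inside the GPU buffer from the fake pointer.
inline std::size_t buffer_offset(const void * tensor_data) {
    return static_cast<std::size_t>(
        reinterpret_cast<std::uintptr_t>(tensor_data) - kFakeBase);
}

// Hypothetical split of an offset into an aligned binding offset plus the
// leftover bytes. The alignment is an assumption; a real device reports its
// own limit. alignment must be a power of two.
struct BindingOffset {
    std::size_t aligned;
    std::size_t misalignment;
};

inline BindingOffset split_offset(std::size_t offset, std::size_t alignment) {
    std::size_t aligned = offset & ~(alignment - 1);
    return { aligned, offset - aligned };
}
```

For example, a tensor placed 300 bytes into its buffer would carry the fake pointer 0x1000 + 300, and with an assumed 256-byte alignment split_offset(300, 256) yields an aligned offset of 256 and a misalignment of 44.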

Key constants:

WEBGPU_MAX_WG_SIZE = 288
WEBGPU_MUL_MAT_WG_SIZE = 256
WEBGPU_PARAMS_BUF_SIZE_BYTES = 128 (room for 32 four-byte parameters)
WEBGPU_ROW_SPLIT_WG_SIZE = 64
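
To make the parameter-buffer sizing concrete, here is a minimal host-side sketch of a mutex-guarded pool of 128-byte parameter buffers. It is an illustration under stated assumptions, not the layout of webgpu_buf_pool: the class ParamsBufPool, its methods, the use of a condition variable, and the plain byte arrays standing in for GPU buffers are all hypothetical. Only the 128-byte size and the 32-parameter capacity come from the constants listed above.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <vector>

constexpr std::size_t kParamsBufSizeBytes = 128;  // WEBGPU_PARAMS_BUF_SIZE_BYTES
constexpr std::size_t kMaxParams = kParamsBufSizeBytes / sizeof(std::uint32_t);
static_assert(kMaxParams == 32, "128 bytes hold 32 four-byte parameters");

// Host-side stand-in for one GPU parameter buffer (hypothetical type).
using ParamsBuf = std::array<unsigned char, kParamsBufSizeBytes>;

// Hypothetical pool: hands out free buffer indices and blocks when none are free.
class ParamsBufPool {
public:
    explicit ParamsBufPool(std::size_t n) : bufs_(n) {
        for (std::size_t i = 0; i < n; ++i) free_.push_back(i);
    }

    // Take a free buffer, waiting until another thread releases one if needed.
    std::size_t acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !free_.empty(); });
        std::size_t idx = free_.back();
        free_.pop_back();
        return idx;
    }

    // Return a buffer to the pool and wake one waiting thread.
    void release(std::size_t idx) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(idx);
        }
        cv_.notify_one();
    }

    // Copy kernel arguments into buffer idx; returns false if they do not fit.
    bool write(std::size_t idx, const std::vector<std::uint32_t> & params) {
        if (params.size() > kMaxParams) return false;
        if (!params.empty()) {
            std::memcpy(bufs_[idx].data(), params.data(),
                        params.size() * sizeof(std::uint32_t));
        }
        return true;
    }

    std::size_t free_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    std::vector<ParamsBuf>   bufs_;
    std::vector<std::size_t> free_;
    std::mutex               mutex_;
    std::condition_variable  cv_;
};
```

A caller would acquire an index, write the kernel arguments, submit the work, and release the index once the GPU has finished with that buffer. Blocking in acquire is one simple way to bound the number of submissions in flight; the real backend may handle this differently.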

Supports Emscripten builds for browser deployment.

Usage

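The WebGPU backend is included when GGML_WEBGPU=1 is set at build time; no explicit registration call is needed:

#include <stdio.h>
#include "ggml-backend.h"
#include "ggml-webgpu.h"

int main(void) {
    ggml_backend_load_all();
    // WebGPU backend is registered if WebGPU device is available
    ggml_backend_t backend = ggml_backend_init_best();
    if (!backend) {
        fprintf(stderr, "no usable backend found\n");
        return 1;
    }
    printf("using backend: %s\n", ggml_backend_name(backend));
    // ...
    ggml_backend_free(backend);
    return 0;
}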

Code Reference

Source Location

Repository	File	Lines
ggml-org/ggml	`src/ggml-webgpu/ggml-webgpu.cpp`	3452

Key Signatures

// Buffer pool for parameter passing
struct webgpu_buf_pool { ... };

// Pipeline and command management
struct webgpu_pipeline { ... };
struct webgpu_command { ... };

// Device capabilities
struct webgpu_capabilities { ... };

// Context structures
struct webgpu_global_context_struct { ... };
struct webgpu_context_struct { ... };

// Matrix multiply tiling constants
#define WEBGPU_MUL_MAT_TILE_M    8
#define WEBGPU_MUL_MAT_TILE_N    8
#define WEBGPU_MUL_MAT_TILE_K    32
#define WEBGPU_MUL_MAT_WG_SIZE_M 8
#define WEBGPU_MUL_MAT_WG_SIZE_N 8
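
As a rough illustration of how the tiling constants above translate into a dispatch size, the following sketch computes a workgroup grid for an M x N output with inner dimension K. It assumes that each invocation computes one TILE_M x TILE_N register tile and that a workgroup holds WG_SIZE_M x WG_SIZE_N invocations, so one workgroup covers a 64 x 64 block of the output. The names mul_mat_grid, MulMatGrid, and ceil_div are hypothetical and do not appear in the source.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the WEBGPU_MUL_MAT_* constants listed above.
constexpr std::uint32_t kTileM   = 8;
constexpr std::uint32_t kTileN   = 8;
constexpr std::uint32_t kTileK   = 32;
constexpr std::uint32_t kWgSizeM = 8;
constexpr std::uint32_t kWgSizeN = 8;

// Integer division rounded up.
constexpr std::uint32_t ceil_div(std::uint32_t a, std::uint32_t b) {
    return (a + b - 1) / b;
}

// Hypothetical result type: workgroups along M and N, plus the number of
// K-dimension steps each workgroup iterates over.
struct MulMatGrid {
    std::uint32_t wg_m;
    std::uint32_t wg_n;
    std::uint32_t k_steps;
};

// Assumption: one workgroup covers (kTileM * kWgSizeM) x (kTileN * kWgSizeN)
// output elements and consumes kTileK elements of K per step.
constexpr MulMatGrid mul_mat_grid(std::uint32_t m, std::uint32_t n, std::uint32_t k) {
    return { ceil_div(m, kTileM * kWgSizeM),
             ceil_div(n, kTileN * kWgSizeN),
             ceil_div(k, kTileK) };
}
```

Under these assumptions, a 4096 x 100 output with K = 4096 needs 64 x 2 workgroups, each looping over 128 K-steps. Partial tiles at the edges are rounded up, so a kernel written this way would need bounds checks on the last row and column of workgroups.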

I/O Contract

Inputs

GGML compute graph -- Tensor operation graph dispatched through the backend interface
WebGPU device -- GPU device obtained via wgpu::Device
Tensor data -- Input tensors in WebGPU buffer storage

Outputs

Computed tensors -- Results in WebGPU buffers, readable via buffer mapping
Profiling data -- Optional CPU and GPU timing information

Usage Examples

WebGPU backend for browser or native deployment:

// Build with Emscripten for browser usage
// cmake -DGGML_WEBGPU=1 -DCMAKE_TOOLCHAIN_FILE=emsdk/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake

// Or build with Dawn for native usage
// cmake -DGGML_WEBGPU=1

#include "ggml-backend.h"

int main(void) {
    ggml_backend_load_all();
    ggml_backend_t backend = ggml_backend_init_best();
    if (!backend) {
        return 1;  // no usable backend found
    }
    // ... build and compute graphs on the backend ...
    ggml_backend_free(backend);
    return 0;
}

Related Pages

Implements Principle

Principle:Ggml_org_Ggml_WebGPU_Computation

Related Implementations

Implementation:Ggml_org_Ggml_Webgpu_shader_lib -- Shader pipeline management
Implementation:Ggml_org_Ggml_Webgpu_wgsl_preprocessor -- WGSL preprocessing
Implementation:Ggml_org_Ggml_Backend_impl_interface -- Backend interface contract
