Implementation:Ggml org Ggml Webgpu backend
| File Name | src/ggml-webgpu/ggml-webgpu.cpp |
| Repository | ggml-org/ggml |
| Lines | 3452 |
| Language | C++ |
| Domain Tags | GPU_Computing, ML_Infrastructure, WebGPU |
| Status | Active |
| Last Updated | 2025-05-15 12:00 GMT |
| Knowledge Sources | ggml-org/ggml repository |
Overview
ggml-webgpu.cpp is the main implementation of the WebGPU backend, enabling GPU-accelerated ML inference in browsers and native applications via the WebGPU API. The file is 3,452 lines long and includes optimized matrix multiply kernels and flash attention, both aimed at practical LLM inference in web environments.
Description
The backend uses a fake base pointer (0x1000) since WebGPU buffers lack direct memory addresses, computing tensor offsets relative to this base. Key architectural features include:
- Parameter buffer pool -- webgpu_buf_pool manages parameter buffers for passing kernel arguments, with mutex-based synchronization for multi-threaded access
- Command batching -- Commands are batched (WEBGPU_COMMAND_SUBMIT_BATCH_SIZE=8) with in-flight submission limits per thread
- Matrix multiplication -- Register tiling (8x8 tiles, 32-element K dimension), subgroup matrix configurations, and vector multiplication paths
- Profiling -- CPU profiling via high-resolution timers and GPU profiling via timestamp queries
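The fake base pointer idea can be illustrated with a minimal sketch. The helper names (kFakeBasePtr, fake_ptr_from_offset, offset_from_fake_ptr) are hypothetical and are not taken from ggml-webgpu.cpp; only the 0x1000 value comes from the description above. A nonzero base presumably keeps a tensor at offset 0 distinguishable from a null pointer, though that rationale is an assumption here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the backend's fake base address (0x1000 per the text).
// It is never dereferenced; it only anchors offset arithmetic.
static void * const kFakeBasePtr = reinterpret_cast<void *>(0x1000);

// Encode a byte offset inside a GPU buffer as a pointer-sized value.
inline void * fake_ptr_from_offset(size_t offset) {
    return reinterpret_cast<void *>(reinterpret_cast<uintptr_t>(kFakeBasePtr) + offset);
}

// Recover the byte offset from a pointer-sized value produced above.
inline size_t offset_from_fake_ptr(const void * p) {
    return static_cast<size_t>(reinterpret_cast<uintptr_t>(p) -
                               reinterpret_cast<uintptr_t>(kFakeBasePtr));
}
```

With this encoding, a tensor placed 256 bytes into a buffer would carry the value 0x1100 in its data field, and the backend would subtract the base again before binding the buffer range.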
Key constants:
- WEBGPU_MAX_WG_SIZE = 288
- WEBGPU_MUL_MAT_WG_SIZE = 256
- WEBGPU_PARAMS_BUF_SIZE_BYTES = 128 (32 parameters)
- WEBGPU_ROW_SPLIT_WG_SIZE = 64
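To make the parameter buffer size concrete, here is a rough sketch of a mutex-protected pool of fixed-size parameter buffers. It is not the real webgpu_buf_pool: the types param_buf and param_buf_pool are hypothetical, a plain byte vector stands in for a GPU buffer, and blocking on a condition variable when the pool is empty is an assumption. The 128-byte size is taken from the constants above, and the 32-parameter count follows if each parameter is a 4-byte value (also an assumption).

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Mirrors WEBGPU_PARAMS_BUF_SIZE_BYTES from the text.
constexpr size_t kParamsBufSizeBytes = 128;
// 128 bytes hold 32 parameters if each one is a 32-bit value (assumption).
static_assert(kParamsBufSizeBytes / sizeof(uint32_t) == 32, "expected 32 parameters");

// Hypothetical stand-in for a GPU-side parameter buffer.
struct param_buf {
    std::vector<uint8_t> bytes = std::vector<uint8_t>(kParamsBufSizeBytes);
};

class param_buf_pool {
public:
    explicit param_buf_pool(size_t n) : storage_(n) {
        for (auto & b : storage_) {
            free_.push_back(&b);
        }
    }

    // Take a free buffer, waiting if every buffer is currently in use.
    param_buf * acquire() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !free_.empty(); });
        param_buf * b = free_.back();
        free_.pop_back();
        return b;
    }

    // Return a buffer to the pool and wake one waiting thread.
    void release(param_buf * b) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            free_.push_back(b);
        }
        cv_.notify_one();
    }

    // Number of buffers currently free.
    size_t available() {
        std::lock_guard<std::mutex> lock(mu_);
        return free_.size();
    }

private:
    std::mutex              mu_;
    std::condition_variable cv_;
    std::vector<param_buf>   storage_;  // owns the buffers
    std::vector<param_buf *> free_;     // buffers not currently handed out
};
```

A caller would acquire a buffer, write the kernel arguments into it, submit the work, and release the buffer once the GPU has finished with it.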
Supports Emscripten builds for browser deployment.
Usage
The WebGPU backend is loaded automatically when GGML_WEBGPU=1 is set during build:
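#include "ggml-backend.h"
#include "ggml-webgpu.h"

int main(void) {
    // Load every backend that was compiled in or is available as a plugin
    ggml_backend_load_all();
    // WebGPU backend is registered if WebGPU device is available
    ggml_backend_t backend = ggml_backend_init_best();
    if (!backend) {
        return 1;
    }
    // ...
    ggml_backend_free(backend);
    return 0;
}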
Code Reference
Source Location
| Repository | File | Lines |
|---|---|---|
| ggml-org/ggml | src/ggml-webgpu/ggml-webgpu.cpp | 3452 |
Key Signatures
// Buffer pool for parameter passing
struct webgpu_buf_pool { ... };
// Pipeline and command management
struct webgpu_pipeline { ... };
struct webgpu_command { ... };
// Device capabilities
struct webgpu_capabilities { ... };
// Context structures
struct webgpu_global_context_struct { ... };
struct webgpu_context_struct { ... };
// Matrix multiply tiling constants
#define WEBGPU_MUL_MAT_TILE_M 8
#define WEBGPU_MUL_MAT_TILE_N 8
#define WEBGPU_MUL_MAT_TILE_K 32
#define WEBGPU_MUL_MAT_WG_SIZE_M 8
#define WEBGPU_MUL_MAT_WG_SIZE_N 8
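The tiling constants can be read as dispatch arithmetic. The sketch below assumes that each invocation owns one TILE_M x TILE_N output tile, that a workgroup is WG_SIZE_M x WG_SIZE_N invocations, and that the K dimension is walked in TILE_K-sized steps. That mapping is an assumption for illustration; the actual kernels may assign work differently. The names ceil_div, mul_mat_dispatch, and plan_mul_mat are hypothetical.

```cpp
#include <cstdint>

// Values copied from the tiling constants listed above.
constexpr uint32_t kTileM   = 8;   // WEBGPU_MUL_MAT_TILE_M
constexpr uint32_t kTileN   = 8;   // WEBGPU_MUL_MAT_TILE_N
constexpr uint32_t kTileK   = 32;  // WEBGPU_MUL_MAT_TILE_K
constexpr uint32_t kWgSizeM = 8;   // WEBGPU_MUL_MAT_WG_SIZE_M
constexpr uint32_t kWgSizeN = 8;   // WEBGPU_MUL_MAT_WG_SIZE_N

// Integer division rounded up.
constexpr uint32_t ceil_div(uint32_t a, uint32_t b) {
    return (a + b - 1) / b;
}

// Hypothetical dispatch plan for an M x N output with inner dimension K.
struct mul_mat_dispatch {
    uint32_t wg_m;     // workgroups along M
    uint32_t wg_n;     // workgroups along N
    uint32_t k_steps;  // TILE_K-sized iterations over the inner dimension
};

inline mul_mat_dispatch plan_mul_mat(uint32_t M, uint32_t N, uint32_t K) {
    // Under the stated assumption, one workgroup covers a 64 x 64 output block.
    const uint32_t rows_per_wg = kTileM * kWgSizeM;
    const uint32_t cols_per_wg = kTileN * kWgSizeN;
    return { ceil_div(M, rows_per_wg), ceil_div(N, cols_per_wg), ceil_div(K, kTileK) };
}
```

For example, under these assumptions a 4096 x 100 output with K = 4096 would need 64 x 2 workgroups, each looping 128 times over K.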
I/O Contract
Inputs
- GGML compute graph -- Tensor operation graph dispatched through the backend interface
- WebGPU device -- GPU device obtained via wgpu::Device
- Tensor data -- Input tensors in WebGPU buffer storage
Outputs
- Computed tensors -- Results in WebGPU buffers, readable via buffer mapping
- Profiling data -- Optional CPU and GPU timing information
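The CPU side of the profiling output could be collected along the following lines. This is only a sketch: cpu_profile and its nested scope class are hypothetical names, and std::chrono::steady_clock is used here as a monotonic timer, which may differ from the clock the backend actually uses. GPU timing via timestamp queries is not shown.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical accumulator for per-label CPU time in milliseconds.
class cpu_profile {
public:
    using clock = std::chrono::steady_clock;

    // RAII timer: measures from construction to destruction and adds the
    // elapsed time to the total for its label.
    class scope {
    public:
        scope(cpu_profile & p, std::string label)
            : p_(p), label_(std::move(label)), start_(clock::now()) {}

        ~scope() {
            const std::chrono::duration<double, std::milli> elapsed = clock::now() - start_;
            p_.totals_ms[label_] += elapsed.count();
        }

        scope(const scope &) = delete;
        scope & operator=(const scope &) = delete;

    private:
        cpu_profile &     p_;
        std::string       label_;
        clock::time_point start_;
    };

    std::map<std::string, double> totals_ms;  // label -> accumulated milliseconds
};
```

Wrapping an operation in a block with a scope object, for example one labeled "mul_mat", would add its wall-clock duration to that label's running total.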
Usage Examples
WebGPU backend with browser deployment:
// Build with Emscripten for browser usage
// cmake -DGGML_WEBGPU=1 -DCMAKE_TOOLCHAIN_FILE=emsdk/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake
// Or build with Dawn for native usage
// cmake -DGGML_WEBGPU=1
#include "ggml-backend.h"

int main(void) {
    ggml_backend_load_all();
    ggml_backend_t backend = ggml_backend_init_best();
    // ...
    return 0;
}
Related Pages
Implements Principle
Related Implementations
- Implementation:Ggml_org_Ggml_Webgpu_shader_lib -- Shader pipeline management
- Implementation:Ggml_org_Ggml_Webgpu_wgsl_preprocessor -- WGSL preprocessing
- Implementation:Ggml_org_Ggml_Backend_impl_interface -- Backend interface contract