Implementation:Ggml org Ggml Metal impl
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (API Doc) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, GPU_Computing |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Shared header defining Metal kernel parameters, threadgroup configuration constants, function constant offsets, and argument structs used by both Metal shader code and C++ host code.
Description
ggml-metal-impl.h serves as the bridge between the host-side C++ dispatch code and the device-side Metal Shading Language (MSL) kernel code. It provides:
- Threadgroup configuration constants: Per-quantization-type parameters controlling GPU execution:
  - N_R0_*: number of src0 rows processed per simdgroup (e.g., N_R0_Q4_0 = 4, N_R0_Q8_0 = 2)
  - N_SG_*: number of simdgroups per threadgroup (e.g., N_SG_Q4_0 = 2, N_SG_Q8_0 = 4)
  - Covers all quantization formats: Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, MXFP4, Q2_K through Q6_K, and IQ variants (IQ1_S, IQ1_M, IQ2_XXS, IQ2_XS, IQ2_S, IQ3_XXS, IQ3_S, IQ4_NL, IQ4_XS).
- Function constant offsets: Named index ranges for Metal function constants used by different kernel families: FC_FLASH_ATTN_EXT* (100-500), FC_MUL_MV (600), FC_MUL_MM (700), FC_ROPE (800), FC_SSM_CONV (900), and others.
- Kernel argument structs: Packed C structs (ggml_metal_kargs_*) defining the argument layouts passed to Metal compute shaders. Element counters use int32_t to reduce GPU register pressure, while strides use uint64_t. Structs exist for concat, binary ops, unary ops, mul_mv, mul_mm, flash attention, RoPE, softmax, and many more.
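The two threadgroup constants combine multiplicatively: if each of the N_SG simdgroups in a threadgroup handles N_R0 distinct src0 rows, one threadgroup covers N_R0 * N_SG rows. The sketch below shows how a host might size a matrix-vector dispatch along the row axis under that assumption. The helper names are hypothetical and are not part of ggml.

```c
#include <assert.h>
#include <stdint.h>

// Example values from the list above; the real header defines one
// N_R0_* / N_SG_* pair per quantization type.
#define N_R0_Q4_0 4  // src0 rows per simdgroup
#define N_SG_Q4_0 2  // simdgroups per threadgroup
#define N_R0_Q8_0 2
#define N_SG_Q8_0 4

// Hypothetical helper: src0 rows covered by one threadgroup,
// assuming every simdgroup works on its own N_R0 rows.
static int64_t rows_per_threadgroup(int nr0, int nsg) {
    return (int64_t) nr0 * nsg;
}

// Hypothetical helper: threadgroups needed along the src0-row axis so that
// all ne01 rows are covered (ceiling division; the last group may be partial).
static int64_t threadgroups_for_rows(int64_t ne01, int nr0, int nsg) {
    const int64_t rpt = rows_per_threadgroup(nr0, nsg);
    return (ne01 + rpt - 1) / rpt;
}
```

With these example values, Q4_0 and Q8_0 both cover 8 rows per threadgroup, but they split that work differently: Q4_0 gives each simdgroup more rows, while Q8_0 uses more simdgroups with fewer rows each.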
This header is included by both the .metal shader files and the C++ dispatch code, ensuring argument layout consistency across the host-device boundary.
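Sharing the header keeps the field order identical on both sides. Byte offsets still depend on how each compiler aligns the field types, so one way to make the layout checkable on the host is with compile-time assertions. This sketch mirrors the ggml_metal_kargs_concat fields shown under Signatures and assumes natural alignment (4 bytes for int32_t, 8 bytes for uint64_t). The expected offsets are derived from that assumption and are not checks that ggml itself performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Same field layout as the ggml_metal_kargs_concat example under Signatures.
typedef struct {
    int32_t  ne00, ne01, ne02, ne03;
    uint64_t nb00, nb01, nb02, nb03;
    int32_t  ne10, ne11, ne12, ne13;
    uint64_t nb10, nb11, nb12, nb13;
    int32_t  ne0, ne1, ne2, ne3;
    uint64_t nb0, nb1, nb2, nb3;
    int32_t  dim;
} ggml_metal_kargs_concat;

// Assuming natural alignment: every uint64_t group starts on an 8-byte boundary
// because each preceding int32_t group is exactly 16 bytes.
static_assert(offsetof(ggml_metal_kargs_concat, nb00) == 16,  "4 x int32_t precede nb00");
static_assert(offsetof(ggml_metal_kargs_concat, ne10) == 48,  "4 x uint64_t follow nb00");
static_assert(offsetof(ggml_metal_kargs_concat, nb10) == 64,  "src1 counts then strides");
static_assert(offsetof(ggml_metal_kargs_concat, dim)  == 144, "dim follows the dst strides");
// 148 bytes of fields, rounded up to a multiple of 8 by tail padding.
static_assert(sizeof(ggml_metal_kargs_concat) == 152, "4 bytes of tail padding");
```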
Usage
This header is included internally by Metal backend source files. It is not intended for direct use by application code. Its definitions are consumed by ggml-metal-ops.cpp (dispatch), ggml-metal-device.cpp (pipeline management), and the Metal shader source files.
Code Reference
Source Location
GGML repo, file: src/ggml-metal/ggml-metal-impl.h (1001 lines).
Signatures
```c
// Threadgroup configuration constants (examples):
#define N_R0_Q4_0 4
#define N_SG_Q4_0 2

// Function constant offsets:
#define FC_FLASH_ATTN_EXT 300
#define FC_MUL_MV         600
#define FC_MUL_MM         700

// Kernel argument struct example:
typedef struct {
    int32_t  ne00, ne01, ne02, ne03; // src0 element counts
    uint64_t nb00, nb01, nb02, nb03; // src0 byte strides
    int32_t  ne10, ne11, ne12, ne13; // src1 element counts
    uint64_t nb10, nb11, nb12, nb13; // src1 byte strides
    int32_t  ne0, ne1, ne2, ne3;     // dst element counts
    uint64_t nb0, nb1, nb2, nb3;     // dst byte strides
    int32_t  dim;                    // dimension along which src0 and src1 are concatenated
} ggml_metal_kargs_concat;
```
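The offsets act as bases of index ranges. A family's individual constants are presumably numbered base + 0, base + 1, and so on, and on the shader side MSL binds each one with the [[function_constant(index)]] attribute. The sketch below assumes each family may use at most 100 consecutive indices, which matches the spacing of the listed offsets, and checks that the ranges do not collide. FC_FAMILY_SPAN and both helpers are hypothetical.

```c
#include <assert.h>

// Offsets as listed under Signatures.
#define FC_FLASH_ATTN_EXT 300
#define FC_MUL_MV         600
#define FC_MUL_MM         700

// Hypothetical: maximum number of constants per family, inferred from the
// 100-index spacing of the offsets.
#define FC_FAMILY_SPAN 100

// Index of the k-th function constant of the family starting at `base`.
static int fc_index(int base, int k) {
    assert(k >= 0 && k < FC_FAMILY_SPAN); // must stay inside the family's range
    return base + k;
}

// Returns 1 when the ranges [a, a + SPAN) and [b, b + SPAN) do not overlap.
static int fc_ranges_disjoint(int a, int b) {
    return a + FC_FAMILY_SPAN <= b || b + FC_FAMILY_SPAN <= a;
}
```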
Import
```c
#include "ggml-metal-impl.h"
```
I/O Contract
Inputs
This is a header-only file defining constants and type definitions. It does not have runtime inputs. The kernel argument structs are populated by host-side dispatch code with tensor shape metadata (element counts and byte strides) before each kernel launch.
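For a contiguous tensor of a non-quantized type such as F32, the byte strides follow directly from the element counts. A kernel locates an element by summing index * stride over the four dimensions. The sketch below uses a hypothetical tensor_shape type that mirrors ggml's ne/nb naming. Quantized types store rows in blocks, so their nb[1] must account for the block size instead of using this simple product.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical 4-D shape descriptor: ne = element counts, nb = byte strides.
typedef struct {
    int64_t ne[4];
    size_t  nb[4];
} tensor_shape;

// Fills contiguous strides: nb[0] is the element size, and each higher
// dimension steps over one full slice of the dimension below it.
static void set_contiguous_strides(tensor_shape * t, size_t type_size) {
    t->nb[0] = type_size;
    for (int i = 1; i < 4; ++i) {
        t->nb[i] = t->nb[i - 1] * (size_t) t->ne[i - 1];
    }
}

// Byte offset of element (i0, i1, i2, i3), computed from ne/nb-style metadata
// in the same way a kernel would use the nb fields of its kargs struct.
static uint64_t byte_offset(const tensor_shape * t,
                            int64_t i0, int64_t i1, int64_t i2, int64_t i3) {
    return (uint64_t) i0 * t->nb[0] + (uint64_t) i1 * t->nb[1]
         + (uint64_t) i2 * t->nb[2] + (uint64_t) i3 * t->nb[3];
}
```

Because the kernel reads strides rather than assuming contiguity, the same kernel can also handle permuted or non-contiguous views. In that case the host simply passes the view's nb values.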
Outputs
The defined structs and constants are consumed at compile time by both host C++ code and Metal shader code. At runtime, the populated ggml_metal_kargs_* structs are passed as kernel arguments to Metal compute shaders.
Usage Examples
```c
// Host-side dispatch code populating a kernel argument struct:
ggml_metal_kargs_concat args = {
    .ne00 = (int32_t) src0->ne[0],
    .ne01 = (int32_t) src0->ne[1],
    .ne02 = (int32_t) src0->ne[2],
    .ne03 = (int32_t) src0->ne[3],
    .nb00 = src0->nb[0],
    .nb01 = src0->nb[1],
    .nb02 = src0->nb[2],
    .nb03 = src0->nb[3],
    // ... additional fields ...
    .dim  = dim,
};

// Pass args to the Metal compute encoder
ggml_metal_encoder_set_bytes(enc, &args, sizeof(args), 0);
```
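The example casts 64-bit element counts down to int32_t and hands the struct to the encoder as raw bytes (&args, sizeof(args)). A host could guard that cast so overflow fails loudly instead of wrapping silently. The byte copy also preserves the field layout exactly, which is why both sides must agree on it. This sketch is illustrative only and is not ggml's dispatch code. It uses hypothetical names (tensor_view, kargs_src0, ne_i32, fill_src0_args, copy_arg_bytes) and a reduced struct that holds only the src0 fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical minimal tensor view with 64-bit counts and size_t strides.
typedef struct {
    int64_t ne[4];
    size_t  nb[4];
} tensor_view;

// Reduced argument struct: only the src0 fields of a kargs-style struct.
typedef struct {
    int32_t  ne00, ne01, ne02, ne03;
    uint64_t nb00, nb01, nb02, nb03;
} kargs_src0;

// Narrows a 64-bit element count to int32_t, asserting that it fits.
static int32_t ne_i32(int64_t ne) {
    assert(ne >= 0 && ne <= INT32_MAX);
    return (int32_t) ne;
}

// Populates the reduced struct from a tensor view (counts narrowed, strides widened).
static kargs_src0 fill_src0_args(const tensor_view * t) {
    kargs_src0 args = {
        .ne00 = ne_i32(t->ne[0]),
        .ne01 = ne_i32(t->ne[1]),
        .ne02 = ne_i32(t->ne[2]),
        .ne03 = ne_i32(t->ne[3]),
        .nb00 = t->nb[0],
        .nb01 = t->nb[1],
        .nb02 = t->nb[2],
        .nb03 = t->nb[3],
    };
    return args;
}

// Stand-in for an encoder's set-bytes call: copies the struct verbatim.
static void copy_arg_bytes(unsigned char * dst, size_t cap, const void * src, size_t size) {
    assert(size <= cap);
    memcpy(dst, src, size);
}
```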