Implementation:Ggml org Ggml Metal impl

Metadata

Field	Value
Page Type	Implementation (API Doc)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, GPU_Computing
Last Updated	2025-05-15 12:00 GMT

Overview

Shared header defining Metal kernel parameters, threadgroup configuration constants, function constant offsets, and argument structs used by both Metal shader code and C++ host code.

Description

ggml-metal-impl.h serves as the bridge between the host-side C++ dispatch code and the device-side Metal Shading Language (MSL) kernel code. It provides:

- Threadgroup configuration constants: per-quantization-type parameters controlling GPU execution:
  - N_R0_* -- number of src0 rows processed per simdgroup (e.g., N_R0_Q4_0 = 4, N_R0_Q8_0 = 2)
  - N_SG_* -- number of simdgroups per threadgroup (e.g., N_SG_Q4_0 = 2, N_SG_Q8_0 = 4)
  - These constants cover all quantization formats: Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, MXFP4, Q2_K through Q6_K, and the IQ variants (IQ1_S, IQ1_M, IQ2_XXS, IQ2_XS, IQ2_S, IQ3_XXS, IQ3_S, IQ4_NL, IQ4_XS).
- Function constant offsets: named base indices for the Metal function constants used by different kernel families: FC_FLASH_ATTN_EXT_* (100-500), FC_MUL_MV (600), FC_MUL_MM (700), FC_ROPE (800), FC_SSM_CONV (900), and others.
- Kernel argument structs: packed C structs (ggml_metal_kargs_*) defining the argument layouts passed to Metal compute shaders. Element counters use int32_t to reduce GPU register pressure, while strides use uint64_t. Structs exist for concat, binary ops, unary ops, mul_mv, mul_mm, flash attention, RoPE, softmax, and many more.
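To make the role of the threadgroup constants concrete, the sketch below estimates how many threadgroups are needed to cover the rows of src0. The helper threadgroups_for_rows is hypothetical and not part of ggml, and the ceiling-division formula is an assumption about how dispatch code might combine N_R0_* and N_SG_*; only the four constant values are taken from this page.

```cpp
#include <cassert>
#include <cstdint>

// Values quoted on this page for two quantization types.
#define N_R0_Q4_0 4
#define N_SG_Q4_0 2
#define N_R0_Q8_0 2
#define N_SG_Q8_0 4

// Hypothetical helper (not part of ggml): number of threadgroups needed along
// the row dimension when each simdgroup handles nr0 rows and each threadgroup
// contains nsg simdgroups.
inline int32_t threadgroups_for_rows(int32_t ne01, int32_t nr0, int32_t nsg) {
    assert(ne01 > 0 && nr0 > 0 && nsg > 0);
    const int32_t rows_per_threadgroup = nr0 * nsg;
    // Ceiling division so that a partial final group of rows is still covered.
    return (ne01 + rows_per_threadgroup - 1) / rows_per_threadgroup;
}
```

With the values above, both Q4_0 (4 rows x 2 simdgroups) and Q8_0 (2 rows x 4 simdgroups) process 8 rows per threadgroup, so the two types trade rows per simdgroup against simdgroups per threadgroup rather than changing the total.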

This header is included by both the .metal shader files and the C++ dispatch code, ensuring argument layout consistency across the host-device boundary.
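One way to picture the layout agreement is to pin down field offsets at compile time. The sketch below copies the ggml_metal_kargs_concat struct shown under Signatures and adds static_assert checks. These checks are illustrative only and are not claimed to exist in the real header; the offsets follow from the field sizes alone (four int32_t occupy 16 bytes, four uint64_t occupy 32 bytes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copy of the struct shown under Signatures.
typedef struct {
    int32_t  ne00, ne01, ne02, ne03;
    uint64_t nb00, nb01, nb02, nb03;
    int32_t  ne10, ne11, ne12, ne13;
    uint64_t nb10, nb11, nb12, nb13;
    int32_t  ne0, ne1, ne2, ne3;
    uint64_t nb0, nb1, nb2, nb3;
    int32_t  dim;
} ggml_metal_kargs_concat;

// Illustrative checks (an assumption of this sketch, not taken from ggml):
// each group of four int32_t counters is 16 bytes, so every uint64_t stride
// group starts on an 8-byte boundary without any interior padding.
static_assert(offsetof(ggml_metal_kargs_concat, nb00) ==  16, "src0 strides");
static_assert(offsetof(ggml_metal_kargs_concat, ne10) ==  48, "src1 counters");
static_assert(offsetof(ggml_metal_kargs_concat, nb10) ==  64, "src1 strides");
static_assert(offsetof(ggml_metal_kargs_concat, ne0)  ==  96, "dst counters");
static_assert(offsetof(ggml_metal_kargs_concat, nb0)  == 112, "dst strides");
static_assert(offsetof(ggml_metal_kargs_concat, dim)  == 144, "concat axis");
```

If either side reordered or resized a field, checks like these would fail on the host build, which is the kind of mismatch that sharing a single header is meant to prevent.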

Usage

This header is included internally by Metal backend source files. It is not intended for direct use by application code. Its definitions are consumed by ggml-metal-ops.cpp (dispatch), ggml-metal-device.cpp (pipeline management), and the Metal shader source files.

Code Reference

Source Location

GGML repo, file: src/ggml-metal/ggml-metal-impl.h (1001 lines).

Signatures

// Threadgroup configuration constants (examples):
#define N_R0_Q4_0 4
#define N_SG_Q4_0 2

// Function constant offsets:
#define FC_FLASH_ATTN_EXT     300
#define FC_MUL_MV             600
#define FC_MUL_MM             700

// Kernel argument struct example:
typedef struct {
    int32_t  ne00, ne01, ne02, ne03;
    uint64_t nb00, nb01, nb02, nb03;
    int32_t  ne10, ne11, ne12, ne13;
    uint64_t nb10, nb11, nb12, nb13;
    int32_t  ne0, ne1, ne2, ne3;
    uint64_t nb0, nb1, nb2, nb3;
    int32_t  dim;
} ggml_metal_kargs_concat;

Import

#include "ggml-metal-impl.h"

I/O Contract

Inputs

This header only defines constants and types, so it has no runtime inputs of its own. The kernel argument structs are populated by host-side dispatch code with tensor shape metadata (element counts and byte strides) before each kernel launch.
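As a reminder of what "element counts and byte strides" look like, the sketch below derives strides for a contiguous, non-quantized 4-D tensor. The helper contiguous_strides is hypothetical; in practice the strides are already stored on the tensor and copied into the struct, and quantized types use block-based strides that this sketch does not model.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of ggml): byte strides nb[] for a contiguous
// tensor with element counts ne[] and a fixed element size in bytes.
// nb[0] is the size of one element; each later stride spans one full
// extent of the previous dimension.
inline std::array<uint64_t, 4> contiguous_strides(const std::array<int64_t, 4> & ne,
                                                  size_t type_size) {
    std::array<uint64_t, 4> nb{};
    nb[0] = type_size;
    for (int i = 1; i < 4; ++i) {
        assert(ne[i - 1] > 0);
        nb[i] = nb[i - 1] * static_cast<uint64_t>(ne[i - 1]);
    }
    return nb;
}
```

For a float tensor of shape 8 x 4 x 2 x 1 this yields strides of 4, 32, 128, and 256 bytes, which shows why strides are kept as uint64_t: they grow multiplicatively with the shape.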

Outputs

The defined structs and constants are consumed at compile time by both host C++ code and Metal shader code. At runtime, the populated ggml_metal_kargs_* structs are passed as kernel arguments to Metal compute shaders.
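The function constant offsets can be read as base indices from which a kernel family numbers its own constants. The sketch below illustrates that idea with the values listed on this page. The helpers fc_index and fc_slot_in_range are hypothetical, and the limit of 100 slots per family is an assumption inferred from the spacing of the listed offsets.

```cpp
#include <cassert>

// Base offsets as listed on this page.
#define FC_FLASH_ATTN_EXT 300
#define FC_MUL_MV         600
#define FC_MUL_MM         700
#define FC_ROPE           800
#define FC_SSM_CONV       900

// Hypothetical helper: index of the slot-th function constant of a family.
constexpr int fc_index(int base, int slot) {
    return base + slot;
}

// Assumption of this sketch: families are spaced 100 apart, so a slot must
// stay below 100 to avoid colliding with the next family's range.
constexpr bool fc_slot_in_range(int slot) {
    return slot >= 0 && slot < 100;
}

static_assert(fc_index(FC_MUL_MV, 99) < FC_MUL_MM, "mul_mv stays below mul_mm");
```

Keeping these bases in the shared header means the host code that sets function constant values and the shader code that declares them refer to the same index by name.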

Usage Examples

// Host-side dispatch code populating a kernel argument struct:
ggml_metal_kargs_concat args = {
    .ne00 = (int32_t) src0->ne[0],
    .ne01 = (int32_t) src0->ne[1],
    .ne02 = (int32_t) src0->ne[2],
    .ne03 = (int32_t) src0->ne[3],
    .nb00 = src0->nb[0],
    .nb01 = src0->nb[1],
    .nb02 = src0->nb[2],
    .nb03 = src0->nb[3],
    // ... additional fields ...
    .dim  = dim, // concatenation axis, determined earlier by the caller
};

// Pass args to the Metal compute encoder (the final argument is the buffer index)
ggml_metal_encoder_set_bytes(enc, &args, sizeof(args), 0);
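The casts to int32_t in the example above narrow 64-bit element counts. A minimal sketch of a guarded cast is shown below; fits_int32 and narrow_ne are hypothetical helpers and are not claimed to exist in ggml.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if a 64-bit element count can be stored in the
// int32_t counters used by the kernel argument structs.
constexpr bool fits_int32(int64_t ne) {
    return ne >= 0 && ne <= std::numeric_limits<int32_t>::max();
}

// Hypothetical helper: narrowing cast that asserts the value is representable.
inline int32_t narrow_ne(int64_t ne) {
    assert(fits_int32(ne));
    return static_cast<int32_t>(ne);
}
```

A check of this kind documents the trade-off mentioned in the Description: 32-bit counters save registers on the GPU, at the cost of limiting each dimension to values that fit in int32_t.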
