Implementation:Ggml org Ggml Metal device
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (API Doc) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, GPU_Computing |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Implements Metal device management, pipeline caching, and kernel pipeline lookup/compilation for all supported GGML operations on Apple GPUs.
Description
ggml-metal-device.cpp is the central component managing the Metal compute pipeline lifecycle. It provides:
- Device management: A static vector of unique_ptr devices with RAII cleanup via ggml_metal_device_deleter. The ggml_metal_device_get function lazily initializes and caches Metal device handles.
- Pipeline caching: The ggml_metal_pipelines struct stores compiled Metal compute shaders in an unordered_map<string, pipeline>. Functions ggml_metal_pipelines_add and ggml_metal_pipelines_get manage the cache by pipeline name.
- Pipeline lookup by operation: Specialized getter functions construct pipeline names from operation parameters, check the cache, and compile on cache miss via ggml_metal_library_compile_pipeline. These cover:
  - Base operations (add_id, concat)
  - Copy operations between type pairs (e.g., f16-to-f32)
  - Pooling (1D and 2D, average and max)
  - Matrix-vector and matrix-matrix multiplication with all quantization types (Q4_0 through IQ4_XS)
  - Flash attention with multiple head size and type configurations
  - Get rows, RMS norm, softmax, rope, and more
The extensive pipeline lookup system supports the wide variety of quantization formats and operation types that GGML needs for efficient ML inference on Apple GPUs.
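The device-management pattern described above (a static vector of unique_ptr handles with a custom deleter, filled lazily on first request) can be sketched as follows. This is a minimal, self-contained illustration and not the code in ggml-metal-device.cpp: mock_device, mock_device_init, mock_device_free, mock_device_deleter, mock_device_get, and the g_mock_devices_created counter are hypothetical stand-ins, and no Metal API is called.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the real Metal device state.
struct mock_device {
    int index;
};

// Counts device creations so that lazy initialization is observable.
static int g_mock_devices_created = 0;

static mock_device * mock_device_init(int index) {
    g_mock_devices_created++;
    return new mock_device{index};
}

static void mock_device_free(mock_device * dev) {
    delete dev;
}

// Custom deleter: unique_ptr releases the device through the free function (RAII).
struct mock_device_deleter {
    void operator()(mock_device * dev) const { mock_device_free(dev); }
};

using mock_device_ptr = std::unique_ptr<mock_device, mock_device_deleter>;

// Creates the device for `index` on first use and returns the cached handle afterwards.
mock_device * mock_device_get(int index) {
    assert(index >= 0);
    // Function-local static: destroyed at program exit, which frees every cached device.
    static std::vector<mock_device_ptr> devs;
    const std::size_t i = static_cast<std::size_t>(index);
    if (devs.size() <= i) {
        devs.resize(i + 1); // new slots start out empty (nullptr)
    }
    if (!devs[i]) {
        devs[i].reset(mock_device_init(index));
    }
    return devs[i].get();
}
```

Because each device lives on the heap and only the owning unique_ptr is stored in the vector, a handle returned earlier stays valid when the vector grows for a higher index.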
Usage
This module is used internally by the Metal backend. User code does not interact with it directly; it is called by ggml-metal-ops.cpp when dispatching operations to select the appropriate compiled Metal shader.
Code Reference
Source Location
GGML repo, file: src/ggml-metal/ggml-metal-device.cpp (1864 lines).
Signatures
ggml_metal_device_t ggml_metal_device_get(int device);
ggml_metal_pipelines_t ggml_metal_pipelines_init(void);
void ggml_metal_pipelines_free(ggml_metal_pipelines_t ppls);
void ggml_metal_pipelines_add(ggml_metal_pipelines_t ppls, const char * name, ggml_metal_pipeline_t pipeline);
ggml_metal_pipeline_t ggml_metal_pipelines_get(ggml_metal_pipelines_t ppls, const char * name);
ggml_metal_pipeline_with_params ggml_metal_library_get_pipeline_base(ggml_metal_library_t lib, ggml_op op);
ggml_metal_pipeline_with_params ggml_metal_library_get_pipeline_cpy(ggml_metal_library_t lib, ggml_type tsrc, ggml_type tdst);
ggml_metal_pipeline_with_params ggml_metal_library_get_pipeline_mul_mv(ggml_metal_library_t lib, const ggml_tensor * src0, ...);
ggml_metal_pipeline_with_params ggml_metal_library_get_pipeline_mul_mm(ggml_metal_library_t lib, const ggml_tensor * src0, ...);
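The getters listed above follow the flow given in the Description: build a pipeline name from the operation parameters, look it up in the name-keyed cache, and compile on a miss. A minimal sketch of that flow is shown here under stated assumptions: every mock_* type and function is hypothetical, the "kernel_cpy_<src>_<dst>" naming scheme is assumed for illustration, and compilation is simulated by a counter rather than by a real shader build.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a compiled Metal compute pipeline.
struct mock_pipeline {
    std::string name;
};

// Name-keyed cache, similar in shape to the pipelines struct described above.
struct mock_pipelines {
    std::unordered_map<std::string, std::unique_ptr<mock_pipeline>> data;
};

// Hypothetical library: owns the cache and counts compilations so cache hits are observable.
struct mock_library {
    mock_pipelines ppls;
    int n_compiled = 0;
};

// Returns the cached pipeline, or nullptr on a miss.
mock_pipeline * mock_pipelines_get(mock_pipelines & ppls, const std::string & name) {
    auto it = ppls.data.find(name);
    return it == ppls.data.end() ? nullptr : it->second.get();
}

// Stores a pipeline under its name, replacing any previous entry.
void mock_pipelines_add(mock_pipelines & ppls, const std::string & name,
                        std::unique_ptr<mock_pipeline> pipeline) {
    ppls.data[name] = std::move(pipeline);
}

// Simulated compilation: creates the pipeline, caches it, and returns the cached pointer.
mock_pipeline * mock_library_compile_pipeline(mock_library & lib, const std::string & name) {
    assert(!name.empty());
    lib.n_compiled++;
    mock_pipelines_add(lib.ppls, name, std::unique_ptr<mock_pipeline>(new mock_pipeline{name}));
    return mock_pipelines_get(lib.ppls, name);
}

// Getter for a copy kernel: name from the type pair, cache check, compile on miss.
mock_pipeline * mock_library_get_pipeline_cpy(mock_library & lib,
                                              const std::string & tsrc,
                                              const std::string & tdst) {
    const std::string name = "kernel_cpy_" + tsrc + "_" + tdst; // assumed naming scheme
    if (mock_pipeline * hit = mock_pipelines_get(lib.ppls, name)) {
        return hit;
    }
    return mock_library_compile_pipeline(lib, name);
}
```

Keying the cache by a name derived from the parameters means each distinct type pair is compiled at most once, and repeated requests return the same cached object.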
Import
#include "ggml-metal-device.h"
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| device | int | Yes | Metal device index (typically 0 for the default GPU). |
| ppls | ggml_metal_pipelines_t | Yes | Pipeline cache handle for storing/retrieving compiled shaders. |
| lib | ggml_metal_library_t | Yes | Handle to the Metal shader library from which pipelines are compiled. |
| op | ggml_op | Varies | The GGML operation type for which to retrieve a compiled pipeline. |
| tsrc / tdst | ggml_type | Varies | Source and destination tensor types for copy and quantization-specific pipelines. |
Outputs
| Output | Type | Description |
|---|---|---|
| Device handle | ggml_metal_device_t | Opaque pointer to the Metal device (from ggml_metal_device_get). |
| Pipeline with params | ggml_metal_pipeline_with_params | Struct containing the compiled pipeline and associated dispatch parameters (threadgroup sizes, etc.). |
Usage Examples
// Internal pipeline lookup during operation dispatch:
ggml_metal_library_t lib = ggml_metal_device_get_library(dev);
// Get a pipeline for matrix-vector multiplication.
// The kernel variant follows from the tensor types, e.g. Q4_0 when src0 holds Q4_0 weights.
ggml_metal_pipeline_with_params pwp =
    ggml_metal_library_get_pipeline_mul_mv(lib, src0, src1);
if (!pwp.pipeline) {
    GGML_LOG_ERROR("Failed to find pipeline for mul_mv\n");
    return;
}
// Use pwp.pipeline and pwp.params for kernel dispatch