Implementation:Ggml org Ggml Metal device

Metadata

Field	Value
Page Type	Implementation (API Doc)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, GPU_Computing
Last Updated	2025-05-15 12:00 GMT

Overview

Implements Metal device management, pipeline caching, and kernel pipeline lookup/compilation for all supported GGML operations on Apple GPUs.

Description

ggml-metal-device.cpp is the central component managing the Metal compute pipeline lifecycle. It provides:

- Device management: A static vector of unique_ptr devices with RAII cleanup via ggml_metal_device_deleter. The ggml_metal_device_get function lazily initializes and caches Metal device handles.
- Pipeline caching: The ggml_metal_pipelines struct stores compiled Metal compute shaders in an unordered_map<string, pipeline>. Functions ggml_metal_pipelines_add and ggml_metal_pipelines_get manage the cache by pipeline name.
- Pipeline lookup by operation: Specialized getter functions construct pipeline names from operation parameters, check the cache, and compile on cache miss via ggml_metal_library_compile_pipeline. These cover:
  - Base operations (add_id, concat)
  - Copy operations between type pairs (e.g., f16-to-f32)
  - Pooling (1D and 2D, average and max)
  - Matrix-vector and matrix-matrix multiplication with all quantization types (Q4_0 through IQ4_XS)
  - Flash attention with multiple head size and type configurations
  - Get rows, RMS norm, softmax, rope, and more
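The device-management pattern in the first item above (a static vector of unique_ptr handles, a custom deleter, lazy creation on first request) can be sketched as follows. This is a minimal illustration and not the code in ggml-metal-device.cpp: fake_device, fake_device_deleter, fake_device_get, and g_live_devices are hypothetical names invented for this example, and no Metal API is called.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the real Metal device object.
struct fake_device {
    int index;
};

// Counts live devices so that creation and cleanup can be observed (illustration only).
int g_live_devices = 0;

// Custom deleter, playing the role the text assigns to ggml_metal_device_deleter.
struct fake_device_deleter {
    void operator()(fake_device * dev) const {
        --g_live_devices;
        delete dev;
    }
};

using fake_device_ptr = std::unique_ptr<fake_device, fake_device_deleter>;

// Returns the cached handle for `index`, creating it on first use.
fake_device * fake_device_get(int index) {
    assert(index >= 0);

    // Function-local static: built on the first call and destroyed at program
    // exit, at which point every non-null unique_ptr runs the deleter (RAII).
    static std::vector<fake_device_ptr> devices;

    const std::size_t i = static_cast<std::size_t>(index);
    if (devices.size() <= i) {
        devices.resize(i + 1); // new slots start out as null handles
    }
    if (!devices[i]) {
        devices[i] = fake_device_ptr(new fake_device{index});
        ++g_live_devices;
    }
    return devices[i].get();
}
```

Repeated calls with the same index return the same pointer, because the objects live on the heap and only the owning unique_ptr values move when the vector grows.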

This lookup layer is large because it has to cover the many combinations of operation type and quantization format that GGML uses for efficient ML inference on Apple GPUs.
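The lookup flow described above (build a pipeline name from the operation parameters, check the cache, compile on a miss) can be sketched like this. It is a simplified model under stated assumptions: fake_pipeline, fake_library, fake_compile, fake_get_pipeline, and fake_get_pipeline_cpy are hypothetical names, types are passed as plain strings rather than ggml_type values, and the "kernel_cpy_<src>_<dst>" name format is assumed for illustration only.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a compiled Metal compute pipeline.
struct fake_pipeline {
    std::string name;
};

// Hypothetical library object: owns the name -> pipeline cache and counts compilations.
struct fake_library {
    std::unordered_map<std::string, fake_pipeline> cache;
    int n_compiled = 0;
};

// Stand-in for the compile step (the text names ggml_metal_library_compile_pipeline).
fake_pipeline fake_compile(fake_library & lib, const std::string & name) {
    ++lib.n_compiled;
    return fake_pipeline{name};
}

// Generic lookup-or-compile: return the cached pipeline, compiling it only on a miss.
const fake_pipeline & fake_get_pipeline(fake_library & lib, const std::string & name) {
    assert(!name.empty());
    auto it = lib.cache.find(name);
    if (it == lib.cache.end()) {
        it = lib.cache.emplace(name, fake_compile(lib, name)).first;
    }
    // References to unordered_map elements stay valid when the map rehashes.
    return it->second;
}

// Specialized getter: derive the pipeline name from a (source, destination) type pair.
const fake_pipeline & fake_get_pipeline_cpy(fake_library & lib,
                                            const std::string & tsrc,
                                            const std::string & tdst) {
    const std::string name = "kernel_cpy_" + tsrc + "_" + tdst; // assumed name format
    return fake_get_pipeline(lib, name);
}
```

Keying the cache on the generated name means that each distinct parameter combination is compiled at most once, and later requests for the same combination cost only a hash lookup.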

Usage

This module is internal to the Metal backend; user code does not call it directly. ggml-metal-ops.cpp calls it while dispatching operations, to select the compiled Metal shader that matches each operation.

Code Reference

Source Location

GGML repo, file: src/ggml-metal/ggml-metal-device.cpp (1864 lines).

Signatures

ggml_metal_device_t ggml_metal_device_get(int device);
ggml_metal_pipelines_t ggml_metal_pipelines_init(void);
void ggml_metal_pipelines_free(ggml_metal_pipelines_t ppls);
void ggml_metal_pipelines_add(ggml_metal_pipelines_t ppls, const char * name, ggml_metal_pipeline_t pipeline);
ggml_metal_pipeline_t ggml_metal_pipelines_get(ggml_metal_pipelines_t ppls, const char * name);
ggml_metal_pipeline_with_params ggml_metal_library_get_pipeline_base(ggml_metal_library_t lib, ggml_op op);
ggml_metal_pipeline_with_params ggml_metal_library_get_pipeline_cpy(ggml_metal_library_t lib, ggml_type tsrc, ggml_type tdst);
ggml_metal_pipeline_with_params ggml_metal_library_get_pipeline_mul_mv(ggml_metal_library_t lib, const ggml_tensor * src0, ...);
ggml_metal_pipeline_with_params ggml_metal_library_get_pipeline_mul_mm(ggml_metal_library_t lib, const ggml_tensor * src0, ...);

Import

#include "ggml-metal-device.h"

I/O Contract

Inputs

Parameter	Type	Required	Description
`device`	`int`	Yes	Metal device index (typically 0 for the default GPU).
`ppls`	`ggml_metal_pipelines_t`	Yes	Pipeline cache handle for storing/retrieving compiled shaders.
`lib`	`ggml_metal_library_t`	Yes	Handle to the Metal shader library from which pipelines are compiled.
`op`	`ggml_op`	Varies	The GGML operation type for which to retrieve a compiled pipeline.
`tsrc / tdst`	`ggml_type`	Varies	Source and destination tensor types for copy and quantization-specific pipelines.

Outputs

Output	Type	Description
Device handle	`ggml_metal_device_t`	Opaque pointer to the Metal device (from `ggml_metal_device_get`).
Pipeline with params	`ggml_metal_pipeline_with_params`	Struct containing the compiled pipeline and associated dispatch parameters (threadgroup sizes, etc.).

Usage Examples

// Internal pipeline lookup during operation dispatch.
// dev, src0, and src1 come from the surrounding dispatch code.
ggml_metal_library_t lib = ggml_metal_device_get_library(dev);

// Get a pipeline for matrix-vector multiplication. The quantization variant
// (e.g. Q4_0) is not passed explicitly; it follows from the type of src0.
ggml_metal_pipeline_with_params pwp =
    ggml_metal_library_get_pipeline_mul_mv(lib, src0, src1);

if (!pwp.pipeline) {
    GGML_LOG_ERROR("Failed to find pipeline for mul_mv\n");
    return;
}

// Use pwp.pipeline and pwp.params for kernel dispatch
