Implementation:Ggml org Ggml Metal device


Metadata

Field | Value
Page Type | Implementation (API Doc)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, GPU_Computing
Last Updated | 2025-05-15 12:00 GMT

Overview

Implements Metal device management, pipeline caching, and kernel pipeline lookup/compilation for all supported GGML operations on Apple GPUs.

Description

ggml-metal-device.cpp is the central component managing the Metal compute pipeline lifecycle. It provides:

  1. Device management: Devices are held in a static vector of unique_ptr whose deleter is ggml_metal_device_deleter, so they are released automatically (RAII). The ggml_metal_device_get function lazily initializes Metal device handles and caches them for later calls.
  2. Pipeline caching: The ggml_metal_pipelines struct stores compiled Metal compute shaders in an unordered_map<string, pipeline>. Functions ggml_metal_pipelines_add and ggml_metal_pipelines_get manage the cache by pipeline name.
  3. Pipeline lookup by operation: Specialized getter functions construct pipeline names from operation parameters, check the cache, and compile on cache miss via ggml_metal_library_compile_pipeline. These cover:
    • Base operations (add_id, concat)
    • Copy operations between type pairs (e.g., f16-to-f32)
    • Pooling (1D and 2D, average and max)
    • Matrix-vector and matrix-matrix multiplication with all quantization types (Q4_0 through IQ4_XS)
    • Flash attention with multiple head size and type configurations
    • Get rows, RMS norm, softmax, rope, and other operations
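
The lazy device cache described under "Device management" can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from ggml-metal-device.cpp: demo_device, demo_device_init, demo_device_free, demo_device_deleter, and demo_device_get are hypothetical stand-ins for the real Metal device type and its helpers, and the live-device counter exists only to make the behavior observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the opaque Metal device object.
struct demo_device {
    int index;
};

static int g_live_devices = 0; // demo-only counter of initialized devices

static demo_device * demo_device_init(int index) {
    ++g_live_devices;
    return new demo_device{index};
}

static void demo_device_free(demo_device * dev) {
    --g_live_devices;
    delete dev;
}

// Custom deleter: the unique_ptr releases the device through the free function.
struct demo_device_deleter {
    void operator()(demo_device * dev) const { demo_device_free(dev); }
};

using demo_device_ptr = std::unique_ptr<demo_device, demo_device_deleter>;

// Returns the cached device for `index`, creating it on the first request.
demo_device * demo_device_get(int index) {
    static std::vector<demo_device_ptr> devices; // destroyed at program exit
    if (index < 0) {
        return nullptr;
    }
    if (devices.size() <= static_cast<std::size_t>(index)) {
        devices.resize(static_cast<std::size_t>(index) + 1);
    }
    if (!devices[index]) {
        devices[index].reset(demo_device_init(index));
    }
    return devices[index].get();
}
```

Repeated calls with the same index return the same pointer, and every device is freed when the static vector is destroyed, which is the RAII behavior the description attributes to the deleter.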

The extensive pipeline lookup system supports the wide variety of quantization formats and operation types that GGML needs for efficient ML inference on Apple GPUs.
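
The lookup flow (build a pipeline name from the operation parameters, check the cache, compile on a miss) might look roughly like the sketch below. All names here (demo_pipeline, demo_pipelines, demo_compile, demo_cpy_name, demo_get_pipeline_cpy) and the "kernel_cpy_<src>_<dst>" naming scheme are assumptions made for illustration; the real getters operate on Metal objects and ggml types.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a compiled compute pipeline.
struct demo_pipeline {
    std::string name;
};

// Cache keyed by pipeline name, mirroring the unordered_map described above.
struct demo_pipelines {
    std::unordered_map<std::string, demo_pipeline> cache;
    int compile_count = 0; // demo-only counter of compilations
};

// Stand-in for the expensive shader compilation step.
static demo_pipeline demo_compile(demo_pipelines & ppls, const std::string & name) {
    ++ppls.compile_count;
    return demo_pipeline{name};
}

// Build the cache key from the operation parameters (illustrative scheme).
static std::string demo_cpy_name(const std::string & tsrc, const std::string & tdst) {
    return "kernel_cpy_" + tsrc + "_" + tdst;
}

// Return the cached pipeline for a copy between two types, compiling on a miss.
const demo_pipeline & demo_get_pipeline_cpy(demo_pipelines & ppls,
                                            const std::string & tsrc,
                                            const std::string & tdst) {
    const std::string name = demo_cpy_name(tsrc, tdst);
    auto it = ppls.cache.find(name);
    if (it == ppls.cache.end()) {
        it = ppls.cache.emplace(name, demo_compile(ppls, name)).first;
    }
    return it->second;
}
```

Keying the cache by a name derived from the parameters means each type combination is compiled at most once, and later dispatches of the same operation reduce to a hash lookup.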

Usage

This module is used internally by the Metal backend, and user code does not interact with it directly. ggml-metal-ops.cpp calls it during operation dispatch to select the appropriate compiled Metal pipeline.

Code Reference

Source Location

GGML repo, file: src/ggml-metal/ggml-metal-device.cpp (1864 lines).

Signatures

ggml_metal_device_t ggml_metal_device_get(int device);
ggml_metal_pipelines_t ggml_metal_pipelines_init(void);
void ggml_metal_pipelines_free(ggml_metal_pipelines_t ppls);
void ggml_metal_pipelines_add(ggml_metal_pipelines_t ppls, const char * name, ggml_metal_pipeline_t pipeline);
ggml_metal_pipeline_t ggml_metal_pipelines_get(ggml_metal_pipelines_t ppls, const char * name);
ggml_metal_pipeline_with_params ggml_metal_library_get_pipeline_base(ggml_metal_library_t lib, ggml_op op);
ggml_metal_pipeline_with_params ggml_metal_library_get_pipeline_cpy(ggml_metal_library_t lib, ggml_type tsrc, ggml_type tdst);
ggml_metal_pipeline_with_params ggml_metal_library_get_pipeline_mul_mv(ggml_metal_library_t lib, const ggml_tensor * src0, ...);
ggml_metal_pipeline_with_params ggml_metal_library_get_pipeline_mul_mm(ggml_metal_library_t lib, const ggml_tensor * src0, ...);

Import

#include "ggml-metal-device.h"

I/O Contract

Inputs

Parameter | Type | Required | Description
device | int | Yes | Metal device index (typically 0 for the default GPU).
ppls | ggml_metal_pipelines_t | Yes | Pipeline cache handle for storing/retrieving compiled shaders.
lib | ggml_metal_library_t | Yes | Handle to the Metal shader library from which pipelines are compiled.
op | ggml_op | Varies | The GGML operation type for which to retrieve a compiled pipeline.
tsrc / tdst | ggml_type | Varies | Source and destination tensor types for copy and quantization-specific pipelines.

Outputs

Output | Type | Description
Device handle | ggml_metal_device_t | Opaque pointer to the Metal device (from ggml_metal_device_get).
Pipeline with params | ggml_metal_pipeline_with_params | Struct containing the compiled pipeline and associated dispatch parameters (threadgroup sizes, etc.).

Usage Examples

// Internal pipeline lookup during operation dispatch.
// dev, src0, and src1 are assumed to be in scope at the call site.
ggml_metal_library_t lib = ggml_metal_device_get_library(dev);

// Get a pipeline for matrix-vector multiplication. The kernel variant
// (e.g. Q4_0) is chosen from the tensor types rather than passed explicitly.
ggml_metal_pipeline_with_params pwp =
    ggml_metal_library_get_pipeline_mul_mv(lib, src0, src1);

if (!pwp.pipeline) {
    GGML_LOG_ERROR("Failed to find pipeline for mul_mv\n");
    return;
}

// Use pwp.pipeline and the dispatch parameters carried in pwp for kernel dispatch
