Implementation:Ggml org Ggml Metal common

Metadata

Field	Value
Page Type	Implementation (API Doc)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, GPU_Computing
Last Updated	2025-05-15 12:00 GMT

Overview

Implements memory range tracking for concurrent operation scheduling and graph optimization on the Metal backend.

Description

ggml-metal-common.cpp provides infrastructure for determining which Metal compute operations can safely execute concurrently without memory conflicts. It defines ggml_mem_range structs representing memory intervals characterized by a buffer identifier, start address, end address, and a source/destination type flag.

The core mechanism works as follows:

Memory range representation: Each ggml_mem_range captures an interval within a buffer. The ggml_mem_range_type distinguishes between source ranges (operations read from them) and destination ranges (operations write to them).
Range resolution from tensors: The ggml_mem_range_from_tensor function resolves view sources and computes actual allocated memory ranges from tensor buffer metadata, using ggml_backend_buft_get_alloc_size to account for extra padding allocated by buffer types.
Conflict detection: The ggml_mem_ranges_check function tests whether any of a tensor's source or destination ranges overlaps a range that is already tracked. Overlapping source-source ranges are permitted (multiple reads are safe), but any overlap involving a destination range indicates a conflict.
Graph optimization: The file also includes ggml_graph_optimize, which reorders computation graph nodes to improve concurrency while respecting operation fusion constraints.
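
The conflict rule described above can be illustrated with a minimal, self-contained sketch. The names below (range_type, mem_range, ranges_conflict, ranges_check) are hypothetical stand-ins for illustration, not the actual ggml definitions, and the sketch assumes half-open intervals and that ranges in different buffers never conflict.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not the actual ggml types.
enum class range_type { src, dst };

struct mem_range {
    int        buffer_id; // identifies the buffer the interval lives in
    uint64_t   start;     // first byte of the interval
    uint64_t   end;       // one past the last byte (half-open, an assumption)
    range_type type;      // read (src) or written (dst) by the operation
};

// Two ranges conflict when they share a buffer, their intervals intersect,
// and at least one of them is a destination (write).
static bool ranges_conflict(const mem_range & a, const mem_range & b) {
    assert(a.start <= a.end && b.start <= b.end);
    if (a.buffer_id != b.buffer_id) {
        return false; // assumed: different buffers never alias
    }
    if (a.type == range_type::src && b.type == range_type::src) {
        return false; // concurrent reads are safe
    }
    return a.start < b.end && b.start < a.end;
}

// Returns true when `r` conflicts with none of the tracked ranges,
// mirroring the "true means no conflict" convention of the check function.
static bool ranges_check(const std::vector<mem_range> & tracked, const mem_range & r) {
    for (const mem_range & t : tracked) {
        if (ranges_conflict(t, r)) {
            return false;
        }
    }
    return true;
}
```

In this sketch, a read that overlaps a tracked read passes the check, while a write to the same bytes fails it; a write to an adjacent, non-overlapping interval or to a different buffer also passes.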

This module is designed to be backend-generic and could potentially be reused by other GPU backends.

Usage

This module is used internally by the Metal backend's graph dispatch code (ggml-metal-ops.cpp) to determine which operations can be encoded concurrently within a single command buffer. It is not called directly by user code.

Code Reference

Source Location

GGML repo, file: src/ggml-metal/ggml-metal-common.cpp (446 lines).

Signatures

ggml_mem_ranges_t ggml_mem_ranges_init(int debug);
void ggml_mem_ranges_free(ggml_mem_ranges_t mrs);
void ggml_mem_ranges_reset(ggml_mem_ranges_t mrs);
bool ggml_mem_ranges_add(ggml_mem_ranges_t mrs, const ggml_tensor * tensor);
bool ggml_mem_ranges_check(ggml_mem_ranges_t mrs, const ggml_tensor * tensor);
void ggml_graph_optimize(ggml_cgraph * gf, bool use_fusion);

Import

#include "ggml-metal-common.h"

I/O Contract

Inputs

Parameter	Type	Required	Description
`mrs`	`ggml_mem_ranges_t`	Yes	Handle to the memory range tracker instance.
`tensor`	`const ggml_tensor *`	Yes	Tensor whose source and destination memory ranges are to be added or checked for conflicts.
`debug`	`int`	Yes	Debug verbosity level; values above 2 enable detailed logging of range additions and conflict detections.

Outputs

Output	Type	Description
Range tracker handle	`ggml_mem_ranges_t`	Opaque pointer to the initialized memory range tracker (from `ggml_mem_ranges_init`).
Conflict status	`bool`	`true` if the tensor's memory ranges do not conflict with existing ranges (from `ggml_mem_ranges_check`); `false` if an overlap is detected.

Usage Examples

// Sketch of internal usage within the Metal ops dispatch.
// `graph` and `n_ops` are assumed to be provided by the surrounding dispatch code.
ggml_mem_ranges_t mrs = ggml_mem_ranges_init(/* debug */ 0);

// For each operation in the current concurrent group:
for (int i = 0; i < n_ops; i++) {
    ggml_tensor * node = graph->nodes[i];

    // Check if this operation can safely run concurrently
    if (!ggml_mem_ranges_check(mrs, node)) {
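        // Memory conflict detected: close the current concurrent group and start a new one
        // (the dispatcher must also synchronize before continuing, e.g. with a memory barrier)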
        ggml_mem_ranges_reset(mrs);
    }

    // Register this operation's memory ranges
    ggml_mem_ranges_add(mrs, node);
}

ggml_mem_ranges_free(mrs);
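
To make the grouping behaviour of the loop above concrete without depending on ggml, the following toy model may help. It is only a sketch under stated assumptions: toy_range, toy_op, toy_tracker, and assign_groups are hypothetical names, all ranges are assumed to live in a single buffer, and each operation is assumed to have exactly one destination range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model; not the ggml types.
struct toy_range {
    uint64_t start; // first byte
    uint64_t end;   // one past the last byte (half-open, an assumption)
};

struct toy_op {
    std::vector<toy_range> reads; // source ranges
    toy_range              write; // single destination range
};

static bool overlaps(const toy_range & a, const toy_range & b) {
    assert(a.start <= a.end && b.start <= b.end);
    return a.start < b.end && b.start < a.end;
}

struct toy_tracker {
    std::vector<toy_range> reads;
    std::vector<toy_range> writes;

    // true when `op` can join the current concurrent group
    bool check(const toy_op & op) const {
        for (const toy_range & w : writes) {
            if (overlaps(w, op.write)) return false;     // write after write
            for (const toy_range & r : op.reads) {
                if (overlaps(w, r)) return false;        // read after write
            }
        }
        for (const toy_range & r : reads) {
            if (overlaps(r, op.write)) return false;     // write after read
        }
        return true;
    }

    void add(const toy_op & op) {
        reads.insert(reads.end(), op.reads.begin(), op.reads.end());
        writes.push_back(op.write);
    }

    void reset() {
        reads.clear();
        writes.clear();
    }
};

// Assigns each operation the index of the concurrent group it lands in,
// following the same check / reset / add pattern as the loop above.
static std::vector<int> assign_groups(const std::vector<toy_op> & ops) {
    toy_tracker mrs;
    std::vector<int> group(ops.size());
    int current = 0;
    for (size_t i = 0; i < ops.size(); i++) {
        if (!mrs.check(ops[i])) {
            mrs.reset(); // conflict: start a new group
            current++;
        }
        mrs.add(ops[i]);
        group[i] = current;
    }
    return group;
}
```

With three operations where the first two read the same input and write to disjoint outputs, and the third reads the first operation's output, this model places the first two in one group and the third in the next.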
