Implementation:Ggml org Ggml Metal common
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (API Doc) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, GPU_Computing |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Implements memory range tracking for concurrent operation scheduling and graph optimization on the Metal backend.
Description
ggml-metal-common.cpp provides infrastructure for determining which Metal compute operations can safely execute concurrently without memory conflicts. It defines ggml_mem_range structs representing memory intervals characterized by a buffer identifier, start address, end address, and a source/destination type flag.
The core mechanism works as follows:
- Memory range representation: Each ggml_mem_range captures an interval within a buffer. The ggml_mem_range_type distinguishes between source ranges (operations read from them) and destination ranges (operations write to them).
- Range resolution from tensors: The ggml_mem_range_from_tensor function resolves view sources and computes actual allocated memory ranges from tensor buffer metadata, using ggml_backend_buft_get_alloc_size to account for extra padding allocated by buffer types.
- Conflict detection: The ggml_mem_ranges_check function tests whether a new memory range overlaps with any existing range. Overlapping source-source ranges are permitted (multiple reads are safe), but any overlap involving a destination range indicates a conflict.
- Graph optimization: The file also includes ggml_graph_optimize, which reorders computation graph nodes to improve concurrency while respecting operation fusion constraints.
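The conflict rule described above can be illustrated with a small standalone sketch. It is not the ggml implementation: the names MemRange, RangeType, ranges_conflict, and ranges_check are hypothetical, and the sketch assumes half-open intervals [begin, end), which the source does not specify.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration; not the actual ggml definitions.
enum class RangeType { Src, Dst };

struct MemRange {
    uint64_t  buffer_id; // which buffer the interval lives in
    uint64_t  begin;     // first byte of the interval
    uint64_t  end;       // one past the last byte (assumed half-open: [begin, end))
    RangeType type;      // read (Src) or written (Dst)
};

// Returns true when `a` and `b` must not be accessed concurrently.
bool ranges_conflict(const MemRange & a, const MemRange & b) {
    assert(a.begin <= a.end && b.begin <= b.end);
    if (a.buffer_id != b.buffer_id) {
        return false; // different buffers never alias in this model
    }
    if (a.type == RangeType::Src && b.type == RangeType::Src) {
        return false; // two reads of the same bytes are safe
    }
    return a.begin < b.end && b.begin < a.end; // half-open interval overlap
}

// Returns true when `mr` can join the tracked set without a conflict.
bool ranges_check(const std::vector<MemRange> & tracked, const MemRange & mr) {
    for (const MemRange & other : tracked) {
        if (ranges_conflict(other, mr)) {
            return false;
        }
    }
    return true;
}
```

In this model, a read that overlaps a tracked read passes the check, while any overlap in which at least one side is a write fails it. Ranges that merely touch at a boundary do not count as overlapping under the half-open assumption.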
This module is designed to be backend-generic, so other GPU backends could reuse it.
Usage
This module is used internally by the Metal backend's graph dispatch code (ggml-metal-ops.cpp) to determine which operations can be encoded concurrently within a single command buffer. It is not called directly by user code.
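To make the idea of concurrent groups concrete, the following toy sketch splits a sequence of operations into groups whose members have no read-write or write-write overlap. It is a simplified model under stated assumptions, not the Metal dispatch code: Interval, Op, overlaps, and assign_groups are hypothetical names, every operation has exactly one source and one destination interval, and all intervals live in a single buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy model: each op reads one interval and writes one interval
// of a single shared buffer, given as half-open byte ranges [begin, end).
struct Interval { uint64_t begin, end; };
struct Op { Interval src; Interval dst; };

static bool overlaps(Interval a, Interval b) {
    assert(a.begin <= a.end && b.begin <= b.end);
    return a.begin < b.end && b.begin < a.end;
}

// Assigns each op a group index. Ops that share an index have no
// read-write or write-write overlap with one another.
std::vector<int> assign_groups(const std::vector<Op> & ops) {
    std::vector<int> group(ops.size());
    std::vector<Interval> reads, writes; // ranges tracked for the open group
    int current = 0;
    for (std::size_t i = 0; i < ops.size(); i++) {
        bool conflict = false;
        for (Interval w : writes) {
            // a tracked write conflicts with a new read or a new write
            conflict = conflict || overlaps(w, ops[i].src) || overlaps(w, ops[i].dst);
        }
        for (Interval r : reads) {
            // a tracked read conflicts only with a new write
            conflict = conflict || overlaps(r, ops[i].dst);
        }
        if (conflict) {
            // close the current group and start tracking from scratch
            reads.clear();
            writes.clear();
            current++;
        }
        reads.push_back(ops[i].src);
        writes.push_back(ops[i].dst);
        group[i] = current;
    }
    return group;
}
```

In a real backend, closing a group would correspond to some form of synchronization between the groups; the sketch only records the group index.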
Code Reference
Source Location
GGML repo, file: src/ggml-metal/ggml-metal-common.cpp (446 lines).
Signatures
ggml_mem_ranges_t ggml_mem_ranges_init(int debug);
void ggml_mem_ranges_free(ggml_mem_ranges_t mrs);
void ggml_mem_ranges_reset(ggml_mem_ranges_t mrs);
bool ggml_mem_ranges_add(ggml_mem_ranges_t mrs, const ggml_tensor * tensor);
bool ggml_mem_ranges_check(ggml_mem_ranges_t mrs, const ggml_tensor * tensor);
void ggml_graph_optimize(ggml_cgraph * gf, bool use_fusion);
Import
#include "ggml-metal-common.h"
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| mrs | ggml_mem_ranges_t | Yes | Handle to the memory range tracker instance. |
| tensor | const ggml_tensor * | Yes | Tensor whose source and destination memory ranges are to be added or checked for conflicts. |
| debug | int | Yes | Debug verbosity level; values above 2 enable detailed logging of range additions and conflict detections. |
Outputs
| Output | Type | Description |
|---|---|---|
| Range tracker handle | ggml_mem_ranges_t | Opaque pointer to the initialized memory range tracker (from ggml_mem_ranges_init). |
| Conflict status | bool | true if the tensor's memory ranges do not conflict with existing ranges (from ggml_mem_ranges_check); false if an overlap is detected. |
Usage Examples
// Internal usage within Metal ops dispatch:
ggml_mem_ranges_t mrs = ggml_mem_ranges_init(/* debug */ 0);

// For each operation in the current concurrent group:
for (int i = 0; i < n_ops; i++) {
    ggml_tensor * node = graph->nodes[i];

    // Check if this operation can safely run concurrently
    if (!ggml_mem_ranges_check(mrs, node)) {
        // Memory conflict detected -- start a new concurrent group
        ggml_mem_ranges_reset(mrs);
    }

    // Register this operation's memory ranges
    ggml_mem_ranges_add(mrs, node);
}

ggml_mem_ranges_free(mrs);