Implementation: ggml_gallocr_alloc_graph (ggml-org/ggml)

Summary

ggml_gallocr_alloc_graph allocates backend buffer memory for all tensors in a computation graph, applying memory reuse optimization so that intermediate tensors with non-overlapping lifetimes share the same memory regions.

API

bool ggml_gallocr_alloc_graph(ggml_gallocr_t galloc, struct ggml_cgraph * graph);

Parameters

Parameter	Type	Description
`galloc`	`ggml_gallocr_t`	Graph allocator handle (single-buffer or multi-buffer).
`graph`	`struct ggml_cgraph *`	The computation graph whose tensors need memory allocation.

Return Value

Returns a bool: true if the graph's tensors were allocated, false if allocation failed (for example, if the backing buffer could not be reserved).
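The success and failure cases can be modeled with a small stand-in. The sketch below is not ggml code: toy_galloc, toy_alloc_graph, and their fields are hypothetical names. It assumes the behavior described in the comments of ggml-alloc.h, namely that a single-buffer allocator re-reserves automatically when its plan does not fit the graph, while a multi-buffer allocator returns false and expects the caller to reserve first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the allocator state; NOT the real ggml struct. */
typedef struct {
    int  n_buffers;     /* 1 = single-buffer allocator */
    bool plan_matches;  /* does the current memory plan fit the graph? */
    bool reserve_ok;    /* would a fresh reservation succeed? */
} toy_galloc;

/* Models only the true/false decision; no memory is touched. */
bool toy_alloc_graph(toy_galloc *g) {
    assert(g->n_buffers >= 1);
    if (!g->plan_matches) {
        if (g->n_buffers != 1) {
            return false;        /* multi-buffer: caller must reserve first */
        }
        if (!g->reserve_ok) {
            return false;        /* reservation (buffer allocation) failed */
        }
        g->plan_matches = true;  /* automatic reservation succeeded */
    }
    return true;                 /* tensors can be bound to planned offsets */
}
```

In real code the practical consequence is the same: check the return value of ggml_gallocr_alloc_graph before computing the graph, and treat false as "reserve explicitly or report an out-of-memory condition".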

Source

Repository: https://github.com/ggml-org/ggml
File: src/ggml-alloc.c — lines 1055–1101

Behavior

Accepts a graph allocator and a computation graph.
If the allocator is configured for a single buffer and has not yet been reserved, it auto-reserves by calling the internal reservation routine to compute the memory plan and allocate the backing buffer.
Walks the computation graph and assigns each tensor to its pre-planned offset within the allocated buffer(s).
Tensors whose lifetimes do not overlap are assigned to overlapping memory regions, enabling memory reuse and reducing peak memory consumption.
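The memory reuse described above can be illustrated with a toy planner. This is a simplified sketch, not the algorithm in ggml-alloc.c: the toy_tensor type and the toy_plan function are hypothetical, lifetimes are given up front as node indices, and placement is a plain lowest-offset search. It shows only the core idea that two tensors may share bytes when their lifetimes do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy type; NOT a ggml structure. */
typedef struct {
    size_t size;       /* bytes needed */
    int    first_use;  /* index of the node that produces the tensor */
    int    last_use;   /* index of the last node that reads it */
    size_t offset;     /* output: assigned offset in the shared buffer */
} toy_tensor;

/* Two tensors are live at the same time if their index ranges intersect. */
static int lifetimes_overlap(const toy_tensor *a, const toy_tensor *b) {
    return a->first_use <= b->last_use && b->first_use <= a->last_use;
}

/* Assigns offsets in array order and returns the peak buffer size. */
size_t toy_plan(toy_tensor *t, int n) {
    size_t peak = 0;
    for (int i = 0; i < n; i++) {
        assert(t[i].first_use <= t[i].last_use);
        size_t off = 0;
        int moved = 1;
        while (moved) {  /* repeat until no placed tensor conflicts */
            moved = 0;
            for (int j = 0; j < i; j++) {
                if (!lifetimes_overlap(&t[i], &t[j])) {
                    continue;  /* never live together: bytes may be shared */
                }
                /* Do [off, off+size) and tensor j's byte range collide? */
                if (off < t[j].offset + t[j].size &&
                    t[j].offset < off + t[i].size) {
                    off = t[j].offset + t[j].size;  /* move past tensor j */
                    moved = 1;
                }
            }
        }
        t[i].offset = off;
        if (off + t[i].size > peak) {
            peak = off + t[i].size;
        }
    }
    return peak;
}
```

For three tensors of 100, 50, and 100 bytes with lifetimes [0,1], [1,2], and [2,3], the third tensor reuses offset 0 because the first is no longer live, so the peak is 150 bytes instead of the 250 bytes needed without reuse.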

Dependencies

Header: ggml-alloc.h
Import: #include "ggml-alloc.h"

Related Functions

ggml_backend_sched_alloc_graph

For multi-backend scenarios (e.g., splitting work across CPU and GPU), a higher-level scheduler entry point is provided:

bool ggml_backend_sched_alloc_graph(ggml_backend_sched_t sched, struct ggml_cgraph * graph);

File: src/ggml-backend.cpp — lines 1768–1785
This function coordinates allocation across multiple backends, delegating per-backend allocation to ggml_gallocr_alloc_graph internally.
