Implementation: ggml_backend_sched_graph_compute
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (API Doc) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Hardware_Abstraction |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Concrete function for executing a computation graph across one or more hardware backends via the GGML backend scheduler. This is the primary synchronous entry point for graph execution in the GGML library.
Description
ggml_backend_sched_graph_compute is the synchronous graph execution function in GGML's backend scheduler. It takes a pre-built computation graph and executes all of its operations across the assigned hardware backends, blocking until all computation is complete.
Internally, the function performs the following steps:
- Delegates to the async variant: calls `ggml_backend_sched_graph_compute_async(sched, graph)`, which handles allocation and dispatch.
- Auto-allocation: if the graph has not been previously allocated (i.e., `sched->is_alloc` is false), the async function automatically calls `ggml_backend_sched_alloc_graph()` to allocate memory for all tensors in the graph. If allocation fails, `GGML_STATUS_ALLOC_FAILED` is returned immediately.
- Graph splitting and dispatch: the async function calls `ggml_backend_sched_compute_splits()`, which iterates over the pre-computed graph splits. Each split is a contiguous subgraph assigned to a single backend; for each split, the corresponding backend's `graph_compute` method is invoked.
- Synchronization: after the async function returns, `ggml_backend_sched_synchronize(sched)` is called, which iterates over all backends in the scheduler and calls `ggml_backend_synchronize()` on each. This ensures all asynchronous work has completed before the function returns.
The function returns a `ggml_status` enum indicating success or the nature of any failure.
After execution completes, output tensors remain in their backend-specific buffers. To retrieve results to host memory, the caller uses `ggml_backend_tensor_get()`.
Code Reference
Source Location
GGML repo, file: src/ggml-backend.cpp, lines L1787-1791.
Signature
enum ggml_status ggml_backend_sched_graph_compute(
ggml_backend_sched_t sched,
struct ggml_cgraph * graph
);
Import
#include "ggml-backend.h"
Language
C
Dependencies
- `ggml-backend.h` -- Public API header declaring the function.
- `ggml-backend-impl.h` -- Internal header defining backend interface structures used by the scheduler.
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| `sched` | `ggml_backend_sched_t` | Yes | Handle to the backend scheduler. Must have been created with `ggml_backend_sched_new()` and configured with one or more backends. |
| `graph` | `struct ggml_cgraph *` | Yes | Pointer to the computation graph to execute. The graph must have been built using GGML tensor operations in a valid `ggml_context`. |
Outputs
| Output | Type | Description |
|---|---|---|
| return value | `enum ggml_status` | Status code indicating the result of the computation: `GGML_STATUS_SUCCESS` on success, `GGML_STATUS_ALLOC_FAILED` if automatic graph allocation failed, other status codes for backend-specific errors. |
Usage Examples
Basic Graph Execution
#include "ggml.h"
#include "ggml-backend.h"
#include <stdlib.h> // malloc
// Assume backends and scheduler have been initialized:
// ggml_backend_t backend_gpu = ...;
// ggml_backend_t backend_cpu = ...;
// ggml_backend_sched_t sched = ggml_backend_sched_new(...);
// Build the computation graph
struct ggml_context * ctx = ggml_init(params);
struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 128, 64);
struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 128, 64);
struct ggml_tensor * result = ggml_mul_mat(ctx, a, b);
struct ggml_cgraph * graph = ggml_new_graph(ctx);
ggml_build_forward_expand(graph, result);
// Execute the graph (allocation happens automatically on first call)
enum ggml_status status = ggml_backend_sched_graph_compute(sched, graph);
if (status != GGML_STATUS_SUCCESS) {
// handle error
}
// Retrieve the result from the backend buffer to host memory
float * output_data = malloc(ggml_nbytes(result));
ggml_backend_tensor_get(result, output_data, 0, ggml_nbytes(result));
Repeated Execution (Inference Loop)
#include "ggml.h"
#include "ggml-backend.h"
// Build the graph once
struct ggml_cgraph * graph = build_graph(sched);
// Execute the same graph structure repeatedly.
// On the first iteration the graph is allocated automatically.
// Subsequent iterations reuse the allocation.
for (int i = 0; i < num_iterations; ++i) {
enum ggml_status status = ggml_backend_sched_graph_compute(sched, graph);
if (status != GGML_STATUS_SUCCESS) {
break;
}
}
Execution with Explicit Allocation and Input Setting
#include "ggml.h"
#include "ggml-backend.h"
#include <stdlib.h> // malloc
// Build a new graph
struct ggml_cgraph * graph = build_graph(sched);
// Reset the scheduler to clear any previous allocation
ggml_backend_sched_reset(sched);
// Explicitly allocate the graph without executing it; returns false on failure
if (!ggml_backend_sched_alloc_graph(sched, graph)) {
    // handle allocation failure
}
// Set input data on the allocated tensors
ggml_backend_tensor_set(input_tensor, input_data, 0, ggml_nbytes(input_tensor));
// Execute the pre-allocated graph
enum ggml_status status = ggml_backend_sched_graph_compute(sched, graph);
// Retrieve output
float * output = malloc(ggml_nbytes(output_tensor));
ggml_backend_tensor_get(output_tensor, output, 0, ggml_nbytes(output_tensor));
Retrieving Results with ggml_backend_tensor_get
After graph execution, output tensors reside in backend-specific memory (e.g., GPU VRAM). The ggml_backend_tensor_get function copies tensor data from the backend buffer to a caller-provided host memory buffer.
// Signature (src/ggml-backend.cpp:L297-310):
void ggml_backend_tensor_get(
const struct ggml_tensor * tensor,
void * data,
size_t offset,
size_t size
);
// Example: retrieve the full contents of an output tensor
float * host_buffer = malloc(ggml_nbytes(result));
ggml_backend_tensor_get(result, host_buffer, 0, ggml_nbytes(result));
// Example: retrieve a partial slice (e.g., first 256 bytes)
float * partial = malloc(256);
ggml_backend_tensor_get(result, partial, 0, 256);
The function asserts that the tensor has a valid buffer, that the tensor data pointer is non-null, and that offset + size does not exceed the tensor's byte size. Internally, it delegates to the buffer interface's get_tensor method, which handles the device-to-host transfer for the specific backend (e.g., cudaMemcpy for CUDA, direct memory copy for CPU).