Implementation: ggml_backend_sched_new (ggml-org/ggml)
| Attribute | Value |
|---|---|
| Page Type | Implementation |
| Full Name | ggml_org_ggml_ggml_backend_sched_new |
| Short Name | ggml_backend_sched_new |
| Repository | https://github.com/ggml-org/ggml |
| Language | C |
| Domain Tags | ML_Infrastructure, Hardware_Abstraction |
| Knowledge Source | GGML |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Description
ggml_backend_sched_new is the constructor function for the GGML backend scheduler. It creates and initializes a ggml_backend_sched_t scheduler handle that manages the execution of computation graphs across multiple heterogeneous backends. The scheduler takes ownership of the priority ordering of backends, allocates internal data structures for graph splitting and tensor placement, and prepares the infrastructure for transparent multi-backend graph execution.
The function allocates hash tables for tensor-to-backend mapping, node and leaf backend assignment arrays, a context buffer for split management, and a graph allocator (ggml_gallocr) configured for the provided backends and buffer types. When parallel mode is enabled, it also creates backend events for overlapping data transfers with computation.
Usage
ggml_backend_sched_new is typically called once during application initialization, after all backends have been created. The returned scheduler handle is then used throughout the application lifetime to allocate, split, and compute graphs. A common usage pattern is:
- Create backends (GPU, CPU, etc.)
- Call ggml_backend_sched_new with the backends in priority order
- Optionally reserve memory with a measurement graph via ggml_backend_sched_reserve
- For each inference or training step, build a graph and call ggml_backend_sched_graph_compute
- Free the scheduler with ggml_backend_sched_free when done
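The steps above can be sketched end to end. This is a hedged sketch rather than verbatim library code: `build_graph` is a hypothetical application helper, the CUDA backend is assumed to be compiled in, and the loop bounds are illustrative.

```c
#include "ggml-backend.h"
#include "ggml-cuda.h"   // assumes a CUDA-enabled build

// Hypothetical application helper that constructs a compute graph.
struct ggml_cgraph * build_graph(ggml_backend_sched_t sched, int batch_size);

void run(void) {
    // 1. Create backends; the CPU backend goes last as the universal fallback.
    ggml_backend_t backends[2] = {
        ggml_backend_cuda_init(0),
        ggml_backend_cpu_init(),
    };

    // 2. Create the scheduler with the backends in priority order.
    ggml_backend_sched_t sched = ggml_backend_sched_new(
        backends, NULL, 2, GGML_DEFAULT_GRAPH_SIZE,
        /*parallel=*/false, /*op_offload=*/true);

    // 3. Reserve memory once, using a worst-case measurement graph.
    ggml_backend_sched_reserve(sched, build_graph(sched, /*batch*/ 512));

    // 4. Per step: reset assignments, rebuild the graph, compute it.
    for (int step = 0; step < 10; step++) {
        ggml_backend_sched_reset(sched);
        struct ggml_cgraph * gf = build_graph(sched, 512);
        ggml_backend_sched_graph_compute(sched, gf);
    }

    // 5. Free the scheduler before freeing the backends it references.
    ggml_backend_sched_free(sched);
    ggml_backend_free(backends[0]);
    ggml_backend_free(backends[1]);
}
```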
Code Reference
Source Location
| Attribute | Value |
|---|---|
| File | src/ggml-backend.cpp |
| Lines | L1631-1698 |
| Repository | https://github.com/ggml-org/ggml |
Signature
ggml_backend_sched_t ggml_backend_sched_new(
ggml_backend_t * backends,
ggml_backend_buffer_type_t * bufts,
int n_backends,
size_t graph_size,
bool parallel,
bool op_offload);
Import
#include "ggml-backend.h"
Dependencies: ggml-backend.h, ggml-alloc.h
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| backends | ggml_backend_t * | Priority-ordered array of backends. Backends with a lower index are given higher priority for operation assignment. The last element must be a CPU backend, which serves as the universal fallback since it supports all operations. |
| bufts | ggml_backend_buffer_type_t * | Optional array of buffer types, one per backend. When NULL, the default buffer type of each backend is used (obtained via ggml_backend_get_default_buffer_type). Each buffer type must be supported by its corresponding backend. |
| n_backends | int | Number of backends in the backends array. Must be at least 1 and at most GGML_SCHED_MAX_BACKENDS (default: 16). |
| graph_size | size_t | Maximum number of nodes in the computation graphs that will be processed. This determines the size of the internal hash tables and allocation arrays. Typically set to GGML_DEFAULT_GRAPH_SIZE or a model-specific value. |
| parallel | bool | Enable parallel copy mode. When true, the scheduler maintains multiple copies of inter-device tensors (up to GGML_SCHED_MAX_COPIES) and creates backend events, allowing data transfers to overlap with computation across sequential graph evaluations. |
| op_offload | bool | Enable operation offloading. When true, operations are assigned to accelerator backends whenever supported. When false, only operations involving tensors already on an accelerator are offloaded. |
Outputs
| Return | Type | Description |
|---|---|---|
| scheduler handle | ggml_backend_sched_t | An opaque handle to the newly created backend scheduler. This handle is used with all subsequent scheduler functions (ggml_backend_sched_reserve, ggml_backend_sched_graph_compute, ggml_backend_sched_free, etc.). The caller is responsible for freeing it with ggml_backend_sched_free. |
Usage Examples
Basic multi-backend scheduler initialization (GPU + CPU):
// Create backends
ggml_backend_t backend_gpu = ggml_backend_cuda_init(0);
ggml_backend_t backend_cpu = ggml_backend_cpu_init();
ggml_backend_t backends[] = { backend_gpu, backend_cpu };
int num_backends = 2;
// Create scheduler: GPU has priority, CPU is fallback
ggml_backend_sched_t sched = ggml_backend_sched_new(
backends,
NULL, // use default buffer types
num_backends,
GGML_DEFAULT_GRAPH_SIZE,
false, // no parallel copies
true // enable op offloading
);
// Reserve memory using a measurement graph
struct ggml_cgraph * measure_graph = build_graph(sched, max_batch_size);
ggml_backend_sched_reserve(sched, measure_graph);
// ... use sched for graph computation ...
ggml_backend_sched_free(sched);
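After reserving, the scheduler can be queried to see how the graph was partitioned and how much memory was set aside per backend. A brief sketch continuing from the example above (it assumes the same sched, backends, and num_backends variables, and requires <stdio.h> for printf):

```c
// Number of pieces the scheduler split the last processed graph into;
// fewer splits generally means fewer inter-backend tensor copies.
int n_splits = ggml_backend_sched_get_n_splits(sched);
printf("graph was split into %d pieces\n", n_splits);

// Compute-buffer size reserved for each backend.
for (int i = 0; i < num_backends; i++) {
    size_t size = ggml_backend_sched_get_buffer_size(sched, backends[i]);
    printf("%s: %.2f MiB reserved\n",
           ggml_backend_name(backends[i]), size / 1024.0 / 1024.0);
}
```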
Multi-GPU with parallel copies and custom buffer types:
ggml_backend_t backends[] = { backend_gpu0, backend_gpu1, backend_cpu };
ggml_backend_buffer_type_t bufts[] = { buft_gpu0, buft_gpu1, buft_cpu };
ggml_backend_sched_t sched = ggml_backend_sched_new(
backends,
bufts, // custom buffer types per backend
3, // three backends
GGML_DEFAULT_GRAPH_SIZE,
true, // enable parallel copies for pipelining
true // enable op offloading
);