Implementation: ggml_backend_sched_new (ggml-org/ggml)
| Attribute | Value |
|---|---|
| Page Type | Implementation |
| Full Name | ggml_org_ggml_ggml_backend_sched_new |
| Short Name | ggml_backend_sched_new |
| Repository | https://github.com/ggml-org/ggml |
| Language | C |
| Domain Tags | ML_Infrastructure, Hardware_Abstraction |
| Knowledge Source | GGML |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Description
ggml_backend_sched_new is the constructor function for the GGML backend scheduler. It creates and initializes a ggml_backend_sched_t scheduler handle that manages the execution of computation graphs across multiple heterogeneous backends. The scheduler takes ownership of the priority ordering of backends, allocates internal data structures for graph splitting and tensor placement, and prepares the infrastructure for transparent multi-backend graph execution.
The function allocates hash tables for tensor-to-backend mapping, node and leaf backend assignment arrays, a context buffer for split management, and a graph allocator (ggml_gallocr) configured for the provided backends and buffer types. When parallel mode is enabled, it also creates backend events for overlapping data transfers with computation.
Usage
ggml_backend_sched_new is typically called once during application initialization, after all backends have been created. The returned scheduler handle is then used throughout the application lifetime to allocate, split, and compute graphs. A common usage pattern is:
- Create backends (GPU, CPU, etc.)
- Call ggml_backend_sched_new with the backends in priority order
- Optionally reserve memory with a measurement graph via ggml_backend_sched_reserve
- For each inference or training step, build a graph and call ggml_backend_sched_graph_compute
- Free the scheduler with ggml_backend_sched_free when done
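The steps above can be sketched end to end. This is a hedged sketch rather than verbatim library code: `build_graph` is a hypothetical application helper, the CUDA backend is assumed to be compiled in, and the loop bounds are illustrative.

```c
#include "ggml-backend.h"
#include "ggml-cuda.h"   // assumes a CUDA-enabled build

// Hypothetical application helper that constructs a compute graph.
struct ggml_cgraph * build_graph(ggml_backend_sched_t sched, int batch_size);

void run(void) {
    // 1. Create backends; the CPU backend goes last as the universal fallback.
    ggml_backend_t backends[2] = {
        ggml_backend_cuda_init(0),
        ggml_backend_cpu_init(),
    };

    // 2. Create the scheduler with the backends in priority order.
    ggml_backend_sched_t sched = ggml_backend_sched_new(
        backends, NULL, 2, GGML_DEFAULT_GRAPH_SIZE,
        /*parallel=*/false, /*op_offload=*/true);

    // 3. Reserve memory once, using a worst-case measurement graph.
    ggml_backend_sched_reserve(sched, build_graph(sched, /*batch*/ 512));

    // 4. Per step: reset assignments, rebuild the graph, compute it.
    for (int step = 0; step < 10; step++) {
        ggml_backend_sched_reset(sched);
        struct ggml_cgraph * gf = build_graph(sched, 512);
        ggml_backend_sched_graph_compute(sched, gf);
    }

    // 5. Free the scheduler before freeing the backends it references.
    ggml_backend_sched_free(sched);
    ggml_backend_free(backends[0]);
    ggml_backend_free(backends[1]);
}
```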
Code Reference
Source Location
| Attribute | Value |
|---|---|
| File | src/ggml-backend.cpp |
| Lines | L1631-1698 |
| Repository | https://github.com/ggml-org/ggml |
Signature
ggml_backend_sched_t ggml_backend_sched_new(
ggml_backend_t * backends,
ggml_backend_buffer_type_t * bufts,
int n_backends,
size_t graph_size,
bool parallel,
bool op_offload);
Import
#include "ggml-backend.h"
Dependencies: ggml-backend.h, ggml-alloc.h
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| backends | ggml_backend_t * | Priority-ordered array of backends. Backends with a lower index are given higher priority for operation assignment. The last element must be a CPU backend, which serves as the universal fallback since it supports all operations. |
| bufts | ggml_backend_buffer_type_t * | Optional array of buffer types, one per backend. When NULL, the default buffer type of each backend is used (obtained via ggml_backend_get_default_buffer_type). Each buffer type must be supported by its corresponding backend. |
| n_backends | int | Number of backends in the backends array. Must be at least 1 and at most GGML_SCHED_MAX_BACKENDS (default: 16). |
| graph_size | size_t | Maximum number of nodes in the computation graphs that will be processed. This determines the size of the internal hash tables and allocation arrays. Typically set to GGML_DEFAULT_GRAPH_SIZE or a model-specific value. |
| parallel | bool | Enable parallel copy mode. When true, the scheduler maintains multiple copies of inter-device tensors (up to GGML_SCHED_MAX_COPIES) and creates backend events, allowing data transfers to overlap with computation across sequential graph evaluations. |
| op_offload | bool | Enable operation offloading. When true, operations are assigned to accelerator backends whenever supported. When false, only operations involving tensors already on an accelerator are offloaded. |
Outputs
| Return | Type | Description |
|---|---|---|
| scheduler handle | ggml_backend_sched_t | An opaque handle to the newly created backend scheduler. This handle is used with all subsequent scheduler functions (ggml_backend_sched_reserve, ggml_backend_sched_graph_compute, ggml_backend_sched_free, etc.). The caller is responsible for freeing it with ggml_backend_sched_free. |
Usage Examples
Basic multi-backend scheduler initialization (GPU + CPU):
// Create backends
ggml_backend_t backend_gpu = ggml_backend_cuda_init(0);
ggml_backend_t backend_cpu = ggml_backend_cpu_init();
ggml_backend_t backends[] = { backend_gpu, backend_cpu };
int num_backends = 2;
// Create scheduler: GPU has priority, CPU is fallback
ggml_backend_sched_t sched = ggml_backend_sched_new(
backends,
NULL, // use default buffer types
num_backends,
GGML_DEFAULT_GRAPH_SIZE,
false, // no parallel copies
true // enable op offloading
);
// Reserve memory using a measurement graph
struct ggml_cgraph * measure_graph = build_graph(sched, max_batch_size);
ggml_backend_sched_reserve(sched, measure_graph);
// ... use sched for graph computation ...
ggml_backend_sched_free(sched);
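After reserving, the scheduler can be queried to see how the graph was partitioned and how much memory was set aside per backend. A brief sketch continuing from the example above (it assumes the same sched, backends, and num_backends variables, and requires <stdio.h> for printf):

```c
// Number of pieces the scheduler split the last processed graph into;
// fewer splits generally means fewer inter-backend tensor copies.
int n_splits = ggml_backend_sched_get_n_splits(sched);
printf("graph was split into %d pieces\n", n_splits);

// Compute-buffer size reserved for each backend.
for (int i = 0; i < num_backends; i++) {
    size_t size = ggml_backend_sched_get_buffer_size(sched, backends[i]);
    printf("%s: %.2f MiB reserved\n",
           ggml_backend_name(backends[i]), size / 1024.0 / 1024.0);
}
```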
Multi-GPU with parallel copies and custom buffer types:
ggml_backend_t backends[] = { backend_gpu0, backend_gpu1, backend_cpu };
ggml_backend_buffer_type_t bufts[] = { buft_gpu0, buft_gpu1, buft_cpu };
ggml_backend_sched_t sched = ggml_backend_sched_new(
backends,
bufts, // custom buffer types per backend
3, // three backends
GGML_DEFAULT_GRAPH_SIZE,
true, // enable parallel copies for pipelining
true // enable op offloading
);