
Implementation: ggml_backend_sched_new (ggml-org/ggml)

From Leeroopedia


Attribute Value
Page Type Implementation
Full Name Ggml_org_Ggml_Ggml_backend_sched_new
Short Name Ggml_backend_sched_new
Repository https://github.com/ggml-org/ggml
Language C
Domain Tags ML_Infrastructure, Hardware_Abstraction
Knowledge Source GGML
Last Updated 2025-05-15 12:00 GMT

Overview

Description

ggml_backend_sched_new is the constructor function for the GGML backend scheduler. It creates and initializes a ggml_backend_sched_t scheduler handle that manages the execution of computation graphs across multiple heterogeneous backends. The scheduler records the backends in priority order, allocates internal data structures for graph splitting and tensor placement, and prepares the infrastructure for transparent multi-backend graph execution.

The function allocates hash tables for tensor-to-backend mapping, node and leaf backend assignment arrays, a context buffer for split management, and a graph allocator (ggml_gallocr) configured for the provided backends and buffer types. When parallel mode is enabled, it also creates backend events for overlapping data transfers with computation.

Usage

ggml_backend_sched_new is typically called once during application initialization, after all backends have been created. The returned scheduler handle is then used throughout the application lifetime to allocate, split, and compute graphs. A common usage pattern is:

  1. Create backends (GPU, CPU, etc.)
  2. Call ggml_backend_sched_new with the backends in priority order
  3. Optionally reserve memory with a measurement graph via ggml_backend_sched_reserve
  4. For each inference or training step, build a graph and call ggml_backend_sched_graph_compute
  5. Free the scheduler with ggml_backend_sched_free when done

Code Reference

Source Location

Attribute Value
File src/ggml-backend.cpp
Lines 1631-1698
Repository https://github.com/ggml-org/ggml

Signature

ggml_backend_sched_t ggml_backend_sched_new(
    ggml_backend_t             * backends,
    ggml_backend_buffer_type_t * bufts,
    int                          n_backends,
    size_t                       graph_size,
    bool                         parallel,
    bool                         op_offload);

Import

#include "ggml-backend.h"

Dependencies: ggml-backend.h, ggml-alloc.h

I/O Contract

Inputs

Parameter Type Description
backends ggml_backend_t * Priority-ordered array of backends. Backends with lower index are given higher priority for operation assignment. The last element must be a CPU backend, which serves as the universal fallback since it supports all operations.
bufts ggml_backend_buffer_type_t * Optional array of buffer types, one per backend. When NULL, the default buffer type for each backend is used (obtained via ggml_backend_get_default_buffer_type). Each buffer type must be supported by its corresponding backend.
n_backends int Number of backends in the backends array. Must be at least 1 and at most GGML_SCHED_MAX_BACKENDS (default: 16).
graph_size size_t Maximum number of nodes in the computation graphs that will be processed. This determines the size of internal hash tables and allocation arrays. Typically set to GGML_DEFAULT_GRAPH_SIZE or a model-specific value.
parallel bool Enable parallel copy mode. When true, the scheduler maintains multiple copies of inter-device tensors (GGML_SCHED_MAX_COPIES) and creates backend events, enabling overlapped data transfers and computation across sequential graph evaluations.
op_offload bool Enable operation offloading. When true, operations are assigned to accelerator backends whenever supported. When false, only operations involving tensors already on an accelerator are offloaded.

Outputs

Return Type Description
scheduler handle ggml_backend_sched_t An opaque handle to the newly created backend scheduler. This handle is used with all subsequent scheduler functions (ggml_backend_sched_reserve, ggml_backend_sched_graph_compute, ggml_backend_sched_free, etc.). The caller is responsible for freeing it with ggml_backend_sched_free.

Usage Examples

Basic multi-backend scheduler initialization (GPU + CPU):

// Create backends
ggml_backend_t backend_gpu = ggml_backend_cuda_init(0);
ggml_backend_t backend_cpu = ggml_backend_cpu_init();

ggml_backend_t backends[] = { backend_gpu, backend_cpu };
int num_backends = 2;

// Create scheduler: GPU has priority, CPU is fallback
ggml_backend_sched_t sched = ggml_backend_sched_new(
    backends,
    NULL,                      // use default buffer types
    num_backends,
    GGML_DEFAULT_GRAPH_SIZE,
    false,                     // no parallel copies
    true                       // enable op offloading
);

// Reserve memory using a measurement graph
struct ggml_cgraph * measure_graph = build_graph(sched, max_batch_size);
ggml_backend_sched_reserve(sched, measure_graph);

// ... use sched for graph computation ...

ggml_backend_sched_free(sched);

Multi-GPU with parallel copies and custom buffer types:

ggml_backend_t backends[] = { backend_gpu0, backend_gpu1, backend_cpu };
ggml_backend_buffer_type_t bufts[] = { buft_gpu0, buft_gpu1, buft_cpu };

ggml_backend_sched_t sched = ggml_backend_sched_new(
    backends,
    bufts,                     // custom buffer types per backend
    3,                         // three backends
    GGML_DEFAULT_GRAPH_SIZE,
    true,                      // enable parallel copies for pipelining
    true                       // enable op offloading
);

// ... use sched for graph computation ...

ggml_backend_sched_free(sched);

Related Pages

Implements Principle
