Implementation:Ggml org Ggml Sycl backend api

Metadata

Field	Value
Page Type	Implementation (API Doc)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, GPU_Computing
Last Updated	2025-05-15 12:00 GMT

Overview

Public C header declaring the Intel SYCL backend interface for running tensor operations on Intel GPUs and other SYCL-compatible accelerators.

Description

ggml-sycl.h (49 lines) declares the SYCL backend's public API. The file is MIT-licensed, with copyright held by Intel Corporation. It provides:

Constants: GGML_SYCL_NAME = "SYCL" and GGML_SYCL_MAX_DEVICES = 48 (the highest device limit among GGML backends).
Backend initialization: ggml_backend_sycl_init(device) creates a backend for a specific SYCL device by index.
Type checking: ggml_backend_is_sycl(backend) reports whether a backend handle belongs to the SYCL backend.
Buffer types:
- ggml_backend_sycl_buffer_type(device) -- device-specific buffer type
- ggml_backend_sycl_split_buffer_type(tensor_split) -- split buffer type that distributes matrices by rows across multiple devices for tensor parallelism
- ggml_backend_sycl_host_buffer_type() -- pinned host buffer for faster CPU-GPU transfers
Device enumeration and queries:
- ggml_backend_sycl_print_sycl_devices -- prints all available SYCL devices
- ggml_backend_sycl_get_gpu_list -- retrieves GPU device IDs
- ggml_backend_sycl_get_device_description -- gets human-readable device description
- ggml_backend_sycl_get_device_count -- returns number of available SYCL devices
- ggml_backend_sycl_get_device_memory -- queries free and total device memory
Registration: ggml_backend_sycl_reg() returns the registration handle.

Note: The SYCL backend does not support registering host memory. The header keeps the corresponding declarations commented out for reference only.

Usage

Include this header to use the SYCL backend for inference on Intel GPUs via the oneAPI programming model. The backend supports multi-device tensor parallelism via split buffer types.
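As a rough illustration of how split ratios for multi-device tensor parallelism might be prepared, the sketch below builds a ratio array proportional to each device's free memory. `build_tensor_split` is a hypothetical helper, not part of ggml. The local `GGML_SYCL_MAX_DEVICES` definition only mirrors the header value so the sketch compiles on its own. Sizing the array to `GGML_SYCL_MAX_DEVICES` is a defensive assumption in case the backend reads that many entries.

```c
#include <stddef.h>

/* Mirrors the value in ggml-sycl.h so this sketch compiles standalone;
   real code would include the header instead. */
#ifndef GGML_SYCL_MAX_DEVICES
#define GGML_SYCL_MAX_DEVICES 48
#endif

/* Hypothetical helper (not part of ggml): fill `split` with one ratio per
   device, proportional to that device's free memory. Unused slots stay 0.
   Returns the number of devices used, or 0 if the input is unusable. */
int build_tensor_split(const size_t *free_mem, int n_devices,
                       float split[GGML_SYCL_MAX_DEVICES]) {
    for (int i = 0; i < GGML_SYCL_MAX_DEVICES; i++) {
        split[i] = 0.0f;
    }
    if (free_mem == NULL || n_devices < 1 || n_devices > GGML_SYCL_MAX_DEVICES) {
        return 0;
    }

    /* Sum in double to avoid overflow when adding large byte counts. */
    double total = 0.0;
    for (int i = 0; i < n_devices; i++) {
        total += (double) free_mem[i];
    }
    if (total <= 0.0) {
        return 0;
    }

    for (int i = 0; i < n_devices; i++) {
        split[i] = (float) ((double) free_mem[i] / total);
    }
    return n_devices;
}
```

The per-device values could come from ggml_backend_sycl_get_device_memory, and the filled array could then be passed to ggml_backend_sycl_split_buffer_type. Whether the backend expects normalized ratios is not stated in the header, so normalizing here is simply a conservative choice.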

Code Reference

Source Location

GGML repo, file: include/ggml-sycl.h (49 lines).

Signatures

#define GGML_SYCL_NAME "SYCL"
#define GGML_SYCL_MAX_DEVICES 48

GGML_BACKEND_API ggml_backend_t ggml_backend_sycl_init(int device);
GGML_BACKEND_API bool ggml_backend_is_sycl(ggml_backend_t backend);
GGML_BACKEND_API ggml_backend_buffer_type_t ggml_backend_sycl_buffer_type(int device);
GGML_BACKEND_API ggml_backend_buffer_type_t ggml_backend_sycl_split_buffer_type(const float * tensor_split);
GGML_BACKEND_API ggml_backend_buffer_type_t ggml_backend_sycl_host_buffer_type(void);
GGML_BACKEND_API void ggml_backend_sycl_print_sycl_devices(void);
GGML_BACKEND_API void ggml_backend_sycl_get_gpu_list(int * id_list, int max_len);
GGML_BACKEND_API void ggml_backend_sycl_get_device_description(int device, char * description, size_t description_size);
GGML_BACKEND_API int  ggml_backend_sycl_get_device_count(void);
GGML_BACKEND_API void ggml_backend_sycl_get_device_memory(int device, size_t * free, size_t * total);
GGML_BACKEND_API ggml_backend_reg_t ggml_backend_sycl_reg(void);

Import

#include "ggml-sycl.h"

I/O Contract

Inputs

Parameter	Type	Required	Description
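`device`	`int`	Yes	0-based SYCL device index, used by backend initialization, the device buffer type, and the per-device queries.
`backend`	`ggml_backend_t`	Yes	Backend handle to test with `ggml_backend_is_sycl`.
`tensor_split`	`const float *`	Yes	Array of per-device split ratios (for `ggml_backend_sycl_split_buffer_type`).
`id_list`	`int *`	Yes	Caller-allocated output array that receives GPU device IDs (for `ggml_backend_sycl_get_gpu_list`).
`max_len`	`int`	Yes	Capacity of `id_list`, i.e. the maximum number of entries to write.
`description`	`char *`	Yes	Caller-allocated output buffer that receives the device description (for `ggml_backend_sycl_get_device_description`).
`description_size`	`size_t`	Yes	Size of the `description` buffer.
`free`, `total`	`size_t *`	Yes	Output pointers that receive free and total device memory (for `ggml_backend_sycl_get_device_memory`).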
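The header does not say what happens when `device` is out of range, so callers may want to check the index before passing it on. The following is a minimal sketch of such a check. `sycl_device_index_ok` is a hypothetical helper, and `device_count` is assumed to be the value returned by ggml_backend_sycl_get_device_count.

```c
#include <stdbool.h>

/* Mirrors the value in ggml-sycl.h so this sketch compiles standalone. */
#ifndef GGML_SYCL_MAX_DEVICES
#define GGML_SYCL_MAX_DEVICES 48
#endif

/* Hypothetical helper (not part of ggml): true if `device` is a usable
   0-based index, given the number of devices the backend reported. */
bool sycl_device_index_ok(int device, int device_count) {
    return device >= 0
        && device < device_count
        && device < GGML_SYCL_MAX_DEVICES;
}
```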

Outputs

Output	Type	Description
Backend handle	`ggml_backend_t`	Initialized SYCL backend for the specified device.
Type check	`bool`	`true` if the backend is SYCL-based.
Buffer type	`ggml_backend_buffer_type_t`	Device, split, or host buffer type handle.
Device count	`int`	Number of available SYCL devices.
Device memory	via output params	Free and total memory on the specified device.
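Because device memory comes back through two output parameters, a small formatting step is often useful for logging. The sketch below assumes both values are byte counts, which the header does not state explicitly. `format_device_memory` is a hypothetical helper, not part of ggml.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of ggml): write a summary such as
   "512 / 2048 MiB free (25.0%)" into `out`. Assumes both inputs are bytes.
   Returns the snprintf result (characters needed, excluding the NUL). */
int format_device_memory(char *out, size_t out_size,
                         size_t free_bytes, size_t total_bytes) {
    const double mib = 1024.0 * 1024.0;
    /* Guard against division by zero if the device reports no memory. */
    double pct = total_bytes > 0
        ? 100.0 * (double) free_bytes / (double) total_bytes
        : 0.0;
    return snprintf(out, out_size, "%.0f / %.0f MiB free (%.1f%%)",
                    (double) free_bytes / mib,
                    (double) total_bytes / mib,
                    pct);
}
```

The two size_t values filled in by ggml_backend_sycl_get_device_memory could be passed straight to this helper.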

Usage Examples

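#include <stdio.h>
#include "ggml-sycl.h"

int main(void) {
    // List available SYCL devices
    ggml_backend_sycl_print_sycl_devices();

    int n_devices = ggml_backend_sycl_get_device_count();
    if (n_devices < 1) return 1;  // no SYCL devices found

    // Initialize backend on device 0
    ggml_backend_t backend = ggml_backend_sycl_init(0);
    if (backend == NULL) return 1;  // initialization failed

    // Query device memory
    size_t free_mem = 0, total_mem = 0;
    ggml_backend_sycl_get_device_memory(0, &free_mem, &total_mem);
    printf("device 0: %zu free, %zu total\n", free_mem, total_mem);

    // For multi-GPU: create split buffer type (50/50 split across 2 devices).
    // The array is sized to GGML_SYCL_MAX_DEVICES in case the backend reads
    // the full range; unused entries are zero-initialized.
    if (n_devices >= 2) {
        float tensor_split[GGML_SYCL_MAX_DEVICES] = { 0.5f, 0.5f };
        ggml_backend_buffer_type_t split_buft = ggml_backend_sycl_split_buffer_type(tensor_split);
        (void) split_buft;  // pass this to buffer allocation in real code
    }

    ggml_backend_free(backend);
    return 0;
}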
