Implementation:Ggml org Ggml Sycl backend

Knowledge Sources	GGML
Domains	ML_Infrastructure, GPU_Compute, Hardware_Abstraction
Last Updated	2025-05-15 12:00 GMT

Overview

Main SYCL backend implementation -- the central orchestration file that registers the SYCL backend with GGML and dispatches all tensor operations to specialized SYCL kernel modules.

Description

ggml-sycl.cpp is the core file of the entire SYCL backend (5079 lines). It implements the full ggml_backend interface for SYCL-capable devices (primarily Intel GPUs) and serves as the single entry point through which all SYCL-accelerated computation flows. Its responsibilities include:

Device initialization: ggml_sycl_init() enumerates SYCL devices via DPCT, queries hardware properties (VRAM, compute units, XMX capability, architecture), and populates the ggml_sycl_device_info structure. Device selection respects the ONEAPI_DEVICE_SELECTOR environment variable.
Backend registration: ggml_backend_sycl_reg() returns the backend registration handle, and ggml_backend_sycl_init() creates a backend instance for a specific device.
Buffer management: Implements buffer allocation (ggml_backend_sycl_buffer_type), tensor initialization, and host/device memory transfers. Supports both single-device and split-buffer modes for multi-GPU tensor distribution via ggml_backend_sycl_split_buffer_type.
Memory pools: ggml_sycl_pool_leg for device memory pooling and ggml_sycl_pool_host for pinned host memory, reducing allocation overhead.
Operation dispatch: The graph compute function iterates over compute graph nodes and dispatches each ggml_op to the appropriate kernel module -- mul_mat to mmq/mmvq/dmmv, softmax to softmax.cpp, rope to rope.cpp, normalization to norm.cpp, element-wise to element_wise.cpp, and so on.
Matrix multiplication routing: Selects between GEMM (via oneMKL for f16/f32), MMQ (quantized tiled multiply), MMVQ (quantized vector multiply), and DMMV (dequantize-then-multiply) based on batch size, quantization type, and hardware capabilities.
SYCL graph capture: Optional kernel replay optimization via SYCL graph recording when GGML_SYCL_GRAPH is enabled.
Global configuration: Reads environment variables GGML_SYCL_DEBUG, GGML_SYCL_DISABLE_OPTIMIZE, GGML_SYCL_DISABLE_GRAPH, GGML_SYCL_DISABLE_DNN, and GGML_SYCL_PRIORITIZE_DMMV.
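
The matrix multiplication routing described above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the code in ggml-sycl.cpp: MatMulPath, MatMulQuery, env_flag, choose_path, and the kMaxMmvqBatch threshold are all hypothetical names and values. Only the four path names and the GGML_SYCL_PRIORITIZE_DMMV variable come from the description.

```cpp
#include <cstdlib>

// Hypothetical types for illustration only; not part of the ggml-sycl API.
enum class MatMulPath { GEMM, MMQ, MMVQ, DMMV };

struct MatMulQuery {
    bool weights_quantized;  // the weight tensor uses a quantized type
    bool supports_dmmv;      // a dequantize-then-multiply kernel exists for the type
    bool supports_mmq;       // a tiled quantized kernel exists for the type
    int  batch;              // number of activation columns (1 means a vector)
};

// Assumed threshold: largest batch still handled by the vector kernel.
constexpr int kMaxMmvqBatch = 8;

// Reads an integer environment variable and treats any non-zero value as "on".
inline bool env_flag(const char * name) {
    const char * value = std::getenv(name);
    return value != nullptr && std::atoi(value) != 0;
}

// Picks a kernel family from batch size, quantization, and kernel support.
inline MatMulPath choose_path(const MatMulQuery & q, bool prioritize_dmmv) {
    bool       use_dmmv = q.supports_dmmv && q.batch == 1;
    const bool use_mmvq = q.weights_quantized && q.batch <= kMaxMmvqBatch;
    const bool use_mmq  = q.weights_quantized && q.supports_mmq;

    // Unless DMMV is explicitly prioritized, prefer MMVQ when both apply.
    if (!prioritize_dmmv && use_mmvq) {
        use_dmmv = false;
    }

    if (use_dmmv) { return MatMulPath::DMMV; }
    if (use_mmvq) { return MatMulPath::MMVQ; }
    if (use_mmq)  { return MatMulPath::MMQ; }
    return MatMulPath::GEMM;  // fall back to the dense library GEMM
}
```

A caller could read the override once with env_flag("GGML_SYCL_PRIORITIZE_DMMV") and pass the result to choose_path. In this sketch, a quantized single-column multiply resolves to MMVQ by default and to DMMV when the flag is set, while unquantized weights at larger batch sizes fall through to GEMM.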

Usage

This file is compiled as part of the SYCL backend shared library. Users interact with it through the standard GGML backend API: call ggml_backend_sycl_init(device_id) to create a backend, then pass the returned handle to the scheduler and graph computation functions. When SYCL support is compiled in, ggml_backend_load_all() discovers the backend automatically.

Code Reference

Source Location

Repository: GGML
File: src/ggml-sycl/ggml-sycl.cpp
Lines: 5079

Signatures

// Device info initialization and query
static ggml_sycl_device_info ggml_sycl_init();
const ggml_sycl_device_info & ggml_sycl_info();

// Backend lifecycle
ggml_backend_t ggml_backend_sycl_init(int device);
ggml_backend_reg_t ggml_backend_sycl_reg();

// Buffer type management
ggml_backend_buffer_type_t ggml_backend_sycl_buffer_type(int device);
ggml_backend_buffer_type_t ggml_backend_sycl_split_buffer_type(const float * tensor_split);
ggml_backend_buffer_type_t ggml_backend_sycl_host_buffer_type();

// Buffer operations
static void ggml_backend_sycl_buffer_free_buffer(ggml_backend_buffer_t buffer);
static void ggml_backend_sycl_buffer_init_tensor(ggml_backend_buffer_t buffer, ggml_tensor * tensor);
static bool ggml_backend_sycl_buffer_cpy_tensor(ggml_backend_buffer_t buffer,
    const ggml_tensor * src, ggml_tensor * dst);

I/O Contract

Inputs

Name	Type	Required	Description
device	int	Yes	SYCL device index (0-based) for backend initialization
tensor_split	const float *	No	Per-device split ratios for multi-GPU tensor distribution
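
To make the tensor_split input more concrete, the sketch below shows one way per-device ratios could be turned into row ranges. It is an assumption-laden illustration, not the backend's implementation: split_rows is a hypothetical helper, and a real implementation may additionally align boundaries to kernel tile sizes, which is omitted here.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not part of ggml: converts non-negative per-device split
// ratios into half-open row ranges [first, last) covering n_rows rows.
// Returns an empty vector when the ratios sum to zero (no usable split).
inline std::vector<std::pair<std::int64_t, std::int64_t>>
split_rows(const std::vector<float> & ratios, std::int64_t n_rows) {
    std::vector<std::pair<std::int64_t, std::int64_t>> ranges;

    double total = 0.0;
    for (float r : ratios) {
        total += r;
    }
    if (total <= 0.0) {
        return ranges;
    }

    ranges.reserve(ratios.size());
    double       cumulative = 0.0;
    std::int64_t begin      = 0;
    for (std::size_t i = 0; i < ratios.size(); ++i) {
        cumulative += ratios[i];
        // The last device always ends at n_rows so no row is lost to rounding.
        const std::int64_t end = (i + 1 == ratios.size())
            ? n_rows
            : static_cast<std::int64_t>(n_rows * (cumulative / total));
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}
```

With ratios {3.0f, 1.0f} and 100 rows, this sketch assigns rows [0, 75) to the first device and [75, 100) to the second.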

Outputs

Name	Type	Description
return	ggml_backend_t	Handle to the initialized SYCL backend instance, returned by ggml_backend_sycl_init()
return	ggml_backend_reg_t	Backend registration handle for the plugin registry, returned by ggml_backend_sycl_reg()

Usage Examples

#include "ggml-backend.h"
#include "ggml-sycl.h"

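// Initialize the SYCL backend on device 0 and check the handle before use
ggml_backend_t backend = ggml_backend_sycl_init(0);
if (backend == NULL) { /* handle initialization failure */ }

// Get a buffer type for tensor allocation
ggml_backend_buffer_type_t buf_type = ggml_backend_sycl_buffer_type(0);

// Allocate a device buffer; `size` is the byte count your tensors require
ggml_backend_buffer_t buffer = ggml_backend_buft_alloc_buffer(buf_type, size);

// Compute a graph on the SYCL backend; `graph` is a previously built ggml_cgraph *
ggml_backend_graph_compute(backend, graph);

// Cleanup: release the buffer, then the backend
ggml_backend_buffer_free(buffer);
ggml_backend_free(backend);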

Related Pages

Implements Principle
