Implementation:Ggml org Ggml Sycl backend
| Knowledge Sources | |
|---|---|
| Domains | ML_Infrastructure, GPU_Compute, Hardware_Abstraction |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Main SYCL backend implementation: the central orchestration file that registers the SYCL backend with GGML and dispatches all tensor operations to specialized SYCL kernel modules.
Description
ggml-sycl.cpp is the core file of the entire SYCL backend (5079 lines). It implements the full ggml_backend interface for SYCL-capable devices (primarily Intel GPUs) and serves as the single entry point through which all SYCL-accelerated computation flows. Its responsibilities include:
- Device initialization: ggml_sycl_init() enumerates SYCL devices via DPCT, queries hardware properties (VRAM, compute units, XMX capability, architecture), and populates the ggml_sycl_device_info structure. Device selection respects the ONEAPI_DEVICE_SELECTOR environment variable.
- Backend registration: ggml_backend_sycl_reg() returns the backend registration handle, and ggml_backend_sycl_init() creates a backend instance for a specific device.
- Buffer management: Implements buffer allocation (ggml_backend_sycl_buffer_type), tensor initialization, and host/device memory transfers. Supports both single-device and split-buffer modes for multi-GPU tensor distribution via ggml_backend_sycl_split_buffer_type.
- Memory pools: ggml_sycl_pool_leg pools device memory and ggml_sycl_pool_host pools pinned host memory, which reduces allocation overhead.
- Operation dispatch: The graph compute function iterates over the nodes of the compute graph and dispatches each ggml_op to the appropriate kernel module: mul_mat to mmq/mmvq/dmmv, softmax to softmax.cpp, rope to rope.cpp, normalization to norm.cpp, element-wise operations to element_wise.cpp, and so on.
- Matrix multiplication routing: Selects between GEMM (via oneMKL for f16/f32), MMQ (quantized tiled multiply), MMVQ (quantized vector multiply), and DMMV (dequantize-then-multiply) based on batch size, quantization type, and hardware capabilities.
- SYCL graph capture: Optional kernel replay optimization via SYCL graph recording when GGML_SYCL_GRAPH is enabled.
- Global configuration: Reads environment variables GGML_SYCL_DEBUG, GGML_SYCL_DISABLE_OPTIMIZE, GGML_SYCL_DISABLE_GRAPH, GGML_SYCL_DISABLE_DNN, and GGML_SYCL_PRIORITIZE_DMMV.
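The matrix multiplication routing described in the list above can be pictured as a small decision function. What follows is a minimal sketch, not the code in ggml-sycl.cpp: the `MatMulPath` enum, the `MatMulQuery` struct, the `route_mul_mat` and `read_int_env` functions, and the batch-size cutoff are all hypothetical names and values chosen for illustration. Hardware capability checks are left out entirely. Only the four path names and the environment variable GGML_SYCL_PRIORITIZE_DMMV come from the description.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical labels for the four paths named in the text.
enum class MatMulPath { GEMM, MMQ, MMVQ, DMMV };

// Hypothetical summary of the inputs the routing decision looks at.
struct MatMulQuery {
    bool quantized;   // true if the weight tensor uses a quantized type
    int  batch_size;  // number of columns in the activation matrix
};

// Read an integer environment variable, falling back to a default
// when the variable is unset or does not start with a number.
inline int read_int_env(const char * name, int default_value) {
    const char * raw = std::getenv(name);
    if (raw == nullptr) {
        return default_value;
    }
    char * end = nullptr;
    const long value = std::strtol(raw, &end, 10);
    return (end == raw) ? default_value : static_cast<int>(value);
}

// Illustrative routing only. The cutoff below is an assumption,
// not a value taken from the backend.
inline MatMulPath route_mul_mat(const MatMulQuery & q, bool prioritize_dmmv) {
    assert(q.batch_size >= 1);
    if (!q.quantized) {
        return MatMulPath::GEMM;  // f16/f32 weights: dense GEMM
    }
    if (q.batch_size == 1) {
        // Single-column case: both vector kernels apply, the flag picks one.
        return prioritize_dmmv ? MatMulPath::DMMV : MatMulPath::MMVQ;
    }
    const int vector_batch_limit = 8;  // assumed cutoff for the vector kernel
    if (q.batch_size <= vector_batch_limit) {
        return MatMulPath::MMVQ;
    }
    return MatMulPath::MMQ;  // larger batches: tiled quantized multiply
}
```

A caller in this sketch would obtain the flag with `read_int_env("GGML_SYCL_PRIORITIZE_DMMV", 0) != 0` and pass it to `route_mul_mat`. Keeping the decision in one pure function makes it easy to see, and to test, how batch size and quantization interact.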
Usage
This file is compiled as part of the SYCL backend shared library. Users interact with it through the standard GGML backend API: call ggml_backend_sycl_init(device_id) to create a backend, then pass the returned handle to the scheduler and graph computation functions. The backend is automatically discovered by ggml_backend_load_all() when SYCL support is compiled in.
Code Reference
Source Location
- Repository: GGML
- File: src/ggml-sycl/ggml-sycl.cpp
- Lines: 5079
Signatures
// Device info initialization and query
static ggml_sycl_device_info ggml_sycl_init();
const ggml_sycl_device_info & ggml_sycl_info();
// Backend lifecycle
ggml_backend_t ggml_backend_sycl_init(int device);
ggml_backend_reg_t ggml_backend_sycl_reg();
// Buffer type management
ggml_backend_buffer_type_t ggml_backend_sycl_buffer_type(int device);
ggml_backend_buffer_type_t ggml_backend_sycl_split_buffer_type(const float * tensor_split);
ggml_backend_buffer_type_t ggml_backend_sycl_host_buffer_type();
// Buffer operations
static void ggml_backend_sycl_buffer_free_buffer(ggml_backend_buffer_t buffer);
static void ggml_backend_sycl_buffer_init_tensor(ggml_backend_buffer_t buffer, ggml_tensor * tensor);
static bool ggml_backend_sycl_buffer_cpy_tensor(ggml_backend_buffer_t buffer,
                                                const ggml_tensor * src, ggml_tensor * dst);
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| device | int | Yes | SYCL device index (0-based) for backend initialization |
| tensor_split | const float * | No | Per-device split ratios for multi-GPU tensor distribution; only needed when calling ggml_backend_sycl_split_buffer_type() |
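To make the tensor_split input concrete, here is a minimal sketch of how per-device ratios could be turned into contiguous row ranges. It illustrates the general technique (normalize the ratios, accumulate them, scale by the row count) and is not a copy of the backend's code. `RowRange` and `split_rows` are hypothetical names, and a real implementation may additionally round each boundary to a block size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical half-open row interval [begin, end) assigned to one device.
struct RowRange {
    std::int64_t begin;
    std::int64_t end;
};

// Turn per-device ratios into contiguous row ranges covering [0, nrows).
// Ratios do not need to sum to 1; they are normalized by their total.
inline std::vector<RowRange> split_rows(const std::vector<float> & ratios,
                                        std::int64_t nrows) {
    assert(!ratios.empty());
    double total = 0.0;
    for (float r : ratios) {
        total += r;
    }
    assert(total > 0.0);

    std::vector<RowRange> ranges;
    double cumulative = 0.0;
    std::int64_t begin = 0;
    for (std::size_t i = 0; i < ratios.size(); ++i) {
        cumulative += ratios[i];
        // The last device always ends at nrows so no row is lost to rounding.
        const std::int64_t end = (i + 1 == ratios.size())
            ? nrows
            : static_cast<std::int64_t>(nrows * (cumulative / total));
        ranges.push_back({begin, end});
        begin = end;
    }
    return ranges;
}
```

With ratios {3, 1} and 4096 rows, this sketch assigns rows [0, 3072) to device 0 and rows [3072, 4096) to device 1.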
Outputs
| Name | Type | Description |
|---|---|---|
| ggml_backend_sycl_init() return | ggml_backend_t | Handle to the initialized SYCL backend instance |
| ggml_backend_sycl_reg() return | ggml_backend_reg_t | Backend registration handle for the plugin registry |
Usage Examples
#include "ggml-backend.h"
#include "ggml-sycl.h"
// Initialize the SYCL backend on device 0
ggml_backend_t backend = ggml_backend_sycl_init(0);
// Get a buffer type for tensor allocation
ggml_backend_buffer_type_t buf_type = ggml_backend_sycl_buffer_type(0);
// Allocate a buffer and create tensors
// (size is a byte count chosen by the caller; 16 MiB is only an example)
size_t size = 16u * 1024u * 1024u;
ggml_backend_buffer_t buffer = ggml_backend_buft_alloc_buffer(buf_type, size);
// Compute a graph on the SYCL backend
// (graph is a ggml_cgraph * that the caller has already built)
ggml_backend_graph_compute(backend, graph);
// Cleanup
ggml_backend_buffer_free(buffer);
ggml_backend_free(backend);