Implementation:Ggml org Ggml Opencl backend
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation (API Doc) |
| Knowledge Sources | GGML |
| Domains | ML_Infrastructure, Tensor_Computing, GPU_Computing |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
Main implementation of the OpenCL GPU backend, providing GPU acceleration with special optimizations for Qualcomm Adreno and Intel GPUs.
Description
ggml-opencl.cpp implements the full GGML backend interface over OpenCL in approximately 11,000 lines. The key components include:
- GPU detection and feature probing: Detects the GPU family (Adreno, Intel, or Unknown) and Adreno generation (A7X, A8X, X1E) for vendor-specific kernel selection. Parses OpenCL version strings and compiler versions for feature detection.
- Fast integer division: Implements the Granlund-Montgomery optimization (fastdiv_vals), which precomputes multiplier and shift values so that GPU kernels can replace expensive integer division with multiply-and-shift. This is critical for efficient stride calculations.
- Kernel management: Manages OpenCL kernel compilation from either embedded string literals or external .cl files. Kernels are compiled with vendor-specific options based on the detected GPU family and compiler version.
- Quantized tensor support: Supports specialized tensor formats for quantized types (q4_0, q8_0, q6_K, mxfp4) with Structure of Arrays (SoA) layout for better GPU memory access patterns on mobile GPUs.
- Backend interface: Implements all standard backend callbacks, including buffer management (ggml_cl_buffer), tensor operations, graph computation, and device memory queries.
- Profiling: Includes a ProfilingInfo class for measuring kernel execution times.
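The fast integer division item can be illustrated with a minimal host-side sketch of the Granlund-Montgomery multiply-and-shift scheme for 32-bit unsigned values. The names FastDivVals, make_fastdiv_vals, fastdiv, and fastmod, as well as the field layout, are assumptions made for illustration; they are not taken from ggml-opencl.cpp, whose fastdiv_vals layout may differ. The sum is formed in 64 bits here to sidestep 32-bit overflow, which a kernel would typically handle with a high-multiply built-in instead.

```cpp
#include <cstdint>

// Hypothetical stand-in for the precomputed values. Field names are
// assumptions, not the layout used in ggml-opencl.cpp.
struct FastDivVals {
    uint32_t mp; // "magic" multiplier
    uint32_t L;  // shift amount, ceil(log2(d))
};

// Precompute multiplier and shift for a fixed divisor d (requires d >= 1).
inline FastDivVals make_fastdiv_vals(uint32_t d) {
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;
    }
    // mp = floor(2^32 * (2^L - d) / d) + 1, which always fits in 32 bits.
    const uint64_t mp = ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1;
    return {static_cast<uint32_t>(mp), L};
}

// n / d using one multiply, one add, and one shift.
inline uint32_t fastdiv(uint32_t n, FastDivVals v) {
    const uint64_t hi = (static_cast<uint64_t>(n) * v.mp) >> 32; // high word of n * mp
    return static_cast<uint32_t>((hi + n) >> v.L);
}

// n % d derived from the fast quotient.
inline uint32_t fastmod(uint32_t n, uint32_t d, FastDivVals v) {
    return n - fastdiv(n, v) * d;
}
```

In this sketch the values are computed once per divisor on the host (for example, once per tensor dimension) and then reused for every index calculation, which is where the saving over a hardware divide would come from.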
The Adreno-specific optimizations make this backend particularly valuable for on-device inference on Android smartphones, while Intel GPU support covers integrated graphics on laptops and desktops.
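As an illustration of the Structure of Arrays layout mentioned for quantized types, the following sketch splits q4_0 blocks (one 16-bit half-precision scale plus 32 packed 4-bit values per block) into a contiguous array of scales and a contiguous array of quants. The names BlockQ4_0, Q4_0SoA, and to_soa are hypothetical, and the exact layout the backend uses on the device is not described here, so treat this only as a sketch of the general idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kQK4_0 = 32; // values per q4_0 block

// Array-of-structures view of one q4_0 block: the scale is kept as raw
// half-precision bits, followed by 32 values packed two per byte.
struct BlockQ4_0 {
    uint16_t d;
    uint8_t  qs[kQK4_0 / 2];
};

// Hypothetical SoA container: all scales together, all quants together.
struct Q4_0SoA {
    std::vector<uint16_t> scales;
    std::vector<uint8_t>  quants;
};

// Split interleaved blocks into two contiguous arrays.
inline Q4_0SoA to_soa(const std::vector<BlockQ4_0> & blocks) {
    Q4_0SoA out;
    out.scales.reserve(blocks.size());
    out.quants.reserve(blocks.size() * (kQK4_0 / 2));
    for (const BlockQ4_0 & b : blocks) {
        out.scales.push_back(b.d);
        out.quants.insert(out.quants.end(), b.qs, b.qs + kQK4_0 / 2);
    }
    return out;
}
```

With the two arrays separated, neighbouring work-items can read neighbouring quant bytes without striding over the interleaved scales, which is the usual motivation for an SoA layout on GPUs.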
Usage
Users initialize the OpenCL backend by calling ggml_backend_opencl_init(). The backend is typically discovered automatically by ggml_backend_load_all(). It supports both device buffers and host-pinned buffers for faster CPU-GPU transfers.
Code Reference
Source Location
GGML repo, file: src/ggml-opencl/ggml-opencl.cpp (11046 lines).
Signatures
ggml_backend_t ggml_backend_opencl_init(void);
bool ggml_backend_is_opencl(ggml_backend_t backend);
ggml_backend_buffer_type_t ggml_backend_opencl_buffer_type(void);
ggml_backend_buffer_type_t ggml_backend_opencl_host_buffer_type(void);
ggml_backend_reg_t ggml_backend_opencl_reg(void);
// Internal:
bool ggml_cl_compute_forward(ggml_backend_t backend, struct ggml_tensor * tensor);
Import
#include "ggml-opencl.h"
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| (none for init) | -- | -- | ggml_backend_opencl_init takes no parameters; it discovers and initializes the best available OpenCL device. |
| backend | ggml_backend_t | Yes | Backend handle for type-checking with ggml_backend_is_opencl. |
| tensor | struct ggml_tensor * | Yes | Tensor to compute in ggml_cl_compute_forward. |
Outputs
| Output | Type | Description |
|---|---|---|
| Backend handle | ggml_backend_t | Opaque handle to the initialized OpenCL backend. |
| Buffer type | ggml_backend_buffer_type_t | Buffer type for OpenCL device memory or host-pinned memory. |
| Registration handle | ggml_backend_reg_t | Backend registration for the auto-discovery system. |
Usage Examples
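#include "ggml-opencl.h"
#include "ggml-backend.h"

// Initialize the OpenCL backend; the handle is checked for NULL below.
ggml_backend_t ocl_backend = ggml_backend_opencl_init();

// Proceed only if initialization succeeded and the handle is an OpenCL backend.
if (ocl_backend && ggml_backend_is_opencl(ocl_backend)) {
    // Create a scheduler over the single backend.
    // Note: some ggml revisions add a trailing bool op_offload argument to
    // ggml_backend_sched_new; check ggml-backend.h for the version in use.
    ggml_backend_sched_t sched = ggml_backend_sched_new(
        &ocl_backend, NULL, 1, GGML_DEFAULT_GRAPH_SIZE, false);

    // graph is assumed to be a struct ggml_cgraph * built elsewhere.
    ggml_backend_sched_graph_compute(sched, graph);

    // Free the scheduler before the backend it references.
    ggml_backend_sched_free(sched);
    ggml_backend_free(ocl_backend);
}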