Implementation:Ggml org Ggml Opencl backend

Metadata

Field	Value
Page Type	Implementation (API Doc)
Knowledge Sources	GGML
Domains	ML_Infrastructure, Tensor_Computing, GPU_Computing
Last Updated	2025-05-15 12:00 GMT

Overview

Main implementation of the OpenCL GPU backend, providing GPU acceleration with special optimizations for Qualcomm Adreno and Intel GPUs.

Description

ggml-opencl.cpp implements the full GGML backend interface over OpenCL in approximately 11,000 lines. The key components include:

GPU detection and feature probing: Detects the GPU family (Adreno, Intel, or Unknown) and Adreno generation (A7X, A8X, X1E) for vendor-specific kernel selection. Parses OpenCL version strings and compiler versions for feature detection.
Fast integer division: Implements the Granlund-Montgomery division-by-invariant-integer technique (fastdiv_vals), which precomputes a multiplier and a shift for a fixed divisor so that GPU kernels can replace expensive integer division with a multiply-and-shift. This is critical for efficient stride and index calculations.
Kernel management: Manages OpenCL kernel compilation from either embedded string literals or external .cl files. Kernels are compiled with vendor-specific options based on the detected GPU family and compiler version.
Quantized tensor support: Supports specialized tensor formats for quantized types (q4_0, q8_0, q6_K, mxfp4) with Structure of Arrays (SoA) layout for better GPU memory access patterns on mobile GPUs.
Backend interface: Implements all standard backend callbacks including buffer management (ggml_cl_buffer), tensor operations, graph computation, and device memory queries.
Profiling: Includes a ProfilingInfo class for measuring kernel execution times.
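
The fast integer division item above can be illustrated with a small host-side sketch. This is not the code from ggml-opencl.cpp: the names FastDivSketch, make_fastdiv, fastdiv, and fastmod are hypothetical, and the field layout of the real fastdiv_vals is not shown in this page. The sketch assumes 32-bit unsigned operands and uses a 64-bit intermediate sum so that it is valid for every 32-bit dividend.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the precomputed values (illustration only).
struct FastDivSketch {
    uint32_t mp;     // "magic" multiplier m' = floor(2^32 * (2^L - d) / d) + 1
    uint32_t shift;  // L = ceil(log2(d))
};

// Precompute multiplier and shift for a fixed, non-zero divisor d.
inline FastDivSketch make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;
    }
    const uint64_t mp = ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1;
    return { static_cast<uint32_t>(mp), L };
}

// n / d using one multiply-high, one add, and one shift.
inline uint32_t fastdiv(uint32_t n, FastDivSketch f) {
    const uint32_t hi = static_cast<uint32_t>((uint64_t{n} * f.mp) >> 32);
    // 64-bit sum avoids overflow when hi + n exceeds 32 bits.
    return static_cast<uint32_t>((uint64_t{hi} + n) >> f.shift);
}

// n % d derived from the fast quotient; d is the divisor used in make_fastdiv.
inline uint32_t fastmod(uint32_t n, uint32_t d, FastDivSketch f) {
    return n - fastdiv(n, f) * d;
}
```

In a stride calculation, a flat index is typically split into a row and a column by dividing by the row width. With a width of 3, fastdiv maps flat index 10 to row 3 and fastmod maps it to column 1, without a hardware divide in the inner loop. The precomputation runs once per divisor on the host, so its cost is amortized over every work-item that uses the values.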

The Adreno-specific optimizations make this backend particularly valuable for on-device inference on Android smartphones, while Intel GPU support covers integrated graphics on laptops and desktops.
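
To make the Structure of Arrays idea concrete, the following sketch splits an array of q4_0 blocks into separate scale and quant arrays on the host. It assumes the standard ggml q4_0 block layout (one 16-bit half-precision scale followed by 16 bytes packing 32 4-bit values). The names BlockQ4_0, Q4_0SoA, and to_soa are hypothetical and are not taken from ggml-opencl.cpp; the backend's actual device-side layout may differ in details such as how values are ordered within the quant array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kBlockElems  = 32;               // values per q4_0 block
constexpr std::size_t kPackedBytes = kBlockElems / 2;  // two 4-bit values per byte

// Array-of-Structures form: scale and quants interleaved per block.
// The half-precision scale is kept as raw 16-bit data in this sketch.
struct BlockQ4_0 {
    uint16_t d;
    uint8_t  qs[kPackedBytes];
};
static_assert(sizeof(BlockQ4_0) == 18, "unexpected padding in BlockQ4_0");

// Structure-of-Arrays form: all scales contiguous, all quants contiguous.
struct Q4_0SoA {
    std::vector<uint16_t> scales;  // one entry per block
    std::vector<uint8_t>  quants;  // kPackedBytes entries per block
};

// Split n interleaved blocks into the two contiguous arrays.
inline Q4_0SoA to_soa(const BlockQ4_0 * blocks, std::size_t n) {
    assert(blocks != nullptr || n == 0);
    Q4_0SoA out;
    out.scales.resize(n);
    out.quants.resize(n * kPackedBytes);
    for (std::size_t i = 0; i < n; ++i) {
        out.scales[i] = blocks[i].d;
        std::memcpy(out.quants.data() + i * kPackedBytes, blocks[i].qs, kPackedBytes);
    }
    return out;
}
```

The motivation for this kind of split is that neighboring work-items reading quants touch contiguous, uniformly sized data instead of skipping over an 18-byte stride with a scale embedded in each block.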

Usage

Users initialize the OpenCL backend by calling ggml_backend_opencl_init(). The backend is typically discovered automatically by ggml_backend_load_all(). It supports both device buffers and host-pinned buffers for faster CPU-GPU transfers.

Code Reference

Source Location

GGML repo, file: src/ggml-opencl/ggml-opencl.cpp (11046 lines).

Signatures

ggml_backend_t ggml_backend_opencl_init(void);
bool ggml_backend_is_opencl(ggml_backend_t backend);
ggml_backend_buffer_type_t ggml_backend_opencl_buffer_type(void);
ggml_backend_buffer_type_t ggml_backend_opencl_host_buffer_type(void);
ggml_backend_reg_t ggml_backend_opencl_reg(void);

// Internal:
bool ggml_cl_compute_forward(ggml_backend_t backend, struct ggml_tensor * tensor);

Import

#include "ggml-opencl.h"

I/O Contract

Inputs

Parameter	Type	Required	Description
(none for init)	--	--	`ggml_backend_opencl_init` takes no parameters; it discovers and initializes the best available OpenCL device.
`backend`	`ggml_backend_t`	Yes	Backend handle for type-checking with `ggml_backend_is_opencl`.
`tensor`	`struct ggml_tensor *`	Yes	Tensor to compute in `ggml_cl_compute_forward`.

Outputs

Output	Type	Description
Backend handle	`ggml_backend_t`	Opaque handle to the initialized OpenCL backend.
Buffer type	`ggml_backend_buffer_type_t`	Buffer type for OpenCL device memory or host-pinned memory.
Registration handle	`ggml_backend_reg_t`	Backend registration for the auto-discovery system.

Usage Examples

#include "ggml-opencl.h"
#include "ggml-backend.h"

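// `graph` is assumed to be a struct ggml_cgraph * built elsewhere
// (for example with ggml_new_graph and ggml_build_forward_expand).

// Initialize the OpenCL backend
ggml_backend_t ocl_backend = ggml_backend_opencl_init();

// Check if initialization was successful
if (ocl_backend && ggml_backend_is_opencl(ocl_backend)) {
    // Use with scheduler.
    // Note: depending on the ggml version, ggml_backend_sched_new may take
    // additional arguments and may expect a CPU backend as the last entry
    // in the backend list; check ggml-backend.h for the version in use.
    ggml_backend_sched_t sched = ggml_backend_sched_new(
        &ocl_backend, NULL, 1, GGML_DEFAULT_GRAPH_SIZE, false);

    // Run the graph on the scheduled backend(s)
    ggml_backend_sched_graph_compute(sched, graph);

    // Release the scheduler before the backend it references
    ggml_backend_sched_free(sched);
    ggml_backend_free(ocl_backend);
}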
