Implementation:Ggml org Ggml Opencl backend

From Leeroopedia


Metadata

Field | Value
Page Type | Implementation (API Doc)
Knowledge Sources | GGML
Domains | ML_Infrastructure, Tensor_Computing, GPU_Computing
Last Updated | 2025-05-15 12:00 GMT

Overview

Main implementation of the OpenCL GPU backend, providing GPU acceleration with special optimizations for Qualcomm Adreno and Intel GPUs.

Description

ggml-opencl.cpp implements the full GGML backend interface over OpenCL in approximately 11,000 lines. The key components include:

  1. GPU detection and feature probing: Detects the GPU family (Adreno, Intel, or Unknown) and Adreno generation (A7X, A8X, X1E) for vendor-specific kernel selection. Parses OpenCL version strings and compiler versions for feature detection.
  2. Fast integer division: Implements the Granlund-Montgomery division-by-invariant technique (fastdiv_vals), which precomputes a multiplier and a shift on the host so that GPU kernels can replace expensive integer division with a multiply-and-shift. This matters for index and stride calculations, where the divisor is fixed for the whole kernel launch.
  3. Kernel management: Manages OpenCL kernel compilation from either embedded string literals or external .cl files. Kernels are compiled with vendor-specific options based on the detected GPU family and compiler version.
  4. Quantized tensor support: Supports specialized tensor formats for quantized types (q4_0, q8_0, q6_K, mxfp4) with Structure of Arrays (SoA) layout for better GPU memory access patterns on mobile GPUs.
  5. Backend interface: Implements all standard backend callbacks including buffer management (ggml_cl_buffer), tensor operations, graph computation, and device memory queries.
  6. Profiling: Includes a ProfilingInfo class for measuring kernel execution times.
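
The fast integer division item above can be illustrated with a small host-side sketch. This is not the code from ggml-opencl.cpp: the names FastDiv, make_fastdiv, fast_div, and fast_mod are hypothetical, and the field layout of the real fastdiv_vals may differ. It only shows the general multiply-and-shift idea for 32-bit unsigned values, with the "kernel side" written in plain C++ instead of OpenCL C.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not the layout used by ggml-opencl.cpp.
struct FastDiv {
    uint32_t mp;  // precomputed multiplier
    uint32_t L;   // shift amount, ceil(log2(d))
    uint32_t d;   // original divisor, kept for computing remainders
};

// Host side: precompute the multiplier and shift once per divisor.
inline FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;
    }
    // mp = floor(2^32 * (2^L - d) / d) + 1, which always fits in 32 bits.
    uint64_t mp = (uint64_t{1} << 32) * ((uint64_t{1} << L) - d) / d + 1;
    return FastDiv{static_cast<uint32_t>(mp), L, d};
}

// Kernel side (plain C++ here): computes n / d without a divide instruction.
inline uint32_t fast_div(uint32_t n, const FastDiv& f) {
    // High 32 bits of the 64-bit product, the value OpenCL's mul_hi would return.
    uint64_t hi = (uint64_t{n} * f.mp) >> 32;
    // The sum is formed in 64 bits so it cannot overflow for large n.
    return static_cast<uint32_t>((hi + n) >> f.L);
}

// Remainder derived from the quotient, as needed when unflattening an index.
inline uint32_t fast_mod(uint32_t n, const FastDiv& f) {
    return n - fast_div(n, f) * f.d;
}
```

A typical use would be to call make_fastdiv once per tensor dimension on the host, pass the resulting values as kernel arguments, and have each work-item recover multi-dimensional indices from a flat index with fast_div and fast_mod. A kernel that keeps the sum in 32 bits would need the dividend to stay small enough not to overflow; the sketch sidesteps that by widening to 64 bits.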

The Adreno-specific optimizations make this backend particularly valuable for on-device inference on Android smartphones, while Intel GPU support covers integrated graphics on laptops and desktops.
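
The Structure of Arrays layout mentioned for quantized tensors can be sketched on the host side as follows. The q4_0 block shape (one fp16 scale plus 32 four-bit quants packed into 16 bytes) follows the standard GGML definition; the Q4_0SoA container and the to_soa function are hypothetical names introduced here, and the actual backend may additionally reorder data within the quant array, which this sketch does not attempt.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr int QK4_0 = 32;  // weights per q4_0 block

// Array-of-structures form: each block interleaves its scale with its quants.
struct BlockQ4_0 {
    uint16_t d;              // fp16 scale, kept as raw bits in this sketch
    uint8_t  qs[QK4_0 / 2];  // two 4-bit values per byte
};
static_assert(sizeof(BlockQ4_0) == 18, "unexpected padding in BlockQ4_0");

// Hypothetical SoA container: scales and quants live in separate arrays,
// so neighbouring work-items read neighbouring addresses of the same kind.
struct Q4_0SoA {
    std::vector<uint16_t> d;   // all scales, contiguous
    std::vector<uint8_t>  qs;  // all packed quants, contiguous
};

// Split an array of blocks into the two contiguous arrays.
inline Q4_0SoA to_soa(const std::vector<BlockQ4_0>& blocks) {
    Q4_0SoA out;
    out.d.resize(blocks.size());
    out.qs.resize(blocks.size() * (QK4_0 / 2));
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        out.d[i] = blocks[i].d;
        std::memcpy(&out.qs[i * (QK4_0 / 2)], blocks[i].qs, QK4_0 / 2);
    }
    return out;
}
```

In a GPU setting the two arrays would be uploaded as separate device buffers, so a kernel can fetch scales and quants with independent, regularly strided loads instead of stepping through 18-byte interleaved records.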

Usage

Users initialize the OpenCL backend by calling ggml_backend_opencl_init(). The backend is typically discovered automatically by ggml_backend_load_all(). It supports both device buffers and host-pinned buffers for faster CPU-GPU transfers.

Code Reference

Source Location

GGML repo, file: src/ggml-opencl/ggml-opencl.cpp (11046 lines).

Signatures

ggml_backend_t ggml_backend_opencl_init(void);
bool ggml_backend_is_opencl(ggml_backend_t backend);
ggml_backend_buffer_type_t ggml_backend_opencl_buffer_type(void);
ggml_backend_buffer_type_t ggml_backend_opencl_host_buffer_type(void);
ggml_backend_reg_t ggml_backend_opencl_reg(void);

// Internal:
bool ggml_cl_compute_forward(ggml_backend_t backend, struct ggml_tensor * tensor);

Import

#include "ggml-opencl.h"

I/O Contract

Inputs

Parameter | Type | Required | Description
(none for init) | -- | -- | ggml_backend_opencl_init takes no parameters; it discovers and initializes the best available OpenCL device.
backend | ggml_backend_t | Yes | Backend handle for type-checking with ggml_backend_is_opencl.
tensor | struct ggml_tensor * | Yes | Tensor to compute in ggml_cl_compute_forward.

Outputs

Output | Type | Description
Backend handle | ggml_backend_t | Opaque handle to the initialized OpenCL backend.
Buffer type | ggml_backend_buffer_type_t | Buffer type for OpenCL device memory or host-pinned memory.
Registration handle | ggml_backend_reg_t | Backend registration for the auto-discovery system.

Usage Examples

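#include "ggml-opencl.h"
#include "ggml-backend.h"
#include "ggml-cpu.h"

// Assumes `graph` is a struct ggml_cgraph * that has already been built.

// Initialize the OpenCL backend
ggml_backend_t ocl_backend = ggml_backend_opencl_init();

// Check that initialization succeeded before using the handle
if (ocl_backend && ggml_backend_is_opencl(ocl_backend)) {
    // The scheduler expects the last backend in the list to be a CPU backend,
    // which also serves as the fallback for unsupported operations
    ggml_backend_t cpu_backend = ggml_backend_cpu_init();
    ggml_backend_t backends[] = { ocl_backend, cpu_backend };

    // Backends with a lower index are given priority.
    // Note: newer ggml revisions take an additional trailing `bool op_offload` argument.
    ggml_backend_sched_t sched = ggml_backend_sched_new(
        backends, NULL, 2, GGML_DEFAULT_GRAPH_SIZE, false);

    ggml_backend_sched_graph_compute(sched, graph);

    // Free the scheduler first, then the backends it referenced
    ggml_backend_sched_free(sched);
    ggml_backend_free(cpu_backend);
    ggml_backend_free(ocl_backend);
}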
