Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Ggml org Ggml Sycl element wise

From Leeroopedia


Knowledge Sources
Domains ML_Infrastructure, GPU_Compute
Last Updated 2025-05-15 12:00 GMT

Overview

SYCL element-wise operation kernels implementing the full set of activation functions, unary operations, GLU variants, and miscellaneous per-element transformations for neural network inference.

Description

element_wise.cpp is one of the largest kernel files in the SYCL backend, implementing the broad family of element-wise operations required for transformer inference. The architecture uses templated operation functions dispatched through generic kernel launchers:

  • Activation functions: op_gelu (GELU with tanh approximation), op_gelu_erf (exact GELU using error function), op_gelu_quick, op_silu (SiLU/Swish), op_relu, op_sigmoid, op_hardsigmoid, op_hardswish, op_softplus, op_tanh, op_elu, op_leaky_relu
  • Unary math operations: op_sgn, op_abs, op_neg, op_step, op_sqrt, op_sqr, op_sin, op_cos, op_exp, op_log, op_floor, op_ceil, op_round, op_trunc
  • GLU (Gated Linear Unit) variants: geglu, geglu_erf, geglu_quick, reglu, swiglu, swiglu_oai -- each combining a gate activation with element-wise multiplication
  • Other operations: acc_f32 (accumulate with offset), upscale, clamp, arange

The implementation uses two key macros -- SYCL_GLOBAL_ID_LOOP for work-item iteration and SYCL_LOCAL_ID_CALC for index computation -- and dispatches operations through sycl::parallel_for with configurable block sizes from the presets module.

Usage

Called from the main SYCL backend dispatcher for all unary and element-wise operations in the compute graph. Each public function (e.g., ggml_sycl_gelu, ggml_sycl_silu) extracts tensor data pointers, determines launch configuration, and invokes the appropriate templated kernel.

Code Reference

Source Location

  • Repository: GGML
  • File: src/ggml-sycl/element_wise.cpp
  • Lines: 1216

Signatures

// Representative activation function templates
template<typename T> static __dpct_inline__ T op_gelu(T x);
template<typename T> static __dpct_inline__ T op_silu(T x);
template<typename T> static __dpct_inline__ T op_relu(T x);
template<typename T> static __dpct_inline__ T op_sigmoid(T x);

// Public entry points (selected subset)
void ggml_sycl_gelu(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_silu(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_relu(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_sigmoid(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_tanh(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_exp(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_log(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_sqrt(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_sqr(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_acc(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_upscale(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_clamp(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_arange(ggml_backend_sycl_context & ctx, ggml_tensor * dst);

// GLU variants
void ggml_sycl_geglu(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_reglu(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_swiglu(ggml_backend_sycl_context & ctx, ggml_tensor * dst);

I/O Contract

Inputs

Name Type Required Description
ctx ggml_backend_sycl_context & Yes SYCL backend context providing the device queue
dst ggml_tensor * Yes Destination tensor; source data accessed via dst->src[0] (and dst->src[1] for binary ops)

Outputs

Name Type Description
dst->data void * Result tensor with the element-wise operation applied

Usage Examples

// Applying GELU activation to a tensor:
ggml_sycl_gelu(sycl_ctx, gelu_output_tensor);

// Applying SiLU (Swish) activation:
ggml_sycl_silu(sycl_ctx, silu_output_tensor);

// Clamping values to a range:
ggml_sycl_clamp(sycl_ctx, clamp_output_tensor);

Related Pages

Implements Principle

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment