Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Ggml org Ggml Sycl norm

From Leeroopedia


Knowledge Sources
Domains ML_Infrastructure, GPU_Compute
Last Updated 2025-05-15 12:00 GMT

Overview

SYCL normalization kernels implementing LayerNorm, RMSNorm, GroupNorm, L2Norm, and RMSNorm backward pass for transformer inference.

Description

norm.cpp implements the essential normalization operations used in transformer layers for the SYCL backend. The file provides five normalization variants:

  • norm_f32 (LayerNorm): Computes per-row mean and variance using a two-pass approach (accumulate sum and sum-of-squares in a single pass via sycl::float2), then normalizes each element as (x - mean) * rsqrt(var + eps). Supports strided (non-contiguous) inputs via calculate_offset helpers.
  • group_norm_f32: Computes statistics over groups of elements spanning multiple rows, used in convolutional architectures. Accumulates group mean and variance, then normalizes.
  • rms_norm_f32 (RMSNorm): Computes root-mean-square normalization without mean subtraction: x * rsqrt(mean(x^2) + eps). This is the normalization used by LLaMA and most modern LLMs.
  • rms_norm_back_f32: Gradient computation for RMSNorm backward pass, used during fine-tuning.
  • l2_norm_f32: L2 normalization that divides each element by the L2 norm of the row, with epsilon for numerical stability.

All kernels use warp-level reductions (warp_reduce_sum) for efficient parallel reduction within subgroups, with optional shared memory cross-warp reduction for work-groups larger than one warp. Block sizes are dynamically selected from the presets module based on the number of columns.

Usage

Called from the main SYCL backend when the compute graph contains normalization operations (GGML_OP_NORM, GGML_OP_RMS_NORM, GGML_OP_GROUP_NORM, GGML_OP_L2_NORM, GGML_OP_RMS_NORM_BACK).

Code Reference

Source Location

  • Repository: GGML
  • File: src/ggml-sycl/norm.cpp
  • Lines: 654

Signatures

// Core kernel functions
static void norm_f32(const float* x, float* dst, const int ncols,
    const int64_t stride_row, const int64_t stride_channel, const int64_t stride_sample,
    const float eps, const sycl::nd_item<3>& item_ct1, sycl::float2* s_sum, int block_size);

static void group_norm_f32(const float* x, float* dst, const int group_size,
    const int ne_elements, const float eps,
    const sycl::nd_item<3>& item_ct1, float* s_sum, int block_size);

static void rms_norm_f32(const float* x, float* dst, const int ncols, const float eps, ...);

// Public dispatch functions
void ggml_sycl_op_norm(ggml_backend_sycl_context& ctx, ggml_tensor* dst);
void ggml_sycl_op_group_norm(ggml_backend_sycl_context& ctx, ggml_tensor* dst);
void ggml_sycl_op_rms_norm(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_op_rms_norm_back(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
void ggml_sycl_op_l2_norm(ggml_backend_sycl_context& ctx, ggml_tensor* dst);

I/O Contract

Inputs

Name Type Required Description
ctx ggml_backend_sycl_context & Yes SYCL backend context providing the device queue
dst ggml_tensor * Yes Destination tensor; source accessed via dst->src[0], epsilon from dst->op_params

Outputs

Name Type Description
dst->data void * Normalized output tensor (f32)

Usage Examples

// RMSNorm for transformer layer normalization:
ggml_sycl_op_rms_norm(sycl_ctx, rms_norm_output);

// LayerNorm:
ggml_sycl_op_norm(sycl_ctx, norm_output);

// Group normalization for convolutional layers:
ggml_sycl_op_group_norm(sycl_ctx, group_norm_output);

Related Pages

Implements Principle

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment