Implementation:Ggml org Ggml Sycl rope

Knowledge Sources	GGML
Domains	ML_Infrastructure, GPU_Compute
Last Updated	2025-05-15 12:00 GMT

Overview

SYCL rotary position embedding (RoPE) kernels supporting normal, NeoX, and multi-rope (mRoPE) variants with YaRN scaling for transformer positional encoding.

Description

rope.cpp implements RoPE, the positional encoding used by most modern large language models, for the SYCL backend. The file provides three kernel variants:

rope_norm: Standard RoPE applying sinusoidal rotation to pairs of adjacent elements (i, i+1) in the embedding dimension. For pair index k = i/2 it computes theta = position * theta_scale^k, where theta_scale = freq_base^(-2/n_dims), and applies the rotation matrix [cos(theta), -sin(theta); sin(theta), cos(theta)] to the pair.

rope_neox: NeoX-style RoPE where the rotation pairs elements at positions (i, i+n_dims/2) rather than adjacent positions. This split-half (non-interleaved) pairing is used by GPT-NeoX and its derivatives.

rope_multi (mRoPE): Multi-dimensional RoPE supporting up to 4 section dimensions, where different dimension ranges use separate position indices. Used by models that apply different positional encodings to different parts of the embedding.
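
The difference between the adjacent pairing of rope_norm and the split-half pairing of rope_neox can be illustrated with a small host-side reference. This is a minimal sketch, not the SYCL kernel: rope_reference and rope_pairing are hypothetical names, theta_scale is assumed to be freq_base^(-2/n_dims), and frequency factors, YaRN scaling, and strides are left out.

```cpp
#include <cmath>
#include <vector>

// Hypothetical host-side reference for one embedding row; not the SYCL kernel.
enum class rope_pairing { adjacent, split_half };

std::vector<float> rope_reference(const std::vector<float> & x, int n_dims, int pos,
                                  float freq_base, rope_pairing pairing) {
    std::vector<float> out = x;  // elements at index >= n_dims pass through unchanged
    // Assumption: standard RoPE frequency schedule.
    const float theta_scale = std::pow(freq_base, -2.0f / n_dims);
    for (int p = 0; p < n_dims / 2; ++p) {
        const float theta = pos * std::pow(theta_scale, static_cast<float>(p));
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        // adjacent:   pair p is (2p, 2p + 1)      -> rope_norm style
        // split_half: pair p is (p, p + n_dims/2) -> rope_neox style
        const int i0 = pairing == rope_pairing::adjacent ? 2 * p     : p;
        const int i1 = pairing == rope_pairing::adjacent ? 2 * p + 1 : p + n_dims / 2;
        out[i0] = x[i0] * c - x[i1] * s;
        out[i1] = x[i0] * s + x[i1] * c;
    }
    return out;
}
```

Both pairings apply the same rotation angles; only the choice of which two elements form a pair changes, which is why weights trained with one layout cannot be used with the other without permuting the embedding dimension.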

All variants support:

YaRN scaling: The rope_yarn function implements the YaRN (Yet another RoPE extensioN) algorithm for context length extrapolation, computing corrected scaling via ramp interpolation between extrapolated and interpolated theta values.
Frequency factors: Optional per-dimension frequency scaling factors (has_ff template parameter).
Both f32 and f16 data types: Templated on the data type T.
Strided inputs: Uses stride parameters (s1, s2) for non-contiguous tensor access.
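
The YaRN blend described in the list above can be sketched on the host as follows. This is a hedged illustration modeled on the rope_yarn signature shown under Signatures: the names ending in _sketch are hypothetical, and the linear ramp over corr_dims and the logarithmic magnitude correction are assumptions about the upstream behavior rather than a copy of the kernel.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Stand-in for the rope_corr_dims helper struct (hypothetical name).
struct corr_dims_sketch { float v[2]; };

// Ramp weight: 1 below `low`, 0 above `high`, linear in between.
// The pair index is i0 / 2 because each rotation consumes two elements.
static float yarn_ramp_sketch(float low, float high, int64_t i0) {
    const float y = (static_cast<float>(i0 / 2) - low) / std::max(0.001f, high - low);
    return 1.0f - std::min(1.0f, std::max(0.0f, y));
}

static void rope_yarn_sketch(float theta_extrap, float freq_scale, corr_dims_sketch corr_dims,
                             int64_t i0, float ext_factor, float mscale,
                             float * cos_theta, float * sin_theta) {
    // Interpolated angle: positions compressed by freq_scale.
    const float theta_interp = freq_scale * theta_extrap;
    float theta = theta_interp;
    if (ext_factor != 0.0f) {
        // Blend interpolated and extrapolated angles per dimension pair.
        const float mix = yarn_ramp_sketch(corr_dims.v[0], corr_dims.v[1], i0) * ext_factor;
        theta = theta_interp * (1.0f - mix) + theta_extrap * mix;
        // Assumed magnitude correction that grows with the interpolation ratio.
        mscale *= 1.0f + 0.1f * std::log(1.0f / freq_scale);
    }
    *cos_theta = std::cos(theta) * mscale;
    *sin_theta = std::sin(theta) * mscale;
}
```

With ext_factor set to 0 the function reduces to plain linear position interpolation; with a nonzero ext_factor, low-index (high-frequency) pairs keep the extrapolated angle while high-index pairs fall back to the interpolated one.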

The public entry point ggml_sycl_rope dispatches to the correct kernel variant based on the rope_type parameter (GGML_ROPE_TYPE_NORM, GGML_ROPE_TYPE_NEOX, GGML_ROPE_TYPE_MROPE, GGML_ROPE_TYPE_VISION).

Usage

ggml_sycl_rope is called by the main SYCL backend for each GGML_OP_ROPE operation in the compute graph. During transformer inference this happens in every attention layer, where positional encoding is applied to the query and key tensors.

Code Reference

Source Location

Repository: GGML
File: src/ggml-sycl/rope.cpp
Lines: 477

Signatures

// Helper structures
struct rope_corr_dims { float v[2]; };
struct mrope_sections { int v[4]; };

// YaRN scaling
static void rope_yarn(float theta_extrap, float freq_scale, rope_corr_dims corr_dims,
    int64_t i0, float ext_factor, float mscale,
    float * cos_theta, float * sin_theta);

// Kernel variants
template <typename T, bool has_ff>
static void rope_norm(const T * x, T * dst, const int ne0, const int ne1,
    const int s1, const int s2, const int n_dims,
    const int32_t * pos, float freq_scale, float ext_factor, float attn_factor,
    const rope_corr_dims corr_dims, const float theta_scale, const float * freq_factors,
    const sycl::nd_item<3> & item_ct1);

template <typename T, bool has_ff>
static void rope_neox(const T * x, T * dst, const int ne0, const int ne1,
    const int s1, const int s2, const int n_dims, const int32_t * pos, ...);

// Public dispatch
void ggml_sycl_rope(ggml_backend_sycl_context & ctx, ggml_tensor * dst);

I/O Contract

Inputs

Name	Type	Required	Description
ctx	ggml_backend_sycl_context &	Yes	SYCL backend context providing the device queue
dst	ggml_tensor *	Yes	Destination tensor; dst->src[0] is the input, dst->src[1] holds the int32 position indices, and dst->src[2] holds the optional freq_factors

Outputs

Name	Type	Description
dst->data	void *	Tensor with rotary position embeddings applied (same shape as input)

Usage Examples

// Apply RoPE to query/key tensors during attention:
ggml_sycl_rope(sycl_ctx, rope_output_tensor);

// The tensor's op_params encode the rope configuration:
// - rope_type (NORM, NEOX, MROPE, VISION)
// - n_dims, freq_base, freq_scale, ext_factor, attn_factor
// - beta_fast, beta_slow (used to derive corr_dims for YaRN correction)
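
Because op_params is an array of int32 slots that also carries float values, the floats have to be read back by copying bytes rather than by casting. The sketch below shows that technique only; rope_params_sketch and decode_rope_params are hypothetical names, and the slot indices are assumed for illustration, so the authoritative layout should be checked against ggml's rope graph-building code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical decoded view of a RoPE node's parameters.
struct rope_params_sketch {
    int   n_dims;
    int   mode;
    float freq_base;
    float freq_scale;
    float ext_factor;
    float attn_factor;
};

// Slot indices are ASSUMED for illustration, not taken from this page.
rope_params_sketch decode_rope_params(const int32_t * op_params) {
    rope_params_sketch p{};
    p.n_dims = op_params[1];
    p.mode   = op_params[2];
    // Floats are stored bit-for-bit in int32 slots, so copy the bytes
    // instead of converting the integer value.
    std::memcpy(&p.freq_base,   op_params + 5, sizeof(float));
    std::memcpy(&p.freq_scale,  op_params + 6, sizeof(float));
    std::memcpy(&p.ext_factor,  op_params + 7, sizeof(float));
    std::memcpy(&p.attn_factor, op_params + 8, sizeof(float));
    return p;
}
```

Using std::memcpy avoids the undefined behavior of reading a float through an int32_t pointer cast, and compilers typically reduce it to a single register move.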

Related Pages

Implements Principle
