Implementation:Ggml org Ggml Sycl rope
| Knowledge Sources | |
|---|---|
| Domains | ML_Infrastructure, GPU_Compute |
| Last Updated | 2025-05-15 12:00 GMT |
Overview
SYCL rotary position embedding (RoPE) kernels supporting normal, NeoX, and multi-rope (mRoPE) variants with YaRN scaling for transformer positional encoding.
Description
rope.cpp implements, for the SYCL backend, the RoPE positional encoding used by most modern large language models. The file provides three kernel variants:
- rope_norm: Standard RoPE, which rotates pairs of adjacent elements (i, i+1) in the embedding dimension. For the pair starting at even index i0, the angle is theta = position * theta_scale^(i0/2), where theta_scale is typically freq_base^(-2/n_dims); each pair is multiplied by the rotation matrix [cos(theta), -sin(theta); sin(theta), cos(theta)].
- rope_neox: NeoX-style RoPE, which pairs element i with element i + n_dims/2 instead of its immediate neighbor. This split-half (non-interleaved) layout is used by GPT-NeoX and its derivatives.
- rope_multi (mRoPE): Multi-dimensional RoPE supporting up to 4 section dimensions, where different dimension ranges use separate position indices. Used by models that apply different positional encodings to different parts of the embedding.
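The difference between the adjacent pairing of rope_norm and the split-half pairing of rope_neox can be illustrated with a small host-side reference. This is a sketch under stated assumptions, not the SYCL kernel: the names rope_reference and RopePairing and the freq_base parameter are hypothetical, theta_scale is assumed to be freq_base^(-2/n_dims), elements beyond n_dims are assumed to pass through unchanged, and YaRN scaling and frequency factors are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference for one row; not part of rope.cpp.
enum class RopePairing { Adjacent, SplitHalf };

std::vector<float> rope_reference(const std::vector<float> & x, int n_dims,
                                  int pos, float freq_base, RopePairing pairing) {
    assert(n_dims % 2 == 0 && static_cast<std::size_t>(n_dims) <= x.size());
    std::vector<float> dst = x;  // assumption: elements past n_dims are copied as-is
    // Assumption: per-pair frequency decay derived from freq_base.
    const float theta_scale = std::pow(freq_base, -2.0f / n_dims);
    for (int k = 0; k < n_dims / 2; ++k) {
        const float theta = pos * std::pow(theta_scale, static_cast<float>(k));
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        // Adjacent: (2k, 2k+1). SplitHalf: (k, k + n_dims/2).
        const int i = pairing == RopePairing::Adjacent ? 2 * k : k;
        const int j = pairing == RopePairing::Adjacent ? 2 * k + 1 : k + n_dims / 2;
        dst[i] = x[i] * c - x[j] * s;
        dst[j] = x[i] * s + x[j] * c;
    }
    return dst;
}
```

Calling rope_reference with the same input and position but different RopePairing values rotates by the same angles; only the choice of which two elements form each pair changes.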
All variants support:
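- YaRN scaling: The rope_yarn function implements the YaRN (Yet another RoPE extensioN) algorithm for context-length extrapolation. It blends the interpolated angle (freq_scale * theta_extrap) with the extrapolated angle theta_extrap using a per-dimension ramp bounded by corr_dims and weighted by ext_factor, then writes the resulting cosine and sine scaled by mscale.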
- Frequency factors: Optional per-dimension frequency scaling factors (has_ff template parameter).
- Both f32 and f16 data types: Templated on the data type T.
- Strided inputs: Uses stride parameters (s1, s2) for non-contiguous tensor access.
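The YaRN blending step can be sketched on the host as follows. This is a minimal illustration modeled on the rope_yarn signature, not a copy of the SYCL code: the names yarn_ramp_sketch, yarn_cos_sin_sketch, and YarnCosSin are hypothetical, corr_dims is passed as two floats (low, high), and the magnitude correction 1 + 0.1 * ln(1 / freq_scale) is an assumption based on the common YaRN formulation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical return type; rope_yarn itself writes through two pointers.
struct YarnCosSin { float cos_theta; float sin_theta; };

// Ramp that is 1 below `low`, 0 above `high`, and linear in between,
// evaluated at the pair index i0 / 2.
float yarn_ramp_sketch(float low, float high, int64_t i0) {
    const float y = (static_cast<float>(i0 / 2) - low) / std::max(0.001f, high - low);
    return 1.0f - std::min(1.0f, std::max(0.0f, y));
}

YarnCosSin yarn_cos_sin_sketch(float theta_extrap, float freq_scale,
                               float low, float high, int64_t i0,
                               float ext_factor, float mscale) {
    assert(freq_scale > 0.0f);
    const float theta_interp = freq_scale * theta_extrap;  // position-interpolated angle
    float theta = theta_interp;
    if (ext_factor != 0.0f) {
        // Mix toward the extrapolated angle where the ramp is high.
        const float mix = yarn_ramp_sketch(low, high, i0) * ext_factor;
        theta = theta_interp * (1.0f - mix) + theta_extrap * mix;
        // Assumed magnitude correction for interpolation.
        mscale *= 1.0f + 0.1f * std::log(1.0f / freq_scale);
    }
    return { std::cos(theta) * mscale, std::sin(theta) * mscale };
}
```

With ext_factor set to 0 the sketch reduces to plain linear position interpolation: the angle is simply freq_scale * theta_extrap and mscale is applied unchanged.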
The public entry point ggml_sycl_rope dispatches to the correct kernel variant based on the rope_type parameter (GGML_ROPE_TYPE_NORM, GGML_ROPE_TYPE_NEOX, GGML_ROPE_TYPE_MROPE, GGML_ROPE_TYPE_VISION).
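The dispatch decision can be pictured as a small selector over the rope_type value. The sketch below is illustrative only: select_rope_kernel, RopeKernel, and the kRope* constants are hypothetical names, and the numeric values are stand-ins for the GGML_ROPE_TYPE_* constants, whose authoritative definitions are in ggml.h.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in values for illustration; check ggml.h for the real constants.
constexpr int32_t kRopeNeox   = 2;
constexpr int32_t kRopeMrope  = 8;
constexpr int32_t kRopeVision = 24;

// Hypothetical tag for the kernel family that would be launched.
enum class RopeKernel { Norm, Neox, Multi, Vision };

RopeKernel select_rope_kernel(int32_t rope_type) {
    assert(rope_type >= 0);
    // Vision is tested first because, with these stand-in values,
    // it shares a bit with the mRoPE flag.
    if (rope_type == kRopeVision) return RopeKernel::Vision;
    if (rope_type & kRopeMrope)   return RopeKernel::Multi;
    if (rope_type & kRopeNeox)    return RopeKernel::Neox;
    return RopeKernel::Norm;
}
```

Once a family is chosen, a real dispatcher would additionally branch on the tensor data type (f32 or f16) and on whether frequency factors are present, which correspond to the T and has_ff template parameters.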
Usage
The main SYCL backend calls ggml_sycl_rope when the compute graph contains a GGML_OP_ROPE operation. During transformer inference this typically happens in every attention layer, where positional encoding is applied to the query and key tensors.
Code Reference
Source Location
- Repository: GGML
- File: src/ggml-sycl/rope.cpp
- Lines: 477
Signatures
// Helper structures
struct rope_corr_dims { float v[2]; };
struct mrope_sections { int v[4]; };
// YaRN scaling
static void rope_yarn(float theta_extrap, float freq_scale, rope_corr_dims corr_dims,
int64_t i0, float ext_factor, float mscale,
float * cos_theta, float * sin_theta);
// Kernel variants
template <typename T, bool has_ff>
static void rope_norm(const T * x, T * dst, const int ne0, const int ne1,
const int s1, const int s2, const int n_dims,
const int32_t * pos, float freq_scale, float ext_factor, float attn_factor,
const rope_corr_dims corr_dims, const float theta_scale, const float * freq_factors,
const sycl::nd_item<3> & item_ct1);
template <typename T, bool has_ff>
static void rope_neox(const T * x, T * dst, const int ne0, const int ne1,
const int s1, const int s2, const int n_dims, const int32_t * pos, ...);
// Public dispatch
void ggml_sycl_rope(ggml_backend_sycl_context & ctx, ggml_tensor * dst);
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ctx | ggml_backend_sycl_context & | Yes | SYCL backend context providing the device queue |
| dst | ggml_tensor * | Yes | Destination tensor of the GGML_OP_ROPE node; dst->src[0] is the input, dst->src[1] holds the position indices, and dst->src[2] holds the optional freq_factors |
Outputs
| Name | Type | Description |
|---|---|---|
| dst->data | void * | Tensor with rotary position embeddings applied (same shape as input) |
Usage Examples
// Apply RoPE to query/key tensors during attention:
ggml_sycl_rope(sycl_ctx, rope_output_tensor);
// The tensor's op_params encode rope configuration:
// - rope_type (NORM, NEOX, MROPE, VISION)
// - n_dims, freq_scale, ext_factor, attn_factor
// - corr_dims for YaRN correction