Implementation:Turboderp org Exllamav2 RoPE

Knowledge Sources	Turboderp_org_Exllamav2
Domains	Positional_Encoding, Model_Architecture
Last Updated	2026-02-15 00:00 GMT

Overview

The RoPE module provides functions for computing Rotary Position Embedding inverse frequency tensors, supporting multiple RoPE variants including default, Su-style, Llama 3.1, and YaRN scaling methods.

Description

Rotary Position Embedding (RoPE) encodes position information by rotating pairs of dimensions in the query and key vectors. The rotation frequencies are determined by inverse frequency tensors computed by this module. The main dispatcher function get_rope_params() selects the appropriate variant based on the model configuration's alt_rope_method field.
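The rotation described above can be illustrated with a small pure-Python sketch. It is not the ExLlamaV2 implementation: the helper names (`inv_freq_default`, `apply_rope`, `dot`) are hypothetical, and pairing dimension i with dimension i + head_dim/2 is an assumed convention chosen for illustration. The sketch shows the property that makes RoPE useful: the query-key dot product depends only on the relative offset between the two positions.

```python
import math

def inv_freq_default(head_dim: int, base: float = 10000.0) -> list[float]:
    """inv_freq[i] = 1 / base^(2i / head_dim) for i in 0 .. head_dim/2 - 1."""
    return [1.0 / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]

def apply_rope(x: list[float], position: int, inv_freq: list[float]) -> list[float]:
    """Rotate each pair (x[i], x[i + half]) by the angle position * inv_freq[i].

    The half-split pairing is an assumption made for this sketch.
    """
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        angle = position * inv_freq[i]
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s          # standard 2D rotation
        out[i + half] = a * s + b * c
    return out

def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

q = [0.5, -1.0, 0.25, 2.0]
k = [1.5, 0.3, -0.7, 0.9]
inv_freq = inv_freq_default(head_dim=4)

# Both position pairs have a relative offset of 4, so the scores should agree.
score_a = dot(apply_rope(q, 7, inv_freq), apply_rope(k, 3, inv_freq))
score_b = dot(apply_rope(q, 104, inv_freq), apply_rope(k, 100, inv_freq))
print(abs(score_a - score_b) < 1e-9)
```

Because each pair is rotated rather than scaled, the vector norm is also preserved, so RoPE changes the direction of the query and key vectors but not their magnitude.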

The supported variants are:

get_rope_params_default() - Standard RoPE with base frequency and optional alpha scaling. Computes inv_freq = 1 / (base ^ (2i / head_dim)) for i = 0 .. head_dim/2 - 1, where ^ denotes exponentiation, with an optional scale_alpha_value adjustment to the base.

get_rope_params_su() - Su-style (Phi-3) scaling that uses separate long and short scaling factors from the model config. When the configured context length exceeds the original training length, it applies the long factors together with a logarithmic scaling multiplier: scaling_factor = sqrt(1 + log(a/b) / log(b)), where a is max_seq_len and b is original_max_seq_len.

get_rope_params_llama3() - Llama 3.1 frequency-dependent smoothing that applies different scaling based on wavelength. High-frequency components (short wavelength) are kept unchanged, low-frequency components are divided by the scale factor, and mid-range frequencies use a smooth interpolation between the two.

get_rope_params_yarn() - YaRN (Yet another RoPE extensioN) scaling adapted from the HuggingFace transformers implementation. Uses extrapolation for high-frequency dimensions and interpolation for low-frequency dimensions, with a linear ramp blending region determined by beta_fast and beta_slow parameters. The attention scaling factor is computed as 0.1 * log(factor) + 1.0.
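As a rough illustration of two of the variants listed above, the following pure-Python sketch computes default inverse frequencies and then applies Llama 3.1 style wavelength-dependent smoothing. The function names are hypothetical, plain lists stand in for tensors, and the exact form of the alpha adjustment (the common NTK-aware exponent head_dim / (head_dim - 2)) is an assumption, since the text above only states that the base is adjusted. The default arguments are the values published in Llama 3.1 configs and are used here purely as example inputs.

```python
import math

def default_inv_freq(head_dim: int, base: float = 10000.0, alpha: float = 1.0) -> list[float]:
    """Standard RoPE frequencies: 1 / base^(2i / head_dim)."""
    if alpha != 1.0:
        # Assumed NTK-aware form of the alpha adjustment to the base.
        base = base * alpha ** (head_dim / (head_dim - 2))
    return [1.0 / base ** (2 * i / head_dim) for i in range(head_dim // 2)]

def llama3_inv_freq(
    inv_freq: list[float],
    factor: float = 8.0,
    low_freq_factor: float = 1.0,
    high_freq_factor: float = 4.0,
    original_max_pos: int = 8192,
) -> list[float]:
    """Scale each frequency according to its wavelength."""
    low_freq_wavelen = original_max_pos / low_freq_factor
    high_freq_wavelen = original_max_pos / high_freq_factor
    out = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            out.append(f)                      # high frequency: unchanged
        elif wavelen > low_freq_wavelen:
            out.append(f / factor)             # low frequency: divided by the factor
        else:
            # Mid range: blend smoothly between the scaled and unscaled value.
            smooth = (original_max_pos / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            out.append((1 - smooth) * f / factor + smooth * f)
    return out

base_freqs = default_inv_freq(head_dim=128, base=500000.0)
scaled = llama3_inv_freq(base_freqs)
print(len(scaled), scaled[0] == base_freqs[0])
```

The smoothing never increases a frequency: each output lies between the original value and the original divided by the factor, which is what stretches the usable context for the slowly rotating dimensions while leaving local position resolution intact.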

All functions return a tuple of (inv_freq, scaling_factor). The dispatcher also applies rope_freq_half conversion if the architecture requires half-precision frequencies.
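To make the (inv_freq, scaling_factor) return shape concrete, here is a hedged pure-Python sketch of a YaRN-style computation that returns such a tuple, plus the Su scaling factor formula quoted earlier. The names `yarn_params` and `su_scaling_factor` are hypothetical, lists stand in for tensors, and the beta_fast = 32 and beta_slow = 1 defaults are assumptions taken from commonly used YaRN settings rather than from this module. Returning 1.0 from the Su helper when the context does not exceed the original length is also an assumption.

```python
import math

def su_scaling_factor(max_seq_len: int, original_max_seq_len: int) -> float:
    """sqrt(1 + log(a/b) / log(b)) when the context exceeds the original length."""
    a, b = max_seq_len, original_max_seq_len
    if a <= b:
        return 1.0  # assumed: no extra scaling within the original length
    return math.sqrt(1 + math.log(a / b) / math.log(b))

def yarn_params(
    head_dim: int,
    base: float,
    factor: float,
    original_max_pos: int,
    beta_fast: float = 32.0,
    beta_slow: float = 1.0,
) -> tuple[list[float], float]:
    """Return (inv_freq, attention scaling factor) for YaRN-style scaling."""

    def correction_dim(num_rotations: float) -> float:
        # Dimension index at which a frequency completes `num_rotations`
        # full rotations over the original context length.
        return (
            head_dim
            * math.log(original_max_pos / (num_rotations * 2 * math.pi))
            / (2 * math.log(base))
        )

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), head_dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp

    inv_freq = []
    for i in range(head_dim // 2):
        extrapolated = 1.0 / base ** (2 * i / head_dim)   # unscaled frequency
        interpolated = extrapolated / factor              # position-interpolated frequency
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        inv_freq.append(interpolated * ramp + extrapolated * (1.0 - ramp))

    return inv_freq, 0.1 * math.log(factor) + 1.0

inv_freq, attn_factor = yarn_params(head_dim=128, base=10000.0, factor=4.0, original_max_pos=4096)
print(len(inv_freq), attn_factor > 1.0)
```

Below the ramp the frequencies are left untouched (extrapolation), above it they are divided by the factor (interpolation), and the linear ramp blends the two in between.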

Usage

Use get_rope_params() when initializing attention layers to obtain the inverse frequency tensor and scaling factor for RoPE. The function is called during model loading and the resulting tensors are cached for use in every forward pass. The appropriate variant is selected automatically from the model's configuration.

Code Reference

Source Location

Repository: Turboderp_org_Exllamav2
File: exllamav2/rope.py
Lines: 1-178

Signature

def get_rope_params(
    device: torch.Device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...

def get_rope_params_default(
    device: torch.Device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...

def get_rope_params_su(
    device: torch.Device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...

def get_rope_params_llama3(
    device: torch.Device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...

def get_rope_params_yarn(
    device: torch.Device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...

Import

from exllamav2.rope import get_rope_params

I/O Contract

get_rope_params()

Parameter	Type	Description
device	`torch.Device`	Target device for the returned inverse frequency tensor
cfg	`ExLlamaV2Config`	Model configuration containing head_dim, max_seq_len, alt_rope_method, scale_alpha_value, and variant-specific fields
base	`float`	Base frequency for computing inverse frequencies (typically 10000.0)

Return	Type	Description
inv_freq	`torch.Tensor`	Inverse frequency tensor of shape `(head_dim // 2,)` used for rotation
scaling_factor	`float`	Attention scaling factor (1.0 for no scaling)

Config Fields by Variant

Variant	Config Fields Used
default	head_dim, partial_rotary_factor, scale_alpha_value
su	head_dim, partial_rotary_factor, scale_alpha_value, max_seq_len, original_max_seq_len, scale_long_factor, scale_short_factor
llama3	head_dim, partial_rotary_factor, scale_alpha_value, l3_rope_factor, l3_rope_low_freq_factor, l3_rope_high_freq_factor, l3_rope_original_max_position_embeddings
yarn	head_dim, partial_rotary_factor, scale_alpha_value, max_seq_len, yarn_rope_original_max_position_embeddings, yarn_rope_factor

Usage Examples

import torch

from exllamav2.rope import get_rope_params

# Assumes `model` is an already loaded ExLlamaV2 model instance.
# Compute RoPE parameters for the model's configured variant
inv_freq, scaling_factor = get_rope_params(
    device=torch.device("cuda:0"),
    cfg=model.config,
    base=10000.0,
)

print(f"inv_freq shape: {inv_freq.shape}")      # e.g. torch.Size([64]) for head_dim=128
print(f"scaling_factor: {scaling_factor}")       # e.g. 1.0

# The inv_freq tensor is used to compute rotation angles:
# t = torch.arange(seq_len, device=inv_freq.device, dtype=inv_freq.dtype)
# freqs = torch.outer(t, inv_freq)
# emb = torch.cat((freqs, freqs), dim=-1)
# cos, sin = emb.cos(), emb.sin()

Related Pages

Turboderp_org_Exllamav2_ExLlamaV2PosEmbedding - Learned positional embeddings (alternative to RoPE)
Turboderp_org_Exllamav2_ExLlamaV2ParallelDecoder - Decoder block whose attention sublayer uses RoPE
Turboderp_org_Exllamav2_ExLlamaV2RMSNorm - Normalization applied before the attention layer that uses RoPE
