Implementation:Turboderp org Exllamav2 RoPE
| Knowledge Sources | |
|---|---|
| Domains | Positional_Encoding, Model_Architecture |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
The RoPE module provides functions for computing Rotary Position Embedding inverse frequency tensors, supporting multiple RoPE variants including default, Su-style, Llama 3.1, and YaRN scaling methods.
Description
Rotary Position Embedding (RoPE) encodes position information by rotating pairs of dimensions in the query and key vectors. The rotation frequencies are determined by inverse frequency tensors computed by this module. The main dispatcher function get_rope_params() selects the appropriate variant based on the model configuration's alt_rope_method field.
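As a rough illustration of the rotation described above, the following pure-Python sketch rotates adjacent pairs of dimensions by an angle of position times inverse frequency. The helper names `rotate_pairs` and `dot` are hypothetical and not part of exllamav2, and pairing adjacent dimensions is an assumption made for readability; real kernels may instead pair dimension i with dimension i + head_dim / 2.

```python
import math

def rotate_pairs(vec, position, inv_freq):
    """Rotate each (even, odd) pair of `vec` by the angle position * inv_freq[i].

    Hypothetical helper for illustration only; adjacent pairing is an assumption.
    """
    out = list(vec)
    for i, freq in enumerate(inv_freq):
        angle = position * freq
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out[2 * i] = x * c - y * s       # standard 2D rotation, first component
        out[2 * i + 1] = x * s + y * c   # standard 2D rotation, second component
    return out

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

# Toy query/key vectors with head_dim = 4, so two inverse frequencies
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
inv_freq = [1.0, 0.01]

# Same relative offset (2) at different absolute positions
score_near = dot(rotate_pairs(q, 5, inv_freq), rotate_pairs(k, 3, inv_freq))
score_far = dot(rotate_pairs(q, 12, inv_freq), rotate_pairs(k, 10, inv_freq))
```

Because each pair undergoes a pure rotation, the query-key dot product depends only on the difference between the two positions, which is why the two scores in the sketch agree up to floating-point error.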
The supported variants are:
- get_rope_params_default() - Standard RoPE with base frequency and optional alpha scaling. Computes inv_freq = 1 / (base ^ (2i / head_dim)) for i = 0, 1, ..., head_dim / 2 - 1, with an optional scale_alpha_value adjustment to the base.
- get_rope_params_su() - Su-style (Phi-3) scaling that uses separate long and short scaling factors from the model config. When the context exceeds the original training length, it applies the long factors together with a logarithmic scaling multiplier: scaling_factor = sqrt(1 + log(a/b) / log(b)), where a is the configured max_seq_len and b is original_max_seq_len.
- get_rope_params_llama3() - Llama 3.1 frequency-dependent smoothing that applies different scaling based on wavelength. High-frequency components (short wavelength) are kept unchanged, low-frequency components are divided by the scale factor, and mid-range frequencies use a smooth interpolation between the two.
- get_rope_params_yarn() - YaRN (Yet another RoPE extensioN) scaling adapted from the HuggingFace transformers implementation. Uses extrapolation for high-frequency dimensions and interpolation for low-frequency dimensions, with a linear ramp blending region determined by beta_fast and beta_slow parameters. The attention scaling factor is computed as 0.1 * log(factor) + 1.0.
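The closed-form expressions quoted in the list above can be checked with a small pure-Python sketch. The function names below are hypothetical and are not the module's API; the sketch works on plain lists rather than tensors, omits the scale_alpha_value adjustment, and assumes that the Su factor falls back to 1.0 when the context does not exceed the original length.

```python
import math

def default_inv_freq(head_dim, base=10000.0):
    """inv_freq[i] = 1 / base ** (2i / head_dim) for i in [0, head_dim / 2)."""
    return [1.0 / base ** (j / head_dim) for j in range(0, head_dim, 2)]

def su_scaling_factor(max_seq_len, original_max_seq_len):
    """sqrt(1 + log(a / b) / log(b)) with a = max_seq_len, b = original_max_seq_len.

    Returning 1.0 when a <= b is an assumption, not taken from the source text.
    """
    a, b = max_seq_len, original_max_seq_len
    if a <= b:
        return 1.0
    return math.sqrt(1.0 + math.log(a / b) / math.log(b))

def yarn_attention_factor(factor):
    """Attention scaling used by the YaRN variant: 0.1 * log(factor) + 1.0."""
    return 0.1 * math.log(factor) + 1.0

freqs = default_inv_freq(128)                  # 64 values, starting at 1.0
su_factor = su_scaling_factor(131072, 4096)    # 32x context extension
yarn_factor = yarn_attention_factor(4.0)
```

For a 32x extension from 4096 tokens, log(32) / log(4096) equals 5/12, so the Su factor is sqrt(17/12), roughly 1.19.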
All functions return a tuple of (inv_freq, scaling_factor). The dispatcher also applies rope_freq_half conversion if the architecture requires half-precision frequencies.
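To make the (inv_freq, scaling_factor) contract concrete, here is a minimal sketch of the Llama 3.1 wavelength-based smoothing described above, written over plain Python lists. The function name `llama3_smooth` is hypothetical, the default parameter values are illustrative ones commonly seen in Llama 3.1 configurations, and returning a scaling factor of 1.0 is an assumption.

```python
import math

def llama3_smooth(inv_freq, factor=8.0, low_freq_factor=1.0,
                  high_freq_factor=4.0, original_max_pos=8192):
    """Apply frequency-dependent scaling and return (inv_freq, scaling_factor).

    Hypothetical helper; defaults are illustrative assumptions.
    """
    low_freq_wavelen = original_max_pos / low_freq_factor
    high_freq_wavelen = original_max_pos / high_freq_factor
    out = []
    for freq in inv_freq:
        wavelen = 2.0 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High frequency (short wavelength): keep unchanged
            out.append(freq)
        elif wavelen > low_freq_wavelen:
            # Low frequency (long wavelength): divide by the scale factor
            out.append(freq / factor)
        else:
            # Mid range: blend smoothly between scaled and unscaled values
            smooth = (original_max_pos / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            out.append((1.0 - smooth) * freq / factor + smooth * freq)
    return out, 1.0  # assumed: this variant leaves attention scaling at 1.0

# One high-, one mid-, and one low-frequency component
sample = [1.0, 2.0 * math.pi / 4096, 2.0 * math.pi / 100000]
smoothed, scale = llama3_smooth(sample)
```

With these defaults the mid-range component (wavelength 4096) gets a blend weight of 1/3, so it ends up at 5/12 of its original value, between the untouched and fully scaled extremes.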
Usage
Use get_rope_params() when initializing attention layers to obtain the inverse frequency tensor and scaling factor for RoPE. The function is called during model loading and the resulting tensors are cached for use in every forward pass. The appropriate variant is selected automatically from the model's configuration.
Code Reference
Source Location
- Repository: Turboderp_org_Exllamav2
- File: exllamav2/rope.py
- Lines: 1-178
Signature
def get_rope_params(
    device: torch.device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...
def get_rope_params_default(
    device: torch.device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...
def get_rope_params_su(
    device: torch.device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...
def get_rope_params_llama3(
    device: torch.device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...
def get_rope_params_yarn(
    device: torch.device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...
Import
from exllamav2.rope import get_rope_params
I/O Contract
get_rope_params()
| Parameter | Type | Description |
|---|---|---|
| device | torch.device | Target device for the returned inverse frequency tensor |
| cfg | ExLlamaV2Config | Model configuration containing head_dim, max_seq_len, alt_rope_method, scale_alpha_value, and variant-specific fields |
| base | float | Base frequency for computing inverse frequencies (typically 10000.0) |
| Return | Type | Description |
|---|---|---|
| inv_freq | torch.Tensor | Inverse frequency tensor of shape (head_dim // 2,) used for rotation |
| scaling_factor | float | Attention scaling factor (1.0 for no scaling) |
Config Fields by Variant
| Variant | Config Fields Used |
|---|---|
| default | head_dim, partial_rotary_factor, scale_alpha_value |
| su | head_dim, partial_rotary_factor, scale_alpha_value, max_seq_len, original_max_seq_len, scale_long_factor, scale_short_factor |
| llama3 | head_dim, partial_rotary_factor, scale_alpha_value, l3_rope_factor, l3_rope_low_freq_factor, l3_rope_high_freq_factor, l3_rope_original_max_position_embeddings |
| yarn | head_dim, partial_rotary_factor, scale_alpha_value, max_seq_len, yarn_rope_original_max_position_embeddings, yarn_rope_factor |
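As a sketch of how the yarn row's fields could combine, the following pure-Python code follows the general shape of the HuggingFace-style YaRN computation: extrapolated and interpolated frequencies blended by a linear ramp. The function names are hypothetical, the beta_fast and beta_slow defaults of 32 and 1 are assumptions, and alpha scaling and partial rotary handling are left out.

```python
import math

def correction_dim(num_rotations, dim, base, original_max_pos):
    """Dimension index at which a component completes `num_rotations` turns
    over the original context length (hypothetical helper)."""
    return (dim * math.log(original_max_pos / (num_rotations * 2.0 * math.pi))
            / (2.0 * math.log(base)))

def yarn_inv_freq(head_dim, base, factor, original_max_pos,
                  beta_fast=32.0, beta_slow=1.0):
    """Return (inv_freq, attention_factor) for YaRN-style scaling.

    Illustrative sketch only; beta defaults are assumed.
    """
    low = max(math.floor(correction_dim(beta_fast, head_dim, base, original_max_pos)), 0)
    high = min(math.ceil(correction_dim(beta_slow, head_dim, base, original_max_pos)),
               head_dim - 1)
    if low == high:
        high += 0.001  # avoid division by zero in the ramp
    inv_freq = []
    for i in range(head_dim // 2):
        extrapolated = 1.0 / base ** (2.0 * i / head_dim)  # unscaled frequency
        interpolated = extrapolated / factor               # position-interpolated
        # Ramp is 0 for high-frequency dims, 1 for low-frequency dims
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        inv_freq.append(interpolated * ramp + extrapolated * (1.0 - ramp))
    return inv_freq, 0.1 * math.log(factor) + 1.0

yarn_freqs, yarn_scale = yarn_inv_freq(
    head_dim=128, base=10000.0, factor=4.0, original_max_pos=4096
)
```

In this sketch the highest-frequency dimension keeps its original value, the lowest-frequency dimension is divided by the full factor, and the dimensions between the two correction indices are blended linearly.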
Usage Examples
import torch
from exllamav2.rope import get_rope_params
# Assumes `model` is an already-loaded ExLlamaV2 model instance
# Compute RoPE parameters for the model's configured variant
inv_freq, scaling_factor = get_rope_params(
    device=torch.device("cuda:0"),
    cfg=model.config,
    base=10000.0,
)
print(f"inv_freq shape: {inv_freq.shape}") # e.g. torch.Size([64]) for head_dim=128
print(f"scaling_factor: {scaling_factor}") # e.g. 1.0
# The inv_freq tensor is used to compute rotation angles:
# t = torch.arange(seq_len, device=device)
# freqs = torch.outer(t, inv_freq)
# emb = torch.cat((freqs, freqs), dim=-1)
# cos, sin = emb.cos(), emb.sin()
Related Pages
- Turboderp_org_Exllamav2_ExLlamaV2PosEmbedding - Learned positional embeddings (alternative to RoPE)
- Turboderp_org_Exllamav2_ExLlamaV2ParallelDecoder - Decoder block whose attention sublayer uses RoPE
- Turboderp_org_Exllamav2_ExLlamaV2RMSNorm - Normalization applied before the attention layer that uses RoPE