Implementation:Turboderp org Exllamav2 RoPE

From Leeroopedia
Knowledge Sources
Domains Positional_Encoding, Model_Architecture
Last Updated 2026-02-15 00:00 GMT

Overview

The RoPE module provides functions for computing Rotary Position Embedding inverse frequency tensors, supporting multiple RoPE variants including default, Su-style, Llama 3.1, and YaRN scaling methods.

Description

Rotary Position Embedding (RoPE) encodes position information by rotating pairs of dimensions in the query and key vectors. The rotation frequencies are determined by inverse frequency tensors computed by this module. The main dispatcher function get_rope_params() selects the appropriate variant based on the model configuration's alt_rope_method field.
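To make the rotation concrete, the following pure-Python sketch (an illustration, not exllamav2 code) builds the default inverse frequencies and rotates each dimension pair of a query and a key by position * inv_freq[i]. Pairing element i with element i + head_dim // 2 is an assumption chosen to match the torch.cat((freqs, freqs), dim=-1) layout in the usage example, and the vectors q and k are made-up values. The final check shows why RoPE is useful: the query-key dot product depends only on the offset between the two positions.

```python
import math
from typing import List


def default_inv_freq(head_dim: int, base: float = 10000.0) -> List[float]:
    # inv_freq[i] = 1 / base^(2i / head_dim): one frequency per rotated pair
    return [1.0 / base ** (2 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(x: List[float], pos: int, inv_freq: List[float]) -> List[float]:
    # Assumed rotate-half pairing: element i rotates together with element i + head_dim // 2
    half = len(x) // 2
    out = list(x)
    for i, f in enumerate(inv_freq):
        angle = pos * f
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


head_dim = 8
inv_freq = default_inv_freq(head_dim)
q = [0.3, -1.2, 0.5, 2.0, 0.7, 0.1, -0.4, 1.1]
k = [1.0, 0.2, -0.6, 0.4, 0.9, -1.5, 0.3, 0.8]

# Same relative offset (3) at two different absolute positions
score_a = dot(apply_rope(q, 10, inv_freq), apply_rope(k, 7, inv_freq))
score_b = dot(apply_rope(q, 103, inv_freq), apply_rope(k, 100, inv_freq))
print(score_a, score_b)  # equal up to floating-point rounding
```

Rotating a pair by m * inv_freq[i] and another by n * inv_freq[i] and then taking their dot product is equivalent to a single rotation by the position difference, which is why the two scores agree.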

The supported variants are:

  • get_rope_params_default() - Standard RoPE with base frequency and optional alpha scaling. Computes inv_freq[i] = 1 / base^(2i / head_dim) for i = 0, 1, ..., head_dim/2 - 1, after optionally adjusting the base according to scale_alpha_value.
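  • get_rope_params_su() - Su-style (Phi-3) scaling that uses separate per-dimension long and short scaling factors from the model config (scale_long_factor and scale_short_factor). When max_seq_len exceeds original_max_seq_len, it applies the long factors together with a logarithmic scaling multiplier scaling_factor = sqrt(1 + log(a/b) / log(b)), where a = max_seq_len and b = original_max_seq_len; otherwise it uses the short factors.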
  • get_rope_params_llama3() - Llama 3.1 frequency-dependent smoothing that scales each frequency according to its wavelength. High-frequency components (short wavelength) are kept unchanged, low-frequency components (long wavelength) are divided by the scale factor l3_rope_factor, and mid-range frequencies are smoothly interpolated between the two.
  • get_rope_params_yarn() - YaRN (Yet another RoPE extensioN) scaling adapted from the HuggingFace transformers implementation. It extrapolates high-frequency dimensions (keeps them unscaled) and interpolates low-frequency dimensions (divides them by the scale factor), blending the two over a linear ramp whose bounds are set by the beta_fast and beta_slow parameters. The attention scaling factor is 0.1 * ln(factor) + 1.0, where factor is yarn_rope_factor.
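The two blended schemes above can be sketched in plain Python. The Llama 3.1 branch follows the commonly published wavelength rule; the YaRN branch mirrors the HuggingFace transformers approach, and its beta_fast = 32 and beta_slow = 1 defaults are assumptions because the config table does not list those fields. Function and parameter names are hypothetical, and partial_rotary_factor and alpha scaling are left out, so treat this as a sketch of the technique rather than exllamav2's implementation.

```python
import math
from typing import List, Tuple


def base_inv_freq(dim: int, base: float) -> List[float]:
    # Unscaled RoPE frequencies: 1 / base^(2i / dim) for i in [0, dim / 2)
    return [1.0 / base ** (2 * i / dim) for i in range(dim // 2)]


def llama3_inv_freq(dim: int, base: float, factor: float, low_freq_factor: float,
                    high_freq_factor: float, original_max_pos: int) -> List[float]:
    low_freq_wavelen = original_max_pos / low_freq_factor
    high_freq_wavelen = original_max_pos / high_freq_factor
    out = []
    for f in base_inv_freq(dim, base):
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            out.append(f)  # short wavelength (high frequency): unchanged
        elif wavelen > low_freq_wavelen:
            out.append(f / factor)  # long wavelength (low frequency): divided by factor
        else:
            # Mid band: smooth interpolation between f / factor and f
            smooth = (original_max_pos / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
            out.append((1 - smooth) * f / factor + smooth * f)
    return out


def yarn_inv_freq(dim: int, base: float, factor: float, original_max_pos: int,
                  beta_fast: float = 32.0, beta_slow: float = 1.0) -> Tuple[List[float], float]:
    def correction_dim(num_rotations: float) -> float:
        # Dimension whose wavelength completes num_rotations within original_max_pos
        return dim * math.log(original_max_pos / (num_rotations * 2 * math.pi)) / (2 * math.log(base))

    low = max(math.floor(correction_dim(beta_fast)), 0)
    high = min(math.ceil(correction_dim(beta_slow)), dim - 1)
    if low == high:
        high += 0.001  # avoid a zero-width ramp

    out = []
    for i, f in enumerate(base_inv_freq(dim, base)):
        # ramp = 0 -> pure extrapolation (f), ramp = 1 -> pure interpolation (f / factor)
        ramp = min(max((i - low) / (high - low), 0.0), 1.0)
        out.append(f * (1.0 - ramp) + (f / factor) * ramp)
    attention_factor = 0.1 * math.log(factor) + 1.0 if factor > 1.0 else 1.0
    return out, attention_factor


# Illustrative Llama-3.1-style values
unscaled = base_inv_freq(128, 500000.0)
l3 = llama3_inv_freq(128, 500000.0, 8.0, 1.0, 4.0, 8192)

# Illustrative YaRN values
yarn, yarn_scale = yarn_inv_freq(128, 10000.0, 4.0, 4096)
print(l3[0], yarn[0], yarn_scale)
```

With these example values both schemes leave the highest frequency (i = 0) unchanged and divide the lowest by the full factor; they differ in how the transition band is chosen (wavelength thresholds for Llama 3.1, rotation-count bounds for YaRN).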

All functions return a tuple (inv_freq, scaling_factor). If the architecture sets the rope_freq_half flag, the dispatcher also converts inv_freq to half precision before returning it.
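A hypothetical dispatcher with the same contract (every branch returns (inv_freq, scaling_factor)) might be sketched as follows. SimpleConfig, the method string "su", the NTK-aware alpha rule base * alpha ** (dim / (dim - 2)), and the rotated width int(head_dim * partial_rotary_factor) are all assumptions for illustration. Python lists stand in for tensors, the rope_freq_half cast is imitated by round-tripping each value through IEEE 754 half precision with struct, and only the default and Su branches are shown.

```python
import math
import struct
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SimpleConfig:
    # Hypothetical stand-in for the ExLlamaV2Config fields this sketch reads
    head_dim: int = 128
    partial_rotary_factor: float = 1.0
    scale_alpha_value: float = 1.0
    alt_rope_method: Optional[str] = None
    max_seq_len: int = 4096
    original_max_seq_len: int = 4096
    scale_long_factor: List[float] = field(default_factory=list)
    scale_short_factor: List[float] = field(default_factory=list)
    rope_freq_half: bool = False


def rotary_dim(cfg: SimpleConfig) -> int:
    # Assumed: only head_dim * partial_rotary_factor dimensions are rotated
    return int(cfg.head_dim * cfg.partial_rotary_factor)


def rope_params_default(cfg: SimpleConfig, base: float) -> Tuple[List[float], float]:
    dim = rotary_dim(cfg)
    if cfg.scale_alpha_value != 1.0:
        # Assumed NTK-aware alpha scaling: enlarge the base instead of rescaling positions
        base *= cfg.scale_alpha_value ** (dim / (dim - 2))
    return [1.0 / base ** (2 * i / dim) for i in range(dim // 2)], 1.0


def rope_params_su(cfg: SimpleConfig, base: float) -> Tuple[List[float], float]:
    dim = rotary_dim(cfg)
    a, b = cfg.max_seq_len, cfg.original_max_seq_len
    # Long factors only when the context is extended beyond the original length
    ext = cfg.scale_long_factor if a > b else cfg.scale_short_factor
    inv_freq = [1.0 / (ext[i] * base ** (2 * i / dim)) for i in range(dim // 2)]
    scaling_factor = math.sqrt(1 + math.log(a / b) / math.log(b)) if a > b else 1.0
    return inv_freq, scaling_factor


def to_half(x: float) -> float:
    # Round-trip through IEEE 754 binary16 to imitate a float16 cast
    return struct.unpack("e", struct.pack("e", x))[0]


def rope_params(cfg: SimpleConfig, base: float) -> Tuple[List[float], float]:
    if cfg.alt_rope_method == "su":
        inv_freq, scaling_factor = rope_params_su(cfg, base)
    else:
        # Everything else falls back to the default variant in this sketch
        inv_freq, scaling_factor = rope_params_default(cfg, base)
    if cfg.rope_freq_half:
        inv_freq = [to_half(f) for f in inv_freq]
    return inv_freq, scaling_factor


cfg = SimpleConfig(head_dim=128, partial_rotary_factor=0.5)
inv_freq, scale = rope_params(cfg, 10000.0)
print(len(inv_freq), scale)  # 32 1.0

cfg_su = SimpleConfig(
    head_dim=8, alt_rope_method="su", max_seq_len=131072, original_max_seq_len=4096,
    scale_long_factor=[1.0, 2.0, 4.0, 8.0], scale_short_factor=[1.0, 1.0, 1.0, 1.0],
)
su_freq, su_scale = rope_params(cfg_su, 10000.0)
```

The Su example extends a 4096-token context to 131072 tokens, so the multiplier is sqrt(1 + log(32) / log(4096)) = sqrt(17/12), about 1.19.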

Usage

Use get_rope_params() when initializing attention layers to obtain the inverse frequency tensor and scaling factor for RoPE. The function is called during model loading and the resulting tensors are cached for use in every forward pass. The appropriate variant is selected automatically from the model's configuration.

Code Reference

Source Location

Signature

def get_rope_params(
    device: torch.Device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...

def get_rope_params_default(
    device: torch.Device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...

def get_rope_params_su(
    device: torch.Device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...

def get_rope_params_llama3(
    device: torch.Device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...

def get_rope_params_yarn(
    device: torch.Device,
    cfg: ExLlamaV2Config,
    base: float,
) -> tuple[torch.Tensor, float]: ...

Import

from exllamav2.rope import get_rope_params

I/O Contract

get_rope_params()

| Parameter | Type | Description |
| --- | --- | --- |
| device | torch.Device | Target device for the returned inverse frequency tensor |
| cfg | ExLlamaV2Config | Model configuration containing head_dim, max_seq_len, alt_rope_method, scale_alpha_value, and variant-specific fields |
| base | float | Base frequency for computing inverse frequencies (typically 10000.0) |

| Return | Type | Description |
| --- | --- | --- |
| inv_freq | torch.Tensor | Inverse frequency tensor used for rotation; shape (head_dim // 2,) when the whole head is rotated, with fewer entries when partial_rotary_factor < 1 |
| scaling_factor | float | Attention scaling factor (1.0 for no scaling) |

Config Fields by Variant

| Variant | Config fields used |
| --- | --- |
| default | head_dim, partial_rotary_factor, scale_alpha_value |
| su | head_dim, partial_rotary_factor, scale_alpha_value, max_seq_len, original_max_seq_len, scale_long_factor, scale_short_factor |
| llama3 | head_dim, partial_rotary_factor, scale_alpha_value, l3_rope_factor, l3_rope_low_freq_factor, l3_rope_high_freq_factor, l3_rope_original_max_position_embeddings |
| yarn | head_dim, partial_rotary_factor, scale_alpha_value, max_seq_len, yarn_rope_original_max_position_embeddings, yarn_rope_factor |

Usage Examples

import torch
from exllamav2.rope import get_rope_params

# Assumes `model` is an already-loaded ExLlamaV2 model
cfg = model.config
device = torch.device("cuda:0")

# Compute RoPE parameters for the model's configured variant
inv_freq, scaling_factor = get_rope_params(
    device=device,
    cfg=cfg,
    base=10000.0,  # typical base; use the model's own value if it differs
)

print(f"inv_freq shape: {inv_freq.shape}")  # e.g. torch.Size([64]) for head_dim=128
print(f"scaling_factor: {scaling_factor}")  # e.g. 1.0 when no scaling is applied

# The inv_freq tensor is used to compute rotation angles:
t = torch.arange(cfg.max_seq_len, device=device, dtype=torch.float32)
freqs = torch.outer(t, inv_freq.float())  # one angle per (position, frequency) pair
emb = torch.cat((freqs, freqs), dim=-1)
cos, sin = emb.cos(), emb.sin()
