# Principle: LaurentMazare tch-rs Rotary Position Embedding
| Knowledge Sources | |
|---|---|
| Domains | NLP, Positional_Encoding |
| Last Updated | 2026-02-08 14:00 GMT |
## Overview
Position encoding method that rotates query and key vectors in the attention mechanism using position-dependent rotation matrices derived from sinusoidal frequencies.
## Description
Rotary Position Embedding (RoPE) encodes positional information by rotating pairs of dimensions in the query and key vectors through position-dependent angles. Unlike absolute position embeddings, which are added to the input, RoPE applies a multiplicative rotation, so attention scores depend only on the relative offset between positions and exhibit a long-term decay with distance. The rotation frequencies follow a geometric sequence: theta_i = 1 / 10000^(2i/d), where i indexes dimension pairs and d is the head dimension. Cosine and sine values are precomputed for all positions in the context window.
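The precomputation step above can be sketched in plain Rust (no tch dependency). This is a minimal illustration, not the tch-rs implementation; the function name `rope_tables` and the parameter names `head_dim` and `max_seq_len` are illustrative.

```rust
// Sketch: precompute theta_i = 1 / 10000^(2i/d) for each dimension pair,
// then the cos/sin tables for every position m in the context window.
fn rope_tables(head_dim: usize, max_seq_len: usize) -> (Vec<Vec<f64>>, Vec<Vec<f64>>) {
    let half = head_dim / 2;
    // Geometric frequency sequence over dimension pairs (i = 0..d/2).
    let thetas: Vec<f64> = (0..half)
        .map(|i| 1.0 / 10000f64.powf(2.0 * i as f64 / head_dim as f64))
        .collect();
    let mut cos = vec![vec![0.0; half]; max_seq_len];
    let mut sin = vec![vec![0.0; half]; max_seq_len];
    for m in 0..max_seq_len {
        for (i, theta) in thetas.iter().enumerate() {
            let angle = m as f64 * theta; // rotation angle at position m
            cos[m][i] = angle.cos();
            sin[m][i] = angle.sin();
        }
    }
    (cos, sin)
}
```

Because the tables depend only on `head_dim` and `max_seq_len`, they are computed once at model construction and reused across all layers and forward passes.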
## Usage
Use in transformer attention layers where relative position awareness is needed. Precompute the frequency tensor once and pass it to each attention layer during forward passes.
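Applying the precomputed tables inside an attention layer can be sketched as follows, assuming the adjacent-pair layout (2i, 2i+1) described above. The function name `apply_rope` is illustrative; `cos_row` and `sin_row` are the precomputed rows for the vector's position.

```rust
// Sketch: rotate one head-dimension vector in place at a given position.
// The same rotation is applied to both queries and keys before the
// attention dot product; values are left unrotated.
fn apply_rope(x: &mut [f64], cos_row: &[f64], sin_row: &[f64]) {
    for i in 0..x.len() / 2 {
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        // 2x2 rotation of the pair (x_{2i}, x_{2i+1}) by the angle m * theta_i.
        x[2 * i] = a * cos_row[i] - b * sin_row[i];
        x[2 * i + 1] = a * sin_row[i] + b * cos_row[i];
    }
}
```

Note that some implementations instead pair dimension i with dimension i + d/2 (the "rotate-half" layout used by LLaMA-style models); the two conventions are equivalent up to a fixed permutation but are not interchangeable with each other's precomputed weights.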
## Theoretical Basis
For each pair of dimensions (2i, 2i+1), a vector x at position m is rotated by the angle m * theta_i:

x'_{2i}   = x_{2i} * cos(m * theta_i) - x_{2i+1} * sin(m * theta_i)
x'_{2i+1} = x_{2i} * sin(m * theta_i) + x_{2i+1} * cos(m * theta_i)

where m is the position index and theta_i = 1 / 10000^(2i/d). Because the rotation is orthogonal, the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n.
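The relative-position property can be checked numerically with a small self-contained sketch (toy 4-dimensional vectors; the helper names `rotate` and `dot` are illustrative):

```rust
// Sketch: rotate a d-dimensional vector at position m using the RoPE
// angles theta_i = 1 / 10000^(2i/d), pairing adjacent dimensions.
fn rotate(x: &[f64], m: usize, d: usize) -> Vec<f64> {
    let mut out = x.to_vec();
    for i in 0..d / 2 {
        let theta = 1.0 / 10000f64.powf(2.0 * i as f64 / d as f64);
        let angle = m as f64 * theta;
        let (s, c) = angle.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        out[2 * i] = a * c - b * s;
        out[2 * i + 1] = a * s + b * c;
    }
    out
}

// Plain dot product, standing in for the attention score q . k.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Shifting both the query position and the key position by the same amount leaves the score unchanged, which is the sense in which RoPE encodes relative rather than absolute position.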