# Principle: LaurentMazare tch-rs Rotary Position Embedding
| Knowledge Sources | |
|---|---|
| Domains | NLP, Positional_Encoding |
| Last Updated | 2026-02-08 14:00 GMT |
## Overview
Position encoding method that rotates query and key vectors in the attention mechanism using position-dependent rotation matrices derived from sinusoidal frequencies.
## Description
Rotary Position Embedding (RoPE) encodes positional information by rotating pairs of dimensions in the query and key vectors through position-dependent angles. Unlike absolute position embeddings, which are added to the input, RoPE applies a multiplicative rotation, so attention scores depend only on the relative offset between positions and exhibit a long-term decay with distance. The rotation frequencies follow a geometric sequence: theta_i = 1 / 10000^(2i/d), where i indexes dimension pairs and d is the head dimension. Cosine and sine values are precomputed for all positions in the context window.
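The precomputation step above can be sketched in plain Rust (no tch dependency). This is a minimal illustration, not the tch-rs implementation; the function name `rope_tables` and the parameter names `head_dim` and `max_seq_len` are illustrative.

```rust
// Sketch: precompute theta_i = 1 / 10000^(2i/d) for each dimension pair,
// then the cos/sin tables for every position m in the context window.
fn rope_tables(head_dim: usize, max_seq_len: usize) -> (Vec<Vec<f64>>, Vec<Vec<f64>>) {
    let half = head_dim / 2;
    // Geometric frequency sequence over dimension pairs (i = 0..d/2).
    let thetas: Vec<f64> = (0..half)
        .map(|i| 1.0 / 10000f64.powf(2.0 * i as f64 / head_dim as f64))
        .collect();
    let mut cos = vec![vec![0.0; half]; max_seq_len];
    let mut sin = vec![vec![0.0; half]; max_seq_len];
    for m in 0..max_seq_len {
        for (i, theta) in thetas.iter().enumerate() {
            let angle = m as f64 * theta; // rotation angle at position m
            cos[m][i] = angle.cos();
            sin[m][i] = angle.sin();
        }
    }
    (cos, sin)
}
```

Because the tables depend only on `head_dim` and `max_seq_len`, they are computed once at model construction and reused across all layers and forward passes.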
## Usage
Use in transformer attention layers where relative position awareness is needed. Precompute the frequency tensor once and pass it to each attention layer during forward passes.
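Applying the precomputed tables inside an attention layer can be sketched as follows, assuming the adjacent-pair layout (2i, 2i+1) described above. The function name `apply_rope` is illustrative; `cos_row` and `sin_row` are the precomputed rows for the vector's position.

```rust
// Sketch: rotate one head-dimension vector in place at a given position.
// The same rotation is applied to both queries and keys before the
// attention dot product; values are left unrotated.
fn apply_rope(x: &mut [f64], cos_row: &[f64], sin_row: &[f64]) {
    for i in 0..x.len() / 2 {
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        // 2x2 rotation of the pair (x_{2i}, x_{2i+1}) by the angle m * theta_i.
        x[2 * i] = a * cos_row[i] - b * sin_row[i];
        x[2 * i + 1] = a * sin_row[i] + b * cos_row[i];
    }
}
```

Note that some implementations instead pair dimension i with dimension i + d/2 (the "rotate-half" layout used by LLaMA-style models); the two conventions are equivalent up to a fixed permutation but are not interchangeable with each other's precomputed weights.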
## Theoretical Basis
For each pair of dimensions (2i, 2i+1), a vector x at position m is rotated by the angle m * theta_i:

x'_{2i}   = x_{2i} * cos(m * theta_i) - x_{2i+1} * sin(m * theta_i)
x'_{2i+1} = x_{2i} * sin(m * theta_i) + x_{2i+1} * cos(m * theta_i)

where m is the position index and theta_i = 1 / 10000^(2i/d). Because the rotation is orthogonal, the dot product between a rotated query at position m and a rotated key at position n depends only on the offset m - n.
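The relative-position property can be checked numerically with a small self-contained sketch (toy 4-dimensional vectors; the helper names `rotate` and `dot` are illustrative):

```rust
// Sketch: rotate a d-dimensional vector at position m using the RoPE
// angles theta_i = 1 / 10000^(2i/d), pairing adjacent dimensions.
fn rotate(x: &[f64], m: usize, d: usize) -> Vec<f64> {
    let mut out = x.to_vec();
    for i in 0..d / 2 {
        let theta = 1.0 / 10000f64.powf(2.0 * i as f64 / d as f64);
        let angle = m as f64 * theta;
        let (s, c) = angle.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        out[2 * i] = a * c - b * s;
        out[2 * i + 1] = a * s + b * c;
    }
    out
}

// Plain dot product, standing in for the attention score q . k.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Shifting both the query position and the key position by the same amount leaves the score unchanged, which is the sense in which RoPE encodes relative rather than absolute position.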