Principle:LLMBook zh LLMBook zh github io Rotary Position Embedding
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Model_Architecture |
| Last Updated | 2026-02-08 04:29 GMT |
Overview
Position encoding mechanism that injects positional information by rotating query and key vectors in 2D subspaces, enabling relative position awareness through dot-product geometry.
Description
Rotary Position Embedding (RoPE) encodes absolute position information into query and key vectors by applying rotation matrices. The embedding dimensions are grouped into pairs (adjacent dimensions in the original formulation). Each pair is treated as a 2D subspace and rotated by an angle proportional to the position index, with a different rotation frequency per pair. The key property is that the dot product between a rotated query and a rotated key depends only on their contents and their relative offset, not on their absolute positions. Relative position information therefore emerges from an absolute position encoding. RoPE is the position encoding used in LLaMA and many other recent open LLMs, replacing learned absolute embeddings and fixed sinusoidal encodings.
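A quick numerical check of the relative-position property, written as a minimal pure-Python sketch. It assumes the adjacent-pair layout and the frequency schedule $\theta_i = 10000^{-2i/d}$ from the RoFormer paper. The function names `rope_rotate` and `dot` and the sample vectors are illustrative only and do not come from any library.

```python
import math

def rope_rotate(x, pos, base=10000.0):
    """Rotate each adjacent pair (x[2i], x[2i+1]) by the angle pos * theta_i.

    Assumes theta_i = base ** (-2i / d), the RoFormer schedule.
    """
    d = len(x)
    assert d % 2 == 0, "embedding dimension must be even"
    out = []
    for i in range(d // 2):
        theta = base ** (-2 * i / d)
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        a, b = x[2 * i], x[2 * i + 1]
        # Standard 2D rotation: [[c, -s], [s, c]] @ [a, b]
        out.extend([a * c - b * s, a * s + b * c])
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

q = [0.3, -1.2, 0.8, 0.5, -0.7, 1.1, 0.2, -0.4]
k = [1.0, 0.4, -0.6, 0.9, 0.3, -0.2, 0.7, 0.1]

# Same offset (query 4 positions after key) at two different absolute positions
score_a = dot(rope_rotate(q, 7), rope_rotate(k, 3))
score_b = dot(rope_rotate(q, 107), rope_rotate(k, 103))
print(abs(score_a - score_b) < 1e-9)  # True: only the offset matters
```

Because each 2D block is an orthogonal rotation, RoPE also leaves vector norms unchanged. Position affects the logit only through the angle between the query and the key.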
Usage
Use this principle when designing or understanding position-aware attention mechanisms in Transformer models. RoPE is applied to the query and key projections before computing attention scores. It is the standard position encoding for LLaMA, Mistral, Qwen, and other modern decoder-only architectures.
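The following minimal single-head causal attention sketch in pure Python shows where RoPE sits. Queries and keys are rotated by their own positions before the scaled dot product, while values pass through untouched. The function names are hypothetical, and the sketch reuses the adjacent-pair rotation with the assumed $\theta_i = 10000^{-2i/d}$ schedule. It is not how any particular model implements attention.

```python
import math

def rotate_pairs(x, pos, base=10000.0):
    # Rotate adjacent pairs (x[2i], x[2i+1]) by pos * theta_i, theta_i = base ** (-2i/d) (assumed)
    d = len(x)
    out = []
    for i in range(d // 2):
        ang = pos * base ** (-2 * i / d)
        c, s = math.cos(ang), math.sin(ang)
        a, b = x[2 * i], x[2 * i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out

def causal_attention_with_rope(qs, ks, vs):
    """Single-head causal attention; RoPE is applied to queries and keys, never to values."""
    d = len(qs[0])
    rq = [rotate_pairs(q, m) for m, q in enumerate(qs)]  # query at position m
    rk = [rotate_pairs(k, n) for n, k in enumerate(ks)]  # key at position n
    outputs = []
    for m in range(len(qs)):
        # Scaled dot-product scores against keys at positions n <= m (causal mask)
        scores = [sum(a * b for a, b in zip(rq[m], rk[n])) / math.sqrt(d) for n in range(m + 1)]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        outputs.append([sum(weights[n] / total * vs[n][j] for n in range(m + 1))
                        for j in range(len(vs[0]))])
    return outputs

qs = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.2, 0.0], [-0.3, 0.4, 0.1, 0.2]]
ks = [[0.2, 0.1, -0.1, 0.3], [0.0, 0.3, 0.2, -0.2], [0.4, -0.2, 0.1, 0.1]]
vs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = causal_attention_with_rope(qs, ks, vs)
print(out[0])  # [1.0, 0.0]: position 0 can only attend to itself
```

Values are not rotated because RoPE only needs to shape the attention weights. The positional signal enters entirely through the query-key scores.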
Theoretical Basis
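RoPE applies a rotation to each 2D subspace of the query and key vectors. For a vector $x \in \mathbb{R}^d$ at position $m$ and subspace $i \in \{0, \dots, d/2 - 1\}$:

$$
\begin{pmatrix} x'_{2i} \\ x'_{2i+1} \end{pmatrix}
=
\begin{pmatrix} \cos m\theta_i & -\sin m\theta_i \\ \sin m\theta_i & \cos m\theta_i \end{pmatrix}
\begin{pmatrix} x_{2i} \\ x_{2i+1} \end{pmatrix},
\qquad \theta_i = b^{-2i/d}
$$

Where:
- $m$ is the position index
- $\theta_i$ is the rotation frequency for the $i$-th subspace, with base $b = 10000$ in RoFormer and the original LLaMA
- The dot product $\langle R_m q, R_n k \rangle$ depends only on $m - n$, where $R_m$ is the block-diagonal matrix applying all $d/2$ rotations at position $m$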
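The relative-position property follows from two facts about 2D rotations: a rotation's transpose is its inverse, and rotations about the same origin compose by adding their angles. Let $R_m$ be the block-diagonal matrix that rotates subspace $i$ by $m\theta_i$ (notation introduced for this derivation). Then:

$$
\begin{aligned}
\langle R_m q,\, R_n k \rangle
  &= (R_m q)^\top (R_n k) = q^\top R_m^\top R_n\, k \\
  &= q^\top R_{-m} R_n\, k
   && \text{since } R_m^\top = R_m^{-1} = R_{-m} \\
  &= q^\top R_{n-m}\, k
   && \text{since } R_a R_b = R_{a+b} \text{ blockwise}
\end{aligned}
$$

The result involves $m$ and $n$ only through $n - m$, so shifting both positions by the same amount leaves the attention logit unchanged.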
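Code Logic (a minimal NumPy sketch; NumPy is an assumption, and `cos`/`sin` are assumed to be precomputed per position and broadcastable to `q` and `k`):
```python
import numpy as np

def rotate_half(x):
    # Map halves (x1, x2) of the last dim to (-x2, x1); pairs dim i with dim i + d/2
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([-x2, x1], axis=-1)

def apply_rope(q, k, cos, sin):
    # cos/sin hold cos(m*theta_i), sin(m*theta_i) for position m, tiled over both halves
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```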
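The `cos` and `sin` inputs to the `rotate_half` formulation are usually precomputed once per maximum sequence length. The following minimal pure-Python sketch shows that cache; the function names are hypothetical. It uses the half-split layout that `rotate_half` expects, so frequency $i$ is stored twice: once for dimension $i$ and once for dimension $i + d/2$. This layout rotates the pair $(x_i, x_{i+d/2})$ rather than the adjacent pair $(x_{2i}, x_{2i+1})$. The two conventions differ only by a fixed permutation of dimensions. As a result, query and key projection weights trained under one layout must be permuted before they can be used with the other.

```python
import math

def rope_cache(seq_len, d, base=10000.0):
    """Precompute cos/sin rows in the half-split layout used by rotate_half.

    Row m is [cos(m*theta_0), ..., cos(m*theta_{d/2-1})] repeated twice (same for sin),
    with theta_i = base ** (-2i/d) assumed.
    """
    inv_freq = [base ** (-2 * i / d) for i in range(d // 2)]
    cos_table, sin_table = [], []
    for m in range(seq_len):
        angles = [m * f for f in inv_freq] * 2  # duplicate for both halves
        cos_table.append([math.cos(a) for a in angles])
        sin_table.append([math.sin(a) for a in angles])
    return cos_table, sin_table

def rotate_half(x):
    # (x1, x2) -> (-x2, x1) on a plain list
    h = len(x) // 2
    return [-v for v in x[h:]] + x[:h]

def apply_rope(x, cos_row, sin_row):
    # x * cos + rotate_half(x) * sin, element by element
    return [a * c + r * s for a, c, r, s in zip(x, cos_row, rotate_half(x), sin_row)]

cos_table, sin_table = rope_cache(seq_len=16, d=8)
x = [0.5, -1.0, 0.25, 2.0, 1.5, 0.0, -0.75, 0.3]
y = apply_rope(x, cos_table[5], sin_table[5])

# Cross-check: dimensions i and i + d/2 form one rotated pair
h = len(x) // 2
for i in range(h):
    ang = 5 * 10000.0 ** (-2 * i / 8)
    assert abs(y[i] - (x[i] * math.cos(ang) - x[i + h] * math.sin(ang))) < 1e-12
    assert abs(y[i + h] - (x[i] * math.sin(ang) + x[i + h] * math.cos(ang))) < 1e-12
```

Computing the table once and indexing it by position avoids recomputing trigonometric functions inside every attention layer.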