
Principle: LaurentMazare tch-rs Rotary Position Embedding

From Leeroopedia


Knowledge Sources
Domains: NLP, Positional_Encoding
Last Updated: 2026-02-08 14:00 GMT

Overview

A position encoding method that rotates query and key vectors in the attention mechanism using position-dependent rotation matrices derived from sinusoidal frequencies.

Description

Rotary Position Embedding (RoPE) encodes positional information by rotating pairs of dimensions in the query and key vectors through position-dependent angles. Unlike absolute position embeddings, which are added to the input, RoPE applies a rotation, so the attention score between two tokens depends only on their relative offset and exhibits a long-term decay as the distance grows. The rotation frequencies follow a geometric sequence, theta_i = 1 / 10000^(2i/d), where i indexes dimension pairs and d is the per-head dimension. Cosine and sine values are precomputed for all positions in the context window.
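The precomputation step above can be sketched in plain Rust (no tch-rs; the names `rope_frequencies` and `rope_cos_sin` are illustrative helpers, not part of the tch-rs API):

```rust
// Sketch: precompute RoPE rotation angles and their cos/sin values.
// `dim` is the per-head dimension (assumed even), `max_pos` the context
// length; both names are illustrative, not from the tch-rs API.

/// theta_i = 1 / 10000^(2i/d) for each dimension pair i.
fn rope_frequencies(dim: usize) -> Vec<f64> {
    (0..dim / 2)
        .map(|i| 1.0 / 10000f64.powf(2.0 * i as f64 / dim as f64))
        .collect()
}

/// Precompute (cos(m*theta_i), sin(m*theta_i)) for every position m.
fn rope_cos_sin(dim: usize, max_pos: usize) -> Vec<Vec<(f64, f64)>> {
    let freqs = rope_frequencies(dim);
    (0..max_pos)
        .map(|m| {
            freqs
                .iter()
                .map(|&theta| {
                    let angle = m as f64 * theta;
                    (angle.cos(), angle.sin())
                })
                .collect()
        })
        .collect()
}

fn main() {
    let table = rope_cos_sin(8, 4);
    // At position m = 0 every angle is zero, so cos = 1 and sin = 0.
    println!("{:?}", table[0][0]); // prints (1.0, 0.0)
}
```

A real implementation would store the table as two tensors of shape (max_pos, dim/2) and reuse them across layers.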

Usage

Use in transformer attention layers where relative position awareness is needed. Precompute the frequency tensor once and pass it to each attention layer during forward passes.
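A minimal sketch of the per-layer application, assuming a single query (or key) vector at position `m` (`apply_rope` is a hypothetical helper, not a tch-rs function; a production version would read from the precomputed cos/sin table rather than recompute angles per call):

```rust
// Sketch: rotate one query (or key) vector in place at position `m`.
// `apply_rope` is an illustrative helper, not part of the tch-rs API.

fn apply_rope(v: &mut [f64], m: usize) {
    let d = v.len(); // per-head dimension, assumed even
    for i in 0..d / 2 {
        // theta_i = 1 / 10000^(2i/d)
        let theta = 1.0 / 10000f64.powf(2.0 * i as f64 / d as f64);
        let (s, c) = (m as f64 * theta).sin_cos();
        let (x, y) = (v[2 * i], v[2 * i + 1]);
        v[2 * i] = c * x - s * y; // rotate the (2i, 2i+1) pair
        v[2 * i + 1] = s * x + c * y;
    }
}

fn main() {
    let mut q = vec![1.0, 0.0, 1.0, 0.0];
    apply_rope(&mut q, 2); // rotate as if the token sits at position 2
    println!("{:?}", q);
}
```

Because each step is a pure rotation, the norm of every dimension pair is preserved, so RoPE changes direction but not magnitude of the query and key vectors.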

Theoretical Basis

For each pair of dimensions $(2i, 2i+1)$, the rotation angle is

$$\theta_i = \frac{1}{10000^{2i/d}}$$

and the rotated query components are

$$\begin{pmatrix} q'_{2i} \\ q'_{2i+1} \end{pmatrix} =
\begin{pmatrix} \cos(m\theta_i) & -\sin(m\theta_i) \\ \sin(m\theta_i) & \cos(m\theta_i) \end{pmatrix}
\begin{pmatrix} q_{2i} \\ q_{2i+1} \end{pmatrix}$$

Where m is the position index.
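Because both queries and keys are rotated by the same family of matrices, the inner product after rotation depends only on the relative offset between the two positions. A small self-contained check of this property (plain Rust; `rotate` and `dot` are hypothetical helpers):

```rust
// Check the relative-position property of RoPE: shifting both positions
// by the same amount leaves the rotated inner product unchanged.
// `rotate` and `dot` are illustrative helpers, not library functions.

fn rotate(v: &[f64], m: usize) -> Vec<f64> {
    let d = v.len();
    let mut out = vec![0.0; d];
    for i in 0..d / 2 {
        let theta = 1.0 / 10000f64.powf(2.0 * i as f64 / d as f64);
        let (s, c) = (m as f64 * theta).sin_cos();
        out[2 * i] = c * v[2 * i] - s * v[2 * i + 1];
        out[2 * i + 1] = s * v[2 * i] + c * v[2 * i + 1];
    }
    out
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let q = [0.3, -1.2, 0.5, 0.8];
    let k = [1.0, 0.4, -0.7, 0.2];
    // <R(m) q, R(n) k> should equal <R(m+s) q, R(n+s) k> for any shift s,
    // since R(m)^T R(n) = R(n - m) for 2D rotations.
    let a = dot(&rotate(&q, 3), &rotate(&k, 1));
    let b = dot(&rotate(&q, 10), &rotate(&k, 8));
    println!("{a:.6} {b:.6}"); // the two values agree
}
```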

Related Pages

Implemented By
