Implementation:LLMBook zh LLMBook zh github io Apply Rotary Pos Emb
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Model_Architecture |
| Last Updated | 2026-02-08 04:29 GMT |
Overview
Standalone functions that apply Rotary Position Embeddings (RoPE) to query and key tensors.
Description
This implementation provides two functions: `rotate_half` and `apply_rotary_pos_emb`. The `rotate_half` helper splits the last dimension of a tensor into two halves [x1, x2] and returns [-x2, x1], the swapped and sign-flipped term that the rotation needs. The `apply_rotary_pos_emb` function gathers precomputed cosine and sine values by position index and applies them to the query and key tensors using the RoPE formula. LLaMA-style models call these functions inside the attention mechanism to inject position information into the attention computation.
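The two functions described above can be sketched as follows. This is a minimal illustration that assumes the tensor layout given in the I/O Contract (cos/sin of shape (max_seq_len, head_dim), position_ids of shape (batch, seq_len)); it is not a copy of the repository file, whose body may differ in detail.

```python
import torch

def rotate_half(x):
    # Split the last dimension into two halves: x = [x1, x2].
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    # Swap the halves and negate the original second half: [-x2, x1].
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    # Gather one cos/sin row per token: (batch, seq_len, head_dim),
    # then add a heads axis so it broadcasts: (batch, 1, seq_len, head_dim).
    cos = cos[position_ids].unsqueeze(1)
    sin = sin[position_ids].unsqueeze(1)
    # Rotation written with element-wise products only.
    q_embed = q * cos + rotate_half(q) * sin
    k_embed = k * cos + rotate_half(k) * sin
    return q_embed, k_embed
```

Writing the rotation as `x * cos + rotate_half(x) * sin` avoids building a rotation matrix per position; each element at index i is paired with the element at index i + head_dim/2.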
Usage
Import these functions when implementing or studying the attention mechanism of LLaMA-family models. They are called after the Q/K projections and before the attention score computation. The cos/sin tables are precomputed once from the maximum sequence length and the per-head dimension (head_dim), matching the (max_seq_len, head_dim) shape listed in the I/O Contract.
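To show where the call sits in an attention layer, here is a hedged sketch of the flow: precompute the tables, project to Q/K, apply RoPE, then compute scores. The sizes, the `q_proj`/`k_proj` layers, and the frequency base of 10000 are illustrative assumptions (10000 is the commonly used value), and the two functions are re-declared only so the block runs on its own.

```python
import math
import torch

def rotate_half(x):
    half = x.shape[-1] // 2
    return torch.cat((-x[..., half:], x[..., :half]), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    cos = cos[position_ids].unsqueeze(1)
    sin = sin[position_ids].unsqueeze(1)
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

# Illustrative sizes (assumptions, not taken from the repository).
batch, seq_len, heads, head_dim, max_seq_len = 2, 16, 4, 8, 64
hidden = heads * head_dim

# 1. Precompute cos/sin once per model; base 10000 is assumed here.
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, head_dim, 2).float() / head_dim))
angles = torch.outer(torch.arange(max_seq_len).float(), inv_freq)
angles = torch.cat((angles, angles), dim=-1)  # (max_seq_len, head_dim)
cos, sin = angles.cos(), angles.sin()

# 2. Q/K projections, reshaped to (batch, heads, seq_len, head_dim).
x = torch.randn(batch, seq_len, hidden)
q_proj = torch.nn.Linear(hidden, hidden, bias=False)  # hypothetical layers
k_proj = torch.nn.Linear(hidden, hidden, bias=False)
q = q_proj(x).view(batch, seq_len, heads, head_dim).transpose(1, 2)
k = k_proj(x).view(batch, seq_len, heads, head_dim).transpose(1, 2)

# 3. RoPE is applied to Q and K only, before any scores are computed.
position_ids = torch.arange(seq_len).unsqueeze(0).expand(batch, -1)
q, k = apply_rotary_pos_emb(q, k, cos, sin, position_ids)

# 4. Attention scores now carry position information.
scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(head_dim)
```

The value tensor is left untouched in this sketch; only the query and key participate in the rotation.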
Code Reference
Source Location
- Repository: LLMBook-zh
- File: code/5.2 RoPE.py
- Lines: 1-14
Signature
def rotate_half(x):
    """
    Splits the input tensor into two halves along the last dimension,
    swaps them, and negates the original second half.
    Args:
        x: Input tensor of shape (..., d).
    Returns:
        Rotated tensor of shape (..., d) where [-x2, x1] replaces [x1, x2].
    """

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    """
    Applies rotary position embeddings to query and key tensors.
    Args:
        q: Query tensor of shape (batch, heads, seq_len, head_dim).
        k: Key tensor of shape (batch, heads, seq_len, head_dim).
        cos: Cosine values of shape (max_seq_len, head_dim).
        sin: Sine values of shape (max_seq_len, head_dim).
        position_ids: Position indices of shape (batch, seq_len).
    Returns:
        Tuple of (q_embed, k_embed) with position information encoded.
    """
Import
import torch
# Functions defined locally in code/5.2 RoPE.py
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| q | torch.Tensor | Yes | Query tensor (batch, heads, seq_len, head_dim) |
| k | torch.Tensor | Yes | Key tensor (batch, heads, seq_len, head_dim) |
| cos | torch.Tensor | Yes | Precomputed cosine values (max_seq_len, head_dim) |
| sin | torch.Tensor | Yes | Precomputed sine values (max_seq_len, head_dim) |
| position_ids | torch.Tensor | Yes | Position indices (batch, seq_len) |
Outputs
| Name | Type | Description |
|---|---|---|
| q_embed | torch.Tensor | Query with rotary position encoding applied |
| k_embed | torch.Tensor | Key with rotary position encoding applied |
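Two properties of the outputs can be checked numerically: the rotation leaves vector norms unchanged, and the dot product between a rotated query and a rotated key depends only on their relative offset. The sketch below assumes the standard RoPE frequency table with base 10000; `rope_at` is a hypothetical helper introduced only for this check, and the two functions are re-declared so the block runs on its own.

```python
import torch

def rotate_half(x):
    half = x.shape[-1] // 2
    return torch.cat((-x[..., half:], x[..., :half]), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    cos = cos[position_ids].unsqueeze(1)
    sin = sin[position_ids].unsqueeze(1)
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

head_dim, max_seq_len = 64, 32
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, head_dim, 2).float() / head_dim))
angles = torch.outer(torch.arange(max_seq_len).float(), inv_freq)
angles = torch.cat((angles, angles), dim=-1)
cos, sin = angles.cos(), angles.sin()

torch.manual_seed(0)
q = torch.randn(1, 1, 1, head_dim)  # one query vector
k = torch.randn(1, 1, 1, head_dim)  # one key vector

def rope_at(x, pos):
    # Rotate a single vector as if it sat at position `pos`.
    ids = torch.tensor([[pos]])
    out, _ = apply_rotary_pos_emb(x, x, cos, sin, ids)
    return out

# Property 1: the rotation preserves the vector norm.
norm_preserved = torch.allclose(rope_at(q, 7).norm(), q.norm(), atol=1e-3)

# Property 2: positions (5, 2) and (13, 10) share the offset 3,
# so the two attention scores should match up to float error.
score_a = (rope_at(q, 5) * rope_at(k, 2)).sum()
score_b = (rope_at(q, 13) * rope_at(k, 10)).sum()
relative_only = torch.allclose(score_a, score_b, atol=1e-3)
```

Both flags should come out true in float32 within the stated tolerance, which is what lets the attention scores encode relative rather than absolute position.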
Usage Examples
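import torch
# Assumes rotate_half and apply_rotary_pos_emb from code/5.2 RoPE.py are in scope.

# Example: apply RoPE to query and key projections
batch, heads, seq_len, head_dim = 2, 32, 128, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)

# Precompute cos/sin from a frequency table (base 10000 is the common default, assumed here)
max_seq_len = 512
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, head_dim, 2).float() / head_dim))
angles = torch.outer(torch.arange(max_seq_len).float(), inv_freq)  # (max_seq_len, head_dim // 2)
angles = torch.cat((angles, angles), dim=-1)  # (max_seq_len, head_dim)
cos, sin = angles.cos(), angles.sin()

# One row of position indices per batch element: (batch, seq_len)
position_ids = torch.arange(seq_len).unsqueeze(0).expand(batch, -1)
q_embed, k_embed = apply_rotary_pos_emb(q, k, cos, sin, position_ids)
# q_embed.shape == (2, 32, 128, 64)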