Implementation:LLMBook zh LLMBook zh github io Apply Rotary Pos Emb

From Leeroopedia


Knowledge Sources
Domains: Deep_Learning, Model_Architecture
Last Updated: 2026-02-08 04:29 GMT

Overview

Standalone functions for applying Rotary Position Embeddings (RoPE) to query and key tensors.

Description

This implementation provides two functions: `rotate_half` and `apply_rotary_pos_emb`. The `rotate_half` helper splits the last dimension of a tensor into two halves [x1, x2] and returns [-x2, x1], which supplies the sine-weighted term of the rotation. The `apply_rotary_pos_emb` function takes precomputed cosine and sine tables, selects the rows given by `position_ids`, and applies them to the query and key tensors using the RoPE formula x * cos + rotate_half(x) * sin. These functions are used inside the attention mechanism of LLaMA-style models to inject position information into the attention computation.
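The two functions described above can be sketched as follows. This is a minimal sketch that follows the documented signatures and the common LLaMA-style formulation; the function bodies are an assumption about how the code is written, not a verbatim copy of the repository file.

```python
import torch

def rotate_half(x):
    # Split the last dimension into two halves: x = [x1, x2]
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    # Return [-x2, x1]
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    # Select the table rows for each position: (batch, seq_len, head_dim),
    # then add a heads axis so the result broadcasts against q and k
    cos = cos[position_ids].unsqueeze(1)
    sin = sin[position_ids].unsqueeze(1)
    q_embed = q * cos + rotate_half(q) * sin
    k_embed = k * cos + rotate_half(k) * sin
    return q_embed, k_embed
```

With this layout, feature i is paired with feature i + head_dim / 2, so each pair is rotated by the angle stored at the matching column of the cos/sin tables.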

Usage

Use these functions when implementing or studying the attention mechanism of LLaMA-family models. They are called after the Q/K projections and before the attention score computation. The cos/sin tables are precomputed once from the maximum sequence length and the per-head dimension (head_dim), then indexed by position at call time.
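The precomputation step mentioned above might look like the sketch below. The helper name `build_rope_cache` and the `base=10000.0` default are assumptions for illustration (10000 is the conventional RoPE base); neither is taken from the documented file.

```python
import torch

def build_rope_cache(max_seq_len, head_dim, base=10000.0):
    # Hypothetical helper, not part of code/5.2 RoPE.py.
    # One inverse frequency per feature pair: (head_dim // 2,)
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_seq_len).float()
    # Angle for every (position, frequency) combination: (max_seq_len, head_dim // 2)
    freqs = torch.outer(positions, inv_freq)
    # Duplicate so both halves of a head share the same angles: (max_seq_len, head_dim)
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos(), emb.sin()

cos, sin = build_rope_cache(max_seq_len=512, head_dim=64)
```

The resulting tables have the (max_seq_len, head_dim) shape that `apply_rotary_pos_emb` expects for its `cos` and `sin` arguments.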

Code Reference

Source Location

  • Repository: LLMBook-zh
  • File: code/5.2 RoPE.py
  • Lines: 1-14

Signature

def rotate_half(x):
    """
    Splits the input tensor into two halves [x1, x2] along the last dimension,
    then swaps them and negates the half that moves to the front.

    Args:
        x: Input tensor of shape (..., d).
    Returns:
        Rotated tensor of shape (..., d) where [-x2, x1] replaces [x1, x2].
    """

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    """
    Applies rotary position embeddings to query and key tensors.

    Args:
        q: Query tensor of shape (batch, heads, seq_len, head_dim).
        k: Key tensor of shape (batch, heads, seq_len, head_dim).
        cos: Cosine values of shape (max_seq_len, head_dim).
        sin: Sine values of shape (max_seq_len, head_dim).
        position_ids: Position indices of shape (batch, seq_len).
    Returns:
        Tuple of (q_embed, k_embed) with position information encoded.
    """

Import

import torch
# Functions defined locally in code/5.2 RoPE.py

I/O Contract

Inputs

Name | Type | Required | Description
q | torch.Tensor | Yes | Query tensor (batch, heads, seq_len, head_dim)
k | torch.Tensor | Yes | Key tensor (batch, heads, seq_len, head_dim)
cos | torch.Tensor | Yes | Precomputed cosine values (max_seq_len, head_dim)
sin | torch.Tensor | Yes | Precomputed sine values (max_seq_len, head_dim)
position_ids | torch.Tensor | Yes | Position indices (batch, seq_len)

Outputs

Name | Type | Description
q_embed | torch.Tensor | Query with rotary position encoding applied
k_embed | torch.Tensor | Key with rotary position encoding applied

Usage Examples

import torch

# Example: apply RoPE to query and key projections.
# rotate_half and apply_rotary_pos_emb (code/5.2 RoPE.py) must already be in scope.
batch, heads, seq_len, head_dim = 2, 32, 128, 64
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)

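# Precompute cos/sin from the frequency table (max_seq_len=512).
# The base of 10000 is the conventional RoPE value, assumed here.
inv_freq = 1.0 / (10000 ** (torch.arange(0, head_dim, 2).float() / head_dim))
freqs = torch.outer(torch.arange(512).float(), inv_freq)
emb = torch.cat((freqs, freqs), dim=-1)
cos, sin = emb.cos(), emb.sin()  # each of shape (512, head_dim)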
position_ids = torch.arange(seq_len).unsqueeze(0).expand(batch, -1)

q_embed, k_embed = apply_rotary_pos_emb(q, k, cos, sin, position_ids)
# q_embed.shape == (2, 32, 128, 64)
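One way to sanity-check the outputs is to confirm two properties that rotary embeddings are expected to have: the rotation leaves vector norms unchanged, and the query-key dot product depends only on the relative offset between positions. The sketch below re-declares compact versions of both functions and builds a cos/sin table (base 10000 is assumed) so that it runs on its own; it illustrates the properties and is not taken from the repository.

```python
import torch

def rotate_half(x):
    half = x.shape[-1] // 2
    return torch.cat((-x[..., half:], x[..., :half]), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    cos, sin = cos[position_ids].unsqueeze(1), sin[position_ids].unsqueeze(1)
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

torch.manual_seed(0)
head_dim, max_seq_len, seq_len = 64, 512, 16

# cos/sin tables in double precision to keep the comparison tight
inv_freq = 1.0 / (10000.0 ** (torch.arange(0, head_dim, 2).double() / head_dim))
emb = torch.outer(torch.arange(max_seq_len).double(), inv_freq)
emb = torch.cat((emb, emb), dim=-1)
cos, sin = emb.cos(), emb.sin()

# Place the same query vector and the same key vector at every position
q = torch.randn(1, 1, 1, head_dim, dtype=torch.double).expand(1, 1, seq_len, head_dim)
k = torch.randn(1, 1, 1, head_dim, dtype=torch.double).expand(1, 1, seq_len, head_dim)
position_ids = torch.arange(seq_len).unsqueeze(0)
q_embed, k_embed = apply_rotary_pos_emb(q, k, cos, sin, position_ids)

# Property 1: rotation does not change the length of each vector
norms_preserved = torch.allclose(q_embed.norm(dim=-1), q.norm(dim=-1))

# Property 2: scores[i, j] is the dot product of q at position i and k at position j;
# entries with the same offset i - j should match
scores = q_embed[0, 0] @ k_embed[0, 0].T
relative_only = torch.allclose(scores[3, 1], scores[10, 8])
```

Both flags should evaluate to True if the cos/sin tables and the rotation are consistent with each other.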
