Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:InternLM Lmdeploy RotaryEmbedding

From Leeroopedia
Revision as of 15:15, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/InternLM_Lmdeploy_RotaryEmbedding.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains GPU_Kernels, Attention
Last Updated 2026-02-07 15:00 GMT

Overview

Device-side rotary positional embedding (RoPE) implementations supporting default, linear, dynamic, YaRN, Llama3, and multi-rope (M-RoPE) variants, plus log-n attention scaling.

Description

This header provides multiple RoPE implementations for use within attention kernels:

  • FastRoPE<N>: The primary RoPE class used by AttentionUniversal::Prologue. Supports all RoPE variants via a runtime type switch. The init method precomputes inverse frequencies based on the variant (default, YaRN with linear ramp interpolation, Llama3 with smooth interpolation), and apply rotates a vector fragment using sincos. For M-RoPE (multi-modal), apply_mrope looks up per-dimension position IDs from a table.
  • RotaryEmbedding<N>: A simpler alternative that precomputes cos/sin coefficients from base frequency and dimension index. Uses __noinline__ on the coefficient computation to ensure consistent results.
  • RoPE<N, C>: A minimal variant that stores precomputed inverse frequencies and applies rotation.
  • LogNScaling: Implements log-n attention scaling that scales Q by log2(seq_len) / log2(max_position_embeddings) when sequence length exceeds the training length.
  • ApplyRotaryEmbedding: A standalone device function for applying RoPE to 4-element vectors.

Usage

FastRoPE is instantiated per-thread in the attention kernel prologue to apply positional encoding to Q and K vectors before they enter the attention mainloop.

Code Reference

Source Location

Signature

namespace turbomind {

template<int N>
struct FastRoPE {
    __device__ FastRoPE(const RopeKernelParam& param, int batch_idx, std::integral_constant<int, N>);
    __device__ void init(int idx);
    template<typename T>
    __device__ void apply(Array<T, N>& x, float timestep);
    template<typename T>
    __device__ void apply_mrope(Array<T, N>& x, float timestep);
};

template<int N>
struct RotaryEmbedding {
    __device__ RotaryEmbedding(float base, int dims, int timestep, int2 offset);
    static __device__ __noinline__ float2 get_coefficient(int idx, int dims, float base, int timestep);
    template<typename T>
    __device__ void apply(Array<T, N>& x);
};

template<int N, int C = 8>
struct RoPE {
    __device__ RoPE(float idx, float base, float dims);
    template<class T>
    __device__ void apply(Array<T, N * 2>& x, float timestep);
};

struct LogNScaling {
    __device__ LogNScaling(int seq_len, int max_position_embeddings);
    template<typename T, int N>
    __device__ void apply(Array<T, N>& x) const;
};

template<class C, class T>
__device__ void ApplyRotaryEmbedding(Array<T, 4>& x, float base, int dims, int ti, int di);

} // namespace turbomind

Import

#include "src/turbomind/kernels/attention/rotary_embedding.h"

I/O Contract

Inputs

Name Type Required Description
param RopeKernelParam Yes RoPE configuration (type, base, dim, scaling factors)
batch_idx int Yes Batch index (for dynamic base or M-RoPE position lookup)
idx int Yes Dimension index within the head
timestep float Yes Token position for rotation angle computation
x Array<T, N>& Yes Vector fragment to rotate (modified in-place)

Outputs

Name Type Description
x (modified) Array<T, N>& Rotated vector fragment

Usage Examples

// In attention prologue:
FastRoPE rope(params.rope_param, batch_idx, std::integral_constant<int, 8>{});
rope.init(dim_offset);
rope.apply(vec_Q, timestep);
rope.apply(vec_K, timestep);

// Log-n scaling:
LogNScaling logn(seq_pos, max_pos_embeddings);
logn.apply(vec_Q);

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment