Implementation:InternLM Lmdeploy RotaryEmbedding
Appearance
| Knowledge Sources | |
|---|---|
| Domains | GPU_Kernels, Attention |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Device-side rotary positional embedding (RoPE) implementations supporting default, linear, dynamic, YaRN, Llama3, and multi-rope (M-RoPE) variants, plus log-n attention scaling.
Description
This header provides multiple RoPE implementations for use within attention kernels:
- FastRoPE<N>: The primary RoPE class used by
AttentionUniversal::Prologue. Supports all RoPE variants via a runtimetypeswitch. Theinitmethod precomputes inverse frequencies based on the variant (default, YaRN with linear ramp interpolation, Llama3 with smooth interpolation), andapplyrotates a vector fragment using sincos. For M-RoPE (multi-modal),apply_mropelooks up per-dimension position IDs from a table. - RotaryEmbedding<N>: A simpler alternative that precomputes cos/sin coefficients from base frequency and dimension index. Uses
__noinline__on the coefficient computation to ensure consistent results. - RoPE<N, C>: A minimal variant that stores precomputed inverse frequencies and applies rotation.
- LogNScaling: Implements log-n attention scaling that scales Q by
log2(seq_len) / log2(max_position_embeddings)when sequence length exceeds the training length. - ApplyRotaryEmbedding: A standalone device function for applying RoPE to 4-element vectors.
Usage
FastRoPE is instantiated per-thread in the attention kernel prologue to apply positional encoding to Q and K vectors before they enter the attention mainloop.
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/kernels/attention/rotary_embedding.h
- Lines: 1-282
Signature
namespace turbomind {
template<int N>
struct FastRoPE {
__device__ FastRoPE(const RopeKernelParam& param, int batch_idx, std::integral_constant<int, N>);
__device__ void init(int idx);
template<typename T>
__device__ void apply(Array<T, N>& x, float timestep);
template<typename T>
__device__ void apply_mrope(Array<T, N>& x, float timestep);
};
template<int N>
struct RotaryEmbedding {
__device__ RotaryEmbedding(float base, int dims, int timestep, int2 offset);
static __device__ __noinline__ float2 get_coefficient(int idx, int dims, float base, int timestep);
template<typename T>
__device__ void apply(Array<T, N>& x);
};
template<int N, int C = 8>
struct RoPE {
__device__ RoPE(float idx, float base, float dims);
template<class T>
__device__ void apply(Array<T, N * 2>& x, float timestep);
};
struct LogNScaling {
__device__ LogNScaling(int seq_len, int max_position_embeddings);
template<typename T, int N>
__device__ void apply(Array<T, N>& x) const;
};
template<class C, class T>
__device__ void ApplyRotaryEmbedding(Array<T, 4>& x, float base, int dims, int ti, int di);
} // namespace turbomind
Import
#include "src/turbomind/kernels/attention/rotary_embedding.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| param | RopeKernelParam | Yes | RoPE configuration (type, base, dim, scaling factors) |
| batch_idx | int | Yes | Batch index (for dynamic base or M-RoPE position lookup) |
| idx | int | Yes | Dimension index within the head |
| timestep | float | Yes | Token position for rotation angle computation |
| x | Array<T, N>& | Yes | Vector fragment to rotate (modified in-place) |
Outputs
| Name | Type | Description |
|---|---|---|
| x (modified) | Array<T, N>& | Rotated vector fragment |
Usage Examples
// In attention prologue:
FastRoPE rope(params.rope_param, batch_idx, std::integral_constant<int, 8>{});
rope.init(dim_offset);
rope.apply(vec_Q, timestep);
rope.apply(vec_K, timestep);
// Log-n scaling:
LogNScaling logn(seq_pos, max_pos_embeddings);
logn.apply(vec_Q);
Related Pages
Page Connections
Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment