Principle:LMCache LMCache RoPE Position Recovery

Knowledge Sources	LMCache CacheBlend RoFormer
Domains	Deep_Learning, Attention_Mechanisms
Last Updated	2026-02-09 00:00 GMT

Overview

A position-encoding correction mechanism that re-applies rotary position embeddings (RoPE) to cached KV tensors so that they can be reused at sequence positions other than the ones at which they were originally computed.

Description

RoPE Position Recovery addresses the core challenge of non-prefix KV cache reuse: when a cached KV tensor was computed at one sequence position but needs to be used at a different position, its rotary position embedding (RoPE) is incorrect. The recovery process has two steps: (1) reverse the old RoPE encoding, (2) apply the new RoPE encoding for the correct position. Because RoPE is a rotation, and successive rotations compose by adding their angles, the two steps can be fused into a single rotation by the position difference.

After position correction, the blender identifies the top-k most divergent positions (based on attention weight differences at check layers) and selectively recomputes those positions from scratch, leaving the rest as cached values with corrected positions.

Usage

This is the core algorithm in CacheBlend. It operates transparently within the LMCBlender.blend method during inference.

Theoretical Basis

RoPE encodes position as complex rotation: $\mathrm{RoPE}(x, \mathrm{pos}) = x \cdot e^{i \cdot \mathrm{pos} \cdot \theta}$

Position recovery fuses reverse + re-encode: $K_{\mathrm{new}} = K_{\mathrm{old}} \cdot e^{i \cdot (\mathrm{pos}_{\mathrm{new}} - \mathrm{pos}_{\mathrm{old}}) \cdot \theta}$

The fused_encode operation computes this in a single CUDA kernel call, avoiding the overhead of separate reverse and forward passes.
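
The fused rotation can be checked numerically with a small pure-Python sketch. This is an illustration of the formula only, not the fused_encode CUDA kernel: the helper names (rope_thetas, rope_rotate, recover_position) are hypothetical, the per-pair frequencies follow the RoFormer schedule $\theta_j = 10000^{-2j/d}$, and pairing adjacent dimensions into one complex number is an assumption (implementations differ in how they pair dimensions).

```python
import cmath

def rope_thetas(head_dim, base=10000.0):
    """Per-pair rotation frequencies theta_j = base^(-2j / head_dim) (RoFormer schedule)."""
    return [base ** (-2 * j / head_dim) for j in range(head_dim // 2)]

def rope_rotate(vec, pos, thetas):
    """Rotate each (even, odd) pair of vec by pos * theta_j via complex multiplication."""
    out = []
    for j, theta in enumerate(thetas):
        z = complex(vec[2 * j], vec[2 * j + 1]) * cmath.exp(1j * pos * theta)
        out.extend([z.real, z.imag])
    return out

def recover_position(k_old, pos_old, pos_new, thetas):
    """Fused recovery (hypothetical helper): one rotation by (pos_new - pos_old) * theta_j."""
    return rope_rotate(k_old, pos_new - pos_old, thetas)

def close(a, b, tol=1e-9):
    """Element-wise comparison with a small tolerance for floating-point error."""
    return all(abs(x - y) < tol for x, y in zip(a, b))

thetas = rope_thetas(head_dim=4)
raw_key = [0.5, -1.0, 2.0, 0.25]            # key before any position encoding
k_old = rope_rotate(raw_key, 3, thetas)     # cached key, encoded at position 3

# Two-step recovery: undo position 3, then encode position 10.
two_step = rope_rotate(rope_rotate(k_old, -3, thetas), 10, thetas)
# Fused recovery: a single rotation by the position difference (10 - 3).
fused = recover_position(k_old, 3, 10, thetas)
# Reference: encode the raw key directly at position 10.
direct = rope_rotate(raw_key, 10, thetas)

print(close(fused, two_step), close(fused, direct))
```

Both comparisons should agree up to floating-point error, since the fused form only needs the position difference and never materializes the un-encoded key.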

Selective recomputation:

At check layers, compute attention weights with both cached and fresh KV
Identify the top-k positions with the highest divergence, where k is the recompute_ratio fraction of positions
At subsequent layers, recompute only those positions from scratch
Blend: use recomputed values for divergent positions, cached values for rest
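
The selection and blending steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LMCBlender.blend implementation: the function names are hypothetical, the per-position divergence scores are taken as given rather than derived from real attention weights, and rounding k down with a minimum of one position is an assumed convention.

```python
def top_k_positions(divergence, recompute_ratio):
    """Pick the indices of the k most divergent positions.

    k = max(1, int(n * recompute_ratio)) is an assumption made for this sketch.
    """
    k = max(1, int(len(divergence) * recompute_ratio))
    ranked = sorted(range(len(divergence)), key=lambda i: divergence[i], reverse=True)
    return sorted(ranked[:k])

def blend(cached, recomputed):
    """Use recomputed values where available, position-corrected cached values elsewhere."""
    return [recomputed.get(i, value) for i, value in enumerate(cached)]

# Hypothetical per-position divergence scores measured at a check layer.
divergence = [0.01, 0.40, 0.02, 0.03, 0.35, 0.00, 0.05, 0.01]
selected = top_k_positions(divergence, recompute_ratio=0.25)

# Placeholder strings stand in for KV tensors: "c*" cached, "r*" recomputed.
cached = [f"c{i}" for i in range(len(divergence))]
recomputed = {i: f"r{i}" for i in selected}
blended = blend(cached, recomputed)

print(selected)
print(blended)
```

With a recompute_ratio of 0.25 over eight positions, only two positions are recomputed and the remaining six keep their cached values, which is where the savings over full recomputation come from.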

Related Pages

Implemented By

Implementation:LMCache_LMCache_LMCBlender_Blend
