Principle:LMCache LMCache RoPE Position Recovery
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Attention_Mechanisms |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A position-encoding correction mechanism that re-encodes cached KV tensors with the correct rotary position embeddings (RoPE) when they are reused at different sequence positions.
Description
RoPE Position Recovery addresses the core challenge of non-prefix KV cache reuse: when a cached KV tensor was computed at one sequence position but is reused at a different position, the rotary position embedding (RoPE) baked into its keys encodes the wrong position. (RoPE is applied to queries and keys, not values, so only the cached keys carry a position encoding.) The recovery process is: (1) reverse the old RoPE encoding, then (2) apply the new RoPE encoding for the correct position. Because RoPE rotations compose by complex multiplication (their angles add), the two steps can be fused into a single rotation by the position difference.
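The two-step recovery and its fused form can be checked numerically. This is a minimal sketch, assuming a key is stored as d/2 complex numbers (one per pair of real dimensions) with frequencies theta_j = base^(-2j/d); rope_freqs, rotate, recover_two_step, and recover_fused are illustrative names, not LMCache functions.

```python
import cmath


def rope_freqs(dim: int, base: float = 10000.0) -> list:
    # theta_j = base^(-2j/dim) for each of the dim/2 complex components.
    return [base ** (-2 * j / dim) for j in range(dim // 2)]


def rotate(x: list, pos: float, freqs: list) -> list:
    # Multiply component j by e^{i * pos * theta_j}; a negative pos undoes an encoding.
    return [xj * cmath.exp(1j * pos * t) for xj, t in zip(x, freqs)]


def recover_two_step(k_cached: list, old_pos: int, new_pos: int, freqs: list) -> list:
    k_raw = rotate(k_cached, -old_pos, freqs)  # (1) reverse the old RoPE encoding
    return rotate(k_raw, new_pos, freqs)       # (2) apply RoPE for the new position


def recover_fused(k_cached: list, old_pos: int, new_pos: int, freqs: list) -> list:
    # e^{-i m t} * e^{i n t} = e^{i (n - m) t}: one rotation by the position delta.
    return rotate(k_cached, new_pos - old_pos, freqs)


freqs = rope_freqs(8)                             # head dim 8 -> 4 complex components
key = [1 + 2j, -0.5 + 0.3j, 0.7 - 1.1j, 2 + 0j]   # un-encoded key (toy values)
cached = rotate(key, 5, freqs)                    # stored in the cache at position 5
target = rotate(key, 12, freqs)                   # what the key should be at position 12
two_step = recover_two_step(cached, 5, 12, freqs)
fused = recover_fused(cached, 5, 12, freqs)
```

Both recovered keys match the key encoded directly at position 12 up to floating-point rounding; the fused version needs one rotation per component instead of two.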
After position correction, the blender identifies the top-k most divergent positions (measured by attention weight differences at check layers) and recomputes only those positions from scratch; all other positions keep their cached, position-corrected values.
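The selection step can be sketched as follows, assuming the divergence has already been reduced to one score per position and that k is recompute_ratio times the sequence length, rounded and clamped to at least one position. select_recompute_positions is a hypothetical helper, not the LMCache API, and returning indices in sequence order is an illustrative choice.

```python
def select_recompute_positions(divergence: list, recompute_ratio: float) -> list:
    """Return indices of the most divergent positions, in sequence order.

    Hypothetical helper: k = round(recompute_ratio * seq_len), clamped to [1, seq_len].
    """
    seq_len = len(divergence)
    if seq_len == 0:
        return []
    k = min(seq_len, max(1, round(recompute_ratio * seq_len)))
    # Rank positions from most to least divergent and keep the first k.
    ranked = sorted(range(seq_len), key=lambda i: divergence[i], reverse=True)
    # Sort the survivors so downstream recomputation sees them left to right.
    return sorted(ranked[:k])


scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.4, 0.2, 0.7]  # toy per-position divergence
picked = select_recompute_positions(scores, 0.25)    # k = 2 of 8 positions
```

With a ratio of 0.25 on eight positions, the two highest scores (0.9 and 0.8) are kept, so positions 1 and 3 are recomputed and the other six reuse their corrected cache entries.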
Usage
This is the core algorithm in CacheBlend. It operates transparently within the LMCBlender.blend method during inference.
Theoretical Basis
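RoPE encodes position as a complex rotation. Treating a key vector of head dimension $d$ as $d/2$ complex components $x_j$ (each built from a pair of real dimensions), the key at position $m$ is encoded as
$$\tilde{x}_j^{(m)} = x_j\, e^{\mathrm{i}\, m\theta_j}, \qquad \theta_j = b^{-2j/d}, \quad j = 0, \dots, d/2 - 1,$$
where $b$ is the RoPE base (10000 in the original RoPE formulation).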
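To see why a stale encoding matters: under RoPE, a query-key score depends only on the offset between their positions, so a key that keeps its old rotation is scored as if it sat at a different distance. This sketch assumes the complex-rotation view of RoPE with a toy head dimension of 4 and base 10000; rope_encode and score are illustrative names.

```python
import cmath


def rope_encode(x: list, pos: int, base: float = 10000.0) -> list:
    # x holds d/2 complex components; component j rotates by pos * theta_j,
    # with theta_j = base^(-2j/d).
    d = 2 * len(x)
    return [xj * cmath.exp(1j * pos * base ** (-2 * j / d)) for j, xj in enumerate(x)]


def score(q: list, k: list) -> float:
    # Re(q_j * conj(k_j)) equals the real dot product of the corresponding dimension pair.
    return sum((qj * kj.conjugate()).real for qj, kj in zip(q, k))


q = [1 + 0j, 1 + 0j]  # toy query, head dim 4
k = [1 + 0j, 1 + 0j]  # toy key, head dim 4
same_offset_a = score(rope_encode(q, 10), rope_encode(k, 4))     # offset 6
same_offset_b = score(rope_encode(q, 110), rope_encode(k, 104))  # offset 6, shifted by 100
stale_key = score(rope_encode(q, 10), rope_encode(k, 1))         # key meant for position 4, still encoded at 1
```

The first two scores agree because only the offset matters; the stale key produces a very different score, which is the error position recovery removes.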
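Position recovery fuses the reverse and re-encode steps. For a key cached at old position $m$ and reused at new position $n$, with per-component RoPE frequency $\theta_j$:
$$\tilde{x}_j^{(n)} = \tilde{x}_j^{(m)}\, e^{-\mathrm{i}\, m\theta_j}\, e^{\mathrm{i}\, n\theta_j} = \tilde{x}_j^{(m)}\, e^{\mathrm{i}\,(n-m)\theta_j},$$
so a single rotation by the position delta $n - m$ replaces the two separate rotations.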
The fused_encode operation computes this in a single CUDA kernel call, avoiding the overhead of separate reverse and forward passes.
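Conceptually, a fused kernel applies this delta rotation to every cached token in one pass. The pure-Python reference below only mirrors the per-element arithmetic such a kernel would perform; it is not the fused_encode implementation, fused_rope_recover is an illustrative name, and the interleaved pairing of dimensions (2j, 2j + 1) is an assumption (some models pair dimension j with j + d/2 instead).

```python
import math


def fused_rope_recover(keys, old_pos, new_pos, base=10000.0):
    """CPU reference for a fused RoPE position correction (illustrative, not LMCache code).

    keys[t] is token t's key (real values, even length d) encoded at old_pos[t];
    the result re-encodes it at new_pos[t]. Assumes interleaved pairs (2j, 2j + 1).
    """
    out = []
    for key, m, n in zip(keys, old_pos, new_pos):
        d = len(key)
        row = [0.0] * d
        for j in range(d // 2):
            # One rotation by (n - m) * theta_j instead of undo-then-redo.
            angle = (n - m) * base ** (-2 * j / d)
            c, s = math.cos(angle), math.sin(angle)
            x, y = key[2 * j], key[2 * j + 1]
            row[2 * j] = x * c - y * s
            row[2 * j + 1] = x * s + y * c
        out.append(row)
    return out


def rope_encode(key, pos, base=10000.0):
    # Encoding at pos is the same rotation applied from position 0 (the identity).
    return fused_rope_recover([key], [0], [pos], base)[0]


raw = [[1.0, 0.0, 0.5, -0.5], [0.2, 0.3, -1.0, 0.4]]       # two toy tokens, d = 4
cached = [rope_encode(raw[0], 3), rope_encode(raw[1], 4)]  # cached at positions 3 and 4
fixed = fused_rope_recover(cached, [3, 4], [10, 11])       # reused at positions 10 and 11
expected = [rope_encode(raw[0], 10), rope_encode(raw[1], 11)]
```

Each token reads its cached key once, computes one cosine/sine pair per frequency, and writes the corrected key once, which is why fusing the reverse and forward rotations saves a full pass over the cache.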
Selective recomputation:
- At check layers, compute attention weights with both cached and fresh KV
- Identify the top-k positions with the highest divergence, where k is the recompute_ratio fraction of the sequence length
- At subsequent layers, recompute only those positions from scratch
- Blend: use recomputed values for divergent positions, cached values for rest
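The four steps can be strung together in a toy loop. This sketch assumes a single check layer, one vector per token per layer, squared-L2 distance between cached and fresh rows as the divergence score (a stand-in for the attention-weight comparison described above), and a hypothetical fresh_fn callback standing in for the model's forward computation. Layers before the check layer are left untouched for simplicity. It is not the LMCBlender.blend implementation.

```python
def selective_recompute(cached, fresh_fn, check_layer, recompute_ratio):
    """Toy CacheBlend-style pass (illustrative; not the LMCBlender.blend code).

    cached[layer][t]: position-corrected cached row for token t at that layer.
    fresh_fn(layer, tokens): hypothetical callback returning freshly computed
    rows for the given tokens at that layer.
    """
    n_layers, n_tokens = len(cached), len(cached[0])
    out = [[list(row) for row in layer] for layer in cached]  # start from cached values

    # Check layer: compute every token fresh and score its divergence from the cache
    # (squared L2 distance here; the real divergence metric may differ).
    fresh = fresh_fn(check_layer, list(range(n_tokens)))
    div = [sum((a - b) ** 2 for a, b in zip(c, f)) for c, f in zip(cached[check_layer], fresh)]

    # Keep the top-k most divergent tokens, k = recompute_ratio * n_tokens (at least 1).
    k = min(n_tokens, max(1, round(recompute_ratio * n_tokens)))
    selected = sorted(sorted(range(n_tokens), key=lambda t: div[t], reverse=True)[:k])
    for t in selected:
        out[check_layer][t] = list(fresh[t])

    # Subsequent layers: recompute only the selected tokens; all others stay cached.
    for layer in range(check_layer + 1, n_layers):
        for t, row in zip(selected, fresh_fn(layer, selected)):
            out[layer][t] = list(row)
    return selected, out


def fresh_fn(layer, tokens):
    # Stand-in for real recomputation: the "true" value is 10 * layer + t + 0.5.
    return [[10 * layer + t + 0.5] for t in tokens]


# Toy data: 3 layers x 4 tokens, one value per row; token 2 is badly stale at layer 1.
cached = [[[float(10 * layer_idx + t)] for t in range(4)] for layer_idx in range(3)]
cached[1][2] = [100.0]
selected, blended = selective_recompute(cached, fresh_fn, check_layer=1, recompute_ratio=0.25)
```

Only token 2 is flagged, so it alone is recomputed at the check layer and every later layer, while the other three tokens keep their cached rows.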