Principle:LMCache LMCache RoPE Position Recovery

From Leeroopedia


Knowledge Sources
Domains Deep_Learning, Attention_Mechanisms
Last Updated 2026-02-09 00:00 GMT

Overview

A position-encoding correction mechanism that re-applies rotary position embeddings (RoPE) to cached KV tensors so that they carry the correct positions when reused at a different place in the sequence.

Description

RoPE Position Recovery addresses the core challenge of non-prefix KV cache reuse: when a cached KV tensor was computed at one sequence position but needs to be used at a different position, its rotary position embedding (RoPE) is incorrect. The recovery process has two steps: (1) reverse the old RoPE encoding, and (2) apply the new RoPE encoding for the correct position. Because two successive complex rotations multiply into a single rotation by the sum of their angles, both steps can be fused into one operation that rotates by the difference between the new and old positions.

After position correction, the blender identifies the top-k most divergent positions (based on attention weight differences at check layers) and selectively recomputes those positions from scratch, leaving the rest as cached values with corrected positions.

Usage

This is the core algorithm in CacheBlend. It operates transparently within the LMCBlender.blend method during inference.

Theoretical Basis

RoPE encodes position as complex rotation: $\mathrm{RoPE}(x, \mathrm{pos}) = x \, e^{i \, \mathrm{pos} \, \theta}$

Position recovery fuses reverse + re-encode: $K_{\mathrm{new}} = K_{\mathrm{old}} \, e^{i (\mathrm{pos}_{\mathrm{new}} - \mathrm{pos}_{\mathrm{old}}) \theta}$

The fused_encode operation computes this in a single CUDA kernel call, avoiding the overhead of separate reverse and forward passes.
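
The equivalence between the two-step and fused forms can be checked with a minimal pure-Python sketch. This is an illustration only, not LMCache's CUDA kernel: the function names (rope_freqs, rope_rotate, rope_recover), the adjacent-channel pairing, and the base-10000 frequency schedule are assumptions chosen for the example.

```python
import cmath

def rope_freqs(head_dim, base=10000.0):
    # One rotation frequency per channel pair (common RoPE schedule; assumed here).
    return [base ** (-2 * j / head_dim) for j in range(head_dim // 2)]

def rope_rotate(vec, pos, base=10000.0):
    """Rotate each adjacent channel pair of vec by the angle pos * theta_j."""
    out = []
    for j, theta in enumerate(rope_freqs(len(vec), base)):
        z = complex(vec[2 * j], vec[2 * j + 1]) * cmath.exp(1j * pos * theta)
        out += [z.real, z.imag]
    return out

def rope_recover(k_old, pos_old, pos_new, base=10000.0):
    """Fused recovery: a single rotation by (pos_new - pos_old)."""
    return rope_rotate(k_old, pos_new - pos_old, base)

k = [0.5, -1.0, 2.0, 0.25]                          # un-rotated key, head_dim = 4
k_old = rope_rotate(k, 7)                           # cached at position 7
two_step = rope_rotate(rope_rotate(k_old, -7), 42)  # reverse, then re-encode
fused = rope_recover(k_old, 7, 42)                  # one rotation by 42 - 7
direct = rope_rotate(k, 42)                         # reference: encode at 42 directly
```

Up to floating-point error, two_step, fused, and direct hold the same values, which is why a single rotation can replace the separate reverse and forward passes.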

Selective recomputation:

  1. At check layers, compute attention weights with both cached and fresh KV
  2. Identify top-k positions with highest divergence (recompute_ratio fraction)
  3. At subsequent layers, recompute only those positions from scratch
  4. Blend: use recomputed values for divergent positions, cached values for rest
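
The four steps above can be sketched in plain Python. The divergence metric (sum of squared differences per position), the function names, and the data layout are assumptions made for illustration; the source does not specify how LMCBlender.blend measures divergence internally.

```python
import math

def select_recompute_positions(cached_attn, fresh_attn, recompute_ratio):
    """Return the positions whose attention rows diverge most (top-k by ratio)."""
    # Assumed metric: sum of squared differences between cached and fresh rows.
    divergence = [
        sum((c - f) ** 2 for c, f in zip(c_row, f_row))
        for c_row, f_row in zip(cached_attn, fresh_attn)
    ]
    k = max(1, math.ceil(recompute_ratio * len(divergence)))
    ranked = sorted(range(len(divergence)), key=lambda i: divergence[i], reverse=True)
    return sorted(ranked[:k])

def blend(cached_kv, recomputed_kv, positions):
    """Use recomputed values at the selected positions, cached values elsewhere."""
    out = list(cached_kv)
    for p in positions:
        out[p] = recomputed_kv[p]
    return out

# Toy check-layer attention weights for 4 positions (one row per position).
cached_attn = [[0.1, 0.9], [0.5, 0.5], [0.30, 0.70], [0.2, 0.8]]
fresh_attn = [[0.1, 0.9], [0.9, 0.1], [0.35, 0.65], [0.6, 0.4]]

positions = select_recompute_positions(cached_attn, fresh_attn, recompute_ratio=0.5)
blended = blend(["c0", "c1", "c2", "c3"], {1: "r1", 3: "r3"}, positions)
```

Here positions 1 and 3 diverge most, so only those two entries are replaced by recomputed values while positions 0 and 2 keep their position-corrected cached values.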

Related Pages

Implemented By
