
Principle:LMCache Segment Based KV Caching

From Leeroopedia


Knowledge Sources
Domains Caching, NLP
Last Updated 2026-02-09 00:00 GMT

Overview

A segment-aware token chunking strategy that splits token sequences at separator strings rather than at fixed boundaries, enabling reuse of individual text segments regardless of their position in the prompt.

Description

Segment Based KV Caching replaces standard fixed-size chunking with separator-based splitting. Input tokens are scanned for occurrences of the blend_special_str separator (e.g., " # # "), and each segment between separators becomes an independently hashed cache chunk. As a result, the same text segment always produces the same cache key no matter where it appears in the full prompt.

This enables the core CacheBlend capability: text segments from one request can be reused in a different request even if the segments appear in a different order.
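The position-independence property can be illustrated with a minimal sketch. The function name `segment_keys` and the 12-character key truncation are illustrative, not part of the LMCache API; only the " # # " separator string comes from the description above.

```python
import hashlib

SEP = " # # "  # example blend_special_str separator from above

def segment_keys(text: str) -> list[str]:
    """Hash each separator-delimited segment on its own, so a
    segment's key does not depend on what precedes it."""
    return [hashlib.sha256(s.encode()).hexdigest()[:12]
            for s in text.split(SEP)]

# Reordering the segments permutes the keys but does not change them,
# so each segment's cached KV entries remain reusable across requests.
```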

Usage

Active when enable_blending=True. The SegmentTokenDatabase automatically handles separator-based splitting during both store and retrieve operations.
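For the SegmentTokenDatabase to find segment boundaries, the prompt must contain the configured separator between cacheable segments. A hedged sketch of how such a prompt might be assembled; `build_prompt` is a hypothetical helper, and the separator value must match whatever blend_special_str is configured:

```python
SEP = " # # "  # assumed to match the configured blend_special_str

def build_prompt(segments: list[str]) -> str:
    """Join independently cacheable segments with the separator so
    separator-based splitting can recover them on store/retrieve."""
    return SEP.join(segments)

prompt = build_prompt(["You are a helpful assistant.", "doc_A", "doc_B"])
```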

Theoretical Basis

Traditional prefix-based caching:

# Fixed chunks: [tok0..tok255], [tok256..tok511], ...
# Key depends on ALL preceding tokens (prefix chain)

Segment-based caching:

# Segments: [sys_prompt], [sep], [chunk1], [sep], [chunk2], ...
# Each segment hashed independently
# Same segment always = same hash, regardless of position
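The contrast above can be made concrete with a small sketch over token IDs. Both function names, the chunk size, and the choice of token 0 as the separator are illustrative assumptions, not LMCache internals:

```python
import hashlib

def prefix_chunk_keys(tokens: list[int], chunk: int = 4) -> list[str]:
    """Prefix caching: each chunk's key hashes ALL preceding tokens,
    so any upstream change invalidates every later chunk."""
    keys, h = [], hashlib.sha256()
    for i in range(0, len(tokens), chunk):
        h.update(bytes(tokens[i:i + chunk]))  # extend the prefix chain
        keys.append(h.hexdigest()[:8])
    return keys

def segment_keys(tokens: list[int], sep: int = 0) -> list[str]:
    """Segment caching: split on the separator token and hash each
    segment in isolation, independent of its position."""
    keys, seg = [], []
    for t in tokens + [sep]:
        if t == sep:
            if seg:
                keys.append(hashlib.sha256(bytes(seg)).hexdigest()[:8])
            seg = []
        else:
            seg.append(t)
    return keys
```

Reordering two separator-delimited segments (e.g., `[1, 2, 0, 3, 4]` vs `[3, 4, 0, 1, 2]`) yields the same set of segment keys, while the prefix-chained keys diverge.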

Related Pages

Implemented By

Uses Heuristic
