Principle: LMCache Segment Based KV Caching
| Knowledge Sources | |
|---|---|
| Domains | Caching, NLP |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A segment-aware token chunking strategy that splits token sequences at separator strings rather than at fixed-size boundaries, enabling reuse of individual text segments regardless of their position in the prompt.
Description
Segment Based KV Caching replaces standard fixed-size chunking with separator-based splitting. Input tokens are scanned for occurrences of the blend_special_str separator (e.g., " # # "), and each segment between separators becomes an independently hashed cache chunk. As a result, the same text segment always produces the same cache key, no matter where it appears in the full prompt.
This enables the core CacheBlend capability: text segments from one request can be reused in a different request even if the segments appear in a different order.
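The idea above can be sketched in a few lines. This is a minimal illustration, not LMCache's implementation: the helper names (split_segments, segment_key) and the use of SHA-256 over raw text are assumptions; only the separator-splitting and independent-hashing behavior comes from the text.

```python
import hashlib

SEPARATOR = " # # "  # example blend_special_str from the description above

def split_segments(text: str) -> list:
    # Split on the separator; each piece becomes an independent cache chunk.
    return text.split(SEPARATOR)

def segment_key(segment: str) -> str:
    # Hash each segment on its own -- no prefix chain, so the key is
    # position-independent.
    return hashlib.sha256(segment.encode()).hexdigest()[:16]

# Two prompts that contain the same segments in a different order.
prompt_a = "sys" + SEPARATOR + "doc1" + SEPARATOR + "doc2"
prompt_b = "sys" + SEPARATOR + "doc2" + SEPARATOR + "doc1"

keys_a = {seg: segment_key(seg) for seg in split_segments(prompt_a)}
keys_b = {seg: segment_key(seg) for seg in split_segments(prompt_b)}

# "doc2" maps to the same cache key in both prompts even though its
# position differs, so its cached KV entries can be reused.
assert keys_a["doc2"] == keys_b["doc2"]
```

Reordering the segments between requests changes nothing about their keys, which is what makes cross-request reuse possible.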
Usage
Active when enable_blending=True. The SegmentTokenDatabase automatically handles separator-based splitting during both store and retrieve operations.
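As a rough sketch of the configuration involved: only enable_blending and blend_special_str are named in this document, so treating them as plain key/value settings (and the exact config surface) is an assumption.

```python
# Hypothetical configuration sketch -- the two keys below are the only
# settings named in this document; how they are actually passed to the
# engine is an assumption.
config = {
    "enable_blending": True,       # switch from fixed-size to segment chunking
    "blend_special_str": " # # ",  # separator string that delimits segments
}
```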
Theoretical Basis
Traditional prefix-based caching:
# Fixed chunks: [tok0..tok255], [tok256..tok511], ...
# Key depends on ALL preceding tokens (prefix chain)
Segment-based caching:
# Segments: [sys_prompt], [sep], [chunk1], [sep], [chunk2], ...
# Each segment hashed independently
# Same segment always = same hash, regardless of position
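The contrast between the two schemes can be made concrete. In the sketch below (an illustration, assuming SHA-256 keys and a toy chunk size of 4; the helper names are hypothetical), the prefix scheme folds each chunk's key into the next, so an identical chunk at the same offset still gets a different key when the preceding tokens differ, while the segment scheme does not.

```python
import hashlib

def prefix_keys(tokens, chunk=4):
    # Traditional scheme: each chunk's key folds in the previous chunk's
    # key, so a key depends on ALL preceding tokens (the prefix chain).
    keys, prev = [], b""
    for i in range(0, len(tokens), chunk):
        h = hashlib.sha256(prev + bytes(tokens[i:i + chunk])).digest()
        keys.append(h.hex()[:12])
        prev = h
    return keys

def segment_keys(segments):
    # Segment scheme: every segment is hashed independently.
    return [hashlib.sha256(bytes(s)).hexdigest()[:12] for s in segments]

shared = [7, 8, 9, 0]  # identical tokens appearing in both requests

# Prefix chain: the shared chunk sits at the same offset in both prompts,
# yet its key differs because the first chunk differs.
pk_a = prefix_keys([1, 2, 3, 4] + shared)
pk_b = prefix_keys([5, 6, 5, 6] + shared)
assert pk_a[1] != pk_b[1]

# Segment hashing: the shared segment keeps its key in any order.
sk_a = segment_keys([[1, 2, 3, 4], shared])
sk_b = segment_keys([shared, [5, 6, 5, 6]])
assert sk_a[1] == sk_b[0]
```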