
Principle:LMCache Segment Based KV Caching

From Leeroopedia


Knowledge Sources
Domains Caching, NLP
Last Updated 2026-02-09 00:00 GMT

Overview

A segment-aware token chunking strategy that splits token sequences at separator strings rather than at fixed boundaries, enabling reuse of individual text segments regardless of their position in the prompt.

Description

Segment Based KV Caching replaces standard fixed-size chunking with separator-based splitting. Input tokens are scanned for occurrences of the blend_special_str separator (e.g., " # # "), and each segment between separators becomes an independently hashed cache chunk. As a result, the same text segment always produces the same cache key no matter where it appears in the full prompt.

This enables the core CacheBlend capability: text segments from one request can be reused in a different request even if the segments appear in a different order.
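The position-independence property can be illustrated with a minimal sketch. The function name `segment_keys` and the 12-character key truncation are illustrative, not part of the LMCache API; only the " # # " separator string comes from the description above.

```python
import hashlib

SEP = " # # "  # example blend_special_str separator from above

def segment_keys(text: str) -> list[str]:
    """Hash each separator-delimited segment on its own, so a
    segment's key does not depend on what precedes it."""
    return [hashlib.sha256(s.encode()).hexdigest()[:12]
            for s in text.split(SEP)]

# Reordering the segments permutes the keys but does not change them,
# so each segment's cached KV entries remain reusable across requests.
```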

Usage

Active when enable_blending=True. The SegmentTokenDatabase automatically handles separator-based splitting during both store and retrieve operations.
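For the SegmentTokenDatabase to find segment boundaries, the prompt must contain the configured separator between cacheable segments. A hedged sketch of how such a prompt might be assembled; `build_prompt` is a hypothetical helper, and the separator value must match whatever blend_special_str is configured:

```python
SEP = " # # "  # assumed to match the configured blend_special_str

def build_prompt(segments: list[str]) -> str:
    """Join independently cacheable segments with the separator so
    separator-based splitting can recover them on store/retrieve."""
    return SEP.join(segments)

prompt = build_prompt(["You are a helpful assistant.", "doc_A", "doc_B"])
```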

Theoretical Basis

Traditional prefix-based caching:

# Fixed chunks: [tok0..tok255], [tok256..tok511], ...
# Key depends on ALL preceding tokens (prefix chain)

Segment-based caching:

# Segments: [sys_prompt], [sep], [chunk1], [sep], [chunk2], ...
# Each segment hashed independently
# Same segment always = same hash, regardless of position
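The contrast above can be made concrete with a small sketch over token IDs. Both function names, the chunk size, and the choice of token 0 as the separator are illustrative assumptions, not LMCache internals:

```python
import hashlib

def prefix_chunk_keys(tokens: list[int], chunk: int = 4) -> list[str]:
    """Prefix caching: each chunk's key hashes ALL preceding tokens,
    so any upstream change invalidates every later chunk."""
    keys, h = [], hashlib.sha256()
    for i in range(0, len(tokens), chunk):
        h.update(bytes(tokens[i:i + chunk]))  # extend the prefix chain
        keys.append(h.hexdigest()[:8])
    return keys

def segment_keys(tokens: list[int], sep: int = 0) -> list[str]:
    """Segment caching: split on the separator token and hash each
    segment in isolation, independent of its position."""
    keys, seg = [], []
    for t in tokens + [sep]:
        if t == sep:
            if seg:
                keys.append(hashlib.sha256(bytes(seg)).hexdigest()[:8])
            seg = []
        else:
            seg.append(t)
    return keys
```

Reordering two separator-delimited segments (e.g., `[1, 2, 0, 3, 4]` vs `[3, 4, 0, 1, 2]`) yields the same set of segment keys, while the prefix-chained keys diverge.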

Related Pages

Implemented By

Uses Heuristic
