
Implementation: LMCache LMCacheEngine Store

From Leeroopedia


Knowledge Sources
Domains Caching, Memory_Management
Last Updated 2026-02-09 00:00 GMT

Overview

A concrete tool for storing KV cache tensors from GPU memory into multi-tier storage backends, provided by the LMCacheEngine class.

Description

The LMCacheEngine.store method is the primary API for persisting KV cache data. It processes input tokens through the token database to generate chunk keys, allocates memory objects from the storage manager, extracts KV tensors from GPU via the GPU connector, and dispatches the data to all configured storage backends. The method is decorated with @torch.inference_mode() for performance and includes health checks, freeze mode support, and statistics monitoring.
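The chunk-and-dispatch flow described above can be sketched in plain Python. This is illustrative only: the chunk size, the prefix-hash scheme, and all function names here are assumptions for exposition, not LMCache's actual internals, and token IDs stand in for the real KV tensors.

```python
import hashlib

CHUNK_SIZE = 256  # assumed value; LMCache's chunk size is configurable


def chunk_keys(tokens: list[int], chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Illustrative stand-in for the token database: derive one key per
    chunk from a hash of the full token prefix, so identical prefixes
    map to identical keys across requests."""
    keys = []
    prefix_hash = hashlib.sha256()
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        prefix_hash.update(str(chunk).encode("utf-8"))
        keys.append(prefix_hash.hexdigest())
    return keys


def store(tokens: list[int], backends: list[dict]) -> int:
    """Sketch of the store pipeline: compute chunk keys, then dispatch
    each chunk to every configured backend (CPU/disk/remote)."""
    keys = chunk_keys(tokens)
    for start, key in zip(range(0, len(tokens), CHUNK_SIZE), keys):
        chunk = tokens[start:start + CHUNK_SIZE]
        for backend in backends:
            backend[key] = chunk  # real code stores KV tensors, not token IDs
    return len(keys)


cpu, disk = {}, {}
n = store(list(range(600)), [cpu, disk])
# 600 tokens at chunk size 256 -> 3 chunks, written to both backends
```

Because the keys are prefix hashes, two requests sharing a prompt prefix produce the same leading chunk keys, which is what makes cross-request cache reuse possible.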

Usage

This method is called by the vLLM connector (LMCacheConnectorV1Impl) after each inference request to store the computed KV cache. It can also be called directly when using LMCache in standalone mode.
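The connector path can be pictured with a minimal self-contained sketch of the call pattern, one store() per finished request. The classes here are stand-ins invented for illustration, not LMCache's real connector or engine:

```python
class FakeEngine:
    """Stand-in for LMCacheEngine: records each store() call."""
    def __init__(self):
        self.calls = []

    def store(self, tokens, **kwargs):
        self.calls.append((list(tokens), kwargs))


class ConnectorSketch:
    """Stand-in for the serving-engine connector: after each request
    finishes, hand its tokens and KV metadata to the cache engine."""
    def __init__(self, engine):
        self.engine = engine

    def on_request_finished(self, token_ids, slot_mapping):
        self.engine.store(token_ids, slot_mapping=slot_mapping)


engine = FakeEngine()
connector = ConnectorSketch(engine)
connector.on_request_finished([1, 2, 3], slot_mapping=[10, 11, 12])
connector.on_request_finished([1, 2, 3, 4], slot_mapping=[10, 11, 12, 13])
# one store() call per completed request
```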

Code Reference

Source Location

  • Repository: LMCache
  • File: lmcache/v1/cache_engine.py
  • Lines: L333-L528

Signature

@torch.inference_mode()
def store(
    self,
    tokens: Optional[Union[torch.Tensor, list[int]]] = None,
    hashes: Optional[List[int]] = None,
    offsets: Optional[List[int]] = None,
    mask: Optional[torch.Tensor] = None,
    **kwargs,
) -> None:
    """Store the tokens/hashes and mask into the cache engine.

    Args:
        tokens: The tokens of the corresponding KV caches.
        hashes: The hashes of the corresponding KV caches.
        offsets: Chunk offsets when using hash-based lookup.
        mask: Boolean mask (FFFFFTTTTTTT format) where True = tokens to store.
        **kwargs: Additional args including paged KV buffer and page tables.
    """

Import

from lmcache.v1.cache_engine import LMCacheEngine, LMCacheEngineBuilder

I/O Contract

Inputs

Name | Type | Required | Description
tokens | Optional[Union[torch.Tensor, list[int]]] | No* | Token IDs for the request (* either tokens or hashes is required)
hashes | Optional[List[int]] | No* | Pre-computed chunk hashes
offsets | Optional[List[int]] | No | Chunk offsets (required if hashes is provided)
mask | Optional[torch.Tensor] | No | Boolean mask indicating which tokens to store
**kwargs | dict | Yes | Must include the paged KV buffer, page tables, and slot mappings from the serving engine
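When pre-computed hashes are supplied, the offsets list tells the engine where each hashed chunk sits in the token sequence. A hedged sketch of that pairing follows; the function name and the exclusive-end-offset convention are assumptions for illustration, not LMCache's actual representation:

```python
def chunk_spans(hashes: list[int], offsets: list[int]) -> list[tuple[int, int, int]]:
    """Pair each pre-computed chunk hash with its (start, end) token span,
    assuming each offset is the exclusive end position of its chunk."""
    if len(hashes) != len(offsets):
        raise ValueError("hashes and offsets must align one-to-one")
    spans, start = [], 0
    for h, end in zip(hashes, offsets):
        spans.append((h, start, end))
        start = end
    return spans


# Three chunks of 256 tokens each, identified by their hashes.
spans = chunk_spans([0xA1, 0xB2, 0xC3], [256, 512, 768])
# -> [(0xA1, 0, 256), (0xB2, 256, 512), (0xC3, 512, 768)]
```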

Outputs

Name | Type | Description
(none) | None | KV cache chunks are stored in the configured backends (CPU/disk/remote)

Usage Examples

Standalone Store

import torch
from lmcache.v1.cache_engine import LMCacheEngineBuilder

# Assume the engine was already created via LMCacheEngineBuilder.get_or_create()
engine = LMCacheEngineBuilder.get("lmcache")

# Store the KV cache for a token sequence.
# kv_buffer and slot_mapping come from the serving engine's paged KV memory.
tokens = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8], dtype=torch.long)
engine.store(
    tokens=tokens,
    kv_caches=kv_buffer,        # GPU paged KV buffer
    slot_mapping=slot_mapping,  # page table mapping for the tokens
)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
