
Implementation: LMCache LMCacheEngine Store

From Leeroopedia


Knowledge Sources
Domains Caching, Memory_Management
Last Updated 2026-02-09 00:00 GMT

Overview

A concrete tool for storing KV cache tensors from GPU memory into multi-tier storage backends, provided by the LMCacheEngine class.

Description

The LMCacheEngine.store method is the primary API for persisting KV cache data. It processes input tokens through the token database to generate chunk keys, allocates memory objects from the storage manager, extracts KV tensors from GPU via the GPU connector, and dispatches the data to all configured storage backends. The method is decorated with @torch.inference_mode() for performance and includes health checks, freeze mode support, and statistics monitoring.
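The chunk-and-dispatch flow described above can be sketched in plain Python. This is illustrative only: the chunk size, the prefix-hash scheme, and all function names here are assumptions for exposition, not LMCache's actual internals, and token IDs stand in for the real KV tensors.

```python
import hashlib

CHUNK_SIZE = 256  # assumed value; LMCache's chunk size is configurable


def chunk_keys(tokens: list[int], chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Illustrative stand-in for the token database: derive one key per
    chunk from a hash of the full token prefix, so identical prefixes
    map to identical keys across requests."""
    keys = []
    prefix_hash = hashlib.sha256()
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        prefix_hash.update(str(chunk).encode("utf-8"))
        keys.append(prefix_hash.hexdigest())
    return keys


def store(tokens: list[int], backends: list[dict]) -> int:
    """Sketch of the store pipeline: compute chunk keys, then dispatch
    each chunk to every configured backend (CPU/disk/remote)."""
    keys = chunk_keys(tokens)
    for start, key in zip(range(0, len(tokens), CHUNK_SIZE), keys):
        chunk = tokens[start:start + CHUNK_SIZE]
        for backend in backends:
            backend[key] = chunk  # real code stores KV tensors, not token IDs
    return len(keys)


cpu, disk = {}, {}
n = store(list(range(600)), [cpu, disk])
# 600 tokens at chunk size 256 -> 3 chunks, written to both backends
```

Because the keys are prefix hashes, two requests sharing a prompt prefix produce the same leading chunk keys, which is what makes cross-request cache reuse possible.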

Usage

This method is called by the vLLM connector (LMCacheConnectorV1Impl) after each inference request to store the computed KV cache. It can also be called directly when using LMCache in standalone mode.
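The connector path can be pictured with a minimal self-contained sketch of the call pattern, one store() per finished request. The classes here are stand-ins invented for illustration, not LMCache's real connector or engine:

```python
class FakeEngine:
    """Stand-in for LMCacheEngine: records each store() call."""
    def __init__(self):
        self.calls = []

    def store(self, tokens, **kwargs):
        self.calls.append((list(tokens), kwargs))


class ConnectorSketch:
    """Stand-in for the serving-engine connector: after each request
    finishes, hand its tokens and KV metadata to the cache engine."""
    def __init__(self, engine):
        self.engine = engine

    def on_request_finished(self, token_ids, slot_mapping):
        self.engine.store(token_ids, slot_mapping=slot_mapping)


engine = FakeEngine()
connector = ConnectorSketch(engine)
connector.on_request_finished([1, 2, 3], slot_mapping=[10, 11, 12])
connector.on_request_finished([1, 2, 3, 4], slot_mapping=[10, 11, 12, 13])
# one store() call per completed request
```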

Code Reference

Source Location

  • Repository: LMCache
  • File: lmcache/v1/cache_engine.py
  • Lines: L333-L528

Signature

@torch.inference_mode()
def store(
    self,
    tokens: Optional[Union[torch.Tensor, list[int]]] = None,
    hashes: Optional[List[int]] = None,
    offsets: Optional[List[int]] = None,
    mask: Optional[torch.Tensor] = None,
    **kwargs,
) -> None:
    """Store the tokens/hashes and mask into the cache engine.

    Args:
        tokens: The tokens of the corresponding KV caches.
        hashes: The hashes of the corresponding KV caches.
        offsets: Chunk offsets when using hash-based lookup.
        mask: Boolean mask (FFFFFTTTTTTT format) where True = tokens to store.
        **kwargs: Additional args including paged KV buffer and page tables.
    """

Import

from lmcache.v1.cache_engine import LMCacheEngine, LMCacheEngineBuilder

I/O Contract

Inputs

Name | Type | Required | Description
tokens | Optional[Union[torch.Tensor, list[int]]] | No* | Token IDs for the request (* either tokens or hashes is required)
hashes | Optional[List[int]] | No* | Pre-computed chunk hashes
offsets | Optional[List[int]] | No | Chunk offsets (required if hashes is provided)
mask | Optional[torch.Tensor] | No | Boolean mask indicating which tokens to store
**kwargs | dict | Yes | Must include the paged KV buffer, page tables, and slot mappings from the serving engine
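When pre-computed hashes are supplied, the offsets list tells the engine where each hashed chunk sits in the token sequence. A hedged sketch of that pairing follows; the function name and the exclusive-end-offset convention are assumptions for illustration, not LMCache's actual representation:

```python
def chunk_spans(hashes: list[int], offsets: list[int]) -> list[tuple[int, int, int]]:
    """Pair each pre-computed chunk hash with its (start, end) token span,
    assuming each offset is the exclusive end position of its chunk."""
    if len(hashes) != len(offsets):
        raise ValueError("hashes and offsets must align one-to-one")
    spans, start = [], 0
    for h, end in zip(hashes, offsets):
        spans.append((h, start, end))
        start = end
    return spans


# Three chunks of 256 tokens each, identified by their hashes.
spans = chunk_spans([0xA1, 0xB2, 0xC3], [256, 512, 768])
# -> [(0xA1, 0, 256), (0xB2, 256, 512), (0xC3, 512, 768)]
```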

Outputs

Name | Type | Description
(none) | None | KV cache chunks are stored in the configured backends (CPU/disk/remote)

Usage Examples

Standalone Store

import torch
from lmcache.v1.cache_engine import LMCacheEngineBuilder

# Assume the engine was already created via LMCacheEngineBuilder.get_or_create()
engine = LMCacheEngineBuilder.get("lmcache")

# Store the KV cache for a token sequence.
# kv_buffer and slot_mapping come from the serving engine's paged KV memory.
tokens = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8], dtype=torch.long)
engine.store(
    tokens=tokens,
    kv_caches=kv_buffer,        # GPU paged KV buffer
    slot_mapping=slot_mapping,  # page table mapping for the tokens
)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
