Implementation: LMCache LMCacheEngine Store
| Knowledge Sources | |
|---|---|
| Domains | Caching, Memory_Management |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete API for copying KV cache tensors out of GPU memory into multi-tier storage backends, provided by the LMCacheEngine class.
Description
The LMCacheEngine.store method is the primary API for persisting KV cache data. It processes input tokens through the token database to generate chunk keys, allocates memory objects from the storage manager, extracts KV tensors from GPU via the GPU connector, and dispatches the data to all configured storage backends. The method is decorated with @torch.inference_mode() for performance and includes health checks, freeze mode support, and statistics monitoring.
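The first step described above, turning input tokens into per-chunk cache keys via the token database, can be sketched as follows. This is a minimal, self-contained illustration: the chunk size, hash function, and prefix-hash scheme here are simplifying assumptions, not LMCache's exact implementation.

```python
import hashlib

CHUNK_SIZE = 256  # illustrative; LMCache's chunk size is configurable

def chunk_keys(tokens: list[int], chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Derive one cache key per full chunk of tokens.

    Each key is computed over the entire prefix up to the chunk boundary,
    so changing any earlier token changes the keys of all later chunks.
    """
    keys = []
    for end in range(chunk_size, len(tokens) + 1, chunk_size):
        prefix = tokens[:end]
        digest = hashlib.sha256(str(prefix).encode("utf-8")).hexdigest()
        keys.append(digest[:16])  # truncated digest as a compact key
    return keys
```

With a prefix-based scheme like this, two requests that share a common token prefix produce identical keys for the shared chunks, which is what allows stored KV chunks to be reused across requests.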
Usage
This method is called by the vLLM connector (LMCacheConnectorV1Impl) after each inference request to store the computed KV cache. It can also be called directly when using LMCache in standalone mode.
Code Reference
Source Location
- Repository: LMCache
- File: lmcache/v1/cache_engine.py
- Lines: L333-L528
Signature
@torch.inference_mode()
def store(
self,
tokens: Optional[Union[torch.Tensor, list[int]]] = None,
hashes: Optional[List[int]] = None,
offsets: Optional[List[int]] = None,
mask: Optional[torch.Tensor] = None,
**kwargs,
) -> None:
"""Store the tokens/hashes and mask into the cache engine.
Args:
tokens: The tokens of the corresponding KV caches.
hashes: The hashes of the corresponding KV caches.
offsets: Chunk offsets when using hash-based lookup.
mask: Boolean mask (FFFFFTTTTTTT format) where True = tokens to store.
**kwargs: Additional args including paged KV buffer and page tables.
"""
Import
from lmcache.v1.cache_engine import LMCacheEngine, LMCacheEngineBuilder
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| tokens | Optional[Union[torch.Tensor, list[int]]] | No* | Token IDs for the request (* either tokens or hashes required) |
| hashes | Optional[List[int]] | No* | Pre-computed chunk hashes |
| offsets | Optional[List[int]] | No | Chunk offsets (required if hashes provided) |
| mask | Optional[torch.Tensor] | No | Boolean mask indicating which tokens to store |
| **kwargs | dict | Yes | Must include paged KV buffer, page tables, slot mappings from serving engine |
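The preconditions implied by the table, at least one of `tokens` or `hashes`, and `offsets` alongside `hashes`, can be expressed as a small checker. This is a sketch of the documented contract, not code taken from LMCache:

```python
def validate_store_args(tokens=None, hashes=None, offsets=None) -> None:
    """Raise ValueError if the documented store() preconditions are violated."""
    if tokens is None and hashes is None:
        raise ValueError("either tokens or hashes must be provided")
    if hashes is not None and offsets is None:
        raise ValueError("offsets are required when hashes are provided")
```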
Outputs
| Name | Type | Description |
|---|---|---|
| (none) | None | KV cache chunks stored in configured backends (CPU/disk/remote) |
Usage Examples
Standalone Store
import torch
from lmcache.v1.cache_engine import LMCacheEngineBuilder

# Assume the engine was already created via LMCacheEngineBuilder.get_or_create()
engine = LMCacheEngineBuilder.get("lmcache")

# Store the KV cache for a token sequence
tokens = torch.tensor([1, 2, 3, 4], dtype=torch.long)  # token IDs for the request
engine.store(
    tokens=tokens,
    kv_caches=kv_buffer,        # GPU paged KV buffer from the serving engine
    slot_mapping=slot_mapping,  # page-table slot mapping for the tokens
)