Implementation:LMCache LMCache Remote Backend
| Knowledge Sources | |
|---|---|
| Domains | Caching, Storage Backends, Remote Storage |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
RemoteBackend is the high-level storage backend that orchestrates communication with remote KV cache storage services, managing connections, serialization, MLA mode handling, and asynchronous put/get operations through any RemoteConnector implementation.
Description
RemoteBackend implements StorageBackendInterface and is the primary bridge between the LMCache engine and remote storage connectors (Redis, Valkey, filesystem, EIC, SageMaker HyperPod). It creates the appropriate RemoteConnector through the CreateConnector factory, selected by the configured remote_url, and passes data through a configurable serializer/deserializer pair (CreateSerde) so that it can be compressed for remote transfers.
The backend supports an MLA (Multi-head Latent Attention) worker ID aliasing mode, in which non-zero workers transparently rewrite their keys to use worker ID 0, so that tensor-parallel workers share the same cache entries. Connection management includes automatic reconnection with a configurable cooldown interval (10 seconds by default) and a blocking timeout for synchronous operations.
Put operations are submitted asynchronously with reference counting and completion callbacks, while get operations block with a configurable timeout. Batched variants of contains, get, and put are provided, along with a non-blocking batched prefetch; these delegate to the connector's batched capabilities when available. The backend is integrated with LMCache's observability system and reports remote get/put timing, failed get counts, and in-flight put task counts via Prometheus.
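The MLA worker ID aliasing described above can be illustrated with a minimal sketch. `SketchKey` and `alias_for_mla` are hypothetical names introduced for illustration only; the real CacheEngineKey has its own fields, and the actual rewrite logic lives inside RemoteBackend.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchKey:
    """Hypothetical stand-in for CacheEngineKey; the fields are assumptions."""
    model_name: str
    worker_id: int
    chunk_hash: str


def alias_for_mla(key: SketchKey, mla_enabled: bool) -> SketchKey:
    """Rewrite a non-zero worker's key to worker 0 when MLA aliasing is on."""
    if mla_enabled and key.worker_id != 0:
        return replace(key, worker_id=0)
    return key


k0 = SketchKey("model", 0, "abc123")
k3 = SketchKey("model", 3, "abc123")

# With aliasing, worker 3 addresses the same remote entry as worker 0.
aliased = alias_for_mla(k3, mla_enabled=True)
# Without aliasing, the key stays worker-specific.
untouched = alias_for_mla(k3, mla_enabled=False)
```

In this sketch the aliased key compares equal to worker 0's key, which is what would let every worker hit the same remote entry.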
Usage
Use RemoteBackend whenever LMCache needs to store or retrieve KV cache data from a remote storage service. The LMCache engine instantiates it automatically when a remote_url is configured. The backend takes care of connector selection, serialization, error recovery, and metrics reporting.
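The Description mentions automatic reconnection with a cooldown interval (10 seconds by default). One way such a cooldown can be enforced is sketched below. `ReconnectGate` and its injectable `clock` parameter are hypothetical and are not part of the LMCache API; the sketch only shows the rate-limiting idea, not the backend's actual recovery code.

```python
import time
from typing import Callable, Optional


class ReconnectGate:
    """Hypothetical helper: permits one reconnect attempt per cooldown window."""

    def __init__(
        self,
        cooldown_secs: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.cooldown_secs = cooldown_secs
        self._clock = clock
        self._last_attempt: Optional[float] = None

    def should_attempt(self) -> bool:
        """Return True if enough time has passed since the last attempt."""
        now = self._clock()
        if (
            self._last_attempt is not None
            and now - self._last_attempt < self.cooldown_secs
        ):
            return False  # still cooling down; skip this reconnect
        self._last_attempt = now
        return True


# A fake clock keeps the demonstration deterministic.
fake_now = [100.0]
gate = ReconnectGate(cooldown_secs=10.0, clock=lambda: fake_now[0])

first = gate.should_attempt()   # no previous attempt, so allowed
fake_now[0] = 105.0
second = gate.should_attempt()  # 5 s after the last attempt, so refused
fake_now[0] = 111.0
third = gate.should_attempt()   # 11 s after the last attempt, so allowed
```

Using a monotonic clock avoids surprises when the wall clock is adjusted, and a gate of this kind keeps a failing remote service from being hammered with reconnect attempts on every cache operation.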
Code Reference
Source Location
- Repository: LMCache
- File: lmcache/v1/storage_backend/remote_backend.py
- Lines: 1-592
Signature
class RemoteBackend(StorageBackendInterface):
    def __init__(
        self,
        config: LMCacheEngineConfig,
        metadata: LMCacheMetadata,
        loop: asyncio.AbstractEventLoop,
        local_cpu_backend: Optional[LocalCPUBackend],
        dst_device: str = "cuda",
    ): ...
    def init_connection(self): ...
    def contains(self, key: CacheEngineKey, pin: bool = False) -> bool: ...
    def batched_contains(self, keys: List[CacheEngineKey], pin: bool = False) -> int: ...
    def exists_in_put_tasks(self, key: CacheEngineKey) -> bool: ...
    def submit_put_task(
        self, key: CacheEngineKey, memory_obj: MemoryObj,
        on_complete_callback: Optional[Callable[[CacheEngineKey], None]] = None,
    ) -> Future: ...
    def batched_submit_put_task(
        self, keys: Sequence[CacheEngineKey], memory_objs: List[MemoryObj],
        transfer_spec: Any = None,
        on_complete_callback: Optional[Callable[[CacheEngineKey], None]] = None,
    ) -> None: ...
    def get_blocking(self, key: CacheEngineKey) -> Optional[MemoryObj]: ...
    def batched_get_blocking(self, keys: List[CacheEngineKey]) -> List[Optional[MemoryObj]]: ...
    async def support_batched_async_contains(self) -> bool: ...
    async def batched_async_contains(self, lookup_id: str, keys: list[CacheEngineKey], pin: bool = False) -> int: ...
    async def support_batched_get_non_blocking(self) -> bool: ...
    async def batched_get_non_blocking(self, lookup_id: str, keys: List[CacheEngineKey], transfer_spec: Any = None) -> List[MemoryObj]: ...
    def pin(self, key: CacheEngineKey) -> bool: ...
    def unpin(self, key: CacheEngineKey) -> bool: ...
    def remove(self, key, force=True): ...
    def get_allocator_backend(self): ...
    def close(self): ...
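The signature pairs synchronous entry points (submit_put_task, get_blocking) with an asyncio event loop passed to the constructor. A common way to bridge the two is `asyncio.run_coroutine_threadsafe`, sketched below under the assumption that the loop runs in a separate thread. `SketchRemoteStore` is a hypothetical in-memory stand-in, not the real backend or connector; it omits serialization and MemoryObj reference counting and only illustrates in-flight put tracking, completion callbacks, and a blocking get with a timeout.

```python
import asyncio
import threading
from concurrent.futures import Future, TimeoutError as FutureTimeout
from typing import Callable, Dict, Optional, Set


class SketchRemoteStore:
    """Hypothetical stand-in for a remote backend; not the LMCache API."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self._loop = loop
        self._data: Dict[str, bytes] = {}
        self._put_tasks: Set[str] = set()  # keys with a put still in flight
        self._lock = threading.Lock()

    async def _put(self, key: str, value: bytes) -> None:
        await asyncio.sleep(0.01)  # simulated network latency
        self._data[key] = value

    async def _get(self, key: str) -> Optional[bytes]:
        await asyncio.sleep(0.01)
        return self._data.get(key)

    def exists_in_put_tasks(self, key: str) -> bool:
        with self._lock:
            return key in self._put_tasks

    def submit_put_task(
        self,
        key: str,
        value: bytes,
        on_complete_callback: Optional[Callable[[str], None]] = None,
    ) -> Future:
        with self._lock:
            self._put_tasks.add(key)
        future = asyncio.run_coroutine_threadsafe(self._put(key, value), self._loop)

        def _done(_future: Future) -> None:
            # Clear the in-flight marker first, then notify the caller.
            with self._lock:
                self._put_tasks.discard(key)
            if on_complete_callback is not None:
                on_complete_callback(key)

        future.add_done_callback(_done)
        return future

    def get_blocking(self, key: str, timeout_secs: float = 1.0) -> Optional[bytes]:
        future = asyncio.run_coroutine_threadsafe(self._get(key), self._loop)
        try:
            return future.result(timeout=timeout_secs)
        except FutureTimeout:
            future.cancel()  # give up on the slow request
            return None


# Run the event loop in a background thread so blocking calls can use it.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

store = SketchRemoteStore(loop)
completed = []
callback_ran = threading.Event()


def on_stored(key: str) -> None:
    completed.append(key)
    callback_ran.set()


store.submit_put_task("k1", b"kv-bytes", on_complete_callback=on_stored)
callback_ran.wait(timeout=1.0)           # wait until the put has completed
still_in_flight = store.exists_in_put_tasks("k1")
value = store.get_blocking("k1")         # key was stored
missing = store.get_blocking("absent")   # key was never stored

loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```

The demo waits on an event set by the callback rather than on the returned future, because a future's waiters can be released slightly before its done-callbacks run.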
Import
from lmcache.v1.storage_backend.remote_backend import RemoteBackend
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | LMCacheEngineConfig | Yes | Engine configuration with remote_url, remote_serde, blocking_timeout_secs, and extra_config |
| metadata | LMCacheMetadata | Yes | LMCache metadata with tensor shapes, dtypes, chunk size, MLA mode, world size, and worker ID |
| loop | asyncio.AbstractEventLoop | Yes | Asyncio event loop for async connector operations |
| local_cpu_backend | Optional[LocalCPUBackend] | No | CPU backend for memory allocation (None when running in scheduler role) |
| dst_device | str | No | Target device string (default: "cuda") |
Outputs
| Name | Type | Description |
|---|---|---|
| RemoteBackend | StorageBackendInterface | A fully initialized remote storage backend with connector, serializer/deserializer, MLA mode handling, metrics, and reconnection support |
Usage Examples
import asyncio

from lmcache.v1.storage_backend.remote_backend import RemoteBackend

# Initialize the remote backend (typically done by the LMCache engine).
# lmcache_config, lmcache_metadata, local_cpu_backend, cache_key, keys,
# memory_obj, and memory_objs are assumed to be created elsewhere.
remote_backend = RemoteBackend(
    config=lmcache_config,
    metadata=lmcache_metadata,
    loop=asyncio.get_event_loop(),
    local_cpu_backend=local_cpu_backend,
    dst_device="cuda:0",
)

# Check whether the key exists remotely
if remote_backend.contains(cache_key):
    # Blocking get with deserialization; the result is Optional, so it may be None
    memory_obj = remote_backend.get_blocking(cache_key)

# Async put with serialization and a completion callback
future = remote_backend.submit_put_task(
    cache_key, memory_obj,
    on_complete_callback=lambda k: print(f"Stored {k}"),
)

# Batched operations
hit_count = remote_backend.batched_contains(keys)
results = remote_backend.batched_get_blocking(keys)
remote_backend.batched_submit_put_task(keys, memory_objs)

# Async non-blocking prefetch (this call must run inside a coroutine)
prefetched = await remote_backend.batched_get_non_blocking("lookup-1", keys)

# Cleanup
remote_backend.close()