Implementation:LMCache LMCache Remote Backend

Knowledge Sources	LMCache
Domains	Caching, Storage Backends, Remote Storage
Last Updated	2026-02-09 00:00 GMT

Overview

RemoteBackend is the high-level storage backend that orchestrates communication with remote KV cache storage services, managing connections, serialization, MLA mode handling, and asynchronous put/get operations through any RemoteConnector implementation.

Description

RemoteBackend implements StorageBackendInterface and serves as the primary bridge between the LMCache engine and remote storage connectors (Redis, Valkey, filesystem, EIC, SageMaker HyperPod). It creates the appropriate RemoteConnector via the CreateConnector factory based on the configured remote_url, and uses a configurable serializer/deserializer pair (CreateSerde) to compress data for remote transfers.

The backend supports MLA (Multi-head Latent Attention) worker ID aliasing. In this mode, workers with a non-zero worker ID transparently rewrite their keys to use worker ID 0, so that all tensor-parallel workers share the same cache entries.

Connection management includes automatic reconnection with a configurable cooldown interval between attempts (10 seconds by default) and a blocking timeout for synchronous operations. Put operations are submitted asynchronously with reference counting and completion callbacks, while get operations block for up to a configurable timeout. Batched variants of contains, get, and put are provided, along with a non-blocking batched get used for prefetching; these delegate to the connector's batched capabilities when available.

The backend is integrated with LMCache's observability system and reports metrics via Prometheus, including remote get/put timing, failed get counts, and in-flight put task counts.
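The MLA worker ID aliasing described above can be illustrated with a minimal sketch. The Key dataclass and the alias_for_mla helper are hypothetical stand-ins invented for this example; the real CacheEngineKey fields and the actual rewrite logic in remote_backend.py may differ.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Key:
    """Hypothetical stand-in for CacheEngineKey; field names are assumptions."""
    model_name: str
    world_size: int
    worker_id: int
    chunk_hash: str


def alias_for_mla(key: Key, use_mla: bool) -> Key:
    """Return a key pointing at worker 0 when MLA aliasing is enabled.

    Keys that already belong to worker 0, or any key when aliasing is
    disabled, are returned unchanged.
    """
    if use_mla and key.worker_id != 0:
        return replace(key, worker_id=0)
    return key


# A key produced by tensor-parallel worker 3 ...
worker3_key = Key("model", 4, 3, "abc")
# ... resolves to the same remote entry as worker 0's key under MLA aliasing.
aliased_key = alias_for_mla(worker3_key, use_mla=True)
```

Because every worker maps to the same key, a chunk stored once can be found by all tensor-parallel ranks, which is the shared-access behavior the description attributes to MLA mode.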

Usage

Use RemoteBackend whenever LMCache needs to store or retrieve KV cache data from any remote storage service. It is automatically instantiated by the LMCache engine when a remote_url is configured. The backend handles all complexity of connector selection, serialization, error recovery, and metrics reporting.
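As an illustration of the error-recovery behavior, the following sketch shows one way a reconnection cooldown could work. ReconnectGate, its constructor parameters, and the injected clock are all hypothetical names chosen for this example; only the idea of a cooldown between reconnect attempts and the 10-second default come from the description.

```python
import time
from typing import Callable, Optional


class ReconnectGate:
    """Hypothetical helper: at most one reconnect attempt per cooldown window."""

    def __init__(
        self,
        connect: Callable[[], object],
        cooldown_secs: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._connect = connect
        self._cooldown = cooldown_secs
        self._clock = clock
        self._last_attempt: Optional[float] = None
        self.connection: Optional[object] = None

    def ensure_connected(self) -> bool:
        """Return True if a connection is available, reconnecting if allowed."""
        if self.connection is not None:
            return True
        now = self._clock()
        if self._last_attempt is not None and now - self._last_attempt < self._cooldown:
            # Still inside the cooldown window: skip the attempt.
            return False
        self._last_attempt = now
        try:
            self.connection = self._connect()
        except ConnectionError:
            self.connection = None
        return self.connection is not None
```

Callers would check ensure_connected() before each remote operation and treat a False result as a cache miss, so a failing remote service does not trigger a reconnect attempt on every request.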

Code Reference

Source Location

Repository: LMCache
File: lmcache/v1/storage_backend/remote_backend.py
Lines: 1-592

Signature

class RemoteBackend(StorageBackendInterface):
    def __init__(
        self,
        config: LMCacheEngineConfig,
        metadata: LMCacheMetadata,
        loop: asyncio.AbstractEventLoop,
        local_cpu_backend: Optional[LocalCPUBackend],
        dst_device: str = "cuda",
    ): ...
    def init_connection(self): ...
    def contains(self, key: CacheEngineKey, pin: bool = False) -> bool: ...
    def batched_contains(self, keys: List[CacheEngineKey], pin: bool = False) -> int: ...
    def exists_in_put_tasks(self, key: CacheEngineKey) -> bool: ...
    def submit_put_task(
        self, key: CacheEngineKey, memory_obj: MemoryObj,
        on_complete_callback: Optional[Callable[[CacheEngineKey], None]] = None,
    ) -> Future: ...
    def batched_submit_put_task(
        self, keys: Sequence[CacheEngineKey], memory_objs: List[MemoryObj],
        transfer_spec: Any = None,
        on_complete_callback: Optional[Callable[[CacheEngineKey], None]] = None,
    ) -> None: ...
    def get_blocking(self, key: CacheEngineKey) -> Optional[MemoryObj]: ...
    def batched_get_blocking(self, keys: List[CacheEngineKey]) -> List[Optional[MemoryObj]]: ...
    async def support_batched_async_contains(self) -> bool: ...
    async def batched_async_contains(self, lookup_id: str, keys: list[CacheEngineKey], pin: bool = False) -> int: ...
    async def support_batched_get_non_blocking(self) -> bool: ...
    async def batched_get_non_blocking(self, lookup_id: str, keys: List[CacheEngineKey], transfer_spec: Any = None) -> List[MemoryObj]: ...
    def pin(self, key: CacheEngineKey) -> bool: ...
    def unpin(self, key: CacheEngineKey) -> bool: ...
    def remove(self, key, force=True): ...
    def get_allocator_backend(self): ...
    def close(self): ...
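
The relationship between submit_put_task, its completion callback, and exists_in_put_tasks can be sketched as follows. This is a simplified model under stated assumptions: PutTracker, the string keys, the in-memory store, and the background-loop setup are hypothetical, and reference counting of memory objects is not modeled.

```python
import asyncio
import threading
from concurrent.futures import Future
from typing import Callable, Dict, Optional, Set


class PutTracker:
    """Hypothetical model of asynchronous puts with in-flight tracking."""

    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self._loop = loop
        self._lock = threading.Lock()
        self._in_flight: Set[str] = set()
        self.store: Dict[str, bytes] = {}  # stand-in for the remote service

    def exists_in_put_tasks(self, key: str) -> bool:
        with self._lock:
            return key in self._in_flight

    async def _put(self, key: str, payload: bytes) -> None:
        await asyncio.sleep(0)  # stand-in for network I/O
        self.store[key] = payload

    def submit_put_task(
        self,
        key: str,
        payload: bytes,
        on_complete_callback: Optional[Callable[[str], None]] = None,
    ) -> Future:
        with self._lock:
            self._in_flight.add(key)
        future = asyncio.run_coroutine_threadsafe(self._put(key, payload), self._loop)

        def _done(_future: Future) -> None:
            # Clear the in-flight marker before notifying the caller.
            with self._lock:
                self._in_flight.discard(key)
            if on_complete_callback is not None:
                on_complete_callback(key)

        future.add_done_callback(_done)
        return future


# Run the event loop in a background thread, as a caller-side assumption.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

tracker = PutTracker(loop)
completed = []
finished = threading.Event()


def on_done(key: str) -> None:
    completed.append(key)
    finished.set()


put_future = tracker.submit_put_task("chunk-0", b"kv-bytes", on_done)
finished.wait(timeout=2.0)
```

Tracking in-flight keys lets a caller avoid submitting a duplicate put for a chunk that is already being written.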

Import

from lmcache.v1.storage_backend.remote_backend import RemoteBackend

I/O Contract

Inputs

Name	Type	Required	Description
config	LMCacheEngineConfig	Yes	Engine configuration with remote_url, remote_serde, blocking_timeout_secs, and extra_config
metadata	LMCacheMetadata	Yes	LMCache metadata with tensor shapes, dtypes, chunk size, MLA mode, world size, and worker ID
loop	asyncio.AbstractEventLoop	Yes	Asyncio event loop for async connector operations
local_cpu_backend	Optional[LocalCPUBackend]	No	CPU backend for memory allocation (None when running in scheduler role)
dst_device	str	No	Target device string (default: "cuda")
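
To illustrate what a serializer/deserializer pair selected by remote_serde does, here is a minimal round-trip sketch. It uses zlib from the Python standard library as a stand-in compression scheme; create_serde, ZlibSerializer, ZlibDeserializer, and the "zlib" name are hypothetical and do not reflect the serde options that CreateSerde actually accepts.

```python
import zlib
from typing import Tuple


class ZlibSerializer:
    """Hypothetical serializer: compresses raw bytes before transfer."""

    def __init__(self, level: int = 6) -> None:
        self._level = level

    def serialize(self, payload: bytes) -> bytes:
        return zlib.compress(payload, self._level)


class ZlibDeserializer:
    """Hypothetical deserializer: restores the original bytes after transfer."""

    def deserialize(self, blob: bytes) -> bytes:
        return zlib.decompress(blob)


def create_serde(name: str) -> Tuple[ZlibSerializer, ZlibDeserializer]:
    """Hypothetical factory keyed by a serde name (not the real CreateSerde)."""
    if name == "zlib":
        return ZlibSerializer(), ZlibDeserializer()
    raise ValueError(f"unknown serde: {name}")


payload = bytes(4096)  # highly compressible placeholder for KV cache bytes
serializer, deserializer = create_serde("zlib")
blob = serializer.serialize(payload)          # sent to remote storage
restored = deserializer.deserialize(blob)     # recovered on a later get
```

The put path would apply the serializer before handing bytes to the connector, and the get path would apply the deserializer to whatever the connector returns.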

Outputs

Name	Type	Description
RemoteBackend	StorageBackendInterface	A fully initialized remote storage backend with connector, serializer/deserializer, MLA mode handling, metrics, and reconnection support

Usage Examples

import asyncio

from lmcache.v1.storage_backend.remote_backend import RemoteBackend

# Initialize remote backend (typically done by the LMCache engine).
# lmcache_config, lmcache_metadata, local_cpu_backend, cache_key, keys,
# memory_obj, and memory_objs are assumed to be created elsewhere.
remote_backend = RemoteBackend(
    config=lmcache_config,
    metadata=lmcache_metadata,
    loop=asyncio.get_event_loop(),
    local_cpu_backend=local_cpu_backend,
    dst_device="cuda:0",
)

# Check if key exists remotely
if remote_backend.contains(cache_key):
    # Blocking get with deserialization; the result is Optional and may be None
    fetched_obj = remote_backend.get_blocking(cache_key)

# Async put with serialization and callback
future = remote_backend.submit_put_task(
    cache_key, memory_obj,
    on_complete_callback=lambda k: print(f"Stored {k}"),
)

# Batched operations
hit_count = remote_backend.batched_contains(keys)
results = remote_backend.batched_get_blocking(keys)
remote_backend.batched_submit_put_task(keys, memory_objs)

# Async non-blocking prefetch (must be awaited inside a coroutine)
async def prefetch():
    return await remote_backend.batched_get_non_blocking("lookup-1", keys)

# Cleanup
remote_backend.close()
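
The blocking get with a timeout can be understood through the following sketch, which bridges synchronous code to a coroutine running on an event loop in another thread. It assumes the loop runs in a background thread; fake_remote_get, the delays, and the timeout values are hypothetical and stand in for a real connector call and the configured blocking timeout.

```python
import asyncio
import threading
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Optional


def start_background_loop() -> asyncio.AbstractEventLoop:
    """Start an event loop in a daemon thread and return it."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


async def fake_remote_get(key: str, delay: float) -> bytes:
    """Hypothetical connector call: waits, then returns bytes for the key."""
    await asyncio.sleep(delay)
    return f"value-for-{key}".encode()


def get_blocking(
    loop: asyncio.AbstractEventLoop, key: str, delay: float, timeout_secs: float
) -> Optional[bytes]:
    """Run the coroutine on the loop and wait up to timeout_secs for it."""
    future = asyncio.run_coroutine_threadsafe(fake_remote_get(key, delay), loop)
    try:
        return future.result(timeout=timeout_secs)
    except FutureTimeout:
        future.cancel()  # give up on the slow request
        return None      # treated as a cache miss by the caller


bg_loop = start_background_loop()
fast_result = get_blocking(bg_loop, "k1", delay=0.01, timeout_secs=1.0)
slow_result = get_blocking(bg_loop, "k2", delay=1.0, timeout_secs=0.05)
```

Returning None on timeout keeps a slow remote service from stalling the caller, at the cost of recomputing the missed chunk.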
