
Implementation:LMCache LMCache Remote Backend

From Leeroopedia
Revision as of 15:25, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/LMCache_LMCache_Remote_Backend.md)


Knowledge Sources
Domains: Caching, Storage Backends, Remote Storage
Last Updated: 2026-02-09 00:00 GMT

Overview

RemoteBackend is the high-level storage backend that orchestrates communication with remote KV cache storage services, managing connections, serialization, MLA mode handling, and asynchronous put/get operations through any RemoteConnector implementation.

Description

RemoteBackend implements StorageBackendInterface and is the primary bridge between the LMCache engine and remote storage connectors (Redis, Valkey, filesystem, EIC, SageMaker HyperPod). It creates the appropriate RemoteConnector through the CreateConnector factory, based on the configured remote_url, and serializes and deserializes data with a configurable serde (CreateSerde), which allows compression during remote transfers.

The backend supports an MLA (Multi-head Latent Attention) worker ID aliasing mode, in which non-zero workers transparently rewrite their keys to use worker ID 0 so that tensor-parallel workers share the same cache entries. Connection management includes automatic reconnection with a configurable cooldown interval (10 seconds by default) and a blocking timeout for synchronous operations.

Put operations are submitted asynchronously with reference counting and completion callbacks, while get operations block with a configurable timeout. The backend also provides batched variants of contains, get, and put, plus a non-blocking batched prefetch, and delegates to the connector's batched capabilities when available. It is integrated with LMCache's observability system and reports metrics through Prometheus, including remote get/put timing, failed get counts, and in-flight put task counts.
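The worker ID aliasing described above can be illustrated with a small, self-contained sketch. It does not use LMCache's actual CacheEngineKey: `SketchKey`, its fields, and `alias_key_for_mla` are hypothetical names introduced only to show the idea of rewriting a non-zero worker's key to worker 0.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchKey:
    """Hypothetical stand-in for CacheEngineKey (the real class differs)."""
    model_name: str
    world_size: int
    worker_id: int
    chunk_hash: str


def alias_key_for_mla(key: SketchKey, use_mla: bool) -> SketchKey:
    """Return a key that points at worker 0 when MLA aliasing is enabled."""
    if use_mla and key.worker_id != 0:
        # Only the worker ID changes; all other fields are preserved.
        return replace(key, worker_id=0)
    return key


worker0_key = SketchKey("demo-model", 2, 0, "abc123")
worker1_key = SketchKey("demo-model", 2, 1, "abc123")

# With aliasing on, both workers resolve to the same remote entry.
assert alias_key_for_mla(worker1_key, use_mla=True) == worker0_key
# With aliasing off, each worker keeps its own key.
assert alias_key_for_mla(worker1_key, use_mla=False) == worker1_key
```

Because the rewritten key is equal to worker 0's key, a lookup by any tensor-parallel worker would hit the same stored entry in this sketch.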

Usage

Use RemoteBackend whenever LMCache needs to store or retrieve KV cache data from any remote storage service. It is automatically instantiated by the LMCache engine when a remote_url is configured. The backend handles all complexity of connector selection, serialization, error recovery, and metrics reporting.
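As an illustration of the error recovery mentioned here, the following sketch shows one way a reconnection cooldown can work: after a failed attempt, further attempts are skipped until the interval has elapsed. `ReconnectGate`, `ensure_connected`, and the injectable `clock` are hypothetical names for this example and are not taken from the LMCache source; only the 10-second default comes from the Description.

```python
import time
from typing import Callable, Optional


class ReconnectGate:
    """Hypothetical helper: rate-limits reconnection attempts with a cooldown."""

    def __init__(
        self,
        connect: Callable[[], object],
        min_interval_secs: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._connect = connect
        self._min_interval = min_interval_secs
        self._clock = clock
        self._last_attempt: Optional[float] = None
        self.connection: Optional[object] = None

    def ensure_connected(self) -> bool:
        """Return True if connected, trying to reconnect at most once per interval."""
        if self.connection is not None:
            return True
        now = self._clock()
        if (
            self._last_attempt is not None
            and now - self._last_attempt < self._min_interval
        ):
            return False  # Still cooling down; skip this attempt.
        self._last_attempt = now
        try:
            self.connection = self._connect()
        except Exception:
            self.connection = None
        return self.connection is not None
```

A caller would check `ensure_connected()` before each remote operation and treat a `False` result as a cache miss, so a failing remote service is not retried on every request.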

Code Reference

Source Location

Signature

class RemoteBackend(StorageBackendInterface):
    def __init__(
        self,
        config: LMCacheEngineConfig,
        metadata: LMCacheMetadata,
        loop: asyncio.AbstractEventLoop,
        local_cpu_backend: Optional[LocalCPUBackend],
        dst_device: str = "cuda",
    ): ...
    def init_connection(self): ...
    def contains(self, key: CacheEngineKey, pin: bool = False) -> bool: ...
    def batched_contains(self, keys: List[CacheEngineKey], pin: bool = False) -> int: ...
    def exists_in_put_tasks(self, key: CacheEngineKey) -> bool: ...
    def submit_put_task(
        self, key: CacheEngineKey, memory_obj: MemoryObj,
        on_complete_callback: Optional[Callable[[CacheEngineKey], None]] = None,
    ) -> Future: ...
    def batched_submit_put_task(
        self, keys: Sequence[CacheEngineKey], memory_objs: List[MemoryObj],
        transfer_spec: Any = None,
        on_complete_callback: Optional[Callable[[CacheEngineKey], None]] = None,
    ) -> None: ...
    def get_blocking(self, key: CacheEngineKey) -> Optional[MemoryObj]: ...
    def batched_get_blocking(self, keys: List[CacheEngineKey]) -> List[Optional[MemoryObj]]: ...
    async def support_batched_async_contains(self) -> bool: ...
    async def batched_async_contains(self, lookup_id: str, keys: list[CacheEngineKey], pin: bool = False) -> int: ...
    async def support_batched_get_non_blocking(self) -> bool: ...
    async def batched_get_non_blocking(self, lookup_id: str, keys: List[CacheEngineKey], transfer_spec: Any = None) -> List[MemoryObj]: ...
    def pin(self, key: CacheEngineKey) -> bool: ...
    def unpin(self, key: CacheEngineKey) -> bool: ...
    def remove(self, key, force=True): ...
    def get_allocator_backend(self): ...
    def close(self): ...
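
The pairing of submit_put_task (which returns a Future and accepts on_complete_callback) with exists_in_put_tasks suggests that in-flight puts are tracked until they finish. The sketch below shows that pattern in isolation, assuming a thread pool and an in-memory dict in place of a real connector; `PutTracker` and its internals are hypothetical and do not reproduce LMCache's implementation.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, Optional, Set


class PutTracker:
    """Hypothetical sketch: async puts with in-flight tracking and a callback."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self._store = store  # Stand-in for the remote service.
        self._lock = threading.Lock()
        self._in_flight: Set[str] = set()
        self._pool = ThreadPoolExecutor(max_workers=1)

    def exists_in_put_tasks(self, key: str) -> bool:
        with self._lock:
            return key in self._in_flight

    def submit_put_task(
        self,
        key: str,
        payload: bytes,
        on_complete_callback: Optional[Callable[[str], None]] = None,
    ) -> Future:
        with self._lock:
            self._in_flight.add(key)

        def _run() -> None:
            try:
                self._store[key] = payload
            finally:
                # Always clear the in-flight marker, even if the put fails.
                with self._lock:
                    self._in_flight.discard(key)
            if on_complete_callback is not None:
                on_complete_callback(key)

        return self._pool.submit(_run)

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

In this sketch the callback runs only after a successful store, and the returned Future completes after the callback, so waiting on the Future is enough to observe both.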

Import

from lmcache.v1.storage_backend.remote_backend import RemoteBackend

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| config | LMCacheEngineConfig | Yes | Engine configuration with remote_url, remote_serde, blocking_timeout_secs, and extra_config |
| metadata | LMCacheMetadata | Yes | LMCache metadata with tensor shapes, dtypes, chunk size, MLA mode, world size, and worker ID |
| loop | asyncio.AbstractEventLoop | Yes | Asyncio event loop for async connector operations |
| local_cpu_backend | Optional[LocalCPUBackend] | Yes (may be None) | CPU backend for memory allocation (None when running in scheduler role) |
| dst_device | str | No | Target device string (default: "cuda") |

Outputs

| Name | Type | Description |
|---|---|---|
| RemoteBackend | StorageBackendInterface | A fully initialized remote storage backend with connector, serializer/deserializer, MLA mode handling, metrics, and reconnection support |
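
The `loop` input and the `blocking_timeout_secs` setting together suggest how a synchronous get can wait on an asynchronous connector. The following is a minimal sketch of that general pattern using only the standard library. It assumes the event loop runs in a background thread; `fake_remote_get`, `get_with_timeout`, and `start_background_loop` are hypothetical helpers, not LMCache functions.

```python
import asyncio
import concurrent.futures
import threading
from typing import Optional


def start_background_loop() -> asyncio.AbstractEventLoop:
    """Run a fresh event loop in a daemon thread and return it."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


async def fake_remote_get(key: str, delay: float) -> bytes:
    """Hypothetical connector call; sleeps to simulate network latency."""
    await asyncio.sleep(delay)
    return f"value-for-{key}".encode()


def get_with_timeout(
    loop: asyncio.AbstractEventLoop, key: str, delay: float, timeout_secs: float
) -> Optional[bytes]:
    """Block the calling thread until the coroutine finishes or times out."""
    future = asyncio.run_coroutine_threadsafe(fake_remote_get(key, delay), loop)
    try:
        return future.result(timeout=timeout_secs)
    except concurrent.futures.TimeoutError:
        future.cancel()  # Ask the loop to cancel the pending coroutine.
        return None


loop = start_background_loop()
fast = get_with_timeout(loop, "chunk-0", delay=0.0, timeout_secs=1.0)
slow = get_with_timeout(loop, "chunk-0", delay=0.5, timeout_secs=0.05)
print(fast, slow)
loop.call_soon_threadsafe(loop.stop)
```

Returning `None` on timeout mirrors the `Optional[MemoryObj]` return type of get_blocking, so a slow remote service degrades to a cache miss rather than stalling the caller indefinitely.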

Usage Examples

import asyncio

from lmcache.v1.storage_backend.remote_backend import RemoteBackend

# Initialize remote backend (typically done by the LMCache engine)
remote_backend = RemoteBackend(
    config=lmcache_config,
    metadata=lmcache_metadata,
    loop=asyncio.get_event_loop(),
    local_cpu_backend=local_cpu_backend,
    dst_device="cuda:0",
)

# Check if key exists remotely
if remote_backend.contains(cache_key):
    # Blocking get with deserialization
    memory_obj = remote_backend.get_blocking(cache_key)

# Async put with serialization and callback
future = remote_backend.submit_put_task(
    cache_key, memory_obj,
    on_complete_callback=lambda k: print(f"Stored {k}"),
)

# Batched operations
hit_count = remote_backend.batched_contains(keys)
results = remote_backend.batched_get_blocking(keys)
remote_backend.batched_submit_put_task(keys, memory_objs)

# Async non-blocking prefetch (await is only valid inside an async function)
prefetched = await remote_backend.batched_get_non_blocking("lookup-1", keys)

# Cleanup
remote_backend.close()
