Implementation:LMCache LMCache V1 Protocol

From Leeroopedia


Knowledge Sources
Domains: Network Protocol, Serialization
Last Updated: 2026-02-09 00:00 GMT

Overview

This module defines the wire protocol for TCP-based communication between LMCache clients and the standalone remote server, including command types, dtype mappings, and serializable message structures.

Description

The protocol is built around two message types: ClientMetaMessage (requests from workers) and ServerMetaMessage (responses from the server). Both are serialized to binary with struct.pack and struct.unpack using fixed-size formats, so the byte size of each message header is known in advance and is reported by packlength(). ClientCommand enumerates the supported operations (PUT, GET, EXIST, LIST, HEALTH), and ServerReturnCode indicates success (200) or failure (400).

The module also defines RemoteMetadata, which serializes KV cache metadata for multiple layer groups; its struct format string depends on the number of layer groups and is set up through init_remote_metadata_info. Bidirectional mappings between PyTorch dtypes and integers (DTYPE_TO_INT/INT_TO_DTYPE) and between storage locations and integers (LOCATION_TO_INT/INT_TO_LOCATION) enable compact binary encoding. Key strings are padded to a fixed length of MAX_KEY_LENGTH (150 bytes).
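To make the fixed-size idea concrete, here is a minimal sketch of packing and unpacking a header whose key is padded to MAX_KEY_LENGTH. The field order, the network byte order, the space padding, and the names HEADER_FMT, Command, pack_header, and unpack_header are assumptions made for illustration; they are not taken from the LMCache source.

```python
import struct
from enum import IntEnum, auto

MAX_KEY_LENGTH = 150  # value stated in the description

# Hypothetical layout: command, payload length, format id, dtype id,
# four shape dimensions, then the padded key. "!" disables alignment padding.
HEADER_FMT = f"!iiiiiiii{MAX_KEY_LENGTH}s"
HEADER_SIZE = struct.calcsize(HEADER_FMT)


class Command(IntEnum):  # stand-in for ClientCommand
    PUT = auto()
    GET = auto()


def pack_header(command, length, fmt_id, dtype_id, shape, key):
    """Encode one request header into exactly HEADER_SIZE bytes."""
    key_bytes = key.encode("utf-8")
    if len(key_bytes) > MAX_KEY_LENGTH:
        raise ValueError("key exceeds MAX_KEY_LENGTH")
    # Pad with spaces on the right so every header has the same size.
    padded = key_bytes.ljust(MAX_KEY_LENGTH)
    return struct.pack(
        HEADER_FMT, int(command), length, fmt_id, dtype_id, *shape, padded
    )


def unpack_header(buf):
    """Decode a header produced by pack_header."""
    command, length, fmt_id, dtype_id, d0, d1, d2, d3, padded = struct.unpack(
        HEADER_FMT, buf
    )
    key = padded.decode("utf-8").strip()  # drop the padding
    return Command(command), length, fmt_id, dtype_id, (d0, d1, d2, d3), key
```

Because the header size is constant, a receiver can read that many bytes before it knows anything else about the message. One consequence of space padding in this sketch is that keys ending in whitespace would not round-trip.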

Usage

Use this module when implementing or extending the TCP-based remote storage backend for LMCache. Both the standalone server and the remote client use these message types for all communication.
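When extending a TCP backend of this kind, the receiving side typically reads a fixed-size header first and then the payload whose length the header announces. The sketch below shows that pattern with the standard library only. The two-field header RESP_FMT and the helpers recv_exact and read_response are hypothetical and much simpler than the real message types.

```python
import socket
import struct

# Hypothetical 8-byte response header: status code and payload length.
RESP_FMT = "!ii"
RESP_SIZE = struct.calcsize(RESP_FMT)


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() call may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_response(sock):
    """Read one header, then the payload it describes."""
    code, length = struct.unpack(RESP_FMT, recv_exact(sock, RESP_SIZE))
    payload = recv_exact(sock, length) if length else b""
    return code, payload
```

With the real classes, the header size would presumably come from packlength() rather than from a local struct.calcsize call.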

Code Reference

Source Location

Signature

class ClientCommand(IntEnum):
    PUT = auto()
    GET = auto()
    EXIST = auto()
    LIST = auto()
    HEALTH = auto()

class ServerReturnCode(IntEnum):
    SUCCESS = 200
    FAIL = 400

@dataclass
class RemoteMetadata:
    length: int
    shapes: list[torch.Size]
    dtypes: list[torch.dtype]
    fmt: MemoryFormat

    def serialize(self) -> bytes: ...
    def serialize_into(self, buffer): ...
    @staticmethod
    def deserialize(s: bytes) -> "RemoteMetadata": ...

@dataclass
class ClientMetaMessage:
    command: ClientCommand
    key: Union[CacheEngineKey, LayerCacheEngineKey]
    length: int
    fmt: MemoryFormat
    dtype: Optional[torch.dtype]
    shape: torch.Size
    location: Optional[str] = None

    def serialize(self) -> bytes: ...
    @staticmethod
    def deserialize(s: bytes) -> "ClientMetaMessage": ...
    @staticmethod
    def packlength() -> int: ...

@dataclass
class ServerMetaMessage:
    code: ServerReturnCode
    length: int
    fmt: MemoryFormat
    dtype: Optional[torch.dtype]
    shape: torch.Size
    location: Optional[str] = None

    def serialize(self) -> bytes: ...
    @staticmethod
    def deserialize(s: bytes) -> "ServerMetaMessage": ...
    @staticmethod
    def packlength() -> int: ...

def init_remote_metadata_info(num_groups: int): ...
def get_remote_metadata_bytes() -> int: ...
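The signatures above suggest that the metadata layout is sized once from the number of layer groups. A rough sketch of how a struct format string could grow with num_groups follows. The layout (length, format id, then a dtype id and four dimensions per group) and the names build_metadata_format, pack_metadata, and unpack_metadata are assumptions, not the module's actual implementation.

```python
import struct


def build_metadata_format(num_groups):
    """Hypothetical layout: length, format id, then 5 ints per group."""
    if num_groups < 1:
        raise ValueError("num_groups must be positive")
    return "!ii" + "i" * (5 * num_groups)


def pack_metadata(length, fmt_id, groups):
    """groups is a list of (dtype_id, (d0, d1, d2, d3)) tuples."""
    flat = []
    for dtype_id, shape in groups:
        flat.append(dtype_id)
        flat.extend(shape)
    return struct.pack(build_metadata_format(len(groups)), length, fmt_id, *flat)


def unpack_metadata(buf, num_groups):
    """Inverse of pack_metadata; the caller must know num_groups."""
    values = struct.unpack(build_metadata_format(num_groups), buf)
    length, fmt_id, rest = values[0], values[1], values[2:]
    groups = [
        (rest[i], tuple(rest[i + 1 : i + 5])) for i in range(0, len(rest), 5)
    ]
    return length, fmt_id, groups
```

Since the byte count depends only on num_groups, both ends of the connection need to agree on that number before exchanging metadata.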

Import

from lmcache.v1.protocol import (
    ClientCommand,
    ServerReturnCode,
    ClientMetaMessage,
    ServerMetaMessage,
    RemoteMetadata,
    init_remote_metadata_info,
    get_remote_metadata_bytes,
    DTYPE_TO_INT,
    INT_TO_DTYPE,
)

I/O Contract

Inputs

Name | Type | Required | Description
command | ClientCommand | Yes | The operation to perform (PUT, GET, EXIST, LIST, HEALTH)
key | Union[CacheEngineKey, LayerCacheEngineKey] | Yes | Cache key identifying the target entry
length | int | Yes | Length of the data payload in bytes
fmt | MemoryFormat | Yes | Memory format of the cached data
dtype | Optional[torch.dtype] | Yes | Data type of the tensor (mapped to int for serialization)
shape | torch.Size | Yes | 4-dimensional shape of the cached tensor
location | Optional[str] | No | Storage backend location identifier
num_groups | int | Yes | Number of KV layer groups (for init_remote_metadata_info)
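Since dtype is optional but must still occupy an integer slot on the wire, a bidirectional table needs a code for the missing case. The sketch below uses plain strings in place of torch dtypes so it runs without PyTorch. The chosen dtypes, their integer codes, and the use of 0 for None are assumptions and may differ from DTYPE_TO_INT and INT_TO_DTYPE.

```python
# String names stand in for torch dtypes so the sketch runs without PyTorch.
DTYPE_NAMES = ["float16", "bfloat16", "float32"]  # hypothetical subset and order

# Assumption: 0 is reserved for "no dtype" so an Optional field still packs as an int.
NAME_TO_INT = {None: 0}
NAME_TO_INT.update({name: code for code, name in enumerate(DTYPE_NAMES, start=1)})

# The reverse table is derived from the forward one so the two cannot drift apart.
INT_TO_NAME = {code: name for name, code in NAME_TO_INT.items()}


def encode_dtype(name):
    """Map a dtype name (or None) to its integer code."""
    if name not in NAME_TO_INT:
        raise ValueError(f"unsupported dtype: {name!r}")
    return NAME_TO_INT[name]


def decode_dtype(code):
    """Map an integer code back to a dtype name (or None)."""
    if code not in INT_TO_NAME:
        raise ValueError(f"unknown dtype code: {code}")
    return INT_TO_NAME[code]
```

The same pattern would apply to the location field, which is also optional.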

Outputs

Name | Type | Description
serialize | bytes | Binary-encoded message ready for TCP transmission
deserialize | ClientMetaMessage or ServerMetaMessage | Deserialized message from raw bytes
packlength | int | Fixed byte size of the serialized message header
code | ServerReturnCode | SUCCESS (200) or FAIL (400) status
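To illustrate how the command and return-code enums might fit together on the server side, here is a small dispatch sketch against an in-memory dict. The two enums mirror the signatures listed earlier. The handle function and the semantics it gives each command (for example, FAIL for a missing key) are assumptions for illustration, not the standalone server's documented behavior.

```python
from enum import IntEnum, auto


class ClientCommand(IntEnum):
    PUT = auto()
    GET = auto()
    EXIST = auto()
    LIST = auto()
    HEALTH = auto()


class ServerReturnCode(IntEnum):
    SUCCESS = 200
    FAIL = 400


def handle(store, command, key="", payload=b""):
    """Hypothetical handler: return (code, payload) for one request."""
    if command == ClientCommand.PUT:
        store[key] = payload
        return ServerReturnCode.SUCCESS, b""
    if command == ClientCommand.GET:
        if key in store:
            return ServerReturnCode.SUCCESS, store[key]
        return ServerReturnCode.FAIL, b""
    if command == ClientCommand.EXIST:
        code = ServerReturnCode.SUCCESS if key in store else ServerReturnCode.FAIL
        return code, b""
    if command == ClientCommand.LIST:
        # Newline-joined keys are an arbitrary choice for this sketch.
        return ServerReturnCode.SUCCESS, "\n".join(sorted(store)).encode("utf-8")
    if command == ClientCommand.HEALTH:
        return ServerReturnCode.SUCCESS, b""
    return ServerReturnCode.FAIL, b""
```

A client would then compare the returned code against ServerReturnCode.SUCCESS before reading the payload.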

Usage Examples

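from lmcache.v1.protocol import (
    ClientCommand,
    ClientMetaMessage,
    ServerMetaMessage,
    ServerReturnCode,
)
from lmcache.v1.memory_management import MemoryFormat
import torch

# Create and serialize a client PUT request.
# cache_key is assumed to be an existing CacheEngineKey (or LayerCacheEngineKey).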
msg = ClientMetaMessage(
    command=ClientCommand.PUT,
    key=cache_key,
    length=4096,
    fmt=MemoryFormat.KV_BLOB,
    dtype=torch.float16,
    shape=torch.Size([2, 32, 256, 128]),
)
data = msg.serialize()

# Deserialize a server response.
# raw_bytes is assumed to hold a response header already read from the socket.
response = ServerMetaMessage.deserialize(raw_bytes)
if response.code == ServerReturnCode.SUCCESS:
    print(f"Received {response.length} bytes")
