Implementation:LMCache LMCache SGLang Adapter
| Knowledge Sources | |
|---|---|
| Domains | SGLang Integration, KV Cache Management |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
This module provides adapter classes that integrate the LMCache engine with the SGLang inference framework for KV cache loading, storing, and layer-wise retrieval.
Description
The sglang_adapter.py module defines the connector classes (LMCacheConnector and LMCacheLayerwiseConnector) that bridge LMCache's caching engine with SGLang's runtime. The module initializes the LMCache engine using SGLang's model configuration, constructs the appropriate KV shape metadata, and provides worker-side APIs for loading and storing KV cache data. The layerwise variant supports incremental per-layer retrieval and storage with tensor-parallel synchronization.
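The tensor-parallel synchronization mentioned above can be illustrated without GPUs or a process group. The sketch below assumes, based only on the name and signature of global_min_tokens, that each rank reports how many tokens it retrieved locally and that all ranks then agree on the smallest count. The helper simulate_global_min_tokens is hypothetical: it replaces the collective operation with a plain Python min, where a real process group would presumably use an all-reduce with a MIN reduction.

```python
from typing import List


def simulate_global_min_tokens(per_rank_tokens: List[int]) -> int:
    """Hypothetical stand-in for a MIN all-reduce across tensor-parallel ranks.

    Each entry is the number of tokens one rank retrieved from its local cache.
    """
    if not per_rank_tokens:
        raise ValueError("at least one rank is required")
    return min(per_rank_tokens)


# Four simulated ranks hit the cache for different prefix lengths.
local_hits = [512, 512, 256, 512]
agreed = simulate_global_min_tokens(local_hits)
print(agreed)  # 256: every rank proceeds with the shortest common count
```

Taking the minimum is the conservative choice here: no rank ends up relying on KV entries that another rank failed to retrieve.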
Usage
Import and instantiate these connectors within an SGLang worker process to enable KV cache sharing through LMCache. The LMCacheConnector provides bulk load/store, while LMCacheLayerwiseConnector supports per-layer streaming retrieval for pipelined execution.
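The per-layer streaming pattern can be sketched with a stub so that it runs without LMCache or SGLang installed. FakeLayerwiseConnector and run_forward are hypothetical; only the method names start_load_kv and load_kv_layerwise come from the signature listed on this page. The call order shown (start once, then one call per layer) is an assumption drawn from those names, not a confirmed description of the real connector.

```python
from typing import List


class FakeLayerwiseConnector:
    """Hypothetical stub that mimics two method names from the signature.

    It records calls instead of moving KV data, so it runs without LMCache.
    """

    def __init__(self, cached_tokens: int) -> None:
        self.cached_tokens = cached_tokens
        self.calls: List[str] = []

    def start_load_kv(self, token_ids: List[int]) -> int:
        # The real method takes a LoadMetadata; a plain list keeps the stub small.
        self.calls.append("start_load_kv")
        return min(self.cached_tokens, len(token_ids))

    def load_kv_layerwise(self, layer_id: int) -> None:
        self.calls.append(f"load_kv_layerwise({layer_id})")


def run_forward(
    connector: FakeLayerwiseConnector, token_ids: List[int], num_layers: int
) -> int:
    # Assumed ordering: begin retrieval once, then fetch each layer before it runs.
    num_hit = connector.start_load_kv(token_ids)
    for layer_id in range(num_layers):
        connector.load_kv_layerwise(layer_id)
        # ... attention and MLP computation for this layer would go here ...
    return num_hit


conn = FakeLayerwiseConnector(cached_tokens=3)
hits = run_forward(conn, token_ids=[1, 2, 3, 4, 5], num_layers=2)
print(hits, conn.calls)
```

Interleaving retrieval with computation in this way is what would let the transfer for one layer overlap with the compute of another in a pipelined setup.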
Code Reference
Source Location
- Repository: LMCache
- File: lmcache/integration/sglang/sglang_adapter.py
- Lines: 1-325
Signature
@dataclass
class StoreMetadata:
    last_node: Any
    token_ids: List[int]
    kv_indices: torch.Tensor
    offset: int

@dataclass
class LoadMetadata:
    token_ids: List[int]
    slot_mapping: torch.Tensor
    offset: int

def init_lmcache_engine(
    model_config: ModelConfig, tp_size: int, local_rank: int,
    global_rank: int, kv_dtype: torch.dtype,
) -> LMCacheEngine: ...

class LMCacheConnector:
    def __init__(
        self, sgl_config: ModelConfig, tp_size: int, rank: int,
        k_pool: List[torch.Tensor], v_pool: List[torch.Tensor],
    ): ...
    def load_kv(self, load_metadata: LoadMetadata) -> int: ...
    def store_kv(self, store_metadata: StoreMetadata) -> None: ...
    def get_kv_events(self) -> Iterable[CacheStoreEvent]: ...
    def chunk_size(self): ...
    def reset(self): ...
    def close(self): ...

class LMCacheLayerwiseConnector(LMCacheConnector):
    def __init__(
        self, sgl_config: ModelConfig, tp_size: int, rank: int,
        k_pool: List[torch.Tensor], v_pool: List[torch.Tensor],
        tp_group: Optional[torch.distributed.ProcessGroup] = None,
    ): ...
    def global_min_tokens(
        self, local_tokens: int, tp_group: dist.ProcessGroup,
        device: torch.device,
    ): ...
    def load_kv_layerwise(self, layer_id: int) -> None: ...
    def start_load_kv(self, load_metadata: LoadMetadata) -> int: ...
    def store_kv(self, store_metadata: StoreMetadata) -> None: ...
Import
from lmcache.integration.sglang.sglang_adapter import (
    LMCacheConnector,
    LMCacheLayerwiseConnector,
    StoreMetadata,
    LoadMetadata,
    init_lmcache_engine,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| sgl_config | ModelConfig | Yes | SGLang model configuration containing layer count, head dimensions, etc. |
| tp_size | int | Yes | Tensor parallel size |
| rank | int | Yes | Global tensor parallel rank |
| k_pool | List[torch.Tensor] | Yes | Key cache tensor pool from SGLang |
| v_pool | List[torch.Tensor] | Yes | Value cache tensor pool from SGLang |
| tp_group | ProcessGroup | No | Torch distributed process group for tensor parallel synchronization (layerwise only) |
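To make the role of sgl_config and tp_size more concrete, the following sketch derives a per-rank KV shape tuple from the quantities a model configuration typically carries. Everything here is an assumption for illustration: the function sketch_kv_shape, its parameter names, the (layers, 2, chunk, heads, head_dim) layout, and the even split of KV heads across ranks are hypothetical and are not taken from sglang_adapter.py.

```python
from typing import Tuple


def sketch_kv_shape(
    num_layers: int,
    total_kv_heads: int,
    head_dim: int,
    tp_size: int,
    chunk_size: int,
) -> Tuple[int, int, int, int, int]:
    """Hypothetical per-rank KV shape: (layers, K and V, tokens, heads, head_dim)."""
    if tp_size < 1:
        raise ValueError("tp_size must be at least 1")
    # Assumption: KV heads are divided evenly across tensor-parallel ranks,
    # with at least one head kept on every rank.
    heads_per_rank = max(1, total_kv_heads // tp_size)
    # The literal 2 stands for the key tensor and the value tensor.
    return (num_layers, 2, chunk_size, heads_per_rank, head_dim)


shape = sketch_kv_shape(
    num_layers=32, total_kv_heads=8, head_dim=128, tp_size=2, chunk_size=256
)
print(shape)  # (32, 2, 256, 4, 128)
```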
Outputs
| Name | Type | Description |
|---|---|---|
| num_retrieved_tokens | int | Number of tokens successfully retrieved from cache in load_kv / start_load_kv |
| CacheStoreEvent | Iterable[CacheStoreEvent] | Events generated during KV cache store operations, returned by get_kv_events |
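One way a caller might use the returned token count is to skip recomputation for the part of the request that was served from cache. The sketch below assumes the count describes a contiguous prefix of token_ids, which this page does not state explicitly. The helper split_by_cache_hit is hypothetical and ignores the offset field.

```python
from typing import List, Tuple


def split_by_cache_hit(
    token_ids: List[int], num_retrieved_tokens: int
) -> Tuple[List[int], List[int]]:
    """Split a request into (tokens served from cache, tokens still to compute).

    Assumption: the retrieved tokens form a prefix of token_ids.
    """
    if not 0 <= num_retrieved_tokens <= len(token_ids):
        raise ValueError("num_retrieved_tokens is out of range")
    return token_ids[:num_retrieved_tokens], token_ids[num_retrieved_tokens:]


cached, remaining = split_by_cache_hit([1, 2, 3, 4, 5], num_retrieved_tokens=3)
print(cached, remaining)  # [1, 2, 3] [4, 5]
```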
Usage Examples
from lmcache.integration.sglang.sglang_adapter import (
    LMCacheConnector, LoadMetadata, StoreMetadata,
)

# model_config, k_pool_tensors, v_pool_tensors, slot_tensor, and index_tensor
# are placeholders for objects supplied by the SGLang runtime; they are not
# defined in this snippet.

# Initialize connector with SGLang model config and KV pools
connector = LMCacheConnector(
    sgl_config=model_config,
    tp_size=1,
    rank=0,
    k_pool=k_pool_tensors,
    v_pool=v_pool_tensors,
)

# Load KV cache for a request; the return value is the number of tokens retrieved
load_meta = LoadMetadata(
    token_ids=[1, 2, 3, 4, 5],
    slot_mapping=slot_tensor,
    offset=0,
)
num_loaded = connector.load_kv(load_meta)

# Store KV cache after forward pass
store_meta = StoreMetadata(
    last_node=None,
    token_ids=[1, 2, 3, 4, 5],
    kv_indices=index_tensor,
    offset=0,
)
connector.store_kv(store_meta)