Implementation:LMCache LMCache Connector V1
| Knowledge Sources | |
|---|---|
| Domains | Integration, vLLM, KV Cache |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Implements the vLLM v1 KV connector interface for LMCache, enabling external KV cache loading and saving within the vLLM serving framework.
Description
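The LMCacheConnectorV1Dynamic class extends vLLM's KVConnectorBase_V1 to integrate LMCache as an external KV cache provider, and delegates every operation to an internal LMCacheConnectorV1Impl instance. On the worker side, it registers the paged KV caches (register_kv_caches), starts asynchronous loading of KV data before each forward pass (start_load_kv), synchronizes loading layer by layer (wait_for_layer_load), saves each layer's KV during attention computation (save_kv_layer), waits for outstanding saves to complete (wait_for_save), reports requests whose transfers have finished (get_finished), and reports blocks that failed to load (get_block_ids_with_load_errors). On the scheduler side, it determines how many tokens can be served from the external cache (get_num_new_matched_tokens), updates its state after block allocation (update_state_after_alloc), builds the connector metadata for each scheduling step (build_connector_meta), and decides whether a finished request's blocks can be freed immediately (request_finished).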
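The delegation described above is a plain facade: the public connector class owns an implementation object and forwards each call to it. The sketch below illustrates that pattern with stdlib Python only; FakeImpl and DelegatingConnector are hypothetical stand-ins, not LMCache classes, and only two of the forwarded methods are shown.

```python
from typing import Any


class FakeImpl:
    """Hypothetical stand-in for LMCacheConnectorV1Impl; it only records calls."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, tuple[Any, ...]]] = []

    def start_load_kv(self, forward_context: Any, **kwargs: Any) -> None:
        self.calls.append(("start_load_kv", (forward_context,)))

    def wait_for_layer_load(self, layer_name: str) -> None:
        self.calls.append(("wait_for_layer_load", (layer_name,)))


class DelegatingConnector:
    """Hypothetical facade: each public method forwards to the wrapped implementation."""

    def __init__(self, impl: FakeImpl) -> None:
        self._impl = impl

    def start_load_kv(self, forward_context: Any, **kwargs: Any) -> None:
        self._impl.start_load_kv(forward_context, **kwargs)

    def wait_for_layer_load(self, layer_name: str) -> None:
        self._impl.wait_for_layer_load(layer_name)


connector = DelegatingConnector(FakeImpl())
connector.start_load_kv("ctx")
connector.wait_for_layer_load("layer_0")
print(connector._impl.calls)
```

Explicit forwarding methods (rather than a generic `__getattr__`) matter when the facade subclasses an abstract base class such as KVConnectorBase_V1: Python only considers an abstract method implemented if the subclass defines it.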
Usage
Use this connector when running vLLM (version > 0.8.5) with LMCache as the KV transfer backend. It is registered as a KV connector plugin and instantiated automatically by vLLM's distributed KV transfer framework.
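Because vLLM loads the connector from its KV transfer configuration, enabling it amounts to passing that configuration at startup. The sketch below builds a `--kv-transfer-config` JSON argument for `vllm serve`. The field names `kv_connector`, `kv_role`, and especially `kv_connector_module_path` are assumptions based on vLLM's KVTransferConfig and should be checked against the vLLM version in use; `<model>` is a placeholder.

```python
import json

# Assumed KVTransferConfig fields; verify them against your vLLM version.
kv_transfer_config = {
    "kv_connector": "LMCacheConnectorV1Dynamic",
    # "kv_both": this instance both saves KV to and loads KV from LMCache.
    "kv_role": "kv_both",
    # Assumption: tells vLLM which module to import the connector class from.
    "kv_connector_module_path": "lmcache.integration.vllm.lmcache_connector_v1",
}

cli_arg = json.dumps(kv_transfer_config)
print(f"vllm serve <model> --kv-transfer-config '{cli_arg}'")
```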
Code Reference
Source Location
- Repository: LMCache
- File: lmcache/integration/vllm/lmcache_connector_v1.py
- Lines: 1-213
Signature
```python
class LMCacheConnectorV1Dynamic(KVConnectorBase_V1):
    def __init__(self, vllm_config: "VllmConfig", role: KVConnectorRole, kv_cache_config: Optional[Any] = None): ...

    # Worker-side methods
    def register_kv_caches(self, kv_caches: dict[str, torch.Tensor]): ...
    def start_load_kv(self, forward_context: "ForwardContext", **kwargs) -> None: ...
    def wait_for_layer_load(self, layer_name: str) -> None: ...
    def save_kv_layer(self, layer_name: str, kv_layer: torch.Tensor, attn_metadata: "AttentionMetadata", **kwargs) -> None: ...
    def wait_for_save(self): ...
    def get_finished(self, finished_req_ids: set[str]) -> tuple[Optional[set[str]], Optional[set[str]]]: ...
    def get_block_ids_with_load_errors(self) -> set[int]: ...
    def shutdown(self): ...

    # Scheduler-side methods
    def get_num_new_matched_tokens(self, request: "Request", num_computed_tokens: int) -> tuple[Optional[int], bool]: ...
    def update_state_after_alloc(self, request: "Request", blocks: "KVCacheBlocks", num_external_tokens: int): ...
    def build_connector_meta(self, scheduler_output: SchedulerOutput) -> KVConnectorMetadata: ...
    def request_finished(self, request: "Request", block_ids: list[int]) -> tuple[bool, Optional[dict[str, Any]]]: ...
```
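The worker-side hooks in this signature are called in a fixed order around each forward pass: loads start before the model runs, each attention layer waits for its own KV before computing and saves its KV afterwards, and the pass ends by waiting for all saves. The sketch below models only that ordering; RecordingConnector and run_forward are hypothetical, and the real calls are made by vLLM's model runner and attention layers, not by user code.

```python
class RecordingConnector:
    """Hypothetical worker-side connector that only logs which hook ran."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def start_load_kv(self, forward_context, **kwargs) -> None:
        self.log.append("start_load_kv")

    def wait_for_layer_load(self, layer_name: str) -> None:
        self.log.append(f"wait_for_layer_load:{layer_name}")

    def save_kv_layer(self, layer_name, kv_layer, attn_metadata, **kwargs) -> None:
        self.log.append(f"save_kv_layer:{layer_name}")

    def wait_for_save(self) -> None:
        self.log.append("wait_for_save")


def run_forward(connector: RecordingConnector, layer_names: list[str]) -> None:
    """Simplified model of where the hooks fire during one forward pass."""
    connector.start_load_kv(forward_context=None)  # kick off loads for all layers
    for name in layer_names:
        connector.wait_for_layer_load(name)  # block until this layer's KV is in place
        # ... attention for this layer would run here ...
        connector.save_kv_layer(name, kv_layer=None, attn_metadata=None)
    connector.wait_for_save()  # finish saves before the paged buffers can be overwritten


c = RecordingConnector()
run_forward(c, ["layer_0", "layer_1"])
print(c.log)
```

Waiting per layer, instead of once before the pass, lets the transfer for later layers overlap with attention on earlier ones.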
Import
```python
from lmcache.integration.vllm.lmcache_connector_v1 import LMCacheConnectorV1Dynamic
```
I/O Contract
Inputs
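| Name | Type | Required | Description |
|---|---|---|---|
| vllm_config | VllmConfig | Yes | The vLLM configuration object containing model and serving parameters |
| role | KVConnectorRole | Yes | Whether this connector operates on the scheduler side or the worker side |
| kv_cache_config | Optional[Any] | No | Optional KV cache configuration passed to the base class |
| kv_caches | dict[str, torch.Tensor] | Yes (register_kv_caches) | Paged KV cache tensors keyed by layer name |
| forward_context | ForwardContext | Yes (start_load_kv) | Context object providing the KV caches and layer names for the current forward pass |
| layer_name | str | Yes (wait_for_layer_load, save_kv_layer) | Name of the transformer layer |
| kv_layer | torch.Tensor | Yes (save_kv_layer) | The paged KV buffer tensor for the layer |
| attn_metadata | AttentionMetadata | Yes (save_kv_layer) | Attention metadata, including sequence information |
| finished_req_ids | set[str] | Yes (get_finished) | IDs of requests that have finished generating tokens on the worker |
| request | Request | Yes (scheduler methods) | The vLLM request object |
| num_computed_tokens | int | Yes (get_num_new_matched_tokens) | Number of tokens already computed locally |
| blocks | KVCacheBlocks | Yes (update_state_after_alloc) | KV cache blocks allocated to the request |
| num_external_tokens | int | Yes (update_state_after_alloc) | Number of tokens to be loaded from the external cache |
| scheduler_output | SchedulerOutput | Yes (build_connector_meta) | Scheduler output for the current step |
| block_ids | list[int] | Yes (request_finished) | IDs of the blocks held by the finished request |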
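On the scheduler side, update_state_after_alloc and build_connector_meta split the work of one scheduling step: the first records, per request, which allocated blocks should receive externally loaded tokens, and the second packages everything recorded so far into the metadata sent to workers and then resets. The sketch below shows that accumulate-then-reset pattern with hypothetical types (LoadSpec, StepMetadata, SchedulerSideSketch); it simplifies the real arguments (Request, KVCacheBlocks, SchedulerOutput) to plain IDs and lists.

```python
from dataclasses import dataclass, field


@dataclass
class LoadSpec:
    """Hypothetical per-request record of what to load from the external cache."""
    req_id: str
    block_ids: list[int]
    num_external_tokens: int


@dataclass
class StepMetadata:
    """Hypothetical stand-in for KVConnectorMetadata."""
    loads: list[LoadSpec] = field(default_factory=list)


class SchedulerSideSketch:
    def __init__(self) -> None:
        self._pending: dict[str, LoadSpec] = {}

    def update_state_after_alloc(self, req_id: str, block_ids: list[int],
                                 num_external_tokens: int) -> None:
        # Only requests that actually hit the external cache need a load.
        if num_external_tokens > 0:
            self._pending[req_id] = LoadSpec(req_id, block_ids, num_external_tokens)

    def build_connector_meta(self) -> StepMetadata:
        # Hand this step's work to the workers, then reset for the next step.
        meta = StepMetadata(loads=list(self._pending.values()))
        self._pending.clear()
        return meta


sched = SchedulerSideSketch()
sched.update_state_after_alloc("req-a", [3, 4], num_external_tokens=32)
sched.update_state_after_alloc("req-b", [7], num_external_tokens=0)
meta = sched.build_connector_meta()
print([s.req_id for s in meta.loads])
```

Clearing the pending state in build_connector_meta ensures each load is issued to the workers exactly once.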
Outputs
| Name | Type | Description |
|---|---|---|
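| num_new_matched_tokens | tuple[Optional[int], bool] | Number of tokens that can be loaded from the external cache beyond num_computed_tokens (None if the connector cannot decide yet), and whether those tokens will be loaded asynchronously between scheduler steps |
| connector_meta | KVConnectorMetadata | Metadata for the current scheduling step |
| finished_ids | tuple[Optional[set[str]], Optional[set[str]]] | IDs of requests whose asynchronous transfers have finished, as (saving/sending, loading/receiving) |
| block_ids_with_errors | set[int] | Block IDs that failed to load |
| request_finished result | tuple[bool, Optional[dict[str, Any]]] | Whether the request's blocks must stay allocated until its asynchronous save completes, plus optional KV transfer parameters |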
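The first output, from get_num_new_matched_tokens, is essentially arithmetic on a prefix lookup: find how many leading prompt tokens the external cache holds, then report only the part vLLM has not already computed. The sketch below assumes a simplified cache that stores whole chunks of `chunk_size` tokens, keyed by the full prefix ending at each chunk boundary; LMCache's real lookup and key scheme differ. Holding back the last token on a full hit (so the model still computes logits for it) is also an assumption of this sketch.

```python
def lookup_hit_tokens(tokens: list[int], cached_prefixes: set[tuple[int, ...]],
                      chunk_size: int) -> int:
    """Length of the longest chunk-aligned prefix of `tokens` present in the cache."""
    hit = 0
    for end in range(chunk_size, len(tokens) + 1, chunk_size):
        if tuple(tokens[:end]) not in cached_prefixes:
            break  # prefixes must match contiguously from the start
        hit = end
    return hit


def get_num_new_matched_tokens(tokens: list[int], num_computed_tokens: int,
                               cached_prefixes: set[tuple[int, ...]],
                               chunk_size: int) -> tuple[int, bool]:
    hit = lookup_hit_tokens(tokens, cached_prefixes, chunk_size)
    # Assumption: on a full hit, leave one token to compute so logits are produced.
    if hit == len(tokens):
        hit -= 1
    # Only tokens beyond what vLLM already has locally are "new". The second
    # element is False: this sketch does not load asynchronously between steps.
    return max(0, hit - num_computed_tokens), False


chunk = 4
cached = {tuple(range(4)), tuple(range(8))}
print(get_num_new_matched_tokens(list(range(10)), 4, cached, chunk))
```

With a 10-token prompt whose first 8 tokens are cached and 4 already computed locally, 4 tokens remain to be loaded.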
Usage Examples
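```python
# Normally vLLM's KV connector framework constructs the connector; shown for reference.
from vllm.distributed.kv_transfer.kv_connector.v1.base import KVConnectorRole

from lmcache.integration.vllm.lmcache_connector_v1 import LMCacheConnectorV1Dynamic

# vllm_config (VllmConfig) and kv_caches (dict[str, torch.Tensor] keyed by layer name)
# are supplied by the vLLM worker at runtime.
connector = LMCacheConnectorV1Dynamic(
    vllm_config=vllm_config,
    role=KVConnectorRole.WORKER,
)
connector.register_kv_caches(kv_caches)
```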
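After construction, one part of the request lifecycle is easy to miss: if request_finished returns True, the scheduler keeps the request's blocks allocated until get_finished later reports that request in its first (saving/sending) set. The sketch below models that hand-off with a hypothetical AsyncSaveTracker that, for brevity, combines scheduler-side and worker-side state in one object; `mark_saved` is a made-up stand-in for a background save completing.

```python
from typing import Optional


class AsyncSaveTracker:
    """Hypothetical model of deferred block freeing during asynchronous saves."""

    def __init__(self) -> None:
        self._saving: set[str] = set()  # requests whose KV is still being written out
        self._done: set[str] = set()    # saves completed since the last poll

    def request_finished(self, req_id: str, saving_async: bool) -> tuple[bool, None]:
        # Scheduler side. True means: do NOT free this request's blocks yet.
        if saving_async:
            self._saving.add(req_id)
        return saving_async, None

    def mark_saved(self, req_id: str) -> None:
        # Stand-in for the background save completing.
        self._saving.discard(req_id)
        self._done.add(req_id)

    def get_finished(self, finished_req_ids: set[str]) -> tuple[Optional[set[str]], None]:
        # Worker side. finished_req_ids is ignored in this sketch; the second
        # element (finished loads) is always None because loads are not modeled.
        done, self._done = self._done, set()
        return (done or None), None


t = AsyncSaveTracker()
print(t.request_finished("req-a", saving_async=True))
t.mark_saved("req-a")
print(t.get_finished(set()))
```

Deferring the free prevents the scheduler from handing blocks to a new request while their contents are still being copied out to LMCache.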