Implementation:LMCache LMCache Connector V1

Knowledge Sources	LMCache
Domains	Integration, vLLM, KV Cache
Last Updated	2026-02-09 00:00 GMT

Overview

Implements the vLLM v1 KV connector interface for LMCache, enabling external KV cache loading and saving within the vLLM serving framework.

Description

The LMCacheConnectorV1Dynamic class extends vLLM's KVConnectorBase_V1 to integrate LMCache as an external KV cache provider. It delegates all operations to an internal LMCacheConnectorV1Impl instance. On the worker side, it registers KV caches (register_kv_caches), starts asynchronous loading of KV data before the forward pass (start_load_kv), synchronizes layer by layer (wait_for_layer_load), saves KV layers during attention computation (save_kv_layer), waits for outstanding saves (wait_for_save), and tracks finished requests (get_finished). On the scheduler side, it matches tokens to determine how many can be served from the external cache (get_num_new_matched_tokens), updates state after block allocation (update_state_after_alloc), and builds connector metadata (build_connector_meta).
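The delegation pattern and the worker-side call order described above can be illustrated with a small, self-contained sketch. It does not import vLLM or LMCache: RecordingImpl and DelegatingConnector are hypothetical stand-ins, and the call order in run_forward_pass is an assumption inferred from the method names, not a copy of vLLM's actual driver code.

```python
from typing import Any


class RecordingImpl:
    """Hypothetical stand-in for the internal impl object; it only logs calls."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def __getattr__(self, name: str):
        # Invoked only for attributes not found normally, i.e. the
        # connector operations. Each one is logged by name.
        def _record(*args: Any, **kwargs: Any) -> None:
            self.calls.append(name)

        return _record


class DelegatingConnector:
    """Thin wrapper that forwards each worker-side call to its impl."""

    def __init__(self, impl: Any) -> None:
        self._impl = impl

    def register_kv_caches(self, kv_caches: dict) -> None:
        self._impl.register_kv_caches(kv_caches)

    def start_load_kv(self, forward_context: Any) -> None:
        self._impl.start_load_kv(forward_context)

    def wait_for_layer_load(self, layer_name: str) -> None:
        self._impl.wait_for_layer_load(layer_name)

    def save_kv_layer(self, layer_name: str, kv_layer: Any, attn_metadata: Any) -> None:
        self._impl.save_kv_layer(layer_name, kv_layer, attn_metadata)

    def wait_for_save(self) -> None:
        self._impl.wait_for_save()


def run_forward_pass(connector: DelegatingConnector, kv_caches: dict) -> None:
    """Assumed call order for one forward pass on a worker."""
    connector.start_load_kv(forward_context=None)
    for layer_name, kv_layer in kv_caches.items():
        connector.wait_for_layer_load(layer_name)  # block until this layer's KV is loaded
        # ... attention for this layer would run here ...
        connector.save_kv_layer(layer_name, kv_layer, attn_metadata=None)
    connector.wait_for_save()  # block until all pending saves complete


impl = RecordingImpl()
connector = DelegatingConnector(impl)
kv_caches = {"layer.0": object(), "layer.1": object()}
connector.register_kv_caches(kv_caches)
run_forward_pass(connector, kv_caches)
print(impl.calls)
```

Keeping the public class as a thin forwarding layer means the vLLM-facing interface can stay stable while the logic lives in the impl object.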

Usage

Use this connector when running vLLM (version > 0.8.5) with LMCache as the KV transfer backend. It is registered as a KV connector plugin and instantiated automatically by vLLM's distributed KV transfer framework.
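As a rough illustration of how the connector might be selected, the sketch below builds a JSON value of the kind passed to vLLM's --kv-transfer-config option. The field names (kv_connector, kv_connector_module_path, kv_role) and the kv_both role are assumptions based on vLLM's KV transfer configuration; check them against the installed vLLM and LMCache versions. The module path is taken from the Import section of this page.

```python
import json

# Assumed field names and values; verify against your vLLM version.
kv_transfer_config = {
    "kv_connector": "LMCacheConnectorV1Dynamic",
    "kv_connector_module_path": "lmcache.integration.vllm.lmcache_connector_v1",
    "kv_role": "kv_both",  # assumption: connector both saves and loads KV
}

# Serialized form intended for the --kv-transfer-config option.
config_json = json.dumps(kv_transfer_config)
print(config_json)
```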

Code Reference

Source Location

Repository: LMCache
File: lmcache/integration/vllm/lmcache_connector_v1.py
Lines: 1-213

Signature

class LMCacheConnectorV1Dynamic(KVConnectorBase_V1):
    def __init__(self, vllm_config: "VllmConfig", role: KVConnectorRole, kv_cache_config: Optional[Any] = None): ...
    def register_kv_caches(self, kv_caches: dict[str, torch.Tensor]): ...
    def start_load_kv(self, forward_context: "ForwardContext", **kwargs) -> None: ...
    def wait_for_layer_load(self, layer_name: str) -> None: ...
    def save_kv_layer(self, layer_name: str, kv_layer: torch.Tensor, attn_metadata: "AttentionMetadata", **kwargs) -> None: ...
    def wait_for_save(self): ...
    def get_finished(self, finished_req_ids: set[str]) -> tuple[Optional[set[str]], Optional[set[str]]]: ...
    def get_block_ids_with_load_errors(self) -> set[int]: ...
    def shutdown(self): ...
    def get_num_new_matched_tokens(self, request: "Request", num_computed_tokens: int) -> tuple[Optional[int], bool]: ...
    def update_state_after_alloc(self, request: "Request", blocks: "KVCacheBlocks", num_external_tokens: int): ...
    def build_connector_meta(self, scheduler_output: SchedulerOutput) -> KVConnectorMetadata: ...
    def request_finished(self, request: "Request", block_ids: list[int]) -> tuple[bool, Optional[dict[str, Any]]]: ...

Import

from lmcache.integration.vllm.lmcache_connector_v1 import LMCacheConnectorV1Dynamic

I/O Contract

Inputs

Name	Type	Required	Description
vllm_config	VllmConfig	Yes	The vLLM configuration object containing model and serving parameters
role	KVConnectorRole	Yes	Whether this connector operates as scheduler-side or worker-side
kv_cache_config	Optional[Any]	No	Optional KV cache configuration passed to the base class
forward_context	ForwardContext	Yes (start_load_kv)	Context object providing KV caches and layer names for the current forward pass
layer_name	str	Yes (wait_for_layer_load, save_kv_layer)	Name of the transformer layer
kv_layer	torch.Tensor	Yes (save_kv_layer)	The paged KV buffer tensor for the layer
attn_metadata	AttentionMetadata	Yes (save_kv_layer)	Attention metadata including sequence info
request	Request	Yes (scheduler methods)	The vLLM request object
num_computed_tokens	int	Yes (get_num_new_matched_tokens)	Number of tokens already computed locally

Outputs

Name	Type	Description
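num_new_matched_tokens	tuple[Optional[int], bool]	Number of tokens available from the external cache beyond those already computed locally (None if not yet determined), and a flag indicating whether they will be loaded asynchronously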
connector_meta	KVConnectorMetadata	Metadata for the current scheduling step
finished_ids	tuple[Optional[set[str]], Optional[set[str]]]	Sets of request IDs that finished saving and loading
block_ids_with_errors	set[int]	Block IDs that failed to load
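To make the tuple[Optional[int], bool] return shape of get_num_new_matched_tokens concrete, here is a minimal sketch of how a caller might branch on it. plan_external_load is a hypothetical helper, not part of vLLM or LMCache, and the meaning given to each case (None as "ask again later", the boolean as an asynchronous-load flag) is an assumption about the interface, not a statement of the library's behavior.

```python
from typing import Optional


def plan_external_load(result: tuple[Optional[int], bool]) -> str:
    """Toy decision based on the connector's answer (hypothetical helper)."""
    num_tokens, load_async = result
    if num_tokens is None:
        # Assumption: the connector cannot answer yet, so query again later.
        return "retry-later"
    if num_tokens == 0:
        # Nothing extra is available externally; compute everything locally.
        return "compute-locally"
    return "load-async" if load_async else "load-sync"


print(plan_external_load((None, False)))
print(plan_external_load((0, False)))
print(plan_external_load((128, True)))
```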

Usage Examples

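# Typically instantiated by vLLM's KV connector framework, not by user code.
from vllm.distributed.kv_transfer.kv_connector.v1.base import KVConnectorRole

from lmcache.integration.vllm.lmcache_connector_v1 import LMCacheConnectorV1Dynamic

# vllm_config (VllmConfig) and kv_caches (dict[str, torch.Tensor], keyed by
# layer name) are supplied by vLLM at runtime.
connector = LMCacheConnectorV1Dynamic(
    vllm_config=vllm_config,
    role=KVConnectorRole.WORKER,
)
connector.register_kv_caches(kv_caches)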
