
Implementation:LMCache LMCache Connector V1

From Leeroopedia


Knowledge Sources
Domains: Integration, vLLM, KV Cache
Last Updated: 2026-02-09 00:00 GMT

Overview

Implements the vLLM v1 KV connector interface for LMCache, enabling external KV cache loading and saving within the vLLM serving framework.

Description

The LMCacheConnectorV1Dynamic class extends vLLM's KVConnectorBase_V1 to integrate LMCache as an external KV cache provider. It contains no cache logic of its own: every operation is delegated to an internal LMCacheConnectorV1Impl instance.

On the worker side, it registers KV caches (register_kv_caches), starts asynchronous loading of KV data before the forward pass (start_load_kv), synchronizes layer by layer (wait_for_layer_load), saves KV layers during attention computation (save_kv_layer), and reports finished requests (get_finished).

On the scheduler side, it matches tokens to determine how many can be served from the external cache (get_num_new_matched_tokens), updates state after block allocation (update_state_after_alloc), and builds connector metadata (build_connector_meta).
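The delegation described above can be illustrated with a minimal, library-free sketch. The names `RecordingImpl` and `ConnectorFacade` are hypothetical stand-ins, not LMCache classes; only the two forwarded method names are taken from this page.

```python
from typing import Any


class RecordingImpl:
    """Hypothetical stand-in for the internal implementation object."""

    def __init__(self) -> None:
        # Each entry is (method name, positional arguments).
        self.calls: list[tuple[str, tuple[Any, ...]]] = []

    def start_load_kv(self, forward_context: Any) -> None:
        self.calls.append(("start_load_kv", (forward_context,)))

    def wait_for_layer_load(self, layer_name: str) -> None:
        self.calls.append(("wait_for_layer_load", (layer_name,)))


class ConnectorFacade:
    """Hypothetical thin wrapper: it holds no cache logic and only forwards."""

    def __init__(self, impl: RecordingImpl) -> None:
        self._impl = impl

    def start_load_kv(self, forward_context: Any) -> None:
        self._impl.start_load_kv(forward_context)

    def wait_for_layer_load(self, layer_name: str) -> None:
        self._impl.wait_for_layer_load(layer_name)


impl = RecordingImpl()
facade = ConnectorFacade(impl)
facade.start_load_kv({"step": 0})
facade.wait_for_layer_load("layers.0")
```

Keeping the public class this thin means the interface that vLLM sees can stay stable while the implementation object changes behind it.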

Usage

Use this connector when running vLLM (version > 0.8.5) with LMCache as the KV transfer backend. It is registered as a KV connector plugin and instantiated automatically by vLLM's distributed KV transfer framework.
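As a rough sketch of how the connector might be selected, the snippet below builds the JSON value for vLLM's `--kv-transfer-config` option. The module path comes from the Import section of this page; the field names (`kv_connector`, `kv_connector_module_path`, `kv_role`) and the `kv_both` role value are assumptions based on vLLM's KV transfer configuration and should be checked against the installed vLLM version.

```python
import json

# Assumed field names from vLLM's KV transfer configuration; verify locally.
kv_transfer_config = {
    "kv_connector": "LMCacheConnectorV1Dynamic",
    "kv_connector_module_path": "lmcache.integration.vllm.lmcache_connector_v1",
    "kv_role": "kv_both",  # assumed role value: both save and load KV data
}

# Serialize once so the same string can be passed on the command line.
cli_value = json.dumps(kv_transfer_config)
print(f"--kv-transfer-config '{cli_value}'")
```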

Code Reference

Source Location

Signature

class LMCacheConnectorV1Dynamic(KVConnectorBase_V1):
    def __init__(self, vllm_config: "VllmConfig", role: KVConnectorRole, kv_cache_config: Optional[Any] = None): ...
    def register_kv_caches(self, kv_caches: dict[str, torch.Tensor]): ...
    def start_load_kv(self, forward_context: "ForwardContext", **kwargs) -> None: ...
    def wait_for_layer_load(self, layer_name: str) -> None: ...
    def save_kv_layer(self, layer_name: str, kv_layer: torch.Tensor, attn_metadata: "AttentionMetadata", **kwargs) -> None: ...
    def wait_for_save(self): ...
    def get_finished(self, finished_req_ids: set[str]) -> tuple[Optional[set[str]], Optional[set[str]]]: ...
    def get_block_ids_with_load_errors(self) -> set[int]: ...
    def shutdown(self): ...
    def get_num_new_matched_tokens(self, request: "Request", num_computed_tokens: int) -> tuple[Optional[int], bool]: ...
    def update_state_after_alloc(self, request: "Request", blocks: "KVCacheBlocks", num_external_tokens: int): ...
    def build_connector_meta(self, scheduler_output: SchedulerOutput) -> KVConnectorMetadata: ...
    def request_finished(self, request: "Request", block_ids: list[int]) -> tuple[bool, Optional[dict[str, Any]]]: ...
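To make the worker-side methods easier to place, here is a hedged sketch of the order in which they would plausibly be called during one forward pass. `TraceConnector` and `run_forward_pass` are hypothetical; the ordering is inferred from the method names and the description on this page, and the real call sites live inside vLLM.

```python
class TraceConnector:
    """Hypothetical stand-in that records the order of worker-side hooks."""

    def __init__(self) -> None:
        self.trace: list[str] = []

    def start_load_kv(self, forward_context) -> None:
        self.trace.append("start_load_kv")

    def wait_for_layer_load(self, layer_name: str) -> None:
        self.trace.append(f"wait_for_layer_load:{layer_name}")

    def save_kv_layer(self, layer_name: str, kv_layer, attn_metadata) -> None:
        self.trace.append(f"save_kv_layer:{layer_name}")

    def wait_for_save(self) -> None:
        self.trace.append("wait_for_save")


def run_forward_pass(connector: TraceConnector, layer_names: list[str]) -> None:
    # Kick off loading for the whole pass before any layer runs.
    connector.start_load_kv(forward_context=None)
    for name in layer_names:
        # Block until this layer's KV data is available, then "compute".
        connector.wait_for_layer_load(name)
        kv_layer = object()  # placeholder for the layer's paged KV tensor
        connector.save_kv_layer(name, kv_layer, attn_metadata=None)
    # Ensure all pending saves complete before the pass is considered done.
    connector.wait_for_save()


connector = TraceConnector()
run_forward_pass(connector, ["layers.0", "layers.1"])
```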

Import

from lmcache.integration.vllm.lmcache_connector_v1 import LMCacheConnectorV1Dynamic

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| vllm_config | VllmConfig | Yes | The vLLM configuration object containing model and serving parameters |
| role | KVConnectorRole | Yes | Whether this connector operates as scheduler-side or worker-side |
| kv_cache_config | Optional[Any] | No | Optional KV cache configuration passed to the base class |
| forward_context | ForwardContext | Yes (start_load_kv) | Context object providing KV caches and layer names for the current forward pass |
| layer_name | str | Yes (wait_for_layer_load, save_kv_layer) | Name of the transformer layer |
| kv_layer | torch.Tensor | Yes (save_kv_layer) | The paged KV buffer tensor for the layer |
| attn_metadata | AttentionMetadata | Yes (save_kv_layer) | Attention metadata including sequence info |
| request | Request | Yes (scheduler methods) | The vLLM request object |
| num_computed_tokens | int | Yes (get_num_new_matched_tokens) | Number of tokens already computed locally |

Outputs

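| Name | Type | Description |
|---|---|---|
| num_new_matched_tokens | tuple[Optional[int], bool] | Number of additional tokens available from the external cache beyond those already computed, plus a flag that in vLLM's base interface indicates whether they will be loaded asynchronously |
| connector_meta | KVConnectorMetadata | Metadata for the current scheduling step |
| finished_ids | tuple[Optional[set[str]], Optional[set[str]]] | Sets of request IDs that finished saving and loading, in that order; either element may be None |
| block_ids_with_errors | set[int] | Block IDs that failed to load |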
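The tuple-shaped outputs above can be handled as in the following toy sketch. Both helpers are hypothetical and do not reproduce LMCache's matching logic: `toy_num_new_matched_tokens` assumes the external cache holds a known prefix length and simply subtracts what is already computed, and the accompanying boolean is fixed to False for illustration.

```python
from typing import Optional


def toy_num_new_matched_tokens(
    num_cached_tokens: int, num_computed_tokens: int
) -> tuple[Optional[int], bool]:
    """Toy rule: only tokens beyond the locally computed prefix count as new."""
    extra = max(0, num_cached_tokens - num_computed_tokens)
    return extra, False  # the flag is fixed to False in this sketch


def summarize_finished(
    finished: tuple[Optional[set[str]], Optional[set[str]]],
) -> tuple[list[str], list[str]]:
    """Normalize a (saving, loading) pair, treating None as an empty set."""
    saving, loading = finished
    return sorted(saving or ()), sorted(loading or ())


matched, flag = toy_num_new_matched_tokens(num_cached_tokens=96, num_computed_tokens=32)
done_saving, done_loading = summarize_finished((None, {"req-b", "req-a"}))
```

Treating None as an empty set keeps downstream code free of repeated None checks; whether that is appropriate depends on how the caller distinguishes "nothing finished" from "not reported".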

Usage Examples

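# Typically instantiated by vLLM's KV connector framework rather than by hand.
# Import path for KVConnectorRole as found in recent vLLM releases.
from vllm.distributed.kv_transfer.kv_connector.v1.base import KVConnectorRole

from lmcache.integration.vllm.lmcache_connector_v1 import LMCacheConnectorV1Dynamic

# vllm_config (VllmConfig) and kv_caches (dict[str, torch.Tensor]) are supplied
# by vLLM at runtime; they are not constructed in this snippet.
connector = LMCacheConnectorV1Dynamic(
    vllm_config=vllm_config,
    role=KVConnectorRole.WORKER,
)
connector.register_kv_caches(kv_caches)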
