Principle:LMCache LMCache VLLM KV Connector Integration

Knowledge Sources	LMCache vLLM
Domains	Infrastructure, Serving
Last Updated	2026-02-09 00:00 GMT

Overview

An integration pattern that connects an LLM serving engine (vLLM) to an external KV cache management system (LMCache) through a standardized connector interface.

Description

vLLM KV Connector Integration is the pattern of embedding LMCache into vLLM's inference pipeline via the KVConnectorBase_V1 interface. vLLM defines a connector API (KVTransferConfig, KVConnectorRole) that external systems can implement to intercept KV cache operations during inference. LMCache provides LMCacheConnectorV1Dynamic as the entry point that delegates to LMCacheConnectorV1Impl, which initializes the full LMCache stack (manager, cache engine, storage backends, GPU connectors).

This makes KV cache reuse transparent to the serving engine: by plugging into vLLM's connector API, LMCache can store KV caches from completed requests and retrieve them for new requests that share a prefix, without modifying vLLM's core inference logic.
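The prefix-reuse idea can be illustrated with a small sketch. It is not LMCache's implementation: `ToyKVStore`, `chunk_keys`, and the chunk size of 4 are hypothetical names and values chosen for illustration, and the stored "KV data" is a placeholder string. The sketch assumes the cache is keyed by a hash of the token prefix up to each chunk boundary, so a new request matches exactly as many leading chunks as it shares with an earlier one.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical chunk length, kept small for illustration


def chunk_keys(tokens: List[int], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return one key per full chunk; each key hashes the whole prefix up to that chunk."""
    keys = []
    hasher = hashlib.sha256()
    full = len(tokens) - len(tokens) % chunk_size  # ignore a trailing partial chunk
    for start in range(0, full, chunk_size):
        chunk = tokens[start:start + chunk_size]
        hasher.update(",".join(map(str, chunk)).encode() + b";")
        keys.append(hasher.copy().hexdigest())
    return keys


class ToyKVStore:
    """Hypothetical in-memory stand-in for an external KV cache."""

    def __init__(self) -> None:
        self._chunks: Dict[str, str] = {}

    def store(self, tokens: List[int]) -> None:
        # Save a placeholder payload for every full chunk of a completed request.
        for i, key in enumerate(chunk_keys(tokens)):
            self._chunks.setdefault(key, f"kv-for-chunk-{i}")

    def matched_prefix_len(self, tokens: List[int]) -> int:
        # Count leading tokens whose chunks are already cached; stop at the first miss.
        matched = 0
        for key in chunk_keys(tokens):
            if key not in self._chunks:
                break
            matched += CHUNK_SIZE
        return matched


store = ToyKVStore()
store.store([1, 2, 3, 4, 5, 6, 7, 8, 9])  # two full chunks are cached
hit = store.matched_prefix_len([1, 2, 3, 4, 5, 6, 7, 8, 42, 43, 44, 45])
miss = store.matched_prefix_len([9, 2, 3, 4, 5, 6, 7, 8])
```

Because each key covers the entire prefix, a request that differs in its first chunk matches nothing, even if later chunks contain the same tokens.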

Usage

Use this principle when deploying LMCache with vLLM. Specify the connector in vLLM's launch arguments via --kv-transfer-config with kv_connector set to "LMCacheConnectorV1". The connector handles both scheduler-side (token matching, request tracking) and worker-side (KV cache load/save, GPU memory management) operations.
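As a sketch of the launch arguments described above, the snippet below assembles a `vllm serve` command line with the connector configuration encoded as JSON. The `kv_role` value `"kv_both"` and the model name `my-org/my-model` are assumptions added for illustration and do not come from this page; check the vLLM and LMCache documentation for the fields your versions expect.

```python
import json
import shlex
from typing import List

# Connector name as given on this page; "kv_role" is an assumption for illustration.
kv_transfer_config = {
    "kv_connector": "LMCacheConnectorV1",
    "kv_role": "kv_both",
}


def build_launch_args(model: str) -> List[str]:
    """Build an argv list that passes the connector config to vLLM as a JSON string."""
    return [
        "vllm", "serve", model,
        "--kv-transfer-config", json.dumps(kv_transfer_config),
    ]


args = build_launch_args("my-org/my-model")  # placeholder model name
command = shlex.join(args)  # shell-quoted form, suitable for printing or logging
```

Building the argument list in code and quoting it with `shlex.join` avoids hand-escaping the JSON value on the command line.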

Theoretical Basis

The connector follows a dual-role architecture:

Scheduler role: Runs in the scheduler process. Handles token matching via get_num_new_matched_tokens, builds connector metadata, and tracks unfinished requests.
Worker role: Runs in each GPU worker. Handles the actual KV cache transfer: start_load_kv (retrieve from cache to GPU), save_kv_layer (store from GPU to cache), and wait_for_layer_load (synchronization barrier).
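The split between the two roles can be sketched with a toy connector. The method names follow the ones listed above, but everything else is a simplification and an assumption: `ToyConnector`, `Role`, the string request key, the argument lists, and the dictionary standing in for GPU memory are hypothetical and do not reproduce vLLM's actual signatures.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class Role(Enum):
    SCHEDULER = "scheduler"
    WORKER = "worker"


class ToyConnector:
    """Toy stand-in for a dual-role connector; not vLLM's interface."""

    def __init__(self, role: Role, cache: Dict[Tuple[str, int], List[float]]) -> None:
        self.role = role
        self.cache = cache                       # shared external cache: (key, layer) -> KV
        self.gpu: Dict[int, List[float]] = {}    # stand-in for per-layer GPU buffers
        self._pending: Dict[int, List[float]] = {}
        self.unfinished: Set[str] = set()

    # Scheduler side: decide how many extra tokens the cache can supply.
    def get_num_new_matched_tokens(self, key: str, num_tokens: int, num_computed: int) -> int:
        assert self.role is Role.SCHEDULER
        self.unfinished.add(key)                 # track the request until it finishes
        cached = num_tokens if (key, 0) in self.cache else 0
        return max(cached - num_computed, 0)

    # Worker side: begin loading, wait per layer, and save per layer.
    def start_load_kv(self, key: str, num_layers: int) -> None:
        assert self.role is Role.WORKER
        for layer in range(num_layers):
            if (key, layer) in self.cache:
                self._pending[layer] = self.cache[(key, layer)]

    def wait_for_layer_load(self, layer: int) -> None:
        assert self.role is Role.WORKER
        if layer in self._pending:               # the layer is usable only after this call
            self.gpu[layer] = list(self._pending.pop(layer))

    def save_kv_layer(self, key: str, layer: int, kv: List[float]) -> None:
        assert self.role is Role.WORKER
        self.cache[(key, layer)] = list(kv)


shared: Dict[Tuple[str, int], List[float]] = {}
scheduler = ToyConnector(Role.SCHEDULER, shared)
worker = ToyConnector(Role.WORKER, shared)

first_lookup = scheduler.get_num_new_matched_tokens("prompt-a", 16, 0)
for layer in range(2):
    worker.save_kv_layer("prompt-a", layer, [float(layer), float(layer) + 0.5])

second_lookup = scheduler.get_num_new_matched_tokens("prompt-a", 16, 0)
fresh_worker = ToyConnector(Role.WORKER, shared)
fresh_worker.start_load_kv("prompt-a", 2)
for layer in range(2):
    fresh_worker.wait_for_layer_load(layer)
```

The first lookup finds nothing because no worker has saved KV data yet; after the save, the scheduler reports a full match and a different worker can load the layers one at a time.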

The initialization sequence:

1. LMCacheConnectorV1Dynamic.__init__ creates LMCacheConnectorV1Impl
2. LMCacheConnectorV1Impl loads LMCacheEngineConfig
3. LMCacheManager is created (initializes cache engine, storage backends)
4. Connector state is initialized (blender if enabled, layer tracking, chunk settings)
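The ordering of the initialization sequence can be sketched with stand-in classes. None of these are the real LMCache classes: `ToyConnectorDynamic`, `ToyConnectorImpl`, `ToyManager`, `ToyEngineConfig`, its fields, and the default values are hypothetical, and the log list exists only to make the order of the steps visible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyEngineConfig:
    """Hypothetical config; field names and defaults are illustrative only."""
    chunk_size: int = 256
    enable_blending: bool = False


class ToyManager:
    def __init__(self, config: ToyEngineConfig, log: List[str]) -> None:
        # Stand-in for creating the cache engine and storage backends.
        self.config = config
        log.append("manager: cache engine and storage backends created")


class ToyConnectorImpl:
    def __init__(self, log: List[str], config: Optional[ToyEngineConfig] = None) -> None:
        self.config = config or ToyEngineConfig()          # load config
        log.append("impl: config loaded")
        self.manager = ToyManager(self.config, log)        # create manager
        self.blender = "blender" if self.config.enable_blending else None
        self.current_layer = 0                             # layer tracking
        self.chunk_size = self.config.chunk_size           # chunk settings
        log.append("impl: connector state initialized")


class ToyConnectorDynamic:
    """Thin entry point that delegates all work to the implementation object."""

    def __init__(self, log: List[str], config: Optional[ToyEngineConfig] = None) -> None:
        log.append("dynamic: creating impl")
        self._impl = ToyConnectorImpl(log, config)

    @property
    def chunk_size(self) -> int:
        return self._impl.chunk_size                       # explicit delegation


log: List[str] = []
connector = ToyConnectorDynamic(log)
```

Keeping the entry point thin means the serving engine depends only on a small wrapper, while the heavier setup stays inside the implementation class.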

Related Pages

Implemented By

Implementation:LMCache_LMCache_LMCacheConnectorV1Impl_Init
