Environment: LMCache vLLM Serving Engine
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, LLM_Serving |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
vLLM serving engine integration environment providing the KVConnectorBase_V1 interface that LMCache hooks into for transparent KV cache management.
Description
LMCache integrates with vLLM as a KV cache connector via the `KVConnectorBase_V1` interface. The integration adapter (`LMCacheConnectorV1Impl` in `vllm_v1_adapter.py`) handles version compatibility across different vLLM releases. LMCache dynamically dispatches between connector implementations based on the detected vLLM version and handles API differences between vLLM releases (e.g., `torch_utils` module location changes). The vLLM integration supports all four LMCache workflows and is the primary deployment mode. SGLang is also supported as an alternative serving engine.
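The version-based dispatch described above can be sketched as follows. This is an illustrative, hypothetical helper (`select_adapter` and the version cutoff are assumptions, not LMCache's actual logic), showing the general shape of choosing a connector implementation from a detected vLLM version string:

```python
# Illustrative sketch only: dispatching to a connector adapter based on
# the detected vLLM version. The adapter names and the 0.8.0 cutoff are
# hypothetical; LMCache's real dispatch lives in vllm_v1_adapter.py.

def parse_version(version: str) -> tuple:
    """Turn a 'major.minor.patch' string into a comparable tuple."""
    return tuple(int(part) for part in version.split(".")[:3])

def select_adapter(vllm_version: str) -> str:
    """Pick an adapter module name for the detected vLLM version."""
    if parse_version(vllm_version) >= (0, 8, 0):
        return "vllm_v1_adapter"  # newer KVConnectorBase_V1-style API
    return "legacy_adapter"       # older vLLM releases
```

The same comparison could be driven by `vllm.version.__version__` at import time, which is the value LMCache reads (see Code Evidence below).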
Usage
Use this environment when deploying LMCache with vLLM as the serving engine. The vLLM connector is activated by setting the `LMCACHE_CONFIG_FILE` environment variable and launching vLLM with `--kv-connector LMCacheConnectorV1`. This is the standard deployment path for all production use cases, including KV cache offloading, disaggregated prefill, P2P sharing, and CacheBlend.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| vLLM | Compatible version | LMCache handles multiple vLLM API versions dynamically |
| GPU | NVIDIA CUDA or Intel XPU | vLLM platform detection selects appropriate backend |
| Python | >= 3.10 | Aligned with LMCache Python requirements |
Dependencies
Python Packages
- `vllm` (provides `KVConnectorBase_V1`, `current_platform`)
- `torch` (version determined by vLLM installation)
- `lmcache` (installed as KV connector plugin)
Environment Variables
- `LMCACHE_CONFIG_FILE`: Path to LMCache YAML configuration (required for vLLM integration).
- `LMCACHE_FORCE_SKIP_SAVE`: Set to skip all cache save operations (optional runtime override).
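How the two variables interact at startup can be sketched like this. The helper name `resolve_runtime_settings` is hypothetical (the real loading happens inside LMCache's config module); the sketch only illustrates that the config path is mandatory while the skip-save flag is an optional override:

```python
# Hypothetical sketch of startup env-var handling; not LMCache's actual code.

def resolve_runtime_settings(env: dict) -> dict:
    """Resolve LMCache runtime settings from an environment mapping."""
    config_path = env.get("LMCACHE_CONFIG_FILE")
    if config_path is None:
        # The vLLM integration requires an explicit YAML config.
        raise RuntimeError("LMCACHE_CONFIG_FILE must point to a YAML config")
    return {
        "config_path": config_path,
        # Any non-empty value disables all cache save operations.
        "skip_save": bool(env.get("LMCACHE_FORCE_SKIP_SAVE")),
    }
```

In a real deployment you would pass `os.environ` rather than a plain dict.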
Quick Install
```bash
# Install vLLM (determines torch version)
pip install vllm

# Install LMCache (from source, preserving vLLM's torch)
pip install -e . --no-build-isolation

# Launch vLLM with LMCache connector
LMCACHE_CONFIG_FILE=config.yaml vllm serve model_name --kv-connector LMCacheConnectorV1
```
Code Evidence
vLLM version import from `lmcache/integration/vllm/vllm_v1_adapter.py:23`:
```python
from vllm.version import __version__ as VLLM_VERSION
```
vLLM API version compatibility from `lmcache/integration/vllm/utils.py:174-179`:
```python
try:
    from vllm.utils.torch_utils import ...
except ImportError:
    from vllm.utils import ...
```
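The same try/except fallback pattern can be demonstrated with a real, self-contained example from the standard library, where a name moved between modules across Python versions (this is an illustration of the pattern, not LMCache code):

```python
# Same compatibility pattern, using a real stdlib relocation: `Mapping`
# moved from `collections` to `collections.abc` (the old alias was
# removed in Python 3.10), so code supporting both eras imports like this.
try:
    from collections.abc import Mapping  # current location (Python >= 3.3)
except ImportError:
    from collections import Mapping      # legacy location on very old Pythons
```

LMCache applies the same idiom to vLLM's `torch_utils` relocation, preferring the new path and falling back to the old one.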
Platform detection for device selection from `lmcache/integration/vllm/utils.py:293-310`:
```python
def get_torch_device():
    # Detects CUDA vs XPU via vLLM's current_platform
    if current_platform.is_cuda():
        torch_dev = torch.cuda
    elif current_platform.is_xpu():
        torch_dev = torch.xpu
```
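The dispatch above can be exercised without vLLM or a GPU by stubbing the platform object. `FakePlatform` and `device_name` below are illustrative stand-ins (the real code queries vLLM's `current_platform` and returns a torch device namespace); the fail-fast branch for unknown platforms is an assumption of this sketch:

```python
# Stand-in for vLLM's current_platform, for demonstration only.
class FakePlatform:
    def __init__(self, kind: str):
        self.kind = kind

    def is_cuda(self) -> bool:
        return self.kind == "cuda"

    def is_xpu(self) -> bool:
        return self.kind == "xpu"

def device_name(platform) -> str:
    """Map a platform probe to a device name (sketch of get_torch_device)."""
    if platform.is_cuda():
        return "cuda"  # would select the torch.cuda namespace
    if platform.is_xpu():
        return "xpu"   # would select the torch.xpu namespace
    raise RuntimeError("unsupported platform")  # sketch-only fail-fast
```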
Tensor parallel assumption from `lmcache/integration/vllm/utils.py:318-343`:
"""
Current assumption (TODO: add custom logic in the future):
- Tensor Parallel is intra-node
- Pipeline Parallel is inter-node
"""
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ImportError: vllm.utils.torch_utils` | Older vLLM version with different API | LMCache handles this automatically via fallback import |
| `ImportError: vllm.platforms` | vLLM version lacks platform module | LMCache falls back gracefully; GPU detection may be limited |
| KV connector not recognized | vLLM version too old | Update vLLM to a version supporting `KVConnectorBase_V1` |
| `LMCACHE_CONFIG_FILE` not set | Missing configuration | Set the environment variable to a valid YAML config path |
Compatibility Notes
- vLLM API versions: LMCache dynamically handles multiple vLLM versions via try/except import patterns. No specific vLLM version is pinned.
- SGLang alternative: LMCache also integrates with SGLang via `lmcache/integration/sglang/sglang_adapter.py` using a similar config mechanism.
- Torch version: LMCache intentionally does not pin torch at runtime to avoid conflicts with the serving engine's torch version.
- Multi-process mode: vLLM multi-process deployments use `vllm_multi_process_adapter.py` for cross-process KV cache coordination.