Implementation: LMCache vLLM Serve Decoder
| Knowledge Sources | |
|---|---|
| Domains | Serving, Distributed_Systems |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A launcher for vLLM decoder instances with LMCache KV-consumer configuration, provided as a thin wrapper around vllm serve.
Description
The decoder is launched via vllm serve with a KVTransferConfig that specifies kv_connector="LMCacheConnectorV1" and kv_role="kv_consumer". Internally, the LMCache connector creates a PDBackend in receiver mode, which listens for incoming NIXL connections and allocates memory for KV-cache transfers arriving from prefillers.
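As an illustrative sketch, the JSON passed to --kv-transfer-config can be checked in Python before launch. The field values mirror the Signature in this document; the validation helper itself is hypothetical and not part of LMCache or vLLM.

```python
import json

# The JSON string handed to vllm serve via --kv-transfer-config
# (values mirror the decoder Signature in this document).
KV_TRANSFER_CONFIG = """{
    "kv_connector": "LMCacheConnectorV1",
    "kv_role": "kv_consumer",
    "kv_connector_extra_config": {
        "discard_partial_chunks": false,
        "skip_last_n_tokens": 1,
        "lmcache_rpc_port": "consumer1"
    }
}"""

def validate_decoder_kv_config(raw: str) -> dict:
    """Hypothetical helper: sanity-check a decoder kv-transfer-config."""
    cfg = json.loads(raw)  # fails fast on malformed JSON, before vllm sees it
    assert cfg["kv_connector"] == "LMCacheConnectorV1"
    assert cfg["kv_role"] == "kv_consumer"  # the decoder must be the KV consumer
    return cfg

cfg = validate_decoder_kv_config(KV_TRANSFER_CONFIG)
```

Catching a malformed or mis-roled config this way is cheaper than debugging a decoder that silently never receives KV blocks.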
Usage
Set LMCACHE_CONFIG_FILE to the path of the decoder's YAML config, set CUDA_VISIBLE_DEVICES to the GPU reserved for the decoder, then run vllm serve with the kv-transfer-config shown under Signature.
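A minimal sketch of these steps in Python, assembling the environment and argv for the launch. The helper name, model name, and port are illustrative assumptions; it builds the command but deliberately does not execute it.

```python
import os

def build_decoder_launch(model: str, port: int, gpu: str,
                         lmcache_config: str, kv_transfer_config: str):
    """Hypothetical helper: assemble env and argv for the decoder launch.
    It only constructs the command; it does not run vllm serve."""
    env = dict(os.environ)
    env["LMCACHE_CONFIG_FILE"] = lmcache_config   # decoder YAML config path
    env["CUDA_VISIBLE_DEVICES"] = gpu             # pin the decoder GPU
    argv = [
        "vllm", "serve", model,
        "--port", str(port),
        "--kv-transfer-config", kv_transfer_config,
    ]
    return env, argv

env, argv = build_decoder_launch(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    port=8200,                                 # placeholder decoder port
    gpu="1",
    lmcache_config="/path/to/lmcache-decoder-config.yaml",
    kv_transfer_config='{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_consumer"}',
)
# To actually launch: subprocess.Popen(argv, env=env)
```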
Code Reference
Source Location
- Repository: LMCache
- File: examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh
- Lines: L46-L57
Signature
CUDA_VISIBLE_DEVICES=$DECODE_CUDA_DEVICE vllm serve $MODEL \
--port $DECODE_PORT \
--kv-transfer-config '{
"kv_connector": "LMCacheConnectorV1",
"kv_role": "kv_consumer",
"kv_connector_extra_config": {
"discard_partial_chunks": false,
"skip_last_n_tokens": 1,
"lmcache_rpc_port": "consumer1"
}
}'
Import
export LMCACHE_CONFIG_FILE=/path/to/lmcache-decoder-config.yaml
bash examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh decoder
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| LMCACHE_CONFIG_FILE | env var | Yes | Path to decoder YAML config |
| CUDA_VISIBLE_DEVICES | env var | Yes | GPU device for decoder |
| MODEL | str | Yes | HuggingFace model name |
| kv_role | str | Yes | Must be "kv_consumer" |
Outputs
| Name | Type | Description |
|---|---|---|
| vLLM server | process | Running vLLM instance accepting OpenAI-compatible requests |
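Once running, the decoder exposes OpenAI-compatible endpoints. A sketch of building a /v1/completions request for it follows; the helper name, port, and model are assumptions, and sending the request is left to any HTTP client since it requires a live server.

```python
import json

def build_completion_request(port: int, model: str, prompt: str):
    """Hypothetical helper: build an OpenAI-compatible /v1/completions
    request targeting the running decoder. Returns (url, body); posting
    it is left to an HTTP client such as urllib or requests."""
    url = f"http://localhost:{port}/v1/completions"
    payload = {"model": model, "prompt": prompt, "max_tokens": 32}
    return url, json.dumps(payload)

# Placeholder port and model name, matching the launch sketch assumptions.
url, body = build_completion_request(
    8200, "meta-llama/Llama-3.1-8B-Instruct", "Hello")
```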