Implementation: LMCache vLLM Serve Decoder
| Knowledge Sources | |
|---|---|
| Domains | Serving, Distributed_Systems |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A launcher for vLLM decoder instances with LMCache KV-consumer configuration, provided as a thin wrapper around vllm serve.
Description
The decoder is launched via vllm serve with a KVTransferConfig that specifies kv_connector="LMCacheConnectorV1" and kv_role="kv_consumer". Internally, the LMCache connector creates a PDBackend in receiver mode, which listens for incoming NIXL connections and allocates memory for KV-cache transfers arriving from prefillers.
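As an illustrative sketch, the JSON passed to --kv-transfer-config can be checked in Python before launch. The field values mirror the Signature in this document; the validation helper itself is hypothetical and not part of LMCache or vLLM.

```python
import json

# The JSON string handed to vllm serve via --kv-transfer-config
# (values mirror the decoder Signature in this document).
KV_TRANSFER_CONFIG = """{
    "kv_connector": "LMCacheConnectorV1",
    "kv_role": "kv_consumer",
    "kv_connector_extra_config": {
        "discard_partial_chunks": false,
        "skip_last_n_tokens": 1,
        "lmcache_rpc_port": "consumer1"
    }
}"""

def validate_decoder_kv_config(raw: str) -> dict:
    """Hypothetical helper: sanity-check a decoder kv-transfer-config."""
    cfg = json.loads(raw)  # fails fast on malformed JSON, before vllm sees it
    assert cfg["kv_connector"] == "LMCacheConnectorV1"
    assert cfg["kv_role"] == "kv_consumer"  # the decoder must be the KV consumer
    return cfg

cfg = validate_decoder_kv_config(KV_TRANSFER_CONFIG)
```

Catching a malformed or mis-roled config this way is cheaper than debugging a decoder that silently never receives KV blocks.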
Usage
Set LMCACHE_CONFIG_FILE to the path of the decoder's YAML config, set CUDA_VISIBLE_DEVICES to the GPU reserved for the decoder, then run vllm serve with the kv-transfer-config shown under Signature.
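A minimal sketch of these steps in Python, assembling the environment and argv for the launch. The helper name, model name, and port are illustrative assumptions; it builds the command but deliberately does not execute it.

```python
import os

def build_decoder_launch(model: str, port: int, gpu: str,
                         lmcache_config: str, kv_transfer_config: str):
    """Hypothetical helper: assemble env and argv for the decoder launch.
    It only constructs the command; it does not run vllm serve."""
    env = dict(os.environ)
    env["LMCACHE_CONFIG_FILE"] = lmcache_config   # decoder YAML config path
    env["CUDA_VISIBLE_DEVICES"] = gpu             # pin the decoder GPU
    argv = [
        "vllm", "serve", model,
        "--port", str(port),
        "--kv-transfer-config", kv_transfer_config,
    ]
    return env, argv

env, argv = build_decoder_launch(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    port=8200,                                 # placeholder decoder port
    gpu="1",
    lmcache_config="/path/to/lmcache-decoder-config.yaml",
    kv_transfer_config='{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_consumer"}',
)
# To actually launch: subprocess.Popen(argv, env=env)
```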
Code Reference
Source Location
- Repository: LMCache
- File: examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh
- Lines: L46-L57
Signature
CUDA_VISIBLE_DEVICES=$DECODE_CUDA_DEVICE vllm serve $MODEL \
--port $DECODE_PORT \
--kv-transfer-config '{
"kv_connector": "LMCacheConnectorV1",
"kv_role": "kv_consumer",
"kv_connector_extra_config": {
"discard_partial_chunks": false,
"skip_last_n_tokens": 1,
"lmcache_rpc_port": "consumer1"
}
}'
Import
export LMCACHE_CONFIG_FILE=/path/to/lmcache-decoder-config.yaml
bash examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh decoder
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| LMCACHE_CONFIG_FILE | env var | Yes | Path to decoder YAML config |
| CUDA_VISIBLE_DEVICES | env var | Yes | GPU device for decoder |
| MODEL | str | Yes | HuggingFace model name |
| kv_role | str | Yes | Must be "kv_consumer" |
Outputs
| Name | Type | Description |
|---|---|---|
| vLLM server | process | Running vLLM instance accepting OpenAI-compatible requests |
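Once running, the decoder exposes OpenAI-compatible endpoints. A sketch of building a /v1/completions request for it follows; the helper name, port, and model are assumptions, and sending the request is left to any HTTP client since it requires a live server.

```python
import json

def build_completion_request(port: int, model: str, prompt: str):
    """Hypothetical helper: build an OpenAI-compatible /v1/completions
    request targeting the running decoder. Returns (url, body); posting
    it is left to an HTTP client such as urllib or requests."""
    url = f"http://localhost:{port}/v1/completions"
    payload = {"model": model, "prompt": prompt, "max_tokens": 32}
    return url, json.dumps(payload)

# Placeholder port and model name, matching the launch sketch assumptions.
url, body = build_completion_request(
    8200, "meta-llama/Llama-3.1-8B-Instruct", "Hello")
```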