Implementation:LMCache LMCache VLLM Serve Decoder

From Leeroopedia


Knowledge Sources
Domains: Serving, Distributed_Systems
Last Updated: 2026-02-09 00:00 GMT

Overview

A concrete tool for launching vLLM decoder instances configured as LMCache KV consumers, provided as a shell wrapper around vllm serve.

Description

The decoder is launched via vllm serve with a KVTransferConfig specifying kv_connector="LMCacheConnectorV1" and kv_role="kv_consumer". Internally, the LMCache connector creates a PDBackend in receiver mode that listens for incoming NIXL connections and handles memory allocation for KV cache transfers from prefillers.

Usage

Set LMCACHE_CONFIG_FILE to the path of the decoder config, set CUDA_VISIBLE_DEVICES to the GPU reserved for the decoder, then run vllm serve with a --kv-transfer-config value that selects the LMCacheConnectorV1 connector and the kv_consumer role.
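The steps above can be sketched as a small shell function. This is an illustration, not the launcher script itself: launch_decoder and the DRY_RUN switch are hypothetical names introduced here, while the flags and JSON keys are the ones listed in the Signature section. With DRY_RUN=1 the command is printed instead of executed, which makes it possible to inspect the final command line before committing a GPU to it.

```shell
# Sketch only: launch_decoder and DRY_RUN are hypothetical, not part of LMCache or vLLM.
# Usage: launch_decoder <config_file> <gpu> <model> <port>
launch_decoder() {
    local config_file="$1"
    local gpu="$2"
    local model="$3"
    local port="$4"
    # Same keys and values as the Signature section, written as compact JSON.
    local kv_config='{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_consumer","kv_connector_extra_config":{"discard_partial_chunks":false,"skip_last_n_tokens":1,"lmcache_rpc_port":"consumer1"}}'

    # Build the argument list in the positional parameters so each argument stays one word.
    set -- vllm serve "$model" --port "$port" --kv-transfer-config "$kv_config"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command with its environment prefix instead of running it.
        printf 'LMCACHE_CONFIG_FILE=%s CUDA_VISIBLE_DEVICES=%s %s\n' "$config_file" "$gpu" "$*"
        return 0
    fi

    # The two variables are set only for the vllm process, not for the calling shell.
    LMCACHE_CONFIG_FILE="$config_file" CUDA_VISIBLE_DEVICES="$gpu" "$@"
}
```

A dry run such as `DRY_RUN=1 launch_decoder /path/to/lmcache-decoder-config.yaml 1 "$MODEL" "$DECODE_PORT"` prints the command without starting a server.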

Code Reference

Source Location

  • Repository: LMCache
  • File: examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh
  • Lines: L46-L57

Signature

CUDA_VISIBLE_DEVICES=$DECODE_CUDA_DEVICE vllm serve $MODEL \
    --port $DECODE_PORT \
    --kv-transfer-config '{
        "kv_connector": "LMCacheConnectorV1",
        "kv_role": "kv_consumer",
        "kv_connector_extra_config": {
            "discard_partial_chunks": false,
            "skip_last_n_tokens": 1,
            "lmcache_rpc_port": "consumer1"
        }
    }'

Import

export LMCACHE_CONFIG_FILE=/path/to/lmcache-decoder-config.yaml
bash examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh decoder

I/O Contract

Inputs

Name                  Type     Required  Description
LMCACHE_CONFIG_FILE   env var  Yes       Path to decoder YAML config
CUDA_VISIBLE_DEVICES  env var  Yes       GPU device for decoder
MODEL                 str      Yes       HuggingFace model name
kv_role               str      Yes       Must be "kv_consumer"
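The required inputs can be checked before launch. The following is a minimal sketch of such a preflight check; check_decoder_inputs is a hypothetical helper, not something the launcher script is known to provide. It only verifies that the variables from the table are set and that the config file is readable. It does not validate the YAML contents or the kv_role value.

```shell
# Hypothetical preflight check mirroring the Inputs table; not part of LMCache.
# Returns 0 when every required input is present, 1 otherwise.
check_decoder_inputs() {
    local status=0

    if [ -z "${LMCACHE_CONFIG_FILE:-}" ]; then
        echo "LMCACHE_CONFIG_FILE is not set" >&2
        status=1
    elif [ ! -r "$LMCACHE_CONFIG_FILE" ]; then
        echo "LMCACHE_CONFIG_FILE is not readable: $LMCACHE_CONFIG_FILE" >&2
        status=1
    fi

    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo "CUDA_VISIBLE_DEVICES is not set" >&2
        status=1
    fi

    if [ -z "${MODEL:-}" ]; then
        echo "MODEL is not set" >&2
        status=1
    fi

    return "$status"
}
```

Reporting every missing input in one pass, rather than stopping at the first, saves a relaunch cycle when several variables are unset.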

Outputs

Name         Type     Description
vLLM server  process  Running vLLM instance accepting OpenAI-compatible requests
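Because the output is a server that accepts OpenAI-compatible requests, readiness can be probed over HTTP. The sketch below polls the /v1/models route, which vLLM's OpenAI-compatible server exposes. The helper names, the default host, and the one-second retry interval are assumptions made for illustration, and curl is assumed to be installed.

```shell
# Hypothetical helpers; only the /v1/models route comes from vLLM's OpenAI-compatible API.
# Usage: decoder_models_url <port> [host]
decoder_models_url() {
    local port="$1"
    local host="${2:-localhost}"
    printf 'http://%s:%s/v1/models\n' "$host" "$port"
}

# Poll the decoder once per second until it answers or the retry budget runs out.
# Usage: wait_for_decoder <port> [retries]
wait_for_decoder() {
    local port="$1"
    local retries="${2:-60}"
    local url
    url=$(decoder_models_url "$port")

    while [ "$retries" -gt 0 ]; do
        # --fail makes curl exit non-zero on HTTP errors such as 404 or 500.
        if curl --silent --fail --output /dev/null "$url"; then
            return 0
        fi
        retries=$((retries - 1))
        sleep 1
    done
    return 1
}
```

For example, `wait_for_decoder "$DECODE_PORT" 120` blocks for up to roughly two minutes and returns non-zero if the decoder never becomes reachable.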

Related Pages

Implements Principle

Requires Environment
