Implementation:LMCache LMCache VLLM Serve Prefiller

Knowledge Sources	LMCache vLLM
Domains	Serving, Distributed_Systems
Last Updated	2026-02-09 00:00 GMT

Overview

A launcher for vLLM prefiller instances configured as LMCache KV producers, implemented as a thin wrapper around vllm serve.

Description

The prefiller is launched via vllm serve with kv_role set to "kv_producer" in the --kv-transfer-config JSON. In this role, the LMCache connector creates a PDBackend in sender mode. The backend connects to the proxy's ZMQ port to receive notifications and establishes NIXL connections to decoders on demand.

Usage

Set LMCACHE_CONFIG_FILE to the path of the prefiller YAML config and CUDA_VISIBLE_DEVICES to the GPU assigned to the prefiller, then run vllm serve with the KV producer configuration.
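The two environment prerequisites above can be checked before launching. The following is a minimal sketch, not part of the LMCache launcher: the function name check_prefiller_env is hypothetical, and it only verifies that the variables are set and that the config file is readable.

```bash
# Hypothetical preflight helper (not part of LMCache or vLLM).
# Returns 0 only if both prerequisites named in Usage are satisfied.
check_prefiller_env() {
    if [ -z "${LMCACHE_CONFIG_FILE:-}" ]; then
        echo "LMCACHE_CONFIG_FILE is not set" >&2
        return 1
    fi
    if [ ! -r "$LMCACHE_CONFIG_FILE" ]; then
        echo "cannot read prefiller config: $LMCACHE_CONFIG_FILE" >&2
        return 1
    fi
    if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo "CUDA_VISIBLE_DEVICES is not set" >&2
        return 1
    fi
    return 0
}

# Example use (not executed here):
#   check_prefiller_env && vllm serve "$MODEL" --port "$PREFILL_PORT" ...
```

The check does not validate the contents of the YAML file or whether the selected GPU exists; it only catches the most common omissions early.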

Code Reference

Source Location

Repository: LMCache
File: examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh
Lines: L26-L37

Signature

CUDA_VISIBLE_DEVICES=$PREFILL_CUDA_DEVICE vllm serve $MODEL \
    --port $PREFILL_PORT \
    --kv-transfer-config '{
        "kv_connector": "LMCacheConnectorV1",
        "kv_role": "kv_producer",
        "kv_connector_extra_config": {
            "discard_partial_chunks": false,
            "lmcache_rpc_port": "producer1"
        }
    }'
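The inline JSON passed to --kv-transfer-config can also be generated by a small shell function, which keeps quoting manageable if the lmcache_rpc_port value needs to vary. This is a sketch under assumptions: prefiller_kv_config is a hypothetical helper, and whether a deployment needs a value other than "producer1" is not stated in the source. The keys and default values are copied from the launcher command above.

```bash
# Hypothetical helper: print the producer-side --kv-transfer-config JSON.
# $1 is the lmcache_rpc_port value; it defaults to "producer1" as in the launcher.
prefiller_kv_config() {
    local rpc_port="${1:-producer1}"
    cat <<EOF
{
    "kv_connector": "LMCacheConnectorV1",
    "kv_role": "kv_producer",
    "kv_connector_extra_config": {
        "discard_partial_chunks": false,
        "lmcache_rpc_port": "${rpc_port}"
    }
}
EOF
}

# Example use (not executed here):
#   CUDA_VISIBLE_DEVICES="$PREFILL_CUDA_DEVICE" vllm serve "$MODEL" \
#       --port "$PREFILL_PORT" \
#       --kv-transfer-config "$(prefiller_kv_config producer1)"
```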

Import

export LMCACHE_CONFIG_FILE=/path/to/lmcache-prefiller-config.yaml
bash examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh prefiller

I/O Contract

Inputs

Name	Type	Required	Description
LMCACHE_CONFIG_FILE	env var	Yes	Path to prefiller YAML config
CUDA_VISIBLE_DEVICES	env var	Yes	GPU device for prefiller
MODEL	str	Yes	HuggingFace model name
kv_role	str	Yes	Must be "kv_producer"

Outputs

Name	Type	Description
vLLM server	process	Running vLLM instance that computes prefill and transfers KV
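Because the output is a long-running server process, a caller usually needs to wait until it accepts requests. A minimal sketch follows, assuming the prefiller listens on localhost and exposes the /health endpoint of vLLM's OpenAI-compatible server; wait_for_prefiller is a hypothetical helper name, and the default timeout is an arbitrary choice.

```bash
# Hypothetical helper: poll the prefiller's /health endpoint until it responds.
# $1 = port (e.g. $PREFILL_PORT), $2 = maximum number of polls, one per second
# (default 300). Returns 0 once a poll succeeds, 1 if none do.
wait_for_prefiller() {
    local port="$1"
    local max_polls="${2:-300}"
    local polls=0
    while [ "$polls" -lt "$max_polls" ]; do
        # -f makes curl fail on HTTP errors; --max-time bounds each attempt.
        if curl -sf --max-time 2 "http://localhost:${port}/health" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        polls=$((polls + 1))
    done
    return 1
}

# Example use (not executed here):
#   wait_for_prefiller "$PREFILL_PORT" 600 || echo "prefiller did not start" >&2
```

A successful health check only shows that the HTTP server is up. It does not confirm that the ZMQ connection to the proxy or any NIXL connection to a decoder has been established.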

Related Pages

Implements Principle

Principle:LMCache_LMCache_Prefiller_Instance_Launch

Requires Environment
