Implementation:LMCache LMCache VLLM Serve Prefiller
| Knowledge Sources | |
|---|---|
| Domains | Serving, Distributed_Systems |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool for launching a vLLM prefiller instance configured as an LMCache KV producer, provided as a wrapper around vllm serve.
Description
The prefiller is launched via vllm serve with kv_role="kv_producer". The LMCache connector creates a PDBackend in sender mode that connects to the proxy's ZMQ port for notifications and establishes NIXL connections to decoders on demand.
Usage
Set LMCACHE_CONFIG_FILE to the path of the prefiller config, set CUDA_VISIBLE_DEVICES to the GPU the prefiller should use, then run vllm serve with the kv-transfer-config shown under Signature.
Code Reference
Source Location
- Repository: LMCache
- File: examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh
- Lines: L26-L37
Signature
CUDA_VISIBLE_DEVICES=$PREFILL_CUDA_DEVICE vllm serve $MODEL \
    --port $PREFILL_PORT \
    --kv-transfer-config '{
        "kv_connector": "LMCacheConnectorV1",
        "kv_role": "kv_producer",
        "kv_connector_extra_config": {
            "discard_partial_chunks": false,
            "lmcache_rpc_port": "producer1"
        }
    }'
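Because the --kv-transfer-config value is an inline JSON string, a quoting or comma slip only surfaces once vLLM parses its arguments. The following is a minimal sketch of a pre-launch check, assuming python3 is on PATH. The variable KV_TRANSFER_CONFIG and the function check_kv_config are hypothetical names used for illustration; they are not part of the launcher script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: not part of disagg_vllm_launcher.sh.
# The JSON below is copied from the Signature section.
KV_TRANSFER_CONFIG='{
  "kv_connector": "LMCacheConnectorV1",
  "kv_role": "kv_producer",
  "kv_connector_extra_config": {
    "discard_partial_chunks": false,
    "lmcache_rpc_port": "producer1"
  }
}'

# Succeeds only if the argument is valid JSON and declares the producer role.
check_kv_config() {
  printf '%s' "$1" | python3 -c '
import json, sys
cfg = json.load(sys.stdin)
sys.exit(0 if cfg.get("kv_role") == "kv_producer" else 1)
'
}

if check_kv_config "$KV_TRANSFER_CONFIG"; then
  echo "kv-transfer-config OK"
else
  echo "kv-transfer-config invalid" >&2
fi
```

If the check passes, the same variable can be handed to the server as --kv-transfer-config "$KV_TRANSFER_CONFIG", which keeps the validated string and the launched string identical.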
Import
export LMCACHE_CONFIG_FILE=/path/to/lmcache-prefiller-config.yaml
bash examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh prefiller
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| LMCACHE_CONFIG_FILE | env var | Yes | Path to prefiller YAML config |
| CUDA_VISIBLE_DEVICES | env var | Yes | GPU device for prefiller |
| MODEL | str | Yes | Hugging Face model name |
| kv_role | str | Yes | Field of --kv-transfer-config; must be "kv_producer" |
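The required inputs above can be verified before launch. This is a sketch only: the function name check_prefiller_inputs is hypothetical, and it assumes MODEL is supplied as a shell variable in the same way as the two environment variables, which may not match how the launcher script obtains it.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; the function name is illustrative.
# Reports every problem it finds and returns non-zero if any exist.
check_prefiller_inputs() {
  local status=0

  if [ -z "${LMCACHE_CONFIG_FILE:-}" ]; then
    echo "missing required input: LMCACHE_CONFIG_FILE" >&2
    status=1
  elif [ ! -f "$LMCACHE_CONFIG_FILE" ]; then
    echo "config file not found: $LMCACHE_CONFIG_FILE" >&2
    status=1
  fi

  if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
    echo "missing required input: CUDA_VISIBLE_DEVICES" >&2
    status=1
  fi

  # Assumption: MODEL is passed as a shell variable, as in the Signature.
  if [ -z "${MODEL:-}" ]; then
    echo "missing required input: MODEL" >&2
    status=1
  fi

  return "$status"
}
```

A caller might gate the launch on it, for example by running check_prefiller_inputs && bash examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh prefiller.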
Outputs
| Name | Type | Description |
|---|---|---|
| vLLM server | process | Running vLLM instance that computes prefill and transfers the resulting KV cache to decoders |
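Since the output is a long-running server process, a caller usually needs to know when it is ready to accept requests. The sketch below polls the /health endpoint that vLLM's HTTP server exposes. The function name wait_for_prefiller, the localhost address, and the 300-second default timeout are assumptions for illustration, and the elapsed time is only approximate because it counts sleep intervals rather than wall-clock time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: polls the server's /health endpoint until it answers.
# Arguments: port, then an optional timeout in seconds (default 300, an arbitrary choice).
wait_for_prefiller() {
  local port="$1" timeout="${2:-300}" waited=0
  # curl -f turns HTTP error statuses into a non-zero exit code.
  until curl -sf -o /dev/null "http://localhost:${port}/health"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "prefiller on port ${port} not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

With the variables from the Signature, this would be called as wait_for_prefiller "$PREFILL_PORT" after the server has been started in the background.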
Related Pages
Implements Principle
Requires Environment