Implementation:LMCache LMCache VLLM Serve Prefiller


Knowledge Sources
Domains: Serving, Distributed_Systems
Last Updated: 2026-02-09 00:00 GMT

Overview

Concrete tool for launching vLLM prefiller instances with LMCache KV producer configuration, provided as a wrapper around vllm serve.

Description

The prefiller is launched via vllm serve with kv_role set to "kv_producer" in --kv-transfer-config. The LMCache connector then creates a PDBackend in sender mode, which connects to the proxy's ZMQ port for notifications and establishes NIXL connections to decoders on demand.

Usage

Set LMCACHE_CONFIG_FILE to the path of the prefiller YAML config, set CUDA_VISIBLE_DEVICES to the GPU reserved for the prefiller, then run vllm serve with the producer --kv-transfer-config.
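The steps above can be sketched as a small shell function. This is a minimal sketch, not the repository script: the function name launch_prefiller and the DRY_RUN switch are hypothetical, and the variable names PREFILL_CUDA_DEVICE, PREFILL_PORT and MODEL are taken from the Signature section. The --kv-transfer-config payload is copied from that section unchanged.

```shell
# Hypothetical helper, not part of LMCache: validates the required inputs,
# then prints (DRY_RUN=1) or runs the prefiller command.
launch_prefiller() {
    local kv_config

    # All four inputs must be provided by the caller.
    if [ -z "${LMCACHE_CONFIG_FILE:-}" ] || [ -z "${PREFILL_CUDA_DEVICE:-}" ] ||
       [ -z "${PREFILL_PORT:-}" ] || [ -z "${MODEL:-}" ]; then
        echo "launch_prefiller: set LMCACHE_CONFIG_FILE, PREFILL_CUDA_DEVICE, PREFILL_PORT and MODEL" >&2
        return 1
    fi

    # Producer-side KV transfer settings, as shown in the Signature section.
    kv_config='{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_producer", "kv_connector_extra_config": {"discard_partial_chunks": false, "lmcache_rpc_port": "producer1"}}'

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of starting a server.
        printf 'CUDA_VISIBLE_DEVICES=%s vllm serve %s --port %s --kv-transfer-config %s\n' \
            "$PREFILL_CUDA_DEVICE" "$MODEL" "$PREFILL_PORT" "'$kv_config'"
        return 0
    fi

    # Make the config path visible to the vLLM process, then launch it.
    export LMCACHE_CONFIG_FILE
    CUDA_VISIBLE_DEVICES="$PREFILL_CUDA_DEVICE" vllm serve "$MODEL" \
        --port "$PREFILL_PORT" \
        --kv-transfer-config "$kv_config"
}
```

Calling launch_prefiller with DRY_RUN=1 prints the assembled command, which is a cheap way to check quoting of the JSON payload before occupying a GPU.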

Code Reference

Source Location

  • Repository: LMCache
  • File: examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh
  • Lines: L26-L37

Signature

CUDA_VISIBLE_DEVICES=$PREFILL_CUDA_DEVICE vllm serve $MODEL \
    --port $PREFILL_PORT \
    --kv-transfer-config '{
        "kv_connector": "LMCacheConnectorV1",
        "kv_role": "kv_producer",
        "kv_connector_extra_config": {
            "discard_partial_chunks": false,
            "lmcache_rpc_port": "producer1"
        }
    }'

Import

export LMCACHE_CONFIG_FILE=/path/to/lmcache-prefiller-config.yaml
bash examples/disagg_prefill/1p1d/disagg_vllm_launcher.sh prefiller

I/O Contract

Inputs

Name                  Type     Required  Description
LMCACHE_CONFIG_FILE   env var  Yes       Path to prefiller YAML config
CUDA_VISIBLE_DEVICES  env var  Yes       GPU device for prefiller
MODEL                 str      Yes       HuggingFace model name
kv_role               str      Yes       Must be "kv_producer"

Outputs

Name         Type     Description
vLLM server  process  Running vLLM instance that computes prefill and transfers KV
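
Because the output is a long-running server process, a launcher typically has to wait until the server accepts requests before routing traffic to it. The following is a minimal sketch, assuming the vLLM OpenAI-compatible server answers GET /health once it is up and that curl is installed; the function name wait_for_prefiller and the default timeout are hypothetical.

```shell
# Hypothetical helper: poll the prefiller until it answers or a deadline passes.
# Usage: wait_for_prefiller PORT [TIMEOUT_SECONDS]
wait_for_prefiller() {
    local port timeout elapsed
    port=$1
    timeout=${2:-300}
    elapsed=0

    while [ "$elapsed" -lt "$timeout" ]; do
        # -f fails on HTTP errors, -s hides progress, --max-time bounds each probe.
        if curl -sf --max-time 2 -o /dev/null "http://localhost:${port}/health"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done

    echo "prefiller on port ${port} not ready after ${timeout} attempts" >&2
    return 1
}
```

For example, wait_for_prefiller "$PREFILL_PORT" 120 retries roughly once per second, up to 120 times, and returns non-zero if the server never becomes reachable.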
