Principle:LMCache LMCache Prefiller Instance Launch

Knowledge Sources	LMCache vLLM
Domains	Serving, Distributed_Systems
Last Updated	2026-02-09 00:00 GMT

Overview

A deployment pattern for launching vLLM prefiller instances configured as KV cache producers in a disaggregated prefill-decode architecture.

Description

Prefiller instances run the prefill phase (full attention computation over the prompt) and transfer the resulting KV cache to decoder instances. They are launched with kv_role="kv_producer" and with the LMCache connector in sender mode (pd_role="sender" in the LMCache config). After attention has been computed, the PDBackend sends the KV data to the decoder via NIXL and notifies the proxy via ZMQ.
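The producer-side connector settings can be sketched as a small JSON payload. This is a minimal illustration, assuming vLLM's `--kv-transfer-config` JSON option and the connector name `LMCacheConnectorV1`; neither appears on this page, so both should be checked against the installed vLLM and LMCache versions. The helper name is hypothetical.

```python
import json


def prefiller_kv_transfer_config(connector="LMCacheConnectorV1"):
    """Return a JSON string describing the KV-producer (prefiller) role.

    The connector name is an assumption; kv_role comes from the description above.
    """
    config = {
        "kv_connector": connector,   # assumed connector name
        "kv_role": "kv_producer",    # the prefiller produces KV cache
    }
    return json.dumps(config)


if __name__ == "__main__":
    print(prefiller_kv_transfer_config())
```

A decoder would presumably use the same structure with a consumer role; only the producer side is covered here.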

Usage

Launch prefiller instances only after both the proxy server and the decoder instances are running. For each prefiller, set the LMCACHE_CONFIG_FILE environment variable to a prefiller-specific config file that sets pd_role="sender".
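The launch order can be enforced with a simple readiness check before the prefiller is started. The sketch below is an assumption-laden illustration: the helper names, the model name, the ports, and the config path are hypothetical, and a successful TCP connect only shows that a port is listening, not that the proxy or decoder is fully initialised. The `vllm serve` command and the `--kv-transfer-config` option are assumed from vLLM's CLI rather than taken from this page.

```python
import os
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)  # not listening yet; retry until the deadline
    return False


def prefiller_launch_plan(model, port, lmcache_config_path, kv_transfer_json):
    """Build (env, argv) for a prefiller process; nothing is executed here."""
    # LMCACHE_CONFIG_FILE must point at the prefiller-specific config.
    env = dict(os.environ, LMCACHE_CONFIG_FILE=lmcache_config_path)
    argv = ["vllm", "serve", model, "--port", str(port),
            "--kv-transfer-config", kv_transfer_json]
    return env, argv
```

A caller would typically run `wait_for_port` against the proxy and each decoder, and only then pass the plan to `subprocess.Popen(argv, env=env)`.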

Theoretical Basis

The prefiller performs a compute-and-transfer cycle:

1. Receive request from proxy with disagg_spec containing decoder endpoint
2. Run full attention prefill on the prompt
3. Connect to decoder's NIXL endpoint (if not already connected)
4. Send AllocRequest to decoder, receive AllocResponse with buffer indices
5. Write KV cache chunks to decoder via NIXL batched_write
6. Send ProxyNotif via ZMQ to signal transfer completion
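The cycle above can be modelled with a toy, in-memory simulation. This is not LMCache's implementation: the message names AllocRequest, AllocResponse, and ProxyNotif come from the steps above, but their fields, the ToyDecoder and ToyPrefiller classes, and the use of token tuples in place of KV tensors are all hypothetical. NIXL and ZMQ are replaced by plain method calls and a list.

```python
from dataclasses import dataclass


@dataclass
class AllocRequest:          # field names are hypothetical
    request_id: str
    num_chunks: int


@dataclass
class AllocResponse:
    buffer_indices: list


@dataclass
class ProxyNotif:
    request_id: str


class ToyDecoder:
    """In-memory stand-in for a decoder's receive buffer."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots
        self.free = list(range(num_slots))

    def allocate(self, req):
        if req.num_chunks > len(self.free):
            raise RuntimeError("decoder buffer exhausted")
        taken = self.free[:req.num_chunks]
        self.free = self.free[req.num_chunks:]
        return AllocResponse(buffer_indices=taken)

    def batched_write(self, indices, chunks):
        for index, chunk in zip(indices, chunks):
            self.slots[index] = chunk


class ToyPrefiller:
    def __init__(self, decoders):
        self.decoders = decoders   # endpoint -> ToyDecoder, stands in for the network
        self.connections = {}      # endpoints this prefiller has connected to
        self.proxy_inbox = []      # stands in for the ZMQ channel to the proxy

    def handle(self, request_id, prompt_tokens, decoder_endpoint, chunk_size=4):
        # Prefill placeholder: real KV tensors are replaced by tuples of tokens.
        chunks = [tuple(prompt_tokens[i:i + chunk_size])
                  for i in range(0, len(prompt_tokens), chunk_size)]
        # Connect to the decoder only if no connection exists yet.
        if decoder_endpoint not in self.connections:
            self.connections[decoder_endpoint] = self.decoders[decoder_endpoint]
        decoder = self.connections[decoder_endpoint]
        # Ask the decoder for buffer slots, then write every chunk in one batch.
        response = decoder.allocate(AllocRequest(request_id, len(chunks)))
        decoder.batched_write(response.buffer_indices, chunks)
        # Tell the proxy the transfer is complete.
        notif = ProxyNotif(request_id)
        self.proxy_inbox.append(notif)
        return notif
```

Allocating before writing mirrors the ordering in the steps: the sender learns the target buffer indices first, so the write itself needs no further negotiation with the decoder.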

Related Pages

Implemented By

Implementation:LMCache_LMCache_VLLM_Serve_Prefiller
