Principle:LMCache LMCache Prefiller Instance Launch

Knowledge Sources
Domains: Serving, Distributed_Systems
Last Updated: 2026-02-09 00:00 GMT

Overview

A deployment pattern for launching vLLM prefiller instances configured as KV cache producers in a disaggregated prefill-decode architecture.

Description

Prefiller instances run the prefill phase (full attention computation over the prompt) and transfer the resulting KV cache to decoder instances. They are launched with kv_role="kv_producer" and with the LMCache connector in sender mode. Once prefill completes, the PDBackend sends the KV data to the decoder via NIXL and then notifies the proxy via ZMQ.
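As a rough illustration of the producer role, the sketch below builds an argument list for a prefiller launch. It assumes the standard `vllm serve` entry point and its `--kv-transfer-config` JSON flag. The connector name `LMCacheConnectorV1`, the model name, and the port are assumptions for illustration and should be checked against the installed vLLM and LMCache versions.

```python
import json
import shlex


def build_prefiller_command(model: str, port: int) -> list:
    """Build an argv list for a vLLM prefiller (KV producer) instance."""
    kv_transfer_config = {
        # Assumption: connector name registered by the installed LMCache integration.
        "kv_connector": "LMCacheConnectorV1",
        # From the text: prefillers act as KV cache producers.
        "kv_role": "kv_producer",
    }
    return [
        "vllm", "serve", model,
        "--port", str(port),
        "--kv-transfer-config", json.dumps(kv_transfer_config),
    ]


# Hypothetical model name and port, used only to show the resulting command line.
cmd = build_prefiller_command("my-org/my-model", 8100)
print(shlex.join(cmd))
```

The command is only printed here, not executed, so the sketch can be inspected without a GPU or a vLLM installation.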

Usage

Launch prefiller instances only after both the proxy server and the decoder instances are running. In the prefiller's environment, LMCACHE_CONFIG_FILE must point to a prefiller-specific config file that sets pd_role="sender".
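One way to enforce this launch order in a script is to probe the proxy and decoder ports before starting the prefiller. The sketch below is a minimal example under that assumption. The helper names `wait_for_port` and `prefiller_env`, and any hosts, ports, and paths passed to them, are hypothetical. A successful TCP connect only shows that something is listening, not that the service is fully ready.

```python
import os
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 30.0,
                  interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, else False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Not listening yet (or unreachable); retry until the deadline.
            time.sleep(interval_s)
    return False


def prefiller_env(config_path: str) -> dict:
    """Copy the current environment and point LMCache at the prefiller config."""
    env = dict(os.environ)
    env["LMCACHE_CONFIG_FILE"] = config_path
    return env
```

A launcher could call `wait_for_port` for the proxy and for each decoder, then pass `prefiller_env(...)` as the `env` argument of `subprocess.Popen` when starting the prefiller process.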

Theoretical Basis

The prefiller performs a compute-and-transfer cycle:

  1. Receive a request from the proxy whose disagg_spec contains the decoder endpoint
  2. Run full attention prefill on the prompt
  3. Connect to the decoder's NIXL endpoint (if not already connected)
  4. Send an AllocRequest to the decoder and receive an AllocResponse with buffer indices
  5. Write the KV cache chunks to the decoder via NIXL batched_write
  6. Send a ProxyNotif via ZMQ to signal transfer completion
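The cycle above can be sketched as a small in-memory simulation. This is not LMCache's implementation: the classes below only borrow the message names from the steps, their fields are hypothetical, and plain Python objects stand in for NIXL and ZMQ so that the control flow can be followed without any network or GPU.

```python
from dataclasses import dataclass


@dataclass
class AllocRequest:          # hypothetical fields
    request_id: str
    num_chunks: int


@dataclass
class AllocResponse:         # hypothetical fields
    buffer_indices: list


@dataclass
class ProxyNotif:            # hypothetical fields
    request_id: str


class FakeDecoder:
    """In-memory stand-in for a decoder's receive buffer."""

    def __init__(self, num_slots: int) -> None:
        self.slots = [None] * num_slots
        self._free = list(range(num_slots))

    def alloc(self, req: AllocRequest) -> AllocResponse:
        if req.num_chunks > len(self._free):
            raise MemoryError("not enough free buffer slots")
        indices = [self._free.pop(0) for _ in range(req.num_chunks)]
        return AllocResponse(buffer_indices=indices)

    def batched_write(self, indices: list, chunks: list) -> None:
        if len(indices) != len(chunks):
            raise ValueError("one buffer index is required per chunk")
        for idx, chunk in zip(indices, chunks):
            self.slots[idx] = chunk


class FakeProxy:
    """Records completion notifications instead of receiving them over ZMQ."""

    def __init__(self) -> None:
        self.notifs = []

    def notify(self, notif: ProxyNotif) -> None:
        self.notifs.append(notif)


class Prefiller:
    def __init__(self, decoders: dict, proxy: FakeProxy) -> None:
        self._registry = decoders    # endpoint -> decoder; replaces network lookup
        self._connected = {}         # endpoints connected so far
        self._proxy = proxy

    def handle(self, request_id: str, prompt_tokens: list,
               decoder_endpoint: str, chunk_size: int = 4) -> list:
        # Step 2: "prefill" is faked by slicing the prompt into chunks.
        chunks = [tuple(prompt_tokens[i:i + chunk_size])
                  for i in range(0, len(prompt_tokens), chunk_size)]
        # Step 3: connect lazily, reusing an existing connection if present.
        if decoder_endpoint not in self._connected:
            self._connected[decoder_endpoint] = self._registry[decoder_endpoint]
        decoder = self._connected[decoder_endpoint]
        # Step 4: ask the decoder for buffer slots.
        resp = decoder.alloc(AllocRequest(request_id, len(chunks)))
        # Step 5: write all chunks in one batched call.
        decoder.batched_write(resp.buffer_indices, chunks)
        # Step 6: tell the proxy the transfer is complete.
        self._proxy.notify(ProxyNotif(request_id))
        return resp.buffer_indices


# Step 1 is represented by the arguments passed to handle().
demo_decoder = FakeDecoder(num_slots=4)
demo_proxy = FakeProxy()
demo_prefiller = Prefiller({"decoder-0": demo_decoder}, demo_proxy)
demo_indices = demo_prefiller.handle("req-1", list(range(10)), "decoder-0")
```

The allocation handshake comes before the write so that the sender never writes into buffer slots the receiver has not reserved, and the proxy is notified only after the write has returned.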

Related Pages

Implemented By
