Principle:LMCache LMCache Prefill Decode Configuration

Knowledge Sources	LMCache Splitwise
Domains	Configuration, Distributed_Systems
Last Updated	2026-02-09 00:00 GMT

Overview

A role-based configuration pattern that creates separate sender (prefiller) and receiver (decoder) configurations for disaggregated prefill-decode inference.

Description

Prefill Decode Configuration extends the base LMCacheEngineConfig with fields specific to disaggregated prefill-decode (PD) mode. Two separate configuration instances are created: one for the prefiller (pd_role="sender") and one for the decoder (pd_role="receiver"). Each configuration specifies the NIXL transfer parameters (buffer size, device, peer host/ports) and proxy connection details.

Validation ensures that sender configs have proxy host/port set, receiver configs have peer init/alloc ports set, and both have buffer_size and buffer_device configured.
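The validation rules above can be sketched as follows. This is a minimal illustration, not the LMCache implementation: PDConfigSketch and validate_pd_config are hypothetical names, and the field names are simplified stand-ins for the PD fields on LMCacheEngineConfig.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PDConfigSketch:
    """Hypothetical stand-in for the PD fields; not the real LMCacheEngineConfig."""
    pd_role: str                            # "sender" or "receiver"
    buffer_size: Optional[int] = None       # transfer buffer size in bytes
    buffer_device: Optional[str] = None     # e.g. "cuda" or "cpu"
    proxy_host: Optional[str] = None        # required for senders
    proxy_port: Optional[int] = None        # required for senders
    peer_init_port: Optional[int] = None    # required for receivers
    peer_alloc_port: Optional[int] = None   # required for receivers


def validate_pd_config(cfg: PDConfigSketch) -> None:
    """Raise ValueError if a field required by the role is missing."""
    # Both roles need the transfer buffer configured.
    required = ["buffer_size", "buffer_device"]
    if cfg.pd_role == "sender":
        required += ["proxy_host", "proxy_port"]
    elif cfg.pd_role == "receiver":
        required += ["peer_init_port", "peer_alloc_port"]
    else:
        raise ValueError(f"unknown pd_role: {cfg.pd_role!r}")

    missing = [name for name in required if getattr(cfg, name) is None]
    if missing:
        raise ValueError(f"{cfg.pd_role} config is missing: {', '.join(missing)}")
```

Collecting every missing field before raising lets a single error message report all gaps in a role's configuration at once.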

Usage

Use this principle when deploying disaggregated prefill. Create two YAML config files (one per role) specifying the PD fields, then load them via load_engine_config_with_overrides.
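As a rough sketch of the two-file layout, the snippet below builds one flat mapping per role and renders each as YAML text. The key names (pd_buffer_size, pd_proxy_host, and so on), hosts, and port numbers are assumptions chosen for illustration; check them against the PD fields of LMCacheEngineConfig before use. The to_yaml helper is hypothetical and only handles flat mappings.

```python
import json

# Settings shared by both roles (placeholder values).
shared = {"pd_buffer_size": 1073741824, "pd_buffer_device": "cuda"}

# Sender (prefiller): needs the proxy connection details.
sender = {**shared, "pd_role": "sender",
          "pd_proxy_host": "localhost", "pd_proxy_port": 7500}

# Receiver (decoder): needs the peer init/alloc ports.
receiver = {**shared, "pd_role": "receiver", "pd_peer_host": "localhost",
            "pd_peer_init_port": 7300, "pd_peer_alloc_port": 7400}


def to_yaml(flat: dict) -> str:
    """Render a flat mapping as YAML; JSON scalars are also valid YAML scalars."""
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in flat.items())


sender_yaml = to_yaml(sender)      # contents for the prefiller's config file
receiver_yaml = to_yaml(receiver)  # contents for the decoder's config file
```

Each string would be written to its own file, and each serving process would then load only the file for its role.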

Theoretical Basis

The PD configuration follows a sender-receiver model:

Sender (Prefiller): Computes attention, stores KV cache, writes to receiver via NIXL
Receiver (Decoder): Receives KV cache from sender, runs autoregressive decoding
Proxy: Routes requests between prefiller and decoder
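The three roles above can be pictured with a toy, in-process model. Everything here is hypothetical: the class names are illustrative, a deque stands in for the NIXL transfer, and the "KV cache" and "decoding" are placeholders rather than real model computation.

```python
from collections import deque


class Decoder:
    """Receiver: takes a KV cache from its inbox and decodes from it."""

    def __init__(self):
        self.inbox = deque()  # stands in for the NIXL receive buffer

    def decode(self, max_new_tokens: int) -> dict:
        request_id, kv_cache = self.inbox.popleft()
        # Placeholder for autoregressive decoding over the received KV cache.
        return {"request_id": request_id,
                "kv_entries": len(kv_cache),
                "new_tokens": max_new_tokens}


class Prefiller:
    """Sender: builds the KV cache for a prompt and pushes it to its peer."""

    def __init__(self, peer: Decoder):
        self.peer = peer

    def prefill(self, request_id: str, prompt_tokens: list) -> None:
        kv_cache = [("kv", token) for token in prompt_tokens]  # placeholder
        self.peer.inbox.append((request_id, kv_cache))         # "transfer"


class Proxy:
    """Routes each request to the prefiller first, then to the decoder."""

    def __init__(self, prefiller: Prefiller, decoder: Decoder):
        self.prefiller = prefiller
        self.decoder = decoder

    def handle(self, request_id: str, prompt_tokens: list, max_new_tokens: int) -> dict:
        self.prefiller.prefill(request_id, prompt_tokens)
        return self.decoder.decode(max_new_tokens)
```

In a real deployment the prefiller and decoder are separate processes, which is why each one needs its own role-specific configuration.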

Related Pages

Implemented By

Implementation:LMCache_LMCache_LMCacheEngineConfig_PD_Fields
