Principle:LMCache LMCache Prefill Decode Configuration
| Knowledge Sources | |
|---|---|
| Domains | Configuration, Distributed_Systems |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A role-based configuration pattern that creates separate sender (prefiller) and receiver (decoder) configurations for disaggregated prefill-decode inference.
Description
Prefill Decode Configuration extends the base LMCacheEngineConfig with fields specific to disaggregated prefill-decode (PD) mode. Two separate configuration instances are created: one for the prefiller (pd_role="sender") and one for the decoder (pd_role="receiver"). Each configuration specifies the NIXL transfer parameters (buffer size, device, peer host/ports) and proxy connection details.
Validation ensures that sender configs have proxy host/port set, receiver configs have peer init/alloc ports set, and both have buffer_size and buffer_device configured.
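The validation rules above can be sketched as a small standalone check. This is a minimal illustration, not LMCache's own validator: PDConfigSketch and validate_pd_config are hypothetical names, and the pd_-prefixed field names are an assumption that extends the pd_role naming used on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PDConfigSketch:
    """Hypothetical stand-in for the PD fields; NOT the real LMCacheEngineConfig."""

    pd_role: str                            # "sender" (prefiller) or "receiver" (decoder)
    pd_buffer_size: Optional[int] = None    # transfer buffer size in bytes (assumed unit)
    pd_buffer_device: Optional[str] = None  # e.g. "cuda" (assumed value)
    pd_proxy_host: Optional[str] = None     # required for senders
    pd_proxy_port: Optional[int] = None     # required for senders
    pd_peer_init_port: Optional[int] = None   # required for receivers
    pd_peer_alloc_port: Optional[int] = None  # required for receivers


def validate_pd_config(cfg: PDConfigSketch) -> List[str]:
    """Return a list of problems; an empty list means the config passes."""
    errors: List[str] = []

    if cfg.pd_role not in ("sender", "receiver"):
        errors.append(f"unknown pd_role: {cfg.pd_role!r}")

    # Both roles need the transfer buffer configured.
    if cfg.pd_buffer_size is None:
        errors.append("pd_buffer_size must be set")
    if cfg.pd_buffer_device is None:
        errors.append("pd_buffer_device must be set")

    # Role-specific requirements.
    if cfg.pd_role == "sender":
        if cfg.pd_proxy_host is None or cfg.pd_proxy_port is None:
            errors.append("sender requires pd_proxy_host and pd_proxy_port")
    elif cfg.pd_role == "receiver":
        if cfg.pd_peer_init_port is None or cfg.pd_peer_alloc_port is None:
            errors.append("receiver requires pd_peer_init_port and pd_peer_alloc_port")

    return errors
```

Returning a list of messages rather than raising on the first failure lets a deployment script report every missing field in one pass.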
Usage
Use this principle when deploying disaggregated prefill-decode inference. Create two YAML config files, one per role, that set the PD fields, then load each file via load_engine_config_with_overrides.
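As a rough sketch of what the two per-role files might contain, the snippet below renders one sender and one receiver mapping as flat YAML text and writes them to disk. The key names (enable_pd and the pd_-prefixed fields), the port numbers, the buffer size, and the file names are all illustrative assumptions; check them against the LMCache version you deploy. The helpers to_yaml and write_role_configs are hypothetical and use only the standard library.

```python
from pathlib import Path

# Assumed key names and placeholder values; not taken from the LMCache source.
COMMON = {
    "enable_pd": True,
    "pd_buffer_size": 1 << 30,   # 1 GiB, placeholder
    "pd_buffer_device": "cuda",
}

SENDER = {
    **COMMON,
    "pd_role": "sender",         # prefiller
    "pd_proxy_host": "localhost",
    "pd_proxy_port": 7500,
}

RECEIVER = {
    **COMMON,
    "pd_role": "receiver",       # decoder
    "pd_peer_host": "localhost",
    "pd_peer_init_port": 7300,
    "pd_peer_alloc_port": 7400,
}


def to_yaml(mapping):
    """Render a flat mapping of scalars as YAML text (no nesting, no escaping)."""
    lines = []
    for key, value in mapping.items():
        if isinstance(value, bool):      # check bool before int: bool is an int subclass
            rendered = "true" if value else "false"
        elif isinstance(value, str):
            rendered = f'"{value}"'
        else:
            rendered = str(value)
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines) + "\n"


def write_role_configs(directory):
    """Write one YAML file per role and return their paths keyed by role name."""
    paths = {}
    for name, cfg in (("prefiller", SENDER), ("decoder", RECEIVER)):
        path = Path(directory) / f"lmcache-{name}.yaml"
        path.write_text(to_yaml(cfg))
        paths[name] = path
    return paths
```

The resulting files are what the loader named above would read. Its exact call signature is not given on this page and is not assumed here.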
Theoretical Basis
The PD configuration follows a sender-receiver model:
- Sender (Prefiller): Runs the prefill pass over the prompt, stores the resulting KV cache, and writes it to the receiver via NIXL
- Receiver (Decoder): Receives the KV cache from the sender and runs autoregressive decoding on top of it
- Proxy: Routes requests between prefiller and decoder