Principle:LMCache LMCache Prefill Decode Configuration

From Leeroopedia


Knowledge Sources
Domains Configuration, Distributed_Systems
Last Updated 2026-02-09 00:00 GMT

Overview

A role-based configuration pattern that creates separate sender (prefiller) and receiver (decoder) configurations for disaggregated prefill-decode inference.

Description

Prefill Decode Configuration extends the base LMCacheEngineConfig with fields specific to disaggregated prefill-decode (PD) mode. Two separate configuration instances are created: one for the prefiller (pd_role="sender") and one for the decoder (pd_role="receiver"). Each configuration specifies the NIXL transfer parameters (buffer size, device, peer host/ports) and proxy connection details.

Validation ensures that sender configs have proxy host/port set, receiver configs have peer init/alloc ports set, and both have buffer_size and buffer_device configured.
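The validation rules above can be sketched as a small standalone check. This is a minimal illustration, not LMCache's actual implementation: the `PDConfigSketch` class, the `validate_pd_config` function, and every field name other than `pd_role` are hypothetical stand-ins chosen to mirror the fields named in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PDConfigSketch:
    """Hypothetical stand-in for the PD-related fields of an engine config."""
    pd_role: str                          # "sender" (prefiller) or "receiver" (decoder)
    buffer_size: Optional[int] = None     # transfer buffer size in bytes (assumed unit)
    buffer_device: Optional[str] = None   # e.g. "cuda" or "cpu" (assumed values)
    proxy_host: Optional[str] = None
    proxy_port: Optional[int] = None
    peer_init_port: Optional[int] = None
    peer_alloc_port: Optional[int] = None


def validate_pd_config(cfg: PDConfigSketch) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    if cfg.pd_role not in ("sender", "receiver"):
        errors.append(f"unknown pd_role: {cfg.pd_role!r}")

    # Both roles need the transfer buffer configured.
    if cfg.buffer_size is None:
        errors.append("buffer_size must be set")
    if cfg.buffer_device is None:
        errors.append("buffer_device must be set")

    # Role-specific requirements described in the text.
    if cfg.pd_role == "sender":
        if cfg.proxy_host is None:
            errors.append("sender requires proxy_host")
        if cfg.proxy_port is None:
            errors.append("sender requires proxy_port")
    elif cfg.pd_role == "receiver":
        if cfg.peer_init_port is None:
            errors.append("receiver requires peer_init_port")
        if cfg.peer_alloc_port is None:
            errors.append("receiver requires peer_alloc_port")
    return errors
```

Collecting every problem in a list, rather than raising on the first one, lets an operator fix a misconfigured role file in a single pass.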

Usage

Use this principle when deploying disaggregated prefill-decode inference. Create two YAML config files, one per role, that set the PD fields, then load each one via load_engine_config_with_overrides.
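As a rough sketch of the "one file per role" idea, the snippet below builds two role-specific field maps from a shared base and renders each as flat YAML text. The key names other than `pd_role`, the port numbers, the host, and the `render_yaml` helper are all illustrative assumptions; check them against the actual LMCacheEngineConfig fields before use. The loader call is deliberately omitted because its signature is not given here.

```python
from typing import Dict, Union

Value = Union[str, int]


def render_yaml(fields: Dict[str, Value]) -> str:
    """Render a flat mapping as simple `key: value` YAML lines (hypothetical helper)."""
    lines = []
    for key, value in fields.items():
        # Quote strings so hostnames and device names stay plain YAML scalars.
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        lines.append(f"{key}: {rendered}")
    return "\n".join(lines) + "\n"


# Settings both roles share (placeholder values).
shared: Dict[str, Value] = {"buffer_size": 1073741824, "buffer_device": "cuda"}

# Sender (prefiller): needs the proxy connection details.
prefiller: Dict[str, Value] = {
    "pd_role": "sender", **shared,
    "proxy_host": "localhost", "proxy_port": 7500,
}

# Receiver (decoder): needs the peer init/alloc ports.
decoder: Dict[str, Value] = {
    "pd_role": "receiver", **shared,
    "peer_init_port": 7300, "peer_alloc_port": 7400,
}

prefiller_yaml = render_yaml(prefiller)
decoder_yaml = render_yaml(decoder)
print(prefiller_yaml)
print(decoder_yaml)
```

Each rendered string would be written to its own file and that file handed to the config loader for the corresponding process.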

Theoretical Basis

The PD configuration follows a sender-receiver model:

  • Sender (Prefiller): Runs the prefill pass over the prompt, stores the resulting KV cache, and writes it to the receiver via NIXL
  • Receiver (Decoder): Receives KV cache from sender, runs autoregressive decoding
  • Proxy: Routes requests between prefiller and decoder
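The three roles above can be illustrated with a toy in-process simulation. Nothing here touches NIXL, a GPU, or a real model: the class names, the list-of-integers "KV cache", and the placeholder token rule are all invented for illustration, and the direct method call stands in for the network transfer.

```python
from typing import Dict, List


class Decoder:
    """Receiver role: holds KV caches pushed by the prefiller, then decodes."""

    def __init__(self) -> None:
        self.kv_buffer: Dict[str, List[int]] = {}

    def receive_kv(self, request_id: str, kv_cache: List[int]) -> None:
        self.kv_buffer[request_id] = kv_cache

    def decode(self, request_id: str, max_new_tokens: int) -> List[int]:
        kv = self.kv_buffer.pop(request_id)
        tokens: List[int] = []
        for _ in range(max_new_tokens):
            next_token = sum(kv) % 100   # placeholder for a real model step
            tokens.append(next_token)
            kv.append(next_token)        # each new token extends the cache
        return tokens


class Prefiller:
    """Sender role: processes the prompt once and pushes the result to its peer."""

    def __init__(self, peer: Decoder) -> None:
        self.peer = peer

    def prefill(self, request_id: str, prompt_tokens: List[int]) -> None:
        kv_cache = list(prompt_tokens)               # stand-in for computed KV tensors
        self.peer.receive_kv(request_id, kv_cache)   # stand-in for the NIXL write


class Proxy:
    """Routes each request to the prefiller first, then to the decoder."""

    def __init__(self, prefiller: Prefiller, decoder: Decoder) -> None:
        self.prefiller = prefiller
        self.decoder = decoder

    def handle(self, request_id: str, prompt_tokens: List[int],
               max_new_tokens: int) -> List[int]:
        self.prefiller.prefill(request_id, prompt_tokens)
        return self.decoder.decode(request_id, max_new_tokens)


decoder = Decoder()
proxy = Proxy(Prefiller(decoder), decoder)
generated = proxy.handle("req-1", [1, 2, 3], max_new_tokens=2)
print(generated)
```

The point of the split is that the decoder never recomputes the prompt: it starts generation from the cache the prefiller already produced.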

Related Pages

Implemented By
