Principle:Kserve Kserve Prefill Decode Specification

From Leeroopedia
Knowledge Sources
Domains LLM_Serving, Distributed_Systems, GPU_Computing
Last Updated 2026-02-13 00:00 GMT

Overview

A disaggregated inference architecture that separates the prefill (prompt processing) and decode (token generation) phases onto independent GPU pools for optimized throughput and latency.

Description

Prefill-Decode Specification enables a separation of concerns in LLM serving:

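  • Prefill pool: Processes the entire input prompt in one pass and computes the KV cache. This phase is compute-bound, so it benefits from batching prompts to keep GPU utilization high.
  • Decode pool: Generates output tokens autoregressively, one token per step, using the transferred KV cache. This phase is bound by memory bandwidth, because every step rereads the model weights and the KV cache, and it is latency-sensitive.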

Separating the phases lets each pool be scaled and tuned independently. Once prefill finishes, the KV cache is transferred to the decode pool using NixlConnector over RDMA, which keeps the hand-off latency low.
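To get a feel for how much data moves between the pools, the KV cache size of one request can be estimated from the model shape: one key and one value vector per layer, per KV head, per token. The sketch below is illustrative only. The function names, the model dimensions, and the link bandwidth are assumptions chosen for the example, not values taken from KServe or NixlConnector.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate the KV cache size in bytes for a single sequence.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes a 16-bit cache dtype (fp16/bf16).
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens


def transfer_seconds(num_bytes, link_gbytes_per_s):
    """Idealized transfer time over a link, ignoring setup cost and protocol overhead."""
    return num_bytes / (link_gbytes_per_s * 1e9)


# Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128, 16-bit cache.
size = kv_cache_bytes(num_tokens=4096, num_layers=32, num_kv_heads=8, head_dim=128)
print(f"KV cache for a 4096-token prompt: {size / 2**20:.0f} MiB")

# Assumed effective link bandwidth of 50 GB/s; real numbers depend on the fabric.
print(f"Idealized transfer time: {transfer_seconds(size, 50) * 1e3:.1f} ms")
```

Under these assumptions, a single long prompt produces hundreds of MiB of KV cache, which is why the transfer path between the pools matters for end-to-end latency.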

Usage

Use disaggregated PD serving when:

  • Token generation latency (time between output tokens) is critical, while prompt processing latency (time to first token) is less so
  • Prefill and decode have different scaling patterns
  • The model fits in the GPU memory of a single node

Theoretical Basis

# Prefill-Decode separation (NOT implementation code)
Standard LLM inference:
  [Prompt] → [Prefill: compute KV cache] → [Decode: generate tokens]
  Single pool handles both phases sequentially

Disaggregated PD:
  [Prompt] → [Prefill Pool: compute KV cache]
                    ↓ KV transfer (RDMA/NixlConnector)
             [Decode Pool: generate tokens using transferred KV]

Benefits:
  - Prefill pool optimized for throughput (batch prompts)
  - Decode pool optimized for latency (fast token generation)
  - Independent scaling: prefill_replicas != decode_replicas
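The independent-scaling benefit can be made concrete with a rough capacity estimate: each pool is sized from the token load of its own phase. This is a minimal sketch, not KServe's autoscaling logic. The helper names and all throughput figures are hypothetical, and it ignores batching effects, queueing, and KV transfer time.

```python
import math


def pool_replicas(requests_per_s, tokens_per_request, tokens_per_s_per_replica):
    """Replicas needed to sustain the token demand of one phase (at least 1)."""
    demand = requests_per_s * tokens_per_request
    return max(1, math.ceil(demand / tokens_per_s_per_replica))


def size_pools(requests_per_s, prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Size the prefill and decode pools separately from their own workloads."""
    return {
        "prefill_replicas": pool_replicas(requests_per_s, prompt_tokens, prefill_tps),
        "decode_replicas": pool_replicas(requests_per_s, output_tokens, decode_tps),
    }


# Chat-like traffic: moderate prompts, longer outputs (all numbers are assumed).
chat = size_pools(requests_per_s=10, prompt_tokens=2000, output_tokens=300,
                  prefill_tps=8000, decode_tps=500)

# Summarization-like traffic: long prompts, short outputs.
summarize = size_pools(requests_per_s=10, prompt_tokens=8000, output_tokens=100,
                       prefill_tps=8000, decode_tps=500)

print(chat)
print(summarize)
```

With these assumed numbers, the chat workload needs more decode replicas than prefill replicas, and the summarization workload needs the opposite. A single shared pool cannot track both ratios at once.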

Related Pages

Implemented By
