

Principle:Kserve Kserve PD Scheduler Routing

From Leeroopedia
Knowledge Sources
Domains Scheduling, LLM_Serving, Traffic_Management
Last Updated 2026-02-13 00:00 GMT

Overview

A request scheduling pattern that routes each inference request to the best-scoring GPU serving endpoint, weighing KV cache utilization, prefix cache hits, and queue depth.

Description

The PD Scheduler is an endpoint picker that sits between the Envoy Gateway and the model serving pods. It uses a plugin-based scoring system to select the best endpoint for each request:

  • queue-scorer: Penalizes endpoints with long request queues.
  • kv-cache-utilization-scorer: Penalizes endpoints with high KV cache memory usage.
  • prefix-cache-scorer: Rewards endpoints that already have the request's prefix in cache.
  • max-score-picker: Selects the endpoint with the highest total score.
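The plugins listed above can be sketched as plain functions. This is a minimal Python sketch, assuming each scorer normalizes its signal to [0, 1] with higher meaning better. The `Endpoint` fields, the min-max queue normalization, and the block-hash prefix matching are illustrative assumptions, not the scheduler's actual plugin code.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    # Hypothetical endpoint snapshot; field names are illustrative, not a real API.
    name: str
    queue_len: int          # requests waiting on this pod
    kv_cache_usage: float   # fraction of KV cache memory in use, 0.0 to 1.0
    cached_blocks: set = field(default_factory=set)  # hashes of cached prompt blocks


def queue_score(ep, all_eps):
    # Min-max normalize queue length across candidates: shortest -> 1.0, longest -> 0.0.
    lens = [e.queue_len for e in all_eps]
    lo, hi = min(lens), max(lens)
    if hi == lo:
        return 1.0  # all queues equal: no endpoint is penalized
    return (hi - ep.queue_len) / (hi - lo)


def kv_cache_score(ep):
    # Penalize KV cache pressure: less memory in use -> higher score.
    return 1.0 - ep.kv_cache_usage


def prefix_score(ep, request_blocks):
    # Fraction of the request's leading prompt blocks already cached on this endpoint.
    # Matching stops at the first miss, because only a contiguous prefix can be reused.
    if not request_blocks:
        return 0.0
    hits = 0
    for block in request_blocks:
        if block not in ep.cached_blocks:
            break
        hits += 1
    return hits / len(request_blocks)


def max_score_pick(scores):
    # scores maps endpoint name -> total score; return the highest-scoring name.
    return max(scores, key=scores.get)


eps = [
    Endpoint("pod-a", queue_len=4, kv_cache_usage=0.9, cached_blocks={"h1", "h2"}),
    Endpoint("pod-b", queue_len=0, kv_cache_usage=0.3, cached_blocks={"h1"}),
]
request_blocks = ["h1", "h2", "h3"]
```

With these inputs pod-b wins on queue depth and KV cache headroom, while pod-a wins on prefix reuse (2 of 3 blocks cached); the scorer weights decide which signal dominates.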

For prefill/decode (PD) disaggregated serving, a PD profile handler routes new requests to prefill pods and continuations (requests whose KV cache state already exists) to decode pods.
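The PD split can be pictured as a pool-selection step that runs before scoring, so the scorers only rank pods in the chosen pool. This is a minimal sketch; the `Request` type, the `has_kv_state` flag, and the pool names are hypothetical, and the real handler's criteria for recognizing a continuation are not specified on this page.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request shape: has_kv_state marks a continuation whose
    # prompt KV cache was already produced by a prefill pass.
    request_id: str
    has_kv_state: bool


def select_pool(request, pools):
    """Return (pool_name, pods) that the scorers should rank for this request."""
    if request.has_kv_state:
        # Continuation: KV already exists, so a decode pod generates the tokens.
        return "decode", pools["decode"]
    # New request: a prefill pod must first compute the prompt's KV cache.
    return "prefill", pools["prefill"]


pools = {"prefill": ["prefill-0", "prefill-1"], "decode": ["decode-0", "decode-1"]}

print(select_pool(Request("r1", has_kv_state=False), pools)[0])  # prefill
print(select_pool(Request("r1", has_kv_state=True), pools)[0])   # decode
```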

Usage

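The scheduler is automatically deployed by the LLMIsvc controller when spec.router.scheduler is configured. Scorer weights can be tuned toward latency or throughput: weighting the prefix-cache scorer more heavily favors latency, since a prefix hit lets a pod reuse cached KV blocks instead of recomputing the prompt, while weighting the queue scorer more heavily favors throughput by balancing load across pods.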
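To see how weight tuning changes routing, this sketch holds two endpoints' per-scorer outputs fixed and varies only the weights. The score values and the latency/throughput weight profiles are made-up illustrative numbers, not recommended settings; only the 2/2/3 default comes from the scoring model in Theoretical Basis.

```python
# Per-scorer outputs for two hypothetical endpoints, already normalized to [0, 1].
SCORES = {
    "pod-a": {"queue": 0.2, "kv_cache": 0.4, "prefix": 1.0},  # busy, but caches the prompt prefix
    "pod-b": {"queue": 1.0, "kv_cache": 0.9, "prefix": 0.0},  # idle, cold cache
}


def pick(weights):
    """Weighted sum per endpoint, then take the endpoint with the highest total."""
    totals = {
        name: sum(weights[scorer] * value for scorer, value in per_scorer.items())
        for name, per_scorer in SCORES.items()
    }
    return max(totals, key=totals.get)


DEFAULT = {"queue": 2, "kv_cache": 2, "prefix": 3}     # weights from the scoring model
LATENCY = {"queue": 1, "kv_cache": 1, "prefix": 5}     # hypothetical: favor prefix reuse
THROUGHPUT = {"queue": 5, "kv_cache": 2, "prefix": 1}  # hypothetical: favor queue balance

print(pick(DEFAULT))     # pod-a
print(pick(LATENCY))     # pod-a
print(pick(THROUGHPUT))  # pod-b
```

Under the default and latency profiles the cached prefix on pod-a outweighs its longer queue (totals 4.2 vs 3.8 and 5.6 vs 1.9); the throughput profile flips the choice to the idle pod-b (6.8 vs 2.8).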

Theoretical Basis

# Scheduler scoring model (NOT implementation code)
For each request:
  scores = {}
  For each healthy endpoint:
    score = 0
    score += queue_scorer.Score(endpoint) * weight_queue             (weight: 2)
    score += kv_cache_scorer.Score(endpoint) * weight_kv_cache       (weight: 2)
    score += prefix_scorer.Score(endpoint, request) * weight_prefix  (weight: 3)
    scores[endpoint] = score

  selected = max_score_picker.Pick(scores)
  route request → selected endpoint

PD profile handler:
  New request (no KV state) → route to prefill pool
  Continuation (has KV)     → route to decode pool
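The scoring model above translates into a small runnable Python sketch. Endpoints are plain dicts with hypothetical fields (`healthy`, `queue_len`, `kv_cache_usage`, `cached_prefixes`), and each scorer's normalization is an assumption chosen for readability; only the weights (2, 2, 3) and the weighted-sum-then-max structure come from the model.

```python
def queue_scorer(ep, req):
    # Hypothetical normalization: empty queue -> 1.0, 10 or more queued -> 0.0.
    return max(0.0, 1.0 - ep["queue_len"] / 10)


def kv_cache_scorer(ep, req):
    # Less KV cache memory in use -> higher score.
    return 1.0 - ep["kv_cache_usage"]


def prefix_scorer(ep, req):
    # Simplified to a binary hit/miss on a single prefix hash.
    return 1.0 if req["prefix_hash"] in ep["cached_prefixes"] else 0.0


# (scorer, weight) pairs using the weights from the scoring model.
WEIGHTED_SCORERS = [(queue_scorer, 2), (kv_cache_scorer, 2), (prefix_scorer, 3)]


def schedule(request, endpoints, weighted_scorers=WEIGHTED_SCORERS):
    """Score every healthy endpoint, then let the max-score picker choose."""
    scores = {}
    for ep in endpoints:
        if not ep["healthy"]:
            continue  # unhealthy endpoints are never candidates
        scores[ep["name"]] = sum(s(ep, request) * w for s, w in weighted_scorers)
    if not scores:
        raise RuntimeError("no healthy endpoint available")
    # max-score-picker: the highest weighted total wins.
    return max(scores, key=scores.get), scores


endpoints = [
    {"name": "pod-a", "healthy": True, "queue_len": 5, "kv_cache_usage": 0.5, "cached_prefixes": {"p1"}},
    {"name": "pod-b", "healthy": True, "queue_len": 0, "kv_cache_usage": 0.1, "cached_prefixes": set()},
    {"name": "pod-c", "healthy": False, "queue_len": 0, "kv_cache_usage": 0.0, "cached_prefixes": {"p1"}},
]
selected, scores = schedule({"prefix_hash": "p1"}, endpoints)
print(selected)  # pod-a
```

Here pod-a totals 5.0 (1.0 + 1.0 + 3.0) against roughly 3.8 for pod-b, so the prefix hit decides the route; pod-c would have scored highest but is skipped because it is unhealthy.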

