Implementation:Kserve Kserve LLM Scheduler Config

From Leeroopedia
Revision as of 13:09, 16 February 2026 by Admin (auto-imported from implementations/Kserve_Kserve_LLM_Scheduler_Config.md)
Knowledge Sources
Domains: Kubernetes, LLM Serving
Last Updated: 2026-02-13 00:00 GMT

Overview

Concrete LLMInferenceServiceConfig for the LLM inference scheduler and InferencePool endpoint picker provided by the KServe project.

Description

This file defines the configuration template for the LLM inference scheduler (the endpoint picker proxy) and the InferencePool that routes requests to model-serving pods based on KV-cache utilization. The LLMInferenceServiceConfig has a router.scheduler section with two parts. The pool part holds the InferencePool spec: it selects workload pods by label, targets port 8000 on them, and references the endpoint picker Service over gRPC on port 9002 with failureMode FailOpen. The template part is the scheduler deployment template, which runs ghcr.io/llm-d/llm-d-inference-scheduler:v0.4.0 and exposes ports for gRPC (9002), gRPC health (9003), metrics (9090), and ZeroMQ (5557). The --kv-cache-usage-percentage-metric argument tells the scheduler to read the vllm:kv_cache_usage_perc metric, which it uses for KV-cache-aware load balancing across model-serving replicas. This implements the routing principle described in Kserve_Kserve_PD_Scheduler_Routing.
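To make the routing idea concrete, here is a minimal Python sketch of KV-cache-aware selection with a fail-open fallback. It is an illustration only, not the llm-d-inference-scheduler's actual algorithm: the Replica type, pick_least_loaded, and route are hypothetical names, the usage value is assumed to be a fraction between 0 and 1, and the fallback reflects one reading of failureMode FailOpen (forward the request anyway when the picker cannot answer).

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Replica:
    """One model-serving pod as seen by a hypothetical picker."""
    name: str
    kv_cache_usage: float  # assumed to be a fraction in [0, 1]


def pick_least_loaded(replicas: Sequence[Replica]) -> Replica:
    """Return the replica with the lowest KV-cache usage.

    Ties are broken by name so the choice is deterministic.
    """
    if not replicas:
        raise ValueError("no replicas available")
    return min(replicas, key=lambda r: (r.kv_cache_usage, r.name))


def route(
    replicas: Sequence[Replica],
    picker: Callable[[Sequence[Replica]], Replica] = pick_least_loaded,
) -> Replica:
    """Ask the picker for a replica; on picker failure, fall back to the
    first replica instead of rejecting the request (fail-open behavior)."""
    if not replicas:
        raise ValueError("no replicas available")
    try:
        return picker(replicas)
    except Exception:
        return replicas[0]


if __name__ == "__main__":
    pool = [Replica("pod-a", 0.82), Replica("pod-b", 0.35), Replica("pod-c", 0.35)]
    print(route(pool).name)  # prints: pod-b
```

A real scheduler is likely to combine several signals; the single-metric comparison here only shows why a per-pod KV-cache reading is useful for load balancing.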

Usage

Apply this configuration as part of the LLM serving setup. The LLMInferenceService controller uses this template to create the inference scheduler and InferencePool resources that handle intelligent request routing. The scheduler integrates with the Gateway API InferencePool mechanism for KV-cache-aware request distribution.
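The template uses Go template expressions that the controller fills in per service. As a rough illustration of that substitution step, the Python sketch below replaces plain field references such as {{ .ObjectMeta.Name }} with values from a mapping. It is not how the controller is implemented: render_fields and the sample values are hypothetical, and function calls such as ChildName are deliberately left unrendered because their behavior is not described here.

```python
import re

# Matches only plain field references such as {{ .ObjectMeta.Name }}.
# Function calls like {{ ChildName ... }} do not match and stay untouched.
_FIELD_REF = re.compile(r"\{\{\s*\.([A-Za-z_][A-Za-z0-9_.]*)\s*\}\}")


def render_fields(template: str, values: dict[str, str]) -> str:
    """Substitute {{ .Path }} references using a flat mapping of dotted paths."""

    def replace(match: re.Match) -> str:
        path = match.group(1)
        if path not in values:
            raise KeyError(f"no value for template field .{path}")
        return values[path]

    return _FIELD_REF.sub(replace, template)


if __name__ == "__main__":
    # Hypothetical service name and namespace, for illustration only.
    values = {"ObjectMeta.Name": "my-llm", "ObjectMeta.Namespace": "serving"}
    print(render_fields('- "{{ .ObjectMeta.Namespace }}"', values))  # prints: - "serving"
```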

Code Reference

Source Location

Signature

apiVersion: serving.kserve.io/v1alpha2
kind: LLMInferenceServiceConfig
metadata:
  name: kserve-config-llm-scheduler
spec:
  router:
    scheduler:
      pool:
        spec:
          endpointPickerRef:
            failureMode: FailOpen
            kind: Service
            name: |-
              {{ ChildName .ObjectMeta.Name `-epp-service` }}
            port:
              number: 9002
          selector:
            matchLabels:
              app.kubernetes.io/name: |-
                {{ .ObjectMeta.Name }}
              app.kubernetes.io/part-of: llminferenceservice
              kserve.io/component: workload
          targetPorts:
            - number: 8000
      template:
        containers:
          - name: main
            image: ghcr.io/llm-d/llm-d-inference-scheduler:v0.4.0
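            ports:
              # gRPC endpoint picker port; matches endpointPickerRef.port above
              - containerPort: 9002
                name: grpc
              # gRPC health port
              - containerPort: 9003
                name: grpc-health
              # Prometheus metrics from the scheduler
              - containerPort: 9090
                name: metrics
              # ZeroMQ port
              - containerPort: 5557
                name: zmq
            args:
              # Name and namespace of the InferencePool this scheduler serves
              - --pool-name
              - "{{ ChildName .ObjectMeta.Name `-inference-pool` }}"
              - --pool-namespace
              - "{{ .ObjectMeta.Namespace }}"
              # vLLM metric the scheduler reads as KV-cache usage
              - --kv-cache-usage-percentage-metric
              - "vllm:kv_cache_usage_perc"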

Import

kubectl apply -f config/llmisvcconfig/config-llm-scheduler.yaml

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| .ObjectMeta.Name | Go template variable | Yes | Used to derive InferencePool and EPP service names |
| .ObjectMeta.Namespace | Go template variable | Yes | Namespace for the inference pool |
| vllm:kv_cache_usage_perc | Prometheus metric | Yes | KV-cache utilization metric from vLLM pods used for load balancing |
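For readers unfamiliar with the metric input, the following Python sketch extracts vllm:kv_cache_usage_perc samples from Prometheus text-format output. How the scheduler itself collects the metric is not described on this page; read_metric and the sample payload (including the model_name label and the second metric) are hypothetical and serve only to show the shape of the data.

```python
def read_metric(metrics_text: str, metric: str = "vllm:kv_cache_usage_perc") -> list[float]:
    """Return every sample value of `metric` found in Prometheus text-format output."""
    samples = []
    for raw in metrics_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines, including # HELP and # TYPE.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Sample with labels: name{label="value",...} value [timestamp]
            name = line.split("{", 1)[0]
            _, closing, rest = line.rpartition("}")
            if not closing:
                continue  # malformed line: label block never closed
        else:
            # Sample without labels: name value [timestamp]
            name, _, rest = line.partition(" ")
        if name != metric:
            continue
        fields = rest.split()
        if fields:
            samples.append(float(fields[0]))
    return samples


if __name__ == "__main__":
    # Illustrative payload, not captured from a real vLLM pod.
    sample = """\
# TYPE vllm:kv_cache_usage_perc gauge
vllm:kv_cache_usage_perc{model_name="demo"} 0.42
other_metric{model_name="demo"} 3
"""
    print(read_metric(sample))  # prints: [0.42]
```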

Outputs

| Name | Type | Description |
|------|------|-------------|
| LLMInferenceServiceConfig | Custom Resource | Scheduler and InferencePool template consumed by the LLMIsvc controller |
| InferencePool | Gateway API resource | Pool of model-serving endpoints with label-based selector |
| Scheduler Deployment | Deployment | Runs the llm-d-inference-scheduler for KV-cache-aware request routing |
| gRPC endpoint | TCP port 9002 | Endpoint picker service for the Gateway API |
| Metrics endpoint | TCP port 9090 | Prometheus metrics from the scheduler |

Usage Examples

Apply the scheduler config

kubectl apply -f config/llmisvcconfig/config-llm-scheduler.yaml

Verify the config is present

kubectl get llminferenceserviceconfig kserve-config-llm-scheduler
