
Implementation:KServe vLLM Health Endpoint

From Leeroopedia
Knowledge Sources
Domains Health_Monitoring, LLM_Serving, Kubernetes
Last Updated 2026-02-13 00:00 GMT

Overview

Concrete Kubernetes probe configuration for monitoring vLLM engine health in LLMInferenceService deployments.

Description

The vLLM engine exposes a /health endpoint on port 8000, served over HTTPS. KServe configures a liveness probe against this endpoint in the LLMInferenceService pod template: the endpoint fails while the model is still loading and succeeds once the engine is serving, so Kubernetes restarts the container if the engine hangs or crashes after startup. The probe configuration is defined in the sample YAML and in the default pod template ConfigMap.
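The status codes in the I/O contract below map onto probe outcomes as follows; a minimal sketch (the function name is illustrative, not a KServe or vLLM API):

```python
def engine_state(status_code: int) -> str:
    """Interpret vLLM's /health response the way the liveness probe does.

    Kubernetes counts any 2xx/3xx status as probe success; in practice
    vLLM answers 200 once the model is loaded and 503 while starting.
    """
    if status_code == 200:
        return "running"    # engine up, model loaded -> probe succeeds
    if status_code == 503:
        return "loading"    # model still loading -> probe fails
    return "unhealthy"      # any other status also counts as a failure

print(engine_state(200))  # running
print(engine_state(503))  # loading
```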

Usage

Include liveness probe configuration in the LLMInferenceService pod template. The default templates in config/llmisvcconfig/ already include appropriate probe settings.

Code Reference

Source Location

  • Repository: kserve
  • File: docs/samples/llmisvc/single-node-gpu/llm-inference-service-qwen2-7b-gpu.yaml, Lines 27-35
  • File: config/llmisvcconfig/config-llm-template.yaml (default pod template)

Signature

livenessProbe:
  httpGet:
    path: /health
    port: 8000
    scheme: HTTPS
  initialDelaySeconds: 120
  periodSeconds: 30
  timeoutSeconds: 30
  failureThreshold: 5
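Taken together, these settings determine how long Kubernetes waits before acting. A quick sketch of the implied timing (standard kubelet probe semantics; the variable names simply mirror the YAML fields):

```python
initial_delay_s = 120   # initialDelaySeconds
period_s = 30           # periodSeconds
failure_threshold = 5   # failureThreshold

# The first probe fires at initialDelaySeconds, then every periodSeconds.
# A container that never becomes healthy fails probes at 120, 150, 180,
# 210, and 240 s, so the earliest possible restart is:
earliest_restart_s = initial_delay_s + (failure_threshold - 1) * period_s
print(earliest_restart_s)  # 240

# Once the engine is running, an engine that stops responding is detected
# within at most failureThreshold full periods (plus up to timeoutSeconds
# per probe attempt):
detection_window_s = failure_threshold * period_s
print(detection_window_s)  # 150
```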

Import

# External dependency: vLLM engine provides /health endpoint
# No code import needed — configured via YAML

I/O Contract

Inputs

Name                 Type    Required  Description
httpGet.path         string  Yes       /health endpoint path
httpGet.port         int     Yes       8000 (vLLM default)
httpGet.scheme       string  Yes       HTTPS (vLLM serves over TLS)
initialDelaySeconds  int     Yes       Wait before first probe (120s for 7B models, 4800s for 600B+)
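The two initialDelaySeconds values in the table suggest that load time scales roughly with parameter count. A rule-of-thumb interpolation consistent with those two data points (an illustration, not a KServe default):

```python
def initial_delay_seconds(params_billions: float) -> int:
    """Estimate initialDelaySeconds from model size.

    Interpolated from the two values in the table above (120 s at 7B,
    4800 s at 600B): roughly 8 s per billion parameters, floored at
    120 s. This heuristic is illustrative only -- measure your own
    model's load time before tuning the probe.
    """
    return max(120, int(params_billions * 8))

print(initial_delay_seconds(7))    # 120
print(initial_delay_seconds(600))  # 4800
```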

Outputs

Name               Type      Description
HTTP 200           response  Engine running, model loaded
HTTP 503           response  Engine starting, model not yet loaded
Container restart  action    Kubernetes restarts the container after failureThreshold consecutive liveness probe failures

Usage Examples

Monitor Model Loading

# Watch pod status during model loading
kubectl get pods -l app.kubernetes.io/component=llminferenceservice-workload -w

# Check liveness probe events
kubectl describe pod <pod-name> | grep -A5 "Liveness"

# View vLLM logs during loading
kubectl logs <pod-name> -c main --follow

# Manual health check
kubectl exec <pod-name> -- curl -k https://localhost:8000/health
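Instead of exec-ing into the pod, you can forward the port and poll from your workstation. A sketch, assuming `kubectl port-forward <pod-name> 8000:8000` is running in another terminal (the helper name and defaults are illustrative):

```python
import ssl
import time
import urllib.error
import urllib.request

def wait_until_loaded(url="https://localhost:8000/health",
                      interval_s=10.0, timeout_s=600.0):
    """Poll vLLM's /health until it answers 200 or timeout_s elapses.

    Certificate verification is disabled because vLLM typically serves a
    self-signed certificate (kubelet likewise skips verification for
    scheme: HTTPS probes).
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, context=ctx, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # 503 or connection refused while the model loads
        time.sleep(interval_s)
    return False
```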

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
