Implementation:Kserve Kserve VLLM Health Endpoint
| Knowledge Sources | |
|---|---|
| Domains | Health_Monitoring, LLM_Serving, Kubernetes |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete Kubernetes probe configuration for monitoring vLLM engine health in LLMInferenceService deployments.
Description
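The vLLM engine exposes a /health endpoint on port 8000, served over HTTPS. KServe configures a liveness probe against this endpoint in the LLMInferenceService pod template, so that Kubernetes can detect an unresponsive engine and restart the container. A generous initialDelaySeconds keeps the probe from firing while the model is still loading. The probe configuration is defined in the sample YAML and in the default pod template ConfigMap.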
Usage
Include liveness probe configuration in the LLMInferenceService pod template. The default templates in config/llmisvcconfig/ already include appropriate probe settings.
Code Reference
Source Location
- Repository: kserve
- File: docs/samples/llmisvc/single-node-gpu/llm-inference-service-qwen2-7b-gpu.yaml, Lines 27-35
- File: config/llmisvcconfig/config-llm-template.yaml (default pod template)
Signature
livenessProbe:
  httpGet:
    path: /health
    port: 8000
    scheme: HTTPS
  initialDelaySeconds: 120
  periodSeconds: 30
  timeoutSeconds: 30
  failureThreshold: 5
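To see what these settings mean in wall-clock terms, the arithmetic can be sketched as below. This is an approximation under stated assumptions: the first probe fires at about initialDelaySeconds, later probes follow every periodSeconds, and the container is restarted after failureThreshold consecutive failures. Real kubelet timing includes some jitter, and the variable names are illustrative only.

```shell
#!/bin/sh
# Probe settings copied from the liveness probe shown above.
initial_delay=120      # initialDelaySeconds
period=30              # periodSeconds
timeout=30             # timeoutSeconds
failure_threshold=5    # failureThreshold

# Case 1: the engine never becomes healthy after the container starts.
# The first failure lands at about initial_delay; each further failure
# lands one period later, so the Nth failure is (N - 1) periods after that.
startup_budget=$((initial_delay + (failure_threshold - 1) * period))

# If every probe fails by timing out rather than by an error response,
# the final failure is only recorded once the timeout has elapsed.
worst_case_startup=$((startup_budget + timeout))

# Case 2: a previously healthy engine stops responding.
# Up to failure_threshold probe periods pass before the restart.
detection_window=$((failure_threshold * period))

echo "startup budget before restart: ~${startup_budget}s (up to ~${worst_case_startup}s with timeouts)"
echo "detection window for a hung engine: ~${detection_window}s"
```

With the values above, a model that takes longer than roughly four minutes to start answering /health would be restarted mid-load, which is why larger models need a larger initialDelaySeconds.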
Import
# External dependency: vLLM engine provides /health endpoint
# No code import needed — configured via YAML
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| httpGet.path | string | Yes | /health endpoint path |
| httpGet.port | int | Yes | 8000 (vLLM default) |
| httpGet.scheme | string | Yes | HTTPS (vLLM TLS) |
| initialDelaySeconds | int | Yes | Wait before first probe (120s for 7B, 4800s for 600B+) |
Outputs
| Name | Type | Description |
|---|---|---|
| HTTP 200 | response | Engine running, model loaded |
| HTTP 503 | response | Engine starting, model not yet loaded |
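| Container restart | action | Kubelet restarts the container after failureThreshold consecutive probe failures; the pod's Ready condition is governed by a readiness probe, not by this liveness probe |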
Usage Examples
Monitor Model Loading
# Watch pod status during model loading
kubectl get pods -l app.kubernetes.io/component=llminferenceservice-workload -w
# Check liveness probe events
kubectl describe pod <pod-name> | grep -A5 "Liveness"
# View vLLM logs during loading
kubectl logs <pod-name> -c main --follow
# Manual health check
kubectl exec <pod-name> -c main -- curl -k https://localhost:8000/health
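The manual health check above can be turned into a polling loop. The following is a minimal sketch, not part of KServe or vLLM: `health_state` and `wait_for_health` are hypothetical helper names, the status-code meanings are taken from the Outputs table, and the script assumes `curl` is available wherever it runs and that the endpoint is reachable from there.

```shell
#!/bin/sh
# health_state CODE
# Hypothetical helper: map an HTTP status code from /health to the
# meaning given in the Outputs table.
health_state() {
  case "$1" in
    200) echo "ready" ;;
    503) echo "loading" ;;
    *)   echo "unreachable" ;;
  esac
}

# wait_for_health URL ATTEMPTS INTERVAL_SECONDS
# Hypothetical helper: poll URL until it returns HTTP 200 or the
# attempts run out. Returns 0 on success, 1 otherwise.
wait_for_health() {
  url=$1
  attempts=$2
  interval=$3
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -k skips certificate verification, as in the manual check.
    # curl prints 000 when no HTTP response was received at all.
    code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || true
    state=$(health_state "$code")
    echo "attempt $i: HTTP $code ($state)"
    if [ "$state" = "ready" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Example (not executed here), assuming the port has been forwarded first
# with: kubectl port-forward pod/<pod-name> 8000:8000
#   wait_for_health https://localhost:8000/health 40 30
```

Polling from outside the pod avoids depending on `curl` being present in the serving image. The 30-second interval in the example simply mirrors periodSeconds.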
Related Pages
Implements Principle
Requires Environment
Uses Heuristic