Principle:Sgl project Sglang Server Health Monitoring
| Knowledge Sources | |
|---|---|
| Domains | LLM_Serving, Monitoring, Operations |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
An operational monitoring pattern that exposes health check, server information, and Prometheus metrics endpoints for production observability.
Description
Server health monitoring provides production observability for deployed LLM servers. SGLang exposes three categories of endpoints: health checks (/health) for load balancers and orchestrators, server information (/server_info) for debugging and configuration verification, and Prometheus metrics (/metrics) for time-series monitoring of throughput, latency, token counts, and resource utilization.
Usage
Use health monitoring endpoints for production deployments. Configure load balancers to poll /health, connect Prometheus to /metrics, and use /server_info for operational debugging.
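As a rough illustration of the polling described above, the following sketch builds the three endpoint URLs and probes one of them using only the Python standard library. The base URL (`http://localhost:30000`) and the helper names `endpoint_urls` and `probe` are assumptions made for this example and are not part of SGLang; adjust the host and port to match the actual deployment.

```python
import urllib.error
import urllib.request

# Assumption: the server is reachable at this address; change as needed.
BASE_URL = "http://localhost:30000"


def endpoint_urls(base_url):
    """Map each monitoring endpoint name to its full URL."""
    base = base_url.rstrip("/")
    return {name: f"{base}/{name}" for name in ("health", "server_info", "metrics")}


def probe(url, timeout=2.0):
    """Return the HTTP status code for `url`, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on non-2xx responses; the code (e.g. 503) is still useful.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar transport errors.
        return None
```

Calling `probe(endpoint_urls(BASE_URL)["health"])` yields a status code that a script or cron job can act on; a load balancer or orchestrator would normally perform the same request natively through its own health check configuration.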
Theoretical Basis
Production monitoring in this pattern combines three complementary signals:
- Health checks — Binary healthy/unhealthy signal for load balancers
- Structured information — JSON metadata for debugging (model info, version, internal state)
- Metrics — Time-series data for dashboards and alerting (Prometheus format)
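To make the metrics signal concrete, here is a minimal sketch of reading samples from a Prometheus text-format payload such as the body returned by /metrics. The metric names in `SAMPLE` are invented for illustration and are not actual SGLang metric names. The parser is deliberately simplified: it assumes one sample per line with no trailing timestamp, and it keeps the label set as part of the key rather than parsing it.

```python
# Hypothetical payload in Prometheus text exposition format.
SAMPLE = """\
# HELP demo_running_requests Requests currently being processed.
# TYPE demo_running_requests gauge
demo_running_requests 3
# HELP demo_tokens_total Tokens generated so far.
# TYPE demo_tokens_total counter
demo_tokens_total{model="example"} 1200
"""


def parse_metrics(text):
    """Return {series: value}, where series is the metric name plus any labels."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token (no timestamp assumed).
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


metrics = parse_metrics(SAMPLE)
```

In practice a Prometheus server scrapes and stores these values itself; a hand-written parser like this is mainly useful for quick checks or smoke tests against a running server.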
Key health states:
- 200 OK — Server is healthy and accepting requests
- 503 Service Unavailable — Server is starting up or shutting down
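The two health states above can be sketched as a small classifier that a monitoring script might use to decide whether to route traffic. The names `HealthState`, `classify_health`, and `should_route` are hypothetical, and treating any other status code as unknown is an assumption of this example rather than documented server behavior.

```python
from enum import Enum


class HealthState(Enum):
    HEALTHY = "healthy"          # 200 OK
    UNAVAILABLE = "unavailable"  # 503 Service Unavailable
    UNKNOWN = "unknown"          # anything else (assumption of this sketch)


def classify_health(status_code):
    """Translate the HTTP status of a /health response into a HealthState."""
    if status_code == 200:
        return HealthState.HEALTHY
    if status_code == 503:
        return HealthState.UNAVAILABLE
    return HealthState.UNKNOWN


def should_route(state):
    """Only a healthy server should receive new requests."""
    return state is HealthState.HEALTHY
```

Keeping the unknown case separate from the unavailable case lets an operator distinguish an expected startup or shutdown window from an unexpected response that may need investigation.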