Principle:Sgl project Sglang Server Health Monitoring

Knowledge Sources	SGLang Prometheus
Domains	LLM_Serving, Monitoring, Operations
Last Updated	2026-02-10 00:00 GMT

Overview

An operational monitoring pattern that exposes health check, server information, and Prometheus metrics endpoints for production observability.

Description

Server health monitoring provides production observability for deployed LLM servers. SGLang exposes three categories of endpoints: health checks (/health) for load balancers and orchestrators, server information (/server_info) for debugging and configuration verification, and Prometheus metrics (/metrics) for time-series monitoring of throughput, latency, token counts, and resource utilization.

Usage

Use health monitoring endpoints for production deployments. Configure load balancers to poll /health, connect Prometheus to /metrics, and use /server_info for operational debugging.
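
As a rough illustration of the workflow above, the following sketch queries the three endpoints once and summarizes the result. The base URL, the helper names fetch and snapshot, and the shape of the returned dictionary are assumptions made for this example, not part of SGLang; only the endpoint paths come from this page.

```python
import json
import urllib.error
import urllib.request

# Assumption: host and port are placeholders; adjust to your deployment.
BASE_URL = "http://localhost:30000"


def fetch(url, timeout=2.0):
    """Return (status_code, body_text); status_code is None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses (for example 503) arrive as HTTPError.
        return err.code, err.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return None, ""


def snapshot(base_url=BASE_URL, fetch_fn=fetch):
    """Collect one reading from /health, /server_info, and /metrics."""
    health_status, _ = fetch_fn(base_url + "/health")
    info_status, info_body = fetch_fn(base_url + "/server_info")
    metrics_status, metrics_body = fetch_fn(base_url + "/metrics")
    try:
        info = json.loads(info_body) if info_status == 200 else None
    except json.JSONDecodeError:
        info = None
    return {
        "healthy": health_status == 200,
        "server_info": info,
        "metrics_text": metrics_body if metrics_status == 200 else None,
    }
```

Passing fetch_fn as a parameter keeps the network call replaceable, so the summary logic can be exercised without a running server. If /metrics is not served by the instance, metrics_text is simply None.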

Theoretical Basis

Production monitoring of the server combines three complementary signals:

Health checks — Binary healthy/unhealthy signal for load balancers
Structured information — JSON metadata for debugging (model info, version, internal state)
Metrics — Time-series data for dashboards and alerting (Prometheus format)
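
To make the third signal concrete, here is a minimal sketch of reading the Prometheus text format returned by /metrics. The metric names in SAMPLE are hypothetical placeholders, not actual SGLang metric names, and the parser assumes samples carry no trailing timestamp. For real use, a maintained Prometheus client library is likely a better choice.

```python
def parse_metrics(text):
    """Map each 'name{labels}' series string to its float value.

    Simplified: skips comment lines (# HELP / # TYPE) and assumes
    no trailing timestamp on sample lines.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        if not series:
            continue
        try:
            samples[series] = float(value)
        except ValueError:
            continue  # ignore lines that do not end in a number
    return samples


# Hypothetical payload for illustration only.
SAMPLE = """\
# HELP example_requests_total Hypothetical counter.
# TYPE example_requests_total counter
example_requests_total{model="demo"} 42
example_queue_depth 3
"""

parsed = parse_metrics(SAMPLE)
```

Splitting on the last space keeps label values that contain spaces intact, which is why rpartition is used instead of a plain split.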

Key health states:

200 OK — Server is healthy and accepting requests
503 Service Unavailable — Server is starting up or shutting down
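
The two states above suggest a simple readiness gate: keep polling while the server answers 503 and proceed once it returns 200. The sketch below assumes a caller-supplied get_status function that returns the HTTP status code of /health (or None when the server is unreachable). The names classify and wait_until_healthy are illustrative, not SGLang APIs.

```python
import time


def classify(status_code):
    """Translate a /health status code into a coarse state label."""
    if status_code == 200:
        return "healthy"
    if status_code == 503:
        return "unavailable"  # starting up or shutting down
    return "unknown"  # unreachable or unexpected response


def wait_until_healthy(get_status, attempts=30, delay=1.0, sleep=time.sleep):
    """Poll until healthy; return False if attempts run out."""
    for attempt in range(attempts):
        if classify(get_status()) == "healthy":
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final attempt
    return False
```

A deployment script could call wait_until_healthy before routing traffic to a freshly started instance. The attempt count and delay are arbitrary defaults and should be tuned to the model's load time.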

Related Pages

Implemented By

Implementation:Sgl_project_Sglang_Health_And_Metrics_Endpoints
