Principle:Sgl project Sglang Server Health Monitoring

From Leeroopedia


Knowledge Sources
Domains: LLM_Serving, Monitoring, Operations
Last Updated: 2026-02-10 00:00 GMT

Overview

An operational monitoring pattern that exposes health check, server information, and Prometheus metrics endpoints for production observability.

Description

Server health monitoring provides production observability for deployed LLM servers. SGLang exposes three categories of endpoints: health checks (/health) for load balancers and orchestrators, server information (/server_info) for debugging and configuration verification, and Prometheus metrics (/metrics) for time-series monitoring of throughput, latency, token counts, and resource utilization.

Usage

Use these endpoints in any production deployment: configure load balancers and orchestrators to poll /health, point a Prometheus scrape job at /metrics, and query /server_info when debugging configuration or runtime state.
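
The polling described above can be sketched with the Python standard library. This is a minimal illustration rather than SGLang's own client: the function name probe_status, the base URL, and the timeout value are assumptions made for the example; only the /health and /server_info paths come from this page.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe_status(base_url: str, path: str = "/health", timeout: float = 2.0) -> Optional[int]:
    """Return the HTTP status code for base_url + path, or None if unreachable.

    probe_status is a hypothetical helper name, not part of SGLang.
    """
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for error responses such as 503; the code is still useful.
        return err.code
    except OSError:
        # Connection refused, DNS failure, or timeout: treat as unreachable.
        return None
```

A load-balancer-style check would then compare the result against 200, for example probe_status("http://localhost:30000") == 200, where the host and port are placeholders for wherever the server actually listens. The same helper can be pointed at /server_info by passing a different path.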

Theoretical Basis

Production monitoring of a serving endpoint combines three complementary signals:

  1. Health checks — Binary healthy/unhealthy signal for load balancers
  2. Structured information — JSON metadata for debugging (model info, version, internal state)
  3. Metrics — Time-series data for dashboards and alerting (Prometheus format)
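
To make the third item concrete, the sketch below reads samples out of a payload in the Prometheus text exposition format, which is what a /metrics endpoint returns. It is a simplified illustration under stated assumptions: sample lines carry no trailing timestamp, the helper name parse_metrics is invented for this example, and the metric names in the usage note are hypothetical, not names that SGLang is known to export.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each series (name plus optional labels) to its sample value.

    Assumes Prometheus text format without trailing timestamps.
    """
    samples: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token; splitting from
        # the right keeps label values that contain spaces intact.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples
```

Given a payload containing the line demo_requests_total{model="m"} 42, the returned dictionary would hold the key demo_requests_total{model="m"} with the value 42.0. In practice a Prometheus server performs this scraping and parsing itself; a hand-written parser like this is mainly useful for quick ad hoc checks.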

Key health states:

  • 200 OK — Server is healthy and accepting requests
  • 503 Service Unavailable — Server is starting up or shutting down
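
Because a 503 during startup is expected rather than an error, deployment scripts often wait for the first 200 before sending traffic. The loop below is one possible sketch of that readiness wait. The name wait_until_healthy and its parameters are assumptions for illustration; the status source is passed in as a callable so the logic does not depend on any particular HTTP client.

```python
import time
from typing import Callable, Optional


def wait_until_healthy(
    get_status: Callable[[], Optional[int]],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status until it returns 200 or the timeout elapses.

    get_status returns an HTTP status code, or None if the server is
    unreachable. Anything other than 200 (including 503) means "not ready".
    """
    deadline = clock() + timeout
    while True:
        if get_status() == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The sleep and clock parameters exist only so the loop can be exercised without real delays; the defaults use the standard library's time module.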

Related Pages

Implemented By
