Implementation:Sgl project Sglang Grafana Dashboard Config
| Knowledge Sources | |
|---|---|
| Domains | Monitoring, Observability, Grafana |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A Grafana dashboard JSON definition that provides comprehensive monitoring panels for SGLang server metrics, using Prometheus as the data source with PromQL queries against sglang_* metrics.
Description
sglang-dashboard.json defines a Grafana dashboard with 8 monitoring panels arranged in a grid. It gives real-time visibility into end-to-end request latency, time to first token, running and queued request counts, token generation throughput, and cache hit rate, and is intended for production monitoring.
Dashboard Panels:
- End-to-End Request Latency -- Timeseries chart showing P50, P90, P99, and average latency using histogram_quantile over sglang_e2e_request_latency_seconds_bucket
- E2E Latency Heatmap -- Heatmap visualization of the latency distribution
- Time-To-First-Token Latency -- P50, P90, P99, and average TTFT from sglang_time_to_first_token_seconds_bucket
- TTFT Heatmap -- Heatmap visualization of TTFT distribution
- Num Running Requests -- Gauge showing sglang_num_running_reqs
- Token Generation Throughput -- Rate of sglang_token_usage_total in tokens/sec
- Cache Hit Rate -- Rate metric from sglang_cache_hit_rate
- Number Queued Requests -- Gauge showing sglang_num_queue_reqs
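The latency panels follow the usual Prometheus histogram pattern, which can be sketched as a small query builder. This is only an illustration: the function name and the `selector` parameter are hypothetical, and the average is computed here from the standard `_sum` and `_count` series of a histogram, which is an assumption about how the dashboard derives it.

```python
def latency_queries(metric: str, selector: str = "", window: str = "1m") -> dict[str, str]:
    """Build PromQL for P50/P90/P99 and the mean of a Prometheus histogram.

    `metric` is the base name without the _bucket/_sum/_count suffix.
    `selector` is an optional label-matcher body (hypothetical parameter).
    """
    labels = "{" + selector + "}" if selector else ""
    queries: dict[str, str] = {}
    for name, q in (("P50", 0.50), ("P90", 0.90), ("P99", 0.99)):
        queries[name] = (
            f"histogram_quantile({q:.2f}, "
            f"sum(rate({metric}_bucket{labels}[{window}])) by (le))"
        )
    # Mean = rate of summed observations divided by rate of observation count.
    queries["Avg"] = (
        f"sum(rate({metric}_sum{labels}[{window}])) / "
        f"sum(rate({metric}_count{labels}[{window}]))"
    )
    return queries


e2e = latency_queries("sglang_e2e_request_latency_seconds")
```

Calling the same function with `sglang_time_to_first_token_seconds` would give the corresponding TTFT expressions.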
Template Variables:
- instance: Filters by Prometheus instance label (multi-value, includes "All" option)
- model_name: Filters by model name label (multi-value, includes "All" option)
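When a multi-value variable is used inside a regex matcher such as `instance=~"$instance"`, Grafana's Prometheus data source renders the selected values as a pipe-separated group. The sketch below approximates that expansion; the function name is hypothetical, and Python's `re.escape` only stands in for Grafana's own escaping rules, which differ in detail.

```python
import re


def expand_multi_value(values: list[str]) -> str:
    """Approximate the text substituted for a multi-value variable in =~"..."."""
    escaped = [re.escape(v) for v in values]
    if len(escaped) == 1:
        return escaped[0]
    # Several selections become a regex alternation: (a|b|c)
    return "(" + "|".join(escaped) + ")"


# Hypothetical instance labels, used only for illustration.
matcher = f'instance=~"{expand_multi_value(["host1:30000", "host2:30000"])}"'
```

Selecting "All" typically behaves like selecting every available value, unless a custom "all" value is configured on the variable.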
Configuration:
- Auto-refresh every 5 seconds
- Default time range: last 30 minutes
- Datasource: Prometheus (using the prometheus UID)
- Dashboard UID: sglang-dashboard
- Panel layout: Grafana grid layout; a panel width of 12 (as in the first panel's gridPos) spans half of Grafana's 24-column dashboard width
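Grafana positions panels on a grid that is 24 columns wide, so a panel with `w: 12` takes half a row. A minimal sketch of a two-per-row layout follows, assuming every panel shares the 12x8 size seen in the first panel's `gridPos`; the real file may size individual panels differently, and the helper name is hypothetical.

```python
GRID_COLUMNS = 24  # width of a Grafana dashboard grid, in grid units


def grid_positions(n_panels: int, width: int = 12, height: int = 8) -> list[dict[str, int]]:
    """Return gridPos dicts that fill rows left to right, top to bottom."""
    per_row = GRID_COLUMNS // width
    return [
        {
            "h": height,
            "w": width,
            "x": (i % per_row) * width,
            "y": (i // per_row) * height,
        }
        for i in range(n_panels)
    ]


positions = grid_positions(8)  # 8 panels, two per row, four rows
```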
Usage
Import this JSON file into Grafana via the dashboard import feature or provision it using Grafana's file-based provisioning system. The SGLang server must be configured to export Prometheus metrics, and a Prometheus instance must be scraping the SGLang metrics endpoint.
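For imports through the HTTP API rather than the UI, Grafana's `POST /api/dashboards/db` endpoint expects the dashboard wrapped in an envelope instead of the raw file contents. The helper below is a sketch of building that request body; the function name is hypothetical, and clearing `id` is a common precaution so that Grafana assigns its own numeric id while the `uid` is kept.

```python
import json


def build_import_payload(dashboard: dict, overwrite: bool = True) -> str:
    """Wrap a dashboard model in the envelope used by POST /api/dashboards/db."""
    body = dict(dashboard)  # shallow copy so the caller's dict is untouched
    body["id"] = None       # let Grafana assign the numeric id
    return json.dumps({"dashboard": body, "overwrite": overwrite})


# Minimal stand-in for the real file contents, for illustration only.
payload = build_import_payload({"uid": "sglang-dashboard", "id": 7, "panels": []})
```

In practice the `dashboard` argument would come from `json.load` on sglang-dashboard.json, and the returned string would be sent as the request body with a `Content-Type: application/json` header.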
Code Reference
Source Location
- Repository: Sgl_project_Sglang
- File: examples/monitoring/grafana/dashboards/json/sglang-dashboard.json
- Lines: 1-984
Schema Structure
{
  "annotations": { "list": [...] },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "panels": [
    {
      "datasource": { "type": "prometheus", "uid": "prometheus" },
      "fieldConfig": { ... },
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
      "targets": [
        {
          "expr": "histogram_quantile(0.50, sum(rate(sglang_e2e_request_latency_seconds_bucket{...}[1m])) by (le))",
          "legendFormat": "P50"
        }
      ],
      "title": "End-to-End Request Latency",
      "type": "timeseries"
    }
  ],
  "templating": {
    "list": [
      { "name": "instance", "type": "query" },
      { "name": "model_name", "type": "query" }
    ]
  },
  "time": { "from": "now-30m", "to": "now" },
  "refresh": "5s",
  "uid": "sglang-dashboard"
}
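Given this structure, a dashboard file can be inspected programmatically before it is imported. The following sketch lists each panel's title, type, and PromQL expressions; the function name and the inline sample are illustrative, with the sample standing in for the parsed JSON file.

```python
def summarize_panels(dashboard: dict) -> list[tuple[str, str, list[str]]]:
    """Return (title, type, [expr, ...]) for every panel in a dashboard model."""
    rows = []
    for panel in dashboard.get("panels", []):
        exprs = [t["expr"] for t in panel.get("targets", []) if "expr" in t]
        rows.append((panel.get("title", ""), panel.get("type", ""), exprs))
    return rows


# Trimmed-down sample following the schema above (not the full file).
sample = {
    "panels": [
        {
            "title": "End-to-End Request Latency",
            "type": "timeseries",
            "targets": [{"expr": "up", "legendFormat": "P50"}],
        }
    ],
    "uid": "sglang-dashboard",
}
summary = summarize_panels(sample)
```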
Import
N/A -- This is a Grafana dashboard JSON file, not a code module. It is loaded through the Grafana UI, the HTTP API, or file-based provisioning.
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| sglang_e2e_request_latency_seconds_bucket | Prometheus histogram | Yes | End-to-end request latency distribution |
| sglang_time_to_first_token_seconds_bucket | Prometheus histogram | Yes | Time-to-first-token latency distribution |
| sglang_num_running_reqs | Prometheus gauge | Yes | Current count of running requests |
| sglang_token_usage_total | Prometheus counter | Yes | Total tokens generated (used for throughput rate) |
| sglang_cache_hit_rate | Prometheus gauge | Yes | Radix cache hit rate |
| sglang_num_queue_reqs | Prometheus gauge | Yes | Current count of queued requests |
| instance | template variable | No | Prometheus instance label filter |
| model_name | template variable | No | Model name label filter |
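Because every panel depends on one of the metrics listed above, it can help to confirm that the server actually exposes them before importing the dashboard. The sketch below scans text in the Prometheus exposition format for the required names; the function name is hypothetical, and the text would normally be fetched from the SGLang metrics endpoint.

```python
REQUIRED_METRICS = [
    "sglang_e2e_request_latency_seconds_bucket",
    "sglang_time_to_first_token_seconds_bucket",
    "sglang_num_running_reqs",
    "sglang_token_usage_total",
    "sglang_cache_hit_rate",
    "sglang_num_queue_reqs",
]


def missing_metrics(exposition_text: str, required: list[str] = REQUIRED_METRICS) -> list[str]:
    """Return the required metric names that do not appear in the scraped text."""
    present = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        present.add(name)
    return [m for m in required if m not in present]
```

An empty result means all six series were found; any names returned point to panels that would render without data.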
Outputs
| Name | Type | Description |
|---|---|---|
| Dashboard panels | Grafana visualizations | 8 panels showing latency, throughput, cache, and queue metrics |
| Template dropdowns | UI elements | Filters for instance and model name |
Usage Examples
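Import via Provisioning or the HTTP API
# Option 1: file-based provisioning. Copy the file into a directory that a
# dashboard provider config points at (the path below depends on your setup).
cp sglang-dashboard.json /etc/grafana/provisioning/dashboards/
# Option 2: HTTP API. The endpoint expects {"dashboard": ..., "overwrite": ...},
# so wrap the file first (assumes jq is installed and default admin credentials).
jq '{dashboard: (. + {id: null}), overwrite: true}' sglang-dashboard.json | \
  curl -X POST \
    -H "Content-Type: application/json" \
    -d @- \
    http://admin:admin@localhost:3000/api/dashboards/db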
PromQL Query for P99 E2E Latency
histogram_quantile(
  0.99,
  sum(
    rate(sglang_e2e_request_latency_seconds_bucket{
      instance=~"$instance",
      model_name=~"$model_name"
    }[1m])
  ) by (le)
)
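Outside Grafana, the `$instance` and `$model_name` placeholders are not understood by Prometheus and have to be replaced before the query is sent. The sketch below builds a URL for Prometheus's instant-query endpoint (`/api/v1/query`) with both variables replaced by `.*`; the base URL assumes a local Prometheus on its default port, and the function name is hypothetical.

```python
from urllib.parse import urlencode

P99_TEMPLATE = (
    "histogram_quantile(0.99, sum(rate("
    'sglang_e2e_request_latency_seconds_bucket{instance=~"$instance",'
    'model_name=~"$model_name"}[1m])) by (le))'
)


def instant_query_url(base_url: str, expr_template: str) -> str:
    """Replace the Grafana variables with '.*' and build an instant-query URL."""
    # Substitute the longer name first so "$instance" cannot clip it.
    expr = expr_template.replace("$model_name", ".*").replace("$instance", ".*")
    query_string = urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


# Assumed local Prometheus address, for illustration only.
url = instant_query_url("http://localhost:9090", P99_TEMPLATE)
```

The resulting URL can be fetched with any HTTP client; the response is JSON with the computed quantile under `data.result`.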