Implementation:Sgl project Sglang Grafana Dashboard Config

Knowledge Sources	Sgl_project_Sglang
Domains	Monitoring, Observability, Grafana
Last Updated	2026-02-10 00:00 GMT

Overview

A Grafana dashboard JSON definition with monitoring panels for SGLang server metrics. It uses Prometheus as the data source and queries the sglang_* metrics with PromQL.

Description

sglang-dashboard.json defines a Grafana dashboard with 8 monitoring panels arranged on a grid. It refreshes every 5 seconds and covers request latency, time-to-first-token, running and queued requests, token throughput, and cache hit rate. It is intended for production monitoring.

Dashboard Panels:

End-to-End Request Latency -- Timeseries chart of P50, P90, and P99 latency (computed with histogram_quantile over sglang_e2e_request_latency_seconds_bucket), plus average latency
E2E Latency Heatmap -- Heatmap visualization of the latency distribution
Time-To-First-Token Latency -- P50, P90, and P99 TTFT (histogram_quantile over sglang_time_to_first_token_seconds_bucket), plus average TTFT
TTFT Heatmap -- Heatmap visualization of TTFT distribution
Num Running Requests -- Gauge showing sglang_num_running_reqs
Token Generation Throughput -- Rate of sglang_token_usage_total in tokens/sec
Cache Hit Rate -- Radix cache hit rate, plotted from the sglang_cache_hit_rate gauge
Number Queued Requests -- Gauge showing sglang_num_queue_reqs
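The latency panels follow the usual Prometheus histogram pattern. The hypothetical Python helpers below generate Grafana target entries for one latency panel. The names latency_targets, quantile_expr and average_expr are illustrative, not part of SGLang or Grafana.

- The quantile expressions mirror the P99 query shown on this page.
- The average expression (rate of _sum divided by rate of _count) is an assumption about how the average series is built, because histogram_quantile cannot produce a mean.

```python
# Hypothetical helper that builds Grafana "targets" for a latency panel.
# Quantile queries mirror this page; the _sum/_count average is an assumption.

LABELS = 'instance=~"$instance", model_name=~"$model_name"'  # template-variable filters


def quantile_expr(metric: str, q: float, window: str = "1m") -> str:
    """q-quantile from per-second bucket rates, summed across series per `le`."""
    return (
        f"histogram_quantile({q}, "
        f"sum(rate({metric}_bucket{{{LABELS}}}[{window}])) by (le))"
    )


def average_expr(metric: str, window: str = "1m") -> str:
    """Mean: rate of the observed total divided by rate of the observation count."""
    return (
        f"sum(rate({metric}_sum{{{LABELS}}}[{window}])) / "
        f"sum(rate({metric}_count{{{LABELS}}}[{window}]))"
    )


def latency_targets(metric: str) -> list:
    """P50/P90/P99 quantile targets followed by an average target."""
    targets = [
        {"expr": quantile_expr(metric, q), "legendFormat": name}
        for name, q in (("P50", 0.5), ("P90", 0.9), ("P99", 0.99))
    ]
    targets.append({"expr": average_expr(metric), "legendFormat": "Avg"})
    return targets


if __name__ == "__main__":
    for target in latency_targets("sglang_time_to_first_token_seconds"):
        print(target["legendFormat"], target["expr"])
```

Passing the histogram base name, without the _bucket suffix, lets the same helper serve both the E2E and the TTFT panels.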

Template Variables:

instance: Filters by Prometheus instance label (multi-value, includes "All" option)
model_name: Filters by model name label (multi-value, includes "All" option)
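Both variables are multi-value and include "All", so queries match them with =~ (regex) rather than =. The Python sketch below approximates what Grafana substitutes into label=~"$var". It is a simplified model, not Grafana's code:

- Several selections become an (a|b) alternation.
- "All" either expands to every option or, if a custom all-value is configured, is replaced by that value verbatim.
- Regex metacharacters are escaped with a doubled backslash so they survive PromQL string parsing.

The function names are hypothetical. Whether this dashboard sets a custom all-value is not stated here.

```python
import re
from typing import List, Optional

ALL = "$__all"  # value Grafana uses internally for the "All" option


def promql_regex_escape(value: str) -> str:
    # re.escape puts one backslash before regex metacharacters; inside a PromQL
    # double-quoted string that backslash must itself be escaped, so double it.
    return re.escape(value).replace("\\", "\\\\")


def interpolate_regex_var(selected: List[str], options: List[str],
                          all_value: Optional[str] = None) -> str:
    """Approximate the text substituted for $var inside label=~"$var"."""
    if ALL in selected:
        if all_value is not None:
            return all_value  # a custom "All" value such as ".*" is used verbatim
        selected = options    # otherwise "All" expands to every option
    escaped = [promql_regex_escape(v) for v in selected]
    if len(escaped) == 1:
        return escaped[0]
    return "(" + "|".join(escaped) + ")"


def matcher(label: str, selected: List[str], options: List[str],
            all_value: Optional[str] = None) -> str:
    return f'{label}=~"{interpolate_regex_var(selected, options, all_value)}"'


print(matcher("model_name", ["llama", "qwen"], ["llama", "qwen", "mistral"]))
# model_name=~"(llama|qwen)"
```

Because the matcher is a regex, one query string works for a single selection, several selections, and "All".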

Configuration:

Auto-refresh every 5 seconds
Default time range: last 30 minutes
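Datasource: Prometheus, referenced by the datasource UID prometheus (Grafana must have a Prometheus data source with that UID, or the UIDs must be edited before import)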
Dashboard UID: sglang-dashboard
Panel layout: Grafana's 24-column grid; the first panel's gridPos (w: 12, h: 8) spans half the dashboard width
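Grafana positions each panel with gridPos on a grid 24 units wide. Panels 12 units wide therefore sit two per row, and each row advances y by the panel height.

The Python sketch below computes such a layout. It assumes, for illustration only, that all 8 panels share the first panel's size (w: 12, h: 8) and appear in the order listed above. The real gridPos values in sglang-dashboard.json may differ, and grid_layout is a hypothetical helper.

```python
GRID_COLUMNS = 24  # Grafana's dashboard grid is 24 units wide


def grid_layout(titles, width=12, height=8):
    """Place equally sized panels left to right, wrapping to a new row when full."""
    if not 1 <= width <= GRID_COLUMNS:
        raise ValueError(f"width must be between 1 and {GRID_COLUMNS}")
    per_row = GRID_COLUMNS // width
    panels = []
    for index, title in enumerate(titles):
        row, col = divmod(index, per_row)
        panels.append({
            "title": title,
            "gridPos": {"h": height, "w": width, "x": col * width, "y": row * height},
        })
    return panels


TITLES = [
    "End-to-End Request Latency", "E2E Latency Heatmap",
    "Time-To-First-Token Latency", "TTFT Heatmap",
    "Num Running Requests", "Token Generation Throughput",
    "Cache Hit Rate", "Number Queued Requests",
]

for panel in grid_layout(TITLES):
    print(panel["gridPos"], panel["title"])
```

Under these assumptions the 8 panels form 4 rows. The last panel sits at x = 12, y = 24.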

Usage

Import this JSON file through Grafana's dashboard import UI or HTTP API, or provision it with Grafana's file-based dashboard provisioning. The panels stay empty unless the SGLang server exports Prometheus metrics (launched with --enable-metrics) and a Prometheus instance scrapes the server's metrics endpoint.

Code Reference

Source Location

Repository: Sgl_project_Sglang
File: examples/monitoring/grafana/dashboards/json/sglang-dashboard.json
Lines: 1-984

Schema Structure

{
    "annotations": { "list": [...] },
    "editable": true,
    "fiscalYearStartMonth": 0,
    "graphTooltip": 0,
    "panels": [
        {
            "datasource": { "type": "prometheus", "uid": "prometheus" },
            "fieldConfig": { ... },
            "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
            "targets": [
                {
                    "expr": "histogram_quantile(0.50, sum(rate(sglang_e2e_request_latency_seconds_bucket{...}[1m])) by (le))",
                    "legendFormat": "P50"
                }
            ],
            "title": "End-to-End Request Latency",
            "type": "timeseries"
        }
    ],
    "templating": {
        "list": [
            { "name": "instance", "type": "query" },
            { "name": "model_name", "type": "query" }
        ]
    },
    "time": { "from": "now-30m", "to": "now" },
    "refresh": "5s",
    "uid": "sglang-dashboard"
}
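The Python sketch below walks a dashboard model with the structure outlined above. For each panel it lists the title, the visualization type and the PromQL expressions, and it reports the top-level settings (uid, refresh, time range, variables).

It relies only on the field names shown in the schema, plus Grafana's convention that collapsed "row" panels nest their children under panels. The helper names are hypothetical.

```python
import json
import sys
from pathlib import Path


def load_dashboard(path):
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_panels(dashboard):
    """Return (title, type, [PromQL expr, ...]) for each panel, descending into rows."""
    summary = []

    def walk(panels):
        for panel in panels:
            if panel.get("type") == "row":
                walk(panel.get("panels", []))  # collapsed rows nest their children
                continue
            exprs = [t["expr"] for t in panel.get("targets", []) if "expr" in t]
            summary.append((panel.get("title", ""), panel.get("type", ""), exprs))

    walk(dashboard.get("panels", []))
    return summary


def dashboard_settings(dashboard):
    """Top-level fields controlling identity, refresh, default time range, variables."""
    return {
        "uid": dashboard.get("uid"),
        "refresh": dashboard.get("refresh"),
        "time": dashboard.get("time"),
        "variables": [v.get("name") for v in dashboard.get("templating", {}).get("list", [])],
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    dash = load_dashboard(sys.argv[1])  # e.g. sglang-dashboard.json
    print(dashboard_settings(dash))
    for title, kind, exprs in summarize_panels(dash):
        print(f"{kind:12} {title} ({len(exprs)} queries)")
```

Run against the real file, it should report uid sglang-dashboard, refresh 5s and the instance and model_name variables.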

Import

N/A -- This is a Grafana dashboard JSON file, imported through the Grafana UI, the Grafana HTTP API, or file-based provisioning.

I/O Contract

Inputs

Name	Type	Required	Description
sglang_e2e_request_latency_seconds_bucket	Prometheus histogram	Yes	End-to-end request latency distribution
sglang_time_to_first_token_seconds_bucket	Prometheus histogram	Yes	Time-to-first-token latency distribution
sglang_num_running_reqs	Prometheus gauge	Yes	Current count of running requests
sglang_token_usage_total	Prometheus counter	Yes	Total tokens generated (used for throughput rate)
sglang_cache_hit_rate	Prometheus gauge	Yes	Radix cache hit rate
sglang_num_queue_reqs	Prometheus gauge	Yes	Current count of queued requests
instance	template variable	No	Prometheus instance label filter
model_name	template variable	No	Model name label filter
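Every metric in the Inputs table must be present, or the matching panel shows no data. The Python sketch below fetches a Prometheus text-format exposition and reports which of those metric names have no samples.

The URL and helper names are assumptions, and the required names are copied from the table. If your server's exporter names a metric differently, the check flags it as missing, which is exactly the mismatch that leaves a panel empty.

```python
import re
import sys
import urllib.request

# Metric names from the Inputs table above.
REQUIRED_METRICS = [
    "sglang_e2e_request_latency_seconds_bucket",
    "sglang_time_to_first_token_seconds_bucket",
    "sglang_num_running_reqs",
    "sglang_token_usage_total",
    "sglang_cache_hit_rate",
    "sglang_num_queue_reqs",
]

# A sample line starts with the metric name, followed by "{labels}" or whitespace.
_METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def sample_names(exposition: str) -> set:
    """Distinct metric names that have at least one sample in text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        match = _METRIC_NAME.match(line)
        if match:
            names.add(match.group(0))
    return names


def missing_metrics(exposition: str, required=REQUIRED_METRICS) -> list:
    present = sample_names(exposition)
    return [name for name in required if name not in present]


def fetch_exposition(url: str) -> str:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. http://localhost:30000/metrics (assumes SGLang's default port 30000)
    missing = missing_metrics(fetch_exposition(sys.argv[1]))
    print("missing:", missing or "none")
```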

Outputs

Name	Type	Description
Dashboard panels	Grafana visualizations	8 panels showing latency, throughput, cache, and queue metrics
Template dropdowns	UI elements	Filters for instance and model name

Usage Examples

Import via File Provisioning or the HTTP API

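# File-based provisioning: /etc/grafana/provisioning/dashboards/ holds provider
# YAML files; copy the JSON into the directory a provider's options.path points
# to (assumed here to be /var/lib/grafana/dashboards)
cp sglang-dashboard.json /var/lib/grafana/dashboards/

# Or import via the Grafana HTTP API, which expects the dashboard wrapped in
# {"dashboard": ..., "overwrite": ...}; jq builds the wrapper and clears "id"
jq '{dashboard: (. + {id: null}), overwrite: true}' sglang-dashboard.json \
  | curl -X POST \
      -H "Content-Type: application/json" \
      -u admin:admin \
      --data-binary @- \
      http://localhost:3000/api/dashboards/db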
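Grafana's POST /api/dashboards/db endpoint takes the dashboard model inside a "dashboard" field, together with an "overwrite" flag, rather than the raw file contents. The Python sketch below builds that payload and posts it with basic auth, using only the standard library.

The default admin/admin credentials and localhost:3000 URL come from the example above. The helper names are hypothetical.

```python
import base64
import json
import sys
import urllib.request
from pathlib import Path


def build_import_payload(dashboard: dict, overwrite: bool = True) -> dict:
    """Wrap a dashboard model the way POST /api/dashboards/db expects."""
    # id=None marks it as new to this Grafana instance; with overwrite=True an
    # existing dashboard with the same uid (sglang-dashboard) is replaced.
    return {"dashboard": {**dashboard, "id": None}, "overwrite": overwrite}


def post_dashboard(grafana_url: str, dashboard: dict,
                   user: str = "admin", password: str = "admin") -> dict:
    body = json.dumps(build_import_payload(dashboard)).encode("utf-8")
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        grafana_url.rstrip("/") + "/api/dashboards/db",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {credentials}"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__" and len(sys.argv) > 1:
    model = json.loads(Path(sys.argv[1]).read_text(encoding="utf-8"))
    print(post_dashboard("http://localhost:3000", model))
```

Because the stable uid identifies the dashboard, re-running the import updates the existing copy instead of creating duplicates.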

PromQL Query for P99 E2E Latency

histogram_quantile(
    0.99,
    sum(
        rate(sglang_e2e_request_latency_seconds_bucket{
            instance=~"$instance",
            model_name=~"$model_name"
        }[1m])
    ) by (le)
)
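To show what this query returns, the Python sketch below re-implements the interpolation scheme that Prometheus documents for histogram_quantile:

- Find the bucket whose cumulative count first reaches q times the total.
- Interpolate linearly inside it, treating the lowest bucket as starting at 0.
- If the rank lands in the +Inf bucket, return the highest finite bound.

The sum(rate(...)) by (le) step still yields cumulative per-le values, so the same function applies to them. This is a simplified illustration, not Prometheus' code. The bucket boundaries and counts in the example are made up.

```python
import bisect
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound,
    ending with the +Inf bucket, like the `le` series of a Prometheus histogram.
    """
    if q < 0:
        return -math.inf
    if q > 1:
        return math.inf
    if len(buckets) < 2 or buckets[-1][0] != math.inf or buckets[-1][1] == 0:
        return math.nan  # not enough data to interpolate
    rank = q * buckets[-1][1]              # target position among all observations
    counts = [count for _, count in buckets]
    b = bisect.bisect_left(counts, rank)   # first bucket whose cumulative count >= rank
    if b == len(buckets) - 1:
        return buckets[-2][0]              # rank lands in +Inf: clamp to highest finite bound
    upper, count = buckets[b]
    if b == 0:
        if upper <= 0:
            return upper
        lower, below = 0.0, 0              # lowest bucket is assumed to start at 0
    else:
        lower, below = buckets[b - 1]
    if count == below:
        return lower                       # empty bucket (only reachable when q == 0)
    # Linear interpolation inside the bucket that contains the rank.
    return lower + (upper - lower) * (rank - below) / (count - below)


BUCKETS = [(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]  # illustrative values
for q in (0.5, 0.9, 0.99):
    print(q, histogram_quantile(q, BUCKETS))
```

With these buckets P50 is about 0.42 s, while both P90 and P99 come out as 1.0 s. The P99 rank lands in the +Inf bucket and is clamped, so the dashboard's tail percentiles can be no finer than the histogram's bucket boundaries.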
