Implementation:Sgl project Sglang Grafana Dashboard Config

From Leeroopedia


Knowledge Sources
Domains Monitoring, Observability, Grafana
Last Updated 2026-02-10 00:00 GMT

Overview

A Grafana dashboard JSON definition that provides comprehensive monitoring panels for SGLang server metrics, using Prometheus as the data source with PromQL queries against sglang_* metrics.

Description

sglang-dashboard.json defines a Grafana dashboard with 8 monitoring panels arranged in a grid layout. The panels cover request latency, time-to-first-token latency, token throughput, cache hit rate, and running and queued request counts, and the dashboard is intended for production monitoring.

Dashboard Panels:

  1. End-to-End Request Latency -- Timeseries chart showing P50, P90, and P99 latency (computed with histogram_quantile over sglang_e2e_request_latency_seconds_bucket) alongside the average latency
  2. E2E Latency Heatmap -- Heatmap visualization of the latency distribution
  3. Time-To-First-Token Latency -- P50, P90, P99, and average TTFT from sglang_time_to_first_token_seconds_bucket
  4. TTFT Heatmap -- Heatmap visualization of TTFT distribution
  5. Num Running Requests -- Gauge showing sglang_num_running_reqs
  6. Token Generation Throughput -- Rate of sglang_token_usage_total in tokens/sec
  7. Cache Hit Rate -- Current value of the sglang_cache_hit_rate gauge
  8. Number Queued Requests -- Gauge showing sglang_num_queue_reqs
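
The percentile panels above rely on histogram_quantile, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The following is a simplified Python sketch of that idea, not Prometheus's exact implementation; the bucket bounds and counts are made-up illustration values.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch of the interpolation behind PromQL's
    histogram_quantile; edge cases are handled only roughly.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Linear interpolation between the bucket's lower and upper bound.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative bucket values for an E2E latency histogram (seconds).
example = [(0.5, 10.0), (1.0, 60.0), (2.0, 90.0), (math.inf, 100.0)]
p50 = histogram_quantile(0.50, example)
p99 = histogram_quantile(0.99, example)
```

One consequence visible in the sketch: a quantile that lands in the +Inf bucket is reported as the largest finite bucket bound, so P99 values that sit exactly on a bucket boundary may indicate that the histogram buckets are too coarse for the observed latencies.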

Template Variables:

  • instance: Filters by Prometheus instance label (multi-value, includes "All" option)
  • model_name: Filters by model name label (multi-value, includes "All" option)
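
To illustrate how multi-value variables interact with the =~ regex matchers used in the panel queries, here is a rough Python sketch of the substitution. It is an approximation for illustration only: Grafana's real interpolation also escapes regex metacharacters, which is omitted here, and the function names and instance values are hypothetical.

```python
def expand_variable(selected, all_values):
    """Return the regex substituted for a multi-value variable.

    "All" is modelled here as an alternation of every known value;
    escaping of regex metacharacters is deliberately left out.
    """
    values = all_values if selected == ["All"] else selected
    return "|".join(values)


def render(expr, variables):
    """Replace each $name placeholder in expr with its expanded regex."""
    for name, regex in variables.items():
        expr = expr.replace("$" + name, regex)
    return expr


# Hypothetical label values, assumed for this example only.
known_instances = ["host-a:30000", "host-b:30000"]
rendered = render(
    'sglang_num_running_reqs{instance=~"$instance"}',
    {"instance": expand_variable(["All"], known_instances)},
)
```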

Configuration:

  • Auto-refresh every 5 seconds
  • Default time range: last 30 minutes
  • Datasource: Prometheus (using the prometheus UID)
  • Dashboard UID: sglang-dashboard
  • Panel layout: positions set through gridPos; the panel shown in the schema below is 12 units wide, which is half of Grafana's 24-column grid
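
As a sketch of how such a layout could be generated, the Python snippet below computes gridPos values for 8 panels. Grafana's dashboard grid is 24 columns wide; the two-panels-per-row arrangement and the uniform height of 8 are assumptions extrapolated from the single gridPos shown in the schema, not confirmed values for every panel.

```python
GRID_COLUMNS = 24  # width of Grafana's dashboard grid


def grid_positions(n_panels, w=12, h=8):
    """Lay panels out left to right, wrapping to a new row when the grid is full."""
    per_row = GRID_COLUMNS // w
    return [
        {"h": h, "w": w, "x": (i % per_row) * w, "y": (i // per_row) * h}
        for i in range(n_panels)
    ]


positions = grid_positions(8)
```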

Usage

Import this JSON file into Grafana via the dashboard import feature or provision it using Grafana's file-based provisioning system. The SGLang server must be configured to export Prometheus metrics, and a Prometheus instance must be scraping the SGLang metrics endpoint.

Code Reference

Source Location

Schema Structure

{
    "annotations": { "list": [...] },
    "editable": true,
    "fiscalYearStartMonth": 0,
    "graphTooltip": 0,
    "panels": [
        {
            "datasource": { "type": "prometheus", "uid": "prometheus" },
            "fieldConfig": { ... },
            "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
            "targets": [
                {
                    "expr": "histogram_quantile(0.50, sum(rate(sglang_e2e_request_latency_seconds_bucket{...}[1m])) by (le))",
                    "legendFormat": "P50"
                }
            ],
            "title": "End-to-End Request Latency",
            "type": "timeseries"
        }
    ],
    "templating": {
        "list": [
            { "name": "instance", "type": "query" },
            { "name": "model_name", "type": "query" }
        ]
    },
    "time": { "from": "now-30m", "to": "now" },
    "refresh": "5s",
    "uid": "sglang-dashboard"
}
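
For a quick look at what a dashboard file of this shape contains, the panels and their queries can be listed with a few lines of Python. This is a minimal sketch; summarize_panels is a hypothetical helper, and the inline sample is a reduced stand-in for the real file, which would normally be read with json.load.

```python
import json


def summarize_panels(dashboard):
    """Return (title, type, [PromQL expressions]) for each panel."""
    summary = []
    for panel in dashboard.get("panels", []):
        exprs = [t["expr"] for t in panel.get("targets", []) if "expr" in t]
        summary.append((panel.get("title"), panel.get("type"), exprs))
    return summary


# Reduced stand-in for sglang-dashboard.json, for illustration only.
sample = json.loads("""
{
    "panels": [
        {
            "title": "Num Running Requests",
            "type": "timeseries",
            "targets": [{"expr": "sglang_num_running_reqs", "legendFormat": "running"}]
        }
    ],
    "refresh": "5s",
    "uid": "sglang-dashboard"
}
""")
panel_summary = summarize_panels(sample)
```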

Import

N/A -- This is a Grafana dashboard JSON file, not a code module. It is loaded through the Grafana UI, file-based provisioning, or the HTTP API.

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| sglang_e2e_request_latency_seconds_bucket | Prometheus histogram | Yes | End-to-end request latency distribution |
| sglang_time_to_first_token_seconds_bucket | Prometheus histogram | Yes | Time-to-first-token latency distribution |
| sglang_num_running_reqs | Prometheus gauge | Yes | Current count of running requests |
| sglang_token_usage_total | Prometheus counter | Yes | Total tokens generated (used for throughput rate) |
| sglang_cache_hit_rate | Prometheus gauge | Yes | Radix cache hit rate |
| sglang_num_queue_reqs | Prometheus gauge | Yes | Current count of queued requests |
| instance | template variable | No | Prometheus instance label filter |
| model_name | template variable | No | Model name label filter |
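
Before importing the dashboard, it can help to confirm that the metrics endpoint actually exposes the required inputs. The sketch below parses text in the Prometheus exposition format and reports which of the metric names listed above are missing. The sample text and helper names are assumptions made for illustration; in practice the text would come from the server's metrics endpoint.

```python
REQUIRED_METRICS = [
    "sglang_e2e_request_latency_seconds_bucket",
    "sglang_time_to_first_token_seconds_bucket",
    "sglang_num_running_reqs",
    "sglang_token_usage_total",
    "sglang_cache_hit_rate",
    "sglang_num_queue_reqs",
]


def exposed_metric_names(text):
    """Collect sample names from Prometheus text exposition output."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The sample name ends at the first "{" (labels) or space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(text, required=REQUIRED_METRICS):
    """Return the required metric names that do not appear in the text."""
    present = exposed_metric_names(text)
    return [name for name in required if name not in present]


# Made-up scrape output covering only two of the six required metrics.
sample_scrape = """\
# TYPE sglang_num_running_reqs gauge
sglang_num_running_reqs{model_name="demo"} 3
sglang_cache_hit_rate{model_name="demo"} 0.42
"""
missing = missing_metrics(sample_scrape)
```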

Outputs

| Name | Type | Description |
| --- | --- | --- |
| Dashboard panels | Grafana visualizations | 8 panels showing latency, throughput, cache, and queue metrics |
| Template dropdowns | UI elements | Filters for instance and model name |

Usage Examples

Import via Provisioning or the HTTP API

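# Copy to a directory watched by a Grafana dashboard provider
# (requires a provider YAML whose options.path points at this directory)
cp sglang-dashboard.json /etc/grafana/provisioning/dashboards/

# Or import via the Grafana HTTP API, which expects the dashboard
# JSON wrapped in a "dashboard" key rather than posted as-is
jq '{dashboard: ., overwrite: true}' sglang-dashboard.json |
  curl -X POST \
    -H "Content-Type: application/json" \
    -d @- \
    http://admin:admin@localhost:3000/api/dashboards/db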
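
The same API import can be scripted in Python. The /api/dashboards/db endpoint expects the dashboard nested under a "dashboard" key, optionally with an "overwrite" flag, so the file contents need wrapping first. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the URL and token passed to post_dashboard are placeholders to be supplied by the caller.

```python
import json
import urllib.request


def build_import_payload(dashboard, overwrite=True):
    """Wrap a dashboard definition in the envelope the import API expects."""
    dashboard = dict(dashboard)
    dashboard["id"] = None  # null id lets Grafana assign one; the uid identifies the dashboard
    return {"dashboard": dashboard, "overwrite": overwrite}


def post_dashboard(base_url, payload, token):
    """POST the payload to Grafana. Not invoked here; requires a running server."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Reduced stand-in for the contents of sglang-dashboard.json.
payload = build_import_payload({"id": 7, "uid": "sglang-dashboard", "panels": []})
```

Setting overwrite to true makes repeated imports replace the existing dashboard with the same uid instead of failing, which suits automated deployments.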

PromQL Query for P99 E2E Latency

histogram_quantile(
    0.99,
    sum(
        rate(sglang_e2e_request_latency_seconds_bucket{
            instance=~"$instance",
            model_name=~"$model_name"
        }[1m])
    ) by (le)
)
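
The P50, P90, and P99 series differ only in the quantile argument, so the queries can be generated from one template. The sketch below builds single-line equivalents of the query above together with the legendFormat values seen in the schema. The helper names are hypothetical, and the 1m rate window is taken from the example query.

```python
E2E_BUCKET = "sglang_e2e_request_latency_seconds_bucket"
SELECTOR = 'instance=~"$instance", model_name=~"$model_name"'


def quantile_query(q, metric, window="1m"):
    """Build a histogram_quantile expression for one quantile."""
    return (
        f"histogram_quantile({q}, "
        f"sum(rate({metric}{{{SELECTOR}}}[{window}])) by (le))"
    )


def quantile_targets(metric, quantiles):
    """Build Grafana-style targets, one per named quantile."""
    return [
        {"expr": quantile_query(q, metric), "legendFormat": legend}
        for legend, q in quantiles.items()
    ]


targets = quantile_targets(E2E_BUCKET, {"P50": 0.5, "P90": 0.9, "P99": 0.99})
```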
