Implementation:Sgl project Sglang GPU Trace To Graph
| Knowledge Sources | |
|---|---|
| Domains | Profiling, GPU Performance, Visualization |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A Python tool that processes NVIDIA Nsight Systems GPU trace files (.nsys-rep) and generates kernel-level performance summaries with stacked bar chart visualizations of GPU vs non-GPU time.
Description
gputrc2graph.py provides the GPUTrace2Graph class that implements a complete profiling analysis pipeline:
Step 1 -- Trace Extraction: Calls nsys stats -r cuda_gpu_trace to extract a CSV of all CUDA GPU kernel invocations from the .nsys-rep file. Estimates processing time based on file size (calibrated at 240 MB/min).
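The extraction step can be sketched as follows. This is a minimal illustration, not the tool's code: the helper names `estimate_minutes` and `build_stats_command` are hypothetical, a megabyte is assumed to be 1024 * 1024 bytes, and the command list contains only the arguments named above (the real invocation presumably also passes output and format options, which are omitted here).

```python
MB_PER_MINUTE = 240  # calibration figure quoted in Step 1


def estimate_minutes(size_bytes: int, mb_per_minute: float = MB_PER_MINUTE) -> float:
    """Rough processing-time estimate, in minutes, from the trace file size."""
    size_mb = size_bytes / (1024 * 1024)  # assumption: binary megabytes
    return size_mb / mb_per_minute


def build_stats_command(nsys_cmd: str, rep_file: str) -> list:
    """Argument list for the cuda_gpu_trace report (hypothetical helper)."""
    return [nsys_cmd, "stats", "-r", "cuda_gpu_trace", rep_file]


if __name__ == "__main__":
    # A 480 MB trace is estimated at about two minutes of processing.
    print(estimate_minutes(480 * 1024 * 1024))
    print(build_stats_command("nsys", "d1.nsys-rep"))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to `subprocess.run`.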
Step 2 -- Non-Overlapped Time Computation: The sum_non_overlapping_intervals method sorts all kernel intervals by start time and computes the actual wall-clock (non-overlapped) elapsed time for each kernel. Overlapping intervals are clipped to avoid double-counting concurrent GPU operations.
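The clipping idea in Step 2 can be illustrated in plain Python. This is a sketch under stated assumptions rather than the actual `sum_non_overlapping_intervals` method, which operates on a pandas DataFrame: the function name, the `(name, start, duration)` tuple layout, and the choice to credit overlapped time to the kernel that started earlier are all assumptions made for illustration.

```python
from collections import defaultdict


def non_overlapped_time(intervals):
    """Sum per-kernel time, counting each instant of GPU activity only once.

    intervals: iterable of (kernel_name, start, duration) tuples.
    Returns a dict mapping kernel_name to its non-overlapped time.
    """
    totals = defaultdict(int)
    covered_until = None  # end of the time range already accounted for
    for name, start, duration in sorted(intervals, key=lambda iv: iv[1]):
        end = start + duration
        if covered_until is None or start >= covered_until:
            # No overlap with earlier kernels: count the full duration.
            totals[name] += duration
            covered_until = end
        elif end > covered_until:
            # Partial overlap: count only the part past the covered range.
            totals[name] += end - covered_until
            covered_until = end
        else:
            # Fully hidden behind earlier kernels: contributes nothing.
            totals[name] += 0
    return dict(totals)


if __name__ == "__main__":
    trace = [("gemm", 0, 10), ("attn", 5, 10), ("nccl", 6, 2)]
    print(non_overlapped_time(trace))
```

With this attribution rule, the per-kernel totals add up to the length of the union of all intervals, so concurrent kernels are never double-counted.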
Step 3 -- Kernel Classification: The anno_gpu_kernname method annotates each kernel with a category (e.g., gemm, attention, MoE, NCCL) using regex patterns loaded from JSON configuration files via load_engine_model. Patterns are organized by engine (sglang, vllm, etc.) and model (llama, gpt-oss, etc.).
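A regex-based classifier of the kind described in Step 3 might look like the sketch below. The mapping, its patterns, the `"other"` fallback, and the function name `classify_kernel` are hypothetical; the real patterns live in the JSON configuration files and are not reproduced here. The standard-library `re` module is used to keep the example self-contained, whereas the tool itself imports `regex as re`.

```python
import re

# Hypothetical category -> pattern mapping for one engine/model pair.
# The real patterns come from the JSON files read by load_engine_model.
HYPOTHETICAL_MAPPING = {
    "gemm": r"gemm|cutlass",
    "attention": r"flash|attn",
    "nccl": r"nccl",
}


def classify_kernel(kernel_name, mapping, default="other"):
    """Return the first category whose pattern matches the kernel name."""
    for category, pattern in mapping.items():
        if re.search(pattern, kernel_name, flags=re.IGNORECASE):
            return category
    return default  # assumption: unmatched kernels fall into a catch-all


if __name__ == "__main__":
    for name in ("sm90_xmma_gemm_bf16", "flash_fwd_kernel", "elementwise_add"):
        print(name, "->", classify_kernel(name, HYPOTHETICAL_MAPPING))
```

Because the first matching pattern wins, the order of entries in the mapping matters when patterns overlap.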
Step 4 -- Visualization: The make_html method generates:
- A Plotly stacked bar chart (result.html) showing elapsed time per category per model/engine
- A CSV file (result.csv) mapping kernels to categories with elapsed times
- A pivot table appended to the HTML showing time per category across model/engine configurations
The tool also calculates non-GPU (CPU) time by subtracting the total non-overlapped GPU time from the profiled elapsed time (the elapsed_sec value supplied with each input file).
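The non-GPU calculation is a single subtraction, sketched below. The function name `add_non_gpu_row`, the `"non-gpu"` label, and the clamp at zero are assumptions for illustration; the actual `make_nongpu_row` method works on a DataFrame and may behave differently.

```python
def add_non_gpu_row(category_seconds, profiled_elapsed_sec):
    """Return a copy of the per-category GPU times plus a non-GPU entry.

    category_seconds: dict mapping category name to GPU seconds.
    profiled_elapsed_sec: total elapsed time of the profiled run, in seconds.
    """
    gpu_total = sum(category_seconds.values())
    result = dict(category_seconds)
    # Assumption: clamp at zero so rounding noise cannot yield a negative bar.
    result["non-gpu"] = max(profiled_elapsed_sec - gpu_total, 0.0)
    return result


if __name__ == "__main__":
    print(add_non_gpu_row({"gemm": 60.0, "attention": 25.0}, 100.0))
```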
Usage
Run it as a command-line tool, passing one or more .nsys-rep files, each with its engine, model, and elapsed-time metadata. Developers use it to identify GPU kernel time bottlenecks and to compare performance across engines and models.
Code Reference
Source Location
- Repository: Sgl_project_Sglang
- File: examples/profiler/nsys_profile_tools/gputrc2graph.py
- Lines: 1-346
Signature
def load_engine_model() -> dict
class GPUTrace2Graph:
    def __init__(self)
    def gen_nonoverlapped_sum_from_gputrace(self, in_file: str, out_file: str)
    def sum_non_overlapping_intervals(self, df: pd.DataFrame) -> pd.DataFrame
    def make_html(self, df: pd.DataFrame, output_dir: str, title: str)
    def anno_gpu_kernname(self, df: pd.DataFrame, mapping: dict)
    def make_nongpu_row(self, df: pd.DataFrame, nongpu_sec: float) -> pd.DataFrame
    def is_valid_file(self, base_file: str)
    def should_gen_file(self, new_file: str, base_file: str) -> bool
    def gen_sum_file(self, file: str, nsys_cmd: str) -> str
    def gen_graph(self, in_file: list, out_dir: str, title: str, nsys_cmd: str, engine_model: dict)
def parse_tuple(s: str) -> tuple
def main()
Import
import argparse
import logging
import os
import shlex
import regex as re
# Lazy imports:
# import pandas as pd
# import plotly.express as px
# import glob, json, subprocess
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --in_file | tuple(s) | Yes | One or more space-separated entries, each a comma-separated nsys-rep,engine,model,elapsed_sec tuple |
| --out_dir | string | No | Output directory for result.csv and result.html |
| --title | string | No | Title for the HTML chart |
| --nsys_cmd | string | No | Path to nsys binary (default: "nsys") |
| *.json config files | JSON | Yes | Kernel category regex patterns in the same directory as the script |
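Each --in_file entry is a comma-separated string, so parsing it could look like the sketch below. The function name `parse_in_file_entry`, the conversion of elapsed_sec to `float`, and the error message are assumptions; the tool's own `parse_tuple` may return different types or validate differently.

```python
def parse_in_file_entry(entry):
    """Split 'file.nsys-rep,engine,model,elapsed_sec' into a typed tuple."""
    parts = entry.split(",")
    if len(parts) != 4:
        raise ValueError(
            "expected nsys-rep,engine,model,elapsed_sec but got: " + repr(entry)
        )
    path, engine, model, elapsed = parts
    # Assumption: elapsed_sec is numeric and is treated as seconds.
    return path, engine, model, float(elapsed)


if __name__ == "__main__":
    print(parse_in_file_entry("d1.nsys-rep,sglang,llama,100"))
```

A function of this shape can be passed as the `type=` argument of `argparse.ArgumentParser.add_argument` together with `nargs="+"` to accept several entries.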
Outputs
| Name | Type | Description |
|---|---|---|
| result.html | HTML file | Plotly stacked bar chart and pivot table showing GPU time per category |
| result.csv | CSV file | Kernel-to-category mapping with elapsed time and instance counts |
| *_cuda_gpu_trace.csv | CSV file | Raw GPU trace extracted by nsys stats |
| *_cuda_gpu_kernel_tracesum.csv | CSV file | Per-kernel non-overlapped time summary |
Usage Examples
Basic Single Profile Analysis
python gputrc2graph.py \
--in_file d1.nsys-rep,sglang,llama,100 \
--out_dir results/ \
--title "Llama 70B on SGLang"
Multi-Profile Comparison
python gputrc2graph.py \
--in_file d1.nsys-rep,sglang,llama,100 d2.nsys-rep,sglang,gpt-oss,102 \
--out_dir results/ \
--title "Model=gpt-oss SGLANG chart"
Custom nsys Path
python gputrc2graph.py \
--in_file trace.nsys-rep,sglang,llama,50 \
--nsys_cmd /usr/local/bin/nsys