Implementation:Sgl project Sglang GPU Trace To Graph

From Leeroopedia


Knowledge Sources
Domains Profiling, GPU Performance, Visualization
Last Updated 2026-02-10 00:00 GMT

Overview

A Python tool that processes NVIDIA Nsight Systems GPU trace files (.nsys-rep) and generates kernel-level performance summaries with stacked bar chart visualizations of GPU vs non-GPU time.

Description

gputrc2graph.py provides the GPUTrace2Graph class that implements a complete profiling analysis pipeline:

Step 1 -- Trace Extraction: Calls nsys stats -r cuda_gpu_trace to extract a CSV of all CUDA GPU kernel invocations from the .nsys-rep file. Estimates processing time based on file size (calibrated at 240 MB/min).
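As a rough illustration of Step 1, the sketch below builds an nsys stats command line and estimates processing time from the file size. The helper names (estimate_minutes, build_trace_cmd) are hypothetical and not taken from gputrc2graph.py. Treating 1 MB as 1024 * 1024 bytes, and using the --format and -o options, are assumptions about how the extraction could be set up.

```python
import os
import shlex

# Calibration figure quoted in the description above.
MB_PER_MIN = 240


def estimate_minutes(size_bytes: int) -> float:
    """Estimate processing time in minutes (assumes 1 MB = 1024 * 1024 bytes)."""
    return size_bytes / (1024 * 1024) / MB_PER_MIN


def build_trace_cmd(nsys_cmd: str, rep_file: str) -> list[str]:
    """Build (but do not run) an nsys stats command for the cuda_gpu_trace report.

    Hypothetical helper: the output base name is derived from the input file,
    so nsys would write <base>_cuda_gpu_trace.csv next to it.
    """
    out_base = os.path.splitext(rep_file)[0]
    return shlex.split(nsys_cmd) + [
        "stats",
        "-r", "cuda_gpu_trace",
        "--format", "csv",
        "-o", out_base,
        rep_file,
    ]
```

The resulting list could be passed to subprocess.run(cmd, check=True) once nsys is installed. The sketch stops short of that so it stays runnable without the Nsight Systems toolchain.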

Step 2 -- Non-Overlapped Time Computation: The sum_non_overlapping_intervals method sorts all kernel intervals by start time and computes the actual wall-clock (non-overlapped) elapsed time for each kernel. Overlapping intervals are clipped to avoid double-counting concurrent GPU operations.
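The clipping idea in Step 2 can be sketched in plain Python as follows. This is a minimal illustration rather than the tool's pandas implementation. The function name and the (name, start, end) tuple layout are assumptions made for the example.

```python
from collections import defaultdict


def non_overlapped_time(intervals):
    """Sum per-kernel time, counting each instant of GPU activity only once.

    intervals: iterable of (kernel_name, start, end) tuples (hypothetical layout).
    Intervals are sorted by start time. Any part of an interval that falls
    before the latest end seen so far is clipped away, so concurrent kernels
    are not double-counted.
    """
    totals = defaultdict(float)
    covered_until = float("-inf")  # latest end time already accounted for
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        clipped_start = max(start, covered_until)
        totals[name] += max(0.0, end - clipped_start)
        covered_until = max(covered_until, end)
    return dict(totals)
```

With this scheme, overlapping time is attributed to whichever kernel started first. A kernel that runs entirely inside an earlier kernel's interval contributes zero additional time.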

Step 3 -- Kernel Classification: The anno_gpu_kernname method annotates each kernel with a category (e.g., gemm, attention, MoE, NCCL) using regex patterns loaded from JSON configuration files via load_engine_model. Patterns are organized by engine (sglang, vllm, etc.) and model (llama, gpt-oss, etc.).
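A minimal sketch of the regex-based classification in Step 3 is shown below. The nested engine/model/pattern layout, the example patterns, and the "other" fallback category are all assumptions for illustration. The actual JSON files shipped with the tool may be structured differently. The sketch uses the standard-library re module.

```python
import re

# Hypothetical configuration: engine -> model -> {regex pattern: category}.
EXAMPLE_PATTERNS = {
    "sglang": {
        "llama": {
            "gemm|cutlass": "gemm",
            "flash|attention": "attention",
            "nccl": "nccl",
        }
    }
}


def classify_kernel(kernel_name, mapping, default="other"):
    """Return the category of the first pattern that matches the kernel name."""
    for pattern, category in mapping.items():
        if re.search(pattern, kernel_name, flags=re.IGNORECASE):
            return category
    return default  # assumed fallback for kernels no pattern matches
```

Because the first matching pattern wins, the order of entries in the mapping matters in this sketch. More specific patterns would need to come before broader ones.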

Step 4 -- Visualization: The make_html method generates:

  • A Plotly stacked bar chart (result.html) showing elapsed time per category per model/engine
  • A CSV file (result.csv) mapping kernels to categories with elapsed times
  • A pivot table appended to the HTML showing time per category across model/engine configurations
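The pivot table in the last bullet can be approximated without pandas, as in the sketch below. The row layout (configuration label, category, seconds) and the function name are assumptions for illustration. The tool itself works on a DataFrame and renders the chart with Plotly.

```python
from collections import defaultdict


def pivot_by_category(rows):
    """Aggregate seconds into a {category: {config: seconds}} table.

    rows: iterable of (config, category, seconds) tuples, where config is a
    hypothetical label such as "llama_sglang".
    """
    table = defaultdict(lambda: defaultdict(float))
    for config, category, seconds in rows:
        table[category][config] += seconds
    # Convert nested defaultdicts to plain dicts for easier printing/comparison.
    return {category: dict(cols) for category, cols in table.items()}
```

Each inner dictionary corresponds to one row of the pivot table, and the same aggregated values are what a stacked bar chart would plot as one coloured segment per category.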

The tool also calculates non-GPU (CPU) time by subtracting the total non-overlapped GPU time from the elapsed time supplied for each profile (elapsed_sec).
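The subtraction can be written out as a small helper. The function name is hypothetical, and clamping a negative result to zero is an assumption of this sketch, not documented behaviour of make_nongpu_row.

```python
def non_gpu_seconds(elapsed_sec, gpu_sec_by_category):
    """Return elapsed time not covered by GPU kernels.

    elapsed_sec: elapsed time supplied alongside the trace file.
    gpu_sec_by_category: {category: non-overlapped GPU seconds}.
    """
    gpu_total = sum(gpu_sec_by_category.values())
    # Assumption: never report negative non-GPU time.
    return max(0.0, elapsed_sec - gpu_total)
```

For example, a profile with 100 seconds elapsed and 85 seconds of non-overlapped GPU time leaves 15 seconds attributed to non-GPU work.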

Usage

Run as a command-line tool, providing one or more .nsys-rep files with associated engine/model/timing metadata. Developers use it to identify GPU kernel time bottlenecks and to compare performance across different engines and models.

Code Reference

Source Location

Signature

def load_engine_model() -> dict

class GPUTrace2Graph:
    def __init__(self)
    def gen_nonoverlapped_sum_from_gputrace(self, in_file: str, out_file: str)
    def sum_non_overlapping_intervals(self, df: pd.DataFrame) -> pd.DataFrame
    def make_html(self, df: pd.DataFrame, output_dir: str, title: str)
    def anno_gpu_kernname(self, df: pd.DataFrame, mapping: dict)
    def make_nongpu_row(self, df: pd.DataFrame, nongpu_sec: float) -> pd.DataFrame
    def is_valid_file(self, base_file: str)
    def should_gen_file(self, new_file: str, base_file: str) -> bool
    def gen_sum_file(self, file: str, nsys_cmd: str) -> str
    def gen_graph(self, in_file: list, out_dir: str, title: str, nsys_cmd: str, engine_model: dict)

def parse_tuple(s: str) -> tuple
def main()

Import

import argparse
import logging
import os
import shlex
import regex as re
# Lazy imports:
# import pandas as pd
# import plotly.express as px
# import glob, json, subprocess

I/O Contract

Inputs

Name | Type | Required | Description
--in_file | tuple(s) | Yes | Space-separated list of (nsys-rep,engine,model,elapsed_sec) tuples
--out_dir | string | No | Output directory for result.csv and result.html
--title | string | No | Title for the HTML chart
--nsys_cmd | string | No | Path to nsys binary (default: "nsys")
*.json config files | JSON | Yes | Kernel category regex patterns in the same directory as the script
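Each --in_file value is a comma-separated tuple. The sketch below shows one way such a value could be parsed. The function name parse_in_file_arg is hypothetical, and the validation rules are assumptions. The tool's own parse_tuple may behave differently.

```python
def parse_in_file_arg(s):
    """Parse 'file.nsys-rep,engine,model,elapsed_sec' into a typed tuple."""
    parts = s.split(",")
    if len(parts) != 4:
        raise ValueError(
            f"expected nsys-rep,engine,model,elapsed_sec but got: {s!r}"
        )
    path, engine, model, elapsed = (p.strip() for p in parts)
    # elapsed_sec is converted to float so it can be used in arithmetic later.
    return path, engine, model, float(elapsed)
```

A function like this could be supplied as the type= callable of an argparse argument with nargs="+", which would yield one parsed tuple per space-separated value.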

Outputs

Name | Type | Description
result.html | HTML file | Plotly stacked bar chart and pivot table showing GPU time per category
result.csv | CSV file | Kernel-to-category mapping with elapsed time and instance counts
*_cuda_gpu_trace.csv | CSV file | Raw GPU trace extracted by nsys stats
*_cuda_gpu_kernel_tracesum.csv | CSV file | Per-kernel non-overlapped time summary

Usage Examples

Basic Single Profile Analysis

python gputrc2graph.py \
    --in_file d1.nsys-rep,sglang,llama,100 \
    --out_dir results/ \
    --title "Llama 70B on SGLang"

Multi-Profile Comparison

python gputrc2graph.py \
    --in_file d1.nsys-rep,sglang,llama,100 d2.nsys-rep,sglang,gpt-oss,102 \
    --out_dir results/ \
    --title "Model=gpt-oss SGLANG chart"

Custom nsys Path

python gputrc2graph.py \
    --in_file trace.nsys-rep,sglang,llama,50 \
    --nsys_cmd /usr/local/bin/nsys
