Implementation:Triton inference server Server TraceSummary

From Leeroopedia
Knowledge Sources
Domains Testing, Tracing
Last Updated 2026-02-13 17:00 GMT

Overview

Parses and summarizes Triton trace output files, computing latency statistics for each stage of the inference pipeline.

Description

The `trace_summary.py` script reads the JSON trace files that Triton writes when the `--trace-file` option is set. For each traced activity (request handling, queue time, compute input, compute execution, and compute output) it computes summary statistics: minimum, maximum, average, and percentiles. It groups trace spans by model name and prints a human-readable report. QA tests use the script to validate tracing functionality, and developers use it to analyze inference latency.
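
To make the per-stage timings concrete, the sketch below derives span durations from trace timestamps. It is not the script's actual code: the record layout (entries sharing an `id`, each with an optional `timestamps` list of `name`/`ns` pairs) follows the shape shown in Triton's trace documentation, while the helper names `merge_records` and `span_durations` and the exact start/end pairs chosen for each span are assumptions made for illustration.

```python
import json
from collections import defaultdict

# Hand-written sample; the values are made up, not real server output.
SAMPLE = """[
  {"id": 1, "model_name": "simple", "model_version": 1},
  {"id": 1, "timestamps": [
    {"name": "REQUEST_START", "ns": 1000},
    {"name": "QUEUE_START", "ns": 1200},
    {"name": "COMPUTE_START", "ns": 1500},
    {"name": "COMPUTE_INPUT_END", "ns": 1700},
    {"name": "COMPUTE_OUTPUT_START", "ns": 2700},
    {"name": "COMPUTE_END", "ns": 2900},
    {"name": "REQUEST_END", "ns": 3000}
  ]}
]"""

# Assumed mapping: (span name, start timestamp, end timestamp).
SPANS = [
    ("REQUEST", "REQUEST_START", "REQUEST_END"),
    ("QUEUE", "QUEUE_START", "COMPUTE_START"),
    ("COMPUTE_INPUT", "COMPUTE_START", "COMPUTE_INPUT_END"),
    ("COMPUTE_INFER", "COMPUTE_INPUT_END", "COMPUTE_OUTPUT_START"),
    ("COMPUTE_OUTPUT", "COMPUTE_OUTPUT_START", "COMPUTE_END"),
]


def merge_records(records):
    """Combine all entries that share a trace id into one record."""
    merged = defaultdict(lambda: {"timestamps": {}})
    for rec in records:
        entry = merged[rec["id"]]
        if "model_name" in rec:
            entry["model_name"] = rec["model_name"]
        for ts in rec.get("timestamps", []):
            entry["timestamps"][ts["name"]] = ts["ns"]
    return dict(merged)


def span_durations(timestamps):
    """Return {span name: duration in ns} for spans with both endpoints."""
    durations = {}
    for name, start, end in SPANS:
        if start in timestamps and end in timestamps:
            durations[name] = timestamps[end] - timestamps[start]
    return durations


traces = merge_records(json.loads(SAMPLE))
durations = span_durations(traces[1]["timestamps"])
print(durations)
```

Spans whose start or end timestamp is missing are skipped rather than reported as zero, so an incomplete trace does not skew the statistics.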

Usage

Run this script after collecting trace output from a Triton server session to generate a latency summary. Pass the trace file path as an argument and optionally filter by model name or adjust display options.

Code Reference

Source Location

Signature

def parse_trace_file(trace_file):
    """Parse a Triton JSON trace file into structured trace records."""

def summarize_traces(traces, show_all=False):
    """Compute min/max/avg/p50/p90/p95/p99 latency per activity."""

def main():
    """Entry point: parse args, read trace file, output summary."""

Import

# Typically run as a standalone script (shell command)
python qa/common/trace_summary.py trace_output.json

# Or imported as a module (Python; qa/common must be on sys.path)
import trace_summary
traces = trace_summary.parse_trace_file("trace_output.json")
summary = trace_summary.summarize_traces(traces)
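
The statistics listed in the `summarize_traces` docstring (min, max, avg, p50, p90, p95, p99) can be computed from a list of durations as in the sketch below. The page does not say which percentile method the script uses, so the nearest-rank method here is an assumption, and `percentile` and `latency_stats` are hypothetical helper names.

```python
def percentile(sorted_vals, pct):
    """Nearest-rank percentile of an ascending, non-empty list.

    Integer arithmetic avoids floating-point rounding in the rank.
    """
    rank = max(1, (pct * len(sorted_vals) + 99) // 100)  # ceil(pct * n / 100)
    return sorted_vals[rank - 1]


def latency_stats(values):
    """Return min/max/avg and p50/p90/p95/p99 for a non-empty sequence."""
    vals = sorted(values)
    stats = {
        "min": vals[0],
        "max": vals[-1],
        "avg": sum(vals) / len(vals),
    }
    for pct in (50, 90, 95, 99):
        stats[f"p{pct}"] = percentile(vals, pct)
    return stats


# Durations 1..100 make the percentiles easy to check by eye.
stats = latency_stats(range(1, 101))
print(stats)
```

Nearest-rank always returns a value that was actually observed, which is convenient for latency reports; an interpolating method would give slightly different numbers on small samples.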

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| trace_file | string | Yes | Path to the Triton JSON trace output file |
| show_all | bool | No | Show all individual trace records in addition to the summary |
| model_name | string | No | Filter traces to a specific model name |
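
As an illustration of the `model_name` input, the sketch below filters parsed records by model. It assumes each record is a dict carrying a `model_name` key; `filter_by_model` is a hypothetical helper, not a function the script is known to expose.

```python
def filter_by_model(traces, model_name=None):
    """Return the traces for model_name, or all traces when it is None."""
    if model_name is None:
        return list(traces)
    return [t for t in traces if t.get("model_name") == model_name]


# Made-up records for demonstration.
traces = [
    {"id": 1, "model_name": "simple"},
    {"id": 2, "model_name": "resnet50"},
    {"id": 3, "model_name": "simple"},
]
selected = filter_by_model(traces, "simple")
print([t["id"] for t in selected])
```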

Outputs

| Name | Type | Description |
|------|------|-------------|
| summary_report | stdout | Formatted table of latency statistics per pipeline stage |
| traces | list[dict] | Parsed trace records when used as a library (programmatic access) |
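
A per-stage report of this kind could be rendered as in the sketch below. The column choice, the widths, and the conversion from nanoseconds to microseconds are assumptions for illustration; the script's real output layout is not reproduced here, and `format_report` is a hypothetical name.

```python
def format_report(summary, unit_divisor=1000):
    """Render per-stage stats (given in ns) as aligned text columns (in us)."""
    columns = ("min", "avg", "p99", "max")
    header = f"{'stage':<16}" + "".join(f"{c + ' (us)':>12}" for c in columns)
    lines = [header]
    for stage, stats in summary.items():
        cells = "".join(f"{stats[c] / unit_divisor:>12.1f}" for c in columns)
        lines.append(f"{stage:<16}{cells}")
    return "\n".join(lines)


# Made-up statistics in nanoseconds.
summary = {
    "QUEUE": {"min": 300, "avg": 450.0, "p99": 900, "max": 1000},
    "COMPUTE_INFER": {"min": 1000, "avg": 1500.0, "p99": 2400, "max": 2500},
}
report = format_report(summary)
print(report)
```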

Usage Examples

Summarize a Trace File

python qa/common/trace_summary.py /tmp/triton_trace.json

Programmatic Usage in Tests

import trace_summary
traces = trace_summary.parse_trace_file("/tmp/triton_trace.json")
summary = trace_summary.summarize_traces(traces)
assert summary["COMPUTE_INFER"]["avg"] < 10000  # avg < 10 ms, assuming the summary reports microseconds

Related Pages
