Implementation:Triton inference server Server TraceSummary
| Knowledge Sources | |
|---|---|
| Domains | Testing, Tracing |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Parses and summarizes Triton trace output files, computing latency statistics for each stage of the inference pipeline.
Description
The `trace_summary.py` script reads JSON trace files generated by Triton's `--trace-file` option and computes summary statistics (minimum, maximum, average, and percentiles) for each traced activity: request handling, queue time, compute input, compute execution, and compute output. It groups trace spans by model name and produces a human-readable summary report. QA tests that validate tracing functionality use the script, as do developers analyzing inference performance.
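As a rough illustration of the statistics described above, the sketch below computes min, max, average, and nearest-rank percentiles for a list of durations. The names `percentile` and `latency_stats` are hypothetical and are not taken from `trace_summary.py`; the script may use a different percentile definition.

```python
def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list.

    pct is an integer in 1..100. Integer arithmetic avoids
    floating-point surprises when computing the rank.
    """
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # ceiling division
    return sorted_values[rank - 1]


def latency_stats(durations):
    """Summarize durations (any consistent unit) for one activity.

    Hypothetical helper: the key names mirror the statistics listed
    in this page, not a confirmed output format of the script.
    """
    values = sorted(durations)
    if not values:
        raise ValueError("no durations to summarize")
    stats = {
        "count": len(values),
        "min": values[0],
        "max": values[-1],
        "avg": sum(values) / len(values),
    }
    for pct in (50, 90, 95, 99):
        stats[f"p{pct}"] = percentile(values, pct)
    return stats
```

Nearest-rank is only one of several common percentile conventions; interpolating variants give slightly different values on small samples.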
Usage
Run this script after collecting trace output from a Triton server session to generate a latency summary. Pass the trace file path as an argument and optionally filter by model name or adjust display options.
Code Reference
Source Location
- Repository: Triton Inference Server
- File: qa/common/trace_summary.py
- Lines: 1-507
Signature
def parse_trace_file(trace_file):
    """Parse a Triton JSON trace file into structured trace records."""
def summarize_traces(traces, show_all=False):
    """Compute min/max/avg/p50/p90/p95/p99 latency per activity."""
def main():
    """Entry point: parse args, read trace file, output summary."""
Import
# Typically run as a standalone script
python qa/common/trace_summary.py trace_output.json
# Or imported as a module
import trace_summary
traces = trace_summary.parse_trace_file("trace_output.json")
summary = trace_summary.summarize_traces(traces)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| trace_file | string | Yes | Path to the Triton JSON trace output file |
| show_all | bool | No | Show all individual trace records in addition to summary |
| model_name | string | No | Filter traces to a specific model name |
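The inputs above can be pictured with a minimal sketch of loading a trace file and filtering by model. It assumes the file holds a single JSON array whose records share an `id`, with some records carrying `model_name` and others a `timestamps` list of `{"name", "ns"}` entries. That layout, and the helper names `load_trace_records`, `merge_by_id`, and `filter_by_model`, are assumptions for illustration rather than the script's actual internals.

```python
import json


def load_trace_records(path):
    """Read a trace file assumed to contain one JSON array of records."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def merge_by_id(records):
    """Fold all records sharing an "id" into one dict per trace."""
    traces = {}
    for record in records:
        trace = traces.setdefault(
            record["id"],
            {"id": record["id"], "model_name": None, "timestamps": {}},
        )
        if "model_name" in record:
            trace["model_name"] = record["model_name"]
        # Assumed shape: [{"name": "<EVENT>", "ns": <int>}, ...]
        for stamp in record.get("timestamps", []):
            trace["timestamps"][stamp["name"]] = stamp["ns"]
    return list(traces.values())


def filter_by_model(traces, model_name=None):
    """Keep only traces for model_name; None keeps every trace."""
    if model_name is None:
        return traces
    return [t for t in traces if t["model_name"] == model_name]
```

A caller would chain the three steps, for example `filter_by_model(merge_by_id(load_trace_records(path)), "simple")`, where `"simple"` stands in for whatever model name appears in the trace.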
Outputs
| Name | Type | Description |
|---|---|---|
| summary_report | stdout | Formatted table of latency statistics per pipeline stage |
| traces | list[dict] | Parsed trace records when used as a library (programmatic access) |
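To make the `summary_report` output more concrete, here is a small sketch that renders a per-stage summary as a fixed-width text table. The function name `format_report`, the column set, and the layout are assumptions; the report actually printed by the script may look different.

```python
def format_report(summary):
    """Render {stage: {stat: value}} as fixed-width text.

    Hypothetical layout: one header row, then one row per stage,
    in the insertion order of the summary dict.
    """
    columns = ["min", "max", "avg", "p50", "p90", "p95", "p99"]
    header = f"{'stage':<16}" + "".join(f"{c:>10}" for c in columns)
    lines = [header]
    for stage, stats in summary.items():
        cells = "".join(f"{stats[c]:>10.1f}" for c in columns)
        lines.append(f"{stage:<16}{cells}")
    return "\n".join(lines)
```

Printing the returned string with `print(format_report(summary))` sends the table to stdout, matching the output type listed in the table above.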
Usage Examples
Summarize a Trace File
python qa/common/trace_summary.py /tmp/triton_trace.json
Programmatic Usage in Tests
import trace_summary
traces = trace_summary.parse_trace_file("/tmp/triton_trace.json")
summary = trace_summary.summarize_traces(traces)
assert summary["COMPUTE_INFER"]["avg"] < 10000  # average under 10 ms, assuming latencies are reported in microseconds
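The assertion above keys the summary by a stage name such as `COMPUTE_INFER`. The sketch below shows one way such per-stage durations could be derived from a trace's nanosecond timestamps and converted to microseconds. The stage labels, the timestamp names paired with them, and the helper `stage_durations_us` are assumptions made for illustration; check them against the trace file and the script before relying on them.

```python
# Hypothetical stage table: (stage label, start timestamp, end timestamp).
STAGES = [
    ("QUEUE", "QUEUE_START", "COMPUTE_START"),
    ("COMPUTE_INPUT", "COMPUTE_START", "COMPUTE_INPUT_END"),
    ("COMPUTE_INFER", "COMPUTE_INPUT_END", "COMPUTE_OUTPUT_START"),
    ("COMPUTE_OUTPUT", "COMPUTE_OUTPUT_START", "COMPUTE_END"),
]


def stage_durations_us(timestamps_ns):
    """Map stage label -> duration in microseconds for one trace.

    timestamps_ns maps timestamp name -> nanosecond value. A stage is
    skipped when either of its boundary timestamps is missing.
    """
    durations = {}
    for label, start, end in STAGES:
        if start in timestamps_ns and end in timestamps_ns:
            durations[label] = (timestamps_ns[end] - timestamps_ns[start]) / 1000.0
    return durations
```

Collecting these per-trace dictionaries into one list of durations per stage gives the input that a summary step would reduce to min, max, average, and percentiles.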