Implementation:Triton inference server Server Model Analyzer Analyze

Field	Value
Page Type	Implementation
Title	Model_Analyzer_Analyze
Namespace	Triton_inference_server_Server
Domains	Performance, Model_Serving, Optimization
External Dependencies	triton-model-analyzer pip package, wkhtmltopdf (for PDF report generation)
Last Updated	2026-02-13 17:00 GMT

Overview

A concrete CLI tool for analyzing profiling results and ranking model configurations. The `model-analyzer analyze` command processes checkpoint data from the profiling step, applies user-specified constraints, and produces a ranked table of configuration candidates, along with optional PDF and HTML reports.

Description

The analyze subcommand reads profiling checkpoint data and performs constraint-based filtering and ranking. It evaluates each profiled configuration against the specified constraints (latency budget, GPU memory limit, minimum throughput), eliminates non-compliant configurations, and ranks the remaining ones by throughput.
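The filter-then-rank behavior described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Model Analyzer's actual implementation: the `Measurement` class, its field names, and `filter_and_rank` are hypothetical, the constraint bounds are assumed to be inclusive, and the sample values are copied from the example output table on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Measurement:
    """One profiled configuration (hypothetical structure, for illustration only)."""
    config_name: str
    p99_latency_us: float   # p99 latency, reported in microseconds
    throughput: float       # inferences per second
    gpu_memory_mb: float    # GPU memory usage in megabytes


def filter_and_rank(
    measurements: List[Measurement],
    latency_budget_ms: Optional[float] = None,
    min_throughput: Optional[float] = None,
    max_gpu_memory_mb: Optional[float] = None,
    top_n: int = 3,
) -> List[Measurement]:
    """Drop configurations that violate any given constraint, then rank by throughput."""

    def complies(m: Measurement) -> bool:
        # The budget is given in milliseconds; latency is reported in microseconds.
        # Bounds are treated as inclusive here (an assumption).
        if latency_budget_ms is not None and m.p99_latency_us > latency_budget_ms * 1000:
            return False
        if min_throughput is not None and m.throughput < min_throughput:
            return False
        if max_gpu_memory_mb is not None and m.gpu_memory_mb > max_gpu_memory_mb:
            return False
        return True

    compliant = [m for m in measurements if complies(m)]
    # Highest throughput first; keep only the top N.
    return sorted(compliant, key=lambda m: m.throughput, reverse=True)[:top_n]


# Values copied from the example output table on this page.
SAMPLE = [
    Measurement("densenet_onnx_config_3", 8234, 892.4, 1248),
    Measurement("densenet_onnx_config_7", 11502, 875.1, 1456),
    Measurement("densenet_onnx_config_1", 6872, 843.7, 1680),
]

for m in filter_and_rank(SAMPLE, latency_budget_ms=10):
    print(m.config_name, m.p99_latency_us, m.throughput)
```

With a 10 ms budget, the configuration whose p99 latency is 11502 us is eliminated, even though its throughput is the second highest. Constraints remove candidates before ranking, so a faster but non-compliant configuration never appears in the table.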

The output includes:

A ranked table showing the top N configurations with their key metrics
Configuration details including max batch size, dynamic batching settings, instance count, p99 latency, throughput, and GPU memory usage
Optional PDF and HTML reports with detailed comparisons and visualizations

The analyze command can be run multiple times with different constraints on the same checkpoint data, enabling rapid exploration of trade-off scenarios without re-profiling.
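One way to explore several trade-off scenarios against the same checkpoints is to script the repeated invocations. The sketch below only assembles command lines from the flags listed in the CLI signature on this page. The helper names (`build_analyze_command`, `run_scenarios`) and the scenario values are hypothetical, and the script defaults to a dry run that prints the commands without executing them.

```python
import shlex
import shutil
import subprocess
from typing import Dict, List, Optional


def build_analyze_command(
    model: str,
    checkpoint_directory: Optional[str] = None,
    export_path: Optional[str] = None,
    top_n_configs: Optional[int] = None,
    latency_budget_ms: Optional[int] = None,
    min_throughput: Optional[int] = None,
    max_gpu_memory_mb: Optional[int] = None,
) -> List[str]:
    """Assemble an argv list using only the flags documented on this page."""
    optional = {
        "--checkpoint-directory": checkpoint_directory,
        "--export-path": export_path,
        "--top-n-configs": top_n_configs,
        "--latency-budget": latency_budget_ms,
        "--min-throughput": min_throughput,
        "--max-gpu-memory": max_gpu_memory_mb,
    }
    cmd = ["model-analyzer", "analyze", f"--analysis-models={model}"]
    # Omit any flag that was not set so the CLI default applies.
    cmd += [f"{flag}={value}" for flag, value in optional.items() if value is not None]
    return cmd


# Hypothetical scenarios; each one reuses the same checkpoint data.
SCENARIOS: List[Dict[str, int]] = [
    {"latency_budget_ms": 10},
    {"latency_budget_ms": 15, "max_gpu_memory_mb": 1500},
    {"min_throughput": 500},
]


def run_scenarios(model: str, scenarios: List[Dict[str, int]], dry_run: bool = True) -> List[List[str]]:
    """Print each command; execute it only when dry_run is False and the CLI is on PATH."""
    commands = []
    for scenario in scenarios:
        cmd = build_analyze_command(model, **scenario)
        commands.append(cmd)
        print(shlex.join(cmd))
        if not dry_run and shutil.which("model-analyzer"):
            subprocess.run(cmd, check=True)
    return commands


run_scenarios("densenet_onnx", SCENARIOS)  # dry run: prints the commands only
```

Passing `dry_run=False` would execute each command in turn. Giving each scenario its own `export_path` keeps the generated reports from overwriting one another.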

Usage

CLI Signature

model-analyzer analyze \
  --analysis-models=<model_name> \
  [--checkpoint-directory=<path>] \
  [--export-path=<path>] \
  [--top-n-configs=<int>] \
  [--latency-budget=<ms>] \
  [--min-throughput=<infer/sec>] \
  [--max-gpu-memory=<MB>]

Key Parameters

Parameter	Description	Default
`--analysis-models`	Model name(s) to analyze from checkpoint data	(required)
`--checkpoint-directory`	Path to directory containing profiling checkpoints	./checkpoints
`--export-path`	Path to export analysis reports (PDF/HTML)	./export
`--top-n-configs`	Number of top configurations to display	3
`--latency-budget`	Maximum acceptable p99 latency in milliseconds	None (no constraint)
`--min-throughput`	Minimum acceptable throughput in inferences per second	None (no constraint)
`--max-gpu-memory`	Maximum GPU memory usage in megabytes	None (no constraint)

Code Reference

Source Location

docs/user_guide/performance_tuning.md:L332-393 -- Model Analyzer analyze command documentation, output format, and constraint usage

Import / Installation

# Install Model Analyzer via pip (if not already installed during profiling)
pip install triton-model-analyzer

# Install wkhtmltopdf for PDF report generation (optional; Debian/Ubuntu, may require root privileges)
apt-get install -y wkhtmltopdf
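Before running an analysis, it can help to confirm that both executables are reachable. The following is a small sketch using only the Python standard library. The function name `check_prerequisites` is hypothetical, and the check only verifies that the commands are on `PATH`, not that they are compatible versions.

```python
import shutil
from typing import Callable, Dict, Optional


def check_prerequisites(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, bool]:
    """Report whether the CLI and the optional PDF dependency are on PATH."""
    return {
        "model-analyzer": which("model-analyzer") is not None,
        "wkhtmltopdf": which("wkhtmltopdf") is not None,
    }


status = check_prerequisites()
if not status["model-analyzer"]:
    print("model-analyzer not found; install it with: pip install triton-model-analyzer")
if not status["wkhtmltopdf"]:
    print("wkhtmltopdf not found; PDF report generation will not be available")
```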

I/O Contract

Inputs

Input	Type	Required	Description
Analysis model name	String	Yes	Name of the model to analyze (must have been profiled previously)
Checkpoint directory	Directory path	No	Path to profiling checkpoint data (defaults to ./checkpoints)
Latency budget	Integer (ms)	No	Maximum p99 latency constraint in milliseconds
Min throughput	Integer (infer/sec)	No	Minimum throughput constraint in inferences per second
Max GPU memory	Integer (MB)	No	Maximum GPU memory constraint in megabytes

Outputs

Output	Type	Description
Ranked config table	Text (stdout)	Table with columns: Config Name, Max Batch Size, Dynamic Batching, Instance Count, p99 Latency (us), Throughput (infer/sec), GPU Memory (MB)
PDF report	File (PDF)	Detailed comparative report with charts (requires wkhtmltopdf)
HTML report	File (HTML)	Detailed comparative report with charts
Export directory	Directory	Contains all generated reports and summary files

Usage Examples

Example 1: Basic analysis with default settings

Analyze profiling results for densenet_onnx and show the top 3 configurations:

model-analyzer analyze --analysis-models=densenet_onnx

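Illustrative output (actual configuration names and metric values depend on the model, hardware, and profiling data):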

+-----------------------------+--------------------+---------------------+------------------+------------------+--------------------+--------------+
| Config Name                 | Max Batch Size     | Dynamic Batching    | Instance Count   | p99 Latency (us) | Throughput (inf/s) | GPU Mem (MB) |
+-----------------------------+--------------------+---------------------+------------------+------------------+--------------------+--------------+
| densenet_onnx_config_3      | 8                  | preferred=[4,8]     | 2                | 8234             | 892.4              | 1248         |
| densenet_onnx_config_7      | 16                 | preferred=[8,16]    | 2                | 11502            | 875.1              | 1456         |
| densenet_onnx_config_1      | 4                  | preferred=[2,4]     | 3                | 6872             | 843.7              | 1680         |
+-----------------------------+--------------------+---------------------+------------------+------------------+--------------------+--------------+
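If the ranked table needs to be consumed by a script, it can be parsed from captured stdout. The sketch below assumes the output matches the bordered layout shown above exactly; the real format may differ between versions, and `parse_ranked_table` is a hypothetical helper, not part of Model Analyzer.

```python
from typing import Dict, List

# Sample text copied from the illustrative table on this page.
SAMPLE_OUTPUT = """\
+-----------------------------+--------------------+---------------------+------------------+------------------+--------------------+--------------+
| Config Name                 | Max Batch Size     | Dynamic Batching    | Instance Count   | p99 Latency (us) | Throughput (inf/s) | GPU Mem (MB) |
+-----------------------------+--------------------+---------------------+------------------+------------------+--------------------+--------------+
| densenet_onnx_config_3      | 8                  | preferred=[4,8]     | 2                | 8234             | 892.4              | 1248         |
| densenet_onnx_config_7      | 16                 | preferred=[8,16]    | 2                | 11502            | 875.1              | 1456         |
| densenet_onnx_config_1      | 4                  | preferred=[2,4]     | 3                | 6872             | 843.7              | 1680         |
+-----------------------------+--------------------+---------------------+------------------+------------------+--------------------+--------------+
"""


def parse_ranked_table(text: str) -> List[Dict[str, str]]:
    """Turn a '|'-delimited table into a list of {column name: cell text} dicts."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip '+---+' border lines and anything that is not a table row.
        if not line.startswith("|"):
            continue
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, *body = rows  # the first row holds the column names
    return [dict(zip(header, cells)) for cells in body]


configs = parse_ranked_table(SAMPLE_OUTPUT)
best = configs[0]  # rows are already ranked, so the first one is the top candidate
print(best["Config Name"], best["Throughput (inf/s)"])
```

Cell values are kept as strings; convert them with `int()` or `float()` as needed, since columns such as Dynamic Batching are not numeric.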

Example 2: Analysis with latency constraint

Show up to five top-ranked configurations that meet a 10 ms (10000 us) p99 latency budget:

model-analyzer analyze \
  --analysis-models=densenet_onnx \
  --latency-budget=10 \
  --top-n-configs=5

Example 3: Analysis with multiple constraints

Apply latency, GPU memory, and throughput constraints together, and export the reports to a custom directory:

model-analyzer analyze \
  --analysis-models=densenet_onnx \
  --latency-budget=15 \
  --max-gpu-memory=1500 \
  --min-throughput=500 \
  --top-n-configs=3 \
  --export-path=./analysis_reports

Example 4: Analysis with custom checkpoint directory

Analyze results from a specific checkpoint directory:

model-analyzer analyze \
  --analysis-models=densenet_onnx \
  --checkpoint-directory=/data/profiling_checkpoints \
  --export-path=./reports \
  --top-n-configs=10

Related Pages

Implements: Principle: Performance_Analysis -- implements::Principle:Triton_inference_server_Server_Performance_Analysis
