Implementation:Triton inference server Server Model Analyzer Analyze
| Field | Value |
|---|---|
| Page Type | Implementation |
| Title | Model_Analyzer_Analyze |
| Namespace | Triton_inference_server_Server |
| Domains | Performance, Model_Serving, Optimization |
| External Dependencies | triton-model-analyzer pip package, wkhtmltopdf (for PDF report generation) |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Command-line interface for analyzing profiling results and ranking model configurations. The model-analyzer analyze command reads the checkpoint data written during the profiling step, applies user-specified constraints, and produces a ranked table of candidate configurations, along with optional PDF and HTML reports.
Description
The analyze subcommand reads profiling checkpoint data and performs constraint-based filtering and ranking. It evaluates each profiled configuration against the specified constraints (latency budget, GPU memory limit, minimum throughput), eliminates configurations that violate any of them, and ranks the remaining configurations by throughput, highest first.
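The filter-then-rank step can be sketched in shell with awk. This is a minimal sketch of the logic described above, not Model Analyzer's implementation. It assumes a hypothetical whitespace-separated input with one configuration per line (name, p99 latency in microseconds, throughput in infer/sec, GPU memory in MB), and the helper name rank_configs is illustrative. An empty argument means "no constraint", mirroring the CLI defaults.

```shell
#!/usr/bin/env bash
# rank_configs (hypothetical helper): drop configurations that violate any
# constraint that was set, then rank the survivors by throughput, highest first.
# Args: LATENCY_BUDGET_MS MIN_THROUGHPUT MAX_GPU_MEMORY_MB [TOP_N]
#   An empty string for a constraint means "no constraint".
# Stdin (assumed format): name p99_latency_us throughput_infer_per_sec gpu_mem_mb
rank_configs() {
  awk -v lat_ms="$1" -v min_tp="$2" -v max_mem="$3" '
    # "+ 0" forces numeric comparison regardless of awk flavor.
    lat_ms  != "" && $2 + 0 > lat_ms * 1000 { next }  # budget in ms, data in us
    min_tp  != "" && $3 + 0 < min_tp + 0    { next }
    max_mem != "" && $4 + 0 > max_mem + 0   { next }
    { print }
  ' | LC_ALL=C sort -k3,3nr | head -n "${4:-3}"  # top N defaults to 3
}
```

Fed the three rows from Example 1 with the constraints from Example 3 (15 ms, 500 infer/sec, 1500 MB), this sketch keeps densenet_onnx_config_3 and densenet_onnx_config_7 and drops densenet_onnx_config_1, whose 1680 MB exceeds the memory limit.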
The output includes:
- A ranked table showing the top N configurations with their key metrics
- Per-configuration settings and metrics: max batch size, dynamic batching settings, instance count, p99 latency, throughput, and GPU memory usage
- Optional PDF and HTML reports with detailed comparisons and visualizations
The analyze command can be run multiple times with different constraints on the same checkpoint data, enabling rapid exploration of trade-off scenarios without re-profiling.
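Because analyze can be rerun on the same checkpoint data, a small loop can sweep several latency budgets without re-profiling. This is a sketch assuming bash; it keeps one export directory per budget so each run's reports stay separate. The helper name sweep_latency_budgets, the export directory naming, and the MA_CMD override (set MA_CMD=echo to print the commands instead of running them) are illustrative and not part of Model Analyzer.

```shell
#!/usr/bin/env bash
# sweep_latency_budgets (hypothetical helper): rerun the analyze step once per
# latency budget (in ms) against the same checkpoint data.
# MA_CMD (hypothetical) overrides the executable; MA_CMD=echo gives a dry run.
sweep_latency_budgets() {
  local model=$1 budget
  shift
  for budget in "$@"; do
    # One export directory per budget keeps each run's reports separate.
    "${MA_CMD:-model-analyzer}" analyze \
      --analysis-models="$model" \
      --latency-budget="$budget" \
      --export-path="./analysis_${model}_${budget}ms" || return
  done
}
```

For example, sweep_latency_budgets densenet_onnx 5 10 20 runs three analyses; the loop stops at the first run that fails.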
Usage
CLI Signature
```shell
model-analyzer analyze \
  --analysis-models=<model_name> \
  [--checkpoint-directory=<path>] \
  [--export-path=<path>] \
  [--top-n-configs=<int>] \
  [--latency-budget=<ms>] \
  [--min-throughput=<infer/sec>] \
  [--max-gpu-memory=<MB>]
```
Key Parameters
| Parameter | Description | Default |
|---|---|---|
| --analysis-models | Model name(s) to analyze from checkpoint data | (required) |
| --checkpoint-directory | Path to directory containing profiling checkpoints | ./checkpoints |
| --export-path | Path to export analysis reports (PDF/HTML) | ./export |
| --top-n-configs | Number of top configurations to display | 3 |
| --latency-budget | Maximum acceptable p99 latency in milliseconds | None (no constraint) |
| --min-throughput | Minimum acceptable throughput in inferences per second | None (no constraint) |
| --max-gpu-memory | Maximum GPU memory usage in megabytes | None (no constraint) |
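Since every constraint defaults to "no constraint", a wrapper only needs to pass a constraint flag when a value was actually supplied. This is a minimal sketch assuming bash. The helper name run_analyze, the environment variables LATENCY_BUDGET_MS, MIN_THROUGHPUT, and MAX_GPU_MEMORY_MB, and the MA_CMD override (MA_CMD=echo prints the command instead of running it) are hypothetical.

```shell
#!/usr/bin/env bash
# run_analyze (hypothetical helper): invoke the analyze step, passing each
# constraint flag only when its (hypothetical) environment variable is set.
# Omitted flags keep the documented default of "no constraint".
# MA_CMD (hypothetical) overrides the executable; MA_CMD=echo gives a dry run.
run_analyze() {
  # Rebuild the positional parameters as the argument list to pass through.
  set -- analyze --analysis-models="$1"
  if [ -n "${LATENCY_BUDGET_MS:-}" ]; then
    set -- "$@" --latency-budget="$LATENCY_BUDGET_MS"
  fi
  if [ -n "${MIN_THROUGHPUT:-}" ]; then
    set -- "$@" --min-throughput="$MIN_THROUGHPUT"
  fi
  if [ -n "${MAX_GPU_MEMORY_MB:-}" ]; then
    set -- "$@" --max-gpu-memory="$MAX_GPU_MEMORY_MB"
  fi
  "${MA_CMD:-model-analyzer}" "$@"
}
```

For example, LATENCY_BUDGET_MS=10 run_analyze densenet_onnx applies only the latency budget and leaves throughput and GPU memory unconstrained.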
Code Reference
Source Location
docs/user_guide/performance_tuning.md, lines 332-393: Model Analyzer analyze command documentation, output format, and constraint usage
Import / Installation
```shell
# Install Model Analyzer via pip (if not already installed during profiling)
pip install triton-model-analyzer
# Install wkhtmltopdf for PDF report generation (optional; Debian/Ubuntu, run as root)
apt-get install -y wkhtmltopdf
```
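Because PDF reports need wkhtmltopdf while HTML reports do not, it can help to check for the tool before running analyze. This is a minimal sketch; the helper name pdf_reports_available is illustrative.

```shell
#!/usr/bin/env bash
# pdf_reports_available (hypothetical helper): succeed if wkhtmltopdf is on PATH.
pdf_reports_available() {
  command -v wkhtmltopdf >/dev/null 2>&1
}

# Warn (without failing) when only HTML reports can be produced.
if ! pdf_reports_available; then
  echo "wkhtmltopdf not found: PDF reports need it; HTML reports do not." >&2
fi
```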
I/O Contract
Inputs
| Input | Type | Required | Description |
|---|---|---|---|
| Analysis model name | String | Yes | Name of the model to analyze (must have been profiled previously) |
| Checkpoint directory | Directory path | No | Path to profiling checkpoint data (defaults to ./checkpoints) |
| Latency budget | Integer (ms) | No | Maximum p99 latency constraint in milliseconds |
| Min throughput | Integer (infer/sec) | No | Minimum throughput constraint in inferences per second |
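| Max GPU memory | Integer (MB) | No | Maximum GPU memory constraint in megabytes |
| Top N configs | Integer | No | Number of ranked configurations to display (defaults to 3) |
| Export path | Directory path | No | Where PDF/HTML reports are written (defaults to ./export) |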
Outputs
| Output | Type | Description |
|---|---|---|
| Ranked config table | Text (stdout) | Table with columns: Config Name, Max Batch Size, Dynamic Batching, Instance Count, p99 Latency (us), Throughput (infer/sec), GPU Memory (MB). Latency is reported in microseconds, whereas --latency-budget is specified in milliseconds. |
| PDF report | File (PDF) | Detailed comparative report with charts (requires wkhtmltopdf) |
| HTML report | File (HTML) | Detailed comparative report with charts |
| Export directory | Directory | Contains all generated reports and summary files |
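If the ranked table needs to feed a script, it can be scraped from stdout. This is a minimal sketch that assumes the ASCII layout shown in Example 1 (lines starting with "+" as borders, a header row containing "Config Name", throughput in the sixth column); that layout and the helper name parse_ranked_table are assumptions, not a documented machine-readable format.

```shell
#!/usr/bin/env bash
# parse_ranked_table (hypothetical helper): print "config_name throughput"
# for each data row of a "|"-delimited ranked table read from stdin.
parse_ranked_table() {
  awk -F'|' '
    /^[+]/        { next }  # skip +----+ border lines
    /Config Name/ { next }  # skip the header row
    NF > 1 {                # only lines that contain "|" cells
      name = $2; tput = $7  # column 1 is field 2; column 6 (throughput) is field 7
      gsub(/^[ \t]+|[ \t]+$/, "", name)  # trim cell padding
      gsub(/^[ \t]+|[ \t]+$/, "", tput)
      print name, tput
    }
  '
}
```

A possible use is model-analyzer analyze --analysis-models=densenet_onnx | parse_ranked_table; log lines without "|" are ignored.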
Usage Examples
Example 1: Basic analysis with default settings
Analyze profiling results for densenet_onnx and show the top 3 configurations:
```shell
model-analyzer analyze --analysis-models=densenet_onnx
```
Example output (actual values depend on the model, hardware, and profiled configurations):
```text
+------------------------+----------------+------------------+----------------+------------------+--------------------+--------------+
| Config Name            | Max Batch Size | Dynamic Batching | Instance Count | p99 Latency (us) | Throughput (inf/s) | GPU Mem (MB) |
+------------------------+----------------+------------------+----------------+------------------+--------------------+--------------+
| densenet_onnx_config_3 | 8              | preferred=[4,8]  | 2              | 8234             | 892.4              | 1248         |
| densenet_onnx_config_7 | 16             | preferred=[8,16] | 2              | 11502            | 875.1              | 1456         |
| densenet_onnx_config_1 | 4              | preferred=[2,4]  | 3              | 6872             | 843.7              | 1680         |
+------------------------+----------------+------------------+----------------+------------------+--------------------+--------------+
```
Example 2: Analysis with latency constraint
Find the best configurations that meet a 10 ms (10,000 us, the unit used in the output table) p99 latency budget, showing up to 5 candidates:
```shell
model-analyzer analyze \
  --analysis-models=densenet_onnx \
  --latency-budget=10 \
  --top-n-configs=5
```
Example 3: Analysis with multiple constraints
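Apply latency, GPU memory, and minimum throughput constraints together and write the reports to ./analysis_reports. A configuration must satisfy all three constraints to be ranked: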
```shell
model-analyzer analyze \
  --analysis-models=densenet_onnx \
  --latency-budget=15 \
  --max-gpu-memory=1500 \
  --min-throughput=500 \
  --top-n-configs=3 \
  --export-path=./analysis_reports
```
Example 4: Analysis with custom checkpoint directory
Analyze checkpoints stored outside the default ./checkpoints directory and show the top 10 configurations:
```shell
model-analyzer analyze \
  --analysis-models=densenet_onnx \
  --checkpoint-directory=/data/profiling_checkpoints \
  --export-path=./reports \
  --top-n-configs=10
```
Related Pages
- Implements: Principle:Triton_inference_server_Server_Performance_Analysis