Implementation:Triton inference server Server Model Analyzer Analyze

From Leeroopedia
Field                 | Value
Page Type             | Implementation
Title                 | Model_Analyzer_Analyze
Namespace             | Triton_inference_server_Server
Domains               | Performance, Model_Serving, Optimization
External Dependencies | triton-model-analyzer pip package, wkhtmltopdf (for PDF report generation)
Last Updated          | 2026-02-13 17:00 GMT

Overview

Concrete CLI for analyzing profiling results and ranking model configurations. The model-analyzer analyze command processes checkpoint data from the profiling step, applies user-specified constraints, and produces a ranked table of configuration candidates along with optional PDF/HTML reports.

Description

The analyze subcommand reads profiling checkpoint data and performs constraint-based filtering and ranking. It evaluates each profiled configuration against the specified constraints (latency budget, GPU memory limit, minimum throughput), eliminates non-compliant configurations, and ranks the remaining ones by throughput, highest first.
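The filter-then-rank behavior described above can be sketched in a few lines of Python. This is an illustration of the idea only, not Model Analyzer's actual implementation: the ProfiledConfig record, the filter_and_rank function, and the choice to treat each limit as inclusive are all assumptions made for the example. The sample values are taken from the Example 1 table on this page, with latency converted to milliseconds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProfiledConfig:
    """Hypothetical record for one profiled configuration."""
    name: str
    p99_latency_ms: float
    throughput: float       # inferences per second
    gpu_memory_mb: float


def filter_and_rank(
    configs: List[ProfiledConfig],
    latency_budget_ms: Optional[float] = None,
    min_throughput: Optional[float] = None,
    max_gpu_memory_mb: Optional[float] = None,
    top_n: int = 3,
) -> List[ProfiledConfig]:
    """Drop configs that violate any given constraint, then rank by throughput."""

    def compliant(c: ProfiledConfig) -> bool:
        # A constraint set to None is treated as "no constraint".
        # Limits are inclusive here; that is an assumption of this sketch.
        if latency_budget_ms is not None and c.p99_latency_ms > latency_budget_ms:
            return False
        if min_throughput is not None and c.throughput < min_throughput:
            return False
        if max_gpu_memory_mb is not None and c.gpu_memory_mb > max_gpu_memory_mb:
            return False
        return True

    survivors = [c for c in configs if compliant(c)]
    survivors.sort(key=lambda c: c.throughput, reverse=True)  # highest first
    return survivors[:top_n]


# Values adapted from the Example 1 table on this page (us converted to ms).
SAMPLE = [
    ProfiledConfig("densenet_onnx_config_3", 8.234, 892.4, 1248),
    ProfiledConfig("densenet_onnx_config_7", 11.502, 875.1, 1456),
    ProfiledConfig("densenet_onnx_config_1", 6.872, 843.7, 1680),
]

if __name__ == "__main__":
    for cfg in filter_and_rank(SAMPLE, latency_budget_ms=10):
        print(cfg.name, cfg.throughput)
```

With a 10 ms budget, the second sample configuration is dropped because its p99 latency exceeds the limit, and the other two keep their throughput order.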

The output includes:

  • A ranked table showing the top N configurations with their key metrics
  • Configuration details including max batch size, dynamic batching settings, instance count, p99 latency, throughput, and GPU memory usage
  • Optional PDF and HTML reports with detailed comparisons and visualizations

The analyze command can be run multiple times with different constraints on the same checkpoint data, enabling rapid exploration of trade-off scenarios without re-profiling.

Usage

CLI Signature

model-analyzer analyze \
  --analysis-models=<model_name> \
  [--checkpoint-directory=<path>] \
  [--export-path=<path>] \
  [--top-n-configs=<int>] \
  [--latency-budget=<ms>] \
  [--min-throughput=<infer/sec>] \
  [--max-gpu-memory=<MB>]

Key Parameters

Parameter              | Description                                            | Default
--analysis-models      | Model name(s) to analyze from checkpoint data          | (required)
--checkpoint-directory | Path to directory containing profiling checkpoints     | ./checkpoints
--export-path          | Path to export analysis reports (PDF/HTML)             | ./export
--top-n-configs        | Number of top configurations to display                | 3
--latency-budget       | Maximum acceptable p99 latency in milliseconds         | None (no constraint)
--min-throughput       | Minimum acceptable throughput in inferences per second | None (no constraint)
--max-gpu-memory       | Maximum GPU memory usage in megabytes                  | None (no constraint)
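When the command is assembled from a script, the parameters above map naturally onto optional keyword arguments. The following is a minimal sketch under that assumption: build_analyze_command is a hypothetical helper, not part of the triton-model-analyzer package, and it only builds the argument list using the flag names listed on this page. It does not run anything.

```python
import shlex
from typing import List, Optional


def build_analyze_command(
    analysis_models: str,
    checkpoint_directory: Optional[str] = None,
    export_path: Optional[str] = None,
    top_n_configs: Optional[int] = None,
    latency_budget: Optional[int] = None,   # milliseconds
    min_throughput: Optional[int] = None,   # inferences per second
    max_gpu_memory: Optional[int] = None,   # megabytes
) -> List[str]:
    """Build an argv list for the analyze subcommand (hypothetical helper)."""
    argv = ["model-analyzer", "analyze", f"--analysis-models={analysis_models}"]
    optional = [
        ("--checkpoint-directory", checkpoint_directory),
        ("--export-path", export_path),
        ("--top-n-configs", top_n_configs),
        ("--latency-budget", latency_budget),
        ("--min-throughput", min_throughput),
        ("--max-gpu-memory", max_gpu_memory),
    ]
    for flag, value in optional:
        # Omit unset options so the tool falls back to its own defaults.
        if value is not None:
            argv.append(f"{flag}={value}")
    return argv


if __name__ == "__main__":
    cmd = build_analyze_command("densenet_onnx", latency_budget=10, top_n_configs=5)
    print(shlex.join(cmd))  # shell-safe rendering of the argv list
```

Returning a list rather than a single string keeps the result usable with subprocess.run without shell quoting concerns.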

Code Reference

Source Location

  • docs/user_guide/performance_tuning.md:L332-393 -- Model Analyzer analyze command documentation, output format, and constraint usage

Import / Installation

# Install Model Analyzer via pip (if not already installed during profiling)
pip install triton-model-analyzer

# Install wkhtmltopdf for PDF report generation (optional)
apt-get install -y wkhtmltopdf
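
Before running an analysis from a script, it can help to confirm that both executables are on the PATH. A small sketch using only the Python standard library; find_tools is a hypothetical helper name, and the executable names are taken from the commands on this page:

```python
import shutil
from typing import Dict, Iterable


def find_tools(names: Iterable[str] = ("model-analyzer", "wkhtmltopdf")) -> Dict[str, bool]:
    """Report whether each named executable can be found on the PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in find_tools().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Since wkhtmltopdf is needed only for PDF report generation, a missing wkhtmltopdf can be treated as a warning rather than a failure.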

I/O Contract

Inputs

Input                | Type                | Required | Description
Analysis model name  | String              | Yes      | Name of the model to analyze (must have been profiled previously)
Checkpoint directory | Directory path      | No       | Path to profiling checkpoint data (defaults to ./checkpoints)
Latency budget       | Integer (ms)        | No       | Maximum p99 latency constraint in milliseconds
Min throughput       | Integer (infer/sec) | No       | Minimum throughput constraint in inferences per second
Max GPU memory       | Integer (MB)        | No       | Maximum GPU memory constraint in megabytes

Outputs

Output              | Type          | Description
Ranked config table | Text (stdout) | Table with columns: Config Name, Max Batch Size, Dynamic Batching, Instance Count, p99 Latency (us), Throughput (infer/sec), GPU Memory (MB)
PDF report          | File (PDF)    | Detailed comparative report with charts (requires wkhtmltopdf)
HTML report         | File (HTML)   | Detailed comparative report with charts
Export directory    | Directory     | Contains all generated reports and summary files

Usage Examples

Example 1: Basic analysis with default settings

Analyze profiling results for densenet_onnx and show the top 3 configurations:

model-analyzer analyze --analysis-models=densenet_onnx

Expected output:

+-----------------------------+--------------------+---------------------+------------------+------------------+--------------------+--------------+
| Config Name                 | Max Batch Size     | Dynamic Batching    | Instance Count   | p99 Latency (us) | Throughput (inf/s) | GPU Mem (MB) |
+-----------------------------+--------------------+---------------------+------------------+------------------+--------------------+--------------+
| densenet_onnx_config_3      | 8                  | preferred=[4,8]     | 2                | 8234             | 892.4              | 1248         |
| densenet_onnx_config_7      | 16                 | preferred=[8,16]    | 2                | 11502            | 875.1              | 1456         |
| densenet_onnx_config_1      | 4                  | preferred=[2,4]     | 3                | 6872             | 843.7              | 1680         |
+-----------------------------+--------------------+---------------------+------------------+------------------+--------------------+--------------+
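
If the ranked table needs to be consumed by another script, a grid in the layout shown above can be turned into a list of dictionaries. This is a rough sketch that assumes the output matches this layout exactly (rows start with "|", border lines start with "+"). parse_ranked_table is a hypothetical helper, and the sample string is a column-trimmed copy of the table above.

```python
from typing import Dict, List


def parse_ranked_table(text: str) -> List[Dict[str, str]]:
    """Parse a '|'-delimited grid table into one dict per data row."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +---+ border lines and blank lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return []
    header, *body = rows  # first '|' row holds the column names
    return [dict(zip(header, row)) for row in body]


# Column-trimmed copy of the Example 1 output, for illustration.
SAMPLE_TABLE = """
+------------------------+----------------+------------------+----------------+------------------+
| Config Name            | Max Batch Size | Dynamic Batching | Instance Count | p99 Latency (us) |
+------------------------+----------------+------------------+----------------+------------------+
| densenet_onnx_config_3 | 8              | preferred=[4,8]  | 2              | 8234             |
| densenet_onnx_config_7 | 16             | preferred=[8,16] | 2              | 11502            |
+------------------------+----------------+------------------+----------------+------------------+
"""

if __name__ == "__main__":
    for row in parse_ranked_table(SAMPLE_TABLE):
        print(row["Config Name"], row["p99 Latency (us)"])
```

All values are returned as strings; numeric columns would need to be converted by the caller.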

Example 2: Analysis with latency constraint

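Find the best configurations that meet a 10 ms p99 latency budget. The budget is given in milliseconds, while the ranked table reports latency in microseconds, so 10 ms corresponds to 10000 us in the output: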

model-analyzer analyze \
  --analysis-models=densenet_onnx \
  --latency-budget=10 \
  --top-n-configs=5
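
Because the budget is given in milliseconds and the table reports microseconds, the comparison involves a factor of 1000. The arithmetic can be checked against the p99 values from the Example 1 table. This sketch assumes the budget is an inclusive upper bound; within_budget is a hypothetical name.

```python
US_PER_MS = 1000  # microseconds per millisecond


def within_budget(p99_latency_us: int, latency_budget_ms: int) -> bool:
    """Compare a table latency (us) against a budget given in ms."""
    return p99_latency_us <= latency_budget_ms * US_PER_MS


# p99 latencies copied from the Example 1 table, in microseconds.
EXAMPLE_1_P99_US = {
    "densenet_onnx_config_3": 8234,
    "densenet_onnx_config_7": 11502,
    "densenet_onnx_config_1": 6872,
}

if __name__ == "__main__":
    for name, p99_us in EXAMPLE_1_P99_US.items():
        verdict = "meets" if within_budget(p99_us, 10) else "exceeds"
        print(f"{name}: {p99_us} us {verdict} the 10 ms budget")
```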

Example 3: Analysis with multiple constraints

Apply latency, GPU memory, and throughput constraints together, and export the reports to a custom directory:

model-analyzer analyze \
  --analysis-models=densenet_onnx \
  --latency-budget=15 \
  --max-gpu-memory=1500 \
  --min-throughput=500 \
  --top-n-configs=3 \
  --export-path=./analysis_reports

Example 4: Analysis with custom checkpoint directory

Analyze results from a specific checkpoint directory:

model-analyzer analyze \
  --analysis-models=densenet_onnx \
  --checkpoint-directory=/data/profiling_checkpoints \
  --export-path=./reports \
  --top-n-configs=10
