Implementation:Triton inference server Server Model Analyzer Profile

Field	Value
Page Type	Implementation
Title	Model_Analyzer_Profile
Namespace	Triton_inference_server_Server
Domains	Performance, Model_Serving, Optimization
External Dependencies	triton-model-analyzer pip package, wkhtmltopdf (for report generation), perf_analyzer (used internally by Model Analyzer)
Last Updated	2026-02-13 17:00 GMT

Overview

The `model-analyzer profile` command automates model configuration profiling with Triton Model Analyzer. It generates configuration variants, deploys each one on Triton, benchmarks it with Perf Analyzer, and collects the performance metrics into a structured checkpoint for later analysis.

Description

Model Analyzer's profile subcommand orchestrates the full profiling workflow. It manages the Triton server lifecycle (launching, loading model configs, shutting down), invokes Perf Analyzer for each configuration variant, and stores results in a checkpoint directory.

The tool supports multiple Triton launch modes:

local -- Model Analyzer starts and manages a local Triton process directly
docker -- Model Analyzer launches Triton inside a Docker container
remote -- Model Analyzer connects to an already-running Triton instance
c_api -- Model Analyzer uses the Triton C API for in-process inference (lowest overhead)
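
As a rough illustration of how the launch mode affects the command line, the sketch below maps a mode name to the flags a wrapper script might pass. The function name `launch_mode_flags` and the `TRITON_HTTP`, `TRITON_GRPC`, and `TRITON_METRICS` environment variables are hypothetical, and the endpoint defaults are assumptions for a Triton instance on the local host.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): print the launch-related flags for a mode.
# Endpoint defaults are assumptions; override TRITON_HTTP, TRITON_GRPC, TRITON_METRICS.
launch_mode_flags() {
  case "$1" in
    local|docker|c_api)
      # Model Analyzer manages Triton itself, so no endpoints are passed.
      printf '%s\n' "--triton-launch-mode=$1"
      ;;
    remote)
      # An already-running server is addressed through its endpoints.
      printf '%s %s %s %s\n' \
        "--triton-launch-mode=remote" \
        "--triton-http-endpoint=${TRITON_HTTP:-localhost:8000}" \
        "--triton-grpc-endpoint=${TRITON_GRPC:-localhost:8001}" \
        "--triton-metrics-url=${TRITON_METRICS:-http://localhost:8002/metrics}"
      ;;
    *)
      echo "unknown launch mode: $1" >&2
      return 1
      ;;
  esac
}

launch_mode_flags remote
```

The helper only prints flags, so it can be checked without a Triton installation.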

During profiling, Model Analyzer generates configuration variants by sweeping across instance count, max batch size, and dynamic batching parameters within user-specified bounds. Each variant is tested at multiple concurrency levels, and the results (throughput, latency, GPU memory) are recorded.
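
To get a feel for how quickly the search space grows, the following sketch enumerates a grid of the kind described above. It assumes that instance count increases in steps of one while batch size and concurrency double at each step; the actual search may skip or stop early on some combinations, so treat the count as an upper-bound estimate rather than the tool's exact behavior. The function name `enumerate_grid` is hypothetical.

```shell
#!/bin/sh
# Illustrative only: enumerate a (instances, batch size, concurrency) grid.
# Assumption: instance count steps by 1; batch size and concurrency double.
enumerate_grid() {
  max_instances=$1
  max_batch=$2
  max_concurrency=$3
  i=1
  while [ "$i" -le "$max_instances" ]; do
    b=1
    while [ "$b" -le "$max_batch" ]; do
      c=1
      while [ "$c" -le "$max_concurrency" ]; do
        echo "instances=$i max_batch_size=$b concurrency=$c"
        c=$((c * 2))
      done
      b=$((b * 2))
    done
    i=$((i + 1))
  done
}

# Count the combinations for small bounds: 2 instance counts,
# 3 batch sizes (1, 2, 4), and 4 concurrency levels (1, 2, 4, 8).
enumerate_grid 2 4 8 | wc -l
```

Under these assumptions, tightening any one bound shrinks the grid multiplicatively, which is why bounded searches finish much sooner.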

Usage

CLI Signature

model-analyzer profile \
  --model-repository=<path> \
  --profile-models=<model_name> \
  --output-model-repository-path=<results_path> \
  [--triton-launch-mode=<local|docker|remote|c_api>] \
  [--run-config-search-max-concurrency=<int>] \
  [--run-config-search-max-instance-count=<int>] \
  [--run-config-search-max-model-batch-size=<int>] \
  [--checkpoint-directory=<path>] \
  [--override-output-model-repository]

Key Parameters

Parameter	Description	Default
`--model-repository`	Path to the Triton model repository	(required)
`--profile-models`	Comma-separated list of model names to profile	(required)
`--output-model-repository-path`	Path to store generated configuration variants	(required)
`--triton-launch-mode`	How to launch Triton: local, docker, remote, or c_api	local
`--run-config-search-max-concurrency`	Maximum concurrency level to test	1024
`--run-config-search-max-instance-count`	Maximum instance count to sweep	5
`--run-config-search-max-model-batch-size`	Maximum batch size to sweep	128
`--checkpoint-directory`	Directory to store/load profiling checkpoints	./checkpoints
`--override-output-model-repository`	Overwrite existing output model repository	false
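
The table can be turned into a small dry-run helper that prints a complete command with the defaults spelled out. This is a sketch, not part of Model Analyzer: the function name `build_profile_cmd` and the environment variables (`LAUNCH_MODE`, `MAX_CONCURRENCY`, `MAX_INSTANCES`, `MAX_BATCH`, `CHECKPOINT_DIR`) are hypothetical, and the default values are copied from the table above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) a profile command.
# Defaults mirror the parameter table; override them via environment variables.
build_profile_cmd() {
  printf 'model-analyzer profile'
  printf ' --model-repository=%s' "$1"
  printf ' --profile-models=%s' "$2"
  printf ' --output-model-repository-path=%s' "$3"
  printf ' --triton-launch-mode=%s' "${LAUNCH_MODE:-local}"
  printf ' --run-config-search-max-concurrency=%s' "${MAX_CONCURRENCY:-1024}"
  printf ' --run-config-search-max-instance-count=%s' "${MAX_INSTANCES:-5}"
  printf ' --run-config-search-max-model-batch-size=%s' "${MAX_BATCH:-128}"
  printf ' --checkpoint-directory=%s\n' "${CHECKPOINT_DIR:-./checkpoints}"
}

# Arguments: model repository, model name(s), output repository.
build_profile_cmd /models densenet_onnx ./results
```

Printing the command first makes it easy to review the effective bounds before committing to a long profiling run.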

Code Reference

Source Location

docs/user_guide/performance_tuning.md:L302-333 -- Model Analyzer profile command documentation and usage

Import / Installation

# Install Model Analyzer via pip
pip install triton-model-analyzer

# Install wkhtmltopdf for PDF report generation (optional)
apt-get install -y wkhtmltopdf
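
After installation, a quick check that the expected executables are on `PATH` can save a failed run later. The sketch below assumes the tools are exposed under the names `model-analyzer`, `perf_analyzer`, and `wkhtmltopdf`; the helper name `check_tool` is hypothetical.

```shell
#!/bin/sh
# Report whether a given executable can be found on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# wkhtmltopdf is only needed for PDF report generation.
for tool in model-analyzer perf_analyzer wkhtmltopdf; do
  check_tool "$tool"
done
```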

I/O Contract

Inputs

Input	Type	Required	Description
Model repository path	Directory path	Yes	Path to the Triton model repository containing the model(s) to profile
Model names	String (comma-separated)	Yes	Names of models to profile (must exist in the model repository)
Output repository path	Directory path	Yes	Path where generated configuration variants will be stored
Triton server	Service or launchable	Yes	Either a running Triton server (remote mode) or the ability to launch one (local, docker, or c_api mode)
Search bounds	Integers	No	Maximum values for concurrency, instance count, and batch size sweeps

Outputs

Output	Type	Description
Output model repository	Directory	Contains subdirectories for each configuration variant, each with its own `config.pbtxt`
Profiling checkpoint	Directory	Serialized profiling results for use by `model-analyzer analyze`
Console summary	Text (stdout)	Progress updates and summary of profiled configurations
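
Once a run has finished, the generated variants can be inspected directly on disk. The sketch below lists every subdirectory of the output model repository that contains a `config.pbtxt`, matching the layout described in the table. The helper name `list_variants` and the `RESULTS_DIR` variable are hypothetical, and `./results` is assumed only because the usage examples use that path.

```shell
#!/bin/sh
# List each variant directory under the output repository that holds a config.pbtxt.
list_variants() {
  for cfg in "$1"/*/config.pbtxt; do
    # Skip the unexpanded pattern when nothing matches.
    [ -f "$cfg" ] || continue
    basename "$(dirname "$cfg")"
  done
}

list_variants "${RESULTS_DIR:-./results}"
```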

Usage Examples

Example 1: Basic profiling with local Triton

Profile a model named densenet_onnx using a locally launched Triton server:

model-analyzer profile \
  --model-repository=/models \
  --profile-models=densenet_onnx \
  --output-model-repository-path=./results \
  --triton-launch-mode=local \
  --override-output-model-repository

Example 2: Profiling with bounded search space

Limit the configuration search to reduce profiling time:

model-analyzer profile \
  --model-repository=/models \
  --profile-models=densenet_onnx \
  --output-model-repository-path=./results \
  --triton-launch-mode=local \
  --run-config-search-max-concurrency=16 \
  --run-config-search-max-instance-count=4 \
  --run-config-search-max-model-batch-size=16 \
  --override-output-model-repository

Example 3: Profiling against a remote Triton server

Connect to an already-running Triton instance:

model-analyzer profile \
  --model-repository=/models \
  --profile-models=densenet_onnx \
  --output-model-repository-path=./results \
  --triton-launch-mode=remote \
  --triton-http-endpoint=localhost:8000 \
  --triton-grpc-endpoint=localhost:8001 \
  --triton-metrics-url=http://localhost:8002/metrics \
  --override-output-model-repository

Example 4: Profiling multiple models

Profile multiple models in a single run:

model-analyzer profile \
  --model-repository=/models \
  --profile-models=densenet_onnx,resnet50_onnx \
  --output-model-repository-path=./results \
  --triton-launch-mode=local \
  --override-output-model-repository
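
When a repository holds many models, the comma-separated value for `--profile-models` can be assembled from the directory names instead of typed by hand. This sketch assumes every top-level directory in the model repository is a model you want to profile, which may not hold for your setup; the helper name `join_models` and the `MODEL_REPO` variable are hypothetical.

```shell
#!/bin/sh
# Build a comma-separated list from the top-level directory names in a model repository.
join_models() {
  list=""
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    # Prepend a comma only when the list already has an entry.
    list="${list:+$list,}$name"
  done
  echo "$list"
}

# The result can be passed as --profile-models="$(join_models /models)".
join_models "${MODEL_REPO:-/models}"
```

Profiling time grows with each added model, so it may be worth trimming the list before starting a run.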

Related Pages

Implements: Principle: Automated_Profiling -- implements::Principle:Triton_inference_server_Server_Automated_Profiling
