Implementation:Triton inference server Server Model Analyzer Profile
| Field | Value |
|---|---|
| Page Type | Implementation |
| Title | Model_Analyzer_Profile |
| Namespace | Triton_inference_server_Server |
| Domains | Performance, Model_Serving, Optimization |
| External Dependencies | triton-model-analyzer pip package, wkhtmltopdf (for report generation), perf_analyzer (used internally by Model Analyzer) |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
A command-line interface for automated model configuration profiling with Triton Model Analyzer. The tool generates configuration variants, deploys each one on Triton, benchmarks it with Perf Analyzer, and collects the performance metrics into a structured checkpoint for later analysis.
Description
Model Analyzer's profile subcommand orchestrates the full profiling workflow. It manages the Triton server lifecycle (launching, loading model configs, shutting down), invokes Perf Analyzer for each configuration variant, and stores results in a checkpoint directory.
The tool supports multiple Triton launch modes:
- local -- Model Analyzer starts and manages a local Triton process directly
- docker -- Model Analyzer launches Triton inside a Docker container
- remote -- Model Analyzer connects to an already-running Triton instance
- c_api -- Model Analyzer uses the Triton C API for in-process inference (lowest overhead)
During profiling, Model Analyzer generates configuration variants by sweeping across instance count, max batch size, and dynamic batching parameters within user-specified bounds. Each variant is tested at multiple concurrency levels, and the results (throughput, latency, GPU memory) are recorded.
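The sweep described above can be sketched as a small enumeration. This is a minimal illustration, not Model Analyzer's actual search code. It assumes that instance counts step by one from 1 and that batch sizes and concurrency levels double from 1 up to their bounds. The function names are hypothetical, and dynamic batching parameters are left out for brevity.

```python
from itertools import product


def powers_of_two(upper):
    """Return 1, 2, 4, ... up to and including `upper` (assumed doubling schedule)."""
    values = []
    value = 1
    while value <= upper:
        values.append(value)
        value *= 2
    return values


def enumerate_variants(max_instance_count, max_batch_size):
    """Return every (instance_count, max_batch_size) pair within the bounds."""
    instance_counts = range(1, max_instance_count + 1)
    batch_sizes = powers_of_two(max_batch_size)
    return list(product(instance_counts, batch_sizes))


def enumerate_measurements(max_instance_count, max_batch_size, max_concurrency):
    """Pair every variant with every concurrency level to be benchmarked."""
    variants = enumerate_variants(max_instance_count, max_batch_size)
    levels = powers_of_two(max_concurrency)
    return [(ic, bs, c) for (ic, bs) in variants for c in levels]
```

Under these assumptions, bounds of 4 instances, batch size 16, and concurrency 16 give 20 variants and 100 measurements. The run count of a real profiling session may differ.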
Usage
CLI Signature
model-analyzer profile \
--model-repository=<path> \
--profile-models=<model_name> \
--output-model-repository-path=<results_path> \
[--triton-launch-mode=<local|docker|remote|c_api>] \
[--run-config-search-max-concurrency=<int>] \
[--run-config-search-max-instance-count=<int>] \
[--run-config-search-max-model-batch-size=<int>] \
[--checkpoint-directory=<path>] \
[--override-output-model-repository]
Key Parameters
| Parameter | Description | Default |
|---|---|---|
| --model-repository | Path to the Triton model repository | (required) |
| --profile-models | Comma-separated list of model names to profile | (required) |
| --output-model-repository-path | Path to store generated configuration variants | (required) |
| --triton-launch-mode | How to launch Triton: local, docker, remote, or c_api | local |
| --run-config-search-max-concurrency | Maximum concurrency level to test | 1024 |
| --run-config-search-max-instance-count | Maximum instance count to sweep | 5 |
| --run-config-search-max-model-batch-size | Maximum batch size to sweep | 128 |
| --checkpoint-directory | Directory to store/load profiling checkpoints | ./checkpoints |
| --override-output-model-repository | Overwrite existing output model repository | false |
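When the profile command is launched from a script, it can help to assemble the argument list programmatically. The sketch below is one way to do that, using only the flags listed in this section. `build_profile_command` is a hypothetical helper, not part of Model Analyzer, and it only builds the command without running it.

```python
import shlex


def build_profile_command(model_repository, profile_models, output_path,
                          launch_mode="local", max_concurrency=None,
                          max_instance_count=None, max_model_batch_size=None,
                          checkpoint_directory=None, override_output=False):
    """Assemble the argv list for `model-analyzer profile` (hypothetical helper)."""
    if launch_mode not in ("local", "docker", "remote", "c_api"):
        raise ValueError(f"unsupported launch mode: {launch_mode}")
    cmd = [
        "model-analyzer", "profile",
        f"--model-repository={model_repository}",
        f"--profile-models={','.join(profile_models)}",
        f"--output-model-repository-path={output_path}",
        f"--triton-launch-mode={launch_mode}",
    ]
    # Optional flags are only emitted when a value was supplied, so the
    # tool's own defaults apply otherwise.
    optional = {
        "--run-config-search-max-concurrency": max_concurrency,
        "--run-config-search-max-instance-count": max_instance_count,
        "--run-config-search-max-model-batch-size": max_model_batch_size,
        "--checkpoint-directory": checkpoint_directory,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd.append(f"{flag}={value}")
    if override_output:
        cmd.append("--override-output-model-repository")
    return cmd


cmd = build_profile_command("/models", ["densenet_onnx"], "./results",
                            max_concurrency=16, override_output=True)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

On a machine where Model Analyzer is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.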
Code Reference
Source Location
docs/user_guide/performance_tuning.md:L302-333 -- Model Analyzer profile command documentation and usage
Import / Installation
# Install Model Analyzer via pip
pip install triton-model-analyzer
# Install wkhtmltopdf for PDF report generation (optional)
apt-get install -y wkhtmltopdf
I/O Contract
Inputs
| Input | Type | Required | Description |
|---|---|---|---|
| Model repository path | Directory path | Yes | Path to the Triton model repository containing the model(s) to profile |
| Model names | String (comma-separated) | Yes | Names of models to profile (must exist in the model repository) |
| Output repository path | Directory path | Yes | Path where generated configuration variants will be stored |
| Triton server | Service or launchable | Yes | Either a running Triton server (remote mode) or ability to launch one (local/docker/c_api mode) |
| Search bounds | Integers | No | Maximum values for concurrency, instance count, and batch size sweeps |
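Because every model name must exist in the model repository, a quick pre-flight check can catch typos before a long profiling run starts. The following is a minimal sketch that assumes each model lives in a subdirectory named after it. Both helper names are hypothetical.

```python
from pathlib import Path


def parse_model_list(value):
    """Split a comma-separated model list, dropping blanks and stray spaces."""
    return [name.strip() for name in value.split(",") if name.strip()]


def missing_models(model_repository, model_names):
    """Return the requested names that have no subdirectory in the repository."""
    repo = Path(model_repository)
    return [name for name in model_names if not (repo / name).is_dir()]
```

An empty list from `missing_models` means every requested model was found under the repository path.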
Outputs
| Output | Type | Description |
|---|---|---|
| Output model repository | Directory | Contains subdirectories for each configuration variant, each with its own config.pbtxt |
| Profiling checkpoint | Directory | Serialized profiling results for use by model-analyzer analyze |
| Console summary | Text (stdout) | Progress updates and summary of profiled configurations |
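After a run, the generated variants can be listed by looking for config.pbtxt files one level below the output model repository. This sketch relies only on the layout described in the table above. The helper name is hypothetical, and the naming of the variant subdirectories is not assumed.

```python
from pathlib import Path


def list_config_variants(output_repository):
    """Return the sorted names of subdirectories that contain a config.pbtxt."""
    repo = Path(output_repository)
    return sorted(path.parent.name for path in repo.glob("*/config.pbtxt"))
```

Subdirectories without a config.pbtxt are ignored, so unrelated folders in the same location do not show up in the result.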
Usage Examples
Example 1: Basic profiling with local Triton
Profile a model named densenet_onnx using a locally launched Triton server:
model-analyzer profile \
--model-repository=/models \
--profile-models=densenet_onnx \
--output-model-repository-path=./results \
--triton-launch-mode=local \
--override-output-model-repository
Example 2: Profiling with bounded search space
Limit the configuration search to reduce profiling time:
model-analyzer profile \
--model-repository=/models \
--profile-models=densenet_onnx \
--output-model-repository-path=./results \
--triton-launch-mode=local \
--run-config-search-max-concurrency=16 \
--run-config-search-max-instance-count=4 \
--run-config-search-max-model-batch-size=16 \
--override-output-model-repository
Example 3: Profiling against a remote Triton server
Connect to an already-running Triton instance:
model-analyzer profile \
--model-repository=/models \
--profile-models=densenet_onnx \
--output-model-repository-path=./results \
--triton-launch-mode=remote \
--triton-http-endpoint=localhost:8000 \
--triton-grpc-endpoint=localhost:8001 \
--triton-metrics-url=http://localhost:8002/metrics \
--override-output-model-repository
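In remote mode the server must already be running, so it can be worth probing it before starting a profile run. The sketch below assumes the server exposes the standard HTTP readiness route `/v2/health/ready` on the endpoint passed to --triton-http-endpoint. The helper names are hypothetical and this check is not something Model Analyzer requires.

```python
import urllib.error
import urllib.request


def readiness_url(http_endpoint):
    """Build the HTTP readiness URL for an endpoint given as host:port."""
    return f"http://{http_endpoint}/v2/health/ready"


def triton_is_ready(http_endpoint, timeout=2.0):
    """Return True if the server answers the readiness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(readiness_url(http_endpoint),
                                    timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: treat as not ready.
        return False
```

For the example above, `triton_is_ready("localhost:8000")` would be called before launching the profile command.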
Example 4: Profiling multiple models
Profile multiple models in a single run:
model-analyzer profile \
--model-repository=/models \
--profile-models=densenet_onnx,resnet50_onnx \
--output-model-repository-path=./results \
--triton-launch-mode=local \
--override-output-model-repository
Related Pages
- Implements: Principle: Automated_Profiling -- implements::Principle:Triton_inference_server_Server_Automated_Profiling