Implementation:Triton inference server Server Model Analyzer Profile

From Leeroopedia
Field | Value
Page Type | Implementation
Title | Model_Analyzer_Profile
Namespace | Triton_inference_server_Server
Domains | Performance, Model_Serving, Optimization
External Dependencies | triton-model-analyzer pip package, wkhtmltopdf (for report generation), perf_analyzer (used internally by Model Analyzer)
Last Updated | 2026-02-13 17:00 GMT

Overview

Concrete CLI for automated model configuration profiling using Triton Model Analyzer. This tool automates the process of generating configuration variants, deploying each on Triton, benchmarking with Perf Analyzer, and collecting performance metrics into a structured checkpoint for later analysis.

Description

Model Analyzer's profile subcommand orchestrates the full profiling workflow. It manages the Triton server lifecycle (launching, loading model configs, shutting down), invokes Perf Analyzer for each configuration variant, and stores results in a checkpoint directory.

The tool supports multiple Triton launch modes:

  • local -- Model Analyzer starts and manages a local Triton process directly
  • docker -- Model Analyzer launches Triton inside a Docker container
  • remote -- Model Analyzer connects to an already-running Triton instance
  • c_api -- Perf Analyzer, invoked by Model Analyzer, loads Triton in-process through the Triton C API, so requests skip the HTTP/gRPC layer (lowest overhead)

During profiling, Model Analyzer generates configuration variants by sweeping across instance count, max batch size, and dynamic batching parameters within user-specified bounds. Each variant is tested at multiple concurrency levels, and the results (throughput, latency, GPU memory) are recorded.
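To get a feel for how quickly the search space grows with the bounds, the sweep can be sketched as a small shell script. This is an illustration, not Model Analyzer's actual search code: it assumes concurrency and max batch size step in powers of two while instance count steps by one, and the function names powers_of_two and enumerate_grid are hypothetical helpers introduced here.

```shell
#!/bin/sh
# Illustrative sketch only: enumerate a grid shaped like the sweep described above.
# Assumption: concurrency and max batch size double at each step; instance count
# increases by one. Model Analyzer may prune or extend this grid.

# Print 1, 2, 4, ... up to the largest power of two that is <= $1.
powers_of_two() (
  n=1
  out=""
  while [ "$n" -le "$1" ]; do
    out="${out:+$out }$n"
    n=$((n * 2))
  done
  printf '%s\n' "$out"
)

# Print one "instances batch_size concurrency" line per grid point.
enumerate_grid() (
  max_instances=$1
  max_batch=$2
  max_concurrency=$3
  i=1
  while [ "$i" -le "$max_instances" ]; do
    for b in $(powers_of_two "$max_batch"); do
      for c in $(powers_of_two "$max_concurrency"); do
        printf '%s %s %s\n' "$i" "$b" "$c"
      done
    done
    i=$((i + 1))
  done
)

# Count grid points for 4 instances, batch size 16, concurrency 16.
enumerate_grid 4 16 16 | wc -l
```

With these bounds the grid has 4 x 5 x 5 = 100 points. Treat that as a rough size estimate under the stated assumptions rather than the exact number of measurements Model Analyzer will run.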

Usage

CLI Signature

model-analyzer profile \
  --model-repository=<path> \
  --profile-models=<model_name> \
  --output-model-repository-path=<results_path> \
  [--triton-launch-mode=<local|docker|remote|c_api>] \
  [--run-config-search-max-concurrency=<int>] \
  [--run-config-search-max-instance-count=<int>] \
  [--run-config-search-max-model-batch-size=<int>] \
  [--checkpoint-directory=<path>] \
  [--override-output-model-repository]

Key Parameters

Parameter | Description | Default
--model-repository | Path to the Triton model repository | (required)
--profile-models | Comma-separated list of model names to profile | (required)
--output-model-repository-path | Path to store generated configuration variants | (required)
--triton-launch-mode | How to launch Triton: local, docker, remote, or c_api | local
--run-config-search-max-concurrency | Maximum concurrency level to test | 1024
--run-config-search-max-instance-count | Maximum instance count to sweep | 5
--run-config-search-max-model-batch-size | Maximum batch size to sweep | 128
--checkpoint-directory | Directory to store/load profiling checkpoints | ./checkpoints
--override-output-model-repository | Overwrite existing output model repository | false
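The same parameters can also be kept in a YAML file and passed with -f, which is easier to version than a long command line. The sketch below assumes the YAML keys are the flag names with dashes replaced by underscores; the file name profile_config.yaml and the CONFIG_FILE variable are hypothetical, and the values are only sample settings.

```shell
#!/bin/sh
# Write a YAML config that mirrors the CLI flags listed above.
# Assumption: each key is the flag name with dashes turned into underscores.
config_file="${CONFIG_FILE:-./profile_config.yaml}"

cat > "$config_file" <<'EOF'
model_repository: /models
profile_models: densenet_onnx
output_model_repository_path: ./results
triton_launch_mode: local
run_config_search_max_concurrency: 16
run_config_search_max_instance_count: 4
run_config_search_max_model_batch_size: 16
override_output_model_repository: true
EOF

echo "Wrote $config_file"
# To profile with this file:
#   model-analyzer profile -f "$config_file"
```

Flags given on the command line alongside -f are worth checking against the Model Analyzer documentation for precedence before mixing the two styles.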

Code Reference

Source Location

  • docs/user_guide/performance_tuning.md:L302-333 -- Model Analyzer profile command documentation and usage

Import / Installation

# Install Model Analyzer via pip
pip install triton-model-analyzer

# Install wkhtmltopdf for PDF report generation (optional)
apt-get install -y wkhtmltopdf
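
After installing, a quick check that the expected executables are on PATH can save a failed profiling run. This is a minimal sketch: missing_tools is a hypothetical helper, and it only tests that the commands resolve, not that their versions are compatible.

```shell
#!/bin/sh
# Print the names of the given commands that cannot be found on PATH.
missing_tools() (
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="${missing:+$missing }$tool"
  done
  printf '%s\n' "$missing"
)

# wkhtmltopdf is optional and only matters for PDF reports.
missing=$(missing_tools model-analyzer perf_analyzer wkhtmltopdf)
if [ -n "$missing" ]; then
  echo "Not found on PATH: $missing"
else
  echo "All tools found"
fi
```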

I/O Contract

Inputs

Input | Type | Required | Description
Model repository path | Directory path | Yes | Path to the Triton model repository containing the model(s) to profile
Model names | String (comma-separated) | Yes | Names of models to profile (must exist in the model repository)
Output repository path | Directory path | Yes | Path where generated configuration variants will be stored
Triton server | Service or launchable | Yes | Either a running Triton server (remote mode) or the ability to launch one (local/docker/c_api mode)
Search bounds | Integers | No | Maximum values for concurrency, instance count, and batch size sweeps

Outputs

Output | Type | Description
Output model repository | Directory | Contains subdirectories for each configuration variant, each with its own config.pbtxt
Profiling checkpoint | Directory | Serialized profiling results for use by model-analyzer analyze
Console summary | Text (stdout) | Progress updates and summary of profiled configurations
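
Since each configuration variant lands in its own subdirectory with a config.pbtxt, the generated variants can be listed with a short shell function. This is a sketch under that layout assumption only; list_variants and the OUTPUT_REPO variable are hypothetical names, and the function makes no assumption about how the variant directories are named.

```shell
#!/bin/sh
# Print the name of every immediate subdirectory of $1 that holds a config.pbtxt.
list_variants() (
  for cfg in "$1"/*/config.pbtxt; do
    # Skip the unexpanded pattern when nothing matches.
    [ -f "$cfg" ] || continue
    basename "$(dirname "$cfg")"
  done
)

# Default to the output path used in the usage examples.
list_variants "${OUTPUT_REPO:-./results}"
```

Any listed variant's config.pbtxt can then be inspected or copied into a serving model repository.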

Usage Examples

Example 1: Basic profiling with local Triton

Profile a model named densenet_onnx using a locally launched Triton server:

model-analyzer profile \
  --model-repository=/models \
  --profile-models=densenet_onnx \
  --output-model-repository-path=./results \
  --triton-launch-mode=local \
  --override-output-model-repository

Example 2: Profiling with bounded search space

Limit the configuration search to reduce profiling time:

model-analyzer profile \
  --model-repository=/models \
  --profile-models=densenet_onnx \
  --output-model-repository-path=./results \
  --triton-launch-mode=local \
  --run-config-search-max-concurrency=16 \
  --run-config-search-max-instance-count=4 \
  --run-config-search-max-model-batch-size=16 \
  --override-output-model-repository

Example 3: Profiling against a remote Triton server

Connect to an already-running Triton instance:

model-analyzer profile \
  --model-repository=/models \
  --profile-models=densenet_onnx \
  --output-model-repository-path=./results \
  --triton-launch-mode=remote \
  --triton-http-endpoint=localhost:8000 \
  --triton-grpc-endpoint=localhost:8001 \
  --triton-metrics-url=http://localhost:8002/metrics \
  --override-output-model-repository
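
Before starting a remote-mode run, it can help to confirm that the server answers its readiness probe. The sketch below assumes the standard HTTP health endpoint /v2/health/ready on the Triton HTTP port and that curl is installed; ready_url and triton_ready are hypothetical helper names.

```shell
#!/bin/sh
# Build the readiness URL for a Triton HTTP endpoint given as host:port.
ready_url() {
  printf 'http://%s/v2/health/ready\n' "$1"
}

# Return 0 if the server answers the readiness probe successfully.
# Assumption: curl is available; --fail makes HTTP errors a non-zero exit.
triton_ready() {
  curl --silent --fail --output /dev/null --max-time 5 "$(ready_url "$1")"
}

# Usage against a live server (uncomment to run):
# if triton_ready localhost:8000; then
#   echo "Triton is ready"
# else
#   echo "Triton is not ready"
# fi
```

Model Analyzer's launch-mode documentation also indicates that a remote server should be started with explicit model control so that variants can be loaded and unloaded; it is worth verifying that against the version in use.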

Example 4: Profiling multiple models

Profile multiple models in a single run:

model-analyzer profile \
  --model-repository=/models \
  --profile-models=densenet_onnx,resnet50_onnx \
  --output-model-repository-path=./results \
  --triton-launch-mode=local \
  --override-output-model-repository
