Implementation:Triton inference server Server Optimal Config Application

From Leeroopedia
Field | Value
Page Type | Implementation
Title | Optimal_Config_Application
Namespace | Triton_inference_server_Server
Domains | Performance, Model_Serving, Configuration
External Dependencies | None (uses standard filesystem operations and Triton model repository conventions)
Last Updated | 2026-02-13 17:00 GMT

Overview

This page gives a concrete procedure for updating config.pbtxt with the optimal serving parameters found by Model Analyzer. It covers both the automated approach (copying the best configuration from Model Analyzer output) and the manual approach (editing config.pbtxt directly with tuned parameters).

Description

After Model Analyzer's analyze step identifies the top-ranked configuration, the optimal config.pbtxt must be applied to the production model repository. The Model Analyzer stores each profiled configuration variant in the output model repository, making deployment a simple file copy operation.

For manual optimization, the relevant configuration blocks (instance_group, dynamic_batching, max_batch_size, optimization) are edited directly in the model's config.pbtxt.

Key parameters to tune:

  • max_batch_size (int) -- Maximum batch size the server will form for this model
  • dynamic_batching.preferred_batch_size (list[int]) -- Preferred batch sizes the dynamic batcher will try to form
  • dynamic_batching.max_queue_delay_microseconds (int) -- Maximum time in microseconds to delay a request while waiting for a preferred batch size
  • instance_group[].count (int) -- Number of model instances to create on each device in the group
  • instance_group[].kind (KIND_GPU or KIND_CPU) -- Device type for model instances
  • optimization.execution_accelerators (tensorrt, openvino) -- Framework-specific inference acceleration
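
The batching parameters above depend on each other: a preferred batch size larger than max_batch_size can never be formed. A minimal sketch of a pre-deployment check follows. It assumes each preferred_batch_size entry is written on a single line, as in the template on this page, and the helper names (max_batch_size_of, preferred_sizes_of, check_batch_sizes) are hypothetical, not part of Triton or Model Analyzer.

```shell
# max_batch_size_of FILE: print the top-level max_batch_size value.
max_batch_size_of() {
  sed -n 's/^max_batch_size:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# preferred_sizes_of FILE: print each preferred_batch_size value on its own line.
# Assumption: the list is not split across several lines.
preferred_sizes_of() {
  grep 'preferred_batch_size' "$1" | grep -o '[0-9][0-9]*'
}

# check_batch_sizes FILE: return 0 if every preferred size fits in max_batch_size,
# 1 if one exceeds it, 2 if max_batch_size is missing.
check_batch_sizes() {
  local cfg
  local max
  local size
  cfg="$1"
  max="$(max_batch_size_of "$cfg")"
  if [ -z "$max" ]; then
    echo "no max_batch_size found in $cfg" >&2
    return 2
  fi
  for size in $(preferred_sizes_of "$cfg"); do
    if [ "$size" -gt "$max" ]; then
      echo "preferred_batch_size $size exceeds max_batch_size $max" >&2
      return 1
    fi
  done
  return 0
}
```

Running check_batch_sizes on a candidate config.pbtxt before copying it into the production repository catches this mismatch without starting a server.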

Usage

CLI Signature (Automated)

# Copy the optimal configuration from Model Analyzer results
cp ./results/<optimal_config>/config.pbtxt <model-repository>/<model-name>/config.pbtxt

# Reload the model on a running Triton server (only when it was started with --model-control-mode=explicit)
curl -X POST "http://localhost:8000/v2/repository/models/<model-name>/load"
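
To confirm that the reloaded model is actually serving, the load request can be followed by a readiness poll. The sketch below assumes the default HTTP port 8000 and uses the v2 endpoint GET /v2/models/<model-name>/ready. TRITON_URL, ready_url, and wait_until_ready are illustrative names chosen for this example.

```shell
# Base URL of the Triton HTTP endpoint (assumed default, override via environment).
TRITON_URL="${TRITON_URL:-http://localhost:8000}"

# ready_url MODEL: print the readiness URL for MODEL.
ready_url() {
  printf '%s/v2/models/%s/ready\n' "$TRITON_URL" "$1"
}

# wait_until_ready MODEL [ATTEMPTS]: poll readiness once per second.
# Returns 0 as soon as the server answers with a success status, 1 otherwise.
wait_until_ready() {
  local model
  local attempts
  local i
  model="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP error statuses (model not ready).
    if curl -sf -o /dev/null "$(ready_url "$model")"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "model $model not ready after $attempts attempts" >&2
  return 1
}

# Usage:
#   curl -sf -X POST "$TRITON_URL/v2/repository/models/<model-name>/load" \
#     && wait_until_ready <model-name>
```

A non-zero exit from either command signals that the new configuration was not accepted, which is a cue to inspect the server log before sending traffic.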

Key Parameters

Parameter | Location in config.pbtxt | Type | Description
max_batch_size | Top-level | int | Maximum batch size for the model (0 disables batching)
preferred_batch_size | dynamic_batching | list[int] | Preferred batch sizes for dynamic batcher to form
max_queue_delay_microseconds | dynamic_batching | int | Maximum queue delay in microseconds
count | instance_group[] | int | Number of model instances per device
kind | instance_group[] | enum | Device type: KIND_GPU or KIND_CPU
execution_accelerators | optimization | block | TensorRT, OpenVINO, or other accelerator config

Code Reference

Source Location

  • docs/user_guide/performance_tuning.md:L370-386 -- Applying optimal configuration from Model Analyzer
  • docs/user_guide/model_configuration.md:L545-681 -- instance_group configuration reference
  • docs/user_guide/batcher.md:L32-151 -- dynamic_batching configuration reference
  • docs/user_guide/optimization.md:L91-295 -- Optimization and execution accelerators reference

Configuration Template

name: "model_name"
platform: "onnxruntime_onnx"
max_batch_size: 8

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]

output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]

instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}

optimization {
  execution_accelerators {
    gpu_execution_accelerator : [ {
      name : "tensorrt"
      parameters { key: "precision_mode" value: "FP16" }
      parameters { key: "max_workspace_size_bytes" value: "1073741824" }
    }]
  }
}

I/O Contract

Inputs

Input | Type | Required | Description
Optimal config from Model Analyzer | File (config.pbtxt) | Yes (automated) | The top-ranked configuration file from the Model Analyzer output repository
Model repository path | Directory path | Yes | Path to the production Triton model repository
Model name | String | Yes | Name of the model whose configuration is being updated
Tuning parameters | Various | Yes (manual) | Specific values for instance_group, dynamic_batching, max_batch_size, optimization blocks

Outputs

Output | Type | Description
Updated config.pbtxt | File | The model's configuration file with optimized serving parameters applied
Model reload confirmation | HTTP response | Confirmation that the model was successfully reloaded with the new configuration (when using explicit model control)

Usage Examples

Example 1: Apply optimal config from Model Analyzer

Copy the top-ranked configuration from Model Analyzer results:

# List available configurations in the output repository
ls ./results/

# Copy the best configuration (identified from analyze output)
cp ./results/densenet_onnx_config_3/config.pbtxt \
   /models/densenet_onnx/config.pbtxt

# Reload the model on a running server (explicit model control mode only)
curl -X POST "http://localhost:8000/v2/repository/models/densenet_onnx/load"
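
Because the copy overwrites the production config.pbtxt, it can be useful to keep the previous file so the change is reversible. The following is a minimal sketch under that assumption. apply_config and rollback_config are hypothetical helper names, and the backup location is passed explicitly so it can live outside the model repository.

```shell
# apply_config SRC DEST BACKUP: save the current DEST to BACKUP, then copy SRC over DEST.
apply_config() {
  local src
  local dest
  local backup
  src="$1"
  dest="$2"
  backup="$3"
  if [ ! -f "$src" ]; then
    echo "source config not found: $src" >&2
    return 1
  fi
  # Preserve the current production config so it can be restored later.
  if [ -f "$dest" ]; then
    cp -p "$dest" "$backup" || return 1
  fi
  cp "$src" "$dest"
}

# rollback_config DEST BACKUP: restore the file saved by apply_config.
rollback_config() {
  local dest
  local backup
  dest="$1"
  backup="$2"
  if [ ! -f "$backup" ]; then
    echo "no backup found at $backup" >&2
    return 1
  fi
  cp -p "$backup" "$dest"
}

# Usage (paths follow the example above, backup path is illustrative):
#   apply_config ./results/densenet_onnx_config_3/config.pbtxt \
#     /models/densenet_onnx/config.pbtxt /tmp/densenet_onnx.config.pbtxt.orig
```

After a rollback the model still has to be reloaded for the restored configuration to take effect.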

Example 2: Manual config optimization -- enable dynamic batching

Edit config.pbtxt to add dynamic batching:

# Add to config.pbtxt
max_batch_size: 8

dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}

Example 3: Manual config optimization -- increase instance count

Edit config.pbtxt to increase GPU instances:

# Update instance_group in config.pbtxt
instance_group [
  {
    count: 3
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
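
When the same edit has to be repeated across several models, the instance count can be changed with a script instead of by hand. This is only a sketch: set_instance_count is a hypothetical helper, it assumes each count field sits on its own line as in the snippet above, and it rewrites every count line in the file, so it is suited to configs with a single instance_group entry.

```shell
# set_instance_count FILE N: set every "count: <digits>" line in FILE to N.
set_instance_count() {
  local cfg
  local count
  local tmpfile
  local status
  cfg="$1"
  count="$2"
  case "$count" in
    ''|*[!0-9]*)
      echo "count must be a non-negative integer: $count" >&2
      return 1
      ;;
  esac
  tmpfile="$(mktemp)" || return 1
  # Replace only the digits after "count:", keeping the indentation intact.
  # Writing back with cat preserves the permissions of the original file.
  sed "s/^\([[:space:]]*count:[[:space:]]*\)[0-9][0-9]*/\1$count/" "$cfg" > "$tmpfile" \
    && cat "$tmpfile" > "$cfg"
  status=$?
  rm -f "$tmpfile"
  return "$status"
}

# Usage:
#   set_instance_count /models/densenet_onnx/config.pbtxt 3
```

Other fields of the instance_group block, such as kind and gpus, are left untouched.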

Example 4: Manual config optimization -- enable TensorRT acceleration

Add TensorRT execution accelerator for an ONNX model:

# Add optimization block to config.pbtxt
optimization {
  execution_accelerators {
    gpu_execution_accelerator : [ {
      name : "tensorrt"
      parameters { key: "precision_mode" value: "FP16" }
      parameters { key: "max_workspace_size_bytes" value: "1073741824" }
    }]
  }
}
