Principle:Triton inference server Server Model Configuration
| Knowledge Sources | |
|---|---|
| Domains | MLOps, Model_Serving, Configuration |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
A declarative schema for specifying model serving properties including input/output tensors, batching behavior, instance groups, and optimization policies.
Description
Model Configuration defines the contract between a model and the inference server through a protobuf text-format file (config.pbtxt). It specifies the model's name, backend or platform, maximum batch size, and the input and output tensor specifications (names, data types, dimensions). The server uses this configuration to route inference requests correctly, allocate memory, and apply optimizations such as dynamic batching or TensorRT acceleration.
For some backends (for example ONNX Runtime, TensorRT, and TensorFlow SavedModel), Triton can auto-complete the configuration by inspecting the model file, which makes config.pbtxt optional. In other cases, such as ensemble models and Python backend models that do not supply their own auto-completion logic, an explicit configuration is required.
Usage
Use this principle whenever deploying a model on Triton Inference Server. An explicit configuration is required for ensemble models, for Python backend models that do not supply their own auto-completion logic, and for any model where auto-completion is insufficient (e.g., custom batching, instance groups, or optimization policies). Even when auto-completion is available, explicit configuration is recommended for production deployments so that the served behavior is deterministic.
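As a sketch of the settings that auto-completion cannot choose for you, the fragment below adds an instance group, dynamic batching, and an optimization policy to a config.pbtxt. The field names come from the ModelConfig schema; the specific values (instance count, batch sizes, queue delay) are illustrative assumptions rather than recommendations, and the TensorRT accelerator entry assumes an ONNX Runtime model running on GPU.

```protobuf
# Run two execution instances of the model on each available GPU
# (count is an illustrative value).
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]

# Let the server combine individual requests into larger batches.
# Requires max_batch_size > 0; the sizes and delay are illustrative.
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}

# Ask the ONNX Runtime backend to use TensorRT for GPU execution
# (assumes an ONNX model; other backends use different settings).
optimization {
  execution_accelerators {
    gpu_execution_accelerator: [ { name: "tensorrt" } ]
  }
}
```

These blocks are appended to the same config.pbtxt that declares the model's backend, max_batch_size, inputs, and outputs.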
Theoretical Basis
The configuration follows the ModelConfig protobuf schema:
# Minimal fields (name is optional; if set, it must match the model directory name)
name: "<model-name>"
# Specify backend, or instead use the legacy field: platform: "<platform>"
backend: "<backend>"
max_batch_size: <int>
input [
  {
    name: "<tensor-name>"
    data_type: <TYPE_ENUM>
    dims: [ <d1>, <d2>, ... ]
  }
]
output [
  {
    name: "<tensor-name>"
    data_type: <TYPE_ENUM>
    dims: [ <d1>, <d2>, ... ]
  }
]
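For illustration, a filled-in instance of this schema might look like the following. The model name, tensor names, and shapes are hypothetical and describe an assumed ONNX image classifier; a real configuration must use the tensor names and shapes that the model file actually declares.

```protobuf
# Hypothetical image classifier served by the ONNX Runtime backend.
name: "example_classifier"      # must match the model's directory name
backend: "onnxruntime"
max_batch_size: 8               # batching enabled, up to 8 per batch

input [
  {
    name: "input"               # assumed tensor name
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]       # per-request shape, batch dimension omitted
  }
]
output [
  {
    name: "output"              # assumed tensor name
    data_type: TYPE_FP32
    dims: [ 1000 ]              # one score per class
  }
]
```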
Key concepts:
- platform vs backend: platform is the legacy field (e.g., "onnxruntime_onnx"); backend is the modern field (e.g., "onnxruntime").
- max_batch_size: When > 0, batching is enabled and the input/output dims exclude the batch dimension. When 0, batching is disabled and dims describe the full tensor shape.
- dims: Shape of a single input/output tensor (excluding the batch dimension if max_batch_size > 0).
- Variable-length dimensions: use -1 as a wildcard for a dimension whose size can vary between requests.
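The interaction between max_batch_size and dims can be sketched with two alternative input declarations. The tensor names and sizes are hypothetical; the comments give the shape a client would be expected to send under each setting.

```protobuf
# Variant A: batching enabled. The batch dimension is implicit, so a
# request carries a tensor of shape [ batch, seq_len ] with batch <= 8.
max_batch_size: 8
input [
  {
    name: "tokens"              # hypothetical tensor name
    data_type: TYPE_INT64
    dims: [ -1 ]                # variable sequence length
  }
]
```

```protobuf
# Variant B: batching disabled. dims is the complete shape, so a request
# carries a tensor of exactly [ 1, 3, 224, 224 ].
max_batch_size: 0
input [
  {
    name: "image"               # hypothetical tensor name
    data_type: TYPE_FP32
    dims: [ 1, 3, 224, 224 ]
  }
]
```

The two variants belong to separate models; a single config.pbtxt has only one max_batch_size.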