Implementation:Triton inference server Server Config Pbtxt Schema
| Knowledge Sources | |
|---|---|
| Domains | MLOps, Configuration, Model_Serving |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
The protobuf text-format schema for declaring a model's serving properties in a Triton Inference Server config.pbtxt file.
Description
The config.pbtxt file is a protobuf text format file based on the ModelConfig proto definition from the Triton common repository. It is placed in a model's directory within the model repository and specifies all properties the server needs to correctly serve the model: tensor names, shapes, data types, batching behavior, instance groups, and optimization policies.
Usage
Create this file in <model-repository>/<model-name>/config.pbtxt whenever deploying a model that requires explicit configuration, especially Python backend models, ensemble models, or models needing custom batching or optimization settings.
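The placement rule above can be sketched in Python. This is a minimal illustration, not part of Triton: `write_model_config` is a hypothetical helper, and the default version directory name `1` is an assumption based on the usual repository layout, where model artifacts sit in a numeric version subdirectory next to config.pbtxt.

```python
from pathlib import Path


def write_model_config(repo_root, model_name, config_text, version="1"):
    """Write config.pbtxt to <repo_root>/<model_name>/ and return its path.

    Hypothetical helper for illustration only; not a Triton API.
    """
    model_dir = Path(repo_root) / model_name
    # Model artifacts (for example model.onnx) go in a numeric version
    # directory; config.pbtxt sits one level above it.
    (model_dir / version).mkdir(parents=True, exist_ok=True)
    config_path = model_dir / "config.pbtxt"
    config_path.write_text(config_text, encoding="utf-8")
    return config_path
```

Calling `write_model_config("model_repository", "densenet_onnx", text)` would produce `model_repository/densenet_onnx/config.pbtxt` alongside an empty `model_repository/densenet_onnx/1/` directory ready for the model file.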
Code Reference
Source Location
- Repository: triton-inference-server/server
- File: docs/user_guide/model_configuration.md
- Lines: L39-68 (Minimal Config), L70-75 (platform/backend), L87-93 (max_batch_size), L199-228 (Auto-Generated Config), L545-681 (Instance Groups)
Signature
# config.pbtxt schema (ModelConfig protobuf text format)
name: "<string>" # Optional model name; if set, it must match the model directory name
platform: "<string>" # e.g., "onnxruntime_onnx", "tensorrt_plan"
# OR
backend: "<string>" # e.g., "onnxruntime", "python", "tensorrt"
max_batch_size: <int> # 0 = no implicit batch dimension; >0 = maximum batch size (batch dimension is omitted from dims)
input [
  {
    name: "<string>" # Tensor name
    data_type: TYPE_FP32 # Enum: TYPE_BOOL, TYPE_UINT8/16/32/64, TYPE_INT8/16/32/64, TYPE_FP16/32/64, TYPE_BF16, TYPE_STRING (BYTES in the inference API)
    dims: [ <int>, <int>, ... ] # Tensor shape (-1 for variable)
    reshape { shape: [ <int>, ... ] } # Optional reshape
  }
]
output [
  {
    name: "<string>"
    data_type: TYPE_FP32
    dims: [ <int>, <int>, ... ]
    label_filename: "<string>" # Optional classification labels
  }
]
instance_group [ # Optional: control model instances
  {
    count: <int> # Number of instances
    kind: KIND_GPU # KIND_AUTO, KIND_GPU, KIND_CPU, KIND_MODEL
    gpus: [ <int>, ... ] # GPU device IDs (used with KIND_GPU)
  }
]
dynamic_batching { # Optional: enable dynamic batching
  preferred_batch_size: [ <int>, ... ]
  max_queue_delay_microseconds: <int>
}
Import
# No import — this is a configuration file format
# Place at: <model-repository>/<model-name>/config.pbtxt
# Protobuf definition: triton-inference-server/common/protobuf/model_config.proto
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| name | string | No | Model name; defaults to the model directory name and must match it if set |
| platform/backend | string | Yes | Inference backend identifier |
| max_batch_size | int | Yes | Maximum batch size (0 to disable batching) |
| input | ModelInput[] | Yes | Input tensor specifications (name, type, shape) |
| output | ModelOutput[] | Yes | Output tensor specifications (name, type, shape) |
| instance_group | ModelInstanceGroup[] | No | GPU/CPU instance allocation |
| dynamic_batching | ModelDynamicBatching | No | Dynamic batching configuration |
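As a rough illustration of the table above, the sketch below checks a configuration, represented here as a plain Python dict, for the properties a minimal config.pbtxt needs: platform and/or backend, max_batch_size, and at least one input and one output. The dict representation and the function name `missing_minimal_fields` are assumptions made for this example. Backends that support auto-generated configuration may allow some of these fields to be omitted, which this check does not model.

```python
def missing_minimal_fields(config):
    """Return the minimal-config properties that are absent from `config`.

    `config` is a plain dict mirroring config.pbtxt field names.
    Hypothetical helper for illustration only; not a Triton API.
    """
    missing = []
    # Either platform or backend identifies the inference backend.
    if not (config.get("platform") or config.get("backend")):
        missing.append("platform/backend")
    # max_batch_size may legitimately be 0, so test for presence, not truth.
    if "max_batch_size" not in config:
        missing.append("max_batch_size")
    # At least one input and one output tensor must be declared.
    for key in ("input", "output"):
        if not config.get(key):
            missing.append(key)
    return missing
```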
Outputs
| Name | Type | Description |
|---|---|---|
| config.pbtxt | protobuf text file | Valid configuration file in model repository |
| model metadata | server-internal | Parsed model configuration used by Triton core |
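The parsed configuration can be read back from a running server through Triton's HTTP model configuration endpoint, `v2/models/<model-name>/config`, optionally with a `/versions/<version>` segment. The sketch below builds that URL and fetches the JSON using only the standard library. The helper names are hypothetical, and the base URL (for example `http://localhost:8000`, Triton's default HTTP port) is an assumption about the deployment.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def model_config_url(base_url, model_name, version=None):
    """Build the model-configuration URL for a model (and optional version)."""
    path = f"/v2/models/{quote(model_name, safe='')}"
    if version is not None:
        path += f"/versions/{quote(str(version), safe='')}"
    return base_url.rstrip("/") + path + "/config"


def fetch_model_config(base_url, model_name, version=None, timeout=5.0):
    """Fetch the parsed model configuration as a dict.

    Requires a reachable Triton server; raises urllib.error.URLError otherwise.
    """
    url = model_config_url(base_url, model_name, version)
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Comparing the returned dict with the config.pbtxt on disk is a quick way to see which fields the server filled in or defaulted.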
Usage Examples
Minimal ONNX Configuration
name: "densenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "data_0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "fc6_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    label_filename: "densenet_labels.txt"
  }
]
Configuration with Dynamic Batching and Instance Groups
name: "text_classifier"
backend: "onnxruntime"
max_batch_size: 32
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ 2 ]
  }
]
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
dynamic_batching {
  preferred_batch_size: [ 4, 8, 16 ]
  max_queue_delay_microseconds: 100
}
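The two examples differ in how `dims` is interpreted. With `max_batch_size: 0`, `dims` is the complete tensor shape. With `max_batch_size` greater than 0, Triton prepends a variable batch dimension, so the full shape is `[ -1 ] + dims`. The function below is a small sketch of that rule; `full_tensor_shape` is a hypothetical name used only here.

```python
def full_tensor_shape(max_batch_size, dims):
    """Return the full tensor shape implied by max_batch_size and dims.

    When max_batch_size > 0, the batch dimension is implicit in
    config.pbtxt and appears as a leading -1 in the full shape.
    """
    if max_batch_size < 0:
        raise ValueError("max_batch_size must be >= 0")
    if max_batch_size > 0:
        return [-1] + list(dims)
    return list(dims)


# Minimal ONNX example: no implicit batch dimension.
print(full_tensor_shape(0, [3, 224, 224]))  # [3, 224, 224]
# text_classifier example: batch dimension plus variable sequence length.
print(full_tensor_shape(32, [-1]))  # [-1, -1]
print(full_tensor_shape(32, [2]))  # [-1, 2]
```

For the `text_classifier` configuration this means each request carries `input_ids` and `attention_mask` tensors of shape `[batch, sequence_length]` with `batch` at most 32.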