Implementation:Triton inference server Server Config Pbtxt Schema

From Leeroopedia
Revision as of 13:57, 16 February 2026 by Admin (auto-imported from implementations/Triton_inference_server_Server_Config_Pbtxt_Schema.md)
Knowledge Sources
Domains: MLOps, Configuration, Model_Serving
Last Updated: 2026-02-13 17:00 GMT

Overview

The config.pbtxt file declares a model's serving properties for Triton Inference Server, written in protobuf text format against the ModelConfig schema.

Description

The config.pbtxt file is a protobuf text format file based on the ModelConfig proto definition from the Triton common repository. It is placed in a model's directory within the model repository and specifies all properties the server needs to correctly serve the model: tensor names, shapes, data types, batching behavior, instance groups, and optimization policies.

Usage

Create this file at <model-repository>/<model-name>/config.pbtxt whenever a model requires explicit configuration, in particular Python backend models, ensemble models, and models that need custom batching or optimization settings.
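
The layout described above can be sketched in Python as follows. The helper name scaffold_model, the model name my_model, and the placeholder config text are illustrative assumptions and not part of Triton. The sketch also creates a numeric version subdirectory (1/), which is where the model artifact would normally live; the artifact itself is not created here.

```python
from pathlib import Path
import tempfile


def scaffold_model(repo_root: Path, model_name: str, config_text: str) -> Path:
    """Create <repo_root>/<model_name>/config.pbtxt plus an empty version dir.

    Hypothetical helper for illustration only; it is not a Triton API.
    """
    model_dir = repo_root / model_name
    # Version directories are numeric; "1" is used here as an example.
    (model_dir / "1").mkdir(parents=True, exist_ok=True)
    config_path = model_dir / "config.pbtxt"
    config_path.write_text(config_text, encoding="utf-8")
    return config_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder text, not a complete configuration.
        path = scaffold_model(Path(tmp), "my_model", 'backend: "python"\n')
        print(path.relative_to(tmp))
```

The function returns the path of the written file so that a caller can log it or pass it to a later validation step.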

Code Reference

Source Location

  • Repository: triton-inference-server/server
  • File: docs/user_guide/model_configuration.md
  • Lines: L39-68 (Minimal Config), L70-75 (platform/backend), L87-93 (max_batch_size), L199-228 (Auto-Generated Config), L545-681 (Instance Groups)

Signature

# config.pbtxt schema (ModelConfig protobuf text format)
name: "<string>"                              # Optional; if set, must match the model directory name
platform: "<string>"                          # e.g., "onnxruntime_onnx", "tensorrt_plan"
# OR
backend: "<string>"                           # e.g., "onnxruntime", "python", "tensorrt"

max_batch_size: <int>                         # 0 = no batching; >0 = maximum batch size (batch dimension is implicit, not listed in dims)

input [
  {
    name: "<string>"                          # Tensor name
    data_type: TYPE_FP32                      # Enum: TYPE_BOOL, TYPE_UINT8/16/32/64, TYPE_INT8/16/32/64, TYPE_FP16/32/64, TYPE_BF16, TYPE_STRING
    dims: [ <int>, <int>, ... ]               # Tensor shape (-1 for variable)
    reshape { shape: [ <int>, ... ] }         # Optional reshape
  }
]

output [
  {
    name: "<string>"
    data_type: TYPE_FP32
    dims: [ <int>, <int>, ... ]
    label_filename: "<string>"                # Optional classification labels
  }
]

instance_group [                              # Optional: control model instances
  {
    count: <int>                              # Instances per listed GPU (or total instances for KIND_CPU)
    kind: KIND_GPU                            # KIND_AUTO, KIND_GPU, KIND_CPU, KIND_MODEL
    gpus: [ <int>, ... ]                      # GPU device IDs (omit to use all available GPUs)
  }
]

dynamic_batching {                            # Optional: enable dynamic batching
  preferred_batch_size: [ <int>, ... ]
  max_queue_delay_microseconds: <int>
}
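
The interaction between max_batch_size and dims in the schema above is easy to get wrong, so here is a small sketch of the rule. It assumes the documented behavior that a model with max_batch_size greater than 0 has an implicit, variable-size batch dimension that is not listed in dims. The function name full_shape is a hypothetical helper, not a Triton API.

```python
from typing import List


def full_shape(max_batch_size: int, dims: List[int]) -> List[int]:
    """Return the tensor shape clients see for one input or output entry.

    When max_batch_size > 0 the batch dimension is implicit and is not
    listed in dims, so it is prepended here as -1 (variable size).
    """
    if max_batch_size < 0:
        raise ValueError("max_batch_size must be >= 0")
    if max_batch_size > 0:
        return [-1] + list(dims)
    return list(dims)


if __name__ == "__main__":
    print(full_shape(0, [3, 224, 224]))  # [3, 224, 224]
    print(full_shape(32, [-1]))          # [-1, -1]
```

With max_batch_size set to 0, dims must therefore describe the complete shape, including any batch dimension the model itself expects.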

Import

# No import — this is a configuration file format
# Place at: <model-repository>/<model-name>/config.pbtxt
# Protobuf definition: triton-inference-server/common/protobuf/model_config.proto

I/O Contract

Inputs

Name | Type | Required | Description
name | string | No | Model name; if set, it must match the model directory name
platform/backend | string | Yes | Inference backend identifier
max_batch_size | int | Yes | Maximum batch size (0 to disable batching)
input | ModelInput[] | Yes | Input tensor specifications (name, type, shape)
output | ModelOutput[] | Yes | Output tensor specifications (name, type, shape)
instance_group | InstanceGroup[] | No | GPU/CPU instance allocation
dynamic_batching | DynamicBatcher | No | Dynamic batching configuration

Outputs

Name | Type | Description
config.pbtxt | protobuf text file | Valid configuration file in the model repository
model metadata | server-internal | Parsed model configuration used by Triton core

Usage Examples

Minimal ONNX Configuration

name: "densenet_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "data_0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "fc6_1"
    data_type: TYPE_FP32
    dims: [ 1000 ]
    label_filename: "densenet_labels.txt"
  }
]
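
When many models share the same shape of configuration, the text can be generated rather than written by hand. The following is a minimal sketch that renders the required fields of a configuration like the one above as a string. The function names and the tuple layout are assumptions made for illustration, and optional fields such as label_filename are deliberately left out.

```python
from typing import List, Tuple

# (tensor name, data_type enum name, dims); hypothetical layout for this sketch
Tensor = Tuple[str, str, List[int]]


def render_tensors(field: str, tensors: List[Tensor]) -> str:
    """Render an input or output list in protobuf text format."""
    entries = []
    for name, data_type, dims in tensors:
        dims_text = ", ".join(str(d) for d in dims)
        entries.append(
            "  {\n"
            f'    name: "{name}"\n'
            f"    data_type: {data_type}\n"
            f"    dims: [ {dims_text} ]\n"
            "  }"
        )
    return f"{field} [\n" + ",\n".join(entries) + "\n]\n"


def render_minimal_config(
    name: str,
    platform: str,
    max_batch_size: int,
    inputs: List[Tensor],
    outputs: List[Tensor],
) -> str:
    """Render name, platform, max_batch_size, input, and output only."""
    return (
        f'name: "{name}"\n'
        f'platform: "{platform}"\n'
        f"max_batch_size: {max_batch_size}\n"
        + render_tensors("input", inputs)
        + render_tensors("output", outputs)
    )


if __name__ == "__main__":
    print(
        render_minimal_config(
            "densenet_onnx",
            "onnxruntime_onnx",
            0,
            [("data_0", "TYPE_FP32", [3, 224, 224])],
            [("fc6_1", "TYPE_FP32", [1000])],
        )
    )
```

The sketch performs no validation; it only formats the values it is given.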

Configuration with Dynamic Batching and Instance Groups

name: "text_classifier"
backend: "onnxruntime"
max_batch_size: 32

input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ 2 ]
  }
]

instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

dynamic_batching {
  preferred_batch_size: [ 4, 8, 16 ]
  max_queue_delay_microseconds: 100
}
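
Two relationships in the configuration above are worth checking before deployment. The sketch below assumes that each preferred_batch_size must be positive and no larger than max_batch_size, and that count applies to each GPU listed in gpus. Both function names are hypothetical helpers, not Triton APIs, and the checks are not a substitute for the server's own validation.

```python
from typing import Dict, List


def check_dynamic_batching(max_batch_size: int, preferred: List[int]) -> List[str]:
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    for size in preferred:
        if size <= 0:
            problems.append(f"preferred size {size} must be positive")
        elif size > max_batch_size:
            problems.append(
                f"preferred size {size} exceeds max_batch_size {max_batch_size}"
            )
    return problems


def instances_per_gpu(count: int, gpus: List[int]) -> Dict[int, int]:
    """Map each listed GPU ID to the number of instances placed on it."""
    return {gpu: count for gpu in gpus}


if __name__ == "__main__":
    print(check_dynamic_batching(32, [4, 8, 16]))  # []
    print(instances_per_gpu(2, [0]))               # {0: 2}
```

For the text_classifier example, the preferred sizes pass the check and both instances are placed on GPU 0.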

Related Pages

Implements Principle
