Implementation:Triton inference server Server Ensemble Scheduling Schema
| Field | Value |
|---|---|
| Implementation Name | Ensemble_Scheduling_Schema |
| Implements | Principle:Triton_inference_server_Server_Ensemble_Pipeline_Design |
| Domains | Model_Serving, Pipeline_Architecture, MLOps |
| Status | Active |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Concrete protobuf configuration schema for defining multi-model ensemble DAGs in Triton. The ensemble_scheduling block within a model's config.pbtxt specifies the composing models, their execution order, and the tensor routing between them.
Description
The ensemble_scheduling schema is the core mechanism for declaring ensemble pipelines in Triton Inference Server. It defines a list of steps, where each step identifies a composing model and maps its inputs and outputs to named ensemble tensors. Triton uses these mappings to construct a DAG and resolve dependencies via topological sort, so execution order follows tensor connectivity rather than the order in which the steps are listed. Steps that do not depend on each other can run in parallel.
Key behaviors:
- Tensor routing: input_map connects ensemble tensors to composing model inputs; output_map connects composing model outputs to ensemble tensors
- Version selection: model_version: -1 selects the latest available version of the composing model
- Backpressure control: max_inflight_requests (uint32) limits concurrent ensemble requests to prevent resource exhaustion
- Automatic dependency resolution: Triton infers execution order from the tensor connectivity
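The dependency resolution described above can be illustrated with a minimal Python sketch. It is not Triton's scheduler code: the dict layout for steps and the function name execution_waves are assumptions made for illustration. It groups steps into "waves" in which every input tensor is already available, so steps in the same wave could run in parallel.

```python
def execution_waves(steps, ensemble_inputs):
    """Group steps into waves whose input tensors are all available."""
    available = set(ensemble_inputs)  # tensors supplied with the request
    pending = list(steps)
    waves = []
    while pending:
        # A step is ready once every ensemble tensor it consumes exists.
        ready = [s for s in pending
                 if set(s["input_map"].values()) <= available]
        if not ready:
            names = [s["model_name"] for s in pending]
            raise ValueError(f"cycle or missing producer among steps: {names}")
        waves.append([s["model_name"] for s in ready])
        for s in ready:
            available.update(s["output_map"].values())
        pending = [s for s in pending if s not in ready]
    return waves


# Steps are deliberately listed out of order; connectivity decides the order.
steps = [
    {"model_name": "classifier_b",
     "input_map": {"INPUT": "shared_features"},
     "output_map": {"OUTPUT": "CLASS_B"}},
    {"model_name": "feature_extractor",
     "input_map": {"INPUT": "RAW_INPUT"},
     "output_map": {"FEATURES": "shared_features"}},
    {"model_name": "classifier_a",
     "input_map": {"INPUT": "shared_features"},
     "output_map": {"OUTPUT": "CLASS_A"}},
]
waves = execution_waves(steps, ["RAW_INPUT"])
print(waves)
```

In this sketch feature_extractor lands in the first wave and both classifiers share the second, even though classifier_b is listed first.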
Usage
Use this schema when creating an ensemble model configuration. Place the ensemble_scheduling block in the config.pbtxt file of a model declared with platform: "ensemble".
Code Reference
Source Location
- docs/user_guide/ensemble_models.md:L60-123 (full ensemble configuration documentation)
- docs/user_guide/ensemble_models.md:L83-122 (ensemble_scheduling block specifics)
- docs/user_guide/ensemble_models.md:L186-191 (mapping semantics)
- docs/user_guide/ensemble_models.md:L206-225 (max_inflight_requests)
Signature
ensemble_scheduling {
  step [
    {
      model_name: "<composing_model>"
      model_version: -1
      input_map {
        key: "<model_input_name>"
        value: "<ensemble_tensor_name>"
      }
      output_map {
        key: "<model_output_name>"
        value: "<ensemble_tensor_name>"
      }
    }
  ]
}
Import
No import required. This is a protobuf text format configuration block placed directly inside a config.pbtxt file.
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| model_name | string | Name of the composing model (must exist in model repository) |
| model_version | int64 | Version of the composing model to use; -1 selects the latest version |
| input_map | map<string, string> | Maps composing model input name (key) to ensemble tensor name (value) |
| output_map | map<string, string> | Maps composing model output name (key) to ensemble tensor name (value) |
| max_inflight_requests | uint32 | Maximum number of concurrent ensemble inference requests (backpressure control) |
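The key/value direction of input_map and output_map is easy to reverse, so here is a small Python sketch of the routing they describe. The helper names route_inputs and publish_outputs and the sample tensor values are hypothetical; only the map semantics (key is the composing model's tensor name, value is the ensemble tensor name) come from the table above.

```python
def route_inputs(input_map, ensemble_tensors):
    """Build the composing model's inputs from ensemble tensors."""
    # key = composing model input name, value = ensemble tensor name
    return {model_input: ensemble_tensors[ensemble_name]
            for model_input, ensemble_name in input_map.items()}


def publish_outputs(output_map, model_outputs):
    """Publish the composing model's outputs as ensemble tensors."""
    # key = composing model output name, value = ensemble tensor name
    return {ensemble_name: model_outputs[model_output]
            for model_output, ensemble_name in output_map.items()}


# Ensemble tensor produced by an earlier step (values are made up).
ensemble_tensors = {"intermediate_tensor": [0.1, 0.9]}

model_inputs = route_inputs({"INPUT": "intermediate_tensor"}, ensemble_tensors)

# Stand-in for running the composing model; a real model would compute this.
model_outputs = {"OUTPUT": [1]}

ensemble_tensors.update(publish_outputs({"OUTPUT": "CLASSIFICATION"}, model_outputs))
print(model_inputs, ensemble_tensors)
```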
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| Composing model names | string | Names of models that exist in the model repository |
| Tensor names and shapes | string, int[] | Input/output tensor names and their dimensions for each composing model |
| Topology design | DAG specification | Desired routing pattern (simple, sequence, or fan) |
Outputs
| Output | Type | Description |
|---|---|---|
| ensemble_scheduling block | protobuf text | Complete ensemble_scheduling configuration block for inclusion in config.pbtxt |
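As a rough illustration of this contract, the following Python sketch turns step descriptions (model names plus tensor mappings) into an ensemble_scheduling block as text. The function render_ensemble_scheduling and the dict layout are assumptions for illustration, not part of any Triton tooling, and the sketch defaults model_version to -1 when a step does not specify one.

```python
def render_ensemble_scheduling(steps):
    """Render step dicts as an ensemble_scheduling block (protobuf text)."""
    lines = ["ensemble_scheduling {", "  step ["]
    for i, step in enumerate(steps):
        lines.append("    {")
        lines.append(f'      model_name: "{step["model_name"]}"')
        lines.append(f'      model_version: {step.get("model_version", -1)}')
        for field in ("input_map", "output_map"):
            for key, value in step[field].items():
                lines.append(f'      {field} {{ key: "{key}" value: "{value}" }}')
        # Separate list entries with commas, as in the examples on this page.
        lines.append("    }," if i < len(steps) - 1 else "    }")
    lines += ["  ]", "}"]
    return "\n".join(lines)


block = render_ensemble_scheduling([
    {"model_name": "preprocess",
     "input_map": {"RAW_INPUT": "RAW_INPUT"},
     "output_map": {"PROCESSED": "intermediate_tensor"}},
    {"model_name": "classifier",
     "input_map": {"INPUT": "intermediate_tensor"},
     "output_map": {"OUTPUT": "CLASSIFICATION"}},
])
print(block)
```

The generated text still has to be placed in a config.pbtxt alongside the rest of the ensemble model's configuration.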
Usage Examples
Simple two-step ensemble (preprocessing → inference):
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map {
        key: "RAW_INPUT"
        value: "RAW_INPUT"
      }
      output_map {
        key: "PROCESSED"
        value: "intermediate_tensor"
      }
    },
    {
      model_name: "classifier"
      model_version: -1
      input_map {
        key: "INPUT"
        value: "intermediate_tensor"
      }
      output_map {
        key: "OUTPUT"
        value: "CLASSIFICATION"
      }
    }
  ]
}
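In the two-step example, the two models are linked only by the shared name intermediate_tensor, so a typo in either map silently disconnects them. A minimal Python check for that, assuming the steps are available as dicts and that RAW_INPUT is the ensemble's declared input (the function name unresolved_tensors is hypothetical):

```python
def unresolved_tensors(steps, ensemble_inputs):
    """Return (model, model_input, tensor) triples that nothing produces."""
    produced = set(ensemble_inputs)
    for step in steps:
        produced.update(step["output_map"].values())
    problems = []
    for step in steps:
        for model_input, tensor in step["input_map"].items():
            if tensor not in produced:
                problems.append((step["model_name"], model_input, tensor))
    return problems


steps = [
    {"model_name": "preprocess",
     "input_map": {"RAW_INPUT": "RAW_INPUT"},
     "output_map": {"PROCESSED": "intermediate_tensor"}},
    {"model_name": "classifier",
     "input_map": {"INPUT": "intermediate_tensor"},
     "output_map": {"OUTPUT": "CLASSIFICATION"}},
]
ok = unresolved_tensors(steps, ["RAW_INPUT"])

# Simulate a typo in the classifier's input mapping.
steps[1]["input_map"]["INPUT"] = "intermediate_tensr"
broken = unresolved_tensors(steps, ["RAW_INPUT"])
print(ok, broken)
```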
Fan-out pattern (one input to two parallel models):
ensemble_scheduling {
  step [
    {
      model_name: "feature_extractor"
      model_version: -1
      input_map { key: "INPUT" value: "RAW_INPUT" }
      output_map { key: "FEATURES" value: "shared_features" }
    },
    {
      model_name: "classifier_a"
      model_version: -1
      input_map { key: "INPUT" value: "shared_features" }
      output_map { key: "OUTPUT" value: "CLASS_A" }
    },
    {
      model_name: "classifier_b"
      model_version: -1
      input_map { key: "INPUT" value: "shared_features" }
      output_map { key: "OUTPUT" value: "CLASS_B" }
    }
  ]
}
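The fan-out block only covers scheduling; the ensemble's own input and output tensors are declared elsewhere in config.pbtxt. One way to see which ensemble tensors sit at the pipeline boundary is the Python sketch below. The function boundary_tensors is hypothetical, and the rule it applies (a tensor consumed but never produced is an ensemble input, one produced but never consumed is a candidate ensemble output) is a simplifying assumption rather than a Triton requirement.

```python
def boundary_tensors(steps):
    """Split ensemble tensors into likely ensemble inputs and outputs."""
    consumed, produced = set(), set()
    for step in steps:
        consumed.update(step["input_map"].values())
        produced.update(step["output_map"].values())
    # Consumed but never produced: must come from the request.
    # Produced but never consumed: candidates for ensemble outputs.
    return sorted(consumed - produced), sorted(produced - consumed)


fan_out = [
    {"model_name": "feature_extractor",
     "input_map": {"INPUT": "RAW_INPUT"},
     "output_map": {"FEATURES": "shared_features"}},
    {"model_name": "classifier_a",
     "input_map": {"INPUT": "shared_features"},
     "output_map": {"OUTPUT": "CLASS_A"}},
    {"model_name": "classifier_b",
     "input_map": {"INPUT": "shared_features"},
     "output_map": {"OUTPUT": "CLASS_B"}},
]
inputs, outputs = boundary_tensors(fan_out)
print(inputs, outputs)
```

Here shared_features is both produced and consumed, so the sketch treats it as an intermediate tensor rather than a boundary tensor.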