Implementation:Triton inference server Server Ensemble Scheduling Schema

Field	Value
Implementation Name	Ensemble_Scheduling_Schema
Implements	Principle:Triton_inference_server_Server_Ensemble_Pipeline_Design
Domains	Model_Serving, Pipeline_Architecture, MLOps
Status	Active
Last Updated	2026-02-13 17:00 GMT

Overview

This page documents the concrete protobuf configuration schema for defining multi-model ensemble DAGs in Triton. The ensemble_scheduling block within a model's config.pbtxt lists the composing models and the tensor routing between them; the execution order follows from that routing.

Description

The ensemble_scheduling schema is the core mechanism for declaring ensemble pipelines in Triton Inference Server. It defines a list of steps, where each step identifies a composing model and maps its inputs and outputs to named ensemble tensors. Triton uses these mappings to construct a DAG, resolve dependencies via topological sort, and execute composing models in dependency order, running independent steps in parallel where the graph allows.

Key behaviors:

Tensor routing: the input_map connects ensemble tensors to composing model inputs; the output_map connects composing model outputs to ensemble tensors
Version selection: setting model_version to -1 selects the latest available version of the composing model
Backpressure control: max_inflight_requests (uint32) caps how many intermediate requests may be in flight between ensemble steps, which keeps a fast upstream model from exhausting memory
Automatic dependency resolution: Triton infers execution order from the tensor connectivity, not from the position of a step in the list
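To illustrate the dependency-resolution behavior: if execution order is derived from tensor connectivity, the position of a step in the list should not matter. In the sketch below (model and tensor names are hypothetical), classifier is listed first but consumes intermediate_tensor, so it cannot run until preprocess has produced that tensor.

```pbtxt
# Hypothetical models and tensors; steps are deliberately listed out of order.
ensemble_scheduling {
  step [
    {
      # Listed first, but depends on intermediate_tensor.
      model_name: "classifier"
      model_version: -1
      input_map { key: "INPUT"  value: "intermediate_tensor" }
      output_map { key: "OUTPUT"  value: "CLASSIFICATION" }
    },
    {
      # Listed second, but produces intermediate_tensor, so it runs first.
      model_name: "preprocess"
      model_version: -1
      input_map { key: "RAW_INPUT"  value: "RAW_INPUT" }
      output_map { key: "PROCESSED"  value: "intermediate_tensor" }
    }
  ]
}
```

Listing steps in data-flow order is still the more readable convention, even when the scheduler does not need it.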

Usage

Use this schema whenever you create an ensemble model configuration. The block goes inside the config.pbtxt file of a model declared with platform: "ensemble".
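As a rough sketch of where the block sits in a complete config.pbtxt, the file might look like the following. The model name, tensor names, data types, dims, and batch size are hypothetical placeholders; only the overall layout (platform, input, output, ensemble_scheduling) is the point.

```pbtxt
# Hypothetical ensemble config; every name, type, and shape is an assumed placeholder.
name: "my_ensemble"
platform: "ensemble"
max_batch_size: 8
input [
  {
    name: "RAW_INPUT"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "CLASSIFICATION"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "RAW_INPUT"  value: "RAW_INPUT" }
      output_map { key: "PROCESSED"  value: "preprocessed" }
    },
    {
      model_name: "classifier"
      model_version: -1
      input_map { key: "INPUT"  value: "preprocessed" }
      output_map { key: "OUTPUT"  value: "CLASSIFICATION" }
    }
  ]
}
```

The ensemble-level input and output names are the tensors that steps reference as values in input_map and output_map. Any other value, such as preprocessed here, is an intermediate tensor that exists only inside the ensemble.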

Code Reference

Source Location

docs/user_guide/ensemble_models.md:L60-123 — Full ensemble configuration documentation
docs/user_guide/ensemble_models.md:L83-122 — ensemble_scheduling block specifics
docs/user_guide/ensemble_models.md:L186-191 — Mapping semantics
docs/user_guide/ensemble_models.md:L206-225 — max_inflight_requests

Signature

ensemble_scheduling {
  step [
    {
      model_name: "<composing_model>"
      model_version: -1
      input_map {
        key: "<model_input_name>"
        value: "<ensemble_tensor_name>"
      }
      output_map {
        key: "<model_output_name>"
        value: "<ensemble_tensor_name>"
      }
    }
  ]
}

Import

No import required. This is a protobuf text format configuration block placed directly inside a config.pbtxt file.

Key Parameters

Parameter	Type	Description
model_name	string	Name of the composing model (must exist in model repository)
model_version	int64	Version of the composing model to use; `-1` selects the latest version
input_map	map<string, string>	Maps composing model input name (key) to ensemble tensor name (value)
output_map	map<string, string>	Maps composing model output name (key) to ensemble tensor name (value)
max_inflight_requests	uint32	Maximum number of intermediate requests allowed in flight between ensemble steps (backpressure control)
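The signature above does not show where max_inflight_requests goes. A minimal sketch, assuming it is a scalar field set directly inside ensemble_scheduling alongside step; the placement, the value 16, and the model and tensor names are all assumptions, so check the referenced section of ensemble_models.md before relying on it.

```pbtxt
# Assumed placement of max_inflight_requests; names and the limit are hypothetical.
ensemble_scheduling {
  max_inflight_requests: 16
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "RAW_INPUT"  value: "RAW_INPUT" }
      output_map { key: "PROCESSED"  value: "intermediate_tensor" }
    },
    {
      model_name: "classifier"
      model_version: -1
      input_map { key: "INPUT"  value: "intermediate_tensor" }
      output_map { key: "OUTPUT"  value: "CLASSIFICATION" }
    }
  ]
}
```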

I/O Contract

Inputs

Input	Type	Description
Composing model names	string	Names of models that exist in the model repository
Tensor names and shapes	string, int[]	Input/output tensor names and their dimensions for each composing model
Topology design	DAG specification	Desired routing pattern (simple, sequential, or fan-out)

Outputs

Output	Type	Description
ensemble_scheduling block	protobuf text	Complete `ensemble_scheduling` configuration block for inclusion in `config.pbtxt`

Usage Examples

Simple two-step ensemble (preprocessing → inference):

ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map {
        key: "RAW_INPUT"
        value: "RAW_INPUT"
      }
      output_map {
        key: "PROCESSED"
        value: "intermediate_tensor"
      }
    },
    {
      model_name: "classifier"
      model_version: -1
      input_map {
        key: "INPUT"
        value: "intermediate_tensor"
      }
      output_map {
        key: "OUTPUT"
        value: "CLASSIFICATION"
      }
    }
  ]
}

Fan-out pattern (one input to two parallel models):

ensemble_scheduling {
  step [
    {
      model_name: "feature_extractor"
      model_version: -1
      input_map { key: "INPUT"  value: "RAW_INPUT" }
      output_map { key: "FEATURES"  value: "shared_features" }
    },
    {
      model_name: "classifier_a"
      model_version: -1
      input_map { key: "INPUT"  value: "shared_features" }
      output_map { key: "OUTPUT"  value: "CLASS_A" }
    },
    {
      model_name: "classifier_b"
      model_version: -1
      input_map { key: "INPUT"  value: "shared_features" }
      output_map { key: "OUTPUT"  value: "CLASS_B" }
    }
  ]
}
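The complementary fan-in pattern can be sketched the same way. Because input_map is a map field, a step with several inputs repeats the input_map entry once per input. The combiner model and its tensor names are hypothetical and extend the fan-out example above.

```pbtxt
# Hypothetical fan-in: "combiner" consumes the outputs of both classifiers.
ensemble_scheduling {
  step [
    {
      model_name: "feature_extractor"
      model_version: -1
      input_map { key: "INPUT"  value: "RAW_INPUT" }
      output_map { key: "FEATURES"  value: "shared_features" }
    },
    {
      model_name: "classifier_a"
      model_version: -1
      input_map { key: "INPUT"  value: "shared_features" }
      output_map { key: "OUTPUT"  value: "CLASS_A" }
    },
    {
      model_name: "classifier_b"
      model_version: -1
      input_map { key: "INPUT"  value: "shared_features" }
      output_map { key: "OUTPUT"  value: "CLASS_B" }
    },
    {
      # One input_map entry per composing-model input.
      model_name: "combiner"
      model_version: -1
      input_map { key: "SCORES_A"  value: "CLASS_A" }
      input_map { key: "SCORES_B"  value: "CLASS_B" }
      output_map { key: "OUTPUT"  value: "FINAL_CLASS" }
    }
  ]
}
```

In this variant CLASS_A and CLASS_B become intermediate tensors, and FINAL_CLASS would be the tensor declared as the ensemble's output.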

Related Pages

implements::Principle:Triton_inference_server_Server_Ensemble_Pipeline_Design
