Implementation:Triton inference server Server Ensemble Scheduling Schema

From Leeroopedia
Revision as of 13:57, 16 February 2026 by Admin (Auto-imported from implementations/Triton_inference_server_Server_Ensemble_Scheduling_Schema.md)
Field | Value
Implementation Name | Ensemble_Scheduling_Schema
Implements | Principle:Triton_inference_server_Server_Ensemble_Pipeline_Design
Domains | Model_Serving, Pipeline_Architecture, MLOps
Status | Active
Last Updated | 2026-02-13 17:00 GMT

Overview

Concrete protobuf configuration schema for defining multi-model ensemble DAGs in Triton. The ensemble_scheduling block within a model's config.pbtxt lists the composing models and the tensor routing between them; Triton derives the execution order from that routing.

Description

The ensemble_scheduling schema is the core mechanism for declaring ensemble pipelines in Triton Inference Server. It defines a list of steps, where each step identifies a composing model and maps its inputs and outputs to named ensemble tensors. Triton uses these mappings to construct a DAG, resolve dependencies via topological sort, and execute composing models in dependency order, with steps that do not depend on each other free to run in parallel.

Key behaviors:

  • Tensor routing: the input_map connects ensemble tensors to composing model inputs; the output_map connects composing model outputs to ensemble tensors
  • Version selection: model_version: -1 selects the latest available version of the composing model
  • Backpressure control: max_inflight_requests (uint32) limits concurrent ensemble requests to prevent resource exhaustion
  • Automatic dependency resolution: Triton infers execution order from the tensor connectivity
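
The dependency resolution described above can be illustrated with a small Python sketch. It is not Triton's scheduler code; it only shows how an execution order can be inferred from tensor names alone. The function name execution_order and the dict layout for steps are assumptions made for this example, and it assumes each composing model appears in at most one step.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(steps, ensemble_inputs):
    """Return one valid step order inferred purely from tensor names."""
    # Map every ensemble tensor to the step that produces it via output_map.
    producer = {}
    for step in steps:
        for tensor in step["output_map"].values():
            producer[tensor] = step["model_name"]

    # A step depends on the producers of the tensors named in its input_map.
    graph = {}
    for step in steps:
        deps = set()
        for tensor in step["input_map"].values():
            if tensor in ensemble_inputs:
                continue  # supplied by the client request, no upstream step
            if tensor not in producer:
                raise ValueError(f"tensor {tensor!r} has no producer")
            deps.add(producer[tensor])
        graph[step["model_name"]] = deps

    # static_order() raises graphlib.CycleError if the routing is not a DAG.
    return list(TopologicalSorter(graph).static_order())


steps = [
    # Deliberately listed out of order: the result follows tensors, not position.
    {"model_name": "classifier",
     "input_map": {"INPUT": "intermediate_tensor"},
     "output_map": {"OUTPUT": "CLASSIFICATION"}},
    {"model_name": "preprocess",
     "input_map": {"RAW_INPUT": "RAW_INPUT"},
     "output_map": {"PROCESSED": "intermediate_tensor"}},
]
order = execution_order(steps, ensemble_inputs={"RAW_INPUT"})
print(order)  # ['preprocess', 'classifier']
```

The same check also surfaces two common configuration mistakes: an input_map value that no step produces, and a cycle between steps.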

Usage

Use this schema whenever you create an ensemble model configuration. Place it inside the config.pbtxt file of a model declared with platform: "ensemble".

Code Reference

Source Location

  • docs/user_guide/ensemble_models.md:L60-123 — Full ensemble configuration documentation
  • docs/user_guide/ensemble_models.md:L83-122 — ensemble_scheduling block specifics
  • docs/user_guide/ensemble_models.md:L186-191 — Mapping semantics
  • docs/user_guide/ensemble_models.md:L206-225 — max_inflight_requests

Signature

ensemble_scheduling {
  step [
    {
      model_name: "<composing_model>"
      model_version: -1
      input_map {
        key: "<model_input_name>"
        value: "<ensemble_tensor_name>"
      }
      output_map {
        key: "<model_output_name>"
        value: "<ensemble_tensor_name>"
      }
    }
  ]
}

Import

No import required. This is a protobuf text format configuration block placed directly inside a config.pbtxt file.

Key Parameters

Parameter | Type | Description
model_name | string | Name of the composing model (must exist in the model repository)
model_version | int64 | Version of the composing model to use; -1 selects the latest version
input_map | map<string, string> | Maps composing model input name (key) to ensemble tensor name (value)
output_map | map<string, string> | Maps composing model output name (key) to ensemble tensor name (value)
max_inflight_requests | uint32 | Maximum number of concurrent ensemble inference requests (backpressure control)
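
As a conceptual analogy for the backpressure behavior of max_inflight_requests, the following Python sketch caps the number of requests that are in flight at once with a semaphore. It does not reflect how Triton implements the limit; run_with_limit, handle, and the sleep that stands in for pipeline work are all hypothetical.

```python
import asyncio


async def run_with_limit(num_requests, max_inflight):
    """Process num_requests while never exceeding max_inflight at once."""
    gate = asyncio.Semaphore(max_inflight)
    inflight = 0
    peak = 0

    async def handle(request_id):
        nonlocal inflight, peak
        async with gate:  # callers wait here once the limit is reached
            inflight += 1
            peak = max(peak, inflight)
            await asyncio.sleep(0.01)  # stand-in for running the pipeline
            inflight -= 1

    await asyncio.gather(*(handle(i) for i in range(num_requests)))
    return peak


peak = asyncio.run(run_with_limit(num_requests=10, max_inflight=3))
print(peak)  # 3
```

Excess requests are delayed rather than rejected in this sketch, which is the essence of backpressure: the producer slows down to match what the pipeline can hold.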

I/O Contract

Inputs

Input | Type | Description
Composing model names | string | Names of models that exist in the model repository
Tensor names and shapes | string, int[] | Input/output tensor names and their dimensions for each composing model
Topology design | DAG specification | Desired routing pattern (simple, sequence, or fan)

Outputs

Output | Type | Description
ensemble_scheduling block | protobuf text | Complete ensemble_scheduling configuration block for inclusion in config.pbtxt
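
To make the contract concrete, here is a minimal Python sketch that turns the inputs (model names and tensor mappings) into the output (an ensemble_scheduling text block). The helpers render_map and render_ensemble_scheduling are hypothetical names for this example and are not part of Triton or its tooling; the sketch does no validation or string escaping.

```python
def render_map(field, mapping, indent):
    """Render one input_map/output_map entry per key/value pair."""
    pad = " " * indent
    lines = []
    for key, value in mapping.items():
        lines += [
            f"{pad}{field} {{",
            f'{pad}  key: "{key}"',
            f'{pad}  value: "{value}"',
            f"{pad}}}",
        ]
    return lines


def render_ensemble_scheduling(steps):
    """Build the ensemble_scheduling block as protobuf text."""
    lines = ["ensemble_scheduling {", "  step ["]
    for i, step in enumerate(steps):
        name = step["model_name"]
        version = step.get("model_version", -1)  # -1 selects the latest version
        lines.append("    {")
        lines.append(f'      model_name: "{name}"')
        lines.append(f"      model_version: {version}")
        lines += render_map("input_map", step["input_map"], 6)
        lines += render_map("output_map", step["output_map"], 6)
        # Separate steps with a comma, matching the layout used on this page.
        lines.append("    }" + ("," if i < len(steps) - 1 else ""))
    lines += ["  ]", "}"]
    return "\n".join(lines)


block = render_ensemble_scheduling([
    {"model_name": "preprocess",
     "input_map": {"RAW_INPUT": "RAW_INPUT"},
     "output_map": {"PROCESSED": "intermediate_tensor"}},
    {"model_name": "classifier",
     "input_map": {"INPUT": "intermediate_tensor"},
     "output_map": {"OUTPUT": "CLASSIFICATION"}},
])
print(block)
```

Generating the block from a data structure keeps tensor names in one place, which helps when the same intermediate tensor name must appear in one step's output_map and another step's input_map.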

Usage Examples

Simple two-step ensemble (preprocessing → inference):

ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map {
        key: "RAW_INPUT"
        value: "RAW_INPUT"
      }
      output_map {
        key: "PROCESSED"
        value: "intermediate_tensor"
      }
    },
    {
      model_name: "classifier"
      model_version: -1
      input_map {
        key: "INPUT"
        value: "intermediate_tensor"
      }
      output_map {
        key: "OUTPUT"
        value: "CLASSIFICATION"
      }
    }
  ]
}

Fan-out pattern (one input to two parallel models):

ensemble_scheduling {
  step [
    {
      model_name: "feature_extractor"
      model_version: -1
      input_map { key: "INPUT"  value: "RAW_INPUT" }
      output_map { key: "FEATURES"  value: "shared_features" }
    },
    {
      model_name: "classifier_a"
      model_version: -1
      input_map { key: "INPUT"  value: "shared_features" }
      output_map { key: "OUTPUT"  value: "CLASS_A" }
    },
    {
      model_name: "classifier_b"
      model_version: -1
      input_map { key: "INPUT"  value: "shared_features" }
      output_map { key: "OUTPUT"  value: "CLASS_B" }
    }
  ]
}
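
The parallelism implied by the fan-out configuration can be sketched in Python by grouping steps into stages, where every step in a stage has all of its dependencies satisfied by earlier stages. This is an illustration only and not how Triton schedules work internally; parallel_stages and the hand-written deps dictionary (derived from the tensor names above) are assumptions for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parallel_stages(dependencies):
    """Group steps into stages; steps within a stage do not depend on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage finished to unlock the next
    return stages


# Each step maps to the set of steps whose outputs it consumes.
deps = {
    "feature_extractor": set(),
    "classifier_a": {"feature_extractor"},  # reads shared_features
    "classifier_b": {"feature_extractor"},  # reads shared_features
}
stages = parallel_stages(deps)
print(stages)  # [['feature_extractor'], ['classifier_a', 'classifier_b']]
```

Both classifiers land in the same stage because each needs only shared_features, so neither has to wait for the other.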
