Principle:Triton inference server Server Component Model Preparation
| Field | Value |
|---|---|
| Principle Name | Component_Model_Preparation |
| Knowledge Sources | Triton Server (https://github.com/triton-inference-server/server); BLS documentation (https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/bls.html) |
| Domains | Model_Serving, Python_Backend, Pipeline_Architecture |
| Status | Active |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Component model preparation is the process of preparing individual models for use within an ensemble pipeline, including Python backend models that use Business Logic Scripting (BLS). Each component model must expose correctly named input/output tensors and be independently deployable before it is included in an ensemble.
Description
Component models in an ensemble can be standard framework models (ONNX, TensorRT, PyTorch) or Python backend models. A Python backend model implements custom pre/post-processing logic through the `TritonPythonModel` interface, whose `execute()` method processes inference requests. With Business Logic Scripting (BLS), `execute()` can also invoke other models loaded in the same server via `pb_utils.InferenceRequest`.
Each component model requires:
- Its own `config.pbtxt` with named input/output tensors
- Versioned model files in the model repository (e.g., `model_name/1/model.onnx` or `model_name/1/model.py`)
- Tensor names, data types, and shapes that match what the ensemble configuration expects
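The repository layout described above can be sketched with a short script. This is a minimal illustration, not a Triton tool: the model name `preprocess`, the tensor names, data types, and dims, and the helpers `create_component` and `list_layout` are all hypothetical and chosen only for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical component configuration: the model name, tensor names,
# data types, and dims are illustrative assumptions.
CONFIG_PBTXT = """\
name: "preprocess"
backend: "python"
max_batch_size: 8
input [
  {
    name: "RAW_IMAGE"
    data_type: TYPE_UINT8
    dims: [ -1 ]
  }
]
output [
  {
    name: "PREPROCESSED"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
"""


def create_component(repo_root, name, config_text, version=1, model_file="model.py"):
    """Create <repo_root>/<name>/config.pbtxt and <repo_root>/<name>/<version>/<model_file>."""
    model_dir = Path(repo_root) / name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(config_text)
    # Empty placeholder; a real model.py or model.onnx would be copied here.
    (version_dir / model_file).touch()
    return model_dir


def list_layout(model_dir):
    """Return the paths under model_dir, relative to the repository root, sorted."""
    root = Path(model_dir).parent
    return sorted(p.relative_to(root).as_posix() for p in Path(model_dir).rglob("*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        created = create_component(tmp, "preprocess", CONFIG_PBTXT)
        for entry in list_layout(created):
            print(entry)
```

Keeping `config.pbtxt` at the model level and the model file inside a numbered version directory is what lets the component be loaded on its own before any ensemble refers to it.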
For Python backend models using BLS:
- The model file is `model.py`, implementing the `TritonPythonModel` class
- The `execute()` method receives a list of `InferenceRequest` objects and returns a list of `InferenceResponse` objects
- BLS enables in-process model-to-model inference via `pb_utils.InferenceRequest` without network overhead
- Both synchronous (`exec()`) and asynchronous (`async_exec()`) BLS invocation are supported
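A synchronous BLS call inside `execute()` might look like the sketch below. The downstream model name `classifier` and the tensor names `INPUT0`, `FEATURES`, `SCORES`, and `OUTPUT0` are assumptions made for illustration; they would have to match the actual `config.pbtxt` files. The `try/except` around the import is also an assumption, added only so the file can be imported outside Triton (for example in unit tests), since `triton_python_backend_utils` is available only inside the Python backend.

```python
try:
    # Provided by Triton's Python backend at runtime.
    import triton_python_backend_utils as pb_utils
except ImportError:
    # Outside Triton (e.g. in unit tests) a stand-in can be assigned instead.
    pb_utils = None

# Hypothetical name of another model loaded in the same Triton instance.
DOWNSTREAM_MODEL = "classifier"


class TritonPythonModel:
    def execute(self, requests):
        """Forward each request to DOWNSTREAM_MODEL and relay its output."""
        responses = []
        for request in requests:
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")

            # Build an in-process request; tensor names are illustrative.
            bls_request = pb_utils.InferenceRequest(
                model_name=DOWNSTREAM_MODEL,
                requested_output_names=["SCORES"],
                inputs=[pb_utils.Tensor("FEATURES", in_tensor.as_numpy())],
            )
            bls_response = bls_request.exec()  # synchronous BLS call

            if bls_response.has_error():
                # Report the downstream failure for this request only.
                error = pb_utils.TritonError(bls_response.error().message())
                responses.append(
                    pb_utils.InferenceResponse(output_tensors=[], error=error)
                )
                continue

            scores = pb_utils.get_output_tensor_by_name(bls_response, "SCORES")
            out_tensor = pb_utils.Tensor("OUTPUT0", scores.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))

        # One response per request, in the same order.
        return responses
```

The sketch returns an error response per failed request rather than raising, so one failing downstream call does not discard the rest of the batch; whether that is the right policy depends on the pipeline.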
Usage
Component model preparation is required before creating any ensemble configuration. It applies when:
- Building preprocessing or postprocessing stages as Python backend models
- Wrapping custom business logic around standard inference models
- Preparing framework models (ONNX, TensorRT, TensorFlow, PyTorch) with correct tensor specifications
- Implementing BLS models that orchestrate calls to other models within the same Triton instance
Theoretical Basis
The component model preparation principle is based on interface contract design:
- Each component exposes named input/output tensors that serve as the contract between the model and the ensemble router
- The ensemble configuration connects these contracts through tensor mappings
- BLS extends this contract by providing in-process model-to-model inference via `pb_utils`, enabling custom logic without leaving the server process
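The contract idea can be made concrete with a small check that compares an ensemble's tensor mappings against the component configurations. This is a hedged sketch, not part of Triton: `check_ensemble` is a hypothetical helper, the configurations are plain dictionaries rather than parsed `config.pbtxt` files, and it assumes steps are listed with producers before consumers. In each map the key is the component model's tensor name and the value is the ensemble-level tensor name.

```python
def check_ensemble(steps, configs):
    """Return a list of contract problems; an empty list means the names and types line up."""
    problems = []
    produced = {}  # ensemble tensor name -> data type of the step that produced it
    for step in steps:
        model = step["model_name"]
        inputs = {t["name"]: t for t in configs[model]["input"]}
        outputs = {t["name"]: t for t in configs[model]["output"]}

        for model_tensor, ensemble_tensor in step["input_map"].items():
            if model_tensor not in inputs:
                problems.append(f"{model}: no input named {model_tensor}")
                continue
            expected = inputs[model_tensor]["data_type"]
            actual = produced.get(ensemble_tensor)
            if actual is not None and actual != expected:
                problems.append(
                    f"{model}: {model_tensor} expects {expected}, "
                    f"but {ensemble_tensor} carries {actual}"
                )

        for model_tensor, ensemble_tensor in step["output_map"].items():
            if model_tensor not in outputs:
                problems.append(f"{model}: no output named {model_tensor}")
            else:
                produced[ensemble_tensor] = outputs[model_tensor]["data_type"]
    return problems


# Illustrative component contracts and ensemble steps (all names are assumptions).
EXAMPLE_CONFIGS = {
    "preprocess": {
        "input": [{"name": "RAW_IMAGE", "data_type": "TYPE_UINT8"}],
        "output": [{"name": "PREPROCESSED", "data_type": "TYPE_FP32"}],
    },
    "classifier": {
        "input": [{"name": "FEATURES", "data_type": "TYPE_FP32"}],
        "output": [{"name": "SCORES", "data_type": "TYPE_FP32"}],
    },
}
EXAMPLE_STEPS = [
    {
        "model_name": "preprocess",
        "input_map": {"RAW_IMAGE": "image"},
        "output_map": {"PREPROCESSED": "features"},
    },
    {
        "model_name": "classifier",
        "input_map": {"FEATURES": "features"},
        "output_map": {"SCORES": "scores"},
    },
]

if __name__ == "__main__":
    print(check_ensemble(EXAMPLE_STEPS, EXAMPLE_CONFIGS))
```

The check covers only names and data types; shapes would need a similar comparison that accounts for variable dimensions.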
The `TritonPythonModel` interface defines three lifecycle methods:
- `initialize(self, args)`: Called once at model load; receives the model configuration
- `execute(self, requests)`: Called for each batch of inference requests; returns responses
- `finalize(self)`: Called once at model unload; performs cleanup
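The three methods can be sketched as a pass-through model. This is a minimal illustration under stated assumptions: the model simply echoes its first input to its first output, the `request_count` attribute is hypothetical, and the guarded import exists only so the file can be loaded outside Triton. It relies on `args["model_config"]` being the model configuration serialized as a JSON string.

```python
import json

try:
    # Provided by Triton's Python backend at runtime.
    import triton_python_backend_utils as pb_utils
except ImportError:
    # Outside Triton (e.g. in unit tests) a stand-in can be assigned instead.
    pb_utils = None


class TritonPythonModel:
    def initialize(self, args):
        """Called once at model load: read tensor names from the configuration."""
        model_config = json.loads(args["model_config"])
        self.input_name = model_config["input"][0]["name"]
        self.output_name = model_config["output"][0]["name"]
        self.request_count = 0  # hypothetical bookkeeping for this sketch

    def execute(self, requests):
        """Called per batch of requests: echo the input tensor as the output."""
        responses = []
        for request in requests:
            tensor = pb_utils.get_input_tensor_by_name(request, self.input_name)
            out = pb_utils.Tensor(self.output_name, tensor.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
            self.request_count += 1
        return responses

    def finalize(self):
        """Called once at model unload: release anything acquired in initialize."""
        print(f"Unloading after {self.request_count} request(s)")
```

Reading the tensor names from the configuration in `initialize()` keeps `model.py` and `config.pbtxt` from drifting apart, which matters when the same names are also referenced by an ensemble.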
Source: docs/user_guide/bls.md:L48-153, docs/user_guide/model_configuration.md:L39-75