

Principle:Triton inference server Server Component Model Preparation

From Leeroopedia
Revision as of 18:12, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Triton_inference_server_Server_Component_Model_Preparation.md)
Field | Value
Principle Name | Component_Model_Preparation
Knowledge Sources | Triton Server (https://github.com/triton-inference-server/server); BLS documentation (https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/user_guide/bls.html)
Domains | Model_Serving, Python_Backend, Pipeline_Architecture
Status | Active
Last Updated | 2026-02-13 17:00 GMT

Overview

Component model preparation is the process of getting individual models ready for use within an ensemble pipeline, including Python backend models that use Business Logic Scripting (BLS). Each component model must expose correctly named input/output tensors and be independently deployable before it is included in an ensemble.

Description

Component models in an ensemble can be standard framework models (ONNX, TensorRT, TensorFlow, PyTorch) or Python backend models. A Python backend model implements the TritonPythonModel interface, whose execute() method processes inference requests and is the natural place for custom pre/post-processing logic. With Business Logic Scripting (BLS), execute() can also invoke other models loaded in the same Triton instance via pb_utils.InferenceRequest.

Each component model requires:

  • Its own config.pbtxt with named input/output tensors
  • Versioned model files in the model repository (e.g., model_name/1/model.onnx or model_name/1/model.py)
  • Tensor names, data types, and shapes that match what the ensemble configuration expects
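
The layout requirements above can be sketched with a short script that builds a one-model repository in a temporary directory. This is only an illustration: the model name "preprocess", the tensor names RAW_INPUT and PREPROCESSED, the shapes, and the create_component helper are all hypothetical and are not taken from the Triton sources.

```python
import tempfile
from pathlib import Path

# Hypothetical model and tensor names, chosen only for illustration.
CONFIG_PBTXT = """\
name: "preprocess"
backend: "python"
max_batch_size: 8
input [
  {
    name: "RAW_INPUT"
    data_type: TYPE_FP32
    dims: [ 16 ]
  }
]
output [
  {
    name: "PREPROCESSED"
    data_type: TYPE_FP32
    dims: [ 16 ]
  }
]
"""


def create_component(repo_root: Path, model_name: str,
                     config_text: str, version: int = 1) -> Path:
    """Create <repo>/<model_name>/config.pbtxt plus a numbered version directory."""
    model_dir = repo_root / model_name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(config_text)
    return version_dir


repo = Path(tempfile.mkdtemp(prefix="model_repository_"))
version_dir = create_component(repo, "preprocess", CONFIG_PBTXT)

# The real model file (model.py, model.onnx, ...) would be copied here;
# an empty placeholder stands in for it in this sketch.
(version_dir / "model.py").touch()

layout = sorted(p.relative_to(repo).as_posix() for p in repo.rglob("*"))
print(layout)
```

The printed listing should show config.pbtxt next to the version directory and the model file inside it, which is the shape each component needs before an ensemble can reference it.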

For Python backend models using BLS:

  • The model file is model.py implementing the TritonPythonModel class
  • The execute() method receives a list of InferenceRequest objects and returns a list of InferenceResponse objects
  • BLS enables in-process model-to-model inference via pb_utils.InferenceRequest without network overhead
  • Both synchronous (exec()) and asynchronous (async_exec()) BLS invocations are supported
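
The points above might look roughly like the following model.py. It is a minimal sketch, not a reference implementation: the downstream model name "classifier" and the tensor names RAW_INPUT, INPUT0, OUTPUT0, and SCORES are hypothetical, and the sketch assumes all tensors live in CPU memory so that as_numpy() is valid. The guarded import is there only so the file can be imported outside Triton; inside the Python backend, triton_python_backend_utils is always available.

```python
try:
    # Provided by the Triton Python backend at runtime.
    import triton_python_backend_utils as pb_utils
except ImportError:
    # Allows importing this file outside Triton (e.g. for linting).
    pb_utils = None


class TritonPythonModel:
    """BLS sketch: forwards each request to a downstream model.

    "classifier" and all tensor names below are hypothetical.
    """

    def execute(self, requests):
        responses = []
        for request in requests:
            raw = pb_utils.get_input_tensor_by_name(request, "RAW_INPUT")

            # Re-wrap the data under the name the downstream model's
            # config.pbtxt declares for its input.
            downstream_input = pb_utils.Tensor("INPUT0", raw.as_numpy())

            infer_request = pb_utils.InferenceRequest(
                model_name="classifier",
                requested_output_names=["OUTPUT0"],
                inputs=[downstream_input],
            )
            # Synchronous BLS call; blocks until the downstream model responds.
            infer_response = infer_request.exec()

            if infer_response.has_error():
                # Report the failure for this request only.
                error = pb_utils.TritonError(infer_response.error().message())
                responses.append(
                    pb_utils.InferenceResponse(output_tensors=[], error=error)
                )
                continue

            scores = pb_utils.get_output_tensor_by_name(infer_response, "OUTPUT0")
            out = pb_utils.Tensor("SCORES", scores.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))

        # One response per request, in the same order.
        return responses
```

For the asynchronous form, execute would be declared with async def and the call would become await infer_request.async_exec(); the rest of the flow stays the same.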

Usage

Component model preparation is required before creating any ensemble configuration. It applies when:

  • Building preprocessing or postprocessing stages as Python backend models
  • Wrapping custom business logic around standard inference models
  • Preparing framework models (ONNX, TensorRT, TensorFlow, PyTorch) with correct tensor specifications
  • Implementing BLS models that orchestrate calls to other models within the same Triton instance

Theoretical Basis

The component model preparation principle is based on interface contract design:

  • Each component exposes named input/output tensors that serve as the contract between the model and the ensemble router
  • The ensemble configuration connects these contracts through tensor mappings
  • BLS extends this contract by letting a Python backend model issue inference requests to other models in the same Triton instance via pb_utils, enabling custom logic without a network round trip
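
To make the contract idea concrete, the check below compares what a component declares against what an ensemble step expects. It is a hypothetical helper written for this page, not the validation Triton itself performs, and it assumes exact matching of data types and dims (variable dimensions such as -1 are not treated specially). All tensor names are illustrative.

```python
from typing import Dict, List, Tuple

# A tensor spec is (data_type, dims), using the spelling found in config.pbtxt.
TensorSpec = Tuple[str, Tuple[int, ...]]


def check_contract(declared: Dict[str, TensorSpec],
                   expected: Dict[str, TensorSpec]) -> List[str]:
    """Return human-readable mismatches; an empty list means the contract holds."""
    problems = []
    for name, (dtype, dims) in expected.items():
        if name not in declared:
            problems.append(f"missing tensor {name!r}")
            continue
        got_dtype, got_dims = declared[name]
        if got_dtype != dtype:
            problems.append(f"{name}: data type {got_dtype} != {dtype}")
        if got_dims != dims:
            problems.append(f"{name}: dims {got_dims} != {dims}")
    return problems


# What the (hypothetical) component declares as outputs.
component_outputs = {"PREPROCESSED": ("TYPE_FP32", (16,))}

# A matching expectation and a mismatching one.
matching = {"PREPROCESSED": ("TYPE_FP32", (16,))}
mismatching = {
    "PREPROCESSED": ("TYPE_FP16", (16,)),
    "MASK": ("TYPE_BOOL", (16,)),
}

print(check_contract(component_outputs, matching))
# → []
print(check_contract(component_outputs, mismatching))
# → ['PREPROCESSED: data type TYPE_FP32 != TYPE_FP16', "missing tensor 'MASK'"]
```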

The TritonPythonModel interface centers on three lifecycle methods, of which only execute() is required:

  1. initialize(self, args): Called once when the model is loaded; args is a dictionary that includes the model configuration as a JSON string under the key model_config
  2. execute(self, requests): Called for each batch of inference requests; returns one response per request
  3. finalize(self): Called once when the model is unloaded; performs cleanup
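
A skeleton showing where each lifecycle method fits is given below. It is a sketch under stated assumptions: the model simply echoes its first input as its first output (a placeholder for real logic), tensors are assumed to be in CPU memory, and the guarded import exists only so the class can be exercised outside Triton.

```python
import json

try:
    # Provided by the Triton Python backend at runtime.
    import triton_python_backend_utils as pb_utils
except ImportError:
    # Allows importing this file outside Triton (e.g. for a quick local check).
    pb_utils = None


class TritonPythonModel:
    def initialize(self, args):
        # args["model_config"] holds the model configuration as a JSON string.
        self.model_config = json.loads(args["model_config"])
        self.input_names = [t["name"] for t in self.model_config.get("input", [])]
        self.output_names = [t["name"] for t in self.model_config.get("output", [])]

    def execute(self, requests):
        # Placeholder logic: echo the first declared input as the first output.
        responses = []
        for request in requests:
            in_tensor = pb_utils.get_input_tensor_by_name(
                request, self.input_names[0]
            )
            out_tensor = pb_utils.Tensor(self.output_names[0], in_tensor.as_numpy())
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        # Release anything acquired in initialize().
        self.model_config = None
```

Reading tensor names from the parsed configuration, rather than hard-coding them, keeps model.py in step with config.pbtxt. Outside Triton, initialize() can be exercised with a hand-built dictionary such as {"model_config": json.dumps({...})}; execute() needs the real backend.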

Source: docs/user_guide/bls.md:L48-153, docs/user_guide/model_configuration.md:L39-75
