

Implementation:Triton inference server Server TritonPythonModel BLS

From Leeroopedia
Field | Value
Implementation Name | TritonPythonModel_BLS
Implements | Principle:Triton_inference_server_Server_Component_Model_Preparation
Domains | Model_Serving, Python_Backend, Pipeline_Architecture
Status | Active
Last Updated | 2026-02-13 17:00 GMT

Overview

Concrete Python backend interface for implementing custom model logic with Business Logic Scripting (BLS) in Triton. BLS models use the TritonPythonModel class and the triton_python_backend_utils module to process requests and optionally invoke other models in-process.

Description

The TritonPythonModel class is the required interface for all Python backend models in Triton. When used with BLS, the execute() method can create pb_utils.InferenceRequest objects to invoke other models deployed on the same Triton instance, enabling custom orchestration logic without network overhead.

Key capabilities:

  • Synchronous BLS — Call inference_request.exec() to invoke another model and block until the result is available
  • Asynchronous BLS — Define async def execute() and call await inference_request.async_exec() for non-blocking invocation
  • Tensor manipulation — Use pb_utils.Tensor to create tensors and pb_utils.get_input_tensor_by_name() / pb_utils.get_output_tensor_by_name() to extract tensors from requests/responses
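  • Error handling — Check each BLS response with inference_response.has_error() and propagate failures to the client, either by raising pb_utils.TritonModelException or by returning a pb_utils.InferenceResponse whose error argument is a pb_utils.TritonError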

Usage

This implementation is used when:

  • Creating preprocessing or postprocessing models in Python for an ensemble
  • Implementing custom business logic that orchestrates multiple model calls
  • Building models that require data transformation between inference steps
  • Wrapping external service calls or database lookups within an inference pipeline

Code Reference

Source Location

  • docs/user_guide/bls.md:L48-97 — Synchronous BLS interface
  • docs/user_guide/bls.md:L107-153 — Asynchronous BLS interface

Signature

import triton_python_backend_utils as pb_utils
import numpy as np

class TritonPythonModel:
    def initialize(self, args):
        """Called once at model load. args contains model config."""
        pass

    def execute(self, requests):
        """Process each inference request. Can invoke other models via BLS."""
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT")

            # BLS: invoke another model
            inference_request = pb_utils.InferenceRequest(
                model_name="other_model",
                requested_output_names=["OUTPUT"],
                inputs=[pb_utils.Tensor("INPUT", input_tensor.as_numpy())]
            )
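            inference_response = inference_request.exec()

            # Fail fast if the downstream model returned an error
            if inference_response.has_error():
                raise pb_utils.TritonModelException(
                    inference_response.error().message())

            output = pb_utils.get_output_tensor_by_name(inference_response, "OUTPUT")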
            responses.append(pb_utils.InferenceResponse(output_tensors=[output]))
        return responses

    def finalize(self):
        """Called once at model unload."""
        pass
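Triton passes initialize() an args dictionary whose "model_config" entry is the model configuration serialized as a JSON string. The sketch below reads an output's data type from that string. It is meant to run outside Triton, so the hand-written TRITON_TO_NUMPY table and the output_dtype helper are hypothetical stand-ins for what a real model would do with json.loads(args["model_config"]), pb_utils.get_output_config_by_name(), and pb_utils.triton_string_to_numpy().

```python
import json

import numpy as np

# Hypothetical subset of Triton's TYPE_* names mapped to NumPy dtypes.
# Inside a real model, pb_utils.triton_string_to_numpy() does this lookup.
TRITON_TO_NUMPY = {
    "TYPE_UINT8": np.uint8,
    "TYPE_INT32": np.int32,
    "TYPE_INT64": np.int64,
    "TYPE_FP16": np.float16,
    "TYPE_FP32": np.float32,
}


def output_dtype(model_config_json: str, output_name: str):
    """Return the NumPy dtype declared for `output_name`.

    `model_config_json` plays the role of args["model_config"], the JSON
    string that Triton hands to initialize().
    """
    model_config = json.loads(model_config_json)
    for output in model_config.get("output", []):
        if output["name"] == output_name:
            return TRITON_TO_NUMPY[output["data_type"]]
    raise KeyError(f"output {output_name!r} not found in model config")


# Hypothetical args dict, shaped like the preprocess config shown later.
args = {
    "model_config": json.dumps({
        "name": "preprocess",
        "backend": "python",
        "output": [{"name": "PROCESSED_IMAGE", "data_type": "TYPE_FP32"}],
    })
}

if __name__ == "__main__":
    print(output_dtype(args["model_config"], "PROCESSED_IMAGE"))
```

Resolving dtypes once in initialize() keeps execute() free of repeated config parsing.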

Import

import triton_python_backend_utils as pb_utils
import numpy as np

Key Parameters

BLS InferenceRequest parameters:

Parameter | Type | Description
model_name | str (required) | Name of the target model to invoke
requested_output_names | list[str] (required) | Output tensor names to request from the target model
inputs | list[pb_utils.Tensor] (required) | Input tensors to send to the target model
timeout | int (optional) | Timeout for the BLS request, in microseconds
model_version | int (optional) | Specific version of the target model to invoke

I/O Contract

Inputs

Input | Type | Description
Model file | model.py | Python file implementing the TritonPythonModel class, placed at model_name/1/model.py in the model repository (1 is the model version directory)
Model config | config.pbtxt | Configuration file specifying the backend, inputs, outputs, and instance settings, placed at model_name/config.pbtxt
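In the model repository, config.pbtxt sits in the model directory and model.py sits in a numeric version subdirectory. The following sketch creates that layout. The scaffold_python_model helper and its placeholder file contents are hypothetical, and the generated config still needs its inputs, outputs, and instance settings.

```python
import tempfile
from pathlib import Path

# Placeholder model.py body; replace with a real TritonPythonModel.
MODEL_PY = '''import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        raise NotImplementedError
'''


def scaffold_python_model(repo_root: Path, model_name: str, version: int = 1) -> Path:
    """Hypothetical helper that lays out one Python backend model:

        <repo_root>/<model_name>/config.pbtxt
        <repo_root>/<model_name>/<version>/model.py
    """
    model_dir = repo_root / model_name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    # Minimal config: the name must match the model directory name.
    # Inputs, outputs, and instance_group still have to be added.
    (model_dir / "config.pbtxt").write_text(
        f'name: "{model_name}"\nbackend: "python"\n'
    )
    (version_dir / "model.py").write_text(MODEL_PY)
    return model_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        scaffold_python_model(root, "preprocess")
        # List every created directory and file relative to the repository root
        for path in sorted(root.rglob("*")):
            print(path.relative_to(root))
```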

Outputs

Output | Type | Description
Deployed component model | Triton model | A loaded model ready to serve inference requests or participate in an ensemble

Usage Examples

Synchronous preprocessing model (plain Python backend logic with no BLS call, suitable as the preprocessing step of an ensemble):

import triton_python_backend_utils as pb_utils
import numpy as np

class TritonPythonModel:
    def initialize(self, args):
        self.mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
        self.std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

    def execute(self, requests):
        responses = []
        for request in requests:
            raw_input = pb_utils.get_input_tensor_by_name(request, "RAW_IMAGE")
            image = raw_input.as_numpy().astype(np.float32) / 255.0
            normalized = (image - self.mean) / self.std

            output_tensor = pb_utils.Tensor("PROCESSED_IMAGE", normalized)
            responses.append(pb_utils.InferenceResponse(output_tensors=[output_tensor]))
        return responses

    def finalize(self):
        pass
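Because the preprocessing arithmetic is plain NumPy, it can be checked outside Triton. The sketch below repeats it in a hypothetical normalize() function and runs it on a batched input of shape [N, 224, 224, 3], the shape this model receives when max_batch_size is greater than 0.

```python
import numpy as np

# Per-channel statistics copied from initialize() above (float32 on purpose).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize(raw: np.ndarray) -> np.ndarray:
    """Same arithmetic as execute() above, without the Triton wrappers.

    `raw` is uint8 with shape [N, 224, 224, 3]: batch first, channels last.
    MEAN and STD have shape (3,), so they broadcast along the last axis.
    """
    image = raw.astype(np.float32) / 255.0
    return (image - MEAN) / STD


if __name__ == "__main__":
    batch = np.full((2, 224, 224, 3), 128, dtype=np.uint8)
    out = normalize(batch)
    print(out.shape, out.dtype)  # (2, 224, 224, 3) float32
```

Keeping MEAN and STD as float32 keeps the result float32, matching the TYPE_FP32 output declared in the config; float64 statistics would promote the output to float64.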

Asynchronous BLS model invoking another model:

import triton_python_backend_utils as pb_utils
import numpy as np

class TritonPythonModel:
    def initialize(self, args):
        pass

    async def execute(self, requests):
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT")

            inference_request = pb_utils.InferenceRequest(
                model_name="downstream_model",
                requested_output_names=["OUTPUT"],
                inputs=[pb_utils.Tensor("INPUT", input_tensor.as_numpy())]
            )
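            inference_response = await inference_request.async_exec()

            # Fail fast if the downstream model returned an error
            if inference_response.has_error():
                raise pb_utils.TritonModelException(
                    inference_response.error().message())

            output = pb_utils.get_output_tensor_by_name(inference_response, "OUTPUT")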
            responses.append(pb_utils.InferenceResponse(output_tensors=[output]))
        return responses

    def finalize(self):
        pass
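The asynchronous example awaits each downstream call inside the loop, so calls for different requests still run one after another. A common alternative is to start all calls first and await them together with asyncio.gather. The runnable sketch below contrasts the two; fake_async_exec is a hypothetical stand-in for InferenceRequest.async_exec() that only sleeps, so it runs without Triton.

```python
import asyncio
import time


async def fake_async_exec(value: int) -> int:
    """Hypothetical stand-in for InferenceRequest.async_exec().

    Sleeps to mimic downstream model latency, then returns a fake result.
    """
    await asyncio.sleep(0.1)
    return value * 2


async def execute_sequential(values):
    # Same shape as the loop above: each call finishes before the next starts.
    return [await fake_async_exec(v) for v in values]


async def execute_concurrent(values):
    # Start every call, then wait for all of them together.
    # asyncio.gather returns results in argument order, so result i
    # still lines up with request i.
    return await asyncio.gather(*(fake_async_exec(v) for v in values))


def run_timed(coro_fn, values):
    start = time.perf_counter()
    result = asyncio.run(coro_fn(values))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (execute_sequential, execute_concurrent):
        result, elapsed = run_timed(fn, [1, 2, 3, 4])
        print(f"{fn.__name__}: {result} in {elapsed:.2f}s")
```

In a real model, the concurrent version builds one pb_utils.InferenceRequest per incoming request, collects the awaitables returned by async_exec(), and awaits asyncio.gather(*awaits); the responses come back in the same order as the requests.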

Component model config.pbtxt:

name: "preprocess"
backend: "python"
max_batch_size: 8

input [
  { name: "RAW_IMAGE", data_type: TYPE_UINT8, dims: [ 224, 224, 3 ] }
]
output [
  { name: "PROCESSED_IMAGE", data_type: TYPE_FP32, dims: [ 224, 224, 3 ] }
]

instance_group [
  { count: 1, kind: KIND_CPU }
]
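With max_batch_size: 8, the dims in this config describe a single item and Triton adds the batch dimension in front, so RAW_IMAGE arrives as [N, 224, 224, 3] with N at most 8. A minimal sketch of that rule as a client-side check; check_raw_image is a hypothetical helper, not part of Triton's client libraries.

```python
import numpy as np

# Values copied from the preprocess config.pbtxt above.
MAX_BATCH_SIZE = 8
RAW_IMAGE_DIMS = (224, 224, 3)
RAW_IMAGE_DTYPE = np.uint8  # TYPE_UINT8


def check_raw_image(batch: np.ndarray) -> None:
    """Hypothetical check that `batch` fits the RAW_IMAGE input.

    Because max_batch_size > 0, the config dims exclude the batch dimension:
    the full tensor shape is [N, 224, 224, 3] with 1 <= N <= MAX_BATCH_SIZE.
    """
    if batch.dtype != RAW_IMAGE_DTYPE:
        raise ValueError(f"expected dtype uint8, got {batch.dtype}")
    if batch.ndim != len(RAW_IMAGE_DIMS) + 1 or batch.shape[1:] != RAW_IMAGE_DIMS:
        raise ValueError(f"expected shape [N, 224, 224, 3], got {list(batch.shape)}")
    if not 1 <= batch.shape[0] <= MAX_BATCH_SIZE:
        raise ValueError(f"batch size {batch.shape[0]} is outside 1..{MAX_BATCH_SIZE}")


if __name__ == "__main__":
    check_raw_image(np.zeros((4, 224, 224, 3), dtype=np.uint8))  # passes silently
    try:
        check_raw_image(np.zeros((9, 224, 224, 3), dtype=np.uint8))
    except ValueError as err:
        print(err)
```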
