Implementation:Triton inference server Server InferUtil

Knowledge Sources	Triton Inference Server
Domains	Testing, Inference
Last Updated	2026-02-13 17:00 GMT

Overview

`infer_util.py` is the core inference utility library for Triton QA. It provides helper functions for sending inference requests and validating the results, and it is shared by the Python-based QA tests.

Description

The `infer_util.py` module is the central testing utility for Triton QA, offering functions to construct inference requests, send them via both HTTP and gRPC protocols, and validate the returned results against expected outputs. It supports all data types, batched and non-batched requests, shared memory regions, binary tensor data, and various output validation strategies including exact match, approximate comparison, and shape-only checks. Nearly every Python-based QA test imports this module to avoid duplicating request construction and validation logic.
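
The three validation strategies named above (exact match, approximate comparison, shape-only check) can be illustrated with a small NumPy sketch. The helper `check_output` and its `mode`, `rtol`, and `atol` parameters are hypothetical names chosen for this illustration; they are not taken from `infer_util.py`, whose own validation code may be organized differently.

```python
import numpy as np


def check_output(actual, expected, mode="exact", rtol=1e-5, atol=1e-8):
    """Hypothetical helper: True if `actual` passes the chosen comparison."""
    actual = np.asarray(actual)
    expected = np.asarray(expected)

    # Every strategy requires matching shapes first.
    if actual.shape != expected.shape:
        return False
    if mode == "shape":
        # Shape-only check: values are ignored.
        return True
    if mode == "exact":
        # Exact match: every element must be equal.
        return bool(np.array_equal(actual, expected))
    if mode == "approx":
        # Approximate comparison: tolerate small floating-point differences.
        return bool(np.allclose(actual, expected, rtol=rtol, atol=atol))
    raise ValueError(f"unknown mode: {mode!r}")
```

Exact comparison suits integer outputs, while the approximate mode is the usual choice for floating-point outputs whose low-order bits may differ between backends.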

Usage

Import this module in any QA test script that needs to send inference requests to a running Triton server and validate the results. It abstracts away protocol differences between HTTP and gRPC clients.
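
One way such a protocol abstraction can work is to select the client module from the protocol string, since `tritonclient.http` and `tritonclient.grpc` both expose an `InferenceServerClient` class. The sketch below is an assumption about the general pattern, not a copy of the module's code: `server_url` and `make_client` are hypothetical names, and the ports are Triton's defaults (8000 for HTTP, 8001 for gRPC).

```python
# Triton's default ports; a deployment may use different ones.
DEFAULT_PORTS = {"http": 8000, "grpc": 8001}


def server_url(protocol, host="localhost"):
    """Hypothetical helper: build the host:port string for a protocol."""
    try:
        port = DEFAULT_PORTS[protocol]
    except KeyError:
        raise ValueError(f"unsupported protocol: {protocol!r}") from None
    return f"{host}:{port}"


def make_client(protocol, host="localhost"):
    """Hypothetical helper: return (client_module, client) for a protocol.

    The tritonclient import is deferred so that the URL logic above can be
    used without the client library installed.
    """
    url = server_url(protocol, host)
    if protocol == "http":
        import tritonclient.http as client_module
    else:
        import tritonclient.grpc as client_module
    return client_module, client_module.InferenceServerClient(url=url)
```

Returning the module alongside the client lets the caller build inputs with `client_module.InferInput` without knowing which protocol is in use.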

Code Reference

Source Location

Repository: Triton Inference Server
File: qa/common/infer_util.py
Lines: 1-1463

Signature

def infer_exact(tester, pf, tensor_shape, batch_size, input_dtype, output_dtype,
                model_name, protocol="http", swap=False, timeout_us=0): ...

def infer_zero(tester, pf, batch_size, tensor_dtype, input_shapes, output_shapes,
               model_name, protocol="http"): ...

def infer_shape_tensor(tester, pf, tensor_shape, batch_size, dtype,
                       input_shapes, dummy_input_shapes, model_name, protocol="http"): ...

def validate_for_tf_model(expected_dtype, output_dtype, proto_dtype, output_data, expected_data): ...
def validate_for_onnx_model(expected_dtype, output_dtype, output_data, expected_data): ...

Import

import sys
sys.path.insert(0, "/path/to/qa/common")
import infer_util as iu

I/O Contract

Inputs

Name	Type	Required	Description
tester	unittest.TestCase	Yes	Test case instance used for assertions
pf	string	Yes	Platform/framework identifier of the model under test, such as "graphdef" or "onnx"
model_name	string	Yes	Name of the model to send requests to
tensor_shape	list[int]	Yes	Shape of the input tensors
batch_size	int	Yes	Batch size of each inference request
input_dtype	numpy.dtype	Yes	Data type of the input tensors
output_dtype	numpy.dtype	Yes	Data type of the output tensors
protocol	string	No	Protocol to use: "http" or "grpc" (default: "http")
timeout_us	int	No	Request timeout in microseconds (default: 0, meaning no timeout)

Outputs

Name	Type	Description
assertion_results	None	Raises AssertionError via unittest if validation fails
infer_result	InferResult	Raw inference result object when using lower-level functions
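
The `tester` argument lets a shared helper report failures through the calling test case, so a mismatch surfaces as an ordinary unittest failure. A minimal sketch of that pattern follows; `assert_outputs_equal` and `run_example` are hypothetical names for illustration only and do not appear in `infer_util.py`.

```python
import unittest


def assert_outputs_equal(tester, actual, expected):
    """Hypothetical helper: compare outputs using the caller's TestCase."""
    tester.assertEqual(len(actual), len(expected), "output length mismatch")
    for i, (a, e) in enumerate(zip(actual, expected)):
        tester.assertEqual(a, e, f"mismatch at index {i}")


def run_example():
    """Run one passing and one failing test; return (tests_run, failures)."""

    class ExampleTest(unittest.TestCase):
        def test_match(self):
            # `self` is passed as the tester, as in the calls to infer_exact.
            assert_outputs_equal(self, [1, 2, 3], [1, 2, 3])

        def test_mismatch(self):
            # The last element differs, so the helper raises AssertionError.
            assert_outputs_equal(self, [1, 2, 3], [1, 2, 4])

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.testsRun, len(result.failures)
```

Because the helper never returns a status value, callers do not need to check anything: a failed comparison stops the test at the point of the mismatch.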

Usage Examples

Basic Exact Inference Validation

import numpy as np
import infer_util as iu

# Call from inside a unittest.TestCase method, so that `self` is the tester.
iu.infer_exact(self, "graphdef", (16,), 8,
               np.float32, np.float32,
               "simple", protocol="grpc")

Validate Over Both Protocols

# Uses the same imports as the previous example (numpy as np, infer_util as iu).
for protocol in ["http", "grpc"]:
    iu.infer_exact(self, "onnx", (4, 4), 1,
                   np.int32, np.int32,
                   "simple_onnx", protocol=protocol)

Related Pages

Environment:Triton_inference_server_Server_GPU_CUDA_Runtime
