Implementation:Triton inference server Server InferUtil
| Knowledge Sources | |
|---|---|
| Domains | Testing, Inference |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Core inference utility library providing helper functions for sending inference requests and validating results across all QA tests.
Description
The `infer_util.py` module is the central testing utility for Triton QA. It provides functions that construct inference requests, send them over HTTP or gRPC, and validate the returned results against expected outputs. It supports all data types, batched and non-batched requests, shared memory regions, binary tensor data, and several output validation strategies: exact match, approximate comparison, and shape-only checks. Nearly every Python-based QA test imports this module to avoid duplicating request construction and validation logic.
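The three validation strategies named above can be illustrated with plain NumPy. This is a minimal sketch, not code from `infer_util.py`: the helper name `check_output`, its `strategy` argument, and the tolerance defaults are all assumptions made for illustration.

```python
import numpy as np


def check_output(actual, expected, strategy="exact", rtol=1e-5, atol=1e-8):
    """Return True if `actual` matches `expected` under the given strategy.

    Hypothetical helper for illustration; not part of infer_util.py.
    """
    actual = np.asarray(actual)
    expected = np.asarray(expected)
    if strategy == "shape":
        # Shape-only check: the values are ignored entirely.
        return actual.shape == expected.shape
    if actual.shape != expected.shape:
        return False
    if strategy == "exact":
        # Exact match: every element must be identical.
        return bool(np.array_equal(actual, expected))
    if strategy == "approx":
        # Approximate comparison: tolerate small floating-point differences.
        return bool(np.allclose(actual, expected, rtol=rtol, atol=atol))
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Exact comparison suits integer outputs, while floating-point outputs generally need the approximate strategy because results can differ slightly between backends.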
Usage
Import this module in any QA test script that needs to send inference requests to a running Triton server and validate the results. It abstracts away protocol differences between HTTP and gRPC clients.
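One way a protocol string could be turned into a concrete client is sketched below. It assumes the `tritonclient` Python package and Triton's default ports (8000 for HTTP, 8001 for gRPC). The helpers `resolve_protocol` and `make_client` are hypothetical and only illustrate the idea; they are not functions from `infer_util.py`.

```python
import importlib

# Assumed defaults: Triton listens on 8000 for HTTP and 8001 for gRPC.
_PROTOCOLS = {
    "http": ("tritonclient.http", "localhost:8000"),
    "grpc": ("tritonclient.grpc", "localhost:8001"),
}


def resolve_protocol(protocol):
    """Map a protocol string to (client module name, default URL)."""
    try:
        return _PROTOCOLS[protocol]
    except KeyError:
        raise ValueError(f"unsupported protocol: {protocol!r}") from None


def make_client(protocol, url=None):
    """Return (client module, client instance) for the protocol.

    Hypothetical helper; requires the matching tritonclient package.
    """
    module_name, default_url = resolve_protocol(protocol)
    # Imported lazily so only the requested client package has to be installed.
    module = importlib.import_module(module_name)
    return module, module.InferenceServerClient(url=url or default_url)
```

Because both client packages expose classes with the same names, such as `InferInput` and `InferRequestedOutput`, calling code can use the returned module without branching on the protocol again.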
Code Reference
Source Location
- Repository: Triton Inference Server
- File: qa/common/infer_util.py
- Lines: 1-1463
Signature
def infer_exact(tester, pf, tensor_shape, batch_size, input_dtype, output_dtype,
                model_name, protocol="http", swap=False, timeout_us=0): ...
def infer_zero(tester, pf, batch_size, tensor_dtype, input_shapes, output_shapes,
               model_name, protocol="http"): ...
def infer_shape_tensor(tester, pf, tensor_shape, batch_size, dtype,
                       input_shapes, dummy_input_shapes, model_name, protocol="http"): ...
def validate_for_tf_model(expected_dtype, output_dtype, proto_dtype, output_data, expected_data): ...
def validate_for_onnx_model(expected_dtype, output_dtype, output_data, expected_data): ...
Import
import sys
sys.path.insert(0, "/path/to/qa/common")
import infer_util as iu
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| tester | unittest.TestCase | Yes | Test case instance used for assertions |
| pf | string | Yes | Platform/framework identifier of the model, e.g. "graphdef" or "onnx" |
| model_name | string | Yes | Name of the model to send requests to |
| tensor_shape | tuple[int, ...] | Yes | Shape of the input tensors |
| batch_size | int | Yes | Batch size of each request |
| input_dtype | numpy.dtype | Yes | Data type of the input tensors |
| output_dtype | numpy.dtype | Yes | Data type of the output tensors |
| protocol | string | No | Protocol to use: "http" or "grpc" (default: "http") |
| timeout_us | int | No | Request timeout in microseconds (default: 0, meaning no timeout) |
Outputs
| Name | Type | Description |
|---|---|---|
| assertion_results | None | Raises AssertionError via unittest if validation fails |
| infer_result | InferResult | Raw inference result object when using lower-level functions |
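The sketch below illustrates how a validation failure can surface through the `tester` argument. It relies only on the standard `unittest` behavior that failed assertions raise `AssertionError`. The helper `assert_outputs_equal`, the class `ExampleTest`, and the output label `"OUTPUT0"` are illustrative assumptions, not part of `infer_util.py`.

```python
import unittest

import numpy as np


def assert_outputs_equal(tester, actual, expected, name="OUTPUT0"):
    """Hypothetical helper: fail the calling test if the arrays differ."""
    tester.assertEqual(actual.shape, expected.shape, f"{name}: shape mismatch")
    tester.assertTrue(np.array_equal(actual, expected), f"{name}: value mismatch")


class ExampleTest(unittest.TestCase):
    def test_match(self):
        data = np.arange(4, dtype=np.int32)
        # Passing `self` lets the helper report failures through this test case.
        assert_outputs_equal(self, data, data.copy())
```

Since the helper returns nothing on success, the calling test only needs to invoke it. Any mismatch is reported by the test runner as an ordinary test failure.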
Usage Examples
Basic Exact Inference Validation
import numpy as np
import infer_util as iu

# Call from inside a unittest.TestCase method, so that `self` is the tester.
iu.infer_exact(self, "graphdef", (16,), 8,
               np.float32, np.float32,
               "simple", protocol="grpc")
Validate Over Both Protocols
# Run the same validation once per protocol.
for protocol in ["http", "grpc"]:
    iu.infer_exact(self, "onnx", (4, 4), 1,
                   np.int32, np.int32,
                   "simple_onnx", protocol=protocol)