Implementation:Triton inference server Server InferUtil

From Leeroopedia
Knowledge Sources
Domains Testing, Inference
Last Updated 2026-02-13 17:00 GMT

Overview

Core inference utility library providing helper functions for sending inference requests and validating results across the Python-based QA tests.

Description

The `infer_util.py` module is the central testing utility for Triton QA, offering functions to construct inference requests, send them via both HTTP and gRPC protocols, and validate the returned results against expected outputs. It supports all data types, batched and non-batched requests, shared memory regions, binary tensor data, and various output validation strategies including exact match, approximate comparison, and shape-only checks. Nearly every Python-based QA test imports this module to avoid duplicating request construction and validation logic.
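The three validation strategies named above (exact match, approximate comparison, shape-only check) can be illustrated with a minimal NumPy sketch. The helper `check_output` and its `mode` parameter are hypothetical names chosen for this illustration; they are not taken from `infer_util.py`, and the module's own comparison logic may differ.

```python
import numpy as np


def check_output(actual, expected, mode="exact", rtol=1e-5, atol=1e-8):
    """Hypothetical helper: return True if `actual` passes the chosen check.

    mode="exact"  -> element-wise equality
    mode="approx" -> equality within relative/absolute tolerances
    mode="shape"  -> only the shapes must agree
    """
    actual = np.asarray(actual)
    expected = np.asarray(expected)

    # Every strategy requires matching shapes first.
    if actual.shape != expected.shape:
        return False
    if mode == "shape":
        return True
    if mode == "exact":
        return bool(np.array_equal(actual, expected))
    if mode == "approx":
        return bool(np.allclose(actual, expected, rtol=rtol, atol=atol))
    raise ValueError(f"unknown mode: {mode!r}")
```

Exact matching suits integer outputs, while the approximate mode is the usual choice for floating-point outputs, where small rounding differences between backends are expected.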

Usage

Import this module in any QA test script that needs to send inference requests to a running Triton server and validate the results. It abstracts away protocol differences between HTTP and gRPC clients.
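One way to hide the protocol choice behind a single string argument is sketched below. This is an assumption about how such an abstraction could look, not a copy of the module's code: `endpoint_for` and `make_client` are hypothetical helpers, the host is assumed to be `localhost`, and the ports are Triton's defaults (8000 for HTTP, 8001 for gRPC). Creating a client requires the `tritonclient` package, which is imported lazily so the rest of the sketch runs without it.

```python
# Triton's default ports: 8000 for HTTP, 8001 for gRPC.
_DEFAULT_PORTS = {"http": 8000, "grpc": 8001}


def endpoint_for(protocol="http", host="localhost"):
    """Hypothetical helper: map a protocol name to a 'host:port' URL."""
    if protocol not in _DEFAULT_PORTS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return f"{host}:{_DEFAULT_PORTS[protocol]}"


def make_client(protocol="http", host="localhost"):
    """Hypothetical helper: build the matching Triton client object.

    The import is deferred so that only the selected client library
    has to be installed.
    """
    url = endpoint_for(protocol, host)
    if protocol == "http":
        import tritonclient.http as client_module
    else:
        import tritonclient.grpc as client_module
    return client_module.InferenceServerClient(url=url)
```

With this shape, a test only passes `protocol="http"` or `protocol="grpc"` and never touches the client modules directly.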

Code Reference

Source Location

Signature

def infer_exact(tester, pf, tensor_shape, batch_size, input_dtype, output_dtype,
                model_name, protocol="http", swap=False, timeout_us=0): ...

def infer_zero(tester, pf, batch_size, tensor_dtype, input_shapes, output_shapes,
               model_name, protocol="http"): ...

def infer_shape_tensor(tester, pf, tensor_shape, batch_size, dtype,
                       input_shapes, dummy_input_shapes, model_name, protocol="http"): ...

def validate_for_tf_model(expected_dtype, output_dtype, proto_dtype, output_data, expected_data): ...
def validate_for_onnx_model(expected_dtype, output_dtype, output_data, expected_data): ...

Import

import sys
sys.path.insert(0, "/path/to/qa/common")
import infer_util as iu

I/O Contract

Inputs

Name | Type | Required | Description
tester | unittest.TestCase | Yes | Test case instance used for assertions
model_name | string | Yes | Name of the model to send requests to
tensor_shape | list or tuple of int | Yes | Shape of the input tensors
batch_size | int | Yes | Batch size of the request (size of the batch dimension)
input_dtype | numpy.dtype | Yes | Data type of the input tensors
protocol | string | No | Protocol to use: "http" or "grpc" (default: "http")
timeout_us | int | No | Request timeout in microseconds (default: 0, meaning no timeout)
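To show how `tensor_shape`, `batch_size`, and `input_dtype` relate, the sketch below builds a random input array whose leading dimension is the batch size. The helper `make_batched_input` is hypothetical, and prepending the batch dimension to `tensor_shape` is an assumption made for illustration; the module itself may build its inputs differently, for example for models that do not support batching.

```python
import numpy as np


def make_batched_input(tensor_shape, batch_size, dtype, seed=0):
    """Hypothetical helper: random data of shape (batch_size, *tensor_shape)."""
    rng = np.random.default_rng(seed)
    full_shape = (batch_size,) + tuple(tensor_shape)
    dtype = np.dtype(dtype)

    if np.issubdtype(dtype, np.integer):
        # Keep values small so they fit in every integer type.
        info = np.iinfo(dtype)
        low, high = max(info.min, -128), min(info.max, 127)
        data = rng.integers(low, high, size=full_shape, endpoint=True)
        return data.astype(dtype)

    # Floating-point (and other) types: uniform samples in [0, 1).
    return rng.random(full_shape).astype(dtype)
```

For example, `make_batched_input((16,), 8, np.float32)` yields an array of shape `(8, 16)`, matching the arguments used in the first usage example on this page.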

Outputs

Name | Type | Description
assertion_results | None | Raises AssertionError via unittest if validation fails
infer_result | InferResult | Raw inference result object when using lower-level functions
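The "tester" pattern, in which a helper receives the test case and reports failures through its assertion methods, can be sketched as follows. `assert_outputs_match` and `ExampleTest` are hypothetical names for this illustration and are not part of `infer_util.py`.

```python
import unittest

import numpy as np


def assert_outputs_match(tester, actual, expected):
    """Hypothetical helper: fail the calling test if the arrays differ."""
    tester.assertEqual(actual.shape, expected.shape, "output shape mismatch")
    tester.assertTrue(np.array_equal(actual, expected), "output values mismatch")


class ExampleTest(unittest.TestCase):
    def test_match(self):
        data = np.arange(4, dtype=np.int32)
        # Passing `self` lets the helper raise through unittest.
        assert_outputs_match(self, data, data.copy())

    def test_mismatch_raises(self):
        data = np.arange(4, dtype=np.int32)
        # A failed check surfaces as an AssertionError in the test.
        with self.assertRaises(AssertionError):
            assert_outputs_match(self, data, data + 1)
```

Because the helper returns nothing and signals problems only through the test case's assertions, a validation failure is attributed to the test that called it.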

Usage Examples

Basic Exact Inference Validation

import numpy as np
import infer_util as iu

# Call from inside a unittest.TestCase method, so that `self` is the tester.
iu.infer_exact(self, "graphdef", (16,), 8,
               np.float32, np.float32,
               "simple", protocol="grpc")

Validate Over Both Protocols

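# Assumes numpy is imported as np, infer_util as iu, and that this runs
# inside a unittest.TestCase method. The same check runs once per protocol.
for protocol in ("http", "grpc"):
    iu.infer_exact(self, "onnx", (4, 4), 1,
                   np.int32, np.int32,
                   "simple_onnx", protocol=protocol)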
