Implementation:Triton inference server Server ShmUtil
| Knowledge Sources | |
|---|---|
| Domains | Testing, Memory_Management |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Shared memory utility for creating, managing, and cleaning up system and CUDA shared memory regions used in inference tests.
Description
The `shm_util.py` module provides helper functions for working with system (POSIX) shared memory and CUDA shared memory in Triton QA tests. It wraps the Triton client library's shared memory APIs to create regions, populate them with input tensor data, register them with the Triton server, and read output data back from them after inference. It also manages the lifecycle of the shared memory handles, unregistering and destroying regions so that tests do not leak resources. QA tests use it to validate Triton's shared memory inference path, in which tensor data is exchanged through shared memory regions instead of being sent in the request and response bodies.
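The create, populate, read back, and clean up sequence described above can be illustrated without a running server. The following is a minimal sketch that uses Python's standard `multiprocessing.shared_memory` module as a stand-in; it is not the code in `shm_util.py` and does not call the Triton client library. The helper names `write_floats`, `read_floats`, and `round_trip` are hypothetical, and the registration step with the server is omitted.

```python
import struct
from multiprocessing import shared_memory

FLOAT32_SIZE = 4  # bytes per little-endian float32 element


def write_floats(region, values, offset=0):
    """Pack float32 values into the region at `offset`; return bytes written."""
    payload = struct.pack(f"<{len(values)}f", *values)
    region.buf[offset:offset + len(payload)] = payload
    return len(payload)


def read_floats(region, count, offset=0):
    """Unpack `count` float32 values from the region starting at `offset`."""
    raw = bytes(region.buf[offset:offset + FLOAT32_SIZE * count])
    return list(struct.unpack(f"<{count}f", raw))


def round_trip(values):
    """Create a region, populate it, read it back, and always release it."""
    region = shared_memory.SharedMemory(create=True,
                                        size=FLOAT32_SIZE * len(values))
    try:
        write_floats(region, values)
        return read_floats(region, len(values))
    finally:
        region.close()   # drop this process's mapping
        region.unlink()  # remove the backing segment so nothing leaks
```

In a real test, the read step would happen after the server has written inference results into the output region; here the same process writes and reads to keep the sketch self-contained.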
Usage
Import this module in QA tests that exercise shared memory inference. Create, populate, and register the shared memory regions before the inference call, then unregister and destroy them afterwards.
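Because regions must be destroyed even when an inference call fails, one common pattern is to tie their lifetime to a context manager. The sketch below assumes standard-library shared memory rather than the Triton client APIs, and the names `managed_regions`, `region_exists`, and `demo_failure_cleanup` are hypothetical; it only shows the cleanup guarantee, not how `shm_util.py` implements it.

```python
import contextlib
from multiprocessing import shared_memory


@contextlib.contextmanager
def managed_regions(byte_sizes):
    """Create one region per byte size and release all of them on exit."""
    regions = []
    try:
        for size in byte_sizes:
            regions.append(shared_memory.SharedMemory(create=True, size=size))
        yield regions
    finally:
        # Runs on normal exit and when the body raises.
        for region in regions:
            region.close()
            region.unlink()


def region_exists(name):
    """Return True if a shared memory segment with this name can be attached."""
    try:
        probe = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return False
    probe.close()
    return True


def demo_failure_cleanup():
    """Simulate a failing inference call and report which regions survive."""
    names = []
    try:
        with managed_regions([16, 32]) as regions:
            names = [region.name for region in regions]
            raise RuntimeError("simulated inference failure")
    except RuntimeError:
        pass
    return [region_exists(name) for name in names]
```

A test written this way cannot skip cleanup by returning early or raising, which is the same property the explicit unregister and cleanup calls are meant to provide.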
Code Reference
Source Location
- Repository: Triton Inference Server
- File: qa/common/shm_util.py
- Lines: 1-490
Signature
def create_set_shm_regions(input_list, output_list, shm_region_names,
                           shm_region_byte_sizes, triton_client, protocol): ...
def create_set_cuda_shm_regions(input_list, output_list, shm_region_names,
                                shm_region_byte_sizes, triton_client, protocol): ...
def unregister_cleanup_shm_regions(shm_handles, triton_client, protocol): ...
def cleanup_shm_regions(shm_handles): ...
Import
import sys
sys.path.insert(0, "/path/to/qa/common")
import shm_util
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input_list | list[numpy.ndarray] | Yes | Input tensor data to place in shared memory |
| output_list | list[tuple] | Yes | Output tensor names and byte sizes for output shared memory allocation |
| shm_region_names | list[str] | Yes | Names for the shared memory regions to create |
| shm_region_byte_sizes | list[int] | Yes | Byte sizes for each shared memory region |
| triton_client | InferenceServerClient | Yes | Triton client instance for registering shared memory |
| protocol | str | Yes | Protocol in use: "http" or "grpc" |
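The values passed in `shm_region_byte_sizes` have to cover the tensors placed in each region. As a rough illustration, the byte size of a fixed-width tensor is the product of its dimensions times the element size. The helper `tensor_byte_size` and its lookup table are hypothetical and cover only a few Triton datatype names; variable-length types such as `BYTES` are not handled. With NumPy arrays, the `nbytes` attribute gives the same figure directly.

```python
import math
import struct

# struct format codes for a few fixed-width element types (illustrative subset)
_ITEM_FORMAT = {"INT32": "i", "FP32": "f", "INT64": "q", "FP64": "d"}


def tensor_byte_size(shape, datatype):
    """Return the number of bytes needed for a dense tensor of this shape."""
    # "<" selects standard sizes, independent of the host platform.
    itemsize = struct.calcsize("<" + _ITEM_FORMAT[datatype])
    return math.prod(shape) * itemsize


input_byte_size = tensor_byte_size([1, 16], "INT32")
output_byte_size = tensor_byte_size([1, 16], "INT32")
```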
Outputs
| Name | Type | Description |
|---|---|---|
| shm_handles | list[handle] | Handles to created shared memory regions for later cleanup |
| input_shm_regions | list[SharedMemoryRegion] | Registered input shared memory regions ready for inference |
| output_shm_regions | list[SharedMemoryRegion] | Registered output shared memory regions for receiving results |
Usage Examples
import shm_util

# `inputs`, `outputs`, `client`, and the byte sizes are assumed to be
# defined by the calling test.

# System (POSIX) shared memory over HTTP
shm_handles = shm_util.create_set_shm_regions(
    inputs, outputs, ["input0_shm", "output0_shm"],
    [input_byte_size, output_byte_size], client, "http")
# ... run inference ...
shm_util.unregister_cleanup_shm_regions(shm_handles, client, "http")

# CUDA shared memory over gRPC (`client` must be a gRPC client here)
shm_handles = shm_util.create_set_cuda_shm_regions(
    inputs, outputs, ["input0_cudashm", "output0_cudashm"],
    [input_byte_size, output_byte_size], client, "grpc")
# ... run inference ...
shm_util.unregister_cleanup_shm_regions(shm_handles, client, "grpc")