

Implementation:Triton inference server Server ShmUtil

From Leeroopedia
Knowledge Sources
Domains: Testing, Memory_Management
Last Updated: 2026-02-13 17:00 GMT

Overview

Shared memory utility for creating, managing, and cleaning up system and CUDA shared memory regions used in inference tests.

Description

The `shm_util.py` module provides helper functions for working with both system (POSIX) shared memory and CUDA shared memory in Triton QA tests. It wraps the Triton client library's shared memory APIs to create shared memory regions, populate them with input tensor data, register them with the Triton server, and extract output data from them after inference. It also manages the lifetime of each region, including the cleanup needed to avoid leaking shared memory. QA tests that validate Triton's zero-copy shared memory inference path rely on it.
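
To make the create-and-populate step concrete without a running server, here is a minimal sketch that uses Python's standard-library `multiprocessing.shared_memory` as a stand-in for the Triton client's system shared memory utilities (it is not what `shm_util.py` calls). It creates a named region sized to a NumPy tensor and copies the tensor in once. It then attaches to the region by name from a second handle, the way a separate consumer such as the server would, to show that both handles see the same bytes. The helper names `place_input_in_shm` and `attach_and_read_bytes` are hypothetical.

```python
import uuid
from multiprocessing import shared_memory

import numpy as np


def place_input_in_shm(name, tensor):
    """Hypothetical helper: create a named region sized for `tensor` and copy it in."""
    shm = shared_memory.SharedMemory(name=name, create=True, size=tensor.nbytes)
    # View the region with the tensor's dtype and shape, then copy the data in once
    staging = np.ndarray(tensor.shape, dtype=tensor.dtype, buffer=shm.buf)
    staging[...] = tensor
    del staging  # release the exported buffer so shm.close() can succeed later
    return shm


def attach_and_read_bytes(name, nbytes):
    """Hypothetical helper: attach by name, as another process would, and read raw bytes."""
    peer = shared_memory.SharedMemory(name=name)
    try:
        return bytes(peer.buf[:nbytes])
    finally:
        peer.close()  # detach only; the creator is responsible for unlink()


data = np.arange(16, dtype=np.int32)
region_name = f"input0_shm_{uuid.uuid4().hex[:8]}"  # unique name avoids collisions
shm = place_input_in_shm(region_name, data)
try:
    seen_by_peer = attach_and_read_bytes(region_name, data.nbytes)
finally:
    shm.close()   # unmap the region in this process
    shm.unlink()  # remove the region from the system so it does not leak
```

Registering the region with the server is the Triton-specific step this stand-in leaves out.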

Usage

Import this module in QA tests that exercise shared memory inference, and use its functions to create, register, populate, and destroy shared memory regions around the inference calls.
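
Every region that is created has to be unregistered and destroyed even when an inference call fails. One way to guarantee that is to wrap the create/cleanup pair in a context manager. The sketch below is hypothetical: `shm_util` is not known to provide `shm_session`. Its `create` and `cleanup` arguments are plain callables, so the example can exercise it with fakes instead of a live Triton client.

```python
from contextlib import contextmanager


@contextmanager
def shm_session(create, cleanup):
    """Hypothetical wrapper: yield the handles from create(), always pass them to cleanup()."""
    handles = create()
    try:
        yield handles
    finally:
        # Runs whether the body finished normally or raised
        cleanup(handles)


# Fakes standing in for the real shm_util calls, recording the order of events
events = []


def fake_create():
    events.append("create")
    return ["input0_shm", "output0_shm"]


def fake_cleanup(handles):
    events.append(("cleanup", tuple(handles)))


try:
    with shm_session(fake_create, fake_cleanup) as handles:
        events.append("infer")
        raise RuntimeError("simulated inference failure")
except RuntimeError:
    pass  # the failure still propagates, but only after cleanup has run
```

In a real test, `create` could be `functools.partial(shm_util.create_set_shm_regions, inputs, outputs, names, sizes, client, "http")`, and `cleanup` could be a lambda that calls `shm_util.unregister_cleanup_shm_regions(handles, client, "http")`. This assumes the signatures given in the Code Reference section.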

Code Reference

Source Location

Signature

def create_set_shm_regions(input_list, output_list, shm_region_names,
                           shm_region_byte_sizes, triton_client, protocol): ...
def create_set_cuda_shm_regions(input_list, output_list, shm_region_names,
                                shm_region_byte_sizes, triton_client, protocol): ...
def unregister_cleanup_shm_regions(shm_handles, triton_client, protocol): ...
def cleanup_shm_regions(shm_handles): ...
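
Judging by its signature, `cleanup_shm_regions(shm_handles)` releases a whole list of handles. A desirable property for such a routine is that a failure on one handle does not stop the remaining handles from being released. The following is a minimal sketch of that best-effort pattern. It again uses `multiprocessing.shared_memory` handles as a stand-in, and `release_all` is a hypothetical name, not the module's actual implementation.

```python
from multiprocessing import shared_memory


def release_all(handles):
    """Hypothetical best-effort cleanup: try every step on every handle.

    Returns (name, step, exception) tuples for the steps that failed, rather than
    stopping at the first error and leaking the regions that follow it.
    """
    failures = []
    for shm in handles:
        # close() unmaps the region locally; unlink() removes it from the system.
        # They are attempted separately so a failed close() does not skip unlink().
        for step in (shm.close, shm.unlink):
            try:
                step()
            except (OSError, BufferError) as exc:
                failures.append((shm.name, step.__name__, exc))
    return failures


a = shared_memory.SharedMemory(create=True, size=64)
b = shared_memory.SharedMemory(create=True, size=64)
b.unlink()  # simulate a region that was already removed elsewhere
failures = release_all([a, b])
```

Here `a` is released normally. Only the second `unlink()` of `b` is reported, and it shows up as a `FileNotFoundError`.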

Import

import sys
# Make qa/common importable; replace the placeholder with the path in your checkout
sys.path.insert(0, "/path/to/qa/common")
import shm_util

I/O Contract

Inputs

Name                   Type                   Required  Description
input_list             list[numpy.ndarray]    Yes       Input tensor data to place in shared memory
output_list            list[tuple]            Yes       Output tensor names and byte sizes for output shared memory allocation
shm_region_names       list[string]           Yes       Names for the shared memory regions to create
shm_region_byte_sizes  list[int]              Yes       Byte size of each shared memory region
triton_client          InferenceServerClient  Yes       Triton client instance used to register the shared memory regions
protocol               string                 Yes       Protocol in use: "http" or "grpc"
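
This sketch assumes that `shm_region_names` and `shm_region_byte_sizes` are parallel lists ordered inputs first, then outputs, as the usage examples on this page suggest. Under that assumption, the byte sizes can be derived from the tensors themselves. The helper below is hypothetical. It uses `ndarray.nbytes`, which is only correct for fixed-size dtypes, so string (BYTES) tensors would need their serialized size instead. It also assumes that `output_list` holds `(name, byte_size)` tuples, as the table above describes.

```python
import numpy as np


def region_byte_sizes(input_list, output_list, shm_region_names):
    """Hypothetical helper: one byte size per region, inputs first, then outputs."""
    sizes = [int(arr.nbytes) for arr in input_list]  # fixed-size dtypes only
    sizes += [int(byte_size) for _name, byte_size in output_list]
    if len(sizes) != len(shm_region_names):
        raise ValueError(
            f"{len(shm_region_names)} region names for {len(sizes)} tensors")
    return sizes


inputs = [np.zeros((1, 16), dtype=np.float32), np.zeros((1, 16), dtype=np.int64)]
outputs = [("OUTPUT0", 64)]
names = ["input0_shm", "input1_shm", "output0_shm"]
sizes = region_byte_sizes(inputs, outputs, names)  # 16*4, 16*8, then the output's 64
```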

Outputs

Name                Type                      Description
shm_handles         list[handle]              Handles to created shared memory regions for later cleanup
input_shm_regions   list[SharedMemoryRegion]  Registered input shared memory regions ready for inference
output_shm_regions  list[SharedMemoryRegion]  Registered output shared memory regions for receiving results
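
A NumPy view over an output region is only valid while the region is mapped. Python's `SharedMemory.close()` also raises `BufferError` while such a view still exists. The minimal sketch below therefore copies the result out and drops the view before the region is released. It uses the same standard-library stand-in and a hypothetical `read_output` helper, and it simulates the server's write locally.

```python
from multiprocessing import shared_memory

import numpy as np


def read_output(shm, shape, dtype):
    """Hypothetical helper: copy a result tensor out so it outlives the region."""
    view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    try:
        return view.copy()  # owns its data, independent of the shared region
    finally:
        del view  # drop the exported buffer, otherwise shm.close() raises BufferError


out = shared_memory.SharedMemory(create=True, size=2 * 4 * 4)  # 2x4 float32
try:
    # Stand-in for the server writing inference results into the output region
    writer = np.ndarray((2, 4), dtype=np.float32, buffer=out.buf)
    writer[...] = np.arange(8, dtype=np.float32).reshape(2, 4)
    del writer
    result = read_output(out, (2, 4), np.float32)
finally:
    out.close()
    out.unlink()
# `result` remains usable here because it is a copy, not a view
```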

Usage Examples

System Shared Memory Inference

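import shm_util
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
# inputs, outputs, input_byte_size and output_byte_size are prepared by the test
shm_handles = shm_util.create_set_shm_regions(
    inputs, outputs, ["input0_shm", "output0_shm"],
    [input_byte_size, output_byte_size], client, "http")
try:
    pass  # ... run inference against the registered regions ...
finally:
    shm_util.unregister_cleanup_shm_regions(shm_handles, client, "http")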

CUDA Shared Memory Inference

import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")
shm_handles = shm_util.create_set_cuda_shm_regions(
    inputs, outputs, ["input0_cudashm", "output0_cudashm"],
    [input_byte_size, output_byte_size], client, "grpc")
try:
    pass  # ... run inference ...
finally:
    shm_util.unregister_cleanup_shm_regions(shm_handles, client, "grpc")
