Implementation:Triton inference server Server SharedMemoryManager
| Knowledge Sources | |
|---|---|
| Domains | Memory_Management, IPC |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Concrete tool for registering and accessing system (POSIX) and CUDA shared memory regions used for zero-copy inference data transfer.
Description
The SharedMemoryManager class provides thread-safe management of named shared memory regions used for zero-copy tensor data transfer between client processes and Triton. It supports both POSIX shared memory (via shm_open/mmap) and CUDA IPC memory handles. The manager maintains a registry of SharedMemoryInfo and CUDASharedMemoryInfo structs, with reference-counted access tracking that prevents premature unregistration while inference requests are in flight.
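The registry and in-flight protection described above can be illustrated with a small stand-alone model. This is a minimal sketch, not Triton's implementation: `ToyRegion` and `ToyRegistry` are hypothetical names, and using `std::shared_ptr::use_count()` as the reference count is an assumption made for illustration. It shows a mutex-guarded map of named regions, a range check on `offset` and `byte_size`, and an unregister call that refuses to remove a region while a request still holds it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a registered region (not Triton's SharedMemoryInfo).
struct ToyRegion {
  std::string key;
  std::size_t byte_size;
};

// Toy registry: a mutex-guarded map from region name to a shared_ptr entry.
class ToyRegistry {
 public:
  // Returns false if the name is already registered.
  bool Register(const std::string& name, const std::string& key,
                std::size_t byte_size) {
    std::lock_guard<std::mutex> lock(mu_);
    if (regions_.count(name) != 0) return false;
    regions_[name] = std::make_shared<ToyRegion>(ToyRegion{key, byte_size});
    return true;
  }

  // Hands out a reference that a request keeps while it is in flight.
  // Returns nullptr if the name is unknown or the requested range
  // [offset, offset + byte_size) does not fit inside the region.
  std::shared_ptr<const ToyRegion> Acquire(const std::string& name,
                                           std::size_t offset,
                                           std::size_t byte_size) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end()) return nullptr;
    const std::size_t total = it->second->byte_size;
    if (offset > total || byte_size > total - offset) return nullptr;
    return it->second;
  }

  // Refuses to remove a region while any caller still holds a reference.
  bool Unregister(const std::string& name) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end()) return false;
    if (it->second.use_count() > 1) return false;  // still in use
    regions_.erase(it);
    return true;
  }

 private:
  std::mutex mu_;
  std::map<std::string, std::shared_ptr<ToyRegion>> regions_;
};
```

In this model a request calls `Acquire` before reading or writing tensor data and simply lets the returned pointer go out of scope when it finishes; only then does `Unregister` succeed.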
Usage
Used internally by Triton's HTTP and gRPC endpoints when clients register shared memory regions for inference input/output. Clients use the shared memory extension API to register regions, then reference them in inference requests.
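Before a client can register a system shared memory region, the POSIX object behind the key has to exist. The following is a minimal client-side sketch using the standard `shm_open`, `ftruncate`, and `mmap` calls; the helper names `CreateAndMapRegion`, `MapExistingRegion`, and `ReleaseRegion` are hypothetical and are not part of Triton or its client libraries. On older glibc versions the program may need to be linked with `-lrt`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Creates (or opens) a POSIX shared memory object, sizes it, and maps it
// read/write. Returns nullptr on failure.
void* CreateAndMapRegion(const std::string& shm_key, std::size_t byte_size) {
  int fd = shm_open(shm_key.c_str(), O_CREAT | O_RDWR, 0600);
  if (fd == -1) return nullptr;
  if (ftruncate(fd, static_cast<off_t>(byte_size)) == -1) {
    close(fd);
    return nullptr;
  }
  void* addr =
      mmap(nullptr, byte_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // the mapping remains valid after the descriptor is closed
  return addr == MAP_FAILED ? nullptr : addr;
}

// Opens an existing object by key and maps it read-only, as a second
// process sharing the same region would. Returns nullptr on failure.
void* MapExistingRegion(const std::string& shm_key, std::size_t byte_size) {
  int fd = shm_open(shm_key.c_str(), O_RDONLY, 0);
  if (fd == -1) return nullptr;
  void* addr = mmap(nullptr, byte_size, PROT_READ, MAP_SHARED, fd, 0);
  close(fd);
  return addr == MAP_FAILED ? nullptr : addr;
}

// Unmaps a region and, if requested, removes the key from the system.
bool ReleaseRegion(const std::string& shm_key, void* addr,
                   std::size_t byte_size, bool unlink_name) {
  bool ok = munmap(addr, byte_size) == 0;
  if (unlink_name) ok = (shm_unlink(shm_key.c_str()) == 0) && ok;
  return ok;
}
```

A client would write input tensors through the pointer returned by `CreateAndMapRegion`, register the same key with the server, and call `ReleaseRegion` only after the region has been unregistered.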
Code Reference
Source Location
- Repository: Triton Inference Server
- File: src/shared_memory_manager.h
- Lines: 1-202
- File: src/shared_memory_manager.cc
- Lines: 1-735
Signature
class SharedMemoryManager {
 public:
  SharedMemoryManager() = default;
  ~SharedMemoryManager();

  // System shared memory
  TRITONSERVER_Error* RegisterSystemSharedMemory(
      const std::string& name, const std::string& shm_key,
      size_t offset, size_t byte_size);

  // CUDA shared memory
  TRITONSERVER_Error* RegisterCUDASharedMemory(
      const std::string& name, const cudaIpcMemHandle_t* cuda_shm_handle,
      size_t byte_size, int device_id);

  // Query
  TRITONSERVER_Error* GetMemoryInfo(
      const std::string& name, size_t offset, size_t byte_size,
      void** shm_mapped_addr,
      TRITONSERVER_MemoryType* memory_type, int64_t* device_id);

  // Unregister
  TRITONSERVER_Error* Unregister(const std::string& name);
  TRITONSERVER_Error* UnregisterAll();

  // Status
  TRITONSERVER_Error* GetStatus(
      const std::string& name, triton::common::TritonJson::Value* shm_status);

 private:
  struct SharedMemoryInfo { /* name, key, offset, size, mapped_addr */ };
  struct CUDASharedMemoryInfo { /* device_id, cuda_ipc_handle */ };

  std::mutex mu_;
  std::map<std::string, std::shared_ptr<SharedMemoryInfo>> shared_memory_map_;
};
Import
#include "shared_memory_manager.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique name for the shared memory region |
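| shm_key | string | Yes (system) | POSIX shared memory key |
| offset | size_t | Yes (system registration, query) | Byte offset: from the start of the shared memory object when registering, or within the registered region when calling GetMemoryInfo |
| cuda_shm_handle | const cudaIpcMemHandle_t* | Yes (CUDA) | Pointer to the CUDA IPC memory handle |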
| byte_size | size_t | Yes | Size of shared memory region in bytes |
| device_id | int | Yes (CUDA) | GPU device for CUDA shared memory |
Outputs
| Name | Type | Description |
|---|---|---|
| shm_mapped_addr | void* | Mapped address for data access |
| memory_type | TRITONSERVER_MemoryType | CPU or GPU memory type |
| device_id | int64_t | Device on which the region resides |
| shm_status | JSON | Region metadata for status queries |
Usage Examples
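#include "shared_memory_manager.h"

SharedMemoryManager smm;

// Register a POSIX shared memory region
TRITONSERVER_Error* err = smm.RegisterSystemSharedMemory(
    "input_region",   // name
    "/triton_shm_0",  // shm_key
    0,                // offset
    1024 * 1024);     // byte_size (1 MB)
if (err != nullptr) {
  // A non-null error means registration failed; inspect it with
  // TRITONSERVER_ErrorMessage(err), then release it.
  TRITONSERVER_ErrorDelete(err);
}

// Get mapped address for data transfer
void* addr = nullptr;
TRITONSERVER_MemoryType mem_type = TRITONSERVER_MEMORY_CPU;
int64_t dev_id = 0;
err = smm.GetMemoryInfo("input_region", 0, 1024, &addr, &mem_type, &dev_id);
if (err != nullptr) {
  // addr is not valid on failure
  TRITONSERVER_ErrorDelete(err);
}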