Implementation:Triton inference server Server SharedMemoryManager

Knowledge Sources	Triton Inference Server Shared Memory Extension
Domains	Memory_Management, IPC
Last Updated	2026-02-13 17:00 GMT

Overview

Concrete implementation that manages the registration of, and access to, system (POSIX) and CUDA shared memory regions used for zero-copy transfer of inference data.

Description

The SharedMemoryManager class provides thread-safe management of named shared memory regions used for zero-copy tensor data transfer between client processes and Triton. It supports both POSIX shared memory (via shm_open/mmap) and CUDA IPC memory handles. The manager maintains a registry of SharedMemoryInfo and CUDASharedMemoryInfo structs, with reference-counted access tracking that prevents premature unregistration while inference requests are in flight.
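The reference-counting idea described above can be illustrated with a small, self-contained sketch. It is not Triton's implementation: the `RegionRegistry` class, the `RegionInfo` struct, and the `Acquire` method are hypothetical names chosen for illustration, and the sketch assumes that "in use" can be detected through `std::shared_ptr::use_count()` while the registry mutex is held.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for a registered region (not Triton's struct).
struct RegionInfo {
  std::string key;
  std::size_t byte_size;
};

// Hypothetical registry showing the pattern: a mutex-guarded map of
// shared_ptr entries, where outstanding references block unregistration.
class RegionRegistry {
 public:
  // Returns false if a region with this name is already registered.
  bool Register(
      const std::string& name, const std::string& key, std::size_t byte_size)
  {
    std::lock_guard<std::mutex> lock(mu_);
    auto info = std::make_shared<RegionInfo>(RegionInfo{key, byte_size});
    return regions_.emplace(name, std::move(info)).second;
  }

  // Hands out a shared reference. While the caller holds it, the region
  // counts as "in use". Returns nullptr if the name is unknown.
  std::shared_ptr<const RegionInfo> Acquire(const std::string& name)
  {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end()) {
      return nullptr;
    }
    return it->second;
  }

  // Refuses to remove a region while any reference besides the map's own
  // is still alive. New references can only be created under mu_, so a
  // use_count() of 1 here means nobody else holds the region.
  bool Unregister(const std::string& name)
  {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end() || it->second.use_count() > 1) {
      return false;
    }
    regions_.erase(it);
    return true;
  }

 private:
  std::mutex mu_;
  std::map<std::string, std::shared_ptr<RegionInfo>> regions_;
};
```

In this sketch an in-flight request would hold the pointer returned by `Acquire` for its whole lifetime; dropping it is what allows a later `Unregister` call to succeed.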

Usage

Used internally by Triton's HTTP and gRPC endpoints when clients register shared memory regions for inference inputs and outputs. Clients first register a region through the shared memory extension API, then reference it by name in subsequent inference requests.
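For a system region, the client side of this flow usually starts by creating and mapping the POSIX object that will later be registered. The following is a minimal sketch of that step, assuming a Linux host; `CreateAndMapRegion` and `DestroyRegion` are hypothetical helper names and are not part of any Triton API. Only standard POSIX calls (`shm_open`, `ftruncate`, `mmap`, `munmap`, `shm_unlink`) are used.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: creates (or opens) the POSIX shared memory object
// named shm_key, sizes it to byte_size, and maps it read/write.
// Returns nullptr on any failure.
void* CreateAndMapRegion(const std::string& shm_key, std::size_t byte_size)
{
  int fd = shm_open(shm_key.c_str(), O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
  if (fd == -1) {
    return nullptr;
  }
  if (ftruncate(fd, static_cast<off_t>(byte_size)) == -1) {
    close(fd);
    return nullptr;
  }
  void* addr =
      mmap(nullptr, byte_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // The mapping stays valid after the descriptor is closed.
  if (addr == MAP_FAILED) {
    return nullptr;
  }
  return addr;
}

// Hypothetical helper: unmaps the region and removes its name.
bool DestroyRegion(
    const std::string& shm_key, void* addr, std::size_t byte_size)
{
  const bool unmapped = munmap(addr, byte_size) == 0;
  const bool unlinked = shm_unlink(shm_key.c_str()) == 0;
  return unmapped && unlinked;
}
```

A client would write its input tensor bytes into the returned address, register the same key with the server, and call `DestroyRegion` only after the region has been unregistered. On older glibc versions the program may need to be linked with `-lrt`.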

Code Reference

Source Location

Repository: Triton Inference Server
File: src/shared_memory_manager.h
Lines: 1-202
File: src/shared_memory_manager.cc
Lines: 1-735

Signature

class SharedMemoryManager {
 public:
  SharedMemoryManager() = default;
  ~SharedMemoryManager();

  // System shared memory
  TRITONSERVER_Error* RegisterSystemSharedMemory(
      const std::string& name, const std::string& shm_key,
      size_t offset, size_t byte_size);

  // CUDA shared memory
  TRITONSERVER_Error* RegisterCUDASharedMemory(
      const std::string& name, const cudaIpcMemHandle_t* cuda_shm_handle,
      size_t byte_size, int device_id);

  // Query
  TRITONSERVER_Error* GetMemoryInfo(
      const std::string& name, size_t offset, size_t byte_size,
      void** shm_mapped_addr,
      TRITONSERVER_MemoryType* memory_type, int64_t* device_id);

  // Unregister
  TRITONSERVER_Error* Unregister(const std::string& name);
  TRITONSERVER_Error* UnregisterAll();

  // Status
  TRITONSERVER_Error* GetStatus(
      const std::string& name, triton::common::TritonJson::Value* shm_status);

 private:
  struct SharedMemoryInfo { /* name, key, offset, size, mapped_addr */ };
  struct CUDASharedMemoryInfo { /* device_id, cuda_ipc_handle */ };
  std::mutex mu_;
  std::map<std::string, std::shared_ptr<SharedMemoryInfo>> shared_memory_map_;
};

Import

#include "shared_memory_manager.h"

I/O Contract

Inputs

Name	Type	Required	Description
name	string	Yes	Unique name for the shared memory region
shm_key	string	Yes (system)	POSIX shared memory key
offset	size_t	Yes (system, query)	Byte offset: into the POSIX object for RegisterSystemSharedMemory, into the registered region for GetMemoryInfo
cuda_shm_handle	cudaIpcMemHandle_t	Yes (CUDA)	CUDA IPC memory handle
byte_size	size_t	Yes	Size of shared memory region in bytes
device_id	int	Yes (CUDA)	GPU device for CUDA shared memory

Outputs

Name	Type	Description
shm_mapped_addr	void*	Mapped address for data access
memory_type	TRITONSERVER_MemoryType	CPU or GPU memory type
device_id	int64_t	Device ID associated with the region, as returned by GetMemoryInfo
shm_status	JSON	Region metadata for status queries

Usage Examples

Register System Shared Memory

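#include <iostream>

#include "shared_memory_manager.h"

SharedMemoryManager smm;

// Register a POSIX shared memory region. This assumes the client has
// already created "/triton_shm_0" with at least 1 MB of backing storage.
TRITONSERVER_Error* err = smm.RegisterSystemSharedMemory(
    "input_region",   // name
    "/triton_shm_0",  // shm_key
    0,                // offset
    1024 * 1024       // byte_size (1 MB)
);

if (err == nullptr) {
  // Look up the mapped address of the first 1024 bytes of the region.
  void* addr = nullptr;
  TRITONSERVER_MemoryType mem_type = TRITONSERVER_MEMORY_CPU;
  int64_t dev_id = 0;
  err = smm.GetMemoryInfo("input_region", 0, 1024, &addr, &mem_type, &dev_id);
}

if (err != nullptr) {
  // Report the failure, then release the error object.
  std::cerr << TRITONSERVER_ErrorMessage(err) << std::endl;
  TRITONSERVER_ErrorDelete(err);
}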
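
The lookup in the example requests 1024 bytes at offset 0 of a 1 MB region, which fits comfortably. A caller that computes offsets dynamically may want to validate the range before issuing the lookup. The helper below is a hypothetical sketch of such a check, not a function from the Triton source; it is written so that the comparison cannot overflow `size_t`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns true when the byte range
// [offset, offset + byte_size) lies entirely inside a region of
// region_size bytes. The subtraction form avoids overflow that a naive
// "offset + byte_size <= region_size" comparison could hit.
bool RangeFitsRegion(
    std::size_t region_size, std::size_t offset, std::size_t byte_size)
{
  if (offset > region_size) {
    return false;
  }
  return byte_size <= region_size - offset;
}
```

With the values used in the example, `RangeFitsRegion(1024 * 1024, 0, 1024)` holds, whereas a request for 100 bytes at offset 1000 of a 1024-byte region would be rejected.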

Related Pages

Environment:Triton_inference_server_Server_GPU_CUDA_Runtime
