Implementation:Triton inference server Server SharedMemoryManager

Revision as of 13:59, 16 February 2026 by Admin (auto-imported from implementations/Triton_inference_server_Server_SharedMemoryManager.md)
Knowledge Sources
Domains: Memory_Management, IPC
Last Updated: 2026-02-13 17:00 GMT

Overview

Concrete tool for managing registration and access to system (POSIX) and CUDA shared memory regions for zero-copy inference data transfer.

Description

The SharedMemoryManager class provides thread-safe management of named shared memory regions used for zero-copy tensor data transfer between client processes and Triton. It supports both POSIX shared memory (via shm_open/mmap) and CUDA IPC memory handles. The manager maintains a registry of SharedMemoryInfo and CUDASharedMemoryInfo structs, with reference-counted access tracking that prevents premature unregistration while inference requests are in flight.
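The reference-counting idea can be illustrated with a small standalone registry. This is a minimal sketch that assumes the counting is done with `std::shared_ptr`, as the `shared_memory_map_` member in the signature suggests, and that unregistration is refused while any holder besides the map remains. `RegionInfo` and `RegionRegistry` are hypothetical names, not Triton types, and no memory is actually mapped.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Hypothetical stand-in for SharedMemoryInfo: only the fields needed here.
struct RegionInfo {
  std::string name;
  std::string shm_key;
  size_t offset;
  size_t byte_size;
};

// Hypothetical thread-safe registry keyed by region name. Callers receive a
// shared_ptr copy, so use_count() reveals whether a request still holds it.
class RegionRegistry {
 public:
  // Returns false if a region with this name is already registered.
  bool Register(const std::string& name, const std::string& shm_key,
                size_t offset, size_t byte_size) {
    std::lock_guard<std::mutex> lock(mu_);
    if (regions_.count(name) != 0) return false;
    regions_[name] = std::make_shared<RegionInfo>(
        RegionInfo{name, shm_key, offset, byte_size});
    return true;
  }

  // Hands out a reference that a request keeps for its whole lifetime.
  // Returns nullptr for an unknown name.
  std::shared_ptr<const RegionInfo> Acquire(const std::string& name) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end()) return nullptr;
    return it->second;
  }

  // Returns false for an unknown name, or if any holder besides the map
  // still references the region (i.e. a request is still in flight).
  bool Unregister(const std::string& name) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = regions_.find(name);
    if (it == regions_.end()) return false;
    if (it->second.use_count() > 1) return false;
    regions_.erase(it);
    return true;
  }

 private:
  std::mutex mu_;
  std::map<std::string, std::shared_ptr<RegionInfo>> regions_;
};
```

A request would call `Acquire` when it starts and drop the pointer when it finishes; only then does `Unregister` succeed. Because `Acquire` and `Unregister` both take `mu_`, no new reference can appear between the `use_count()` check and the `erase`.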

Usage

Used internally by Triton's HTTP and gRPC endpoints when clients register shared memory regions for inference input/output. Clients use the shared memory extension API to register regions, then reference them in inference requests.
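Before a region can be registered, some client process has to create the POSIX object that `shm_key` names. The following is a minimal client-side sketch, assuming Linux and the standard `shm_open`/`ftruncate`/`mmap` calls. `CreateSystemRegion` and `DestroySystemRegion` are hypothetical helpers, not part of Triton or its client libraries, and the registration request itself is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <fcntl.h>     // O_CREAT, O_RDWR
#include <string>
#include <sys/mman.h>  // shm_open, mmap, munmap, shm_unlink
#include <unistd.h>    // ftruncate, close, off_t

// Hypothetical helper: create (or open) the POSIX shared memory object named
// `shm_key`, size it to `byte_size`, and map it read/write.
// Returns nullptr on failure.
void* CreateSystemRegion(const std::string& shm_key, size_t byte_size) {
  int fd = shm_open(shm_key.c_str(), O_CREAT | O_RDWR, 0600);
  if (fd == -1) return nullptr;
  if (ftruncate(fd, static_cast<off_t>(byte_size)) == -1) {
    close(fd);
    return nullptr;
  }
  void* addr =
      mmap(nullptr, byte_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  // the mapping stays valid after the descriptor is closed
  return addr == MAP_FAILED ? nullptr : addr;
}

// Hypothetical helper: unmap this process's view and remove the object name.
void DestroySystemRegion(const std::string& shm_key, void* addr,
                         size_t byte_size) {
  if (addr != nullptr) munmap(addr, byte_size);
  shm_unlink(shm_key.c_str());
}
```

Opening the same key a second time maps the same object, so bytes written through one mapping are visible through the other; a server mapping the region by its key would observe the client's input data the same way.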

Code Reference

Source Location

Signature

class SharedMemoryManager {
 public:
  SharedMemoryManager() = default;
  ~SharedMemoryManager();

  // System shared memory
  TRITONSERVER_Error* RegisterSystemSharedMemory(
      const std::string& name, const std::string& shm_key,
      size_t offset, size_t byte_size);

  // CUDA shared memory
  TRITONSERVER_Error* RegisterCUDASharedMemory(
      const std::string& name, const cudaIpcMemHandle_t* cuda_shm_handle,
      size_t byte_size, int device_id);

  // Query
  TRITONSERVER_Error* GetMemoryInfo(
      const std::string& name, size_t offset, size_t byte_size,
      void** shm_mapped_addr,
      TRITONSERVER_MemoryType* memory_type, int64_t* device_id);

  // Unregister
  TRITONSERVER_Error* Unregister(const std::string& name);
  TRITONSERVER_Error* UnregisterAll();

  // Status
  TRITONSERVER_Error* GetStatus(
      const std::string& name, triton::common::TritonJson::Value* shm_status);

 private:
  struct SharedMemoryInfo { /* name, key, offset, size, mapped_addr */ };
  struct CUDASharedMemoryInfo : SharedMemoryInfo { /* device_id, cuda_ipc_handle */ };
  std::mutex mu_;
  std::map<std::string, std::shared_ptr<SharedMemoryInfo>> shared_memory_map_;
};

Import

#include "shared_memory_manager.h"

I/O Contract

Inputs

Name | Type | Required | Description
name | string | Yes | Unique name for the shared memory region
shm_key | string | Yes (system) | POSIX shared memory key
offset | size_t | Yes (system) | Byte offset within the POSIX shared memory object at which the region starts
cuda_shm_handle | cudaIpcMemHandle_t | Yes (CUDA) | CUDA IPC memory handle
byte_size | size_t | Yes | Size of the shared memory region in bytes
device_id | int | Yes (CUDA) | GPU device for CUDA shared memory

Outputs

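Name | Type | Description
shm_mapped_addr | void* | Mapped address for data access
memory_type | TRITONSERVER_MemoryType | Memory type of the region: CPU (system) or GPU (CUDA)
device_id | int64_t | Device ID of the region (the GPU ID for CUDA shared memory)
shm_status | JSON | Region metadata for status queries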
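`GetMemoryInfo` takes its own `offset` and `byte_size`, so a lookup has to resolve a window inside the registered region before returning `shm_mapped_addr`. A minimal sketch of that resolution with a bounds check, assuming `base` already points at the first byte of the registered region; `MappedRegion` and `ResolveAddress` are hypothetical names, and this is an illustration of the idea rather than Triton's exact logic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical view of a registered region as seen by the server process.
struct MappedRegion {
  void* base;        // first byte of the registered region (assumed)
  size_t byte_size;  // registered size of the region in bytes
};

// Resolve a request's [offset, offset + byte_size) window to a pointer, or
// return nullptr if the window does not fit inside the region. The check is
// written so that offset + byte_size is never computed and cannot overflow.
void* ResolveAddress(const MappedRegion& r, size_t offset, size_t byte_size) {
  if (offset > r.byte_size || byte_size > r.byte_size - offset) {
    return nullptr;
  }
  return static_cast<uint8_t*>(r.base) + offset;
}
```

Rejecting out-of-range windows at lookup time keeps a malformed request from reading or writing past the end of the client's mapping.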

Usage Examples

Register System Shared Memory

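#include <cstdint>
#include "shared_memory_manager.h"

// Registers a POSIX region that a client has already created under the key
// "/triton_shm_0", then looks up the address of its first 1 KB.
// Returns nullptr on success; the caller owns any returned error.
TRITONSERVER_Error* RegisterAndMap(SharedMemoryManager& smm, void** addr) {
  // Register a POSIX shared memory region
  TRITONSERVER_Error* err = smm.RegisterSystemSharedMemory(
      "input_region",    // name
      "/triton_shm_0",   // shm_key
      0,                 // offset
      1024 * 1024        // byte_size (1 MB)
  );
  if (err != nullptr) {
    return err;  // registration failed; do not query the region
  }

  // Get mapped address for data transfer
  TRITONSERVER_MemoryType mem_type;
  int64_t dev_id;
  return smm.GetMemoryInfo(
      "input_region", 0 /* offset */, 1024 /* byte_size */, addr, &mem_type,
      &dev_id);
}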
