Implementation:Triton inference server Server MemoryAllocTest

Knowledge Sources	Triton Inference Server
Domains	Memory_Management, Testing
Last Updated	2026-02-13 17:00 GMT

Overview

Test executable for validating Triton's GPU and CPU memory allocation behavior during in-process inference.

Description

memory_alloc.cc is a standalone test executable that creates an in-process Triton server instance, loads a model, and runs inference requests with configurable input/output memory types (CPU, GPU, or specific GPU device). It implements custom response allocator callbacks for device memory allocation, validates outputs against expected values, and tests host policy configurations for multi-GPU setups.
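The output validation step can be pictured with a minimal sketch. It assumes the output tensor has already been copied into host memory and holds 32-bit integers. `FirstMismatch` is a hypothetical helper introduced here for illustration, not a function taken from memory_alloc.cc.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the index of the first element where the
// output differs from the expected values, or -1 when every element matches.
// Assumes `output` points to host-accessible memory holding at least
// expected.size() int32 values; a GPU output would need a copy to host first.
long FirstMismatch(const int32_t* output, const std::vector<int32_t>& expected)
{
  for (size_t i = 0; i < expected.size(); ++i) {
    if (output[i] != expected[i]) {
      return static_cast<long>(i);
    }
  }
  return -1;
}
```

Returning the mismatch index rather than a plain boolean lets a test report which element was wrong before it exits with a non-zero code.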

Usage

Used as a QA test executable to verify that Triton correctly allocates and transfers data across CPU and GPU memory, especially in multi-GPU environments. It is not a library and is not meant to be imported.

Code Reference

Source Location

Repository: Triton Inference Server
File: src/memory_alloc.cc
Lines: 1-968

Signature

// Custom allocator callbacks
TRITONSERVER_Error* ResponseAlloc(
    TRITONSERVER_ResponseAllocator* allocator,
    const char* tensor_name, size_t byte_size,
    TRITONSERVER_MemoryType preferred_memory_type,
    int64_t preferred_memory_type_id,
    void* userp, void** buffer,
    void** buffer_userp,
    TRITONSERVER_MemoryType* actual_memory_type,
    int64_t* actual_memory_type_id);

TRITONSERVER_Error* ResponseRelease(
    TRITONSERVER_ResponseAllocator* allocator,
    void* buffer, void* buffer_userp,
    size_t byte_size,
    TRITONSERVER_MemoryType memory_type,
    int64_t memory_type_id);

int main(int argc, char** argv);
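The allocation callback receives a preferred memory type and reports the placement it actually used through its `actual_memory_type` and `actual_memory_type_id` out-parameters. The sketch below illustrates that contract for a CPU-only fallback. It uses hypothetical stand-in types (`MemType`, `Allocation`) so that it compiles without the Triton headers, and it does not reproduce the device allocation code in memory_alloc.cc.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for TRITONSERVER_MemoryType so the sketch compiles
// without the Triton headers.
enum class MemType { kCpu, kCpuPinned, kGpu };

// What an allocation callback hands back: the buffer plus the memory type
// and device id that were actually used.
struct Allocation {
  void* buffer;
  MemType actual_type;
  int64_t actual_type_id;
};

// CPU-only sketch: whatever placement is preferred, allocate from the host
// heap and report CPU memory with id 0 as the actual placement.
Allocation AllocateCpuOnly(
    size_t byte_size, MemType preferred_type, int64_t preferred_type_id)
{
  (void)preferred_type;     // ignored in this CPU-only sketch
  (void)preferred_type_id;  // ignored in this CPU-only sketch
  Allocation a;
  a.buffer = (byte_size == 0) ? nullptr : std::malloc(byte_size);
  a.actual_type = MemType::kCpu;
  a.actual_type_id = 0;
  return a;
}

// Counterpart to the release callback: frees a buffer from AllocateCpuOnly.
void Release(const Allocation& a)
{
  std::free(a.buffer);  // freeing nullptr is a no-op
}
```

A GPU-capable variant would branch on the preferred type and allocate device memory instead, and would report the device id it used.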

Import

// Standalone executable - no import needed
// Build via CMakeLists.txt target

I/O Contract

Inputs

Name	Type	Required	Description
argv[1]	string	Yes	Path to model repository
argv[2]	string	Yes	Input memory type (system/pinned/gpu)
argv[3]	string	Yes	Output memory type (system/pinned/gpu)
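As a minimal sketch of how the memory type arguments in the table above might be interpreted, the helper below maps the three documented strings to an enum and rejects anything else. `MemoryKind` and `ParseMemoryKind` are hypothetical names chosen for illustration; the actual parsing logic in memory_alloc.cc may differ.

```cpp
#include <string>

// Hypothetical enum mirroring the three documented argument values.
enum class MemoryKind { kSystem, kPinned, kGpu };

// Returns true and sets *kind when `text` is "system", "pinned", or "gpu".
// Returns false and leaves *kind untouched for any other input.
bool ParseMemoryKind(const std::string& text, MemoryKind* kind)
{
  if (text == "system") {
    *kind = MemoryKind::kSystem;
    return true;
  }
  if (text == "pinned") {
    *kind = MemoryKind::kPinned;
    return true;
  }
  if (text == "gpu") {
    *kind = MemoryKind::kGpu;
    return true;
  }
  return false;
}
```

Rejecting unknown strings up front lets the executable fail with a non-zero exit code before any server instance is created.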

Outputs

Name	Type	Description
exit code	int	0 on success, non-zero on failure
stdout	text	Validation results and error messages

Usage Examples

Running Memory Allocation Test

# Test with GPU input and CPU output
./memory_alloc /path/to/model_repository gpu system

# Test with pinned memory
./memory_alloc /path/to/model_repository pinned pinned

Related Pages

Environment:Triton_inference_server_Server_GPU_CUDA_Runtime
