Implementation:Triton inference server Server MultiServerTest
| Knowledge Sources | |
|---|---|
| Domains | Concurrency, Testing |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Test executable that validates concurrent, multi-threaded inference through Triton's in-process C API.
Description
multi_server.cc creates an in-process Triton server, loads a model, and spawns multiple inference threads to validate thread safety. Each thread sends its own asynchronous inference requests, with buffers placed in a configurable memory type (system, pinned, or GPU), and waits on a promise/future pair for each response. The program then checks that every concurrent request produced the correct result.
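The promise/future handshake described above can be illustrated without any Triton headers. The following is a minimal sketch, not the code in multi_server.cc: `FakeResponse`, `FakeInferAsync`, `OnResponseComplete`, `RunOneRequest`, and `RunConcurrent` are hypothetical names, and the "inference" is a plain addition standing in for a real model. It assumes the callback receives the promise through its `userp` argument, in the same style as the `InferResponseComplete` signature listed on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for an inference response; not a Triton type.
struct FakeResponse {
  int32_t sum;
};

// Completion callback: `userp` carries the heap-allocated promise whose
// future the requesting thread is blocked on.
void OnResponseComplete(FakeResponse* response, void* userp) {
  auto* p = static_cast<std::promise<FakeResponse*>*>(userp);
  p->set_value(response);
  delete p;  // the future keeps the shared state alive
}

// Hypothetical asynchronous call: computes a + b on a separate thread and
// reports the result through the callback. The thread is returned so the
// caller can join it.
std::thread FakeInferAsync(int32_t a, int32_t b, void* userp) {
  return std::thread([a, b, userp] {
    OnResponseComplete(new FakeResponse{a + b}, userp);
  });
}

// One request from one worker: submit, block on the future, validate.
bool RunOneRequest(int32_t a, int32_t b) {
  auto* p = new std::promise<FakeResponse*>();
  std::future<FakeResponse*> completed = p->get_future();
  std::thread server = FakeInferAsync(a, b, p);
  FakeResponse* response = completed.get();  // waits for the callback
  const bool ok = (response->sum == a + b);
  delete response;
  server.join();
  return ok;
}

// Spawns `num_threads` workers and returns how many saw a wrong result.
int RunConcurrent(int num_threads) {
  assert(num_threads > 0);
  // One slot per thread; char avoids the bit-packing of std::vector<bool>.
  std::vector<char> ok(num_threads, 0);
  std::vector<std::thread> workers;
  for (int i = 0; i < num_threads; ++i) {
    workers.emplace_back([i, &ok] { ok[i] = RunOneRequest(i, 2 * i) ? 1 : 0; });
  }
  for (auto& t : workers) t.join();
  int failures = 0;
  for (char v : ok) {
    if (!v) ++failures;
  }
  return failures;
}
```

Calling `RunConcurrent(4)` from a `main` function returns 0 when every worker validated its response. Allocating the promise on the heap and deleting it inside the callback means the requesting thread never destroys a promise that the callback thread is still using.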
Usage
Used as a QA test executable to verify Triton's thread safety and concurrent inference capability. It is a standalone program, not a library to import or link against.
Code Reference
Source Location
- Repository: Triton Inference Server
- File: src/multi_server.cc
- Lines: 1-1001
Signature
// Custom allocator callbacks (similar to memory_alloc.cc)
TRITONSERVER_Error* ResponseAlloc(...);
TRITONSERVER_Error* ResponseRelease(...);
void InferResponseComplete(
TRITONSERVER_InferenceResponse* response,
const uint32_t flags, void* userp);
int main(int argc, char** argv);
Import
// Standalone executable - no import needed
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| argv[1] | string | Yes | Path to model repository |
| argv[2] | string | No | Memory type (system/pinned/gpu) |
Outputs
| Name | Type | Description |
|---|---|---|
| exit code | int | 0 on success, non-zero on failure |
| stdout | text | Per-thread validation results |
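The contract in the two tables above can be sketched as a small argument parser. This is an illustration under stated assumptions rather than the parsing code in multi_server.cc: `MemoryType`, `Options`, `ParseMemoryType`, `ParseArgs`, and `ExitCodeFor` are hypothetical names, the default of system memory when `argv[2]` is omitted is an assumption, and the failure code of 1 is one arbitrary choice of "non-zero".

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical enum for illustration; the real program may use a Triton type.
enum class MemoryType { kSystem, kPinned, kGpu };

// Maps the argv[2] spelling from the Inputs table to a memory type.
std::optional<MemoryType> ParseMemoryType(const std::string& name) {
  if (name == "system") return MemoryType::kSystem;
  if (name == "pinned") return MemoryType::kPinned;
  if (name == "gpu") return MemoryType::kGpu;
  return std::nullopt;
}

struct Options {
  std::string model_repository;  // argv[1], required
  MemoryType memory_type;        // argv[2], optional
};

// `args` holds argv[0..argc-1]. Returns nullopt on a usage error.
std::optional<Options> ParseArgs(const std::vector<std::string>& args) {
  if (args.size() < 2 || args.size() > 3) return std::nullopt;
  // Assumption: system memory is the default when argv[2] is omitted.
  Options opts{args[1], MemoryType::kSystem};
  if (args.size() == 3) {
    std::optional<MemoryType> parsed = ParseMemoryType(args[2]);
    if (!parsed) return std::nullopt;
    opts.memory_type = *parsed;
  }
  return opts;
}

// Exit code per the Outputs table: 0 on success, non-zero (here 1) otherwise.
int ExitCodeFor(const std::optional<Options>& opts) {
  return opts ? 0 : 1;
}
```

Inside `main`, the argument vector could be built with `std::vector<std::string>(argv, argv + argc)` and passed to `ParseArgs` before any server setup, so that a bad memory type fails fast.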
Usage Examples
Running Multi-threaded Inference Test
# Test with system memory
./multi_server /path/to/model_repository system
# Test with GPU memory
./multi_server /path/to/model_repository gpu