Principle:Triton inference server Server QA Shell Test Infrastructure

From Leeroopedia
Revision as of 17:27, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Triton_inference_server_Server_QA_Shell_Test_Infrastructure.md)

Overview

QA Shell Test Infrastructure is the set of common Bash utility functions that orchestrate nearly every QA test in the Triton Inference Server test suite. The util.sh library, which individual test scripts load with source ../common/util.sh, standardizes server lifecycle management (start, health check, stop), test result verification, process monitoring, error detection, and debugging support (GDB backtrace capture on hangs or segfaults). As a result, all QA tests follow consistent patterns for server interaction, failure reporting, and resource cleanup.

Theoretical Basis

Shell-based orchestration is a natural integration layer for testing a server such as Triton, which runs as a separate OS process and is exercised over network protocols. The inference validation logic is written in Python (using the Triton client libraries), but test lifecycle management (server startup, health polling, log capture, and process cleanup) fits shell scripts better, because they can directly invoke the server binary, manage PID files, and inspect process state.

The design of this infrastructure reflects several testing engineering principles:

Deterministic server lifecycle: The run_server function starts the Triton server binary with configurable arguments ($SERVER_ARGS), captures its PID in $SERVER_PID, and polls the health endpoint until the server reports ready or a timeout expires. This eliminates race conditions where test scripts attempt inference before the server has finished loading models. The wait_for_server_ready function polls http://${SERVER_IPADDR}:8000/v2/health/ready with a configurable timeout (default 120 seconds), checking both the HTTP response code and that the server process is still alive (via kill -0). If the server process exits before becoming ready, the function returns immediately rather than waiting for the full timeout.
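The polling behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual util.sh code: the function names probe_http_code and wait_until_ready are hypothetical, and only WAIT_RET, the kill -0 check, and the 120-second default come from the description.

```shell
# Illustrative sketch only; the real util.sh may differ in names and details.

# Print the HTTP status code returned by a URL ("000" if the connection fails).
probe_http_code() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$1"
}

# Usage: wait_until_ready <pid> <url> [timeout_secs]
# Sets WAIT_RET=0 once the URL answers 200; leaves WAIT_RET=1 if the
# process dies first or the timeout expires.
wait_until_ready() {
    local pid="$1" url="$2" timeout="${3:-120}" i=0 code
    WAIT_RET=1
    while [ "$i" -lt "$timeout" ]; do
        # kill -0 delivers no signal; it only tests whether the PID exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "process $pid exited before becoming ready" >&2
            return 0
        fi
        code="$(probe_http_code "$url")" || code="000"
        if [ "$code" = "200" ]; then
            WAIT_RET=0
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "timed out after ${timeout}s waiting for $url" >&2
    return 0
}
```

A caller would typically start the server in the background, pass $! as the PID, and inspect WAIT_RET afterwards to decide whether to run the test or report a startup failure.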

Server liveness vs. readiness distinction: The infrastructure distinguishes between server liveness (/v2/health/live, indicating the process is running and accepting connections) and server readiness (/v2/health/ready, indicating all models are loaded and inference is available). The wait_for_server_live function is used by tests that need to interact with the server's management API before models are loaded.
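To make the distinction concrete, the sketch below builds the two health URLs and probes one of them. The helper names health_url and check_health are hypothetical; the endpoint paths, port 8000, and SERVER_IPADDR are taken from the text.

```shell
# Illustrative sketch; helper names are hypothetical.

# Usage: health_url <live|ready>
# Prints the health endpoint URL for the requested check.
health_url() {
    case "$1" in
        live|ready)
            echo "http://${SERVER_IPADDR:-localhost}:8000/v2/health/$1"
            ;;
        *)
            echo "unknown health check: $1" >&2
            return 1
            ;;
    esac
}

# Usage: check_health <live|ready>
# Returns 0 only when the endpoint answers HTTP 200.
check_health() {
    local url code
    url="$(health_url "$1")" || return 1
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url")" || code="000"
    [ "$code" = "200" ]
}
```

A test that drives the management API before any model is loaded would gate on check_health live, while an inference test would gate on check_health ready.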

Model stability waiting: The wait_for_model_stable function polls the model repository index endpoint, counting models in transitional states (loading, unloading) and waiting until all models reach a terminal state (MODEL_READY or MODEL_UNAVAILABLE). This is critical for tests that exercise dynamic model loading, where a model load request returns immediately and the test must wait for the asynchronous load to complete.
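A rough sketch of this kind of wait loop is shown below. It assumes the repository index response is JSON in which each model carries a "state" field with values such as "LOADING" or "UNLOADING"; that encoding, and the function names, are assumptions made for illustration rather than details confirmed by the text.

```shell
# Illustrative sketch; names and the exact JSON encoding are assumptions.

# Read a repository index (JSON) on stdin and print how many models are
# still in a transitional state.
count_unstable_models() {
    { grep -oE '"state"[[:space:]]*:[[:space:]]*"(LOADING|UNLOADING)"' || true; } \
        | wc -l | tr -d ' '
}

# Fetch the model repository index from the server.
fetch_repository_index() {
    curl -s -X POST "http://${SERVER_IPADDR:-localhost}:8000/v2/repository/index"
}

# Usage: wait_until_models_stable [timeout_secs]
# Sets WAIT_RET=0 once no model is loading or unloading, else WAIT_RET=1.
wait_until_models_stable() {
    local timeout="${1:-30}" i=0 index pending
    WAIT_RET=1
    while [ "$i" -lt "$timeout" ]; do
        index="$(fetch_repository_index)" || index=""
        # An empty response means the server did not answer; keep waiting.
        if [ -n "$index" ]; then
            pending="$(printf '%s\n' "$index" | count_unstable_models)"
            if [ "$pending" -eq 0 ]; then
                WAIT_RET=0
                return 0
            fi
        fi
        sleep 1
        i=$((i + 1))
    done
    return 0
}
```

Treating an empty response as "not yet stable" avoids declaring success when the server is simply unreachable.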

File-based event waiting: The wait_for_file_str function monitors a file (typically a server log) for the appearance of a specific string, using tail -F to follow the file as it grows. This enables tests to synchronize on server-side events that are not exposed through the health API, such as specific log messages indicating cache initialization or model-specific state transitions.
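The idea of synchronizing on a log message can be sketched as below. Note that this sketch deliberately swaps in a simpler once-per-second grep polling loop instead of the tail -F approach described above, and the function name wait_for_string_in_file is hypothetical.

```shell
# Illustrative sketch using polling rather than tail -F; name is hypothetical.

# Usage: wait_for_string_in_file <file> <string> [timeout_secs]
# Returns 0 as soon as the fixed string appears in the file, 1 on timeout.
wait_for_string_in_file() {
    local file="$1" pattern="$2" timeout="${3:-10}" i=0
    while [ "$i" -lt "$timeout" ]; do
        # The file may not exist yet if the server has not started logging.
        if [ -f "$file" ] && grep -qF -- "$pattern" "$file"; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}
```

Polling re-reads the whole file on each pass, which is acceptable for short test logs; following the file with tail -F avoids that cost on large logs.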

Crash diagnostics: The gdb_helper function provides automated crash analysis. If the server process is still alive after a test timeout (indicating a hang), it attaches GDB to capture a full thread backtrace and generates a core dump for offline analysis. If core dump files from a segfault are found, it loads each one with GDB to produce backtrace logs. This transforms opaque CI failures ("test timed out") into actionable debugging information.
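The two diagnostic paths (attach to a hung process, or post-process core files) might look roughly like the sketch below. The function names, output file naming, and the GDB override variable are hypothetical; the gdb options used (-batch, -ex, -p, and the generate-core-file command) are standard GDB features.

```shell
# Illustrative sketch; function names and file naming are hypothetical.

# GDB can be overridden, for example to point at a specific debugger build.
GDB="${GDB:-gdb}"

# Usage: dump_hang_backtrace <pid> <output_log>
# Attach to a live (presumably hung) process, log all thread backtraces,
# and write a core file next to the log for offline analysis.
dump_hang_backtrace() {
    local pid="$1" out="$2"
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is not running; nothing to attach to" >&2
        return 1
    fi
    "$GDB" -batch -ex "thread apply all bt" \
        -ex "generate-core-file ${out}.core" -p "$pid" > "$out" 2>&1
}

# Usage: dump_core_backtraces <binary> <core_dir> <output_dir>
# Produce one backtrace log per core file found in <core_dir>.
dump_core_backtraces() {
    local binary="$1" core_dir="$2" out_dir="$3" core
    for core in "$core_dir"/core*; do
        [ -f "$core" ] || continue   # the glob matched nothing
        "$GDB" -batch -ex "thread apply all bt" "$binary" "$core" \
            > "$out_dir/$(basename "$core").bt.log" 2>&1 || true
    done
}
```

Running GDB in batch mode keeps the capture non-interactive, which matters in CI where no terminal is attached.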

Leak detection support: The run_server_leakcheck variant starts the server under Valgrind with massif (heap profiler) or memcheck (memory error detector) instrumentation, enabling memory-focused tests to analyze allocation patterns and detect leaks. The Valgrind arguments, output files, and maximum thread count are configurable via $LEAKCHECK_ARGS.
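As a hedged illustration of how such Valgrind arguments could be assembled, the sketch below builds an argument string per tool. The helper name leakcheck_args and the thread limit of 3000 are illustrative assumptions; the Valgrind options themselves (--tool, --massif-out-file, --leak-check, --log-file, --max-threads) are standard.

```shell
# Illustrative sketch; helper name and the thread limit are assumptions.

# Usage: leakcheck_args <massif|memcheck> <output_file>
# Prints a Valgrind argument string for the chosen tool.
leakcheck_args() {
    case "$1" in
        massif)
            echo "--tool=massif --massif-out-file=$2 --max-threads=3000"
            ;;
        memcheck)
            echo "--tool=memcheck --leak-check=full --log-file=$2 --max-threads=3000"
            ;;
        *)
            echo "unsupported tool: $1" >&2
            return 1
            ;;
    esac
}

# Example wiring (commented out because it needs Valgrind and a server binary):
#   LEAKCHECK_ARGS="$(leakcheck_args massif massif.out)"
#   valgrind $LEAKCHECK_ARGS tritonserver $SERVER_ARGS > "$SERVER_LOG" 2>&1 &
#   SERVER_PID=$!
```

Servers running under Valgrind start much more slowly, so a leak-check run generally needs a longer readiness timeout than a normal run.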

Test result verification: The check_test_results function reads a test result file produced by Python test scripts and verifies that the expected number of tests passed. This provides a second layer of validation beyond the Python test runner's exit code, catching cases where the test runner exits 0 but produces fewer passing tests than expected.
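The second-layer check can be sketched as follows. The result file format is an assumption made for illustration: a one-line JSON summary such as {"total": 5, "errors": 0, "failures": 0}. The function names are hypothetical as well.

```shell
# Illustrative sketch; the result file format and names are assumptions.

# Usage: json_int_field <field>   (JSON summary on stdin)
# Prints the integer value of the named field from the last matching line.
json_int_field() {
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" | tail -n 1
}

# Usage: verify_results <result_file> <expected_total>
# Returns 0 only if the file reports exactly <expected_total> tests with
# zero errors and zero failures.
verify_results() {
    local file="$1" expected="$2" total errors failures
    if [ ! -s "$file" ]; then
        echo "missing or empty result file: $file" >&2
        return 1
    fi
    total="$(json_int_field total < "$file")"
    errors="$(json_int_field errors < "$file")"
    failures="$(json_int_field failures < "$file")"
    if [ "${total:-missing}" != "$expected" ]; then
        echo "expected $expected tests, found ${total:-none}" >&2
        return 1
    fi
    if [ "${errors:-1}" != "0" ] || [ "${failures:-1}" != "0" ]; then
        echo "errors=${errors:-?} failures=${failures:-?}" >&2
        return 1
    fi
    return 0
}
```

Comparing the total against an expected count is what catches a runner that exits 0 after silently skipping or never discovering some tests.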

Configurable server address: The SERVER_IPADDR variable (defaulting to localhost, overridable via TRITONSERVER_IPADDR environment variable) allows the same test scripts to target both local and remote server instances, supporting distributed test execution topologies.
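The override pattern described here maps onto ordinary shell parameter expansion. A minimal sketch, in which the helper name resolve_server_ipaddr is hypothetical:

```shell
# Illustrative sketch; the helper name is hypothetical.

# Prefer TRITONSERVER_IPADDR when it is set and non-empty, else use localhost.
resolve_server_ipaddr() {
    echo "${TRITONSERVER_IPADDR:-localhost}"
}

SERVER_IPADDR="$(resolve_server_ipaddr)"
```

With this in place, running a test as TRITONSERVER_IPADDR=10.0.0.5 bash test.sh (an example address) would point every health and inference URL at the remote host without editing the script.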

Implementation Details

The util.sh script is located at qa/common/util.sh and is sourced (not executed) by test scripts, so its functions and variables are defined directly in the test script's shell environment. Key variables include SERVER_PID (set by run_server), WAIT_RET (set by the wait functions), and SERVER_LOG (the configurable log output path). The script toggles set +e and set -e around health-check curl commands so that a failed probe does not abort the script, while strict error checking stays in force for other operations.
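The errexit toggling can be illustrated with a small wrapper. This is a sketch of the general pattern, not code from util.sh; the names run_tolerant and LAST_RC are hypothetical.

```shell
# Illustrative sketch; run_tolerant and LAST_RC are hypothetical names.

# Usage: run_tolerant <command> [args...]
# Runs a command that is allowed to fail, stores its exit status in LAST_RC,
# and restores the caller's errexit setting afterwards.
run_tolerant() {
    local had_errexit=0
    case "$-" in
        *e*) had_errexit=1 ;;
    esac
    set +e
    "$@"
    LAST_RC=$?
    if [ "$had_errexit" -eq 1 ]; then
        set -e
    fi
    return 0
}
```

A health probe wrapped this way, for example run_tolerant curl -s localhost:8000/v2/health/ready, can fail repeatedly while the server is still starting without terminating a script that otherwise runs under set -e.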

Related Pages

Implementation:Triton_inference_server_Server_QaUtilSh Triton_inference_server_Server
