Implementation:Triton inference server Server Container Health Check
| Field | Value |
|---|---|
| Page Type | Implementation |
| Title | Container_Health_Check |
| Namespace | Triton_inference_server_Server |
| Workflow | Custom_Container_Build |
| Domains | Quality_Assurance, Container_Build |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Concrete Docker run and curl-based verification procedure for custom Triton containers.
Description
Container health check verification involves launching the custom-built Triton container with Docker, exposing the standard service ports, and using HTTP requests to confirm the server is healthy and responsive. The procedure validates binary integrity, library linking, backend loading, and endpoint binding in a single operational test.
The verification process follows these steps:
- Launch the container using docker run with port mappings for HTTP (8000), gRPC (8001), and metrics (8002)
- Wait for server startup by monitoring container logs or polling the health endpoint
- Query the health endpoint at /v2/health/ready to confirm the server is ready to accept inference requests
- Inspect server logs to verify all requested backends loaded without errors
- Optionally query the metrics endpoint at port 8002 to confirm Prometheus metrics are available
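The health endpoint follows the KServe V2 inference protocol specification, returning HTTP 200 when the server is ready and a non-200 error status when it is still initializing or unhealthy. Before the HTTP endpoint is bound, the request fails with a connection error instead of returning a status code.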
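As a rough illustration of how a script might interpret the result, the sketch below maps the value of curl's %{http_code} write-out variable to a coarse server state. The function name classify_health is hypothetical, and the sketch deliberately treats every non-200 response the same way rather than assuming one specific error code.

```shell
#!/bin/bash
# Hypothetical helper: map a curl %{http_code} value to a coarse server state.
classify_health() {
  case "$1" in
    200) echo "ready" ;;        # server reports it can accept inference requests
    000) echo "unreachable" ;;  # curl received no HTTP response (port not bound, container down)
    *)   echo "not-ready" ;;    # endpoint answered, but the server is not ready
  esac
}

# Example usage (assumes a container publishing port 8000 on localhost):
# code=$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v2/health/ready || true)
# classify_health "$code"
```

Separating "unreachable" from "not-ready" helps distinguish a container that never started from one that started but is still loading.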
Usage
Basic Health Check
# Launch the container (with --rm for automatic cleanup)
docker run --rm -d \
-p 8000:8000 \
-p 8001:8001 \
-p 8002:8002 \
--name triton-verify \
tritonserver \
tritonserver --model-repository=/models
# Wait for the server to start (poll health endpoint)
for i in $(seq 1 30); do
if curl -s -o /dev/null -w "%{http_code}" localhost:8000/v2/health/ready | grep -q "200"; then
echo "Server is ready"
break
fi
echo "Waiting for server... ($i/30)"
sleep 2
done
# Verify health endpoint
curl -v localhost:8000/v2/health/ready
# Check server metadata
curl -s localhost:8000/v2 | python3 -m json.tool
# Stop the verification container
docker stop triton-verify
GPU-Enabled Health Check
# Launch with GPU access
docker run --rm -d \
--gpus all \
-p 8000:8000 \
-p 8001:8001 \
-p 8002:8002 \
--name triton-verify \
tritonserver \
tritonserver --model-repository=/models
# Verify health
curl -v localhost:8000/v2/health/ready
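A readiness response alone does not show that the GPU was visible to the server. One possible extra check is to look for GPU metric families in the Prometheus output. The sketch below is a minimal helper, assuming GPU metrics are enabled and that the build exposes a family named nv_gpu_utilization. Both the helper name has_metric and the metric name are assumptions to adjust for your Triton version.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the Prometheus text on stdin contains a sample
# line for the given metric family, with or without labels.
has_metric() {
  local name="$1"
  # Matches `name{...} value` or `name value`; "# HELP" and "# TYPE" comment
  # lines do not match because they start with '#'.
  grep -Eq "^${name}(\{| )"
}

# Example usage (assumes the container publishes metrics port 8002):
# curl -s localhost:8002/metrics | has_metric nv_gpu_utilization && echo "GPU metrics present"
```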
Code Reference
Source Location
| File | Lines | Description |
|---|---|---|
| src/main.cc | L439-511 | main() -- Server entry point that initializes the server and starts all endpoints |
| src/main.cc | L224-300 | StartEndpoints() -- Initializes and starts HTTP, gRPC, and metrics endpoints |
| src/http_server.cc | L1355-1371 | HandleServerHealth() -- HTTP handler for the /v2/health/ready and /v2/health/live endpoints |
Signature
# Launch container
docker run --rm \
-p 8000:8000 \
-p 8001:8001 \
-p 8002:8002 \
<image> \
tritonserver --model-repository=<path>
# Verify health
curl -v localhost:8000/v2/health/ready
Import
No code imports required. This procedure uses:
- docker CLI for container management
- curl for HTTP health checks
- The tritonserver binary inside the container
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| Docker image | Container image | The custom-built Triton container to verify (e.g., tritonserver or custom name) |
| Model repository | Directory/Path | Path to a model repository (can be empty for basic health check; server starts in READY state with --model-control-mode=explicit) |
| Port mappings | Network config | Standard Triton ports: 8000 (HTTP), 8001 (gRPC), 8002 (metrics) |
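
The commands on this page pass --model-repository=/models but do not show how that path becomes visible inside the container. The following is a minimal sketch, assuming the repository is a host directory bind-mounted read-only at /models. The function name triton_run_cmd and the dry-run approach (printing the command rather than executing it) are illustrative choices, not part of the original procedure.

```shell
#!/bin/bash
# Hypothetical helper: print a docker run command that bind-mounts a host
# model repository at /models and publishes the three standard Triton ports.
# Assumes the host path contains no whitespace.
triton_run_cmd() {
  local image="$1" host_repo="$2"
  printf 'docker run --rm -d -p 8000:8000 -p 8001:8001 -p 8002:8002 '
  printf -- '-v %s:/models:ro ' "$host_repo"
  printf '%s tritonserver --model-repository=/models --model-control-mode=explicit\n' "$image"
}

# Example usage: print the command for review before running it.
# triton_run_cmd tritonserver "$PWD/model_repository"
```

Printing the command first makes it easy to review the mount and port flags in CI logs before the container is actually started.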
Outputs
| Output | Type | Description |
|---|---|---|
| HTTP 200 from health endpoint | HTTP response | Confirms the server started successfully and is ready to accept requests |
| gRPC endpoint active | Network service | gRPC server listening on port 8001 |
| Metrics endpoint active | Network service | Prometheus metrics available on port 8002 at /metrics |
| Server startup logs | stdout/stderr | Log output showing backend loading status and endpoint initialization |
Verification Endpoints
| Endpoint | Port | Path | Expected Response |
|---|---|---|---|
| HTTP Health (ready) | 8000 | /v2/health/ready | HTTP 200 when server is ready |
| HTTP Health (live) | 8000 | /v2/health/live | HTTP 200 when server process is alive |
| HTTP Server Metadata | 8000 | /v2 | JSON with server name, version, and extensions |
| Prometheus Metrics | 8002 | /metrics | Prometheus text format with server metrics |
| gRPC Health | 8001 | gRPC health check | gRPC SERVING status |
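
The HTTP rows of the table can be probed in one pass. The following is a sketch under the assumption that the container is reachable with the default port mappings. probe_endpoints is a hypothetical helper name, and the gRPC row is left out because plain curl cannot issue a gRPC health call.

```shell
#!/bin/bash
# Hypothetical helper: request each HTTP verification endpoint and print its status code.
# Returns non-zero if any endpoint does not answer with HTTP 200.
probe_endpoints() {
  local host="${1:-localhost}" failed=0 url code
  for url in \
    "$host:8000/v2/health/ready" \
    "$host:8000/v2/health/live" \
    "$host:8000/v2" \
    "$host:8002/metrics"; do
    # %{http_code} is 000 when no HTTP response was received at all.
    code=$(curl -s -o /dev/null -w "%{http_code}" "$url" || true)
    echo "$code $url"
    [ "$code" = "200" ] || failed=1
  done
  return "$failed"
}

# Example usage:
# probe_endpoints localhost || echo "at least one endpoint failed"
```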
Usage Examples
Example 1: Quick smoke test after compose build
# After compose.py completes
docker run --rm -d \
--gpus all \
-p 8000:8000 \
--name triton-test \
tritonserver \
tritonserver --model-repository=/models --model-control-mode=explicit
# Wait and check
sleep 10
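# The ready endpoint returns an empty body, so print the status code explicitly
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
# Expected output: 200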
docker stop triton-test
Example 2: Detailed verification with log inspection
# Launch in foreground to see logs
docker run --rm \
-p 8000:8000 \
-p 8001:8001 \
-p 8002:8002 \
tritonserver \
tritonserver --model-repository=/models --log-verbose=1
# In another terminal:
curl -v localhost:8000/v2/health/ready
curl -s localhost:8000/v2 | python3 -m json.tool
curl -s localhost:8002/metrics | head -20
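
Log inspection can also be scripted. The sketch below scans log text for the lines Triton typically prints when each service starts. The marker strings ("Started HTTPService", "Started GRPCInferenceService", "Started Metrics Service") are assumptions based on commonly seen Triton output and may differ in your build, and check_startup_logs is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read server logs on stdin and report which services announced startup.
# The marker strings are assumptions; adjust them to match your Triton version's log output.
check_startup_logs() {
  local logs missing=0 marker
  logs=$(cat)
  for marker in "Started HTTPService" "Started GRPCInferenceService" "Started Metrics Service"; do
    if printf '%s\n' "$logs" | grep -qF "$marker"; then
      echo "found: $marker"
    else
      echo "missing: $marker"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (assumes a container started detached with --name triton-verify):
# docker logs triton-verify 2>&1 | check_startup_logs
```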
Example 3: Automated CI verification script
#!/bin/bash
set -e
IMAGE="${1:-tritonserver}"
CONTAINER_NAME="triton-ci-verify"
# Launch container
docker run --rm -d \
-p 8000:8000 \
--name "$CONTAINER_NAME" \
"$IMAGE" \
tritonserver --model-repository=/models --model-control-mode=explicit
# Poll health endpoint with timeout
TIMEOUT=60
ELAPSED=0
while [ $ELAPSED -lt $TIMEOUT ]; do
# curl itself prints 000 when no HTTP response is received, so only suppress its exit code
STATUS=$(curl -s -o /dev/null -w "%{http_code}" localhost:8000/v2/health/ready 2>/dev/null || true)
if [ "$STATUS" = "200" ]; then
echo "PASS: Server is healthy"
docker stop "$CONTAINER_NAME"
exit 0
fi
sleep 2
ELAPSED=$((ELAPSED + 2))
done
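echo "FAIL: Server did not become healthy within ${TIMEOUT}s"
# With --rm the container is removed as soon as it exits, so these commands may fail;
# "|| true" keeps set -e from aborting before the explicit exit below
docker logs "$CONTAINER_NAME" || true
docker stop "$CONTAINER_NAME" || true
exit 1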