Implementation:Triton inference server Server HTTP Health Endpoint
| Knowledge Sources | |
|---|---|
| Domains | MLOps, Observability, HTTP_API |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Concrete HTTP endpoint handler for server liveness and readiness checks in Triton Inference Server.
Description
The HandleServerHealth method in HTTPAPIServer processes GET requests to the /v2/health/live and /v2/health/ready endpoints. It delegates to the TRITONSERVER C API functions TRITONSERVER_ServerIsLive and TRITONSERVER_ServerIsReady, respectively, and returns HTTP 200 when the check passes (server is live or ready) or HTTP 400 when it does not.
Usage
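Call these endpoints with curl or any HTTP client after starting Triton to verify that the server is up. The server may not be ready immediately after launch, so poll the readiness endpoint until it returns 200 rather than checking once. In production deployments, use the two endpoints as Kubernetes liveness and readiness probes.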
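For scripted startup checks, the readiness endpoint can be polled from code instead of curl. The following is a minimal sketch using only the Python standard library; the helper names `probe` and `wait_until_ready`, the default base URL, and the timeout values are illustrative assumptions, not part of Triton.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout_s: float = 2.0) -> bool:
    """Return True only if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # HTTPError (e.g. 400) is a subclass of URLError, so this covers
        # error statuses as well as refused connections and timeouts
        # while the server is still starting.
        return False


def wait_until_ready(base_url: str = "http://localhost:8000",
                     timeout_s: float = 60.0,
                     interval_s: float = 1.0) -> bool:
    """Poll /v2/health/ready until it returns 200 or timeout_s elapses.

    base_url, timeout_s and interval_s are hypothetical defaults chosen
    for illustration.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(f"{base_url}/v2/health/ready"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_ready()` returns `True` as soon as the endpoint answers 200 and `False` if the deadline passes first, so a launch script can fail fast instead of sending inference requests to a server that is not ready.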
Code Reference
Source Location
- Repository: triton-inference-server/server
- File: src/http_server.cc
- Lines: L1355-1376 (HandleServerHealth implementation)
Signature
// src/http_server.cc:L1355
void
HTTPAPIServer::HandleServerHealth(evhtp_request_t* req, const std::string& kind)
{
  // kind == "live" → TRITONSERVER_ServerIsLive(server_.get(), &ready)
  // kind == "ready" → TRITONSERVER_ServerIsReady(server_.get(), &ready)
  // Returns: HTTP 200 (ready=true) or HTTP 400 (ready=false)
}
# HTTP API
GET /v2/health/live # Server liveness
GET /v2/health/ready # Server readiness
GET /v2/models/<model_name>/ready # Model-specific readiness
Import
# Client-side usage (no import needed, standard HTTP):
curl -v localhost:8000/v2/health/ready
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| kind | string (URL path) | Yes | "live" or "ready" (from URL path) |
| model_name | string (URL path) | No | For model-specific readiness: /v2/models/<name>/ready |
Outputs
| Name | Type | Description |
|---|---|---|
| HTTP status | int | 200 (healthy/ready) or 400 (not ready) |
| response body | empty | No body content for health endpoints |
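The contract above can be expressed as a small client-side check that maps the status code to a boolean. This is a sketch under stated assumptions: `check_health`, `VALID_KINDS`, and the default base URL are hypothetical names introduced here, and raising on any status other than 200 or 400 is a design choice of this example, since the contract does not define other codes.

```python
import urllib.error
import urllib.request

# The two values of "kind" listed in the Inputs table.
VALID_KINDS = ("live", "ready")


def check_health(kind: str, base_url: str = "http://localhost:8000",
                 timeout_s: float = 2.0) -> bool:
    """Map the health endpoint's status code to a boolean.

    200 -> True, 400 -> False. Any other status raises RuntimeError
    (an assumption of this sketch, not documented server behavior).
    """
    if kind not in VALID_KINDS:
        raise ValueError(f"kind must be one of {VALID_KINDS}, got {kind!r}")
    url = f"{base_url}/v2/health/{kind}"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the status code is still available.
        status = err.code
    if status == 200:
        return True
    if status == 400:
        return False
    raise RuntimeError(f"unexpected status {status} from {url}")
```

Connection failures (for example, the server process is not running) propagate as `urllib.error.URLError`, which keeps "not ready" distinguishable from "unreachable".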
Usage Examples
Basic Health Check
# Check server readiness
curl -v localhost:8000/v2/health/ready
# HTTP/1.1 200 OK → Server is ready
# Check server liveness
curl -v localhost:8000/v2/health/live
# HTTP/1.1 200 OK → Server process is running
Model-Specific Readiness
# Check if a specific model is loaded and ready
curl -v localhost:8000/v2/models/densenet_onnx/ready
# HTTP/1.1 200 OK → Model is ready for inference
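The same check can be made programmatically before routing traffic to a model. The sketch below uses only the Python standard library; `model_ready` and its defaults are hypothetical names, and treating every HTTP error status as "not ready" is an assumption of this example rather than documented behavior.

```python
import urllib.error
import urllib.parse
import urllib.request


def model_ready(model_name: str, base_url: str = "http://localhost:8000",
                timeout_s: float = 2.0) -> bool:
    """Return True if GET /v2/models/<model_name>/ready answers HTTP 200."""
    if not model_name:
        raise ValueError("model_name must be non-empty")
    # Percent-encode the name so it stays a single URL path segment.
    name = urllib.parse.quote(model_name, safe="")
    url = f"{base_url}/v2/models/{name}/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # Assumption: any HTTP error status means the model is not ready.
        return False
```

For example, `model_ready("densenet_onnx")` issues the same request as the curl command above and returns a boolean instead of requiring the caller to parse the status line.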
Kubernetes Probe Configuration
# In Kubernetes Deployment spec
livenessProbe:
  httpGet:
    path: /v2/health/live
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
readinessProbe:
  httpGet:
    path: /v2/health/ready
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10