Implementation:Triton inference server Server HTTP Health Endpoint

Knowledge Sources	Triton Server KServe V2 Protocol
Domains	MLOps, Observability, HTTP_API
Last Updated	2026-02-13 17:00 GMT

Overview

Concrete HTTP endpoint handler for server liveness and readiness checks in Triton Inference Server.

Description

The HandleServerHealth method in HTTPAPIServer processes GET requests to the /v2/health/live and /v2/health/ready endpoints. The kind argument ("live" or "ready") is taken from the URL path and selects which TRITONSERVER C API function is called: TRITONSERVER_ServerIsLive or TRITONSERVER_ServerIsReady. The handler returns HTTP 200 when the selected check passes and HTTP 400 when it does not.

Usage

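Call these endpoints with curl or any HTTP client after starting Triton to verify that the server is up. Because the server may still be loading models right after startup, poll /v2/health/ready until it returns HTTP 200 rather than relying on a single request. In production deployments, use the same endpoints as Kubernetes liveness and readiness probes.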
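A startup check usually has to retry, since the readiness endpoint can return a non-200 status (or refuse the connection) while Triton is still initializing. The following is a minimal sketch of such a polling loop, not part of Triton itself. The helper names http_status and wait_for_triton, the default URL, and the retry defaults are all assumptions chosen for illustration.

```shell
# Hypothetical helper: print the HTTP status code returned for a URL.
# curl reports 000 when no HTTP response was received.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1" || true
}

# Hypothetical helper: poll a health URL until it returns 200.
# Usage: wait_for_triton [url] [max_attempts] [delay_seconds]
# Returns 0 once the endpoint answers 200, 1 if all attempts fail.
wait_for_triton() {
  url="${1:-http://localhost:8000/v2/health/ready}"
  max_attempts="${2:-30}"
  delay="${3:-2}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if [ "$(http_status "$url")" = "200" ]; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (assumes Triton listens on localhost:8000):
#   wait_for_triton && echo "Triton is ready"
```

Keeping the curl call in its own small function makes the loop easy to test or to point at a different HTTP client.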

Code Reference

Source Location

Repository: triton-inference-server/server
File: src/http_server.cc
Lines: L1355-1376 (HandleServerHealth implementation)

Signature

// src/http_server.cc:L1355
void
HTTPAPIServer::HandleServerHealth(evhtp_request_t* req, const std::string& kind)
{
    // kind == "live"  → TRITONSERVER_ServerIsLive(server_.get(), &ready)
    // kind == "ready" → TRITONSERVER_ServerIsReady(server_.get(), &ready)
    // Returns: HTTP 200 (ready=true) or HTTP 400 (ready=false)
}

# HTTP API
GET /v2/health/live     # Server liveness
GET /v2/health/ready    # Server readiness
GET /v2/models/<model_name>/ready  # Model-specific readiness

Import

# Client-side usage (no import needed, standard HTTP):
curl -v localhost:8000/v2/health/ready

I/O Contract

Inputs

Name	Type	Required	Description
kind	string (URL path)	Yes	"live" or "ready" (from URL path)
model_name	string (URL path)	No	Not an argument of HandleServerHealth; used only by the related model-specific readiness endpoint /v2/models/<model_name>/ready

Outputs

Name	Type	Description
HTTP status	int	200 (server is live/ready) or 400 (server is not live/not ready)
response body	empty	No body content for health endpoints
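Since the health endpoints return no body, a client can only act on the status code. The sketch below shows one way to turn that code into a readable label in a script. The function name describe_health_status and the label strings are hypothetical, and the mapping simply mirrors the 200/400 contract in the table above.

```shell
# Hypothetical helper: translate a health-endpoint status code into a label.
# 200 and 400 follow the I/O contract above; anything else is reported as-is.
describe_health_status() {
  case "$1" in
    200) echo "healthy" ;;
    400) echo "unhealthy" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Example (assumes Triton listens on localhost:8000):
#   code="$(curl -s -o /dev/null -w '%{http_code}' localhost:8000/v2/health/ready)"
#   describe_health_status "$code"
```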

Usage Examples

Basic Health Check

# Check server readiness
curl -v localhost:8000/v2/health/ready
# HTTP/1.1 200 OK  →  Server is ready

# Check server liveness
curl -v localhost:8000/v2/health/live
# HTTP/1.1 200 OK  →  Server process is running

Model-Specific Readiness

# Check if a specific model is loaded and ready
curl -v localhost:8000/v2/models/densenet_onnx/ready
# HTTP/1.1 200 OK  →  Model is ready for inference
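When a deployment serves several models, the per-model endpoint can be checked in a loop. This is a rough sketch under the assumption that each model is queried at /v2/models/<model_name>/ready and that only HTTP 200 counts as ready. The helper names http_status and models_ready are hypothetical.

```shell
# Hypothetical helper: print the HTTP status code returned for a URL.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1" || true
}

# Hypothetical helper: check readiness of each named model.
# Usage: models_ready <base_url> <model_name>...
# Prints one line per model; returns 1 if any model is not ready.
models_ready() {
  base_url="$1"
  shift
  rc=0
  for model in "$@"; do
    if [ "$(http_status "$base_url/v2/models/$model/ready")" = "200" ]; then
      echo "$model: ready"
    else
      echo "$model: not ready"
      rc=1
    fi
  done
  return "$rc"
}

# Example (assumes Triton listens on localhost:8000):
#   models_ready http://localhost:8000 densenet_onnx
```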

Kubernetes Probe Configuration

# In Kubernetes Deployment spec
livenessProbe:
  httpGet:
    path: /v2/health/live
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
readinessProbe:
  httpGet:
    path: /v2/health/ready
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10

Related Pages

Implements Principle

Principle:Triton_inference_server_Server_Health_Check_API

Requires Environment

Environment:Triton_inference_server_Server_GPU_CUDA_Runtime
