Implementation:Triton inference server Server HTTPServer
| Knowledge Sources | |
|---|---|
| Domains | HTTP, Inference_Serving |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Concrete tool for serving inference requests over HTTP/REST. It is built on libevhtp and implements the KFServing Predict V2 protocol together with Triton-specific extensions.
Description
The HTTPAPIServer class is the core HTTP serving component of Triton. It extends HTTPServer, which wraps libevhtp for event-driven HTTP, and implements the KFServing Predict V2 protocol along with Triton-specific extensions. Its key nested classes are InferRequestClass, which manages the lifecycle of an inference request; GenerateRequestClass, which handles streaming text generation; and TritonOutput, which handles output buffer allocation. HTTPMetricsServer, a separate subclass of HTTPServer, serves Prometheus metrics on its own port.
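To make the Predict V2 protocol mentioned above more concrete, the following minimal sketch builds the JSON body that a client would send to `POST /v2/models/<model>/infer`. The helper names (`JoinCsv`, `BuildInferRequestJson`) and the tensor name `INPUT0` are assumptions made for illustration; they are not part of Triton. The field names (`inputs`, `name`, `shape`, `datatype`, `data`) follow the published V2 REST protocol.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Join numeric values with commas, e.g. {1, 4} becomes "1,4".
template <typename T>
std::string JoinCsv(const std::vector<T>& values) {
  std::ostringstream out;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i != 0) out << ',';
    out << values[i];
  }
  return out.str();
}

// Hypothetical helper: builds a V2 inference request body with one FP32 input.
// The tensor name is inserted without JSON escaping, so it must not contain
// quotes or backslashes in this sketch.
std::string BuildInferRequestJson(
    const std::string& input_name, const std::vector<int64_t>& shape,
    const std::vector<float>& data) {
  std::ostringstream body;
  body << "{\"inputs\":[{\"name\":\"" << input_name << "\","
       << "\"shape\":[" << JoinCsv(shape) << "],"
       << "\"datatype\":\"FP32\","
       << "\"data\":[" << JoinCsv(data) << "]}]}";
  return body.str();
}
```

A real client would normally use a JSON library rather than string concatenation; the sketch only shows the shape of the payload that the server side has to parse.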
Usage
Triton's main startup sequence instantiates this class automatically when the HTTP endpoint is enabled; external users do not include it directly. The cloud platform servers (SageMaker, Vertex AI) extend HTTPAPIServer.
Code Reference
Source Location
- Repository: Triton Inference Server
- File: src/http_server.h
- Lines: 1-722
Signature
class HTTPServer {
 public:
  static TRITONSERVER_Error* CreateHTTPServer(
      const std::string& addr, int port, int thread_cnt,
      std::unique_ptr<HTTPServer>* server);
  TRITONSERVER_Error* Start();
  TRITONSERVER_Error* Stop();
  virtual ~HTTPServer();
};

class HTTPAPIServer : public HTTPServer {
 public:
  static TRITONSERVER_Error* Create(
      const std::shared_ptr<TRITONSERVER_Server>& server,
      triton::server::TraceManager* trace_manager,
      const std::shared_ptr<SharedMemoryManager>& smm,
      const TritonServerParameters& params,
      int32_t port, int thread_cnt,
      std::unique_ptr<HTTPServer>* http_server);

  // KFServing V2 handlers
  void HandleInfer(evhtp_request_t* req, ...);
  void HandleHealth(evhtp_request_t* req, ...);
  void HandleModelReady(evhtp_request_t* req, ...);
  void HandleMetadata(evhtp_request_t* req, ...);
  void HandleGenerate(evhtp_request_t* req, ...);
};
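As a rough illustration of how the handlers listed in the signature relate to request URLs, the sketch below classifies a method and path according to the public V2 REST paths and Triton's generate extension. The `Route` enum and `ClassifyRequest` function are hypothetical names introduced here, and `std::regex` is used only to keep the example self-contained; this is not a description of how Triton itself dispatches requests.

```cpp
#include <regex>
#include <string>

// Hypothetical route labels for illustration; not Triton identifiers.
enum class Route { kHealth, kModelReady, kMetadata, kInfer, kGenerate, kUnknown };

// Map an HTTP method and URI path to a route, following the V2 REST paths:
//   GET  /v2/health/live | /v2/health/ready
//   GET  /v2  and  GET /v2/models/<name>[/versions/<n>]
//   GET  /v2/models/<name>[/versions/<n>]/ready
//   POST /v2/models/<name>[/versions/<n>]/infer
//   POST /v2/models/<name>[/versions/<n>]/generate[_stream]
Route ClassifyRequest(const std::string& method, const std::string& path) {
  static const std::regex health(R"(^/v2/health/(live|ready)$)");
  static const std::regex model(
      R"(^/v2/models/([^/]+)(?:/versions/([0-9]+))?(?:/(ready|infer|generate|generate_stream))?$)");

  const bool is_get = (method == "GET");
  const bool is_post = (method == "POST");

  if (path == "/v2") {
    return is_get ? Route::kMetadata : Route::kUnknown;
  }
  std::smatch m;
  if (std::regex_match(path, m, health)) {
    return is_get ? Route::kHealth : Route::kUnknown;
  }
  if (!std::regex_match(path, m, model)) {
    return Route::kUnknown;
  }
  // m[3] holds the trailing action; it is empty for model metadata requests.
  const std::string action = m[3].str();
  if (action.empty()) return is_get ? Route::kMetadata : Route::kUnknown;
  if (action == "ready") return is_get ? Route::kModelReady : Route::kUnknown;
  if (action == "infer") return is_post ? Route::kInfer : Route::kUnknown;
  // Remaining actions are "generate" and "generate_stream".
  return is_post ? Route::kGenerate : Route::kUnknown;
}
```

Under this sketch, `ClassifyRequest("POST", "/v2/models/resnet/infer")` yields `Route::kInfer`, which corresponds to the request that HandleInfer serves; the model name `resnet` is only an example.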
Import
#include "http_server.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| server | std::shared_ptr<TRITONSERVER_Server> | Yes | Triton core server instance |
| trace_manager | triton::server::TraceManager* | No | Trace manager for request tracing |
| smm | std::shared_ptr<SharedMemoryManager> | No | Shared memory manager for zero-copy |
| params | TritonServerParameters | Yes | Server configuration parameters |
| port | int32_t | Yes | HTTP port to listen on |
| thread_cnt | int | Yes | Number of event loop threads |
Outputs
| Name | Type | Description |
|---|---|---|
| HTTP responses | evhtp_request_t | KFServing V2 JSON/binary responses |
| Metrics | HTTP | Prometheus metrics on metrics port |
| http_server | std::unique_ptr<HTTPServer>* | Out-parameter of Create that receives the constructed server |
Usage Examples
Server Startup (Internal)
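#include <iostream>
#include <memory>
#include "http_server.h"
// Create and start the HTTP API server. A null TRITONSERVER_Error* means success.
std::unique_ptr<HTTPServer> http_server;
TRITONSERVER_Error* err = HTTPAPIServer::Create(
    triton_server, trace_manager, shared_memory_manager,
    params, 8000 /* port */, 8 /* threads */,
    &http_server);
if (err == nullptr) {
  // Start() also returns an error object, so check it instead of discarding it.
  err = http_server->Start();
}
if (err != nullptr) {
  // Report the failure, then release the error object.
  std::cerr << "failed to start HTTP server: "
            << TRITONSERVER_ErrorMessage(err) << std::endl;
  TRITONSERVER_ErrorDelete(err);
}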
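The startup snippet relies on a calling convention in which a factory returns a null error pointer on success and fills an out-parameter. The sketch below reproduces that convention with stand-in types so it can be compiled without the Triton headers. `SketchError`, `SketchServer`, and `CreateAndStart` are hypothetical names, and the port and thread-count checks are assumptions made for the example rather than documented Triton behavior.

```cpp
#include <memory>
#include <string>

// Stand-in for TRITONSERVER_Error; hypothetical, for illustration only.
struct SketchError {
  std::string message;
};

// Stand-in for HTTPServer with the same Create/Start/Stop shape.
class SketchServer {
 public:
  // Returns nullptr on success and fills *server; otherwise returns an
  // error object and leaves *server untouched.
  static SketchError* Create(
      int port, int thread_cnt, std::unique_ptr<SketchServer>* server) {
    if (port <= 0 || port > 65535) return new SketchError{"invalid port"};
    if (thread_cnt <= 0) return new SketchError{"invalid thread count"};
    server->reset(new SketchServer(port, thread_cnt));
    return nullptr;
  }

  SketchError* Start() {
    if (running_) return new SketchError{"already running"};
    running_ = true;
    return nullptr;
  }

  SketchError* Stop() {
    if (!running_) return new SketchError{"not running"};
    running_ = false;
    return nullptr;
  }

  bool running() const { return running_; }
  int port() const { return port_; }
  int thread_cnt() const { return thread_cnt_; }

 private:
  SketchServer(int port, int thread_cnt)
      : port_(port), thread_cnt_(thread_cnt) {}
  int port_;
  int thread_cnt_;
  bool running_ = false;
};

// Create then start, returning the first error message ("" on success).
// The unique_ptr owns the error object so it is released on every path.
std::string CreateAndStart(
    int port, int thread_cnt, std::unique_ptr<SketchServer>* server) {
  std::unique_ptr<SketchError> err(SketchServer::Create(port, thread_cnt, server));
  if (err == nullptr) err.reset((*server)->Start());
  return err ? err->message : std::string();
}
```

Wrapping the returned error in an owning pointer is one way to avoid leaking it when several fallible calls are chained; with the real API the equivalent cleanup step is TRITONSERVER_ErrorDelete.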