Implementation:Triton inference server Server HTTPServer

Knowledge Sources	Triton Inference Server Triton HTTP/REST API
Domains	HTTP, Inference_Serving
Last Updated	2026-02-13 17:00 GMT

Overview

Concrete tool for serving inference requests over HTTP/REST, built on libevhtp. It implements the KFServing Predict V2 protocol (now maintained as the KServe V2 inference protocol) together with Triton-specific extensions.

Description

The HTTPAPIServer class is the core HTTP serving component of Triton. It extends HTTPServer, which wraps libevhtp for event-driven HTTP handling, and implements the KFServing Predict V2 protocol along with Triton-specific extensions. Its key nested classes are InferRequestClass, which manages the lifecycle of an inference request; GenerateRequestClass, which handles streaming text generation; and TritonOutput, which handles output buffer allocation. A separate HTTPServer subclass, HTTPMetricsServer, serves Prometheus metrics on its own port.
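As a rough illustration of what a client sends to this server, the sketch below builds the URI and JSON body of a V2 inference request. The helper names InferPath and InferBody are hypothetical and not part of Triton; the sketch assumes a single INT32 input tensor and does not escape special characters in the tensor name.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: URI of a V2 inference request for a model,
// optionally pinned to a specific version.
std::string InferPath(const std::string& model,
                      const std::string& version = "") {
  assert(!model.empty());
  std::string path = "/v2/models/" + model;
  if (!version.empty()) path += "/versions/" + version;
  return path + "/infer";
}

// Hypothetical helper: JSON body carrying one INT32 input tensor.
// Assumes input_name needs no JSON escaping.
std::string InferBody(const std::string& input_name,
                      const std::vector<int64_t>& shape,
                      const std::vector<int32_t>& data) {
  std::ostringstream os;
  os << "{\"inputs\":[{\"name\":\"" << input_name << "\",\"shape\":[";
  for (size_t i = 0; i < shape.size(); ++i) os << (i ? "," : "") << shape[i];
  os << "],\"datatype\":\"INT32\",\"data\":[";
  for (size_t i = 0; i < data.size(); ++i) os << (i ? "," : "") << data[i];
  os << "]}]}";
  return os.str();
}
```

A client would POST the string returned by InferBody to the path returned by InferPath on the server's HTTP port. A production client would normally use a JSON library rather than string concatenation.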

Usage

Triton's main startup sequence instantiates this server automatically when HTTP endpoints are enabled. External users do not include or construct it directly. The cloud platform servers (SageMaker, Vertex AI) extend this class.

Code Reference

Source Location

Repository: Triton Inference Server
File: src/http_server.h
Lines: 1-722

Signature

class HTTPServer {
 public:
  static TRITONSERVER_Error* CreateHTTPServer(
      const std::string& addr, int port, int thread_cnt,
      std::unique_ptr<HTTPServer>* server);
  TRITONSERVER_Error* Start();
  TRITONSERVER_Error* Stop();
  virtual ~HTTPServer();
};

class HTTPAPIServer : public HTTPServer {
 public:
  static TRITONSERVER_Error* Create(
      const std::shared_ptr<TRITONSERVER_Server>& server,
      triton::server::TraceManager* trace_manager,
      const std::shared_ptr<SharedMemoryManager>& smm,
      const TritonServerParameters& params,
      int32_t port, int thread_cnt,
      std::unique_ptr<HTTPServer>* http_server);
  // KFServing V2 handlers
  void HandleInfer(evhtp_request_t* req, ...);
  void HandleHealth(evhtp_request_t* req, ...);
  void HandleModelReady(evhtp_request_t* req, ...);
  void HandleMetadata(evhtp_request_t* req, ...);
  void HandleGenerate(evhtp_request_t* req, ...);
};
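
To make the handler list above more concrete, the following sketch shows one way V2 request URIs could be mapped onto those handlers. It is illustrative only: the Route enum and MatchRoute function are hypothetical, std::regex is used here for self-containment, and the real server's matching logic may differ.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical route tags, one per handler named in the signature above.
enum class Route { kHealth, kModelReady, kMetadata, kInfer, kGenerate, kUnknown };

// Hypothetical dispatcher: classify a V2 URI by the handler that would serve it.
Route MatchRoute(const std::string& uri) {
  static const std::regex health(R"(/v2/health/(live|ready))");
  // Groups: 1 = model name, 3 = optional version, 5 = optional action.
  static const std::regex model(
      R"(/v2/models/([^/]+)(/versions/([0-9]+))?(/(ready|infer|generate|generate_stream))?)");

  if (uri == "/v2") return Route::kMetadata;  // server metadata
  if (std::regex_match(uri, health)) return Route::kHealth;

  std::smatch m;
  if (std::regex_match(uri, m, model)) {
    const std::string action = m[5].str();
    if (action.empty()) return Route::kMetadata;  // model metadata
    if (action == "ready") return Route::kModelReady;
    if (action == "infer") return Route::kInfer;
    return Route::kGenerate;  // generate or generate_stream
  }
  return Route::kUnknown;
}
```

Under these assumptions, "/v2/models/m/versions/1/infer" would be routed to HandleInfer and "/v2/health/ready" to HandleHealth, while a path outside the /v2 namespace falls through to kUnknown.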

Import

#include "http_server.h"

I/O Contract

Inputs

Name	Type	Required	Description
server	std::shared_ptr<TRITONSERVER_Server>	Yes	Triton core server instance
trace_manager	TraceManager*	No	Trace manager for request tracing
smm	std::shared_ptr<SharedMemoryManager>	No	Shared memory manager for zero-copy tensor transfer
params	TritonServerParameters	Yes	Server configuration parameters
port	int32_t	Yes	HTTP port to listen on
thread_cnt	int	Yes	Number of event loop threads

Outputs

Name	Type	Description
HTTP responses	evhtp_request_t	KFServing V2 JSON/binary responses
Metrics	HTTP	Prometheus metrics on metrics port

Usage Examples

Server Startup (Internal)

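#include <iostream>
#include <memory>

#include "http_server.h"

// Create the HTTP API server. triton_server, trace_manager,
// shared_memory_manager, and params are assumed to be initialized earlier.
std::unique_ptr<HTTPServer> http_server;
TRITONSERVER_Error* err = HTTPAPIServer::Create(
    triton_server, trace_manager, shared_memory_manager,
    params, 8000 /* port */, 8 /* threads */,
    &http_server);
// Start only if creation succeeded; Start() also returns an error to check.
if (err == nullptr) {
  err = http_server->Start();
}
// Report the failure and release the error object, which the caller owns.
if (err != nullptr) {
  std::cerr << "HTTP server failed: " << TRITONSERVER_ErrorMessage(err)
            << std::endl;
  TRITONSERVER_ErrorDelete(err);
}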
