

Implementation:Triton inference server Server HTTPServer

From Leeroopedia
Domains: HTTP, Inference_Serving
Last Updated: 2026-02-13 17:00 GMT

Overview

Concrete tool for serving inference requests over HTTP/REST. It is built on the libevhtp event-driven HTTP library and implements the KFServing Predict V2 protocol plus Triton-specific extensions.

Description

The HTTPAPIServer class is the core HTTP serving component of Triton. It extends HTTPServer, a base class that wraps libevhtp for event-driven HTTP handling, and implements the KFServing (now KServe) Predict V2 protocol along with Triton-specific extensions. Key nested classes include InferRequestClass, which manages the lifecycle of an inference request; GenerateRequestClass, which handles text generation, including streaming; and TritonOutput, used for output buffer allocation. A separate HTTPServer subclass, HTTPMetricsServer, serves Prometheus metrics on its own port.
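To make the V2 protocol surface concrete, the sketch below maps request paths to handler kinds for the endpoint families named above (health, readiness, metadata, infer, generate). It is an illustrative stand-in, not Triton's dispatch code: the Route enum and the SplitPath and Classify functions are hypothetical, HTTP methods are ignored, and of the V2 path grammar only the optional /versions/<v> segment is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical handler tags; the names are illustrative, not Triton's own.
enum class Route {
  kServerMetadata, kHealthLive, kHealthReady, kModelMetadata,
  kModelReady, kInfer, kGenerate, kGenerateStream, kNotFound
};

// Split "/v2/models/m/infer" into {"v2", "models", "m", "infer"}.
std::vector<std::string> SplitPath(const std::string& path) {
  std::vector<std::string> parts;
  std::string cur;
  for (char c : path) {
    if (c == '/') {
      if (!cur.empty()) parts.push_back(cur);
      cur.clear();
    } else {
      cur += c;
    }
  }
  if (!cur.empty()) parts.push_back(cur);
  return parts;
}

// Map a V2-style path to a handler kind (sketch; request method not checked).
Route Classify(const std::string& path) {
  const std::vector<std::string> p = SplitPath(path);
  if (p.empty() || p[0] != "v2") return Route::kNotFound;
  if (p.size() == 1) return Route::kServerMetadata;  // GET /v2
  if (p.size() == 3 && p[1] == "health") {
    if (p[2] == "live") return Route::kHealthLive;
    if (p[2] == "ready") return Route::kHealthReady;
    return Route::kNotFound;
  }
  if (p.size() < 3 || p[1] != "models") return Route::kNotFound;
  // p[2] is the model name; skip an optional "versions/<v>" pair.
  std::size_t i = 3;
  if (p.size() >= 5 && p[3] == "versions") i = 5;
  if (i == p.size()) return Route::kModelMetadata;
  if (i + 1 != p.size()) return Route::kNotFound;
  const std::string& action = p[i];
  if (action == "ready") return Route::kModelReady;
  if (action == "infer") return Route::kInfer;
  if (action == "generate") return Route::kGenerate;
  if (action == "generate_stream") return Route::kGenerateStream;
  return Route::kNotFound;
}
```

A real server also checks the method (GET for health and metadata, POST for infer and generate) before dispatching.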

Usage

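Instantiated automatically by Triton's main startup sequence when the HTTP endpoint is enabled (the tritonserver --allow-http option, on by default). External code does not use this class directly; clients reach it over HTTP. The cloud-platform servers for SageMaker and Vertex AI subclass HTTPAPIServer to serve each platform's own routes.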
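Because the cloud servers extend HTTPAPIServer, a natural shape is to override the request handler, serve the platform's routes, and delegate everything else to the parent. A minimal sketch of that pattern follows; BaseApiServer and CloudApiServer are hypothetical classes returning strings instead of writing HTTP responses, and only the /ping and /invocations paths come from the SageMaker hosting contract.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class standing in for HTTPAPIServer (not Triton's API).
class BaseApiServer {
 public:
  virtual ~BaseApiServer() = default;
  // Returns a label describing which handler served the path.
  virtual std::string Handle(const std::string& path) {
    if (path.rfind("/v2", 0) == 0) return "v2:" + path;  // path starts with /v2
    return "404";
  }
};

// Hypothetical platform server: adds its own routes, then falls back
// to the inherited V2 handling for every other path.
class CloudApiServer : public BaseApiServer {
 public:
  std::string Handle(const std::string& path) override {
    if (path == "/ping") return "cloud:health";
    if (path == "/invocations") return "cloud:infer";
    return BaseApiServer::Handle(path);
  }
};
```

Delegating to the base class keeps the standard V2 endpoints available on the platform server without duplicating their handlers.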

Code Reference

Source Location

Signature

// Abridged declarations; "..." marks handler parameters omitted on this page.
class HTTPServer {
 public:
  static TRITONSERVER_Error* CreateHTTPServer(
      const std::string& addr, int port, int thread_cnt,
      std::unique_ptr<HTTPServer>* server);
  TRITONSERVER_Error* Start();
  TRITONSERVER_Error* Stop();
  virtual ~HTTPServer();
};

class HTTPAPIServer : public HTTPServer {
 public:
  static TRITONSERVER_Error* Create(
      const std::shared_ptr<TRITONSERVER_Server>& server,
      triton::server::TraceManager* trace_manager,
      const std::shared_ptr<SharedMemoryManager>& smm,
      const TritonServerParameters& params,
      int32_t port, int thread_cnt,
      std::unique_ptr<HTTPServer>* http_server);
  // KFServing V2 handlers
  void HandleInfer(evhtp_request_t* req, ...);
  void HandleHealth(evhtp_request_t* req, ...);
  void HandleModelReady(evhtp_request_t* req, ...);
  void HandleMetadata(evhtp_request_t* req, ...);
  void HandleGenerate(evhtp_request_t* req, ...);
};
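Create follows the Triton C API convention of returning a TRITONSERVER_Error* that is nullptr on success, with the new object delivered through an output parameter. The sketch below reproduces that factory shape with hypothetical Error and Server types so it builds without Triton headers; the port and thread-count checks are illustrative assumptions, not Triton's actual validation.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for TRITONSERVER_Error; nullptr means success.
struct Error {
  std::string message;
};

class Server {
 public:
  // Factory: returns a caller-owned Error* on failure, or nullptr and
  // fills *server on success.
  static Error* Create(int port, int thread_cnt, std::unique_ptr<Server>* server) {
    if (server == nullptr) return new Error{"null output parameter"};
    if (port < 1 || port > 65535) return new Error{"invalid port"};
    if (thread_cnt < 1) return new Error{"thread_cnt must be positive"};
    server->reset(new Server(port, thread_cnt));
    return nullptr;
  }
  int port() const { return port_; }
  int thread_cnt() const { return thread_cnt_; }

 private:
  // Private constructor forces callers through Create().
  Server(int port, int thread_cnt) : port_(port), thread_cnt_(thread_cnt) {}
  int port_;
  int thread_cnt_;
};
```

Because the constructor is private, no Server object can exist without passing the factory's checks, and the error pointer tells the caller exactly why creation failed.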

Import

#include "http_server.h"

I/O Contract

Inputs

Name | Type | Required | Description
server | std::shared_ptr<TRITONSERVER_Server> | Yes | Triton core server instance
trace_manager | triton::server::TraceManager* | No | Trace manager for request tracing
smm | std::shared_ptr<SharedMemoryManager> | No | Shared memory manager, letting clients pass tensors through system or CUDA shared memory instead of the request body
params | TritonServerParameters | Yes | Server configuration parameters
port | int32_t | Yes | HTTP port to listen on
thread_cnt | int | Yes | Number of event loop threads

Outputs

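Name | Type | Description
return value | TRITONSERVER_Error* | nullptr on success; otherwise an error object the caller must delete
http_server | std::unique_ptr<HTTPServer>* | Receives the created server on success
HTTP responses | evhtp_request_t | KFServing V2 JSON or binary responses written to the request
Metrics | HTTP | Prometheus metrics, served by HTTPMetricsServer on the separate metrics port (8002 by default)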
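In the JSON form of the V2 protocol, each output tensor is reported with its name, datatype, shape, and flattened data. The sketch below serializes FP32 outputs into that layout. OutputTensor and ToV2Json are hypothetical names; the code assumes finite values and names that need no JSON escaping, and it omits optional response fields such as model_version, id, and parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical FP32 output tensor (row-major, flattened data).
struct OutputTensor {
  std::string name;
  std::vector<int64_t> shape;
  std::vector<float> data;
};

// Build a V2-style JSON inference response body for FP32 outputs.
std::string ToV2Json(const std::string& model_name,
                     const std::vector<OutputTensor>& outputs) {
  std::ostringstream os;
  os << "{\"model_name\":\"" << model_name << "\",\"outputs\":[";
  for (std::size_t i = 0; i < outputs.size(); ++i) {
    const OutputTensor& t = outputs[i];
    if (i > 0) os << ",";
    os << "{\"name\":\"" << t.name << "\",\"datatype\":\"FP32\",\"shape\":[";
    for (std::size_t j = 0; j < t.shape.size(); ++j) os << (j ? "," : "") << t.shape[j];
    os << "],\"data\":[";
    for (std::size_t j = 0; j < t.data.size(); ++j) os << (j ? "," : "") << t.data[j];
    os << "]}";
  }
  os << "]}";
  return os.str();
}
```

For a single output named out with shape [1, 2] and data {0.5, 2.0}, the body is {"model_name":"m","outputs":[{"name":"out","datatype":"FP32","shape":[1,2],"data":[0.5,2]}]}. A production server would also escape strings and pick the datatype per tensor.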

Usage Examples

Server Startup (Internal)

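#include <iostream>
#include <memory>

#include "http_server.h"

// Create and start the HTTP API server (8000 is Triton's default HTTP port)
std::unique_ptr<HTTPServer> http_server;
TRITONSERVER_Error* err = HTTPAPIServer::Create(
    triton_server, trace_manager, shared_memory_manager,
    params, 8000 /* port */, 8 /* threads */,
    &http_server);
if (err == nullptr) {
  err = http_server->Start();  // Start() also reports failure via an error object
}
if (err != nullptr) {
  // Error objects are owned by the caller: report, then free
  std::cerr << "HTTP server failed: " << TRITONSERVER_ErrorMessage(err) << std::endl;
  TRITONSERVER_ErrorDelete(err);
}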
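The startup example does not show shutdown. One way to make sure Stop() runs on every exit path is an RAII guard. The sketch below uses a hypothetical FakeServer whose Start() and Stop() return nullptr on success, mirroring the error-pointer convention so it runs without Triton; StopGuard is likewise hypothetical.

```cpp
#include <cassert>

// Hypothetical error type and server stand-in (not Triton's HTTPServer).
struct Error {};

class FakeServer {
 public:
  Error* Start() { running_ = true; return nullptr; }
  Error* Stop() { running_ = false; return nullptr; }
  bool running() const { return running_; }

 private:
  bool running_ = false;
};

// Stops the server when the guard leaves scope, so early returns or
// exceptions cannot leave the listener running.
class StopGuard {
 public:
  explicit StopGuard(FakeServer* server) : server_(server) {}
  ~StopGuard() {
    if (server_ == nullptr) return;
    Error* err = server_->Stop();
    delete err;  // a real caller would log first; deleting nullptr is a no-op
  }
  StopGuard(const StopGuard&) = delete;
  StopGuard& operator=(const StopGuard&) = delete;

 private:
  FakeServer* server_;
};
```

Deleting the copy operations prevents two guards from stopping the same server twice.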
