Implementation:Triton inference server Server Tritonserver CLI

From Leeroopedia
Knowledge Sources
Domains: MLOps, Model_Serving, CLI
Last Updated: 2026-02-13 17:00 GMT

Overview

The tritonserver command-line interface launches Triton Inference Server: it parses options, loads models, and starts the HTTP, gRPC, and metrics endpoints.

Description

The tritonserver binary is the main entry point for Triton Inference Server. It parses command-line arguments via TritonParser::Parse(), builds server options via BuildTritonServerOptions(), creates the server instance via TRITONSERVER_ServerNew(), and starts network endpoints via StartEndpoints(). The binary is typically run inside an NVIDIA NGC container.

Usage

Use this command to start Triton Inference Server for development, testing, or production. It is the standard way to launch the server in all three cases. To embed the server in another process, use the TRITONSERVER C API directly instead.

Code Reference

Source Location

  • Repository: triton-inference-server/server
  • File: src/main.cc
  • Lines: L439-511 (main function), L224-300 (StartEndpoints)
  • File: src/command_line_parser.cc
  • Lines: L400-405 (--model-repository option), L1017-1036 (BuildTritonServerOptions)

Signature

tritonserver --model-repository=<path>
    [--http-port=<int>]               # Default: 8000
    [--grpc-port=<int>]               # Default: 8001
    [--metrics-port=<int>]            # Default: 8002
    [--model-control-mode=<string>]   # none|poll|explicit (default: none)
    [--log-verbose=<int>]             # Verbosity level (default: 0)
    [--exit-on-error=<bool>]          # Exit on init error (default: true)
    [--exit-timeout-secs=<int>]       # Graceful shutdown timeout
    [--strict-readiness=<bool>]       # Strict health check (default: true)
    [--disable-auto-complete-config]  # Require explicit config.pbtxt
    [--backend-config=<string>]       # Backend-specific config

// Internal C API call sequence (src/main.cc:L488-491)
TRITONSERVER_Server* server_ptr = nullptr;
TRITONSERVER_ServerNew(&server_ptr, triton_options.get());
// Followed by StartEndpoints() at L509

Import

# Binary available in NGC container:
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 \
    nvcr.io/nvidia/tritonserver:<version>-py3 \
    tritonserver --model-repository=/models

I/O Contract

Inputs

Name | Type | Required | Description
--model-repository | string (path) | Yes | Path to model repository (local, gs://, s3://, as://). Repeatable.
--http-port | int | No | HTTP endpoint port (default: 8000)
--grpc-port | int | No | gRPC endpoint port (default: 8001)
--metrics-port | int | No | Prometheus metrics port (default: 8002)
--model-control-mode | string | No | Model loading strategy: none, poll, explicit (default: none)
--log-verbose | int | No | Logging verbosity level (default: 0)
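
The defaults in the table above can be made explicit in a small launch wrapper. The following is a minimal sketch, not part of Triton: the function name `triton_args` and the `TRITON_*` environment variable names are assumptions made for illustration. It only prints the command line it would use, one argument per line, so the result can be reviewed before anything is started.

```shell
# Hypothetical wrapper (not part of Triton): print the tritonserver command
# implied by a few environment variables, one argument per line (dry run).
# The TRITON_* variable names are assumptions made for this sketch.
triton_args() {
    # --model-repository is the only required option.
    if [ -z "${TRITON_MODEL_REPOSITORY:-}" ]; then
        echo "TRITON_MODEL_REPOSITORY is required" >&2
        return 1
    fi
    # Reject values outside the documented set none|poll|explicit.
    case "${TRITON_MODEL_CONTROL_MODE:-none}" in
        none|poll|explicit) ;;
        *)
            echo "invalid model control mode: ${TRITON_MODEL_CONTROL_MODE}" >&2
            return 1
            ;;
    esac
    # Unset variables fall back to the defaults listed in the table.
    printf '%s\n' \
        "tritonserver" \
        "--model-repository=${TRITON_MODEL_REPOSITORY}" \
        "--http-port=${TRITON_HTTP_PORT:-8000}" \
        "--grpc-port=${TRITON_GRPC_PORT:-8001}" \
        "--metrics-port=${TRITON_METRICS_PORT:-8002}" \
        "--model-control-mode=${TRITON_MODEL_CONTROL_MODE:-none}" \
        "--log-verbose=${TRITON_LOG_VERBOSE:-0}"
}
```

With `TRITON_MODEL_REPOSITORY=/models` set and nothing else, the sketch prints the default ports 8000, 8001, and 8002 and `--model-control-mode=none`.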

Outputs

Name | Type | Description
HTTP endpoint | TCP socket | KServe v2 REST API on configured port
gRPC endpoint | TCP socket | KServe v2 gRPC API on configured port
Metrics endpoint | TCP socket | Prometheus metrics on configured port
Loaded models | server state | With the default --model-control-mode=none, all models in the repository are loaded at startup; each model that loads successfully is in READY state
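
Whether the server has reached the state described above can be checked from a shell. The sketch below polls the KServe v2 readiness route (`/v2/health/ready`) on the HTTP port until it succeeds or a retry budget runs out. The function name `wait_for_triton`, its argument order, and the one-second retry interval are illustrative assumptions, not part of the tritonserver CLI. It also assumes `curl` is installed and that the HTTP port is reachable from where the script runs.

```shell
# Hypothetical helper (not part of Triton): poll the HTTP readiness route
# until the server reports ready or the retry budget is exhausted.
# Usage: wait_for_triton [host] [http_port] [max_attempts]
wait_for_triton() {
    local host="${1:-localhost}"
    local port="${2:-8000}"      # matches the --http-port default
    local attempts="${3:-30}"
    local i=1
    while [ "${i}" -le "${attempts}" ]; do
        # -f turns an HTTP error status into a non-zero exit code;
        # a refused connection also makes curl fail.
        if curl -sf -o /dev/null "http://${host}:${port}/v2/health/ready"; then
            echo "ready after ${i} attempt(s)"
            return 0
        fi
        # Pause between attempts, but not after the final one.
        if [ "${i}" -lt "${attempts}" ]; then
            sleep 1
        fi
        i=$((i + 1))
    done
    echo "not ready after ${attempts} attempt(s)" >&2
    return 1
}
```

A typical call is `wait_for_triton localhost 8000 60` right after starting the container, before sending the first inference request.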

Usage Examples

Basic Server Launch

# Launch with a local model repository
tritonserver --model-repository=/opt/triton/models

# Expected output (timestamps and source file:line values vary by version and run):
# I0213 17:00:00.000000 1 server.cc:592] Started GRPCInferenceService at 0.0.0.0:8001
# I0213 17:00:00.000000 1 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
# I0213 17:00:00.000000 1 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002

Docker Launch with GPU Support

docker run --rm --gpus all \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v /path/to/models:/models \
    nvcr.io/nvidia/tritonserver:24.07-py3 \
    tritonserver --model-repository=/models \
        --model-control-mode=poll \
        --repository-poll-secs=30 \
        --log-verbose=1

Multiple Model Repositories

tritonserver \
    --model-repository=/models/production \
    --model-repository=/models/experimental \
    --model-control-mode=explicit \
    --load-model=model_a \
    --load-model=model_b
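
In explicit mode, models not named with --load-model stay unloaded until a client asks for them. A minimal sketch of such a request follows, using the load and unload routes of Triton's model repository HTTP extension (`/v2/repository/models/<name>/load` and `/unload`). The helper name `triton_model_ctl` and its argument order are assumptions for illustration; the sketch also assumes `curl` is installed and that the server was started with --model-control-mode=explicit, since these routes are rejected in the other modes.

```shell
# Hypothetical helper (not part of Triton): ask a running server to load or
# unload a model through the model repository extension of the HTTP API.
# Usage: triton_model_ctl <load|unload> <model_name> [host] [http_port]
triton_model_ctl() {
    local action="${1:-}"
    local model="${2:-}"
    local host="${3:-localhost}"
    local port="${4:-8000}"      # matches the --http-port default
    # Only the two documented actions are accepted.
    case "${action}" in
        load|unload) ;;
        *)
            echo "usage: triton_model_ctl <load|unload> <model_name> [host] [http_port]" >&2
            return 2
            ;;
    esac
    if [ -z "${model}" ]; then
        echo "model name is required" >&2
        return 2
    fi
    # -f makes curl return non-zero if the server rejects the request.
    curl -sf -X POST "http://${host}:${port}/v2/repository/models/${model}/${action}"
}
```

For the launch above, `triton_model_ctl unload model_b` would take model_b out of service while model_a keeps serving.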
