Implementation:Triton inference server Server Tritonserver CLI

Knowledge Sources	Triton Server Triton CLI Reference
Domains	MLOps, Model_Serving, CLI
Last Updated	2026-02-13 17:00 GMT

Overview

Concrete command-line interface for launching Triton Inference Server, parsing options, loading models, and starting HTTP/gRPC/metrics endpoints.

Description

The tritonserver binary is the main entry point for Triton Inference Server. It parses command-line arguments via TritonParser::Parse(), builds server options via BuildTritonServerOptions(), creates the server instance via TRITONSERVER_ServerNew(), and starts network endpoints via StartEndpoints(). The binary is typically run inside an NVIDIA NGC container.
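The four startup stages above run strictly in order. In the sketch below, a failure in any stage is assumed to be fatal, so later stages never run. The code models only that ordering with hypothetical stage callbacks: it does not call TritonParser or the TRITONSERVER API, and the stage names are plain labels.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// A hypothetical startup stage: a label plus a callback that returns an
// empty string on success or an error message on failure.
using Stage = std::pair<std::string, std::function<std::string()>>;

// Runs the stages in order and stops at the first failure.
// Returns the labels of the stages that completed successfully.
std::vector<std::string> RunStartup(const std::vector<Stage>& stages) {
  std::vector<std::string> completed;
  for (const auto& [name, run] : stages) {
    const std::string error = run();
    if (!error.empty()) {
      std::cerr << "startup failed in " << name << ": " << error << '\n';
      break;  // later stages (e.g. endpoint startup) are skipped
    }
    completed.push_back(name);
  }
  return completed;
}
```

With stages labelled Parse, BuildOptions, ServerNew and StartEndpoints, a failing ServerNew callback leaves only the first two stages completed, and no endpoints are started.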

Usage

Use this command to start Triton Inference Server for development, testing, or production deployments; it is the standard way to launch the server. To embed Triton inside another process instead, call the in-process TRITONSERVER C API (tritonserver.h) directly.

Code Reference

Source Location

Repository: triton-inference-server/server
File: src/main.cc
Lines: L439-511 (main function), L224-300 (StartEndpoints)
File: src/command_line_parser.cc
Lines: L400-405 (--model-repository option), L1017-1036 (BuildTritonServerOptions)

Signature

tritonserver --model-repository=<path> \
    [--http-port=<int>]              \  # Default: 8000
    [--grpc-port=<int>]              \  # Default: 8001
    [--metrics-port=<int>]           \  # Default: 8002
    [--model-control-mode=<string>]  \  # none|poll|explicit (default: none)
    [--log-verbose=<int>]            \  # Verbosity level (default: 0)
    [--exit-on-error=<bool>]         \  # Exit on init error (default: true)
    [--exit-timeout-secs=<int>]      \  # Graceful shutdown timeout (default: 30)
    [--strict-readiness=<bool>]      \  # Strict health check (default: true)
    [--disable-auto-complete-config] \  # Disable config auto-completion; model config must be complete
    [--backend-config=<backend>,<setting>=<value>]  # Backend-specific config (repeatable)

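// Internal C API call sequence (src/main.cc:L488-491).
// triton_options is a smart pointer owning the TRITONSERVER_ServerOptions
// built by BuildTritonServerOptions(). TRITONSERVER_ServerNew returns a
// TRITONSERVER_Error*, which is nullptr on success.
TRITONSERVER_Server* server_ptr = nullptr;
TRITONSERVER_ServerNew(&server_ptr, triton_options.get());
// Followed by StartEndpoints() at L509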
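Values for --backend-config take the form <backend>,<setting>=<value> (for example tensorflow,version=2). This sketch assumes that a value without a backend prefix, such as default-max-batch-size=4, is not tied to a named backend. The hypothetical helper below only shows how such a value splits into its parts. It is not Triton's parser, and its handling of the prefix-less form is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical parsed form of one --backend-config value.
struct BackendSetting {
  std::string backend;  // empty: no backend prefix was given
  std::string setting;
  std::string value;
};

// Splits "<backend>,<setting>=<value>" or "<setting>=<value>".
// A comma only marks a backend prefix if it appears before the first '='.
std::optional<BackendSetting> ParseBackendConfig(const std::string& arg) {
  const auto eq = arg.find('=');
  if (eq == std::string::npos) return std::nullopt;  // no "=value" part
  BackendSetting out;
  std::string rest = arg;
  const auto comma = arg.find(',');
  if (comma != std::string::npos && comma < eq) {
    out.backend = arg.substr(0, comma);
    rest = arg.substr(comma + 1);
  }
  const auto split = rest.find('=');  // always present: it follows the comma
  out.setting = rest.substr(0, split);
  out.value = rest.substr(split + 1);
  if (out.setting.empty()) return std::nullopt;
  return out;
}
```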

Import

# Binary available in NGC container:
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 \
    nvcr.io/nvidia/tritonserver:<version>-py3 \
    tritonserver --model-repository=/models

I/O Contract

Inputs

Name	Type	Required	Description
--model-repository	string (path)	Yes	Path to model repository (local, gs://, s3://, as://). Repeatable.
--http-port	int	No	HTTP endpoint port (default: 8000)
--grpc-port	int	No	gRPC endpoint port (default: 8001)
--metrics-port	int	No	Prometheus metrics port (default: 8002)
--model-control-mode	string	No	Model loading strategy: none, poll, explicit (default: none)
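--log-verbose	int	No	Logging verbosity level (default: 0)
--repository-poll-secs	int	No	Seconds between repository scans in poll mode (default: 15)
--load-model	string	No	Model to load at startup in explicit mode. Repeatable.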
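To make the defaults and the required/repeatable semantics in the table concrete, here is a minimal sketch of parsing --key=value arguments into a struct seeded with the documented defaults. LaunchOptions and ParseLaunchOptions are hypothetical names. This is not Triton's TritonParser, and it rejects any flag it does not cover.

```cpp
#include <cassert>
#include <exception>
#include <optional>
#include <string>
#include <vector>

// Hypothetical option set, initialized with the documented defaults.
struct LaunchOptions {
  std::vector<std::string> model_repositories;  // --model-repository (repeatable)
  int http_port = 8000;
  int grpc_port = 8001;
  int metrics_port = 8002;
  std::string model_control_mode = "none";      // none | poll | explicit
  int log_verbose = 0;
};

// Returns std::nullopt for an unknown flag, a value std::stoi cannot parse,
// an invalid control mode, or a missing --model-repository.
std::optional<LaunchOptions> ParseLaunchOptions(const std::vector<std::string>& args) {
  LaunchOptions opts;
  for (const std::string& arg : args) {
    const auto eq = arg.find('=');
    if (arg.rfind("--", 0) != 0 || eq == std::string::npos) return std::nullopt;
    const std::string key = arg.substr(2, eq - 2);
    const std::string value = arg.substr(eq + 1);
    try {
      if (key == "model-repository") opts.model_repositories.push_back(value);
      else if (key == "http-port") opts.http_port = std::stoi(value);
      else if (key == "grpc-port") opts.grpc_port = std::stoi(value);
      else if (key == "metrics-port") opts.metrics_port = std::stoi(value);
      else if (key == "log-verbose") opts.log_verbose = std::stoi(value);
      else if (key == "model-control-mode") {
        if (value != "none" && value != "poll" && value != "explicit") return std::nullopt;
        opts.model_control_mode = value;
      } else {
        return std::nullopt;  // flag not covered by this sketch
      }
    } catch (const std::exception&) {
      return std::nullopt;  // std::stoi threw (non-numeric or out of range)
    }
  }
  if (opts.model_repositories.empty()) return std::nullopt;  // required option
  return opts;
}
```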

Outputs

Name	Type	Description
HTTP endpoint	TCP socket	KServe v2 REST API on configured port
gRPC endpoint	TCP socket	KServe v2 gRPC API on configured port
Metrics endpoint	TCP socket	Prometheus metrics on configured port
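Loaded models	server state	Models loaded at startup, in READY state: every model in the repositories for none and poll modes; only the models named by --load-model in explicit mode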
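The HTTP endpoint exposes KServe v2 health routes, and the metrics port serves Prometheus text at /metrics. The small helper below builds those URLs for a given host, defaulting to the documented ports. The function and struct names are hypothetical; the paths follow the KServe v2 REST protocol.

```cpp
#include <cassert>
#include <string>

// URLs for server-level health checks and metrics scraping.
struct EndpointUrls {
  std::string live;
  std::string ready;
  std::string metrics;
};

EndpointUrls BuildEndpointUrls(const std::string& host, int http_port = 8000,
                               int metrics_port = 8002) {
  const std::string http_base = "http://" + host + ":" + std::to_string(http_port);
  return {http_base + "/v2/health/live",
          http_base + "/v2/health/ready",
          "http://" + host + ":" + std::to_string(metrics_port) + "/metrics"};
}

// Per-model readiness: GET /v2/models/<name>/ready on the HTTP port.
std::string ModelReadyUrl(const std::string& host, const std::string& model,
                          int http_port = 8000) {
  return "http://" + host + ":" + std::to_string(http_port) + "/v2/models/" + model + "/ready";
}
```

For example, `curl -v localhost:8000/v2/health/ready` returns HTTP 200 once the server is ready. With --strict-readiness=true (the default), readiness also requires the loaded models to be available.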

Usage Examples

Basic Server Launch

# Launch with a local model repository
tritonserver --model-repository=/opt/triton/models

# Expected output (abridged; timestamps, source-file names, and line numbers vary by version):
# I0213 17:00:00.000000 1 server.cc:592] Started GRPCInferenceService at 0.0.0.0:8001
# I0213 17:00:00.000000 1 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
# I0213 17:00:00.000000 1 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002

Docker Launch with GPU Support

docker run --rm --gpus all \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v /path/to/models:/models \
    nvcr.io/nvidia/tritonserver:24.07-py3 \
    tritonserver --model-repository=/models \
        --model-control-mode=poll \
        --repository-poll-secs=30 \
        --log-verbose=1
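In poll mode, Triton rescans the repository every --repository-poll-secs seconds (30 in the command above) and loads or unloads models to match. The hypothetical sketch below reduces one poll cycle to a set difference between two scans. Real poll mode also reacts to changed model configurations and version directories, which this simplification ignores.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Result of comparing two repository scans (hypothetical helper).
struct PollDiff {
  std::vector<std::string> to_load;    // present now, absent in the previous scan
  std::vector<std::string> to_unload;  // present previously, absent now
};

PollDiff DiffRepositoryScan(const std::set<std::string>& previous,
                            const std::set<std::string>& current) {
  PollDiff diff;
  for (const auto& name : current)
    if (previous.count(name) == 0) diff.to_load.push_back(name);
  for (const auto& name : previous)
    if (current.count(name) == 0) diff.to_unload.push_back(name);
  return diff;
}
```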

Multiple Model Repositories

tritonserver \
    --model-repository=/models/production \
    --model-repository=/models/experimental \
    --model-control-mode=explicit \
    --load-model=model_a \
    --load-model=model_b
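With --model-control-mode=explicit, only the models named by --load-model are loaded at startup. The sketch below is a hypothetical pre-flight check that a deployment script could run against the repositories in this example. It flags requested models that no repository contains. It also flags names found in more than one repository, under the assumption that model names must be unique across repositories. This is not Triton's own logic, and the repository contents are illustrative inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Problems found for the requested --load-model names (hypothetical helper).
struct StartupCheck {
  std::vector<std::string> missing;    // in no repository
  std::vector<std::string> ambiguous;  // in more than one repository
};

// repo_models maps each repository path to the model names it contains.
StartupCheck CheckExplicitLoad(
    const std::map<std::string, std::set<std::string>>& repo_models,
    const std::vector<std::string>& load_models) {
  StartupCheck result;
  for (const auto& model : load_models) {
    std::size_t hits = 0;
    for (const auto& entry : repo_models) hits += entry.second.count(model);
    if (hits == 0) result.missing.push_back(model);
    else if (hits > 1) result.ambiguous.push_back(model);
  }
  return result;
}
```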
