Implementation:Triton inference server Server Tritonserver CLI
| Knowledge Sources | |
|---|---|
| Domains | MLOps, Model_Serving, CLI |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Concrete command-line interface for launching Triton Inference Server, parsing options, loading models, and starting HTTP/gRPC/metrics endpoints.
Description
The tritonserver binary is the main entry point for Triton Inference Server. It parses command-line arguments via TritonParser::Parse(), builds server options via BuildTritonServerOptions(), creates the server instance via TRITONSERVER_ServerNew(), and starts network endpoints via StartEndpoints(). The binary is typically run inside an NVIDIA NGC container.
Usage
Use this command to launch Triton Inference Server as a standalone process; it is the standard entry point for development, testing, and production deployments. To embed the server in another application, use the TRITONSERVER C API directly instead.
Code Reference
Source Location
- Repository: triton-inference-server/server
- File: src/main.cc
- Lines: L439-511 (main function), L224-300 (StartEndpoints)
- File: src/command_line_parser.cc
- Lines: L400-405 (--model-repository option), L1017-1036 (BuildTritonServerOptions)
Signature
tritonserver --model-repository=<path>
    [--http-port=<int>]                  # Default: 8000
    [--grpc-port=<int>]                  # Default: 8001
    [--metrics-port=<int>]               # Default: 8002
    [--model-control-mode=<string>]      # none|poll|explicit (default: none)
    [--log-verbose=<int>]                # Verbosity level (default: 0)
    [--exit-on-error=<bool>]             # Exit on init error (default: true)
    [--exit-timeout-secs=<int>]          # Seconds to wait for in-flight inferences on shutdown
    [--strict-readiness=<bool>]          # Report ready only when all models are ready (default: true)
    [--disable-auto-complete-config]     # Do not auto-complete model config; require explicit config.pbtxt
    [--backend-config=<string>]          # Backend-specific config
// Internal C API call sequence (src/main.cc:L488-491)
TRITONSERVER_Server* server_ptr = nullptr;
TRITONSERVER_ServerNew(&server_ptr, triton_options.get());
// Followed by StartEndpoints() at L509
Import
# Binary available in NGC container:
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v /path/to/models:/models \
nvcr.io/nvidia/tritonserver:<version>-py3 \
tritonserver --model-repository=/models
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --model-repository | string (path) | Yes | Path to model repository (local, gs://, s3://, as://). Repeatable. |
| --http-port | int | No | HTTP endpoint port (default: 8000) |
| --grpc-port | int | No | gRPC endpoint port (default: 8001) |
| --metrics-port | int | No | Prometheus metrics port (default: 8002) |
| --model-control-mode | string | No | Model loading strategy: none, poll, explicit (default: none) |
| --log-verbose | int | No | Logging verbosity level (default: 0) |
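The options in the table can be assembled into a command line by a small wrapper. The following is a minimal sketch, not part of Triton: the function name triton_cmd and the TRITON_* environment variables are hypothetical, and the fallbacks are the defaults listed above. It only prints the command so that it can be inspected before being run.

```shell
# Hypothetical helper (not shipped with Triton): print a tritonserver
# command line for the repository path given as $1. Each TRITON_* variable
# is an assumed name; when unset, the default from the Inputs table is used.
triton_cmd() {
  printf 'tritonserver --model-repository=%s --http-port=%s --grpc-port=%s --metrics-port=%s --model-control-mode=%s --log-verbose=%s\n' \
    "${1:?usage: triton_cmd <model-repository>}" \
    "${TRITON_HTTP_PORT:-8000}" \
    "${TRITON_GRPC_PORT:-8001}" \
    "${TRITON_METRICS_PORT:-8002}" \
    "${TRITON_MODEL_CONTROL_MODE:-none}" \
    "${TRITON_LOG_VERBOSE:-0}"
}
```

For example, TRITON_HTTP_PORT=9000 triton_cmd /models prints a command that moves only the HTTP endpoint to port 9000 and leaves every other option at its default.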
Outputs
| Name | Type | Description |
|---|---|---|
| HTTP endpoint | TCP socket | KServe v2 REST API on configured port |
| gRPC endpoint | TCP socket | KServe v2 gRPC API on configured port |
| Metrics endpoint | TCP socket | Prometheus metrics on configured port |
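| Loaded models | server state | Models loaded at startup: every model in the repository under the none (default) and poll modes, or only those named with --load-model under explicit mode. Each model reports READY once it loads successfully |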
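The HTTP and metrics outputs can be probed from a shell once the server is running. This is a minimal sketch under stated assumptions: the function names are hypothetical, the host defaults to localhost, the ports default to the values above, and curl is assumed to be installed. The paths /v2/health/ready (KServe v2 readiness) and /metrics (Prometheus) are the ones Triton serves on these endpoints.

```shell
# Hypothetical helpers: $1 = host (default localhost), $2 = port.

# URL of the KServe v2 readiness probe on the HTTP endpoint.
triton_ready_url() {
  printf 'http://%s:%s/v2/health/ready\n' "${1:-localhost}" "${2:-8000}"
}

# URL of the Prometheus metrics page on the metrics endpoint.
triton_metrics_url() {
  printf 'http://%s:%s/metrics\n' "${1:-localhost}" "${2:-8002}"
}

# Exit status 0 when the readiness probe answers with a success status;
# curl -f turns HTTP errors (status >= 400) into a non-zero exit status.
triton_is_ready() {
  curl -sf -o /dev/null "$(triton_ready_url "$@")"
}
```

Calling triton_is_ready && curl -s "$(triton_metrics_url)" would print the metrics only when the server reports ready.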
Usage Examples
Basic Server Launch
# Launch with a local model repository
tritonserver --model-repository=/opt/triton/models
# Expected output (timestamps and source file:line prefixes vary by version):
# I0213 17:00:00.000000 1 server.cc:592] Started GRPCInferenceService at 0.0.0.0:8001
# I0213 17:00:00.000000 1 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
# I0213 17:00:00.000000 1 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002
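Scripts that start the server in the background usually need to block until it accepts requests rather than watch the log. A minimal sketch of such a wait loop follows; wait_until is a hypothetical generic helper, not a Triton command, and it retries any command once per second.

```shell
# Hypothetical helper: wait_until <timeout-secs> <command...>
# Re-runs the command once per second until it succeeds (status 0)
# or until roughly <timeout-secs> seconds have elapsed (status 1).
wait_until() {
  timeout_secs="$1"
  shift
  elapsed=0
  until "$@"; do
    [ "$elapsed" -ge "$timeout_secs" ] && return 1
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

With the basic launch above running in the background, wait_until 60 curl -sf -o /dev/null localhost:8000/v2/health/ready would return once the readiness probe succeeds, or fail after about a minute. The 60-second budget is an arbitrary choice; large models may need longer.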
Docker Launch with GPU Support
docker run --rm --gpus all \
-p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v /path/to/models:/models \
nvcr.io/nvidia/tritonserver:24.07-py3 \
tritonserver --model-repository=/models \
--model-control-mode=poll \
--repository-poll-secs=30 \
--log-verbose=1
Multiple Model Repositories
tritonserver \
--model-repository=/models/production \
--model-repository=/models/experimental \
--model-control-mode=explicit \
--load-model=model_a \
--load-model=model_b
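In explicit mode, models not named with --load-model stay unloaded until a client asks for them through the model repository extension of the HTTP API. A minimal sketch of such requests, assuming curl is installed and the HTTP endpoint is at localhost:8000: the function names and the TRITON_HTTP_ADDR variable are hypothetical, and model_c is a placeholder model name.

```shell
# Hypothetical helpers around the model repository HTTP API.
# TRITON_HTTP_ADDR is an assumed variable holding host:port.

# Print the URL for an action (load or unload) on the model named $1.
triton_repo_url() {
  printf 'http://%s/v2/repository/models/%s/%s\n' \
    "${TRITON_HTTP_ADDR:-localhost:8000}" "$1" "$2"
}

# Ask the server to load the model named $1 from its repositories.
triton_load_model() {
  curl -sf -X POST "$(triton_repo_url "$1" load)"
}

# Ask the server to unload the model named $1.
triton_unload_model() {
  curl -sf -X POST "$(triton_repo_url "$1" unload)"
}
```

For instance, triton_load_model model_c would request a model that was not loaded at startup, and triton_unload_model model_a would release one that was.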