Workflow:Triton inference server Server Custom Container Build

Knowledge Sources	Triton Inference Server, Triton Build Guide, Triton Compose Guide
Domains	ML_Ops, DevOps, Docker, Build_Systems
Last Updated	2026-02-13 17:00 GMT

Overview

End-to-end process for creating a customized Triton Inference Server Docker container with selected backends and features, using either the compose utility or building from source.

Description

This workflow covers two approaches to creating customized Triton server containers. The first (recommended) approach uses the compose.py utility to create a slimmed-down container that includes only the backends and repository agents needed for your deployment, significantly reducing image size. The second approach builds Triton entirely from source using build.py for maximum customization, including enabling or disabling specific protocol endpoints (HTTP, gRPC, SageMaker, Vertex AI), metrics, tracing, and other compile-time features. Both approaches produce Docker images ready for deployment.

Usage

Execute this workflow when you need a Triton container that differs from the default NGC image: a smaller image with fewer backends, a container with compile-time features enabled or disabled, or a build for a platform where the default image is not available. Common triggers include production deployments requiring minimal image footprint, custom backend integration, or platform-specific builds (Windows, Jetson, unsupported Linux).

Execution Steps

Step 1: Determine customization requirements

Decide which backends, repository agents, and features your deployment needs. Review the available backends (TensorRT, PyTorch, ONNX Runtime, OpenVINO, Python, vLLM, TensorRT-LLM, etc.) and determine whether the compose utility is sufficient or whether a full source build is required.

Key considerations:

Compose is sufficient if you only need to select a subset of backends from an existing release
Source build is needed for compile-time feature flags, custom backends, or unsupported platforms
Review the Backend-Platform Support Matrix for platform compatibility
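
The considerations above reduce to a small decision rule. The sketch below is only an illustration: the Requirements class, its field names, and choose_build_path are hypothetical and not part of Triton.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of what a deployment needs (illustrative names)."""
    needs_feature_flags: bool = False    # compile-time features must be toggled
    has_custom_backend: bool = False     # backend not shipped in the NGC image
    unsupported_platform: bool = False   # no NGC image exists for the target


def choose_build_path(req: Requirements) -> str:
    """Return "source" when any source-only requirement applies, else "compose"."""
    if req.needs_feature_flags or req.has_custom_backend or req.unsupported_platform:
        return "source"
    return "compose"


if __name__ == "__main__":
    # Selecting a subset of released backends only needs compose.
    print(choose_build_path(Requirements()))
    # A custom backend forces a source build.
    print(choose_build_path(Requirements(has_custom_backend=True)))
```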

Step 2: Clone the server repository

Clone the triton-inference-server/server repository at the desired release branch. The branch version determines which NGC base images the compose utility uses and which source version is built.

Key considerations:

Use a release branch (e.g., r26.01) for stable builds matching NGC containers
Use the main branch for development builds with latest features
The branch determines the compatible NGC container versions
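
As a rough sketch, the clone step can be scripted as follows. The clone_command helper and the destination directory name are assumptions made for illustration, and the shallow clone (--depth 1) is an optional choice to save time, not something the workflow requires.

```python
from __future__ import annotations

import subprocess

REPO_URL = "https://github.com/triton-inference-server/server.git"


def clone_command(branch: str, dest: str = "server") -> list[str]:
    """Build the git command that checks out one branch of the server repo."""
    return [
        "git", "clone",
        "--branch", branch,   # e.g. "r26.01" for a release, "main" for development
        "--depth", "1",       # assumption: history is not needed for a build
        REPO_URL, dest,
    ]


if __name__ == "__main__":
    cmd = clone_command("r26.01")
    print(" ".join(cmd))
    # Uncomment to actually clone:
    # subprocess.run(cmd, check=True)
```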

Step 3A: Build with compose (lightweight approach)

Run compose.py with --backend and --repoagent flags to specify which components to include. The script extracts selected backends from the full NGC image and creates a minimal container based on the min NGC image. This approach does not require building anything from source.

Key considerations:

Requires Docker with access to pull NGC images
Each backend adds to the final image size
The resulting image is tagged locally as tritonserver:latest
Use --dry-run to preview the Dockerfile without building
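
A minimal sketch of assembling the compose.py invocation, assuming it is run from the root of the cloned repository. The compose_command helper is hypothetical, and the backend and repository agent names in the usage lines are examples only; use the names that match your release.

```python
from __future__ import annotations

from typing import Sequence


def compose_command(
    backends: Sequence[str],
    repoagents: Sequence[str] = (),
    dry_run: bool = False,
) -> list[str]:
    """Assemble a compose.py command line; nothing is executed here."""
    cmd = ["python3", "compose.py"]
    for name in backends:
        cmd += ["--backend", name]      # one flag per backend to include
    for name in repoagents:
        cmd += ["--repoagent", name]    # one flag per repository agent
    if dry_run:
        cmd.append("--dry-run")         # preview the Dockerfile, skip the build
    return cmd


if __name__ == "__main__":
    # Example names only; each added backend increases the image size.
    print(" ".join(compose_command(["onnxruntime", "python"], ["checksum"], dry_run=True)))
```

Running the dry-run form first makes it easy to review the generated Dockerfile before committing to pulling the NGC images.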

Step 3B: Build from source (full customization)

Run build.py with the desired feature flags to generate CMake build scripts and Dockerfiles, then execute the build. This compiles the Triton core server, selected backends, and all dependencies from source. Build flags control protocol endpoints, GPU support, metrics, tracing, and other features.

Key considerations:

Requires substantial compute resources and build time
Use --dryrun to inspect generated build scripts before execution
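Feature flags include --enable-gpu, --enable-logging, --enable-stats, --enable-metrics, and --enable-tracing; protocol endpoints are selected with repeated --endpoint options (e.g., --endpoint=http --endpoint=grpc)
Repeat --backend=<name> for each backend to compile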
Supports Docker-based build (recommended) or bare-metal build
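
The source build invocation can be sketched in the same way. The build_command helper is hypothetical. Endpoints are passed here through repeated --endpoint options and backends through repeated --backend options, which is how build.py is understood to accept them; confirm against ./build.py --help for your release before relying on this.

```python
from __future__ import annotations

from typing import Sequence


def build_command(
    backends: Sequence[str],
    features: Sequence[str],
    endpoints: Sequence[str] = (),
    dry_run: bool = False,
) -> list[str]:
    """Assemble a build.py command line; nothing is executed here."""
    cmd = ["python3", "build.py"]
    # Each feature name becomes an --enable-<name> flag, e.g. --enable-gpu.
    cmd += [f"--enable-{name}" for name in features]
    # Assumption: endpoints are selected with repeated --endpoint=<name>.
    cmd += [f"--endpoint={name}" for name in endpoints]
    # Assumption: backends are selected with repeated --backend=<name>.
    cmd += [f"--backend={name}" for name in backends]
    if dry_run:
        cmd.append("--dryrun")  # generate build scripts without running them
    return cmd


if __name__ == "__main__":
    print(" ".join(build_command(
        backends=["onnxruntime", "python"],      # example backend names
        features=["gpu", "metrics", "tracing"],
        endpoints=["http", "grpc"],
        dry_run=True,
    )))
```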

Step 4: Verify the custom container

Launch the custom container and verify that the expected backends are loaded, endpoints are available, and a test model can be served. Check the server log for any missing dependencies or failed backend initializations.

Key considerations:

Start the server with a test model repository
Verify all expected backends appear in the server startup log
Test each configured endpoint (HTTP, gRPC) with a health check
Confirm that disabled features are not present in the container
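
For the HTTP health check, a small probe is usually enough. The sketch below polls the readiness route /v2/health/ready, assuming the container publishes the HTTP endpoint on the default port 8000 of localhost; the helper names are hypothetical, and the gRPC endpoint would need a separate client.

```python
import http.client
import urllib.request


def health_url(host: str = "localhost", port: int = 8000) -> str:
    """URL of the HTTP readiness probe for a server at host:port."""
    return f"http://{host}:{port}/v2/health/ready"


def is_ready(host: str = "localhost", port: int = 8000, timeout: float = 5.0) -> bool:
    """Return True only if the readiness probe answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a non-2xx response: not ready.
        return False


if __name__ == "__main__":
    print("ready" if is_ready() else "not ready")
```

A False result right after startup is not necessarily a failure, since the server reports ready only once its models have loaded; retrying for a bounded time is a reasonable refinement.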

Step 5: Tag and distribute the container

Tag the custom container image with an appropriate version identifier and push it to your container registry for deployment. Document the included backends and features for operational reference.

Key considerations:

Use a consistent naming and tagging scheme for your custom images
Record which backends and features are included in each image variant
Consider multi-stage builds for production so that build tools stay out of the final image and its size stays small
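
One possible tagging scheme encodes the release and the included backends in the tag, so the image name itself documents the variant. Everything in this sketch is an assumption for illustration: the helper names, the registry host registry.example.com/ml, and the tag layout.

```python
from __future__ import annotations

from typing import Sequence


def image_ref(registry: str, release: str, backends: Sequence[str]) -> str:
    """Build a reference such as <registry>/tritonserver:<release>-<backends>."""
    variant = "-".join(sorted(backends))  # sorted so the tag is order-independent
    return f"{registry}/tritonserver:{release}-{variant}"


def tag_and_push_commands(local_image: str, target: str) -> list[list[str]]:
    """Return the docker commands to retag a local image and push it."""
    return [
        ["docker", "tag", local_image, target],
        ["docker", "push", target],
    ]


if __name__ == "__main__":
    target = image_ref("registry.example.com/ml", "26.01", ["python", "onnxruntime"])
    for cmd in tag_and_push_commands("tritonserver:latest", target):
        print(" ".join(cmd))
```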
