Workflow:Triton Inference Server Custom Container Build
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, DevOps, Docker, Build_Systems |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
End-to-end process for creating a customized Triton Inference Server Docker container with selected backends and features, using either the compose utility or building from source.
Description
This workflow covers two approaches to creating customized Triton server containers. The first (recommended) approach uses the compose.py utility to create a slimmed-down container that includes only the backends and repository agents needed for your deployment, significantly reducing image size. The second approach builds Triton entirely from source using build.py for maximum customization, including enabling or disabling specific protocol endpoints (HTTP, gRPC, SageMaker, Vertex AI), metrics, tracing, and other compile-time features. Both approaches produce Docker images ready for deployment.
Usage
Execute this workflow when you need a Triton container that differs from the default NGC image: a smaller image with fewer backends, a container with specific compile-time features enabled or disabled, or a build for a platform where the default image is not available. Common triggers include production deployments requiring minimal image footprint, custom backend integration, or platform-specific builds (Windows, Jetson, unsupported Linux).
Execution Steps
Step 1: Determine customization requirements
Decide which backends, repository agents, and features your deployment needs. Review the available backends (TensorRT, PyTorch, ONNX Runtime, OpenVINO, Python, vLLM, TensorRT-LLM, etc.) and determine whether the compose utility is sufficient or whether a full source build is required.
Key considerations:
- Compose is sufficient if you only need to select a subset of backends from an existing release
- Source build is needed for compile-time feature flags, custom backends, or unsupported platforms
- Review the Backend-Platform Support Matrix for platform compatibility
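The considerations above reduce to a simple rule: any one of the three source-build conditions rules out compose. A minimal sketch of that rule follows; the `Requirements` class and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical field names that mirror the considerations listed above.
    needs_compile_time_flags: bool = False
    needs_custom_backend: bool = False
    unsupported_platform: bool = False


def choose_build_path(req: Requirements) -> str:
    """Return "source" if any condition rules out compose, else "compose"."""
    if (req.needs_compile_time_flags
            or req.needs_custom_backend
            or req.unsupported_platform):
        return "source"
    return "compose"


# A deployment that only needs a subset of released backends.
print(choose_build_path(Requirements()))
# A deployment that integrates a custom backend.
print(choose_build_path(Requirements(needs_custom_backend=True)))
```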
Step 2: Clone the server repository
Clone the triton-inference-server/server repository at the desired release branch. The branch version determines which NGC base images the compose utility uses and which source version is built.
Key considerations:
- Use a release branch (e.g., r26.01) for stable builds matching NGC containers
- Use the main branch for development builds with latest features
- The branch determines the compatible NGC container versions
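As an illustration, the clone step can be expressed as a small helper that assembles the `git clone` command for a chosen branch. This is a sketch only: the helper name and the destination directory `server` are assumptions, and the command is printed rather than executed.

```python
import shlex

# Public GitHub location of the triton-inference-server/server repository.
REPO_URL = "https://github.com/triton-inference-server/server.git"


def clone_command(branch: str, dest: str = "server") -> list[str]:
    """Build a git clone command that checks out a single branch."""
    if not branch:
        raise ValueError("branch must be non-empty")
    return ["git", "clone", "--branch", branch, "--single-branch", REPO_URL, dest]


# Print the command for a release branch; run it manually or via subprocess.run.
print(shlex.join(clone_command("r26.01")))
```

Passing `main` instead of a release branch yields the development checkout described above.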
Step 3A: Build with compose (lightweight approach)
Run compose.py with --backend and --repoagent flags to specify which components to include. The script extracts selected backends from the full NGC image and creates a minimal container based on the min NGC image. This approach does not require building anything from source.
Key considerations:
- Requires Docker with access to pull NGC images
- Each backend adds to the final image size
- The resulting image is tagged locally as tritonserver:latest
- Use --dry-run to preview the Dockerfile without building
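A compose invocation might be assembled as sketched below. Only the `--backend`, `--repoagent`, and `--dry-run` flags named in this step are used; the helper name is hypothetical, and the backend and repository agent names (`pytorch`, `onnxruntime`, `checksum`) are example values that should be checked against the release in use. The command is printed, not run.

```python
import shlex


def compose_command(backends, repoagents=(), dry_run=False):
    """Build an argv list for compose.py from the selected components."""
    cmd = ["python3", "compose.py"]
    for backend in backends:
        cmd += ["--backend", backend]      # one flag per backend to include
    for agent in repoagents:
        cmd += ["--repoagent", agent]      # one flag per repository agent
    if dry_run:
        cmd.append("--dry-run")            # preview the Dockerfile only
    return cmd


# Example values; run the printed command from the cloned server directory.
print(shlex.join(compose_command(["pytorch", "onnxruntime"], ["checksum"], dry_run=True)))
```

Running with the dry-run option first makes it possible to review the generated Dockerfile before committing to pulling the NGC images.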
Step 3B: Build from source (full customization)
Run build.py with the desired feature flags to generate CMake build scripts and Dockerfiles, then execute the build. This compiles the Triton core server, selected backends, and all dependencies from source. Build flags control protocol endpoints, GPU support, metrics, tracing, and other features.
Key considerations:
- Requires substantial compute resources and build time
- Use --dryrun to inspect generated build scripts before execution
- Feature flags include --enable-gpu, --enable-metrics, and --enable-tracing; protocol endpoints are selected with --endpoint (e.g., --endpoint=http, --endpoint=grpc)
- Repeat --backend=<name> for each backend to compile
- Supports Docker-based build (recommended) or bare-metal build
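The sketch below assembles a build.py command line from a feature selection. It assumes that endpoints are chosen with a repeatable `--endpoint=<name>` option and backends with a repeatable `--backend=<name>` option; confirm both against `build.py --help` on your branch. The helper name and default values are hypothetical, and a real build will likely need further flags beyond those shown.

```python
import shlex


def build_command(backends, endpoints=("http", "grpc"),
                  features=("metrics", "tracing"),
                  enable_gpu=True, dryrun=False):
    """Build an argv list for build.py (assumed flag spellings, see lead-in)."""
    cmd = ["python3", "build.py"]
    if enable_gpu:
        cmd.append("--enable-gpu")
    for feature in features:
        cmd.append(f"--enable-{feature}")   # e.g. --enable-metrics
    for endpoint in endpoints:
        cmd.append(f"--endpoint={endpoint}")
    for backend in backends:
        cmd.append(f"--backend={backend}")
    if dryrun:
        cmd.append("--dryrun")              # generate scripts without building
    return cmd


# Print a dry-run command for inspection; nothing is executed here.
print(shlex.join(build_command(["python", "onnxruntime"], dryrun=True)))
```

Generating the command from data rather than typing it by hand makes it easier to record exactly which features went into each image variant.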
Step 4: Verify the custom container
Launch the custom container and verify that the expected backends are loaded, endpoints are available, and a test model can be served. Check the server log for any missing dependencies or failed backend initializations.
Key considerations:
- Start the server with a test model repository
- Verify all expected backends appear in the server startup log
- Test each configured endpoint (HTTP, gRPC) with a health check
- Confirm that disabled features are not present in the container
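An HTTP health check could look like the following sketch. It assumes the container publishes Triton's HTTP endpoint on the default port 8000 and probes the `/v2/health/ready` route; the function names and the injectable `opener` parameter are illustrative choices, not part of any Triton tooling.

```python
import urllib.request


def ready_url(host="localhost", http_port=8000):
    """Return the readiness URL, assuming the default HTTP port mapping."""
    return f"http://{host}:{http_port}/v2/health/ready"


def is_ready(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the server answers the readiness probe with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection failures, timeouts, and HTTP error responses,
        # since urllib's URLError and HTTPError derive from OSError.
        return False


# Show the URL that would be probed; call is_ready(ready_url()) once the
# container is running.
print(ready_url())
```

The gRPC endpoint needs a gRPC client for an equivalent check, which is not shown here.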
Step 5: Tag and distribute the container
Tag the custom container image with an appropriate version identifier and push it to your container registry for deployment. Document the included backends and features for operational reference.
Key considerations:
- Use a consistent naming and tagging scheme for your custom images
- Record which backends and features are included in each image variant
- Consider multi-stage builds for production to keep build-time tooling out of the final image and reduce its size
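One possible tagging scheme encodes the Triton release and the backend set in the tag. The sketch below builds the corresponding `docker tag` and `docker push` commands; the registry host `registry.example.com`, the variant label, and the helper names are all hypothetical, and the commands are printed rather than executed.

```python
import shlex


def image_ref(registry, name, triton_release, variant):
    """Compose a reference such as <registry>/<name>:<release>-<variant>."""
    return f"{registry}/{name}:{triton_release}-{variant}"


def tag_and_push_commands(local_image, target):
    """Return the docker commands that retag a local image and push it."""
    return [
        ["docker", "tag", local_image, target],
        ["docker", "push", target],
    ]


# Hypothetical registry and variant label, for illustration only.
target = image_ref("registry.example.com/ml", "tritonserver", "26.01", "pytorch-onnx")
for command in tag_and_push_commands("tritonserver:latest", target):
    print(shlex.join(command))
```

Encoding the variant in the tag keeps the record of included backends attached to the image itself.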