Principle:Triton inference server Server Container Compose Build
| Field | Value |
|---|---|
| Page Type | Principle |
| Title | Container_Compose_Build |
| Namespace | Triton_inference_server_Server |
| Workflow | Custom_Container_Build |
| Domains | Container_Build, MLOps |
| Knowledge Sources | Triton Server, Triton Build Guide |
| Last Updated | 2026-02-13 17:00 GMT |
Overview
Container compose is a method of constructing custom Triton containers by selectively copying pre-built backend binaries out of a full NGC image.
Description
Container compose builds a custom Docker image by selectively copying pre-compiled backend libraries out of a full NVIDIA NGC Triton container. This avoids compiling from source while still letting the operator choose which backends to include. The process generates a Dockerfile that copies only the requested backends from the full NGC image into a minimal base image.
The compose approach relies on NVIDIA publishing fully built Triton containers to the NGC registry with every release. These containers include all backends, endpoints, and filesystem integrations. Rather than rebuilding from source, the compose script:
- Pulls the full NGC Triton container (or a user-specified image) as a source of pre-built binaries
- Generates a Dockerfile that uses Docker multi-stage build syntax
- Selectively copies only the requested backend shared libraries, configuration files, and dependencies from the full image into a clean base image
- Builds the resulting Dockerfile into a new, smaller custom image
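The steps above can be sketched as a small Dockerfile generator. This is a minimal illustration and not the compose script's actual code: the function name `render_compose_dockerfile`, the placeholder image names, and the in-container paths under `/opt/tritonserver` are assumptions made for the example.

```python
from typing import Iterable

# Assumed in-container layout, used here for illustration only.
BACKEND_DIR = "/opt/tritonserver/backends"
CORE_PATHS = ("/opt/tritonserver/bin", "/opt/tritonserver/lib")


def render_compose_dockerfile(full_image: str, min_image: str,
                              backends: Iterable[str]) -> str:
    """Return a multi-stage Dockerfile that copies only the selected backends."""
    lines = [
        f"FROM {full_image} AS full",  # stage 1: source of pre-built binaries
        f"FROM {min_image}",           # stage 2: clean base for the final image
    ]
    # Server core is always copied.
    for path in CORE_PATHS:
        lines.append(f"COPY --from=full {path} {path}")
    # One COPY per requested backend; duplicates are dropped.
    for name in sorted(set(backends)):
        src = f"{BACKEND_DIR}/{name}"
        lines.append(f"COPY --from=full {src} {src}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_compose_dockerfile(
        "example.registry/tritonserver:full",  # placeholder image name
        "example.registry/tritonserver:min",   # placeholder image name
        ["onnxruntime", "python"],
    ))
```

Any backend that is not named never appears in a `COPY` line, so it is absent from the final image even though the first stage contains it.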
This approach is ideal when:
- No source code modifications are needed
- The desired backends are all available as pre-built binaries in the NGC image
- Build speed is a priority (minutes instead of hours)
- The target platform matches the NGC image platform (x86_64 Linux with CUDA)
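The suitability conditions above can be expressed as a simple pre-flight check. The sketch below is hypothetical: `BuildRequest`, `compose_blockers`, and the platform strings are illustrative names, and the set of available backends would have to be supplied by the caller.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class BuildRequest:
    backends: Set[str]          # backends the operator wants in the image
    needs_source_changes: bool  # any patch to server or backend source?
    platform: str               # target platform, e.g. "linux/amd64"


def compose_blockers(req: BuildRequest, available: Set[str],
                     image_platform: str) -> List[str]:
    """Return reasons compose is unsuitable; an empty list means it fits."""
    reasons = []
    if req.needs_source_changes:
        reasons.append("source modifications require a build from source")
    missing = sorted(req.backends - available)
    if missing:
        reasons.append("backends not in the NGC image: " + ", ".join(missing))
    if req.platform != image_platform:
        reasons.append(f"platform {req.platform} differs from {image_platform}")
    return reasons
```

Build speed is a preference and not a hard constraint, so it is left out of the check.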
Usage
The compose build path is the recommended approach for most custom container scenarios. It is used when operators need to reduce container size by removing unnecessary backends but do not need to modify server source code or add custom backends not available in the NGC image.
Common use cases:
- Production slimming: Remove unused backends from the full NGC image to reduce image size from ~15 GB to ~5 GB or less
- Security hardening: Remove backends that are not needed, reducing the attack surface
- Faster deployment: Smaller images pull faster from registries, reducing deployment time
- Quick iteration: Test different backend combinations without waiting for full source compilation
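A typical invocation for the slimming use case might be assembled as follows. This is a sketch under assumptions: the script name `compose.py` and the flags `--backend`, `--output-name`, and `--dry-run` are taken to match the compose script's documented options and should be confirmed with `--help` for the release in use. The helper `compose_command` and the image name `tritonserver_slim` are hypothetical.

```python
import shlex
from typing import List, Sequence


def compose_command(backends: Sequence[str], output_name: str,
                    dry_run: bool = False) -> List[str]:
    """Build an argv list for the compose script (flag names are assumptions)."""
    argv = ["python3", "compose.py"]
    for name in backends:
        argv += ["--backend", name]  # one flag per backend to keep
    argv += ["--output-name", output_name]
    if dry_run:
        argv.append("--dry-run")     # assumed: generate the Dockerfile, skip the build
    return argv


if __name__ == "__main__":
    cmd = compose_command(["onnxruntime", "python"], "tritonserver_slim", dry_run=True)
    print(shlex.join(cmd))
```

Returning a list instead of a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.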
Theoretical Basis
The principle follows a binary extraction pattern:
- Full NGC image serves as the source of all pre-built backend binaries
- Selective COPY in a multi-stage Dockerfile extracts only requested components
- Minimal custom image contains only the server core and selected backends
The key tradeoff is build speed against flexibility:
| Advantage | Limitation |
|---|---|
| Build completes in minutes (no compilation) | Cannot modify server source code |
| Produces identical binaries to NGC release | Limited to backends available in NGC image |
| Reproducible output for a given NGC version | Cannot add custom or third-party backends |
| No build toolchain required (only Docker) | Cannot change compile-time options or debug flags |
The compose approach implements a composition-over-compilation pattern: it assembles a custom artifact from pre-built components rather than building everything from source.