Implementation:Triton inference server Server Pip Install Tensorrt LLM

Metadata

Field	Value
Type	Implementation
Workflow	LLM_Deployment_With_TRT_LLM
Repo	Triton_inference_server_Server
Source	docs/getting_started/llm.md:L52-63
Domains	NLP, LLM_Deployment, Environment_Setup
Knowledge_Sources	TRT-LLM Docs|https://nvidia.github.io/TensorRT-LLM/, source::Repo|Triton Server|https://github.com/triton-inference-server/server
implements	Principle:Triton_inference_server_Server_TRT_LLM_Environment_Setup
2026-02-13 17:00 GMT

Overview

A concrete procedure for installing the TensorRT-LLM pip package and its system dependencies. It covers the exact commands needed to set up a working TRT-LLM environment from a clean system.

Description

This implementation provides the step-by-step procedure for installing TensorRT-LLM and all required system-level dependencies. The process involves two phases:

System dependency installation: install OS-level packages via apt-get, including Python 3.10 and pip, Open MPI (openmpi-bin, libopenmpi-dev) for multi-GPU support, and git with git-lfs for handling large model files.
Python package installation: install the tensorrt_llm package with pip, using NVIDIA's package index as an extra index and pinning an explicit version for reproducibility.

The --extra-index-url flag directs pip to search NVIDIA's package index (https://pypi.nvidia.com) in addition to the default PyPI. That index hosts packages not available on pypi.org, including pre-built CUDA-enabled wheels for TensorRT-LLM and its dependencies.
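The same index setting and version pin can also be kept in a requirements file, since pip accepts --extra-index-url as a line in that file. The following is a minimal sketch, not part of the source procedure; the function name write_trtllm_requirements and the file name requirements-trtllm.txt are hypothetical.

```shell
# Hypothetical helper: write a requirements file that carries both the
# extra index URL and the pinned TensorRT-LLM version from this page.
write_trtllm_requirements() {
    out_file="${1:-requirements-trtllm.txt}"
    cat > "$out_file" <<'EOF'
--extra-index-url https://pypi.nvidia.com
tensorrt_llm==0.11.0
EOF
}
```

After calling write_trtllm_requirements, running pip3 install -r requirements-trtllm.txt should resolve packages the same way as the one-line command, assuming the file is in the current directory.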

Usage

Run these commands with root privileges (apt-get requires them) on a system with NVIDIA GPU drivers and CUDA 12.4+ already installed. They are typically executed inside an NVIDIA NGC container or on a bare-metal system with compatible GPU drivers.
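The prerequisites above can be checked before installing. The sketch below is an assumption-laden preflight, not part of the source procedure: the helper names are hypothetical, version_ge relies on GNU sort -V, and the CUDA version shown in the nvidia-smi header is the highest version the driver supports, which is not the same as the installed toolkit version.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Assumes GNU sort, which supports -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Read nvidia-smi header text on stdin and print the "CUDA Version" value.
parse_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical preflight: confirm a driver is present and that it reports
# support for CUDA 12.4 or later.
preflight() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
        return 1
    fi
    cuda="$(nvidia-smi | parse_cuda_version)"
    if ! version_ge "${cuda:-0}" "12.4"; then
        echo "driver reports CUDA '${cuda:-unknown}', need 12.4 or later" >&2
        return 1
    fi
    echo "driver reports CUDA $cuda"
}
```

Calling preflight before the apt-get step gives an early failure on machines without a usable driver, rather than a failure at import time.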

Code Reference

Source Location

Item	Value
File	docs/getting_started/llm.md
Lines	L52-63
Repo	https://github.com/triton-inference-server/server

Signature

# System dependencies
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev git git-lfs

# TensorRT-LLM package installation
pip3 install tensorrt_llm==0.11.0 --extra-index-url https://pypi.nvidia.com

Import / Verification

python3 -c "import tensorrt_llm"

I/O Contract

Inputs

Name	Type	Description
Clean Python 3.10 environment	System	A system with Python 3.10 available or installable
CUDA 12.4+	System	NVIDIA CUDA toolkit version 12.4 or later installed
NVIDIA GPU drivers	System	Compatible NVIDIA GPU drivers installed and functional
Network access	System	Internet connectivity to access pypi.org and pypi.nvidia.com

Outputs

Name	Type	Description
tensorrt_llm package	Python package	Installed TensorRT-LLM Python module, verifiable via `python3 -c "import tensorrt_llm"`
openmpi-bin	System package	MPI runtime for multi-GPU coordination
git-lfs	System package	Git Large File Storage extension for downloading model weights
Model conversion scripts	Python scripts	Checkpoint conversion utilities included with TRT-LLM
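One way to confirm that the system-level outputs listed above are present is to check that their commands are on PATH. This is a minimal sketch; missing_tools is a hypothetical helper, and the command names in the usage note (mpirun from openmpi-bin, git-lfs from the git-lfs package) are what those Ubuntu packages normally provide.

```shell
# Hypothetical helper: print each named command that is not on PATH and
# return non-zero if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, missing_tools python3 pip3 mpirun git git-lfs prints nothing and returns success when every tool from the apt-get step is available.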

Usage Examples

Full installation on a clean system

# Step 1: Install system dependencies
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev git git-lfs

# Step 2: Install TensorRT-LLM with version pinning
pip3 install tensorrt_llm==0.11.0 --extra-index-url https://pypi.nvidia.com

# Step 3: Verify installation
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
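Step 3 prints the version for a human to read. In a scripted setup it may be useful to fail when the installed version differs from the pin. The sketch below reads the version from pip3 show rather than from the import, on the assumption that pip's metadata output is easier to parse; both function names are hypothetical.

```shell
# Read `pip3 show` output on stdin and print the value of its "Version:" field.
parse_pip_version() {
    awk '/^Version:/ { print $2; exit }'
}

# Hypothetical check: succeed only when the installed tensorrt_llm version
# equals the expected pin (defaults to the 0.11.0 pin used on this page).
check_trtllm_pin() {
    expected="${1:-0.11.0}"
    installed="$(pip3 show tensorrt_llm 2>/dev/null | parse_pip_version)"
    if [ "$installed" = "$expected" ]; then
        echo "tensorrt_llm $installed matches the pin"
    else
        echo "expected tensorrt_llm $expected, found '${installed:-none}'" >&2
        return 1
    fi
}
```

Running check_trtllm_pin 0.11.0 after Step 2 returns non-zero if the package is missing or a different version was resolved.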

Key Parameters

Parameter	Description	Example Value
`tensorrt_llm==0.11.0`	Version pin for reproducibility	0.11.0
`--extra-index-url`	NVIDIA's custom PyPI index URL	https://pypi.nvidia.com

Related Pages

Principle:Triton_inference_server_Server_TRT_LLM_Environment_Setup
Implementation:Triton_inference_server_Server_Git_LFS_Clone — Next step: downloading model weights
Implementation:Triton_inference_server_Server_Convert_Checkpoint — Requires this environment
Implementation:Triton_inference_server_Server_Trtllm_Build — Requires this environment
Environment:Triton_inference_server_Server_TRT_LLM_Deployment
