Implementation:Triton inference server Server Pip Install Tensorrt LLM

From Leeroopedia

Metadata

Field | Value
Type | Implementation
Workflow | LLM_Deployment_With_TRT_LLM
Repo | Triton_inference_server_Server
Source | docs/getting_started/llm.md:L52-63
Domains | NLP, LLM_Deployment, Environment_Setup
Knowledge_Sources | TRT-LLM Docs (https://nvidia.github.io/TensorRT-LLM/); Triton Server repo (https://github.com/triton-inference-server/server)
Implements | Principle:Triton_inference_server_Server_TRT_LLM_Environment_Setup
2026-02-13 17:00 GMT

Overview

A concrete pip installation procedure for the TensorRT-LLM Python package and its system dependencies. This page lists the exact commands needed to set up a working TRT-LLM environment on a clean system.

Description

This implementation gives the step-by-step procedure for installing TensorRT-LLM and all required system-level dependencies. The process has two phases:

  1. System dependency installation: install OS-level packages with apt-get, including Python 3.10 and pip, Open MPI (runtime and development headers) for multi-GPU support, and Git with git-lfs for handling large model files.
  2. Python package installation: install the tensorrt_llm package from NVIDIA's PyPI index with an explicit version pin so that the same TensorRT-LLM release is installed every time.

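The --extra-index-url flag tells pip to consult NVIDIA's package index (https://pypi.nvidia.com) in addition to the default PyPI index. pip gathers candidate versions from both indexes, which is how it finds the pre-built, CUDA-enabled wheels that NVIDIA publishes there for TensorRT-LLM and its dependencies.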
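The index can also be supplied once per shell session instead of on every command. pip reads each long option from a matching `PIP_<OPTION>` environment variable, so the minimal sketch below, which assumes a POSIX shell, has the same effect as passing `--extra-index-url` explicitly:

```shell
# pip reads --extra-index-url from PIP_EXTRA_INDEX_URL, so every pip3 call in this
# shell session also consults NVIDIA's index without the flag on the command line.
export PIP_EXTRA_INDEX_URL="https://pypi.nvidia.com"

# Print the value pip will pick up.
echo "extra index: ${PIP_EXTRA_INDEX_URL}"
```

With the variable exported, `pip3 install tensorrt_llm==0.11.0` resolves against both indexes. The explicit flag used on this page is easier to audit in scripts; the variable is convenient when building container images.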

Usage

Run these commands on a system where the NVIDIA GPU driver and CUDA 12.4 or later are already installed, typically inside an NVIDIA NGC container or on a bare-metal host with compatible GPU drivers. apt-get must run as root; on a bare-metal host, prefix it with sudo.
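The prerequisites can be checked from the shell before installing anything. This is a sketch under stated assumptions: the helper names `version_ge`, `cuda_release` and `preflight` are hypothetical, it assumes bash with GNU `sort -V`, and it assumes `nvcc --version` reports a line containing `release X.Y`, as current CUDA toolkits do.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; not part of the Triton or TRT-LLM docs.

# version_ge A B: succeed if version A >= version B (uses GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# cuda_release: read `nvcc --version` output on stdin, print the "X.Y" release.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# preflight: report missing prerequisites; return non-zero if any look absent.
preflight() {
  local ok=0 rel
  command -v nvidia-smi >/dev/null 2>&1 || { echo "missing: nvidia-smi (GPU driver)"; ok=1; }
  if command -v nvcc >/dev/null 2>&1; then
    rel="$(nvcc --version | cuda_release)"
    version_ge "$rel" 12.4 || { echo "CUDA toolkit ${rel:-unknown} is older than 12.4"; ok=1; }
  else
    echo "missing: nvcc (CUDA toolkit)"
    ok=1
  fi
  return "$ok"
}
```

Running `preflight && echo ready` prints only the problems it finds. It does not check the driver version itself; the header printed by `nvidia-smi` is the place to look for that.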

Code Reference

Source Location

Item | Value
File | docs/getting_started/llm.md
Lines | L52-63
Repo | https://github.com/triton-inference-server/server

Signature

# System dependencies
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev git git-lfs

# TensorRT-LLM package installation
pip3 install tensorrt_llm==0.11.0 --extra-index-url https://pypi.nvidia.com

Import / Verification

python3 -c "import tensorrt_llm"
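The import check can be combined with the version pin so a script fails when a different release was resolved. A minimal sketch, assuming bash; `verify_pin` and the `PYTHON` override are hypothetical names, and only the last stdout line is compared in case importing the package prints a banner first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify_pin EXPECTED
# Imports tensorrt_llm (fails if the wheel or its CUDA libraries are missing),
# then compares the reported __version__ with EXPECTED.
# The interpreter defaults to python3 and can be overridden with PYTHON.
verify_pin() {
  local expected="$1" py="${PYTHON:-python3}" out got
  if ! out="$("$py" -c 'import tensorrt_llm; print(tensorrt_llm.__version__)')"; then
    echo "import tensorrt_llm failed" >&2
    return 1
  fi
  # Keep only the last stdout line; earlier lines may be log output from the import.
  got="$(printf '%s\n' "$out" | tail -n 1)"
  if [ "$got" != "$expected" ]; then
    echo "tensorrt_llm $got installed, expected $expected" >&2
    return 1
  fi
  echo "tensorrt_llm $got OK"
}
```

On a correctly provisioned system, `verify_pin 0.11.0` prints `tensorrt_llm 0.11.0 OK`; any other outcome means the import failed or a different version was installed.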

I/O Contract

Inputs

Name | Type | Description
Clean Python 3.10 environment | System | A system with Python 3.10 available or installable
CUDA 12.4+ | System | NVIDIA CUDA toolkit version 12.4 or later installed
NVIDIA GPU drivers | System | Compatible NVIDIA GPU drivers installed and functional
Network access | System | Internet connectivity to reach pypi.org and pypi.nvidia.com

Outputs

Name | Type | Description
tensorrt_llm package | Python package | Installed TensorRT-LLM Python module, verifiable via python3 -c "import tensorrt_llm"
openmpi-bin | System package | MPI runtime for multi-GPU coordination
git-lfs | System package | Git Large File Storage extension for downloading model weights
Model conversion scripts | Python scripts | Checkpoint conversion utilities included with TRT-LLM
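The system-package outputs can be confirmed by looking for the commands they provide: `mpirun` from openmpi-bin, plus `git` and `git-lfs`. A small sketch, assuming a POSIX shell; `missing_tools` is a hypothetical helper name.

```shell
# Hypothetical helper: print each named command that is not on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the commands installed by the apt-get step on this page.
missing="$(missing_tools mpirun git git-lfs)"
if [ -n "$missing" ]; then
  echo "missing: $missing"
else
  echo "runtime tools present"
fi
```

If git-lfs has not been initialized for the current user, `git lfs install` registers its filters in Git's configuration; without them, clones of model repositories contain small pointer files instead of the actual weights.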

Usage Examples

Full installation on a clean system

# Step 1: Install system dependencies
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev git git-lfs

# Step 2: Install TensorRT-LLM with version pinning
pip3 install tensorrt_llm==0.11.0 --extra-index-url https://pypi.nvidia.com

# Step 3: Verify installation
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"

Key Parameters

Parameter | Description | Example Value
tensorrt_llm==0.11.0 | Version pin for reproducibility | 0.11.0
--extra-index-url | NVIDIA's custom PyPI index URL | https://pypi.nvidia.com
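Both parameters can be recorded in a pip requirements file so the same resolution can be repeated later; pip accepts `--extra-index-url` lines inside requirements files. A minimal sketch, with `requirements-trtllm.txt` as a hypothetical file name:

```shell
# Record the extra index and the version pin together in one requirements file.
cat > requirements-trtllm.txt <<'EOF'
--extra-index-url https://pypi.nvidia.com
tensorrt_llm==0.11.0
EOF

# Later:  pip3 install -r requirements-trtllm.txt
cat requirements-trtllm.txt
```

The pin fixes only tensorrt_llm itself; its transitive dependencies still resolve at install time. Running `pip3 freeze` after a successful install is one way to capture the complete set.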
