Environment: InternLM LMDeploy Build From Source
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Build_System |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Build environment with CMake, Ninja, the CUDA Toolkit, pybind11, and CUTLASS v3.9.2 for compiling the TurboMind C++/CUDA inference engine from source.
Description
Building LMDeploy from source is required when pre-built wheels are unavailable for a specific CUDA version or platform, or when developing custom kernels. The build system uses CMake with Ninja generator to compile the TurboMind C++ backend, which includes custom CUDA kernels for attention, GEMM, sampling, and quantization. The CUTLASS library (v3.9.2) is fetched automatically during build. Multi-GPU support via NCCL is enabled by default on Linux.
Usage
Use this environment when:
- Pre-built wheels are not available for your CUDA version.
- You need to modify TurboMind C++ kernels or add new model support.
- You are building Docker images for deployment.
- You need to target specific CUDA architectures not included in the default build.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 20.04+) | Windows build supported but without multi-GPU or NVTX |
| CUDA Toolkit | >= 11.0 | Determines which GPU architectures can be targeted |
| CMake | >= 3.18 | Required by cmake_build_extension |
| Ninja | >= 1.10 | Default generator on Linux (Makefile on Windows) |
| C++ Compiler | GCC >= 9 or compatible | Must support C++17 |
| Disk | 30GB+ SSD | Source + build artifacts + CUTLASS fetch |
| RAM | 16GB+ | Parallel CUDA compilation is memory-intensive |
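The minimum versions in the table can be checked before starting a long build. The sketch below is illustrative, not part of the repository; `parse_version`, `check_tool`, and `MINIMUMS` are hypothetical names:

```python
import re
import shutil
import subprocess

# Minimum versions from the requirements table above.
MINIMUMS = {"cmake": (3, 18), "ninja": (1, 10)}

def parse_version(text):
    """Extract the first X.Y[.Z] version number from a tool's --version output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version found in: {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)

def check_tool(name, minimum):
    """Return True if `name` is on PATH and `name --version` reports at least `minimum`."""
    if shutil.which(name) is None:
        return False
    out = subprocess.check_output([name, "--version"], text=True)
    return parse_version(out) >= minimum
```

Tuple comparison handles two- and three-component versions uniformly, so `(3, 22, 1) >= (3, 18)` works as expected.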
Dependencies
Build Dependencies
- `cmake` >= 3.18
- `pybind11` (Python C++ bindings)
- `cmake_build_extension` (setuptools CMake integration)
- `ninja` (fast build system)
NVIDIA Libraries (auto-resolved)
- `nvidia-nccl-cu{VERSION}` (multi-GPU communication)
- `nvidia-cuda-runtime-cu{VERSION}` (CUDA runtime)
- `nvidia-cublas-cu{VERSION}` (linear algebra)
- `nvidia-curand-cu{VERSION}` (random numbers)
Fetched During Build
- CUTLASS v3.9.2 (NVIDIA GPU kernel templates)
- Catch2 v3.6.0 (C++ testing framework)
- xgrammar (grammar-guided decoding)
Credentials
No credentials are required to build. However, outbound network access to GitHub is needed, since CUTLASS and the other build-time dependencies are fetched from GitHub; behind a firewall, allow this traffic or pre-download the dependencies.
Quick Install
```shell
# Clone repository
git clone https://github.com/InternLM/lmdeploy.git
cd lmdeploy

# Install build dependencies
pip install -r requirements/build.txt

# Build and install
pip install -e .

# Or specify CUDA compiler explicitly
CUDACXX=/usr/local/cuda/bin/nvcc pip install -e .

# Disable TurboMind build (Python-only)
DISABLE_TURBOMIND=1 pip install -e .

# Target specific device
LMDEPLOY_TARGET_DEVICE=cuda pip install -e .
```
Code Evidence
Build environment variables from `setup.py:13-14,39-41`:
```python
def get_target_device():
    return os.getenv('LMDEPLOY_TARGET_DEVICE', 'cuda')

CUDA_COMPILER = os.getenv('CUDACXX',
                          os.getenv('CMAKE_CUDA_COMPILER', 'nvcc'))
nvcc_output = subprocess.check_output(
    [CUDA_COMPILER, '--version'],
    stderr=subprocess.DEVNULL).decode()
CUDAVER, = re.search(r'release\s+(\d+).', nvcc_output).groups()
```
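The regex above captures only the major release number (the unescaped trailing `.` consumes the dot after it). Running it against a representative `nvcc --version` banner illustrates this; the banner text here is a typical example, not taken from the repository:

```python
import re

# Representative nvcc --version output (example banner).
sample = ("nvcc: NVIDIA (R) Cuda compiler driver\n"
          "Cuda compilation tools, release 12.4, V12.4.131")

# Same pattern as in setup.py: captures the major version only.
cudaver, = re.search(r'release\s+(\d+).', sample).groups()
```

So for CUDA 12.4 the build sees `CUDAVER == '12'`, which drives the NCCL packaging branch shown below.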
TurboMind disable check from `setup.py:133`:
```python
if get_target_device() == 'cuda' and not os.getenv(
        'DISABLE_TURBOMIND', '').lower() in (
            'yes', 'true', 'on', 't', '1'):
    # Build TurboMind C++ extension
    ext_modules = [cmake_build_extension.CMakeExtension(...)]
else:
    ext_modules = []
```
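The set of values that disables the TurboMind build can be mirrored in a small, case-insensitive helper; `turbomind_disabled` is a hypothetical name for illustration, not part of setup.py:

```python
import os

def turbomind_disabled(env=os.environ):
    """Mirror of the setup.py check: any of these values (in any case)
    disables the TurboMind C++ extension build."""
    return env.get('DISABLE_TURBOMIND', '').lower() in (
        'yes', 'true', 'on', 't', '1')
```

Note that any other value, including `0` or `no`, leaves the build enabled; only the listed truthy strings disable it.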
CUDA version-dependent NCCL packaging from `setup.py:42-55`:
```python
if int(CUDAVER) >= 13:
    return [
        f'nvidia-nccl-cu{CUDAVER}',
        'nvidia-cuda-runtime',
        'nvidia-cublas',
        'nvidia-curand',
    ]
else:
    return [
        f'nvidia-nccl-cu{CUDAVER}',
        f'nvidia-cuda-runtime-cu{CUDAVER}',
        f'nvidia-cublas-cu{CUDAVER}',
        f'nvidia-curand-cu{CUDAVER}',
    ]
```
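The branch above can be exercised in isolation as a standalone function; `nvidia_runtime_deps` is a hypothetical name. The key detail is that on CUDA 13+ only the NCCL package keeps its version suffix:

```python
def nvidia_runtime_deps(cuda_major: int):
    """Mirror of the setup.py packaging branch: CUDA 13+ uses generic
    package names for everything except NCCL."""
    if cuda_major >= 13:
        return [f'nvidia-nccl-cu{cuda_major}',
                'nvidia-cuda-runtime',
                'nvidia-cublas',
                'nvidia-curand']
    return [f'nvidia-nccl-cu{cuda_major}',
            f'nvidia-cuda-runtime-cu{cuda_major}',
            f'nvidia-cublas-cu{cuda_major}',
            f'nvidia-curand-cu{cuda_major}']
```

For example, CUDA 12 resolves to `nvidia-cublas-cu12`, while CUDA 13 resolves to plain `nvidia-cublas`.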
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `nvcc not found` | CUDA Toolkit not installed or not in PATH | Install CUDA Toolkit; set `CUDACXX` environment variable |
| `CMake Error: Could not find CUDAToolkit` | CMake cannot locate CUDA | Set `CMAKE_CUDA_COMPILER` to full path of `nvcc` |
| `ninja: build stopped: subcommand failed` | CUDA kernel compilation error | Check CUDA version compatibility; ensure sufficient RAM for parallel builds |
| CUTLASS fetch failure | Network issue fetching from GitHub | Pre-download CUTLASS and configure with the CMake option `FETCHCONTENT_FULLY_DISCONNECTED=ON` |
Compatibility Notes
- CUDA 13+: Uses generic NVIDIA package names (e.g., `nvidia-cuda-runtime`) instead of version-suffixed names.
- Windows (MSVC): SM80 and SM90a architectures are excluded. Multi-GPU (`BUILD_MULTI_GPU`) and NVTX (`USE_NVTX`) are disabled.
- aarch64/ARM: Targets SM72 and SM87 (Jetson) architectures only.
- CUTLASS: The build requires `CUTLASS_ENABLE_SM90_EXTENDED_MMA_SHAPES=ON` for Hopper GPU support.
- Debug builds: Use `CMAKE_BUILD_TYPE=Debug` via the `debug.sh` script for GDB debugging of TurboMind.