Environment: InternLM LMDeploy Build From Source
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Build_System |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Build environment with CMake, Ninja, the CUDA Toolkit, pybind11, and CUTLASS v3.9.2 for compiling the TurboMind C++/CUDA inference engine from source.
Description
Building LMDeploy from source is required when pre-built wheels are unavailable for a specific CUDA version or platform, or when developing custom kernels. The build system uses CMake with Ninja generator to compile the TurboMind C++ backend, which includes custom CUDA kernels for attention, GEMM, sampling, and quantization. The CUTLASS library (v3.9.2) is fetched automatically during build. Multi-GPU support via NCCL is enabled by default on Linux.
Usage
Use this environment when:
- Pre-built wheels are not available for your CUDA version.
- You need to modify TurboMind C++ kernels or add new model support.
- You are building Docker images for deployment.
- You need to target specific CUDA architectures not included in the default build.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 20.04+) | Windows build supported but without multi-GPU or NVTX |
| CUDA Toolkit | >= 11.0 | Determines which GPU architectures can be targeted |
| CMake | >= 3.18 | Required by cmake_build_extension |
| Ninja | >= 1.10 | Default generator on Linux (Makefile on Windows) |
| C++ Compiler | GCC >= 9 or compatible | Must support C++17 |
| Disk | 30GB+ SSD | Source + build artifacts + CUTLASS fetch |
| RAM | 16GB+ | Parallel CUDA compilation is memory-intensive |
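The minimum versions in the table can be checked before starting a long build. The sketch below is illustrative, not part of the repository; `parse_version`, `check_tool`, and `MINIMUMS` are hypothetical names:

```python
import re
import shutil
import subprocess

# Minimum versions from the requirements table above.
MINIMUMS = {"cmake": (3, 18), "ninja": (1, 10)}

def parse_version(text):
    """Extract the first X.Y[.Z] version number from a tool's --version output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version found in: {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)

def check_tool(name, minimum):
    """Return True if `name` is on PATH and `name --version` reports at least `minimum`."""
    if shutil.which(name) is None:
        return False
    out = subprocess.check_output([name, "--version"], text=True)
    return parse_version(out) >= minimum
```

Tuple comparison handles two- and three-component versions uniformly, so `(3, 22, 1) >= (3, 18)` works as expected.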
Dependencies
Build Dependencies
- `cmake` >= 3.18
- `pybind11` (Python C++ bindings)
- `cmake_build_extension` (setuptools CMake integration)
- `ninja` (fast build system)
NVIDIA Libraries (auto-resolved)
- `nvidia-nccl-cu{VERSION}` (multi-GPU communication)
- `nvidia-cuda-runtime-cu{VERSION}` (CUDA runtime)
- `nvidia-cublas-cu{VERSION}` (linear algebra)
- `nvidia-curand-cu{VERSION}` (random numbers)
Fetched During Build
- CUTLASS v3.9.2 (NVIDIA GPU kernel templates)
- Catch2 v3.6.0 (C++ testing framework)
- xgrammar (grammar-guided decoding)
Credentials
No credentials are required to build. However, outbound network access to GitHub is needed, since CUTLASS and the other build-time dependencies are fetched from GitHub; behind a firewall, allow this traffic or pre-download the dependencies.
Quick Install
```shell
# Clone repository
git clone https://github.com/InternLM/lmdeploy.git
cd lmdeploy

# Install build dependencies
pip install -r requirements/build.txt

# Build and install
pip install -e .

# Or specify CUDA compiler explicitly
CUDACXX=/usr/local/cuda/bin/nvcc pip install -e .

# Disable TurboMind build (Python-only)
DISABLE_TURBOMIND=1 pip install -e .

# Target specific device
LMDEPLOY_TARGET_DEVICE=cuda pip install -e .
```
Code Evidence
Build environment variables from `setup.py:13-14,39-41`:
```python
def get_target_device():
    return os.getenv('LMDEPLOY_TARGET_DEVICE', 'cuda')

CUDA_COMPILER = os.getenv('CUDACXX',
                          os.getenv('CMAKE_CUDA_COMPILER', 'nvcc'))
nvcc_output = subprocess.check_output(
    [CUDA_COMPILER, '--version'],
    stderr=subprocess.DEVNULL).decode()
CUDAVER, = re.search(r'release\s+(\d+).', nvcc_output).groups()
```
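The regex above captures only the major release number (the unescaped trailing `.` consumes the dot after it). Running it against a representative `nvcc --version` banner illustrates this; the banner text here is a typical example, not taken from the repository:

```python
import re

# Representative nvcc --version output (example banner).
sample = ("nvcc: NVIDIA (R) Cuda compiler driver\n"
          "Cuda compilation tools, release 12.4, V12.4.131")

# Same pattern as in setup.py: captures the major version only.
cudaver, = re.search(r'release\s+(\d+).', sample).groups()
```

So for CUDA 12.4 the build sees `CUDAVER == '12'`, which drives the NCCL packaging branch shown below.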
TurboMind disable check from `setup.py:133`:
```python
if get_target_device() == 'cuda' and not os.getenv(
        'DISABLE_TURBOMIND', '').lower() in (
            'yes', 'true', 'on', 't', '1'):
    # Build TurboMind C++ extension
    ext_modules = [cmake_build_extension.CMakeExtension(...)]
else:
    ext_modules = []
```
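The set of values that disables the TurboMind build can be mirrored in a small, case-insensitive helper; `turbomind_disabled` is a hypothetical name for illustration, not part of setup.py:

```python
import os

def turbomind_disabled(env=os.environ):
    """Mirror of the setup.py check: any of these values (in any case)
    disables the TurboMind C++ extension build."""
    return env.get('DISABLE_TURBOMIND', '').lower() in (
        'yes', 'true', 'on', 't', '1')
```

Note that any other value, including `0` or `no`, leaves the build enabled; only the listed truthy strings disable it.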
CUDA version-dependent NCCL packaging from `setup.py:42-55`:
```python
if int(CUDAVER) >= 13:
    return [
        f'nvidia-nccl-cu{CUDAVER}',
        'nvidia-cuda-runtime',
        'nvidia-cublas',
        'nvidia-curand',
    ]
else:
    return [
        f'nvidia-nccl-cu{CUDAVER}',
        f'nvidia-cuda-runtime-cu{CUDAVER}',
        f'nvidia-cublas-cu{CUDAVER}',
        f'nvidia-curand-cu{CUDAVER}',
    ]
```
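The branch above can be exercised in isolation as a standalone function; `nvidia_runtime_deps` is a hypothetical name. The key detail is that on CUDA 13+ only the NCCL package keeps its version suffix:

```python
def nvidia_runtime_deps(cuda_major: int):
    """Mirror of the setup.py packaging branch: CUDA 13+ uses generic
    package names for everything except NCCL."""
    if cuda_major >= 13:
        return [f'nvidia-nccl-cu{cuda_major}',
                'nvidia-cuda-runtime',
                'nvidia-cublas',
                'nvidia-curand']
    return [f'nvidia-nccl-cu{cuda_major}',
            f'nvidia-cuda-runtime-cu{cuda_major}',
            f'nvidia-cublas-cu{cuda_major}',
            f'nvidia-curand-cu{cuda_major}']
```

For example, CUDA 12 resolves to `nvidia-cublas-cu12`, while CUDA 13 resolves to plain `nvidia-cublas`.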
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `nvcc not found` | CUDA Toolkit not installed or not in PATH | Install CUDA Toolkit; set `CUDACXX` environment variable |
| `CMake Error: Could not find CUDAToolkit` | CMake cannot locate CUDA | Set `CMAKE_CUDA_COMPILER` to full path of `nvcc` |
| `ninja: build stopped: subcommand failed` | CUDA kernel compilation error | Check CUDA version compatibility; ensure sufficient RAM for parallel builds |
| CUTLASS fetch failure | Network issue fetching from GitHub | Pre-download CUTLASS and configure with the CMake option `FETCHCONTENT_FULLY_DISCONNECTED=ON` |
Compatibility Notes
- CUDA 13+: Uses generic NVIDIA package names (e.g., `nvidia-cuda-runtime`) instead of version-suffixed names.
- Windows (MSVC): SM80 and SM90a architectures are excluded. Multi-GPU (`BUILD_MULTI_GPU`) and NVTX (`USE_NVTX`) are disabled.
- aarch64/ARM: Targets SM72 and SM87 (Jetson) architectures only.
- CUTLASS: The build requires `CUTLASS_ENABLE_SM90_EXTENDED_MMA_SHAPES=ON` for Hopper GPU support.
- Debug builds: Use `CMAKE_BUILD_TYPE=Debug` via the `debug.sh` script for GDB debugging of TurboMind.