Environment:InternLM Lmdeploy Build From Source

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Build_System
Last Updated: 2026-02-07 15:00 GMT

Overview

A build environment with CMake, Ninja, the CUDA Toolkit, pybind11, and CUTLASS v3.9.2 for compiling the TurboMind C++/CUDA inference engine from source.

Description

Building LMDeploy from source is required when pre-built wheels are unavailable for a specific CUDA version or platform, or when developing custom kernels. The build system uses CMake with the Ninja generator to compile the TurboMind C++ backend, which includes custom CUDA kernels for attention, GEMM, sampling, and quantization. The CUTLASS library (v3.9.2) is fetched automatically during the build. Multi-GPU support via NCCL is enabled by default on Linux.

Usage

Use this environment when:

  • Pre-built wheels are not available for your CUDA version.
  • You need to modify TurboMind C++ kernels or add new model support.
  • You are building Docker images for deployment.
  • You need to target specific CUDA architectures not included in the default build.

System Requirements

| Category | Requirement | Notes |
| --- | --- | --- |
| OS | Linux (Ubuntu 20.04+) | Windows build supported, but without multi-GPU or NVTX |
| CUDA Toolkit | >= 11.0 | Determines which GPU architectures can be targeted |
| CMake | >= 3.18 | Required by `cmake_build_extension` |
| Ninja | >= 1.10 | Default generator on Linux (Makefile on Windows) |
| C++ Compiler | GCC >= 9 or compatible | Must support C++17 |
| Disk | 30 GB+ SSD | Source + build artifacts + CUTLASS fetch |
| RAM | 16 GB+ | Parallel CUDA compilation is memory-intensive |

Dependencies

Build Dependencies

  • `cmake` >= 3.18
  • `pybind11` (Python C++ bindings)
  • `cmake_build_extension` (setuptools CMake integration)
  • `ninja` (fast build system)

NVIDIA Libraries (auto-resolved)

  • `nvidia-nccl-cu{VERSION}` (multi-GPU communication)
  • `nvidia-cuda-runtime-cu{VERSION}` (CUDA runtime)
  • `nvidia-cublas-cu{VERSION}` (linear algebra)
  • `nvidia-curand-cu{VERSION}` (random numbers)

Fetched During Build

  • CUTLASS v3.9.2 (NVIDIA GPU kernel templates)
  • Catch2 v3.6.0 (C++ testing framework)
  • xgrammar (grammar-guided decoding)

Credentials

No credentials are required for building. If you are behind a firewall, outbound GitHub access is needed, since CUTLASS and other third-party sources are fetched from GitHub during the build.

Quick Install

# Clone repository
git clone https://github.com/InternLM/lmdeploy.git
cd lmdeploy

# Install build dependencies
pip install -r requirements/build.txt

# Build and install
pip install -e .

# Or specify CUDA compiler explicitly
CUDACXX=/usr/local/cuda/bin/nvcc pip install -e .

# Disable TurboMind build (Python-only)
DISABLE_TURBOMIND=1 pip install -e .

# Target specific device
LMDEPLOY_TARGET_DEVICE=cuda pip install -e .

Code Evidence

Build environment variables from `setup.py:13-14,39-41`:

def get_target_device():
    return os.getenv('LMDEPLOY_TARGET_DEVICE', 'cuda')

CUDA_COMPILER = os.getenv('CUDACXX',
    os.getenv('CMAKE_CUDA_COMPILER', 'nvcc'))
nvcc_output = subprocess.check_output(
    [CUDA_COMPILER, '--version'],
    stderr=subprocess.DEVNULL).decode()
CUDAVER, = re.search(r'release\s+(\d+).', nvcc_output).groups()
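To see what that regex actually extracts, here is a small self-contained sketch using a hard-coded sample of `nvcc --version` output (the sample text is illustrative, not captured from a real machine):

```python
import re

# Sample tail of `nvcc --version` output (hard-coded here; setup.py
# captures it live via subprocess).
nvcc_output = ('nvcc: NVIDIA (R) Cuda compiler driver\n'
               'Cuda compilation tools, release 12.4, V12.4.131\n')

# Same pattern as setup.py: capture the CUDA *major* version after "release".
CUDAVER, = re.search(r'release\s+(\d+).', nvcc_output).groups()
print(CUDAVER)  # -> 12
```

Note that only the major version (`12`) is captured; the trailing `.` in the pattern consumes the dot, and the minor version is discarded. This major version is what selects the `-cu{VERSION}` suffix on the NVIDIA runtime packages below.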

TurboMind disable check from `setup.py:133`:

if get_target_device() == 'cuda' and not os.getenv(
        'DISABLE_TURBOMIND', '').lower() in (
        'yes', 'true', 'on', 't', '1'):
    # Build TurboMind C++ extension
    ext_modules = [cmake_build_extension.CMakeExtension(...)]
else:
    ext_modules = []
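The same gate can be isolated as a small helper for experimentation. This is a sketch that mirrors the check above; `turbomind_enabled` is a hypothetical name for illustration, not a function in `setup.py`:

```python
def turbomind_enabled(env):
    """Mirror of the setup.py gate: TurboMind is built only when the
    target device is 'cuda' and DISABLE_TURBOMIND is not a truthy string."""
    target = env.get('LMDEPLOY_TARGET_DEVICE', 'cuda')
    disabled = env.get('DISABLE_TURBOMIND', '').lower() in (
        'yes', 'true', 'on', 't', '1')
    return target == 'cuda' and not disabled

print(turbomind_enabled({}))                                    # -> True
print(turbomind_enabled({'DISABLE_TURBOMIND': '1'}))            # -> False
print(turbomind_enabled({'LMDEPLOY_TARGET_DEVICE': 'ascend'}))  # -> False
```

One consequence of the truthy-string list: values like `DISABLE_TURBOMIND=0` or `DISABLE_TURBOMIND=no` do *not* disable the build, since only the listed strings count as true.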

CUDA version-dependent NCCL packaging from `setup.py:42-55`:

if int(CUDAVER) >= 13:
    return [
        f'nvidia-nccl-cu{CUDAVER}',
        'nvidia-cuda-runtime',
        'nvidia-cublas',
        'nvidia-curand',
    ]
else:
    return [
        f'nvidia-nccl-cu{CUDAVER}',
        f'nvidia-cuda-runtime-cu{CUDAVER}',
        f'nvidia-cublas-cu{CUDAVER}',
        f'nvidia-curand-cu{CUDAVER}',
    ]
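Wrapped as a standalone function (the name `nvidia_runtime_packages` is a hypothetical label for this sketch), the branch produces the following package lists:

```python
def nvidia_runtime_packages(cudaver: int) -> list:
    """Mirror of the setup.py branch above: on CUDA 13+ the -cu{N}
    suffix is dropped from every package except NCCL."""
    if cudaver >= 13:
        return [f'nvidia-nccl-cu{cudaver}', 'nvidia-cuda-runtime',
                'nvidia-cublas', 'nvidia-curand']
    return [f'nvidia-nccl-cu{cudaver}',
            f'nvidia-cuda-runtime-cu{cudaver}',
            f'nvidia-cublas-cu{cudaver}',
            f'nvidia-curand-cu{cudaver}']

print(nvidia_runtime_packages(12)[1])  # -> nvidia-cuda-runtime-cu12
print(nvidia_runtime_packages(13)[1])  # -> nvidia-cuda-runtime
```

This matches the compatibility note below: CUDA 13+ environments depend on the generic package names, while CUDA 11/12 environments pin the version-suffixed wheels.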

Common Errors

| Error Message | Cause | Solution |
| --- | --- | --- |
| `nvcc not found` | CUDA Toolkit not installed or not in PATH | Install the CUDA Toolkit; set the `CUDACXX` environment variable |
| `CMake Error: Could not find CUDAToolkit` | CMake cannot locate CUDA | Set `CMAKE_CUDA_COMPILER` to the full path of `nvcc` |
| `ninja: build stopped: subcommand failed` | CUDA kernel compilation error | Check CUDA version compatibility; ensure sufficient RAM for parallel builds |
| CUTLASS fetch failure | Network issue fetching from GitHub | Set `FETCHCONTENT_FULLY_DISCONNECTED=ON` and pre-download CUTLASS |

Compatibility Notes

  • CUDA 13+: Uses generic NVIDIA package names (e.g., `nvidia-cuda-runtime`) instead of version-suffixed names.
  • Windows (MSVC): SM80 and SM90a architectures are excluded. Multi-GPU (`BUILD_MULTI_GPU`) and NVTX (`USE_NVTX`) are disabled.
  • aarch64/ARM: Targets SM72 and SM87 (Jetson) architectures only.
  • CUTLASS: The build requires `CUTLASS_ENABLE_SM90_EXTENDED_MMA_SHAPES=ON` for Hopper GPU support.
  • Debug builds: Use `CMAKE_BUILD_TYPE=Debug` via the `debug.sh` script for GDB debugging of TurboMind.
