
Environment: Hpcaitech ColossalAI CUDA GPU Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Deep_Learning
Last Updated: 2026-02-09 03:00 GMT

Overview

ColossalAI requires an NVIDIA CUDA-capable GPU environment with a matching CUDA toolkit and PyTorch installation to build and run its distributed training framework.

Description

ColossalAI is a distributed deep learning system that relies heavily on NVIDIA CUDA for GPU-accelerated computation. The environment enforces strict compatibility between the system CUDA toolkit version and the CUDA version that PyTorch was compiled against. At build time, ColossalAI compiles custom CUDA kernel extensions (controlled by the BUILD_EXT environment variable) that target specific GPU compute capabilities ranging from Pascal (6.x) through Ampere (8.x). At runtime, the framework initializes with the NCCL communication backend and sets CUDA_DEVICE_MAX_CONNECTIONS to 1 for deterministic inter-device communication. Windows is not supported; the project raises a RuntimeError directing users to WSL if a Windows platform is detected.
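The major/minor matching rule described above can be sketched as a small standalone check. This is a hypothetical helper for illustration, not ColossalAI's actual implementation (the real logic lives in extensions/utils.py and operates on parsed version objects):

```python
import warnings

def check_cuda_match(system_version: str, torch_version: str) -> None:
    """Mirror the rule: a major-version mismatch is fatal, a minor one warns."""
    sys_major, sys_minor = (int(x) for x in system_version.split(".")[:2])
    pt_major, pt_minor = (int(x) for x in torch_version.split(".")[:2])
    if sys_major != pt_major:
        raise RuntimeError(
            f"System CUDA {system_version} != PyTorch CUDA {torch_version}"
        )
    if sys_minor != pt_minor:
        warnings.warn(
            f"Minor CUDA version mismatch: system {system_version}, "
            f"PyTorch {torch_version}; proceeding anyway."
        )
```

Under this rule, checking "12.4" against "12.1" only warns, while "11.8" against "12.1" raises.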

Usage

Use this environment whenever deploying or developing against ColossalAI for distributed training, reinforcement learning from human feedback (RLHF), or large-model inference. This environment must be satisfied before any ColossalAI Implementation page (such as Booster, SFTTrainer, or DPOTrainer) can function correctly.

System Requirements

| Requirement | Value | Notes |
|---|---|---|
| Operating System | Linux | Windows is explicitly unsupported; a RuntimeError is raised suggesting WSL (see setup.py:19-20) |
| Python | >= 3.6 | Declared via python_requires in setup.py:136 |
| NVIDIA GPU | Compute Capability >= 6.0 | Pascal (6.x), Volta (7.0), Turing (7.5), and Ampere (8.0, 8.6) are supported |
| CUDA Toolkit | Must match PyTorch CUDA version | Major version must match exactly; a minor version mismatch produces a warning (see extensions/utils.py:84-101) |
| Communication Backend | NCCL | Default backend set in colossalai/initialize.py:25 and colossalai/accelerator/cuda_accelerator.py:19 |
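The compute-capability floor in the table reduces to a tuple comparison. The predicate below is illustrative only; on a live GPU, torch.cuda.get_device_capability() supplies the real (major, minor) tuple:

```python
MIN_COMPUTE_CAPABILITY = (6, 0)  # Pascal and newer

def is_supported_gpu(capability):
    """True if the (major, minor) compute capability meets ColossalAI's floor."""
    return tuple(capability) >= MIN_COMPUTE_CAPABILITY

# Examples: Pascal P100 -> (6, 0), Volta V100 -> (7, 0), Ampere A100 -> (8, 0)
```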

Dependencies

System Packages

  • NVIDIA Driver compatible with the installed CUDA toolkit version
  • CUDA Toolkit with nvcc compiler (path exposed via CUDA_HOME environment variable)
  • NCCL library for multi-GPU and multi-node communication

Python Packages

| Package | Version Constraint | Source |
|---|---|---|
| torch | >= 2.2.0, <= 2.5.1 | requirements/requirements.txt |
| transformers | == 4.51.3 | requirements/requirements.txt |
| peft | >= 0.7.1, <= 0.13.2 | requirements/requirements.txt |

Note: The CUDA extension build code in extensions/cuda_extension.py:13-15 defines a legacy minimum of PyTorch 1.10, but the project-level requirements/requirements.txt enforces torch >= 2.2.0 as the effective minimum.
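The effective torch constraint can be checked with a plain version-tuple comparison. This is a sketch under the assumption of simple X.Y.Z version strings; in practice pip resolves the requirements file:

```python
def parse_version(v):
    """Parse a plain 'X.Y.Z' version string into a comparable tuple of ints."""
    return tuple(int(part) for part in v.split("."))

def torch_in_range(v, low="2.2.0", high="2.5.1"):
    """True if torch version v satisfies >= 2.2.0, <= 2.5.1."""
    return parse_version(low) <= parse_version(v) <= parse_version(high)
```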

Credentials

ColossalAI uses several environment variables to control CUDA behavior. No API keys or secret credentials are required, but these variables must be set correctly.

| Variable | Required | Description |
|---|---|---|
| CUDA_HOME | Yes (for extension builds) | Path to the CUDA toolkit installation (e.g., /usr/local/cuda). Checked in extensions/cuda_extension.py:38-46 during CUDA extension compilation. |
| BUILD_EXT | No | Set to 1 to build CUDA extensions ahead of time during pip install. When set, torch must be importable at install time (see setup.py:16, 70-93). |
| FORCE_CUDA | No | When set, forces CUDA extension support even if torch.cuda.is_available() returns False. Useful for cross-compilation (see extensions/cuda_extension.py:26-36). |
| TORCH_CUDA_ARCH_LIST | No | Semicolon-separated list of CUDA compute capabilities to target (e.g., 7.0;7.5;8.0). If unset, ColossalAI auto-detects from the current GPU (see extensions/utils.py:154-190). |
| CUDA_DEVICE_MAX_CONNECTIONS | Auto-set | Automatically set to 1 by colossalai/initialize.py:10 at framework startup. |
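A hypothetical helper that assembles these variables for an ahead-of-time build; the default path and arch list are illustrative values, not defaults ColossalAI ships:

```python
import os

def build_env(cuda_home="/usr/local/cuda", arch_list="7.0;7.5;8.0"):
    """Return an environment dict for `BUILD_EXT=1 pip install colossalai`."""
    env = dict(os.environ)
    env["CUDA_HOME"] = cuda_home              # where the build finds nvcc
    env["BUILD_EXT"] = "1"                    # compile extensions at install time
    env["TORCH_CUDA_ARCH_LIST"] = arch_list   # skip per-GPU auto-detection
    return env

# Usage sketch:
# subprocess.run(["pip", "install", "colossalai"], env=build_env(), check=True)
```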

Quick Install

# Verify NVIDIA GPU and driver
nvidia-smi

# Set CUDA_HOME if not already configured
export CUDA_HOME=/usr/local/cuda

# Install ColossalAI, compiling CUDA extensions ahead of time (torch must already be installed)
BUILD_EXT=1 pip install colossalai

# Or install without ahead-of-time extension compilation (extensions build on first use)
pip install colossalai

# Install pinned dependencies
pip install "torch>=2.2.0,<=2.5.1" "transformers==4.51.3" "peft>=0.7.1,<=0.13.2"

Code Evidence

Windows platform guard (setup.py:19-20):

if platform.system() == "Windows":
    raise RuntimeError("Windows is not supported. Please use WSL.")

CUDA availability and FORCE_CUDA check (extensions/cuda_extension.py:26-36):

def is_available(self) -> bool:
    # Check if CUDA is available
    try:
        import torch
        cuda_available = torch.cuda.is_available()
    except ImportError:
        cuda_available = False

    if not cuda_available and os.environ.get("FORCE_CUDA"):
        cuda_available = True
    return cuda_available

System CUDA and PyTorch CUDA version match (extensions/utils.py:84-101):

def check_system_pytorch_cuda_match(cuda_dir):
    system_cuda_version = get_cuda_version_from_exec(cuda_dir)
    # note: torch.version.cuda is a string (e.g. "12.1"); the real code parses
    # both values into version objects before comparing .major/.minor
    torch_cuda_version = torch.version.cuda
    # major version must match
    if system_cuda_version.major != torch_cuda_version.major:
        raise RuntimeError(
            f"System CUDA {system_cuda_version} != PyTorch CUDA {torch_cuda_version}"
        )
    # minor version mismatch is a warning
    if system_cuda_version.minor != torch_cuda_version.minor:
        warnings.warn(...)

CUDA architecture list setup (extensions/utils.py:154-190):

def set_cuda_arch_list(cuda_dir):
    # Supports Pascal (6.x), Volta (7.0), Turing (7.5), Ampere (8.0, 8.6)
    # Requires compute capability >= 6.0
    ...

Common Errors

| Error | Cause | Solution |
|---|---|---|
| RuntimeError: Windows is not supported. Please use WSL. | Running setup.py on a native Windows platform | Use Windows Subsystem for Linux (WSL) with an Ubuntu distribution and NVIDIA CUDA drivers for WSL |
| RuntimeError: System CUDA version does not match PyTorch CUDA version | The system CUDA toolkit has a different major version than the CUDA version PyTorch was compiled with | Install a CUDA toolkit whose major version matches PyTorch's CUDA version (check with python -c "import torch; print(torch.version.cuda)") |
| RuntimeError: PyTorch version is too old | Installed PyTorch version is below the minimum required (2.2.0 per requirements, or 1.10 per legacy extension code) | Upgrade PyTorch: pip install "torch>=2.2.0,<=2.5.1" |
| CUDA_HOME environment variable is not set | CUDA_HOME is not defined and the build system cannot locate nvcc | Set export CUDA_HOME=/usr/local/cuda (or the correct path to your CUDA installation) |
| No CUDA runtime is found | No NVIDIA GPU detected and FORCE_CUDA is not set | Ensure an NVIDIA GPU is available and drivers are installed, or set export FORCE_CUDA=1 for cross-compilation |
| CUDA extension build fails with unsupported architecture | Target GPU has compute capability below 6.0 | ColossalAI requires Pascal-generation (compute capability 6.0) or newer GPUs; upgrade hardware or set TORCH_CUDA_ARCH_LIST to a supported value |
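The table above can be folded into a small preflight check. This sketch takes the observed facts as plain inputs so it runs anywhere; the messages paraphrase, rather than quote, ColossalAI's actual errors:

```python
def preflight(platform_name, env, cuda_available, capability=None):
    """Return a list of problems that would block a ColossalAI install."""
    problems = []
    if platform_name == "Windows":
        problems.append("Windows is unsupported; use WSL.")
    if env.get("BUILD_EXT") == "1" and "CUDA_HOME" not in env:
        problems.append("CUDA_HOME must be set for ahead-of-time builds.")
    if not cuda_available and not env.get("FORCE_CUDA"):
        problems.append("No CUDA runtime found; set FORCE_CUDA=1 to cross-compile.")
    if capability is not None and tuple(capability) < (6, 0):
        problems.append("GPU compute capability below 6.0 is unsupported.")
    return problems
```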

Compatibility Notes

  • Python version: The python_requires field specifies >= 3.6, but practical compatibility depends on the PyTorch version installed. PyTorch >= 2.2.0 itself requires Python >= 3.8.
  • CUDA architecture: Only NVIDIA GPUs with compute capability >= 6.0 are supported. This includes Pascal (GTX 10xx, P100), Volta (V100), Turing (RTX 20xx, T4), and Ampere (RTX 30xx, A100) generation cards.
  • NCCL backend: The default communication backend is NCCL, which requires all participating nodes to have NVIDIA GPUs. Alternative backends are not configured by default.
  • CUDA version matching: The system CUDA toolkit major version must match the PyTorch CUDA major version exactly. A minor version mismatch produces a warning but does not prevent operation.
  • Ahead-of-time vs just-in-time compilation: Setting BUILD_EXT=1 compiles CUDA extensions during installation (requires torch to be already installed). Without this flag, extensions are compiled on first use via just-in-time (JIT) compilation.
  • CUDA_DEVICE_MAX_CONNECTIONS: This environment variable is automatically set to 1 at ColossalAI initialization, which may affect other CUDA applications running in the same process.
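A TORCH_CUDA_ARCH_LIST value such as 7.0;7.5;8.0 can be validated against the 6.0 floor before starting a build. This parser is a hypothetical helper, not part of ColossalAI:

```python
def parse_arch_list(value):
    """Parse a semicolon-separated arch list, rejecting anything below 6.0."""
    archs = []
    for item in value.split(";"):
        major, minor = (int(x) for x in item.strip().split("."))
        if (major, minor) < (6, 0):
            raise ValueError(f"Compute capability {item} is below the 6.0 floor")
        archs.append((major, minor))
    return archs
```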
