Environment:Hpcaitech ColossalAI CUDA GPU Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
ColossalAI requires an NVIDIA CUDA-capable GPU environment with a matching CUDA toolkit and PyTorch installation to build and run its distributed training framework.
Description
ColossalAI is a distributed deep learning system that relies heavily on NVIDIA CUDA for GPU-accelerated computation. The environment enforces strict compatibility between the system CUDA toolkit version and the CUDA version that PyTorch was compiled against. At build time, ColossalAI compiles custom CUDA kernel extensions (controlled by the BUILD_EXT environment variable) that target specific GPU compute capabilities ranging from Pascal (6.x) through Ampere (8.x). At runtime, the framework initializes with the NCCL communication backend and sets CUDA_DEVICE_MAX_CONNECTIONS to 1 for deterministic inter-device communication. Windows is not supported; the project raises a RuntimeError directing users to WSL if a Windows platform is detected.
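The CUDA version-matching rule described above can be sketched as a small standalone check. This is a minimal sketch, not ColossalAI's actual implementation: the real code derives the two version strings from `nvcc --version` output and `torch.version.cuda`, while here they are passed in as plain strings.

```python
# Minimal sketch of the CUDA compatibility rule: major versions must match
# exactly, a minor-version mismatch only warns. The version strings here are
# illustrative stand-ins for `nvcc --version` and `torch.version.cuda`.
import warnings

def check_cuda_match(system_cuda: str, torch_cuda: str) -> None:
    """Raise if major versions differ; warn on a minor-version mismatch."""
    sys_major, sys_minor = (int(x) for x in system_cuda.split(".")[:2])
    pt_major, pt_minor = (int(x) for x in torch_cuda.split(".")[:2])
    if sys_major != pt_major:
        raise RuntimeError(
            f"System CUDA {system_cuda} != PyTorch CUDA {torch_cuda}"
        )
    if sys_minor != pt_minor:
        warnings.warn(
            f"Minor CUDA version mismatch: {system_cuda} vs {torch_cuda}"
        )

check_cuda_match("12.1", "12.4")  # majors match: warns, does not raise
# check_cuda_match("11.8", "12.1") would raise RuntimeError
```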
Usage
Use this environment whenever deploying or developing against ColossalAI for distributed training, reinforcement learning from human feedback (RLHF), or large-model inference. This environment must be satisfied before any ColossalAI Implementation page (such as Booster, SFTTrainer, or DPOTrainer) can function correctly.
System Requirements
| Requirement | Value | Notes |
|---|---|---|
| Operating System | Linux | Windows is explicitly unsupported; a RuntimeError is raised suggesting WSL (see setup.py:19-20) |
| Python | >= 3.6 | Declared via python_requires in setup.py:136 |
| NVIDIA GPU | Compute Capability >= 6.0 | Pascal (6.x), Volta (7.0), Turing (7.5), Ampere (8.0, 8.6) are supported |
| CUDA Toolkit | Must match PyTorch CUDA version | Major version must match exactly; a minor version mismatch produces a warning (see extensions/utils.py:84-101) |
| Communication Backend | NCCL | Default backend set in colossalai/initialize.py:25 and colossalai/accelerator/cuda_accelerator.py:19 |
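The GPU requirement from the table above is a simple floor on compute capability. A hypothetical helper illustrating the gate; on a real system the capability tuple would come from `torch.cuda.get_device_capability()`:

```python
# Hypothetical helper for the compute-capability floor in the table above.
# On real hardware the tuple comes from torch.cuda.get_device_capability().
def is_supported_gpu(capability: tuple) -> bool:
    """ColossalAI targets Pascal (6.x) through Ampere (8.x): require >= 6.0."""
    return capability >= (6, 0)

print(is_supported_gpu((8, 0)))  # A100 (Ampere)  -> True
print(is_supported_gpu((5, 2)))  # Maxwell        -> False
```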
Dependencies
System Packages
- NVIDIA Driver compatible with the installed CUDA toolkit version
- CUDA Toolkit with the nvcc compiler (path exposed via the CUDA_HOME environment variable)
- NCCL library for multi-GPU and multi-node communication
Python Packages
| Package | Version Constraint | Source |
|---|---|---|
| torch | >= 2.2.0, <= 2.5.1 | requirements/requirements.txt |
| transformers | == 4.51.3 | requirements/requirements.txt |
| peft | >= 0.7.1, <= 0.13.2 | requirements/requirements.txt |
Note: The CUDA extension build code in extensions/cuda_extension.py:13-15 defines a legacy minimum of PyTorch 1.10, but the project-level requirements/requirements.txt enforces torch >= 2.2.0 as the effective minimum.
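The effective torch version window can be validated with plain integer-tuple comparison, without installing torch itself. A minimal sketch using the bounds from requirements/requirements.txt:

```python
# Sketch of the effective torch version window (>= 2.2.0, <= 2.5.1) taken
# from requirements/requirements.txt, using integer-tuple comparison so the
# check works without torch installed.
def parse(v: str) -> tuple:
    return tuple(int(p) for p in v.split("."))

def torch_in_range(version: str) -> bool:
    return parse("2.2.0") <= parse(version) <= parse("2.5.1")

print(torch_in_range("2.4.1"))   # True
print(torch_in_range("1.10.0"))  # False: legacy extension minimum, below 2.2.0
print(torch_in_range("2.6.0"))   # False: above the pinned ceiling
```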
Credentials
ColossalAI uses several environment variables to control CUDA behavior. No API keys or secret credentials are required, but these variables must be set correctly.
| Variable | Required | Description |
|---|---|---|
| CUDA_HOME | Yes (for extension builds) | Path to the CUDA toolkit installation (e.g., /usr/local/cuda). Checked in extensions/cuda_extension.py:38-46 during CUDA extension compilation. |
| BUILD_EXT | No | Set to 1 to build CUDA extensions ahead of time during pip install. When set, torch must be importable at install time (see setup.py:16, 70-93). |
| FORCE_CUDA | No | When set, forces CUDA extension support even if torch.cuda.is_available() returns False. Useful for cross-compilation (see extensions/cuda_extension.py:26-36). |
| TORCH_CUDA_ARCH_LIST | No | Semicolon-separated list of CUDA compute capabilities to target (e.g., 7.0;7.5;8.0). If unset, ColossalAI auto-detects from the current GPU (see extensions/utils.py:154-190). |
| CUDA_DEVICE_MAX_CONNECTIONS | Auto-set | Automatically set to 1 by colossalai/initialize.py:10 at framework startup. |
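Putting the build-time variables from the table together, a typical ahead-of-time build environment might look like the following. The paths and architecture list are illustrative; adjust CUDA_HOME to wherever your toolkit actually lives.

```shell
# Illustrative build-time environment; values are examples, not defaults.
export CUDA_HOME=/usr/local/cuda           # where nvcc lives
export BUILD_EXT=1                         # compile extensions at install time
export TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0"  # skip auto-detection, target these GPUs
echo "building for: $TORCH_CUDA_ARCH_LIST"
```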
Quick Install
# Verify NVIDIA GPU and driver
nvidia-smi
# Set CUDA_HOME if not already configured
export CUDA_HOME=/usr/local/cuda
# Install ColossalAI with pre-built CUDA extensions
BUILD_EXT=1 pip install colossalai
# Or install without ahead-of-time extension compilation (extensions build on first use)
pip install colossalai
# Install pinned dependencies
pip install "torch>=2.2.0,<=2.5.1" "transformers==4.51.3" "peft>=0.7.1,<=0.13.2"
Code Evidence
Windows platform guard (setup.py:19-20):
if platform.system() == "Windows":
    raise RuntimeError("Windows is not supported. Please use WSL.")
CUDA availability and FORCE_CUDA check (extensions/cuda_extension.py:26-36):
def is_available(self) -> bool:
    # Check if CUDA is available
    try:
        import torch
        cuda_available = torch.cuda.is_available()
    except ImportError:
        cuda_available = False
    if not cuda_available and os.environ.get("FORCE_CUDA"):
        cuda_available = True
    return cuda_available
System CUDA and PyTorch CUDA version match (extensions/utils.py:84-101):
def check_system_pytorch_cuda_match(cuda_dir):
    system_cuda_version = get_cuda_version_from_exec(cuda_dir)
    torch_cuda_version = torch.version.cuda
    # major version must match
    if system_cuda_version.major != torch_cuda_version.major:
        raise RuntimeError(
            f"System CUDA {system_cuda_version} != PyTorch CUDA {torch_cuda_version}"
        )
    # minor version mismatch is a warning
    if system_cuda_version.minor != torch_cuda_version.minor:
        warnings.warn(...)
CUDA architecture list setup (extensions/utils.py:154-190):
def set_cuda_arch_list(cuda_dir):
    # Supports Pascal (6.x), Volta (7.0), Turing (7.5), Ampere (8.0, 8.6)
    # Requires compute capability >= 6.0
    ...
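The body of set_cuda_arch_list is elided above. As a hedged sketch of the mapping it performs, a semicolon-separated TORCH_CUDA_ARCH_LIST value can be translated into nvcc -gencode flags like this; the real implementation in extensions/utils.py additionally auto-detects the architecture from the current GPU when the variable is unset:

```python
# Hypothetical sketch: translate a TORCH_CUDA_ARCH_LIST-style value
# (e.g. "7.0;8.0") into nvcc -gencode flags. The real set_cuda_arch_list()
# also falls back to torch.cuda.get_device_capability() when unset.
def arch_flags(arch_list: str) -> list:
    flags = []
    for arch in arch_list.split(";"):
        major, minor = arch.split(".")
        cap = f"{major}{minor}"  # "8.0" -> "80"
        flags.append(f"-gencode=arch=compute_{cap},code=sm_{cap}")
    return flags

print(arch_flags("7.0;8.0"))
# ['-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_80,code=sm_80']
```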
Common Errors
| Error | Cause | Solution |
|---|---|---|
| RuntimeError: Windows is not supported. Please use WSL. | Running setup.py on a native Windows platform | Use Windows Subsystem for Linux (WSL) with an Ubuntu distribution and NVIDIA CUDA drivers for WSL |
| RuntimeError: System CUDA version does not match PyTorch CUDA version | The system CUDA toolkit has a different major version than the CUDA version PyTorch was compiled with | Install a CUDA toolkit whose major version matches PyTorch's CUDA version (check with python -c "import torch; print(torch.version.cuda)") |
| RuntimeError: PyTorch version is too old | Installed PyTorch is below the minimum required (2.2.0 per requirements, or 1.10 per legacy extension code) | Upgrade PyTorch: pip install "torch>=2.2.0,<=2.5.1" |
| CUDA_HOME environment variable is not set | CUDA_HOME is not defined and the build system cannot locate nvcc | Set export CUDA_HOME=/usr/local/cuda (or the correct path to your CUDA installation) |
| No CUDA runtime is found | No NVIDIA GPU detected and FORCE_CUDA is not set | Ensure an NVIDIA GPU is available and drivers are installed, or set export FORCE_CUDA=1 for cross-compilation |
| CUDA extension build fails with unsupported architecture | Target GPU has compute capability below 6.0 | ColossalAI requires Pascal-generation (compute capability 6.0) or newer GPUs; upgrade hardware or set TORCH_CUDA_ARCH_LIST to a supported value |
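Several of the errors above reduce to missing environment state that can be caught before building. A small, hypothetical pre-flight check (not part of ColossalAI): it inspects a plain mapping such as dict(os.environ), so it can run without a GPU:

```python
# Hypothetical pre-flight check for the environment-related causes listed
# above. Pass a mapping such as dict(os.environ); returns a list of problems.
def preflight(env: dict) -> list:
    problems = []
    if "CUDA_HOME" not in env:
        problems.append("CUDA_HOME is not set; extension builds cannot find nvcc")
    if env.get("BUILD_EXT") == "1" and "CUDA_HOME" not in env:
        problems.append("BUILD_EXT=1 requires CUDA_HOME at install time")
    return problems

print(preflight({"BUILD_EXT": "1"}))                # both checks fire
print(preflight({"CUDA_HOME": "/usr/local/cuda"}))  # []
```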
Compatibility Notes
- Python version: The python_requires field specifies >= 3.6, but practical compatibility depends on the installed PyTorch version; PyTorch >= 2.2.0 itself requires Python >= 3.8.
- CUDA architecture: Only NVIDIA GPUs with compute capability >= 6.0 are supported. This includes Pascal (GTX 10xx, P100), Volta (V100), Turing (RTX 20xx, T4), and Ampere (RTX 30xx, A100) generation cards.
- NCCL backend: The default communication backend is NCCL, which requires all participating nodes to have NVIDIA GPUs. Alternative backends are not configured by default.
- CUDA version matching: The system CUDA toolkit major version must match the PyTorch CUDA major version exactly. A minor version mismatch produces a warning but does not prevent operation.
- Ahead-of-time vs just-in-time compilation: Setting BUILD_EXT=1 compiles CUDA extensions during installation (requires torch to be already installed). Without this flag, extensions are compiled on first use via just-in-time (JIT) compilation.
- CUDA_DEVICE_MAX_CONNECTIONS: This environment variable is automatically set to 1 at ColossalAI initialization, which may affect other CUDA applications running in the same process.
Related Pages
- Implementation:Hpcaitech_ColossalAI_Launch_From_Torch
- Implementation:Hpcaitech_ColossalAI_Booster
- Implementation:Hpcaitech_ColossalAI_HybridAdam_CosineScheduler
- Implementation:Hpcaitech_ColossalAI_SFTTrainer
- Implementation:Hpcaitech_ColossalAI_DPOTrainer
- Implementation:Hpcaitech_ColossalAI_Booster_Training_Loop
- Implementation:Hpcaitech_ColossalAI_HuggingFaceModel_Inference