Environment:Hpcaitech ColossalAI CUDA GPU Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
ColossalAI requires an NVIDIA CUDA-capable GPU environment with a matching CUDA toolkit and PyTorch installation to build and run its distributed training framework.
Description
ColossalAI is a distributed deep learning system that relies heavily on NVIDIA CUDA for GPU-accelerated computation. The environment enforces strict compatibility between the system CUDA toolkit version and the CUDA version that PyTorch was compiled against. At build time, ColossalAI compiles custom CUDA kernel extensions (controlled by the BUILD_EXT environment variable) that target specific GPU compute capabilities ranging from Pascal (6.x) through Ampere (8.x). At runtime, the framework initializes with the NCCL communication backend and sets CUDA_DEVICE_MAX_CONNECTIONS to 1 for deterministic inter-device communication. Windows is not supported; the project raises a RuntimeError directing users to WSL if a Windows platform is detected.
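The CUDA version-matching rule described above can be sketched as a small standalone check. This is a minimal sketch, not ColossalAI's actual implementation: the real code derives the two version strings from `nvcc --version` output and `torch.version.cuda`, while here they are passed in as plain strings.

```python
# Minimal sketch of the CUDA compatibility rule: major versions must match
# exactly, a minor-version mismatch only warns. The version strings here are
# illustrative stand-ins for `nvcc --version` and `torch.version.cuda`.
import warnings

def check_cuda_match(system_cuda: str, torch_cuda: str) -> None:
    """Raise if major versions differ; warn on a minor-version mismatch."""
    sys_major, sys_minor = (int(x) for x in system_cuda.split(".")[:2])
    pt_major, pt_minor = (int(x) for x in torch_cuda.split(".")[:2])
    if sys_major != pt_major:
        raise RuntimeError(
            f"System CUDA {system_cuda} != PyTorch CUDA {torch_cuda}"
        )
    if sys_minor != pt_minor:
        warnings.warn(
            f"Minor CUDA version mismatch: {system_cuda} vs {torch_cuda}"
        )

check_cuda_match("12.1", "12.4")  # majors match: warns, does not raise
# check_cuda_match("11.8", "12.1") would raise RuntimeError
```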
Usage
Use this environment whenever deploying or developing against ColossalAI for distributed training, reinforcement learning from human feedback (RLHF), or large-model inference. This environment must be satisfied before any ColossalAI Implementation page (such as Booster, SFTTrainer, or DPOTrainer) can function correctly.
System Requirements
| Requirement | Value | Notes |
|---|---|---|
| Operating System | Linux | Windows is explicitly unsupported; a RuntimeError is raised suggesting WSL (see setup.py:19-20) |
| Python | >= 3.6 | Declared via python_requires in setup.py:136 |
| NVIDIA GPU | Compute Capability >= 6.0 | Pascal (6.x), Volta (7.0), Turing (7.5), Ampere (8.0, 8.6) are supported |
| CUDA Toolkit | Must match PyTorch CUDA version | Major version must match exactly; a minor version mismatch produces a warning (see extensions/utils.py:84-101) |
| Communication Backend | NCCL | Default backend set in colossalai/initialize.py:25 and colossalai/accelerator/cuda_accelerator.py:19 |
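The GPU requirement from the table above is a simple floor on compute capability. A hypothetical helper illustrating the gate; on a real system the capability tuple would come from `torch.cuda.get_device_capability()`:

```python
# Hypothetical helper for the compute-capability floor in the table above.
# On real hardware the tuple comes from torch.cuda.get_device_capability().
def is_supported_gpu(capability: tuple) -> bool:
    """ColossalAI targets Pascal (6.x) through Ampere (8.x): require >= 6.0."""
    return capability >= (6, 0)

print(is_supported_gpu((8, 0)))  # A100 (Ampere)  -> True
print(is_supported_gpu((5, 2)))  # Maxwell        -> False
```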
Dependencies
System Packages
- NVIDIA Driver compatible with the installed CUDA toolkit version
- CUDA Toolkit with the nvcc compiler (path exposed via the CUDA_HOME environment variable)
- NCCL library for multi-GPU and multi-node communication
Python Packages
| Package | Version Constraint | Source |
|---|---|---|
| torch | >= 2.2.0, <= 2.5.1 | requirements/requirements.txt |
| transformers | == 4.51.3 | requirements/requirements.txt |
| peft | >= 0.7.1, <= 0.13.2 | requirements/requirements.txt |
Note: The CUDA extension build code in extensions/cuda_extension.py:13-15 defines a legacy minimum of PyTorch 1.10, but the project-level requirements/requirements.txt enforces torch >= 2.2.0 as the effective minimum.
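The effective torch version window can be validated with plain integer-tuple comparison, without installing torch itself. A minimal sketch using the bounds from requirements/requirements.txt:

```python
# Sketch of the effective torch version window (>= 2.2.0, <= 2.5.1) taken
# from requirements/requirements.txt, using integer-tuple comparison so the
# check works without torch installed.
def parse(v: str) -> tuple:
    return tuple(int(p) for p in v.split("."))

def torch_in_range(version: str) -> bool:
    return parse("2.2.0") <= parse(version) <= parse("2.5.1")

print(torch_in_range("2.4.1"))   # True
print(torch_in_range("1.10.0"))  # False: legacy extension minimum, below 2.2.0
print(torch_in_range("2.6.0"))   # False: above the pinned ceiling
```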
Credentials
ColossalAI uses several environment variables to control CUDA behavior. No API keys or secret credentials are required, but these variables must be set correctly.
| Variable | Required | Description |
|---|---|---|
| CUDA_HOME | Yes (for extension builds) | Path to the CUDA toolkit installation (e.g., /usr/local/cuda). Checked in extensions/cuda_extension.py:38-46 during CUDA extension compilation. |
| BUILD_EXT | No | Set to 1 to build CUDA extensions ahead of time during pip install. When set, torch must be importable at install time (see setup.py:16, 70-93). |
| FORCE_CUDA | No | When set, forces CUDA extension support even if torch.cuda.is_available() returns False. Useful for cross-compilation (see extensions/cuda_extension.py:26-36). |
| TORCH_CUDA_ARCH_LIST | No | Semicolon-separated list of CUDA compute capabilities to target (e.g., 7.0;7.5;8.0). If unset, ColossalAI auto-detects from the current GPU (see extensions/utils.py:154-190). |
| CUDA_DEVICE_MAX_CONNECTIONS | Auto-set | Automatically set to 1 by colossalai/initialize.py:10 at framework startup. |
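Putting the build-time variables from the table together, a typical ahead-of-time build environment might look like the following. The paths and architecture list are illustrative; adjust CUDA_HOME to wherever your toolkit actually lives.

```shell
# Illustrative build-time environment; values are examples, not defaults.
export CUDA_HOME=/usr/local/cuda           # where nvcc lives
export BUILD_EXT=1                         # compile extensions at install time
export TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0"  # skip auto-detection, target these GPUs
echo "building for: $TORCH_CUDA_ARCH_LIST"
```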
Quick Install
# Verify NVIDIA GPU and driver
nvidia-smi
# Set CUDA_HOME if not already configured
export CUDA_HOME=/usr/local/cuda
# Install ColossalAI with pre-built CUDA extensions
BUILD_EXT=1 pip install colossalai
# Or install without ahead-of-time extension compilation (extensions build on first use)
pip install colossalai
# Install pinned dependencies
pip install "torch>=2.2.0,<=2.5.1" "transformers==4.51.3" "peft>=0.7.1,<=0.13.2"
Code Evidence
Windows platform guard (setup.py:19-20):
if platform.system() == "Windows":
    raise RuntimeError("Windows is not supported. Please use WSL.")
CUDA availability and FORCE_CUDA check (extensions/cuda_extension.py:26-36):
def is_available(self) -> bool:
    # Check if CUDA is available
    try:
        import torch
        cuda_available = torch.cuda.is_available()
    except ImportError:
        cuda_available = False
    if not cuda_available and os.environ.get("FORCE_CUDA"):
        cuda_available = True
    return cuda_available
System CUDA and PyTorch CUDA version match (extensions/utils.py:84-101):
def check_system_pytorch_cuda_match(cuda_dir):
    system_cuda_version = get_cuda_version_from_exec(cuda_dir)
    torch_cuda_version = torch.version.cuda
    # major version must match
    if system_cuda_version.major != torch_cuda_version.major:
        raise RuntimeError(
            f"System CUDA {system_cuda_version} != PyTorch CUDA {torch_cuda_version}"
        )
    # minor version mismatch is a warning
    if system_cuda_version.minor != torch_cuda_version.minor:
        warnings.warn(...)
CUDA architecture list setup (extensions/utils.py:154-190):
def set_cuda_arch_list(cuda_dir):
    # Supports Pascal (6.x), Volta (7.0), Turing (7.5), Ampere (8.0, 8.6)
    # Requires compute capability >= 6.0
    ...
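The body of set_cuda_arch_list is elided above. As a hedged sketch of the mapping it performs, a semicolon-separated TORCH_CUDA_ARCH_LIST value can be translated into nvcc -gencode flags like this; the real implementation in extensions/utils.py additionally auto-detects the architecture from the current GPU when the variable is unset:

```python
# Hypothetical sketch: translate a TORCH_CUDA_ARCH_LIST-style value
# (e.g. "7.0;8.0") into nvcc -gencode flags. The real set_cuda_arch_list()
# also falls back to torch.cuda.get_device_capability() when unset.
def arch_flags(arch_list: str) -> list:
    flags = []
    for arch in arch_list.split(";"):
        major, minor = arch.split(".")
        cap = f"{major}{minor}"  # "8.0" -> "80"
        flags.append(f"-gencode=arch=compute_{cap},code=sm_{cap}")
    return flags

print(arch_flags("7.0;8.0"))
# ['-gencode=arch=compute_70,code=sm_70', '-gencode=arch=compute_80,code=sm_80']
```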
Common Errors
| Error | Cause | Solution |
|---|---|---|
| RuntimeError: Windows is not supported. Please use WSL. | Running setup.py on a native Windows platform | Use Windows Subsystem for Linux (WSL) with an Ubuntu distribution and NVIDIA CUDA drivers for WSL |
| RuntimeError: System CUDA version does not match PyTorch CUDA version | The system CUDA toolkit has a different major version than the CUDA version PyTorch was compiled with | Install a CUDA toolkit whose major version matches PyTorch's CUDA version (check with python -c "import torch; print(torch.version.cuda)") |
| RuntimeError: PyTorch version is too old | Installed PyTorch is below the minimum required (2.2.0 per requirements, or 1.10 per legacy extension code) | Upgrade PyTorch: pip install "torch>=2.2.0,<=2.5.1" |
| CUDA_HOME environment variable is not set | CUDA_HOME is not defined and the build system cannot locate nvcc | Set export CUDA_HOME=/usr/local/cuda (or the correct path to your CUDA installation) |
| No CUDA runtime is found | No NVIDIA GPU detected and FORCE_CUDA is not set | Ensure an NVIDIA GPU is available and drivers are installed, or set export FORCE_CUDA=1 for cross-compilation |
| CUDA extension build fails with unsupported architecture | Target GPU has compute capability below 6.0 | ColossalAI requires Pascal-generation (compute capability 6.0) or newer GPUs; upgrade hardware or set TORCH_CUDA_ARCH_LIST to a supported value |
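Several of the errors above reduce to missing environment state that can be caught before building. A small, hypothetical pre-flight check (not part of ColossalAI): it inspects a plain mapping such as dict(os.environ), so it can run without a GPU:

```python
# Hypothetical pre-flight check for the environment-related causes listed
# above. Pass a mapping such as dict(os.environ); returns a list of problems.
def preflight(env: dict) -> list:
    problems = []
    if "CUDA_HOME" not in env:
        problems.append("CUDA_HOME is not set; extension builds cannot find nvcc")
    if env.get("BUILD_EXT") == "1" and "CUDA_HOME" not in env:
        problems.append("BUILD_EXT=1 requires CUDA_HOME at install time")
    return problems

print(preflight({"BUILD_EXT": "1"}))                # both checks fire
print(preflight({"CUDA_HOME": "/usr/local/cuda"}))  # []
```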
Compatibility Notes
- Python version: The python_requires field specifies >= 3.6, but practical compatibility depends on the installed PyTorch version; PyTorch >= 2.2.0 itself requires Python >= 3.8.
- CUDA architecture: Only NVIDIA GPUs with compute capability >= 6.0 are supported. This includes Pascal (GTX 10xx, P100), Volta (V100), Turing (RTX 20xx, T4), and Ampere (RTX 30xx, A100) generation cards.
- NCCL backend: The default communication backend is NCCL, which requires all participating nodes to have NVIDIA GPUs. Alternative backends are not configured by default.
- CUDA version matching: The system CUDA toolkit major version must match the PyTorch CUDA major version exactly. A minor version mismatch produces a warning but does not prevent operation.
- Ahead-of-time vs just-in-time compilation: Setting BUILD_EXT=1 compiles CUDA extensions during installation (requires torch to be already installed). Without this flag, extensions are compiled on first use via just-in-time (JIT) compilation.
- CUDA_DEVICE_MAX_CONNECTIONS: This environment variable is automatically set to 1 at ColossalAI initialization, which may affect other CUDA applications running in the same process.
Related Pages
- Implementation:Hpcaitech_ColossalAI_Launch_From_Torch
- Implementation:Hpcaitech_ColossalAI_Booster
- Implementation:Hpcaitech_ColossalAI_HybridAdam_CosineScheduler
- Implementation:Hpcaitech_ColossalAI_SFTTrainer
- Implementation:Hpcaitech_ColossalAI_DPOTrainer
- Implementation:Hpcaitech_ColossalAI_Booster_Training_Loop
- Implementation:Hpcaitech_ColossalAI_HuggingFaceModel_Inference