# Environment: Rapidsai Cuml CUDA GPU
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU_Computing |
| Last Updated | 2026-02-08 00:00 GMT |
## Overview
NVIDIA CUDA GPU environment with compute capability 7.0+ and CUDA Toolkit 12.x or 13.x, required for all cuML GPU-accelerated machine learning operations.
## Description
This environment defines the hardware and CUDA software stack required to run cuML. cuML is a GPU-accelerated machine learning library that requires an NVIDIA GPU with the CUDA toolkit. The library supports CUDA 12.x (compute capability 7.0+, Volta and newer) and CUDA 13.x (compute capability 7.5+, Turing and newer). The GPU must have sufficient VRAM for the target workload. The CUDA toolkit must include development libraries: cudart, cublas, cusparse, cusolver, curand, and cufft.
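The support matrix above (CUDA 12.x requires compute capability 7.0+, CUDA 13.x requires 7.5+) can be captured in a small helper. This is a hypothetical sketch, not part of cuML itself; the function name `is_cc_supported` is invented for illustration.

```python
# Hypothetical helper (not shipped with cuML) encoding the support matrix
# described above: CUDA 12.x needs compute capability 7.0+ (Volta+),
# CUDA 13.x needs 7.5+ (Turing+).
def is_cc_supported(cc_major: int, cc_minor: int, cuda_major: int) -> bool:
    """Return True if a GPU with the given compute capability can run
    cuML built against the given CUDA major version."""
    cc = (cc_major, cc_minor)
    if cuda_major == 12:
        return cc >= (7, 0)
    if cuda_major == 13:
        return cc >= (7, 5)
    return False  # only CUDA 12.x and 13.x are supported
```

For example, a Volta V100 (SM 70) passes under CUDA 12.x but not under CUDA 13.x.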
## Usage
This environment is required for all cuML operations. Every estimator (KMeans, DBSCAN, HDBSCAN, PCA, UMAP, t-SNE, Random Forest, ARIMA, etc.) performs computation on the GPU via CUDA. The cuml.accel accelerator module also requires this environment to transparently accelerate scikit-learn code on GPU.
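To illustrate the idea behind transparent acceleration, here is a deliberately simplified dispatch sketch. This is NOT cuML's actual implementation; the registry, function names, and stand-in classes are all invented. It only shows the concept: estimator construction is intercepted and a GPU-backed class is substituted when one is registered, with a fallback to the CPU original.

```python
# Illustrative sketch only -- not cuml.accel's real mechanism.
# A registry maps CPU estimator paths to GPU-backed replacements.
_GPU_REGISTRY = {}  # "module.ClassName" -> replacement class


def register_gpu(cpu_path, gpu_cls):
    """Register a GPU-backed replacement for a CPU estimator path."""
    _GPU_REGISTRY[cpu_path] = gpu_cls


def make_estimator(cpu_path, cpu_cls, *args, **kwargs):
    """Build the GPU replacement when registered, else the CPU original."""
    cls = _GPU_REGISTRY.get(cpu_path, cpu_cls)
    return cls(*args, **kwargs)


class CpuKMeans:  # stand-in for sklearn.cluster.KMeans
    backend = "cpu"


class GpuKMeans:  # stand-in for cuml.KMeans
    backend = "gpu"


register_gpu("sklearn.cluster.KMeans", GpuKMeans)
```

With the registration in place, `make_estimator("sklearn.cluster.KMeans", CpuKMeans)` yields the GPU-backed class, while an unregistered path falls back to the CPU class.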
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 20.04+) | WSL2 supported but lacks managed memory (UVM) |
| Hardware | NVIDIA GPU | Compute capability 7.0+ (CUDA 12.x) or 7.5+ (CUDA 13.x) |
| GPU Architectures | Volta, Turing, Ampere, Hopper, Blackwell | SM 70, 72, 75, 80, 86, 87, 89, 90, 120, 121 |
| CUDA Toolkit | >= 12.2, < 14.0 | Must include cublas, cufft, curand, cusolver, cusparse |
| CMake | >= 3.30.4 | Required for building from source |
## Dependencies

### System Packages

- `cuda-toolkit[cublas,cufft,curand,cusolver,cusparse]` >= 12, < 14
- `nvidia-driver` compatible with the CUDA toolkit version
- C++ compiler with C++17 support (for building from source)

### Python Packages

- `cuda-python` >= 12.9.2, < 14.0 (version depends on CUDA variant)
- `cupy-cuda12x` or `cupy-cuda13x` >= 13.6.0
- `numba-cuda` >= 0.22.1
- `numba` >= 0.60.0, < 0.62.0
- `pylibraft` == 26.4.*
- `rmm` == 26.4.* (RAPIDS Memory Manager)
## Credentials
No credentials required for GPU operation. The following optional environment variables control behavior:
- `CUDA_VISIBLE_DEVICES`: Controls which GPUs are visible to the process.
- `CUML_ACCEL_ENABLED`: Set to `"1"` or `"true"` to enable automatic sklearn acceleration.
- `CUML_ACCEL_LOG_LEVEL`: Set to `"error"`, `"warn"`, or `"info"`, or `"debug"` for accelerator logging.
- `NVTX_BENCHMARK`: Enables NVTX profiling annotations.
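A minimal sketch of how such variables might be parsed, assuming the accepted values listed above. This is hypothetical code, not cuML's own parser, and the `"warn"` fallback default is an assumption made for illustration.

```python
import os

# Hypothetical parsing sketch (not cuML's actual code) for the environment
# variables listed above. CUML_ACCEL_ENABLED accepts "1" or "true";
# CUML_ACCEL_LOG_LEVEL must be one of the four documented levels.
_LOG_LEVELS = {"error", "warn", "info", "debug"}


def accel_enabled(env=None):
    """True when CUML_ACCEL_ENABLED is set to "1" or "true"."""
    env = os.environ if env is None else env
    return env.get("CUML_ACCEL_ENABLED", "").strip().lower() in {"1", "true"}


def accel_log_level(env=None):
    """Return a valid log level; fall back to "warn" (assumed default)."""
    env = os.environ if env is None else env
    level = env.get("CUML_ACCEL_LOG_LEVEL", "warn").strip().lower()
    return level if level in _LOG_LEVELS else "warn"
```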
## Quick Install

```bash
# Install with pip (CUDA 12.x)
pip install cuml-cu12

# Install with pip (CUDA 13.x)
pip install cuml-cu13

# Install with conda
conda install -c rapidsai -c conda-forge -c nvidia cuml cuda-version=12.9
```
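Picking between the `cuml-cu12` and `cuml-cu13` wheels follows directly from the toolkit version. Here is a hypothetical helper (not shipped with cuML) that applies the supported range from the System Requirements table (>= 12.2, < 14.0):

```python
# Hypothetical helper mapping a CUDA toolkit version string to the
# matching cuML pip package, per the install commands above and the
# supported toolkit range (>= 12.2, < 14.0).
def cuml_wheel_for(cuda_version: str) -> str:
    major, minor = (int(p) for p in cuda_version.split(".")[:2])
    if major == 12 and minor >= 2:
        return "cuml-cu12"
    if major == 13:
        return "cuml-cu13"
    raise ValueError(f"unsupported CUDA toolkit version: {cuda_version}")
```

For example, a host with CUDA 12.4 would install `cuml-cu12`, while CUDA 11.x (below the supported range) is rejected.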
## Code Evidence
GPU architecture detection from cpp/include/cuml/fil/detail/gpu_introspection.hpp:25-30:

```cpp
inline auto max_shared_mem_per_block(int device = 0)
{
  auto result = int{};
  RAFT_CUDA_TRY(cudaDeviceGetAttribute(
    &result, cudaDevAttrMaxSharedMemoryPerBlockOptin, device));
  return result;
}
```
UVM (Unified Virtual Memory) detection from python/cuml/cuml/accel/core.py:113-131:

```python
def _is_concurrent_managed_access_supported():
    """Check the availability of concurrent managed access (UVM).

    Note that WSL2 does not support managed memory."""
    runtime.cudaFree(0)  # Ensure CUDA is initialized
    device_id = 0
    err, supports_managed_access = runtime.cudaDeviceGetAttribute(
        runtime.cudaDeviceAttr.cudaDevAttrConcurrentManagedAccess, device_id
    )
    if err != runtime.cudaError_t.cudaSuccess:
        logger.error(
            f"Failed to check cudaDevAttrConcurrentManagedAccess with error {err}"
        )
        return False
    return supports_managed_access != 0
```
Build requirements from BUILD.md:10-17:

```
GPU Compute Capability Constraints:
- CUDA 12.x: compute capability 7.0 or higher (Volta architecture or newer)
- CUDA 13.x: compute capability 7.5 or higher (Turing architecture or newer)

CUDA Toolkit (>= 12.2) - must include development libraries
```
## Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'libcuml'` | libcuml C++ library not installed | Install the full cuml package: `pip install cuml-cu12` |
| `cudaErrorNoDevice` | No NVIDIA GPU detected | Ensure NVIDIA drivers are installed and the GPU is accessible |
| `CUDA out of memory` | Insufficient GPU VRAM for the operation | Reduce the batch size, use the `max_mbytes_per_batch` parameter, or use a GPU with more VRAM |
| `Failed to check cudaDevAttrConcurrentManagedAccess` | UVM not supported (e.g., WSL2) | Pass `disable_uvm=True` to `cuml.accel.install()` |
## Compatibility Notes

- WSL2: Does not support CUDA Unified Virtual Memory (managed memory). The accelerator module detects this and skips UVM setup automatically.
- Compute Capability: CUDA 13.x drops support for SM 70 (Volta V100). If using CUDA 13.x, the minimum is SM 75 (Turing).
- Multi-GPU: Distributed multi-GPU workflows need additional packages (see the Rapidsai_Cuml_Dask_Distributed environment).
- CPU-only: cuML Random Forest models can be exported and run on CPU-only machines via FIL (Forest Inference Library) with `precision='single'` or `'double'`.
## Related Pages

- Implementation: Rapidsai_Cuml_PCA_UMAP_TSNE_Configuration
- Implementation: Rapidsai_Cuml_Input_To_Cuml_Array
- Implementation: Rapidsai_Cuml_KMeans_DBSCAN_HDBSCAN_Init
- Implementation: Rapidsai_Cuml_KMeans_DBSCAN_HDBSCAN_Fit
- Implementation: Rapidsai_Cuml_Cluster_Predict
- Implementation: Rapidsai_Cuml_Cluster_Evaluation_Metrics