# Environment: Rapidsai Cuml CUDA GPU
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU_Computing |
| Last Updated | 2026-02-08 00:00 GMT |
## Overview
NVIDIA CUDA GPU environment with compute capability 7.0+ and CUDA Toolkit 12.x or 13.x, required for all cuML GPU-accelerated machine learning operations.
## Description
This environment defines the hardware and CUDA software stack required to run cuML. cuML is a GPU-accelerated machine learning library that requires an NVIDIA GPU with the CUDA toolkit. The library supports CUDA 12.x (compute capability 7.0+, Volta and newer) and CUDA 13.x (compute capability 7.5+, Turing and newer). The GPU must have sufficient VRAM for the target workload. The CUDA toolkit must include development libraries: cudart, cublas, cusparse, cusolver, curand, and cufft.
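The support matrix above (CUDA 12.x requires compute capability 7.0+, CUDA 13.x requires 7.5+) can be captured in a small helper. This is a hypothetical sketch, not part of cuML itself; the function name `is_cc_supported` is invented for illustration.

```python
# Hypothetical helper (not shipped with cuML) encoding the support matrix
# described above: CUDA 12.x needs compute capability 7.0+ (Volta+),
# CUDA 13.x needs 7.5+ (Turing+).
def is_cc_supported(cc_major: int, cc_minor: int, cuda_major: int) -> bool:
    """Return True if a GPU with the given compute capability can run
    cuML built against the given CUDA major version."""
    cc = (cc_major, cc_minor)
    if cuda_major == 12:
        return cc >= (7, 0)
    if cuda_major == 13:
        return cc >= (7, 5)
    return False  # only CUDA 12.x and 13.x are supported
```

For example, a Volta V100 (SM 70) passes under CUDA 12.x but not under CUDA 13.x.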
## Usage
This environment is required for all cuML operations. Every estimator (KMeans, DBSCAN, HDBSCAN, PCA, UMAP, t-SNE, Random Forest, ARIMA, etc.) performs computation on the GPU via CUDA. The cuml.accel accelerator module also requires this environment to transparently accelerate scikit-learn code on GPU.
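To illustrate the idea behind transparent acceleration, here is a deliberately simplified dispatch sketch. This is NOT cuML's actual implementation; the registry, function names, and stand-in classes are all invented. It only shows the concept: estimator construction is intercepted and a GPU-backed class is substituted when one is registered, with a fallback to the CPU original.

```python
# Illustrative sketch only -- not cuml.accel's real mechanism.
# A registry maps CPU estimator paths to GPU-backed replacements.
_GPU_REGISTRY = {}  # "module.ClassName" -> replacement class


def register_gpu(cpu_path, gpu_cls):
    """Register a GPU-backed replacement for a CPU estimator path."""
    _GPU_REGISTRY[cpu_path] = gpu_cls


def make_estimator(cpu_path, cpu_cls, *args, **kwargs):
    """Build the GPU replacement when registered, else the CPU original."""
    cls = _GPU_REGISTRY.get(cpu_path, cpu_cls)
    return cls(*args, **kwargs)


class CpuKMeans:  # stand-in for sklearn.cluster.KMeans
    backend = "cpu"


class GpuKMeans:  # stand-in for cuml.KMeans
    backend = "gpu"


register_gpu("sklearn.cluster.KMeans", GpuKMeans)
```

With the registration in place, `make_estimator("sklearn.cluster.KMeans", CpuKMeans)` yields the GPU-backed class, while an unregistered path falls back to the CPU class.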
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 20.04+) | WSL2 supported but lacks managed memory (UVM) |
| Hardware | NVIDIA GPU | Compute capability 7.0+ (CUDA 12.x) or 7.5+ (CUDA 13.x) |
| GPU Architectures | Volta, Turing, Ampere, Hopper, Blackwell | SM 70, 72, 75, 80, 86, 87, 89, 90, 120, 121 |
| CUDA Toolkit | >= 12.2, < 14.0 | Must include cublas, cufft, curand, cusolver, cusparse |
| CMake | >= 3.30.4 | Required for building from source |
## Dependencies

### System Packages

- `cuda-toolkit[cublas,cufft,curand,cusolver,cusparse]` >= 12, < 14
- `nvidia-driver` compatible with the CUDA toolkit version
- C++ compiler with C++17 support (for building from source)

### Python Packages

- `cuda-python` >= 12.9.2, < 14.0 (version depends on CUDA variant)
- `cupy-cuda12x` or `cupy-cuda13x` >= 13.6.0
- `numba-cuda` >= 0.22.1
- `numba` >= 0.60.0, < 0.62.0
- `pylibraft` == 26.4.*
- `rmm` == 26.4.* (RAPIDS Memory Manager)
## Credentials
No credentials required for GPU operation. The following optional environment variables control behavior:
- `CUDA_VISIBLE_DEVICES`: Controls which GPUs are visible to the process.
- `CUML_ACCEL_ENABLED`: Set to `"1"` or `"true"` to enable automatic sklearn acceleration.
- `CUML_ACCEL_LOG_LEVEL`: Set to `"error"`, `"warn"`, or `"info"`, or `"debug"` for accelerator logging.
- `NVTX_BENCHMARK`: Enables NVTX profiling annotations.
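A minimal sketch of how such variables might be parsed, assuming the accepted values listed above. This is hypothetical code, not cuML's own parser, and the `"warn"` fallback default is an assumption made for illustration.

```python
import os

# Hypothetical parsing sketch (not cuML's actual code) for the environment
# variables listed above. CUML_ACCEL_ENABLED accepts "1" or "true";
# CUML_ACCEL_LOG_LEVEL must be one of the four documented levels.
_LOG_LEVELS = {"error", "warn", "info", "debug"}


def accel_enabled(env=None):
    """True when CUML_ACCEL_ENABLED is set to "1" or "true"."""
    env = os.environ if env is None else env
    return env.get("CUML_ACCEL_ENABLED", "").strip().lower() in {"1", "true"}


def accel_log_level(env=None):
    """Return a valid log level; fall back to "warn" (assumed default)."""
    env = os.environ if env is None else env
    level = env.get("CUML_ACCEL_LOG_LEVEL", "warn").strip().lower()
    return level if level in _LOG_LEVELS else "warn"
```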
## Quick Install

```bash
# Install with pip (CUDA 12.x)
pip install cuml-cu12

# Install with pip (CUDA 13.x)
pip install cuml-cu13

# Install with conda
conda install -c rapidsai -c conda-forge -c nvidia cuml cuda-version=12.9
```
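Picking between the `cuml-cu12` and `cuml-cu13` wheels follows directly from the toolkit version. Here is a hypothetical helper (not shipped with cuML) that applies the supported range from the System Requirements table (>= 12.2, < 14.0):

```python
# Hypothetical helper mapping a CUDA toolkit version string to the
# matching cuML pip package, per the install commands above and the
# supported toolkit range (>= 12.2, < 14.0).
def cuml_wheel_for(cuda_version: str) -> str:
    major, minor = (int(p) for p in cuda_version.split(".")[:2])
    if major == 12 and minor >= 2:
        return "cuml-cu12"
    if major == 13:
        return "cuml-cu13"
    raise ValueError(f"unsupported CUDA toolkit version: {cuda_version}")
```

For example, a host with CUDA 12.4 would install `cuml-cu12`, while CUDA 11.x (below the supported range) is rejected.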
## Code Evidence
GPU architecture detection from cpp/include/cuml/fil/detail/gpu_introspection.hpp:25-30:

```cpp
inline auto max_shared_mem_per_block(int device = 0)
{
  auto result = int{};
  RAFT_CUDA_TRY(cudaDeviceGetAttribute(
    &result, cudaDevAttrMaxSharedMemoryPerBlockOptin, device));
  return result;
}
```
UVM (Unified Virtual Memory) detection from python/cuml/cuml/accel/core.py:113-131:

```python
def _is_concurrent_managed_access_supported():
    """Check the availability of concurrent managed access (UVM).

    Note that WSL2 does not support managed memory."""
    runtime.cudaFree(0)  # Ensure CUDA is initialized
    device_id = 0
    err, supports_managed_access = runtime.cudaDeviceGetAttribute(
        runtime.cudaDeviceAttr.cudaDevAttrConcurrentManagedAccess, device_id
    )
    if err != runtime.cudaError_t.cudaSuccess:
        logger.error(
            f"Failed to check cudaDevAttrConcurrentManagedAccess with error {err}"
        )
        return False
    return supports_managed_access != 0
```
Build requirements from BUILD.md:10-17:

```
GPU Compute Capability Constraints:
- CUDA 12.x: compute capability 7.0 or higher (Volta architecture or newer)
- CUDA 13.x: compute capability 7.5 or higher (Turing architecture or newer)

CUDA Toolkit (>= 12.2) - must include development libraries
```
## Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'libcuml'` | libcuml C++ library not installed | Install the full cuml package: `pip install cuml-cu12` |
| `cudaErrorNoDevice` | No NVIDIA GPU detected | Ensure NVIDIA drivers are installed and the GPU is accessible |
| `CUDA out of memory` | Insufficient GPU VRAM for the operation | Reduce the batch size, use the `max_mbytes_per_batch` parameter, or use a GPU with more VRAM |
| `Failed to check cudaDevAttrConcurrentManagedAccess` | UVM not supported (e.g., WSL2) | Pass `disable_uvm=True` to `cuml.accel.install()` |
## Compatibility Notes

- WSL2: Does not support CUDA Unified Virtual Memory (managed memory). The accelerator module detects this and skips UVM setup automatically.
- Compute Capability: CUDA 13.x drops support for SM 70 (Volta V100). If using CUDA 13.x, the minimum is SM 75 (Turing).
- Multi-GPU: Distributed multi-GPU workflows need additional packages (see the Rapidsai_Cuml_Dask_Distributed environment).
- CPU-only: cuML Random Forest models can be exported and run on CPU-only machines via FIL (Forest Inference Library) with `precision='single'` or `'double'`.
## Related Pages

- Implementation: Rapidsai_Cuml_PCA_UMAP_TSNE_Configuration
- Implementation: Rapidsai_Cuml_Input_To_Cuml_Array
- Implementation: Rapidsai_Cuml_KMeans_DBSCAN_HDBSCAN_Init
- Implementation: Rapidsai_Cuml_KMeans_DBSCAN_HDBSCAN_Fit
- Implementation: Rapidsai_Cuml_Cluster_Predict
- Implementation: Rapidsai_Cuml_Cluster_Evaluation_Metrics