Environment:Rapidsai Cuml CUDA GPU

From Leeroopedia


Knowledge Sources

Domains: Infrastructure, GPU_Computing
Last Updated: 2026-02-08 00:00 GMT

Overview

NVIDIA CUDA GPU environment with compute capability 7.0+ and CUDA Toolkit 12.x or 13.x, required for all cuML GPU-accelerated machine learning operations.

Description

This environment defines the hardware and CUDA software stack required to run cuML. cuML is a GPU-accelerated machine learning library that requires an NVIDIA GPU with the CUDA toolkit. The library supports CUDA 12.x (compute capability 7.0+, Volta and newer) and CUDA 13.x (compute capability 7.5+, Turing and newer). The GPU must have sufficient VRAM for the target workload. The CUDA toolkit must include development libraries: cudart, cublas, cusparse, cusolver, curand, and cufft.
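The compute-capability floors above differ by CUDA major version, which is easy to get wrong when provisioning hardware. A minimal sketch of that gate, as a hypothetical pure-Python helper (not part of cuML's API):

```python
def is_supported(compute_capability: tuple, cuda_major: int) -> bool:
    """Return True if a GPU's compute capability meets cuML's floor for the
    given CUDA major version: 7.0 for CUDA 12.x, 7.5 for CUDA 13.x."""
    floors = {12: (7, 0), 13: (7, 5)}
    if cuda_major not in floors:
        return False  # cuML supports CUDA 12.x and 13.x only
    # Tuples compare element-wise, so (7, 5) >= (7, 0) behaves as expected.
    return tuple(compute_capability) >= floors[cuda_major]

# Volta (7.0) clears the CUDA 12.x floor but not the CUDA 13.x one:
print(is_supported((7, 0), 12))  # True
print(is_supported((7, 0), 13))  # False
```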

Usage

This environment is required for all cuML operations. Every estimator (KMeans, DBSCAN, HDBSCAN, PCA, UMAP, t-SNE, Random Forest, ARIMA, etc.) performs computation on the GPU via CUDA. The cuml.accel accelerator module also requires this environment to transparently accelerate scikit-learn code on GPU.
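The accelerator's behavior can be pictured as a registry that swaps CPU estimators for GPU-backed equivalents when one exists. The following is a toy illustration of that interception idea only; cuml.accel's real mechanism patches scikit-learn imports, and all class names below are stand-ins:

```python
class SklearnKMeans:   # stand-in for a CPU estimator (e.g. sklearn.cluster.KMeans)
    backend = "cpu"

class CumlKMeans:      # stand-in for a GPU estimator (e.g. cuml.cluster.KMeans)
    backend = "gpu"

# Hypothetical registry of estimators that have GPU-backed replacements.
_GPU_REGISTRY = {"KMeans": CumlKMeans}

def resolve(name, cpu_cls, gpu_available):
    """Pick the GPU class when one is registered and a GPU is present;
    otherwise fall back to the original CPU class."""
    if gpu_available and name in _GPU_REGISTRY:
        return _GPU_REGISTRY[name]
    return cpu_cls

print(resolve("KMeans", SklearnKMeans, gpu_available=True).backend)   # gpu
print(resolve("KMeans", SklearnKMeans, gpu_available=False).backend)  # cpu
```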

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 20.04+) | WSL2 supported but lacks managed memory (UVM) |
| Hardware | NVIDIA GPU | Compute capability 7.0+ (CUDA 12.x) or 7.5+ (CUDA 13.x) |
| GPU Architectures | Volta, Turing, Ampere, Ada, Hopper, Blackwell | SM 70, 72, 75, 80, 86, 87, 89, 90, 120, 121 |
| CUDA Toolkit | >= 12.2, < 14.0 | Must include cublas, cufft, curand, cusolver, cusparse |
| CMake | >= 3.30.4 | Required for building from source |
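The SM numbers in the table map to architecture generations as follows. A hypothetical lookup helper (the mapping follows standard NVIDIA naming; it is not a cuML API):

```python
# SM (compute capability x10) -> architecture generation.
SM_ARCH = {
    70: "Volta", 72: "Volta (Xavier)", 75: "Turing",
    80: "Ampere", 86: "Ampere", 87: "Ampere (Orin)",
    89: "Ada Lovelace", 90: "Hopper",
    120: "Blackwell", 121: "Blackwell",
}

def arch_name(sm: int) -> str:
    """Name the architecture for an SM value, or flag it as unsupported."""
    return SM_ARCH.get(sm, "unsupported by cuML")

print(arch_name(90))  # Hopper
```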

Dependencies

System Packages

  • cuda-toolkit[cublas,cufft,curand,cusolver,cusparse] >= 12, < 14
  • nvidia-driver compatible with CUDA toolkit version
  • C++ compiler with C++17 support (for building from source)

Python Packages

  • cuda-python >= 12.9.2, < 14.0 (version depends on CUDA variant)
  • cupy-cuda12x or cupy-cuda13x >= 13.6.0
  • numba-cuda >= 0.22.1
  • numba >= 0.60.0, < 0.62.0
  • pylibraft == 26.4.*
  • rmm == 26.4.* (RAPIDS Memory Manager)

Credentials

No credentials required for GPU operation. The following optional environment variables control behavior:

  • CUDA_VISIBLE_DEVICES: Controls which GPUs are visible to the process.
  • CUML_ACCEL_ENABLED: Set to "1" or "true" to enable automatic sklearn acceleration.
  • CUML_ACCEL_LOG_LEVEL: Set to "error", "warn", "info", or "debug" for accelerator logging.
  • NVTX_BENCHMARK: Enables NVTX profiling annotations.
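A process might validate these variables before importing cuML. A minimal sketch, assuming the accepted values listed above (the helper functions are illustrative, not part of cuML):

```python
import os

def accel_enabled(env=os.environ) -> bool:
    """True when CUML_ACCEL_ENABLED is set to "1" or "true" (case-insensitive)."""
    return env.get("CUML_ACCEL_ENABLED", "").strip().lower() in ("1", "true")

def accel_log_level(env=os.environ) -> str:
    """Return a valid accelerator log level, defaulting to "warn"."""
    level = env.get("CUML_ACCEL_LOG_LEVEL", "warn").lower()
    return level if level in ("error", "warn", "info", "debug") else "warn"

print(accel_enabled({"CUML_ACCEL_ENABLED": "1"}))          # True
print(accel_log_level({"CUML_ACCEL_LOG_LEVEL": "debug"}))  # debug
```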

Quick Install

# Install with pip (CUDA 12.x)
pip install cuml-cu12

# Install with pip (CUDA 13.x)
pip install cuml-cu13

# Install with conda
conda install -c rapidsai -c conda-forge -c nvidia cuml cuda-version=12.9
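After either install path, a quick importability check confirms the package landed (the top-level module is named cuml for both the cu12 and cu13 wheels). A small sketch that works without touching the GPU:

```python
import importlib.util

def cuml_installed() -> bool:
    """Check whether the cuml package is importable, without importing it
    (importing cuml initializes CUDA, which requires a visible GPU)."""
    return importlib.util.find_spec("cuml") is not None

print(cuml_installed())
```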

Code Evidence

GPU architecture detection from cpp/include/cuml/fil/detail/gpu_introspection.hpp:25-30:

inline auto max_shared_mem_per_block(int device = 0)
{
  auto result = int{};
  RAFT_CUDA_TRY(cudaDeviceGetAttribute(
    &result, cudaDevAttrMaxSharedMemoryPerBlockOptin, device));
  return result;
}

UVM (Unified Virtual Memory) detection from python/cuml/cuml/accel/core.py:113-131:

def _is_concurrent_managed_access_supported():
    """Check the availability of concurrent managed access (UVM).
    Note that WSL2 does not support managed memory."""
    runtime.cudaFree(0)  # Ensure CUDA is initialized
    device_id = 0
    err, supports_managed_access = runtime.cudaDeviceGetAttribute(
        runtime.cudaDeviceAttr.cudaDevAttrConcurrentManagedAccess, device_id
    )
    if err != runtime.cudaError_t.cudaSuccess:
        logger.error(f"Failed to check cudaDevAttrConcurrentManagedAccess with error {err}")
        return False
    return supports_managed_access != 0

Build requirements from BUILD.md:10-17:

GPU Compute Capability Constraints:
- CUDA 12.x: compute capability 7.0 or higher (Volta architecture or newer)
- CUDA 13.x: compute capability 7.5 or higher (Turing architecture or newer)
CUDA Toolkit (>= 12.2) - must include development libraries

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| ModuleNotFoundError: No module named 'libcuml' | libcuml C++ library not installed | Install the full cuml package: pip install cuml-cu12 |
| cudaErrorNoDevice | No NVIDIA GPU detected | Ensure NVIDIA drivers are installed and the GPU is accessible |
| CUDA out of memory | Insufficient GPU VRAM for the operation | Reduce the batch size, use the max_mbytes_per_batch parameter, or use a GPU with more VRAM |
| Failed to check cudaDevAttrConcurrentManagedAccess | UVM not supported (e.g., WSL2) | Pass disable_uvm=True to cuml.accel.install() |
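For the out-of-memory row, back-of-envelope sizing helps pick a batch budget. Purely illustrative arithmetic (actual memory use varies by estimator; max_mbytes_per_batch is interpreted by cuML, not by this helper):

```python
def rows_per_batch(max_mbytes_per_batch: int, bytes_per_row: int) -> int:
    """Rows that fit in a VRAM budget given in MiB, ignoring estimator overhead."""
    return (max_mbytes_per_batch * 1024 * 1024) // bytes_per_row

# 1 GiB budget, float32 rows with 256 features (256 * 4 = 1024 bytes each):
print(rows_per_batch(1024, 256 * 4))  # 1048576
```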

Compatibility Notes

  • WSL2: Does not support CUDA Unified Virtual Memory (managed memory). The accelerator module detects this and skips UVM setup automatically.
  • Compute Capability: CUDA 13.x drops support for SM 70 (Volta V100). If using CUDA 13.x, minimum is SM 75 (Turing).
  • Multi-GPU: For distributed multi-GPU workflows, additional packages are needed (see Rapidsai_Cuml_Dask_Distributed environment).
  • CPU-only: cuML Random Forest models can be exported and run on CPU-only machines via FIL (Forest Inference Library) with precision='single' or 'double'.
