Environment:Rapidsai Cuml Dask Distributed
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Distributed_Computing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Multi-GPU distributed computing environment using Dask, dask-cuda, dask-cudf, and RAFT distributed communication for scaling cuML across multiple GPUs and nodes.
Description
This environment extends the base cuML CUDA GPU environment with distributed computing capabilities. It uses Dask for task scheduling across multiple GPU workers, dask-cuda for GPU-aware cluster management, dask-cudf for distributed GPU DataFrames, and raft-dask for NCCL/UCX-based inter-GPU communication. The environment supports both single-node multi-GPU and multi-node configurations.
Usage
Use this environment when running cuML distributed algorithms via cuml.dask. This is required for multi-GPU KMeans, NearestNeighbors, PCA, TruncatedSVD, LinearRegression, Ridge, Lasso, ElasticNet, LogisticRegression, DBSCAN, and distributed Random Forest training. Single-GPU cuML does not require this environment.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware | Multiple NVIDIA GPUs | One Dask worker spawned per GPU |
| Networking | NVLink (recommended) | For high-bandwidth GPU-to-GPU transfer on single node |
| Networking | InfiniBand (optional) | For multi-node GPU-to-GPU transfer |
| Protocol | UCX (recommended) | GPU-direct RDMA for optimal performance |
Dependencies
Additional Packages (beyond base cuML)
dask-cudf== 26.4.*raft-dask== 26.4.*rapids-dask-dependency== 26.4.*dask-cuda== 26.4.* (forLocalCUDACluster)dask-ml>= 2024 (optional, for testing)
External Libraries
UCXlibrary (optional but recommended for GPU-direct networking)NCCL(included with CUDA toolkit, used by RAFT Comms)
Credentials
CUDA_VISIBLE_DEVICES: Controls which GPUs are assigned to Dask workers.DASK_DISTRIBUTED__COMM__UCX__TCP: UCX over TCP configuration.UCX_TLS: UCX transport layer selection.
Quick Install
# Install cuML with Dask extras (CUDA 12.x)
pip install cuml-cu12[dask]
# Install with conda
conda install -c rapidsai -c conda-forge -c nvidia cuml dask-cuda dask-cudf raft-dask
Code Evidence
Dask dependency validation from python/cuml/cuml/dask/__init__.py:5-25:
try:
import dask
import dask.distributed as _
import dask_cudf as _
import raft_dask as _
del _
except ModuleNotFoundError as exc:
from cupy.cuda import get_local_runtime_version
ver = f"cu{str(get_local_runtime_version())[:2]}"
raise ModuleNotFoundError(
f"{exc!s}\n\n"
"Not all requirements for using `cuml.dask` are installed.\n\n"
"# Install with pip:\n"
f" pip install cuml-{ver}[dask]"
) from exc
Dask shuffle workaround from python/cuml/cuml/dask/__init__.py:43-44:
# Avoid "p2p" shuffling in dask for now
dask.config.set({"dataframe.shuffle.method": "tasks"})
Dask version compatibility from python/cuml/cuml/dask/_compat.py:7-15:
import packaging.version
@functools.lru_cache
def DASK_2025_4_0():
return packaging.version.parse(dask.__version__) >= packaging.version.parse("2025.4.0")
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
ModuleNotFoundError: No module named 'dask_cudf' |
Dask extras not installed | pip install cuml-cu12[dask]
|
ModuleNotFoundError: No module named 'raft_dask' |
raft-dask not installed | pip install raft-dask==26.4.*
|
| UCX connection errors | UCX library not installed or misconfigured | Fall back to TCP protocol: LocalCUDACluster(protocol='tcp')
|
| Worker GPU memory errors | Data not evenly distributed | Ensure one partition per worker; use persist_across_workers()
|
Compatibility Notes
- P2P Shuffle: cuML forces Dask to use task-based shuffling instead of P2P (
dask.config.set({"dataframe.shuffle.method": "tasks"})). P2P shuffling has known issues with GPU data. - UCX vs TCP: UCX protocol provides significantly better performance through GPU-direct RDMA but requires additional system library installation. TCP works out of the box.
- NVLink: Enable with
LocalCUDACluster(enable_nvlink=True)for single-node multi-GPU. Provides highest bandwidth between GPUs on the same node. - Dask Version: The
_compat.pymodule tracks Dask API changes across versions, currently checking for Dask >= 2025.4.0.
Related Pages
No implementation pages currently reference this environment.