Environment:Rapidsai Cuml Dask Distributed

Knowledge Sources	cuML cuML Dask Docs
Domains	Infrastructure, Distributed_Computing
Last Updated	2026-02-08 00:00 GMT

Overview

Multi-GPU distributed computing environment using Dask, dask-cuda, dask-cudf, and RAFT distributed communication for scaling cuML across multiple GPUs and nodes.

Description

This environment extends the base cuML CUDA GPU environment with distributed computing capabilities. It uses Dask for task scheduling across multiple GPU workers, dask-cuda for GPU-aware cluster management, dask-cudf for distributed GPU DataFrames, and raft-dask for NCCL/UCX-based inter-GPU communication. The environment supports both single-node multi-GPU and multi-node configurations.

Usage

Use this environment when running cuML distributed algorithms via cuml.dask. This is required for multi-GPU KMeans, NearestNeighbors, PCA, TruncatedSVD, LinearRegression, Ridge, Lasso, ElasticNet, LogisticRegression, DBSCAN, and distributed Random Forest training. Single-GPU cuML does not require this environment.

System Requirements

Category	Requirement	Notes
Hardware	Multiple NVIDIA GPUs	One Dask worker spawned per GPU
Networking	NVLink (recommended)	For high-bandwidth GPU-to-GPU transfer on single node
Networking	InfiniBand (optional)	For multi-node GPU-to-GPU transfer
Protocol	UCX (recommended)	GPU-direct RDMA for optimal performance

Dependencies

Additional Packages (beyond base cuML)

dask-cudf == 26.4.*
raft-dask == 26.4.*
rapids-dask-dependency == 26.4.*
dask-cuda == 26.4.* (for LocalCUDACluster)
dask-ml >= 2024 (optional, for testing)

External Libraries

UCX library (optional but recommended for GPU-direct networking)
NCCL (included with CUDA toolkit, used by RAFT Comms)

Credentials

CUDA_VISIBLE_DEVICES: Controls which GPUs are assigned to Dask workers.
DASK_DISTRIBUTED__COMM__UCX__TCP: UCX over TCP configuration.
UCX_TLS: UCX transport layer selection.

Quick Install

# Install cuML with Dask extras (CUDA 12.x)
pip install cuml-cu12[dask]

# Install with conda
conda install -c rapidsai -c conda-forge -c nvidia cuml dask-cuda dask-cudf raft-dask

Code Evidence

Dask dependency validation from python/cuml/cuml/dask/__init__.py:5-25:

try:
    import dask
    import dask.distributed as _
    import dask_cudf as _
    import raft_dask as _
    del _
except ModuleNotFoundError as exc:
    from cupy.cuda import get_local_runtime_version
    ver = f"cu{str(get_local_runtime_version())[:2]}"
    raise ModuleNotFoundError(
        f"{exc!s}\n\n"
        "Not all requirements for using `cuml.dask` are installed.\n\n"
        "# Install with pip:\n"
        f"  pip install cuml-{ver}[dask]"
    ) from exc

Dask shuffle workaround from python/cuml/cuml/dask/__init__.py:43-44:

# Avoid "p2p" shuffling in dask for now
dask.config.set({"dataframe.shuffle.method": "tasks"})

Dask version compatibility from python/cuml/cuml/dask/_compat.py:7-15:

import packaging.version
@functools.lru_cache
def DASK_2025_4_0():
    return packaging.version.parse(dask.__version__) >= packaging.version.parse("2025.4.0")

Common Errors

Error Message	Cause	Solution
`ModuleNotFoundError: No module named 'dask_cudf'`	Dask extras not installed	`pip install cuml-cu12[dask]`
`ModuleNotFoundError: No module named 'raft_dask'`	raft-dask not installed	`pip install raft-dask==26.4.*`
UCX connection errors	UCX library not installed or misconfigured	Fall back to TCP protocol: `LocalCUDACluster(protocol='tcp')`
Worker GPU memory errors	Data not evenly distributed	Ensure one partition per worker; use `persist_across_workers()`

Compatibility Notes

P2P Shuffle: cuML forces Dask to use task-based shuffling instead of P2P (dask.config.set({"dataframe.shuffle.method": "tasks"})). P2P shuffling has known issues with GPU data.
UCX vs TCP: UCX protocol provides significantly better performance through GPU-direct RDMA but requires additional system library installation. TCP works out of the box.
NVLink: Enable with LocalCUDACluster(enable_nvlink=True) for single-node multi-GPU. Provides highest bandwidth between GPUs on the same node.
Dask Version: The _compat.py module tracks Dask API changes across versions, currently checking for Dask >= 2025.4.0.

Related Pages

No implementation pages currently reference this environment.

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment