Environment:NVIDIA NeMo Curator RAPIDS GPU Stack
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU_Computing, Deduplication |
| Last Updated | 2026-02-14 16:45 GMT |
Overview
NVIDIA RAPIDS GPU-accelerated stack (cuDF, cuML, CuPy, pylibcugraph) on CUDA 12, used to run GPU deduplication and embedding pipelines.
Description
This environment provides the GPU-accelerated RAPIDS libraries required by NeMo Curator's deduplication stages. The MinHash, LSH, Connected Components, KMeans clustering, and Pairwise similarity stages import RAPIDS modules such as `cudf`, `cupy`, `pylibcugraph`, and `rmm` directly at module level, with no fallback. These are hard requirements for GPU-based deduplication: if RAPIDS is not installed, the stages fail at import time with `ModuleNotFoundError` (a subclass of `ImportError`). The stack is pinned to the CUDA 12 / RAPIDS 25.10 release line.
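Because the stages import these modules unconditionally, it can be useful to check for them before building a pipeline. The following is a minimal sketch, not part of NeMo Curator: `missing_modules` is a hypothetical helper, and the module list is simply the one named in this section. It only checks that each module can be located, so it will not catch a module that is installed but fails to load (for example, because of a missing driver library).

```python
import importlib.util

# Module names taken from the imports described in this section.
REQUIRED_MODULES = ("cudf", "cupy", "pylibcugraph", "rmm")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be located on the import path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or partially initialised modules.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules()
    if absent:
        print("Missing RAPIDS modules:", ", ".join(absent))
    else:
        print("All required RAPIDS modules can be located.")
```

Running the check once at startup gives a single readable message instead of a traceback from whichever stage happens to be imported first.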
Usage
Use this environment for any GPU-accelerated deduplication workflow: exact deduplication, fuzzy (MinHash/LSH) deduplication, and semantic deduplication. It is also required for GPU-based text embedding generation that uses cuDF DataFrames.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Required by both NeMo Curator and RAPIDS |
| Hardware | NVIDIA GPU with CUDA 12 support | Ampere (A100) or newer recommended |
| VRAM | 16 GB+ recommended | Connected components and pairwise stages are memory-intensive |
| CUDA | CUDA 12.x toolkit | Required by the `cudf-cu12` and `cuml-cu12` packages |
| Driver | NVIDIA driver >= 525 | Required for CUDA 12 compatibility |
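The driver requirement in the table above can be checked programmatically. This is a rough sketch under stated assumptions: the helper names are hypothetical, only the major version is compared against 525, and `nvidia-smi` is assumed to be on `PATH` (the function returns an empty list when it is not).

```python
import shutil
import subprocess

MIN_DRIVER_MAJOR = 525  # minimum driver version from the requirements table


def driver_major(version):
    """Extract the major component from a driver string such as '535.104.05'."""
    return int(version.strip().split(".")[0])


def driver_is_sufficient(version, minimum=MIN_DRIVER_MAJOR):
    """True if the driver's major version meets the minimum."""
    return driver_major(version) >= minimum


def query_driver_versions():
    """Return one driver version string per visible GPU, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for version in query_driver_versions():
        status = "ok" if driver_is_sufficient(version) else "too old"
        print(f"driver {version}: {status}")
```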
Dependencies
System Packages
- CUDA 12.x toolkit
- NVIDIA driver >= 525
Python Packages
- `cudf-cu12` == 25.10.*
- `cuml-cu12` == 25.10.*
- `scikit-learn` < 1.8.0 (cuml 25.10 is incompatible with scikit-learn 1.8.0)
- `pylibcugraph-cu12` == 25.10.*
- `pylibraft-cu12` == 25.10.*
- `raft-dask-cu12` == 25.10.*
- `rapidsmpf-cu12` == 25.10.*
- `gpustat` (optional, for GPU monitoring)
- `nvidia-ml-py` (optional, for pynvml GPU detection)
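The pins listed above can be compared against what is actually installed. The sketch below is illustrative only: the helper names are hypothetical, the package list mirrors the required entries above, and version strings are assumed to start with plain numeric `major.minor` components. It uses only the standard library.

```python
import re
from importlib import metadata

RAPIDS_RELEASE = "25.10"
RAPIDS_PACKAGES = (
    "cudf-cu12",
    "cuml-cu12",
    "pylibcugraph-cu12",
    "pylibraft-cu12",
    "raft-dask-cu12",
    "rapidsmpf-cu12",
)


def on_release_line(version, release=RAPIDS_RELEASE):
    """True if version matches '<release>.*', e.g. '25.10.0' for '25.10'."""
    return version.split(".")[:2] == release.split(".")


def sklearn_is_compatible(version):
    """True if the scikit-learn version is below 1.8 (major.minor comparison)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) < (1, 8)


def check_installed():
    """Return human-readable problems found in the current environment."""
    problems = []
    for name in RAPIDS_PACKAGES:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if not on_release_line(version):
            problems.append(f"{name}: {version} is not on the {RAPIDS_RELEASE} line")
    try:
        sklearn_version = metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        problems.append("scikit-learn: not installed")
    else:
        if not sklearn_is_compatible(sklearn_version):
            problems.append(f"scikit-learn: {sklearn_version} is not below 1.8.0")
    return problems


if __name__ == "__main__":
    for problem in check_installed():
        print(problem)
```

An empty result from `check_installed()` means every required package was found on the expected release line; it does not prove the packages load correctly on the current GPU.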
Credentials
No additional credentials required beyond the base environment.
Quick Install
# Install NeMo Curator with RAPIDS GPU deduplication support
pip install "nemo-curator[deduplication_cuda12]"
# Or for full text curation with GPU
pip install "nemo-curator[text_cuda12]"
Code Evidence
Direct cuDF import (no fallback) from `nemo_curator/stages/deduplication/fuzzy/minhash.py:18-20`:
import cudf
import numpy as np
import rmm
Direct pylibcugraph import from `nemo_curator/stages/deduplication/fuzzy/connected_components.py:18-22`:
import cudf
from loguru import logger
from pylibcugraph import GraphProperties, MGGraph, ResourceHandle
from pylibcugraph import weakly_connected_components as pylibcugraph_wcc
from pylibcugraph.comms.comms_wrapper import init_subcomms as c_init_subcomms
Direct cupy import from `nemo_curator/stages/deduplication/semantic/pairwise.py:20-23`:
import cudf
import cupy
scikit-learn version constraint from `pyproject.toml:79`:
"scikit-learn<1.8.0", # cuml 25.10.0 is incompatible with scikit-learn 1.8.0
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'cudf'` | RAPIDS cuDF not installed | `pip install cudf-cu12==25.10.*` |
| `ModuleNotFoundError: No module named 'pylibcugraph'` | Graph library not installed | `pip install pylibcugraph-cu12==25.10.*` |
| `ImportError: libcuda.so` | NVIDIA driver not found | Install NVIDIA driver >= 525 |
| `CUDA out of memory` during connected components | Insufficient GPU VRAM | Reduce input blocksize or use a GPU with more VRAM |
| `sklearn` version conflict | scikit-learn 1.8+ installed | `pip install "scikit-learn<1.8.0"` |
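For the out-of-memory row above, it may help to look at total VRAM per GPU before launching the memory-intensive stages. The following sketch assumes the optional `nvidia-ml-py` package (which provides the `pynvml` module) is installed; the helper names are hypothetical, and the 16 GiB threshold is only the recommendation from this page, not a hard limit. Reported totals can come in slightly under a card's nominal capacity, so treat the comparison as a rough guide.

```python
RECOMMENDED_GIB = 16.0  # recommendation from the System Requirements table


def bytes_to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / (1024 ** 3)


def below_recommendation(total_gib, recommended=RECOMMENDED_GIB):
    """True if a GPU's total memory is under the recommended amount."""
    return total_gib < recommended


def gpu_memory_gib():
    """Return total memory in GiB for each visible GPU, or [] if NVML is unavailable."""
    try:
        import pynvml  # provided by the optional nvidia-ml-py package
    except ImportError:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []
    try:
        totals = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            info = pynvml.nvmlDeviceGetMemoryInfo(handle)
            totals.append(bytes_to_gib(info.total))
        return totals
    except pynvml.NVMLError:
        return []
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    for index, total in enumerate(gpu_memory_gib()):
        note = " (below recommendation)" if below_recommendation(total) else ""
        print(f"GPU {index}: {total:.1f} GiB{note}")
```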
Compatibility Notes
- RAPIDS version pinning: All RAPIDS packages must be from the same 25.10 release. Mixing versions causes ABI incompatibilities.
- CUDA 11: Not supported. NeMo Curator requires CUDA 12 packages.
- AMD GPUs: Not supported. RAPIDS libraries are NVIDIA-only.
- CPU fallback: The GPU deduplication stages have no CPU fallback and cannot run in CPU-only environments.
Related Pages
- Implementation:NVIDIA_NeMo_Curator_MinHashStage
- Implementation:NVIDIA_NeMo_Curator_LSHStage
- Implementation:NVIDIA_NeMo_Curator_ConnectedComponentsStage
- Implementation:NVIDIA_NeMo_Curator_BucketsToEdgesStage
- Implementation:NVIDIA_NeMo_Curator_KMeansStage
- Implementation:NVIDIA_NeMo_Curator_PairwiseStage
- Implementation:NVIDIA_NeMo_Curator_Semantic_IdentifyDuplicatesStage
- Implementation:NVIDIA_NeMo_Curator_Fuzzy_IdentifyDuplicatesStage
- Implementation:NVIDIA_NeMo_Curator_FuzzyDeduplicationWorkflow