Environment:NVIDIA NeMo Curator Python Linux Base
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Data_Curation |
| Last Updated | 2026-02-14 16:45 GMT |
Overview
Linux-only Python 3.10-3.12 environment with core dependencies for running NeMo Curator data curation pipelines.
Description
NeMo Curator requires Linux and will raise a `ValueError` at import time on any other platform. The core runtime depends on Python 3.10 through 3.12, Ray for distributed execution, PyTorch, HuggingFace Transformers, and the Cosmos-Xenna framework. This is the minimum environment needed to import `nemo_curator` and run CPU-based pipeline stages.
Usage
This environment is the mandatory base for every NeMo Curator workflow (text, image, video, and audio curation). GPU-accelerated features require additional environments layered on top of it.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux only | `sys.platform == "linux"` enforced at import |
| Python | >= 3.10, < 3.13 | Python 3.10, 3.11, 3.12 supported |
| Architecture | x86_64 recommended | Some optional deps (vLLM, flash-attn, PyNvVideoCodec) are x86_64-only |
| Disk | 10 GB+ SSD | For package installation and temp files |
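The table above can be turned into a small preflight check to run before installing. This is a minimal sketch, not part of NeMo Curator: the function name `check_base_requirements` is hypothetical, and treating the architecture and disk rows as warnings rather than hard failures is an assumption based on the "recommended" wording.

```python
import platform
import shutil
import sys


def check_base_requirements(platform_name=None, python_version=None,
                            machine=None, free_disk_gb=None, path="."):
    """Hypothetical helper: compare the host against the requirements table.

    Returns (errors, warnings); both lists are empty when every check passes.
    Arguments default to values detected from the running interpreter.
    """
    if platform_name is None:
        platform_name = sys.platform
    if python_version is None:
        python_version = sys.version_info[:2]
    python_version = tuple(python_version)
    if machine is None:
        machine = platform.machine()
    if free_disk_gb is None:
        free_disk_gb = shutil.disk_usage(path).free / 1e9

    errors, warnings = [], []
    # OS row: same condition that the package enforces at import time.
    if platform_name != "linux":
        errors.append(f"unsupported OS '{platform_name}': Linux is required")
    # Python row: >= 3.10 and < 3.13.
    if not (3, 10) <= python_version < (3, 13):
        found = ".".join(str(part) for part in python_version)
        errors.append(f"unsupported Python {found}: need 3.10, 3.11, or 3.12")
    # Architecture row: only a recommendation, so report it as a warning.
    if machine != "x86_64":
        warnings.append(f"architecture '{machine}': some optional deps are x86_64-only")
    # Disk row: assumed here to be a soft threshold of 10 GB free.
    if free_disk_gb < 10:
        warnings.append(f"only {free_disk_gb:.1f} GB free: 10 GB or more is suggested")
    return errors, warnings


if __name__ == "__main__":
    found_errors, found_warnings = check_base_requirements()
    for message in found_errors:
        print("ERROR:", message)
    for message in found_warnings:
        print("WARNING:", message)
```

Passing explicit arguments makes the check testable on any machine, for example `check_base_requirements("darwin", (3, 13), "arm64", 5)` to see what a failing host would report.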
Dependencies
System Packages
- `setuptools` >= 61.0 (build system)
Python Packages (Core)
- `absl-py` >= 2.0.0, < 3.0.0
- `comment_parser`
- `cosmos-xenna` == 0.1.2
- `fsspec`
- `hydra-core`
- `jieba` == 0.42.1
- `loguru`
- `mecab-python3`
- `omegaconf`
- `openai` >= 1.0.0
- `pandas` >= 2.1.0
- `pyarrow`
- `ray[default,data]` >= 2.50
- `torch`
- `transformers`
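To see which of the core packages listed above are present in the current environment, the standard library's `importlib.metadata` can be queried. The sketch below is illustrative only: `CORE_PACKAGES` and `installed_versions` are hypothetical names, `ray` is looked up without its extras, and the version bounds from the list are not enforced.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the core dependency list above.
CORE_PACKAGES = [
    "absl-py", "comment_parser", "cosmos-xenna", "fsspec", "hydra-core",
    "jieba", "loguru", "mecab-python3", "omegaconf", "openai", "pandas",
    "pyarrow", "ray", "torch", "transformers",
]


def installed_versions(packages=CORE_PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, installed in installed_versions().items():
        print(f"{name:16} {installed or 'MISSING'}")
```

This reads package metadata only and never imports the packages, so it is safe to run on a non-Linux machine or in an environment where `nemo_curator` itself cannot be imported.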
Credentials
No credentials are required for the base environment. The following API keys are optional and needed only for specific features:
- `NVIDIA_API_KEY`: Required for NVIDIA NIM-based synthetic data generation and benchmarking.
- `HF_TOKEN`: Required for accessing gated HuggingFace models (e.g., AEGIS safety classifier).
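A quick way to find out which of these optional keys are unset is to inspect the process environment. This is a sketch under the assumption that both keys are read from environment variables with exactly the names above; `missing_optional_keys` is a hypothetical helper, not a NeMo Curator function.

```python
import os

# Variable names and purposes taken from the list above.
OPTIONAL_KEYS = {
    "NVIDIA_API_KEY": "NVIDIA NIM-based synthetic data generation and benchmarking",
    "HF_TOKEN": "gated HuggingFace models",
}


def missing_optional_keys(env=None):
    """Return the names of optional keys that are unset or empty in env."""
    if env is None:
        env = os.environ
    return [name for name in OPTIONAL_KEYS if not env.get(name)]


if __name__ == "__main__":
    for name in missing_optional_keys():
        print(f"{name} is not set (only needed for {OPTIONAL_KEYS[name]})")
```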
Quick Install
```shell
# Install base NeMo Curator
pip install nemo-curator
```
Code Evidence
Linux-only enforcement from `nemo_curator/__init__.py:41-48`:
```python
if sys.platform != "linux":
    _msg = (
        "NeMo-Curator currently only supports Linux systems, "
        f"while the current machine has a {sys.platform} system. \n"
        "For more information on installation and system requirements, see "
        "https://docs.nvidia.com/nemo/curator/latest/admin/installation.html"
    )
    raise ValueError(_msg)
```
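Because the platform check raises `ValueError` rather than `ImportError`, code that treats NeMo Curator as an optional import needs to catch both. The sketch below shows one way to do that; `try_import` is a hypothetical helper, and the wording of the returned reasons is illustrative.

```python
import importlib


def try_import(module_name="nemo_curator"):
    """Return (module, None) on success or (None, reason) on failure."""
    try:
        return importlib.import_module(module_name), None
    except ImportError as exc:
        # The package itself or one of its dependencies is not installed.
        return None, f"not installed or missing a dependency: {exc}"
    except ValueError as exc:
        # Raised by the platform check shown above on non-Linux systems.
        return None, f"import refused: {exc}"
```

A caller would then write `module, reason = try_import()` and fall back or report `reason` when `module` is `None`.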
RAPIDS initialization suppression from `nemo_curator/__init__.py:32`:
```python
os.environ["RAPIDS_NO_INITIALIZE"] = "1"
```
Ray API limit configuration from `nemo_curator/__init__.py:34-38`:
```python
from cosmos_xenna.ray_utils.cluster import API_LIMIT
os.environ["RAY_MAX_LIMIT_FROM_API_SERVER"] = str(API_LIMIT)
os.environ["RAY_MAX_LIMIT_FROM_DATA_SOURCE"] = str(API_LIMIT)
```
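The excerpts above assign these variables unconditionally, so any values exported before the import would be replaced once `nemo_curator` is imported. To confirm what the process ends up with, the three variables can be read back after the import. This is a minimal sketch; `import_time_env` is a hypothetical helper name.

```python
import os

# Variables assigned in the excerpts above.
IMPORT_TIME_VARS = (
    "RAPIDS_NO_INITIALIZE",
    "RAY_MAX_LIMIT_FROM_API_SERVER",
    "RAY_MAX_LIMIT_FROM_DATA_SOURCE",
)


def import_time_env(env=None):
    """Return the current value of each import-time variable (None if unset)."""
    if env is None:
        env = os.environ
    return {name: env.get(name) for name in IMPORT_TIME_VARS}


if __name__ == "__main__":
    for name, value in import_time_env().items():
        print(f"{name}={value}")
```

Calling it once before and once after `import nemo_curator` shows which values the import changed.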
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ValueError: NeMo-Curator currently only supports Linux systems` | Running on macOS or Windows | Use Linux or WSL2 on Windows |
| `ModuleNotFoundError: No module named 'cosmos_xenna'` | Missing cosmos-xenna dependency | `pip install cosmos-xenna==0.1.2` |
| `ModuleNotFoundError: No module named 'ray'` | Ray not installed | `pip install "ray[default,data]>=2.50"` (quoted so the shell does not treat `>` as a redirect) |
Compatibility Notes
- macOS: Not supported. NeMo Curator raises `ValueError` at import time on non-Linux platforms.
- Windows: Not supported directly. Use WSL2 with a Linux distribution.
- Python 3.13+: Not supported. Upper bound is Python 3.12.
- ARM (aarch64): Core package works, but some optional dependencies (vLLM, flash-attn, PyNvVideoCodec) are x86_64-only.