Environment:NVIDIA NeMo Curator Ray Cluster
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Distributed_Computing |
| Last Updated | 2026-02-14 16:45 GMT |
Overview
Ray distributed computing cluster environment with Xenna executor integration for running NeMo Curator pipelines at scale.
Description
NeMo Curator uses Ray as its primary distributed execution backend. The `RayClient` manages cluster connections, and the Cosmos-Xenna framework (via `XennaExecutor`) provides the default executor for pipeline stages. The cluster can run locally (single-node) or across multiple nodes. NeMo Curator automatically configures Ray environment variables at import time to ensure compatibility with the Xenna executor.
Usage
Required for all pipeline execution in NeMo Curator. Even single-node usage initializes a local Ray cluster. Multi-node deployments require a pre-configured Ray cluster with the head node accessible via `RAY_ADDRESS`.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | NeMo Curator requires Ray on Linux |
| Network | Open ports: 6379, 8265, 8080, 10001-19999 | Ray head (6379), dashboard (8265), metrics (8080), client server (10001), and worker ports (10002-19999) |
| Memory | 8 GB+ RAM per node | The Ray object store is backed by shared memory |
| Disk | `/tmp/ray` writable | Default Ray temp directory |
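The requirements above can be checked before starting Ray. The following is a minimal sketch using only the Python standard library; the helper names (`port_is_free`, `dir_is_writable`, `preflight`) are hypothetical and are not part of NeMo Curator or Ray. It assumes a port counts as available if it can be bound on `127.0.0.1`.

```python
import os
import platform
import socket
import tempfile

# Fixed ports from the requirements table (head, dashboard, metrics, client server).
# The worker range 10002-19999 is not scanned here to keep the check fast.
RAY_FIXED_PORTS = (6379, 8265, 8080, 10001)


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if (host, port) can be bound, i.e. no other process holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def dir_is_writable(path: str) -> bool:
    """Return True if a temporary file can be created inside `path`.

    Note: this creates `path` if it does not exist yet.
    """
    try:
        os.makedirs(path, exist_ok=True)
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True


def preflight(temp_dir: str = "/tmp/ray") -> dict:
    """Collect simple pass/fail checks for a prospective Ray node."""
    return {
        "linux": platform.system() == "Linux",
        "temp_dir_writable": dir_is_writable(temp_dir),
        "busy_ports": [p for p in RAY_FIXED_PORTS if not port_is_free(p)],
    }


if __name__ == "__main__":
    print(preflight())
```

A non-empty `busy_ports` list on a node where Ray is not yet running suggests a port conflict. On a node where Ray is already up, the same ports are expected to be busy.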
Dependencies
Python Packages
- `ray[default,data]` >= 2.50
- `cosmos-xenna` == 0.1.2
Credentials
The following environment variables configure the Ray cluster:
- `RAY_ADDRESS`: Address of the Ray head node (e.g., `ray://head-node:10001`). If set, NeMo Curator connects to an existing cluster instead of starting a local one.
- `CURATOR_IGNORE_RAY_HEAD_NODE`: Optional flag to ignore Ray head node scheduling constraints.
- `RAPIDS_NO_INITIALIZE`: Set to `1` automatically by NeMo Curator to prevent premature RAPIDS initialization.
- `RAY_MAX_LIMIT_FROM_API_SERVER`: Set automatically at import time to the Cosmos-Xenna `API_LIMIT` value.
- `RAY_MAX_LIMIT_FROM_DATA_SOURCE`: Set automatically at import time to the Cosmos-Xenna `API_LIMIT` value.
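The role of `RAY_ADDRESS` described above can be illustrated with a small decision helper. This is a sketch of the documented behavior, not NeMo Curator's actual implementation: `resolve_ray_target` is a hypothetical name, and treating an empty or whitespace-only value as unset is an assumption made for this example.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_ray_target(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Return ("existing", address) if RAY_ADDRESS is set, else ("local", None)."""
    env = os.environ if env is None else env
    # Assumption: an empty or whitespace-only value is treated as "not set".
    address = env.get("RAY_ADDRESS", "").strip()
    if address:
        return "existing", address
    return "local", None


if __name__ == "__main__":
    mode, address = resolve_ray_target()
    if mode == "existing":
        print(f"Would connect to the existing cluster at {address}")
    else:
        print("Would start a local single-node Ray cluster")
```

Passing a plain dictionary instead of `os.environ` makes the behavior easy to check without modifying the real environment.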
Quick Install
# Ray is included in the base nemo-curator install
pip install nemo-curator
# To start a local Ray cluster manually:
ray start --head --port=6379 --dashboard-host=0.0.0.0
Code Evidence
Default port configuration from `nemo_curator/core/constants.py:15-26`:
DEFAULT_RAY_PORT = 6379
DEFAULT_RAY_DASHBOARD_PORT = 8265
DEFAULT_RAY_TEMP_DIR = "/tmp/ray"
DEFAULT_RAY_METRICS_PORT = 8080
DEFAULT_RAY_DASHBOARD_HOST = "127.0.0.1"
DEFAULT_RAY_CLIENT_SERVER_PORT = 10001
DEFAULT_RAY_AUTOSCALER_METRIC_PORT = 44217
DEFAULT_RAY_DASHBOARD_METRIC_PORT = 44227
# We cannot use a free port between 10000 and 19999 as it is used by Ray.
DEFAULT_RAY_MIN_WORKER_PORT = 10002
DEFAULT_RAY_MAX_WORKER_PORT = 19999
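To show how these defaults relate to a manually started head node, the sketch below assembles a `ray start` command line from the same values. The flags are standard `ray start` options; the mapping of each constant to a flag (for example `DEFAULT_RAY_METRICS_PORT` to `--metrics-export-port`) is an assumption for illustration, and `build_ray_start_command` is a hypothetical helper rather than a NeMo Curator function.

```python
import shlex
from typing import List

# Values copied from the constants shown above.
DEFAULT_RAY_PORT = 6379
DEFAULT_RAY_DASHBOARD_PORT = 8265
DEFAULT_RAY_TEMP_DIR = "/tmp/ray"
DEFAULT_RAY_METRICS_PORT = 8080
DEFAULT_RAY_DASHBOARD_HOST = "127.0.0.1"
DEFAULT_RAY_CLIENT_SERVER_PORT = 10001
DEFAULT_RAY_MIN_WORKER_PORT = 10002
DEFAULT_RAY_MAX_WORKER_PORT = 19999


def build_ray_start_command(
    dashboard_host: str = DEFAULT_RAY_DASHBOARD_HOST,
) -> List[str]:
    """Build an argv list for starting a Ray head node with the defaults above."""
    return [
        "ray", "start", "--head",
        f"--port={DEFAULT_RAY_PORT}",
        f"--dashboard-host={dashboard_host}",
        f"--dashboard-port={DEFAULT_RAY_DASHBOARD_PORT}",
        # Assumption: the metrics port corresponds to Ray's metrics export port.
        f"--metrics-export-port={DEFAULT_RAY_METRICS_PORT}",
        f"--ray-client-server-port={DEFAULT_RAY_CLIENT_SERVER_PORT}",
        f"--min-worker-port={DEFAULT_RAY_MIN_WORKER_PORT}",
        f"--max-worker-port={DEFAULT_RAY_MAX_WORKER_PORT}",
        f"--temp-dir={DEFAULT_RAY_TEMP_DIR}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_ray_start_command()))
```

Returning an argv list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting concerns.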
RAY_ADDRESS detection from `nemo_curator/core/client.py:119`:
# Check if Ray is already running via RAY_ADDRESS env var
ray_address = os.environ.get("RAY_ADDRESS")
Automatic env var configuration from `nemo_curator/__init__.py:34-38`:
from cosmos_xenna.ray_utils.cluster import API_LIMIT
os.environ["RAY_MAX_LIMIT_FROM_API_SERVER"] = str(API_LIMIT)
os.environ["RAY_MAX_LIMIT_FROM_DATA_SOURCE"] = str(API_LIMIT)
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ConnectionRefusedError` on Ray client connect | Ray head node not running or not reachable | Start Ray with `ray start --head`, or point `RAY_ADDRESS` at a running head node |
| `RAY_ADDRESS already set in environment` (warning) | Conflicting Ray address configuration | Clear `RAY_ADDRESS` or ensure it points to the correct cluster |
| Port conflicts on 6379 | Another service using the Ray default port | Change Ray port with `--port` flag |
| Ray object store OOM | Insufficient shared memory | Increase `/dev/shm` size or set `--object-store-memory` |
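For the object store OOM case, it can help to inspect how much shared memory is available before adjusting `/dev/shm` or `--object-store-memory`. The sketch below uses only the standard library; `shared_memory_report` and `has_enough_shared_memory` are hypothetical helper names, and the threshold passed in is the caller's choice, not a value prescribed by Ray or NeMo Curator.

```python
import shutil

GIB = 1024 ** 3


def shared_memory_report(path: str = "/dev/shm") -> dict:
    """Return total and free space, in GiB, of the filesystem mounted at `path`."""
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "total_gib": round(usage.total / GIB, 2),
        "free_gib": round(usage.free / GIB, 2),
    }


def has_enough_shared_memory(required_gib: float, path: str = "/dev/shm") -> bool:
    """Return True if at least `required_gib` GiB are free at `path`."""
    return shutil.disk_usage(path).free >= required_gib * GIB


if __name__ == "__main__":
    print(shared_memory_report())
```

In containers, `/dev/shm` is often much smaller than host RAM, so a low `total_gib` here is a common explanation for the error.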
Compatibility Notes
- Port range 10000-19999: Used by Ray. With the defaults, 10001 is the Ray client server port and 10002-19999 are worker ports. Do not bind other services to ports in this range.
- Single-node mode: NeMo Curator auto-starts a local Ray cluster if no `RAY_ADDRESS` is set.
- Multi-node: All nodes must have the same NeMo Curator version and compatible RAPIDS/CUDA stack.
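- Dashboard: Accessible at `http://localhost:8265` by default. The default host comes from the `DEFAULT_RAY_DASHBOARD_HOST` constant (`127.0.0.1`), so the dashboard is reachable only from the local machine. When starting Ray manually, pass `--dashboard-host` (as in Quick Install) to bind a different host.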