

Environment:NVIDIA NeMo Curator Python Linux Base

From Leeroopedia
Knowledge Sources
Domains Infrastructure, Data_Curation
Last Updated 2026-02-14 16:45 GMT

Overview

Linux-only Python 3.10-3.12 environment with core dependencies for running NeMo Curator data curation pipelines.

Description

NeMo Curator runs only on Linux and raises a `ValueError` at import time on any other platform. The core runtime requires Python 3.10 through 3.12 and depends on Ray for distributed execution, PyTorch, HuggingFace Transformers, and the Cosmos-Xenna framework. This is the minimum environment needed to import `nemo_curator` and run CPU-based pipeline stages.

Usage

This environment is the mandatory base for all NeMo Curator operations. Every workflow (text, image, video, audio curation) requires this environment. GPU-accelerated features require additional environments layered on top of this base.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux only | `sys.platform == "linux"` enforced at import |
| Python | >= 3.10, < 3.13 | Python 3.10, 3.11, 3.12 supported |
| Architecture | x86_64 recommended | Some optional deps (vLLM, PyNvVideoCodec) are x86_64-only |
| Disk | 10GB+ SSD | For package installation and temp files |
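The requirements above can be checked before attempting an import. The following is a minimal sketch using only the standard library; the function name `check_base_environment` and its return shape are illustrative assumptions, not part of NeMo Curator. It treats the OS and Python bounds as hard errors and the architecture as a warning, since x86_64 is only recommended.

```python
import platform
import sys


def check_base_environment(platform_name=None, version=None, machine=None):
    """Return (errors, warnings) for the base requirements listed above.

    Hypothetical helper: arguments default to the running interpreter and
    exist only so the check can be exercised with other values.
    """
    platform_name = sys.platform if platform_name is None else platform_name
    version = tuple(sys.version_info[:2]) if version is None else tuple(version[:2])
    machine = platform.machine() if machine is None else machine

    errors, warnings = [], []
    # Same condition the package enforces at import time.
    if platform_name != "linux":
        errors.append(f"unsupported OS '{platform_name}': Linux is required")
    # Supported range is >= 3.10 and < 3.13.
    if not ((3, 10) <= version < (3, 13)):
        found = ".".join(str(part) for part in version)
        errors.append(f"unsupported Python {found}: need >= 3.10, < 3.13")
    # x86_64 is recommended, not required, so this is only a warning.
    if machine != "x86_64":
        warnings.append(f"architecture '{machine}': some optional deps are x86_64-only")
    return errors, warnings


if __name__ == "__main__":
    found_errors, found_warnings = check_base_environment()
    for line in found_errors:
        print("ERROR:", line)
    for line in found_warnings:
        print("WARNING:", line)
```

Running the check first gives a readable message instead of a traceback from the import itself.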

Dependencies

System Packages

  • `setuptools` >= 61.0 (build system)

Python Packages (Core)

  • `absl-py` >= 2.0.0, < 3.0.0
  • `comment_parser`
  • `cosmos-xenna` == 0.1.2
  • `fsspec`
  • `hydra-core`
  • `jieba` == 0.42.1
  • `loguru`
  • `mecab-python3`
  • `omegaconf`
  • `openai` >= 1.0.0
  • `pandas` >= 2.1.0
  • `pyarrow`
  • `ray[default,data]` >= 2.50
  • `torch`
  • `transformers`

Credentials

No credentials are required for the base environment. Optional API keys for specific features:

  • `NVIDIA_API_KEY`: Required for NVIDIA NIM-based synthetic data generation and benchmarking.
  • `HF_TOKEN`: Required for accessing gated HuggingFace models (e.g., AEGIS safety classifier).
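Because both keys are optional, a pipeline may want to report which ones are unset before it reaches a stage that needs them. The sketch below assumes the keys are supplied as environment variables with the names listed above; the helper `missing_optional_keys` is hypothetical and not part of NeMo Curator.

```python
import os

# Key names and purposes are taken from the list above.
OPTIONAL_KEYS = {
    "NVIDIA_API_KEY": "NVIDIA NIM-based synthetic data generation and benchmarking",
    "HF_TOKEN": "gated HuggingFace models",
}


def missing_optional_keys(environ=os.environ):
    """Return {key: purpose} for every optional key that is unset or blank."""
    return {
        name: purpose
        for name, purpose in OPTIONAL_KEYS.items()
        if not environ.get(name, "").strip()
    }


if __name__ == "__main__":
    for name, purpose in missing_optional_keys().items():
        print(f"{name} is not set; needed only for {purpose}")
```

The function reports names only and never prints a key's value, so its output is safe to include in logs.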

Quick Install

# Install base NeMo Curator
pip install nemo-curator

Code Evidence

Linux-only enforcement from `nemo_curator/__init__.py:41-48`:

if sys.platform != "linux":
    _msg = (
        "NeMo-Curator currently only supports Linux systems, "
        f"while the current machine has a {sys.platform} system. \n"
        "For more information on installation and system requirements, see "
        "https://docs.nvidia.com/nemo/curator/latest/admin/installation.html"
    )
    raise ValueError(_msg)

RAPIDS initialization suppression from `nemo_curator/__init__.py:32`:

os.environ["RAPIDS_NO_INITIALIZE"] = "1"

Ray API limit configuration from `nemo_curator/__init__.py:34-38`:

from cosmos_xenna.ray_utils.cluster import API_LIMIT
os.environ["RAY_MAX_LIMIT_FROM_API_SERVER"] = str(API_LIMIT)
os.environ["RAY_MAX_LIMIT_FROM_DATA_SOURCE"] = str(API_LIMIT)

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ValueError: NeMo-Curator currently only supports Linux systems` | Running on macOS or Windows | Use Linux, or WSL2 on Windows |
| `ModuleNotFoundError: No module named 'cosmos_xenna'` | Missing cosmos-xenna dependency | `pip install cosmos-xenna==0.1.2` |
| `ImportError: ray` | Ray not installed | `pip install "ray[default,data]>=2.50"` (quoted so the shell does not treat `>` as a redirect) |
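The errors above can be turned into a short diagnostic that attempts the import and maps the failure to the matching fix. This is a sketch only: `diagnose_import` and `INSTALL_HINTS` are hypothetical names. It assumes a missing dependency surfaces as an `ImportError` whose `name` attribute holds the module name, and that any `ValueError` raised during import comes from the platform check.

```python
import importlib

# Hints mirror the Solution column of the table above.
INSTALL_HINTS = {
    "cosmos_xenna": "pip install cosmos-xenna==0.1.2",
    "ray": 'pip install "ray[default,data]>=2.50"',
}


def diagnose_import(importer=importlib.import_module):
    """Try to import nemo_curator; return 'ok' or a one-line hint."""
    try:
        importer("nemo_curator")
    except ValueError as exc:
        # Assumed to be the Linux-only check failing on another platform.
        return f"platform problem: {exc}"
    except ImportError as exc:
        # ModuleNotFoundError subclasses ImportError; .name is the module.
        missing = (exc.name or "").split(".")[0]
        hint = INSTALL_HINTS.get(missing, "reinstall with: pip install nemo-curator")
        return f"missing module {missing or 'unknown'}: {hint}"
    return "ok"


if __name__ == "__main__":
    print(diagnose_import())
```

The `importer` parameter exists only so the function can be exercised without NeMo Curator installed; in normal use it is called with no arguments.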

Compatibility Notes

  • macOS: Not supported. NeMo Curator raises `ValueError` at import time on non-Linux platforms.
  • Windows: Not supported directly. Use WSL2 with a Linux distribution.
  • Python 3.13+: Not supported. Upper bound is Python 3.12.
  • ARM (aarch64): Core package works, but some optional dependencies (vLLM, flash-attn, PyNvVideoCodec) are x86_64-only.
