Environment:NVIDIA NeMo Curator NVIDIA DALI

From Leeroopedia
Knowledge Sources
Domains Infrastructure, Image_Processing, GPU_Computing
Last Updated 2026-02-14 16:45 GMT

Overview

This environment provides NVIDIA DALI (Data Loading Library), built for CUDA 12, for GPU-accelerated image decoding and preprocessing in the NeMo Curator image curation pipeline.

Description

NVIDIA DALI provides hardware-accelerated data loading and augmentation. In NeMo Curator, the `ImageReaderStage` uses it to decode images from WebDataset tar shards. DALI supports both GPU decoding (via nvJPEG) and CPU decoding; the stage selects between them based on CUDA availability. With a GPU, the stage uses DALI's `mixed` device: encoded bytes are read on the CPU and decoded on the GPU, so the decoded images are already in GPU memory and the much larger uncompressed pixel data never has to be copied from host to device.

Usage

This environment is required for the Image Curation Pipeline. Specifically, the `ImageReaderStage` imports `nvidia.dali` at runtime and raises a `RuntimeError` if it is not installed. Image embedding and filtering stages that consume the reader's output therefore depend on this environment as well.

System Requirements

| Category | Requirement | Notes |
|----------|-------------|-------|
| OS | Linux | Required by DALI |
| Hardware | NVIDIA GPU (optional) | GPU enables mixed-device decoding; CPU-only DALI also works |
| CUDA | CUDA 12.x | Required for `nvidia-dali-cuda120` package |

Dependencies

Python Packages

  • `nvidia-dali-cuda120` (from `image_cuda12` optional dependency group)
  • `torchvision` (from `image_cpu` group, used alongside DALI)

Credentials

No credentials required.

Quick Install

# Install NeMo Curator with image curation support (includes DALI)
pip install "nemo-curator[image_cuda12]"
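
To confirm that the install actually placed a DALI distribution in the active environment, a check along the following lines can help. It is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the default distribution name is the one listed under Dependencies.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "nvidia-dali-cuda120") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper, not part of NeMo Curator.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("nvidia-dali-cuda120 is not installed in this environment")
    else:
        print(f"nvidia-dali-cuda120 {version} is installed")
```

This checks package metadata only, so it does not load DALI or touch the GPU.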

Code Evidence

DALI import with RuntimeError from `nemo_curator/stages/image/io/image_reader.py:61-68`:

try:
    from nvidia.dali import fn, pipeline_def, types
except ModuleNotFoundError as exc:
    msg = (
        "nvidia.dali is required to use ImageReaderStage. "
        "Install a compatible DALI build (GPU or CPU) for your environment."
    )
    raise RuntimeError(msg) from exc
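
Because the excerpt uses `raise ... from exc`, the original `ModuleNotFoundError` stays attached to the `RuntimeError` as its `__cause__`. The sketch below reproduces that pattern with a generic module name so it runs without DALI; `require_module` and `root_cause` are hypothetical helpers written for illustration, not NeMo Curator functions.

```python
import importlib
from types import ModuleType


def require_module(name: str, hint: str) -> ModuleType:
    """Import a module, converting a missing module into a RuntimeError."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as exc:
        # "from exc" keeps the original error reachable as __cause__.
        raise RuntimeError(f"{name} is required. {hint}") from exc


def root_cause(err: BaseException) -> BaseException:
    """Follow the explicit __cause__ chain to the first exception raised."""
    while err.__cause__ is not None:
        err = err.__cause__
    return err


if __name__ == "__main__":
    try:
        require_module("no_such_module_xyz", "Install it first.")
    except RuntimeError as err:
        print(type(err).__name__, "caused by", type(root_cause(err)).__name__)
```

In a traceback, the same chain shows up as the `ModuleNotFoundError` followed by "The above exception was the direct cause of the following exception" and then the `RuntimeError`.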

GPU/CPU decode selection from `nemo_curator/stages/image/io/image_reader.py:82-84`:

# Decode on GPU when available, otherwise on CPU; keep original sizes (no resize)
decode_device = "mixed" if torch.cuda.is_available() else "cpu"
return fn.decoders.image(img_raw, device=decode_device, output_type=types.RGB)
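
The selection logic in the excerpt can be isolated and tried without DALI installed. The following is a minimal sketch: `cuda_available` and `choose_decode_device` are hypothetical helper names, and the fallback to `False` when `torch` is missing is an assumption made for this example (the excerpt itself calls `torch.cuda.is_available()` directly).

```python
import importlib.util


def cuda_available() -> bool:
    """Best-effort CUDA check; returns False when torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the sketch also runs without torch

    return torch.cuda.is_available()


def choose_decode_device(has_cuda: bool) -> str:
    """Mirror the excerpt: 'mixed' (CPU input, GPU output) or 'cpu'."""
    return "mixed" if has_cuda else "cpu"


if __name__ == "__main__":
    print("decode device:", choose_decode_device(cuda_available()))
```

Keeping the decision in a pure function that takes a boolean makes both branches easy to exercise on a machine without a GPU.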

GPU resource allocation from `nemo_curator/stages/image/io/image_reader.py:44-52`:

if torch.cuda.is_available():
    logger.info("ImageReaderStage using DALI GPU decode.")
else:
    logger.info("CUDA not available; ImageReaderStage using DALI CPU decode.")

if torch.cuda.is_available():
    self.resources = Resources(gpus=self.num_gpus_per_worker)
else:
    self.resources = Resources()

Common Errors

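| Error Message | Cause | Solution |
|---------------|-------|----------|
| `RuntimeError: nvidia.dali is required to use ImageReaderStage` | DALI not installed | `pip install nvidia-dali-cuda120` |
| `ModuleNotFoundError: No module named 'nvidia.dali'` | No DALI package is importable in the active environment; inside `ImageReaderStage` this appears as the chained cause of the `RuntimeError` above | Install the DALI package that matches your CUDA version (`nvidia-dali-cuda120` for CUDA 12) |
| DALI pipeline errors with GPU | GPU device not accessible | Check `nvidia-smi` and CUDA driver installation |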
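
For the GPU row, the `nvidia-smi` check can be scripted. This is a minimal sketch, not a NeMo Curator utility: `gpu_visible` is a hypothetical helper, and it assumes that a successful `nvidia-smi -L` listing at least one "GPU" line means the driver can see a device.

```python
import shutil
import subprocess


def gpu_visible(exe_name: str = "nvidia-smi", timeout: float = 10.0) -> bool:
    """Return True if `<exe_name> -L` runs successfully and lists a GPU."""
    exe = shutil.which(exe_name)
    if exe is None:
        # Tool not on PATH: driver utilities are missing or not exposed.
        return False
    try:
        result = subprocess.run(
            [exe, "-L"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU visible to driver:", gpu_visible())
```

A `False` result here points at the driver or container setup rather than at DALI itself.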

Compatibility Notes

  • CPU-only mode: DALI supports CPU-only decoding as a fallback. The `ImageReaderStage` detects CUDA availability and selects the `mixed` or `cpu` decode device accordingly.
  • CUDA version: The DALI package must match your CUDA toolkit version. Use `nvidia-dali-cuda120` for CUDA 12.
  • WebDataset format: DALI reads images from WebDataset tar shards. Images must be in JPG format.
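
Given the JPG requirement above, it can be useful to scan a shard for image files with other extensions before running the pipeline. The sketch below uses only the standard library. `non_jpg_images`, `build_demo_shard`, and the `IMAGE_EXTS` set are hypothetical names for this example, and treating only a `.jpg` suffix (compared case-insensitively) as acceptable is an assumption, since the page does not say whether `.jpeg` members are read.

```python
import io
import tarfile
from pathlib import PurePosixPath
from typing import List

# Extensions treated as "image-like" for this check (an assumption).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".gif", ".tif", ".tiff"}


def non_jpg_images(tar: tarfile.TarFile) -> List[str]:
    """Return names of image-like members whose suffix is not .jpg."""
    offenders = []
    for member in tar.getmembers():
        if not member.isfile():
            continue
        ext = PurePosixPath(member.name).suffix.lower()
        if ext in IMAGE_EXTS and ext != ".jpg":
            offenders.append(member.name)
    return offenders


def build_demo_shard() -> bytes:
    """Build a tiny in-memory tar with placeholder payloads for the demo."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("000001.jpg", "000001.json", "000002.png"):
            payload = b"placeholder"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    with tarfile.open(fileobj=io.BytesIO(build_demo_shard()), mode="r") as tar:
        print(non_jpg_images(tar))
```

For a real shard, open it with `tarfile.open(path)` and pass the result to `non_jpg_images`; any names returned would need converting to JPG under the assumption stated above.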
