Environment:NVIDIA NeMo Curator NVIDIA DALI

Knowledge Sources	NVIDIA NeMo Curator NVIDIA DALI
Domains	Infrastructure, Image_Processing, GPU_Computing
Last Updated	2026-02-14 16:45 GMT

Overview

This environment provides NVIDIA DALI (Data Loading Library) built for CUDA 12, which the image curation pipeline uses for GPU-accelerated image decoding and preprocessing.

Description

NVIDIA DALI provides hardware-accelerated data loading and augmentation. In NeMo Curator, the `ImageReaderStage` uses it to decode images from WebDataset tar shards. DALI supports both GPU decoding (via nvJPEG) and CPU decoding; the `ImageReaderStage` chooses between them based on the result of `torch.cuda.is_available()`. With a GPU, the stage uses DALI's `mixed` decoder: encoded bytes are read on the CPU and decoded on the GPU, so the decoded images land in GPU memory without a separate host-to-device copy.

Usage

This environment is required for the Image Curation Pipeline. Specifically, the `ImageReaderStage` imports `nvidia.dali` at runtime and raises a `RuntimeError` if it is not installed. DALI is also used by image embedding and filtering stages that depend on the reader.

System Requirements

Category	Requirement	Notes
OS	Linux	Required by DALI
Hardware	NVIDIA GPU (optional)	GPU enables mixed-device decoding; CPU-only DALI also works
CUDA	CUDA 12.x	Required for `nvidia-dali-cuda120` package

Dependencies

Python Packages

`nvidia-dali-cuda120` (from `image_cuda12` optional dependency group)
`torchvision` (from `image_cpu` group, used alongside DALI)

Credentials

No credentials required.

Quick Install

# Install NeMo Curator with image curation support (includes DALI)
pip install "nemo-curator[image_cuda12]"
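
After installing, it can help to confirm that the DALI wheel is actually present in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `nvidia-dali-cuda120` is taken from the dependency list above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    dali_version = installed_version("nvidia-dali-cuda120")
    if dali_version is None:
        print("nvidia-dali-cuda120 is not installed in this environment")
    else:
        print(f"nvidia-dali-cuda120 {dali_version} is installed")
```

A `None` result here suggests that `ImageReaderStage` would raise its `RuntimeError` at import time.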

Code Evidence

DALI import with RuntimeError from `nemo_curator/stages/image/io/image_reader.py:61-68`:

try:
    from nvidia.dali import fn, pipeline_def, types
except ModuleNotFoundError as exc:
    msg = (
        "nvidia.dali is required to use ImageReaderStage. "
        "Install a compatible DALI build (GPU or CPU) for your environment."
    )
    raise RuntimeError(msg) from exc

GPU/CPU decode selection from `nemo_curator/stages/image/io/image_reader.py:82-84`:

# Decode on GPU when available, otherwise on CPU; keep original sizes (no resize)
decode_device = "mixed" if torch.cuda.is_available() else "cpu"
return fn.decoders.image(img_raw, device=decode_device, output_type=types.RGB)
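
The selection rule above can be isolated as a small pure function, which makes it easy to unit-test on machines without a GPU. This is a sketch, not NeMo Curator code: `choose_decode_device` is a hypothetical helper, and the CUDA flag is passed in as an argument rather than read from `torch.cuda.is_available()`.

```python
def choose_decode_device(cuda_available: bool) -> str:
    """Map CUDA availability to the DALI decoder device string.

    "mixed" means DALI takes encoded bytes from CPU memory and writes the
    decoded image to GPU memory; "cpu" keeps the whole decode on the host.
    """
    return "mixed" if cuda_available else "cpu"


# In the real stage the flag would come from torch.cuda.is_available().
print(choose_decode_device(True))   # mixed
print(choose_decode_device(False))  # cpu
```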

GPU resource allocation from `nemo_curator/stages/image/io/image_reader.py:44-52`:

if torch.cuda.is_available():
    logger.info("ImageReaderStage using DALI GPU decode.")
else:
    logger.info("CUDA not available; ImageReaderStage using DALI CPU decode.")

if torch.cuda.is_available():
    self.resources = Resources(gpus=self.num_gpus_per_worker)
else:
    self.resources = Resources()
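
The excerpt above checks CUDA availability twice, once for logging and once for the resource request. One way to express the same decision in a single branch is sketched below. `WorkerResources` is a hypothetical stand-in for NeMo Curator's `Resources` class (only the `gpus` field is modeled), and `plan_resources` is an illustrative helper, not part of the library.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class WorkerResources:
    """Hypothetical stand-in for NeMo Curator's Resources; only gpus is modeled."""

    gpus: float = 0.0


def plan_resources(cuda_available: bool, num_gpus_per_worker: float) -> WorkerResources:
    """Request GPUs only when CUDA is usable, logging the chosen decode path."""
    if cuda_available:
        logger.info("Using DALI GPU decode.")
        return WorkerResources(gpus=num_gpus_per_worker)
    logger.info("CUDA not available; using DALI CPU decode.")
    return WorkerResources()
```

Passing the CUDA flag in as an argument keeps the function testable on CPU-only machines.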

Common Errors

Error Message	Cause	Solution
`RuntimeError: nvidia.dali is required to use ImageReaderStage`	DALI not installed	`pip install nvidia-dali-cuda120`
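`ModuleNotFoundError: No module named 'nvidia.dali'`	DALI is missing from the active Python environment; `ImageReaderStage` re-raises this as the `RuntimeError` above	Install the DALI package that matches your CUDA version (`nvidia-dali-cuda120` for CUDA 12)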
DALI pipeline errors with GPU	GPU device not accessible	Check `nvidia-smi` and CUDA driver installation
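
For the GPU accessibility case, the `nvidia-smi` check can be scripted. The sketch below is an assumption-laden convenience, not part of NeMo Curator: `list_visible_gpus` is a hypothetical helper that runs `nvidia-smi -L` (which prints one line per GPU) and returns an empty list if the tool is missing or fails.

```python
import shutil
import subprocess
from typing import List


def list_visible_gpus(executable: str = "nvidia-smi") -> List[str]:
    """Return one line per GPU reported by `nvidia-smi -L`, or [] on any failure."""
    path = shutil.which(executable)
    if path is None:
        return []  # driver tools are not on PATH
    try:
        result = subprocess.run(
            [path, "-L"], capture_output=True, text=True, timeout=30, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []  # tool is present but could not talk to the driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_visible_gpus()
    print(f"{len(gpus)} GPU(s) visible to the driver")
```

An empty list on a machine that should have a GPU points to a driver or container-runtime problem rather than to DALI itself.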

Compatibility Notes

CPU-only mode: DALI supports CPU-only decoding as a fallback. The `ImageReaderStage` checks `torch.cuda.is_available()` and selects the `mixed` or `cpu` decoder accordingly.
CUDA version: The DALI package must match your CUDA toolkit version. Use `nvidia-dali-cuda120` for CUDA 12.
WebDataset format: DALI reads images from WebDataset tar shards. Images must be in JPG format.
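
Because of the JPG requirement, it can be useful to scan a shard before running the pipeline. The sketch below relies on the WebDataset convention that files sharing a name up to the first dot form one sample, and reports samples that have no `jpg` component. The function name `samples_missing_jpg` is hypothetical, and the check looks only at file extensions, not at the image bytes.

```python
import io
import tarfile
from typing import BinaryIO, Dict, List, Set


def samples_missing_jpg(shard: BinaryIO) -> List[str]:
    """Return sample keys in a WebDataset tar shard that lack a .jpg component."""
    components: Dict[str, Set[str]] = {}
    with tarfile.open(fileobj=shard, mode="r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            dirname, _, filename = member.name.rpartition("/")
            # WebDataset convention: key is the name up to the first dot,
            # and everything after that dot is the component extension.
            stem, _, ext = filename.partition(".")
            key = f"{dirname}/{stem}" if dirname else stem
            components.setdefault(key, set()).add(ext.lower())
    return sorted(key for key, exts in components.items() if "jpg" not in exts)
```

To use it, open a shard in binary mode and pass the file object, for example `with open(path, "rb") as f: print(samples_missing_jpg(f))`, where `path` points at one of your tar shards.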
