Environment:NVIDIA NeMo Curator NVIDIA DALI
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Image_Processing, GPU_Computing |
| Last Updated | 2026-02-14 16:45 GMT |
Overview
NVIDIA DALI (Data Loading Library), built for CUDA 12, provides GPU-accelerated image decoding and preprocessing for the image curation pipeline.
Description
NVIDIA DALI provides hardware-accelerated data loading and augmentation. In NeMo Curator, the `ImageReaderStage` uses it to decode images from WebDataset tar shards. DALI supports both GPU decoding (via nvJPEG) and CPU decoding; the stage chooses between them based on whether CUDA is available. With a GPU, the stage uses DALI's `"mixed"` decode device: the encoded bytes are read on the host and decoded on the GPU, so decoded images are produced directly in GPU memory and no separate host-to-device copy of the pixel data is needed.
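The read-then-decode flow described above can be sketched as a small DALI pipeline. This is an illustrative sketch, not the `ImageReaderStage` implementation: the function name `build_decode_pipeline`, its parameters and defaults, and the choice of reader arguments are assumptions. It uses DALI's `fn.readers.webdataset` to pull `.jpg` entries from tar shards and `fn.decoders.image` with either the `"mixed"` or `"cpu"` device. DALI is imported inside the function so the module still loads where DALI is absent.

```python
# Sketch only: names and defaults below are hypothetical, not NeMo Curator's code.


def build_decode_pipeline(tar_paths, batch_size=8, num_threads=4, use_gpu=True):
    """Return a DALI Pipeline that reads .jpg entries from WebDataset tars and decodes them to RGB."""
    # Lazy import: nvidia.dali is only needed when a pipeline is actually built.
    from nvidia.dali import fn, pipeline_def, types

    # "mixed": encoded bytes on the host, decoding on the GPU (nvJPEG).
    # "cpu": decoding entirely on the host.
    decode_device = "mixed" if use_gpu else "cpu"
    # device_id=None tells DALI not to use a GPU at all (CPU-only pipeline).
    device_id = 0 if use_gpu else None

    @pipeline_def(batch_size=batch_size, num_threads=num_threads, device_id=device_id)
    def _pipeline():
        # A single requested extension gives a single reader output: the raw JPEG bytes.
        img_raw = fn.readers.webdataset(paths=tar_paths, ext=["jpg"])
        # No resize: images keep their original sizes.
        return fn.decoders.image(img_raw, device=decode_device, output_type=types.RGB)

    return _pipeline()
```

Usage would look like `pipe = build_decode_pipeline(["shard-000000.tar"]); pipe.build(); (images,) = pipe.run()`, where `images` is a batch of decoded RGB images (in GPU memory when `use_gpu=True`). Without `index_paths`, DALI infers the shard index itself, which can be slow for large datasets.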
Usage
This environment is required for the Image Curation Pipeline. The `ImageReaderStage` imports `nvidia.dali` at runtime and raises a `RuntimeError` if the module is not installed. Image embedding and filtering stages that consume the reader's output therefore depend on DALI as well.
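Because the failure only surfaces when the stage runs, a preflight check can catch a missing DALI install earlier. The sketch below is hypothetical (the helper name `dali_available` is not part of NeMo Curator) and uses only the standard library: `importlib.util.find_spec` locates the module without importing DALI itself.

```python
import importlib.util


def dali_available() -> bool:
    """Return True if the `nvidia.dali` module can be located in this interpreter."""
    try:
        # For a dotted name, find_spec imports the parent package ("nvidia") first;
        # that raises ModuleNotFoundError (an ImportError) if the parent is missing.
        return importlib.util.find_spec("nvidia.dali") is not None
    except ImportError:
        return False


if not dali_available():
    print('nvidia.dali not found; install it first, e.g. pip install "nemo-curator[image_cuda12]"')
```

This only checks that the package is present. A full `import nvidia.dali` is still the definitive test, since the import can also fail if DALI's CUDA libraries cannot be loaded.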
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Required by DALI |
| Hardware | NVIDIA GPU (optional) | GPU enables mixed-device decoding; CPU-only DALI also works |
| CUDA | CUDA 12.x | Required for `nvidia-dali-cuda120` package |
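DALI wheels are published per CUDA major version, so the package name follows from the CUDA version. A minimal lookup, assuming only the CUDA 11 and CUDA 12 wheels (`nvidia-dali-cuda110`, `nvidia-dali-cuda120`) are of interest; the function name is hypothetical:

```python
def dali_package_for_cuda(cuda_version: str) -> str:
    """Map a CUDA version string such as "12.4" to the matching DALI wheel name."""
    major = int(cuda_version.split(".")[0])
    # Only CUDA 11 and 12 are covered here; extend the table for other versions.
    packages = {11: "nvidia-dali-cuda110", 12: "nvidia-dali-cuda120"}
    if major not in packages:
        raise ValueError(f"no DALI package mapping for CUDA {cuda_version}")
    return packages[major]


print(dali_package_for_cuda("12.4"))
```

One possible source for the version string is `torch.version.cuda` (for example `"12.1"`), which is `None` on CPU-only PyTorch builds.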
Dependencies
Python Packages
- `nvidia-dali-cuda120` (from `image_cuda12` optional dependency group)
- `torchvision` (from `image_cpu` group, used alongside DALI)
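To confirm which of the packages listed above are present, and at which versions, the standard-library `importlib.metadata` API is enough. A minimal sketch; the helper name `installed_versions` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the dependency list above.
PACKAGES = ("nvidia-dali-cuda120", "torchvision")


def installed_versions(packages=PACKAGES) -> dict:
    """Return {distribution name: version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```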
Credentials
No credentials required.
Quick Install
```bash
# Install NeMo Curator with image curation support (includes DALI)
pip install "nemo-curator[image_cuda12]"
```
Code Evidence
DALI import with RuntimeError from `nemo_curator/stages/image/io/image_reader.py:61-68`:
```python
try:
    from nvidia.dali import fn, pipeline_def, types
except ModuleNotFoundError as exc:
    msg = (
        "nvidia.dali is required to use ImageReaderStage. "
        "Install a compatible DALI build (GPU or CPU) for your environment."
    )
    raise RuntimeError(msg) from exc
```
GPU/CPU decode selection from `nemo_curator/stages/image/io/image_reader.py:82-84`:
```python
# Excerpt from inside a function body (note the `return`).
# Decode on GPU when available, otherwise on CPU; keep original sizes (no resize)
decode_device = "mixed" if torch.cuda.is_available() else "cpu"
return fn.decoders.image(img_raw, device=decode_device, output_type=types.RGB)
```
GPU resource allocation from `nemo_curator/stages/image/io/image_reader.py:44-52`:
```python
# Excerpt from a method body; `self` is the ImageReaderStage instance.
if torch.cuda.is_available():
    logger.info("ImageReaderStage using DALI GPU decode.")
else:
    logger.info("CUDA not available; ImageReaderStage using DALI CPU decode.")
if torch.cuda.is_available():
    self.resources = Resources(gpus=self.num_gpus_per_worker)
else:
    self.resources = Resources()
```
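The two excerpts make the same decision twice: whether CUDA is available determines both the decode device and the GPU request. As a sketch, that decision can be written as one pure function, which is testable without a GPU. `DecodePlan` and `plan_decode` are hypothetical names; NeMo Curator uses its own `Resources` class, and `gpus=0` here simply models "no GPU requested".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodePlan:
    # Hypothetical container standing in for the decode device plus Resources.
    decode_device: str  # "mixed" (GPU decode via nvJPEG) or "cpu"
    gpus: float  # GPUs requested per worker; 0 models "no GPU requested"


def plan_decode(cuda_available: bool, num_gpus_per_worker: float) -> DecodePlan:
    """Combine the decode-device choice and the GPU request into one decision."""
    if cuda_available:
        return DecodePlan(decode_device="mixed", gpus=num_gpus_per_worker)
    return DecodePlan(decode_device="cpu", gpus=0)


print(plan_decode(False, 1.0))
```

In a real stage the first argument would come from `torch.cuda.is_available()`; passing it in keeps the CPU fallback path easy to exercise on machines without a GPU.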
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
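| `RuntimeError: nvidia.dali is required to use ImageReaderStage` | DALI not installed | Install the image extra (`pip install "nemo-curator[image_cuda12]"`) or the DALI wheel directly (`pip install nvidia-dali-cuda120`) |
| `ModuleNotFoundError: No module named 'nvidia.dali'` | DALI is not installed in the active Python environment; `ImageReaderStage` reports this as the chained cause of the `RuntimeError` above | Install the DALI package that matches your CUDA version (`nvidia-dali-cuda120` for CUDA 12) into the same environment |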
| DALI pipeline errors with GPU | GPU device not accessible | Check `nvidia-smi` and CUDA driver installation |
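The `nvidia-smi` check in the last row can be scripted. A minimal sketch using only the standard library; the helper name `list_visible_gpus` is hypothetical. It runs `nvidia-smi -L`, which prints one line per GPU, and treats a missing binary or a failing call as "no GPUs visible".

```python
import shutil
import subprocess


def list_visible_gpus() -> list:
    """Return the GPU lines printed by `nvidia-smi -L`, or [] if nvidia-smi is unusable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # NVIDIA driver utilities are not on PATH
    try:
        out = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30, check=True)
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        return []  # nvidia-smi exists but cannot communicate with the driver
    # Each GPU is reported on its own line starting with "GPU <index>:".
    return [line for line in out.stdout.splitlines() if line.startswith("GPU ")]


print(list_visible_gpus() or "no GPUs visible to nvidia-smi")
```

Note that `nvidia-smi` ignores `CUDA_VISIBLE_DEVICES`, so a GPU listed here can still be hidden from the process. If GPUs appear here but `torch.cuda.is_available()` returns `False`, the stage falls back to CPU decode, which usually points to a driver or CUDA version mismatch.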
Compatibility Notes
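- CPU-only mode: DALI supports CPU-only decoding as a fallback. The `ImageReaderStage` checks `torch.cuda.is_available()`; when no GPU is usable, it decodes on the CPU (`device="cpu"`) and requests no GPU resources.
- CUDA version: The DALI package must match the CUDA major version supported by your environment's NVIDIA driver. Use `nvidia-dali-cuda120` for CUDA 12.x.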
- WebDataset format: DALI reads images from WebDataset tar shards. Images must be in JPG format.
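Since images must be JPG, it can help to scan shards for image files stored under other extensions before running the pipeline. This is a hypothetical pre-check (`non_jpg_images` and the extension list are assumptions), using Python's `tarfile` module. It treats extension matching as case-sensitive, also an assumption, so `.JPG` entries are flagged as well.

```python
import tarfile
from pathlib import PurePosixPath

# Illustrative set of image extensions; only lowercase "jpg" passes the check.
IMAGE_EXTS = {"jpg", "jpeg", "png", "webp", "bmp", "gif", "tif", "tiff"}


def non_jpg_images(tar_path: str) -> list:
    """List shard members that look like images but are not stored with a .jpg extension."""
    flagged = []
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            ext = PurePosixPath(member.name).suffix.lstrip(".")
            # Non-image members (e.g. .txt or .json captions) are ignored.
            if ext.lower() in IMAGE_EXTS and ext != "jpg":
                flagged.append(member.name)
    return flagged
```

An empty result means every image-like member in the shard uses the `.jpg` extension.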