Workflow:NVIDIA DALI Image Preprocessing Pipeline

From Leeroopedia


Knowledge Sources
Domains Data_Loading, Image_Processing, GPU_Computing
Last Updated 2026-02-08 17:00 GMT

Overview

End-to-end process for building a GPU-accelerated image decoding and transformation pipeline using NVIDIA DALI, either as a standalone processing tool or integrated with deep learning frameworks.

Description

This workflow defines the foundational pattern for using DALI as a high-performance image preprocessing engine. It covers both the Pipeline API (graph-based, optimized for throughput in production training) and the Dynamic API (imperative, flexible for experimentation). The pipeline reads encoded image data from files or external sources, decodes them using GPU-accelerated hardware decoders (nvJPEG, nvImageCodec), applies a chain of image transformations (resize, crop, color adjustment, normalization), and outputs processed tensors that can be consumed by any downstream application or deep learning framework.

Usage

Execute this workflow when you need to build a fast image preprocessing pipeline for any purpose: training data preparation, inference preprocessing, batch image processing, or data exploration. This is the starting point for any DALI usage and serves as the foundation for more specialized workflows (classification, detection, segmentation).

Execution Steps

Step 1: Choose the Execution Mode

Select between Pipeline API mode and Dynamic API mode based on your requirements. Pipeline mode defines a static computational graph that is optimized for maximum throughput with prefetching and parallel execution. Dynamic mode calls operators directly in an imperative style for flexibility and rapid iteration.

Key considerations:

  • Pipeline mode uses @pipeline_def to declare a processing graph, then builds and runs it
  • Dynamic mode uses nvidia.dali.experimental.dynamic (ndd) to call operators directly
  • Pipeline mode is recommended for production training workloads
  • Dynamic mode is recommended for data exploration, debugging, and experimentation
  • Both modes use the same underlying GPU-accelerated operators
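The mode choice above can be sketched as follows. This is a minimal illustration, assuming DALI is installed and a CUDA device is visible; the batch size, thread count, and image directory are placeholder values. The Dynamic API lines are left as comments because the operator calls under nvidia.dali.experimental.dynamic follow the naming in this article and may differ between DALI versions.

```python
try:
    from nvidia.dali import pipeline_def
    import nvidia.dali.fn as fn
    DALI_AVAILABLE = True
except ImportError:        # lets the sketch be imported without DALI
    DALI_AVAILABLE = False

if DALI_AVAILABLE:
    # Pipeline mode: declare a static graph, then instantiate and run it.
    @pipeline_def(batch_size=8, num_threads=4, device_id=0)
    def image_pipeline(image_dir):
        jpegs, labels = fn.readers.file(file_root=image_dir)
        images = fn.decoders.image(jpegs, device="mixed")   # GPU decode
        images = fn.resize(images, resize_x=224, resize_y=224)
        return images, labels

    # Dynamic mode: the same operators, called imperatively per batch.
    # import nvidia.dali.experimental.dynamic as ndd
    # images = ndd.decoders.image(encoded_batch, device="mixed")
    # images = ndd.resize(images, resize_x=224, resize_y=224)
```

Both definitions produce the same GPU-side result; the pipeline version gains prefetching and graph-level optimization at the cost of a fixed structure.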

Step 2: Configure the Data Source

Set up the data reader or external source that feeds encoded image data into the pipeline. DALI supports multiple input methods: file system readers, external sources for programmatic feeding, and format-specific readers for TFRecord, LMDB, RecordIO, COCO, WebDataset, and NumPy files.

Key considerations:

  • fn.readers.file reads images from a directory tree, returning encoded bytes and labels
  • fn.external_source accepts data provided by the user at each iteration from Python
  • Format-specific readers (fn.readers.tfrecord, fn.readers.coco, etc.) handle structured datasets
  • Configure prefetch queue depth for optimal overlap between I/O and compute
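One way to feed data programmatically is to hand a zero-argument callable to fn.external_source. The make_batch helper below is a hypothetical stand-in that fabricates byte buffers with NumPy; a real source would return genuine encoded JPEG bytes, otherwise the downstream decoder will reject them.

```python
import numpy as np

def make_batch(batch_size=4, seed=0):
    """Return one batch of byte buffers plus labels.

    The buffers here are random bytes for illustration only; a real
    source must return actual encoded images (e.g. JPEG files read
    from disk or a message queue), or decoding will fail.
    """
    rng = np.random.default_rng(seed)
    images = [rng.integers(0, 256, size=int(rng.integers(64, 129)),
                           dtype=np.uint8)
              for _ in range(batch_size)]
    labels = [np.int64(i % 2) for i in range(batch_size)]
    return images, labels

try:
    from nvidia.dali import pipeline_def
    import nvidia.dali.fn as fn

    @pipeline_def(batch_size=4, num_threads=2, device_id=0)
    def external_pipeline():
        # The callable is invoked once per iteration to produce a batch.
        jpegs, labels = fn.external_source(source=make_batch, num_outputs=2)
        images = fn.decoders.image(jpegs, device="mixed")
        return images, labels
except ImportError:
    pass  # DALI absent: make_batch alone still shows the source contract
```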

Step 3: Decode Images on GPU

Decode encoded image bytes (JPEG, PNG, TIFF, BMP, JPEG2000, WebP) into pixel tensors using DALI's GPU-accelerated decoders. In the mixed device mode, the decoder reads encoded data from CPU memory and decodes it on the GPU via nvJPEG or nvImageCodec, using the dedicated hardware JPEG decode engine on GPUs that provide one.

Key considerations:

  • Use fn.decoders.image for standard decoding to full resolution
  • Use fn.decoders.image_random_crop for fused decode-and-crop (more efficient than separate operations)
  • Set device="mixed" for GPU-accelerated decoding
  • Specify output_type (RGB, BGR, GRAY) as needed
  • Hardware JPEG decoding provides the highest throughput for JPEG inputs
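A decoding sketch, assuming DALI is installed and a CUDA GPU is available. The crop-window bounds mirror the common ImageNet random-resized-crop ranges; they are illustrative choices, not DALI defaults.

```python
RANDOM_AREA = [0.08, 1.0]            # fraction of the source area to keep
RANDOM_ASPECT_RATIO = [0.75, 1.333]  # width/height bounds of the crop window

try:
    from nvidia.dali import pipeline_def
    import nvidia.dali.fn as fn
    import nvidia.dali.types as types
    DALI_AVAILABLE = True
except ImportError:
    DALI_AVAILABLE = False

if DALI_AVAILABLE:
    @pipeline_def(batch_size=16, num_threads=4, device_id=0)
    def decode_pipeline(image_dir):
        jpegs, labels = fn.readers.file(file_root=image_dir)
        # Full-resolution decode; "mixed" parses on CPU, decodes on GPU.
        full = fn.decoders.image(jpegs, device="mixed",
                                 output_type=types.RGB)
        # Fused decode + random crop: only the crop region is produced,
        # cheaper than decoding everything and cropping afterwards.
        crops = fn.decoders.image_random_crop(
            jpegs, device="mixed", output_type=types.RGB,
            random_area=RANDOM_AREA,
            random_aspect_ratio=RANDOM_ASPECT_RATIO)
        return full, crops, labels
```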

Step 4: Apply Image Transformations

Chain together image processing operators to transform the decoded images. Common operations include resizing, cropping, color space conversion, brightness/contrast adjustment, rotation, and geometric warping. Each operator runs on GPU for maximum throughput.

Key considerations:

  • fn.resize scales images to target dimensions with configurable interpolation
  • fn.crop extracts a region of interest; fn.crop_mirror_normalize fuses crop, flip, and normalize
  • fn.color_twist adjusts brightness, contrast, saturation, and hue
  • fn.rotate applies rotation with configurable fill values
  • fn.warp_affine applies arbitrary affine transformations
  • Operators can be chained; DALI optimizes the graph to minimize memory allocations
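The transformation chain can be sketched as one pipeline (DALI and a GPU assumed; all parameter values are illustrative). The ImageNet mean and std are scaled to the 0-255 range because crop_mirror_normalize is applied to raw pixel values.

```python
# ImageNet channel statistics, scaled to the 0-255 pixel range that
# crop_mirror_normalize sees before normalization.
IMAGENET_MEAN = [0.485 * 255, 0.456 * 255, 0.406 * 255]
IMAGENET_STD = [0.229 * 255, 0.224 * 255, 0.225 * 255]

try:
    from nvidia.dali import pipeline_def
    import nvidia.dali.fn as fn
    import nvidia.dali.types as types
    DALI_AVAILABLE = True
except ImportError:
    DALI_AVAILABLE = False

if DALI_AVAILABLE:
    @pipeline_def(batch_size=16, num_threads=4, device_id=0)
    def augment_pipeline(image_dir):
        jpegs, labels = fn.readers.file(file_root=image_dir,
                                        random_shuffle=True)
        images = fn.decoders.image(jpegs, device="mixed")
        images = fn.resize(images, resize_shorter=256,
                           interp_type=types.INTERP_LINEAR)
        images = fn.color_twist(images, brightness=1.1, saturation=0.9)
        # Fused crop + random horizontal flip + normalize + HWC->CHW.
        images = fn.crop_mirror_normalize(
            images, crop=(224, 224),
            mirror=fn.random.coin_flip(probability=0.5),
            mean=IMAGENET_MEAN, std=IMAGENET_STD,
            output_layout="CHW")
        return images, labels
```

Fusing crop, flip, normalization, and layout conversion into one operator avoids materializing three intermediate batches in GPU memory.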

Step 5: Output Processed Data

Retrieve processed image tensors from the pipeline. In Pipeline mode, call pipeline.run() or use a framework iterator. In Dynamic mode, results are returned directly as DALI tensor objects that can be converted to framework-native tensors.

Key considerations:

  • In Pipeline mode, use DALIGenericIterator (PyTorch), DALIDataset (TensorFlow), or direct pipeline.run()
  • In Dynamic mode, use torch.as_tensor() or equivalent to convert DALI tensors
  • Results reside on GPU by default; use .as_cpu() if CPU tensors are needed
  • The pipeline handles batching, prefetching, and memory management automatically
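Output retrieval in both styles might look like the sketch below (DALI and a GPU assumed). The PyTorch iterator lines are commented out so the snippet does not also require torch; the "/data/images" path there is a placeholder.

```python
try:
    from nvidia.dali import pipeline_def
    import nvidia.dali.fn as fn
    DALI_AVAILABLE = True
except ImportError:
    DALI_AVAILABLE = False

if DALI_AVAILABLE:
    @pipeline_def(batch_size=8, num_threads=2, device_id=0)
    def simple_pipeline(image_dir):
        jpegs, labels = fn.readers.file(file_root=image_dir)
        images = fn.decoders.image(jpegs, device="mixed")
        images = fn.resize(images, resize_x=224, resize_y=224)
        return images, labels

    def run_once(image_dir):
        pipe = simple_pipeline(image_dir)
        pipe.build()
        images, labels = pipe.run()     # one batch; images live on the GPU
        return images.as_cpu(), labels  # copy pixels back to host memory

    # With PyTorch, wrap the pipeline in a framework iterator instead:
    # from nvidia.dali.plugin.pytorch import DALIGenericIterator
    # loader = DALIGenericIterator(simple_pipeline("/data/images"),
    #                              ["images", "labels"])
    # for batch in loader:
    #     images = batch[0]["images"]   # torch.Tensor, already on GPU
```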

Execution Diagram

GitHub URL

Workflow Repository