Principle: NVIDIA DALI Execution Mode Selection
| Knowledge Sources | |
|---|---|
| Domains | Image_Processing, GPU_Computing, Pipeline_Architecture |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Execution mode selection determines how a DALI pipeline schedules and overlaps its data loading, preprocessing, and GPU compute stages, directly governing throughput, latency, and resource utilization.
Description
Execution mode selection is the architectural decision that controls the internal scheduling strategy of an NVIDIA DALI pipeline. DALI offers multiple execution models, each representing a different trade-off between programming simplicity, throughput, and control:
- Dynamic execution (exec_dynamic=True) is the modern default. In this mode the pipeline uses an internal task scheduler that automatically overlaps CPU and GPU work across pipeline stages. The caller invokes pipe.run() and the framework handles all prefetching, buffering, and synchronization internally. Dynamic mode also enables advanced features such as asynchronous outputs and conditional operators.
- Pipelined execution (exec_pipelined=True, exec_async=True) is the legacy default. The pipeline explicitly double- or triple-buffers between stages, with the depth controlled by prefetch_queue_depth. This mode provides deterministic memory usage at the cost of slightly more complex reasoning about buffer lifetimes.
- Simple (synchronous) execution (exec_pipelined=False, exec_async=False) runs each stage sequentially with no overlap. This is easiest to debug but yields the lowest throughput.
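The throughput difference between simple and pipelined execution can be illustrated without DALI itself. The sketch below is plain Python: `LOAD_TIME`, `COMPUTE_TIME`, and the queue-based producer thread are illustrative stand-ins for DALI's internal stages, not its actual implementation. Only the scheduling idea mirrors the modes above: the bounded queue plays the role of `prefetch_queue_depth`, so loading batch i+1 overlaps with computing batch i.

```python
import queue
import threading
import time

LOAD_TIME = 0.05     # simulated CPU load/decode cost per batch (illustrative)
COMPUTE_TIME = 0.05  # simulated GPU compute cost per batch (illustrative)
N_BATCHES = 4

def load_batch(i):
    time.sleep(LOAD_TIME)     # stand-in for file reading + decoding
    return i

def compute(batch):
    time.sleep(COMPUTE_TIME)  # stand-in for GPU augmentation / inference
    return batch * 2

def run_simple():
    # Analogue of exec_pipelined=False: stages run strictly back to back,
    # so total time is roughly N_BATCHES * (LOAD_TIME + COMPUTE_TIME).
    start = time.perf_counter()
    results = [compute(load_batch(i)) for i in range(N_BATCHES)]
    return results, time.perf_counter() - start

def run_pipelined(queue_depth=2):
    # Analogue of exec_pipelined=True: a producer thread prefetches into a
    # bounded queue whose size plays the role of prefetch_queue_depth.
    q = queue.Queue(maxsize=queue_depth)

    def producer():
        for i in range(N_BATCHES):
            q.put(load_batch(i))

    t = threading.Thread(target=producer)
    start = time.perf_counter()
    t.start()
    results = [compute(q.get()) for _ in range(N_BATCHES)]
    t.join()
    return results, time.perf_counter() - start

simple_out, simple_t = run_simple()
pipe_out, pipe_t = run_pipelined()
```

Both variants produce identical results, but the pipelined run hides most of the load latency behind compute time, which is exactly the overlap the real executor arranges across CPU and GPU stages.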
The @pipeline_def decorator captures these choices as keyword arguments, creating a factory function that instantiates a Pipeline object with the selected execution model. Parameters specified in the decorator serve as defaults and can be overridden at call time.
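A minimal sketch of that factory pattern follows, using a toy `Pipeline` class rather than DALI's real one; the class and its fields are assumptions made for illustration, and only the defaults-then-overrides behavior mirrors `@pipeline_def`:

```python
import functools

class Pipeline:
    """Toy stand-in for DALI's Pipeline that just records its settings."""
    def __init__(self, batch_size=1, num_threads=1, exec_dynamic=False):
        self.batch_size = batch_size
        self.num_threads = num_threads
        self.exec_dynamic = exec_dynamic

def pipeline_def(**defaults):
    """Decorator factory: keyword arguments given here become defaults
    that the call site may override, mirroring DALI's @pipeline_def."""
    def decorator(graph_fn):
        @functools.wraps(graph_fn)
        def factory(**overrides):
            settings = {**defaults, **overrides}  # call-time values win
            graph_fn()  # in real DALI this would build the operator graph
            return Pipeline(**settings)
        return factory
    return decorator

@pipeline_def(batch_size=32, exec_dynamic=True)
def my_pipeline():
    pass  # operator calls defining the graph would go here

default_pipe = my_pipeline()                                 # decorator defaults
debug_pipe = my_pipeline(batch_size=1, exec_dynamic=False)   # overridden at call time
```

The decorated function is no longer a pipeline itself but a factory, so the same definition can be instantiated once with production settings and again with debug settings.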
Usage
Use dynamic execution mode (the default in recent DALI versions) for production workloads where maximum GPU utilization is required. Use pipelined mode when you need precise control over prefetch depth or when integrating with legacy code that relies on explicit buffer management. Use simple execution only during debugging or when deterministic single-step execution is needed.
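That guidance can be condensed into the keyword arguments each scenario implies. The helper below is a sketch: the scenario names are ours, the keyword names come from the modes described above, and the prefetch depth of 2 is shown only as a typical starting value, not a recommendation.

```python
def exec_kwargs(scenario):
    """Map a deployment scenario to pipeline keyword arguments."""
    if scenario == "production":   # maximum overlap, modern default
        return {"exec_dynamic": True}
    if scenario == "legacy":       # explicit double/triple buffering
        return {"exec_pipelined": True, "exec_async": True,
                "prefetch_queue_depth": 2}
    if scenario == "debug":        # sequential, single-step execution
        return {"exec_pipelined": False, "exec_async": False}
    raise ValueError(f"unknown scenario: {scenario}")
```

These dictionaries would be splatted into the decorator or the factory call, e.g. `my_pipeline(**exec_kwargs("debug"))`.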
Theoretical Basis
Pipeline execution modes are rooted in the producer-consumer pattern from concurrent systems design. By decoupling data production (reading and decoding) from consumption (GPU augmentation and model inference), the pipeline can hide I/O and compute latency through overlap. Dynamic scheduling generalizes this by letting the runtime decide buffer depths and overlap strategies, similar to out-of-order execution in modern CPU architectures. The key theoretical metric is pipeline bubble time: the fraction of wall-clock time during which a pipeline stage is idle, waiting for upstream data or downstream buffer space.
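As a concrete reading of that metric, bubble fraction is simply a stage's idle time divided by wall-clock time. A minimal sketch follows; the (start, end) interval representation is an assumption made for illustration, and real profilers report stage timings in their own formats.

```python
def bubble_fraction(busy_intervals, wall_time):
    """Fraction of wall-clock time a stage sits idle, given the
    (start, end) intervals during which it was busy. Assumes the
    busy intervals do not overlap one another."""
    busy = sum(end - start for start, end in busy_intervals)
    return 1.0 - busy / wall_time

# A stage busy during [0, 2) ms and [4, 8) ms of a 10 ms iteration is
# busy for 6 ms, so 40% of the iteration is bubble.
frac = bubble_fraction([(0.0, 2.0), (4.0, 8.0)], 10.0)
```

Driving this fraction toward zero for every stage is the goal of both the explicit buffering of pipelined mode and the automatic scheduling of dynamic mode.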