Principle:NVIDIA DALI Custom Operator Class Definition

Knowledge Sources	NVIDIA DALI Documentation, NVIDIA DALI
Domains	Custom_Operators, GPU_Computing, Data_Pipeline
Last Updated	2026-02-08 00:00 GMT

Overview

A custom DALI operator is defined by creating a C++ class template that inherits from dali::Operator<Backend> and overrides the SetupImpl() and RunImpl() virtual methods to declare output shapes and execute per-batch computation.

Description

Custom operator class definition is the foundational pattern for extending NVIDIA DALI's data processing pipeline with user-defined operations. DALI provides a base class template, dali::Operator<Backend>, parameterized on the compute backend (CPUBackend or GPUBackend). To create a new operator, the developer defines a class template that:

Inherits from dali::Operator<Backend>.
Accepts a const dali::OpSpec &spec in its constructor, forwarding it to the base class and extracting any operator-specific arguments using spec.GetArgument<T>("arg_name").
Overrides SetupImpl() to declare the shape and data type of each output tensor. This method receives a mutable std::vector<OutputDesc> (output_desc) and a const reference to the Workspace. By populating output_desc and returning true, the operator instructs DALI to pre-allocate output buffers before RunImpl() is called.
Overrides RunImpl() to perform the actual computation, reading input tensors from the workspace and writing results to pre-allocated output tensors.

The class template is typically specialized per backend, with the GPUBackend specialization placed in a separate .cu file so that CUDA-specific code stays isolated from the CPU build.
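The four steps above can be sketched without the DALI headers. The following is a minimal, self-contained illustration, not DALI's implementation: every type in the toy namespace (OpSpec, OutputDesc, Workspace, Operator, CPUBackend) is a simplified stand-in that only mirrors the names used on this page, and the Scale operator, its "factor" argument, and the Execute() driver are hypothetical. A real operator would instead include DALI's operator header and derive from dali::Operator<Backend>.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

namespace toy {  // stand-ins only; none of these are DALI's real types

struct CPUBackend {};

// Hypothetical, float-only stand-in for dali::OpSpec.
struct OpSpec {
  std::map<std::string, float> args;

  template <typename T>
  T GetArgument(const std::string &name) const {
    return static_cast<T>(args.at(name));  // throws if the argument is missing
  }
};

// Describes one output; here only the element count.
struct OutputDesc {
  std::size_t num_elements = 0;
};

// One flat input buffer and one flat output buffer.
struct Workspace {
  std::vector<float> input;
  std::vector<float> output;
};

template <typename Backend>
class Operator {
 public:
  explicit Operator(const OpSpec &spec) : spec_(spec) {}
  virtual ~Operator() = default;

  // Toy driver: ask for output descriptions, allocate, then run.
  void Execute(Workspace &ws) {
    std::vector<OutputDesc> output_desc;
    if (SetupImpl(output_desc, ws)) {
      assert(output_desc.size() == 1);
      ws.output.assign(output_desc[0].num_elements, 0.0f);
    }
    RunImpl(ws);
  }

 protected:
  virtual bool SetupImpl(std::vector<OutputDesc> &output_desc,
                         const Workspace &ws) = 0;
  virtual void RunImpl(Workspace &ws) = 0;

  OpSpec spec_;  // copy of the configuration
};

// Hypothetical operator: multiplies every input element by "factor".
template <typename Backend>
class Scale : public Operator<Backend> {
 public:
  explicit Scale(const OpSpec &spec)
      : Operator<Backend>(spec),
        factor_(spec.GetArgument<float>("factor")) {}

 protected:
  bool SetupImpl(std::vector<OutputDesc> &output_desc,
                 const Workspace &ws) override {
    output_desc.resize(1);
    output_desc[0].num_elements = ws.input.size();  // same size as the input
    return true;  // outputs may be allocated before RunImpl()
  }

  void RunImpl(Workspace &ws) override {
    for (std::size_t i = 0; i < ws.input.size(); ++i)
      ws.output[i] = ws.input[i] * factor_;
  }

 private:
  float factor_;
};

// Builds a spec, runs the operator once, and returns the output buffer.
inline std::vector<float> scale_demo() {
  OpSpec spec;
  spec.args["factor"] = 2.0f;
  Scale<CPUBackend> op(spec);
  Workspace ws;
  ws.input = {1.0f, 2.0f, 3.0f};
  op.Execute(ws);
  return ws.output;
}

}  // namespace toy
```

Calling toy::scale_demo() runs the operator on a three-element input with a factor of 2. The point of the sketch is the division of labor: the constructor reads configuration once, SetupImpl() only describes the outputs, and RunImpl() writes into buffers it did not allocate.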

Usage

Use this pattern whenever you need to implement a custom data augmentation, transformation, or preprocessing step that is not available in DALI's built-in operator library. This is especially valuable when:

The operation requires GPU acceleration via CUDA kernels.
You need tight integration with DALI's batched execution model and memory management.
The operation must be composable with other DALI operators in a pipeline graph.

Theoretical Basis

The DALI operator class hierarchy follows the Template Method design pattern. The base class OperatorBase defines the public Setup() and Run() methods that handle batch-size enforcement, layout checking, and thread-pool synchronization. These methods delegate to the protected virtual SetupImpl() and RunImpl() methods that derived operators must implement. This separation ensures that cross-cutting concerns (uniform batch size enforcement, output contiguity) are handled consistently, while individual operators focus solely on their computational logic.
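The Template Method structure described here can be illustrated with a small stand-alone sketch. It assumes a much simpler model than DALI's: a batch is a vector with one scalar per sample, the only shared check is uniform batch size, and the names Batch, AddSamples, EnforceUniformBatchSize, and runs_ok are hypothetical. The OperatorBase class below is a stand-in, not DALI's class.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace tm_sketch {  // illustrative stand-ins, not DALI types

using Batch = std::vector<float>;  // one scalar per sample

struct Workspace {
  std::vector<Batch> inputs;
  Batch output;
};

class OperatorBase {
 public:
  virtual ~OperatorBase() = default;

  // Public entry point: shared checks first, then the operator-specific hook.
  void Run(Workspace &ws) {
    EnforceUniformBatchSize(ws);
    RunImpl(ws);
  }

 protected:
  virtual void RunImpl(Workspace &ws) = 0;

 private:
  // Cross-cutting concern handled once, for every derived operator.
  static void EnforceUniformBatchSize(const Workspace &ws) {
    for (const Batch &b : ws.inputs) {
      if (b.size() != ws.inputs.front().size())
        throw std::invalid_argument("inputs have different batch sizes");
    }
  }
};

// Hypothetical operator: adds two inputs sample by sample.
// It can index both inputs freely because Run() already checked their sizes.
class AddSamples : public OperatorBase {
 protected:
  void RunImpl(Workspace &ws) override {
    const Batch &a = ws.inputs[0];
    const Batch &b = ws.inputs[1];
    ws.output.resize(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
      ws.output[i] = a[i] + b[i];
  }
};

// Returns true if Run() completed, false if the shared check rejected the inputs.
inline bool runs_ok(std::size_t n_a, std::size_t n_b) {
  Workspace ws;
  ws.inputs = {Batch(n_a, 1.0f), Batch(n_b, 2.0f)};
  AddSamples op;
  try {
    op.Run(ws);
  } catch (const std::invalid_argument &) {
    return false;
  }
  return ws.output.size() == n_a;
}

}  // namespace tm_sketch
```

With matching batch sizes, runs_ok(3, 3) reaches RunImpl(); with mismatched sizes, runs_ok(3, 2) is stopped in the base class before any operator code executes. The derived class contains no validation logic of its own, which is the benefit the pattern is meant to provide.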

The Backend template parameter enables compile-time dispatch between CPU and GPU execution paths. A single operator class template can be specialized for different backends, with each specialization linked against the appropriate runtime (e.g., CUDA for GPUBackend). Because the backend is fixed at compile time, backend-specific code needs no runtime branching on the backend inside the operator, while the virtual SetupImpl() and RunImpl() entry points preserve a uniform API surface for pipeline construction.
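Per-backend specialization can be sketched in plain C++ as follows. This is an illustration under simplifying assumptions: CPUBackend and GPUBackend are empty tag types defined locally rather than DALI's, the Brighten operator and the describe() helper are hypothetical, and the "GPU" path returns a label instead of launching a kernel so that the example builds without CUDA.

```cpp
#include <string>

namespace backend_sketch {  // illustrative stand-ins, not DALI types

struct CPUBackend {};
struct GPUBackend {};  // tag only; no CUDA is involved in this sketch

// Hypothetical operator: RunImpl() is declared once and defined per backend.
template <typename Backend>
class Brighten {
 public:
  std::string RunImpl() const;
};

// In a real project this definition would typically live in a .cc file.
template <>
inline std::string Brighten<CPUBackend>::RunImpl() const {
  return "cpu: loop over samples";
}

// In a real project this definition would typically live in a .cu file.
template <>
inline std::string Brighten<GPUBackend>::RunImpl() const {
  return "gpu: kernel launch";
}

// The backend is chosen by the template argument, so the compiler selects
// the matching definition; there is no runtime check on the backend.
template <typename Backend>
std::string describe() {
  return Brighten<Backend>().RunImpl();
}

}  // namespace backend_sketch
```

Instantiating describe<CPUBackend>() and describe<GPUBackend>() binds to different function bodies at compile time. Using Brighten with a backend that has no specialization fails at link time rather than silently falling back, which is usually the desired behavior for an unsupported backend.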

The OpSpec argument acts as a parameter object, carrying all operator configuration (arguments, input/output counts, backend type) in a single structure that the operator receives by const reference. This allows lazy argument resolution, including support for per-sample tensor arguments that can vary across a batch.
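The idea of resolving an argument lazily, with a per-sample value taking precedence over a fixed scalar, can be sketched as below. This is not DALI's OpSpec: the Spec class and its AddArg, AddArgInput, and Resolve methods are hypothetical names, values are restricted to float, and the precedence rule (per-sample first, then scalar) is an assumption made for the illustration.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

namespace spec_sketch {  // hypothetical stand-in, not dali::OpSpec

class Spec {
 public:
  // Scalar argument: the same value for every sample in the batch.
  Spec &AddArg(const std::string &name, float value) {
    scalars_[name] = value;
    return *this;
  }

  // Per-sample argument: one value for each sample in the batch.
  Spec &AddArgInput(const std::string &name, std::vector<float> per_sample) {
    per_sample_[name] = std::move(per_sample);
    return *this;
  }

  // Lazy lookup at use time: per-sample value if present, else the scalar.
  float Resolve(const std::string &name, std::size_t sample_idx) const {
    auto t = per_sample_.find(name);
    if (t != per_sample_.end())
      return t->second.at(sample_idx);  // throws if the index is out of range
    auto s = scalars_.find(name);
    if (s != scalars_.end())
      return s->second;
    throw std::out_of_range("unknown argument: " + name);
  }

 private:
  std::map<std::string, float> scalars_;
  std::map<std::string, std::vector<float>> per_sample_;
};

}  // namespace spec_sketch
```

A caller would build the object once, for example with AddArg("angle", 10.0f) and AddArgInput("scale", {1.0f, 2.0f}), and then read it only through the const Resolve() method. Resolving "angle" gives the same value for every sample index, while resolving "scale" gives a different value per sample, which is the batch-varying behavior the paragraph describes.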

Related Pages

Implemented By

Implementation:NVIDIA_DALI_Operator_Base_Class
