Principle: NVIDIA DALI Custom Operator Build System
| Knowledge Sources | |
|---|---|
| Domains | Custom_Operators, Build_Systems, CMake |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The custom operator build system uses CMake to compile C++ and CUDA source files into a shared-library (.so) plugin. At configure time it discovers DALI's include paths and library directory by querying the installed nvidia.dali Python package, then links the plugin against libdali.so.
Description
Custom operator build system refers to the CMake-based build configuration that compiles a DALI plugin from C++ and CUDA sources into a dynamically loadable shared library. The build process must solve several challenges:
- DALI Dependency Discovery: The build system must locate the installed DALI package's header files and shared library. This is accomplished by invoking Python at CMake configure time via execute_process() to call nvidia.dali.sysconfig.get_lib_dir() and nvidia.dali.sysconfig.get_compile_flags(). These functions return the absolute path to DALI's library directory and the necessary compiler flags (include paths, preprocessor definitions).
- Language Standards: DALI requires the C++20 standard for both host and device code. The CMake configuration sets CMAKE_CXX_STANDARD 20 and CMAKE_CUDA_STANDARD 20, with the corresponding *_STANDARD_REQUIRED options enabled so that configuration fails early on an incompatible compiler rather than silently falling back to an older standard.
- CUDA Architecture Targeting: The CMAKE_CUDA_ARCHITECTURES variable specifies which GPU compute capabilities to target (e.g., "80;90" for Ampere and Hopper). This determines the PTX and SASS code generated by nvcc.
- Shared Library Output: The add_library(name SHARED sources) command produces a position-independent shared object. The target_link_libraries(name dali) command links against libdali.so found in the DALI library directory, resolving symbols from the DALI operator framework.
- Multi-Language Support: The CMake project declares LANGUAGES CUDA CXX C to enable compilation of .cu, .cc, and .c files within the same build.
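Putting these pieces together, a minimal CMakeLists.txt might look like the following sketch. The plugin name (customdummy) and source file names are placeholders; the sysconfig queries and standard/architecture settings follow the pattern described above.

```cmake
cmake_minimum_required(VERSION 3.18)
project(custom_dummy_plugin LANGUAGES CUDA CXX C)

# Discover the installed DALI package's library directory and compile
# flags by invoking Python at configure time.
execute_process(
  COMMAND python -c "import nvidia.dali as dali; print(dali.sysconfig.get_lib_dir())"
  OUTPUT_VARIABLE DALI_LIB_DIR
  OUTPUT_STRIP_TRAILING_WHITESPACE)
execute_process(
  COMMAND python -c "import nvidia.dali as dali; print(\" \".join(dali.sysconfig.get_compile_flags()))"
  OUTPUT_VARIABLE DALI_COMPILE_FLAGS
  OUTPUT_STRIP_TRAILING_WHITESPACE)

# C++20 for both host and device code, as DALI requires.
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CUDA_STANDARD 20)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)

# Target Ampere (sm_80) and Hopper (sm_90); adjust for your GPUs.
set(CMAKE_CUDA_ARCHITECTURES "80;90")

# Apply DALI's include paths and preprocessor definitions.
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${DALI_COMPILE_FLAGS}")
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} ${DALI_COMPILE_FLAGS}")

# Build the plugin as a shared object and link against libdali.so.
add_library(customdummy SHARED custom_dummy_op.cc custom_dummy_kernel.cu)
target_link_directories(customdummy PRIVATE ${DALI_LIB_DIR})
target_link_libraries(customdummy dali)
```

Note that `target_link_directories()` requires CMake 3.13+ and `CMAKE_CUDA_ARCHITECTURES` requires 3.18+, hence the minimum version above.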
Usage
Use this build pattern for any custom DALI operator that includes CUDA kernels. The CMakeLists.txt file serves as the template: copy it, update the source file list and library name, and build with standard CMake commands (cmake -B build && cmake --build build).
Theoretical Basis
The build system follows the External Package Discovery pattern, where a project locates its dependencies by querying the dependency's own configuration utilities rather than relying on system-wide package managers. By using execute_process(COMMAND python -c "import nvidia.dali ...") at configure time, the build adapts to any DALI installation location (system-wide, virtualenv, conda environment, or user site-packages).
The decision to produce a shared library rather than a static library is dictated by the plugin architecture. DALI operators self-register via static initializers that populate global registries. When a shared library is loaded with dlopen(), these static initializers execute automatically, registering the operator schema and factory entries. A static library would require explicit initialization calls and would need to be linked at DALI's own build time, defeating the purpose of a plugin.
The CMAKE_CUDA_ARCHITECTURES setting reflects the fat binary compilation model where nvcc generates code for multiple GPU architectures in a single binary. This ensures the plugin runs on different GPU generations without requiring separate builds, at the cost of increased binary size and compilation time.
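One way to control the size/compatibility trade-off is CMake's -real and -virtual architecture suffixes, which select whether SASS, PTX, or both are embedded for each target. A possible refinement of the setting above:

```cmake
# Embed SASS (native machine code) for sm_80 and sm_90, plus PTX for
# compute_90 so that newer, unlisted GPU generations can still
# JIT-compile the kernels at load time.
set(CMAKE_CUDA_ARCHITECTURES "80-real;90-real;90-virtual")
```

Omitting the suffix (e.g., plain "80") embeds both SASS and PTX for that architecture, which maximizes compatibility at the cost of the largest binaries.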