Implementation:NVIDIA DALI KernelManager
| Knowledge Sources | |
|---|---|
| Domains | Kernels, GPU_Computing |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Manages multiple type-erased kernel instances, providing a uniform interface to create, configure, and run kernels whose concrete types are selected at runtime.
Description
The KernelManager class provides type erasure for DALI kernels, allowing operators to manage collections of kernel instances without knowing their concrete types at the point of storage. It maintains a SmallVector of AnyKernelInstance slots, where each slot holds a type-erased kernel object (via a unique_ptr<void> with a typed deleter) along with its cached KernelRequirements.
The AnyKernelInstance helper struct is the building block for type erasure. It uses a unique_ptr<void, void(*)(void*)> where the deleter function pointer encodes the concrete kernel type. The create_or_get method checks the deleter to determine if the existing instance matches the requested type, creating a new one only if needed. The get method performs a similar type check and throws std::logic_error on mismatches.
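The deleter-as-type-tag idea described above can be illustrated with a small standalone sketch. It is not DALI's code: `AnySlotSketch`, `KernelA`, `KernelB`, `no_op` and `demo_slot` are hypothetical names, the cached requirements are left out, and throwing on an empty slot in `get` is an assumption of this sketch.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for AnyKernelInstance (requirements omitted).
struct AnySlotSketch {
  using Deleter = void (*)(void *);

  // One instantiation per Kernel type; its address doubles as the type tag.
  template <typename Kernel>
  static void delete_kernel(void *ptr) {
    delete static_cast<Kernel *>(ptr);
  }

  // Placeholder deleter for the empty state; never invoked on a null pointer.
  static void no_op(void *) {}

  std::unique_ptr<void, Deleter> instance{nullptr, no_op};

  template <typename Kernel, typename... Args>
  Kernel &create_or_get(Args &&...args) {
    Deleter deleter = delete_kernel<Kernel>;
    if (!instance || instance.get_deleter() != deleter) {
      instance.reset();  // destroy the old object with its own deleter
      instance = std::unique_ptr<void, Deleter>(
          new Kernel{std::forward<Args>(args)...}, deleter);
    }
    return *static_cast<Kernel *>(instance.get());
  }

  template <typename Kernel>
  Kernel &get() {
    Deleter deleter = delete_kernel<Kernel>;
    if (!instance)  // assumption of this sketch: an empty slot is also an error
      throw std::logic_error("slot is empty");
    if (instance.get_deleter() != deleter)
      throw std::logic_error("slot holds a different kernel type");
    return *static_cast<Kernel *>(instance.get());
  }

  explicit operator bool() const noexcept { return static_cast<bool>(instance); }
};

// Hypothetical kernel types used only for illustration.
struct KernelA { int value; };
struct KernelB { double scale; int pad; };

// Usage walk-through; call it from main().
inline void demo_slot() {
  AnySlotSketch slot;
  assert(!slot);
  KernelA &a = slot.create_or_get<KernelA>(3);
  assert(a.value == 3);
  // Same type requested again: the existing object is reused, new arguments are ignored.
  assert(&slot.create_or_get<KernelA>(7) == &a);
  // Different type requested: the old object is destroyed and replaced.
  slot.create_or_get<KernelB>();
  bool threw = false;
  try {
    slot.get<KernelA>();
  } catch (const std::logic_error &) {
    threw = true;
  }
  assert(threw);
}
```

The type check costs a single pointer comparison and needs no RTTI. One consequence, visible in `demo_slot`, is that constructor arguments passed to `create_or_get` are ignored when the slot already holds an object of the requested type.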
The KernelManager exposes templated Setup and Run methods that forward to the underlying kernel instances. Notably, Run automatically creates a temporary DynamicScratchpad if the context's scratchpad pointer is null, using the context's GPU stream for stream-ordered allocation. This provides a convenient fallback that ensures kernels always have access to temporary memory without requiring callers to manage scratchpad lifetimes explicitly.
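The scratchpad fallback can be sketched in isolation as follows. This is a simplified illustration, not DALI's implementation: `ScratchpadSketch`, `ContextSketch`, `run_with_scratchpad_fallback` and `demo_fallback` are hypothetical names, the stand-in scratchpad uses plain host allocations instead of stream-ordered ones, and restoring the context pointer after the call is an assumption made so that the context never refers to a destroyed temporary.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a scratchpad: owns every block handed out.
struct ScratchpadSketch {
  std::vector<std::unique_ptr<char[]>> blocks;
  void *Alloc(std::size_t bytes) {
    blocks.emplace_back(new char[bytes]);
    return blocks.back().get();
  }
};

// Hypothetical stand-in for KernelContext; only the scratchpad pointer is modeled.
struct ContextSketch {
  ScratchpadSketch *scratchpad = nullptr;
};

// Runs kernel_fn(ctx), supplying a temporary scratchpad when the caller gave none.
template <typename KernelFn>
void run_with_scratchpad_fallback(ContextSketch &ctx, KernelFn &&kernel_fn) {
  ScratchpadSketch temporary;  // lives only for the duration of this call
  ScratchpadSketch *caller_provided = ctx.scratchpad;
  if (!ctx.scratchpad)
    ctx.scratchpad = &temporary;

  // Restore the original pointer on every exit path, including exceptions.
  struct Restore {
    ContextSketch &ctx;
    ScratchpadSketch *saved;
    ~Restore() { ctx.scratchpad = saved; }
  } restore{ctx, caller_provided};

  kernel_fn(ctx);
}

// Usage walk-through; call it from main().
inline void demo_fallback() {
  ContextSketch ctx;  // no scratchpad supplied
  bool saw_scratchpad = false;
  run_with_scratchpad_fallback(ctx, [&saw_scratchpad](ContextSketch &c) {
    saw_scratchpad = c.scratchpad != nullptr;
    c.scratchpad->Alloc(64);  // temporary memory, released when the call returns
  });
  assert(saw_scratchpad);
  assert(ctx.scratchpad == nullptr);  // the temporary did not leak into the context
}
```

When the caller does supply a scratchpad, the function uses it untouched, so memory allocated during the call outlives the call and remains owned by the caller.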
Usage
Use KernelManager in DALI operator implementations where different kernel types may be selected at runtime (e.g., based on input data types or backend). Call Resize to set the number of kernel instances (typically one per sample or minibatch), then Initialize or CreateOrGet to populate slots with concrete kernel types. Use Setup to compute requirements for each instance, and Run to execute them. The manager handles the lifecycle of kernel instances and their requirements.
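The Resize, CreateOrGet, Setup, Run sequence can be tried without DALI using the toy manager below. Everything in it is a hypothetical stand-in (`ManagerSketch`, `SlotSketch`, `ReqSketch`, `CtxSketch`, `ScaleKernelSketch`, `demo_manager`). For brevity it swaps the deleter-based erasure for `std::shared_ptr<void>` plus `std::type_index`, and it assumes that Setup and Run require the slot to be populated beforehand.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

struct ReqSketch { std::size_t output_size = 0; };  // stand-in for KernelRequirements
struct CtxSketch {};                                // stand-in for KernelContext

struct SlotSketch {
  ReqSketch requirements;                 // cached by Setup
  std::shared_ptr<void> instance;         // swapped-in erasure, not DALI's unique_ptr<void>
  std::type_index type = typeid(void);
};

class ManagerSketch {
 public:
  void Resize(std::size_t n) { slots_.resize(n); }
  std::size_t NumInstances() const noexcept { return slots_.size(); }

  template <typename Kernel, typename... Args>
  Kernel &CreateOrGet(int idx, Args &&...args) {
    SlotSketch &slot = slots_.at(static_cast<std::size_t>(idx));
    if (!slot.instance || slot.type != std::type_index(typeid(Kernel))) {
      slot.instance = std::make_shared<Kernel>(std::forward<Args>(args)...);
      slot.type = typeid(Kernel);
    }
    return *static_cast<Kernel *>(slot.instance.get());
  }

  template <typename Kernel>
  Kernel &Get(int idx) {
    SlotSketch &slot = slots_.at(static_cast<std::size_t>(idx));
    if (!slot.instance || slot.type != std::type_index(typeid(Kernel)))
      throw std::logic_error("slot is empty or holds a different kernel type");
    return *static_cast<Kernel *>(slot.instance.get());
  }

  // Forwards to Kernel::Setup and caches the returned requirements in the slot.
  template <typename Kernel, typename... InArgs>
  ReqSketch &Setup(int idx, CtxSketch &ctx, InArgs &&...in_args) {
    Kernel &kernel = Get<Kernel>(idx);
    SlotSketch &slot = slots_.at(static_cast<std::size_t>(idx));
    slot.requirements = kernel.Setup(ctx, std::forward<InArgs>(in_args)...);
    return slot.requirements;
  }

  // Forwards to Kernel::Run.
  template <typename Kernel, typename... OutInArgs>
  void Run(int idx, CtxSketch &ctx, OutInArgs &&...out_in_args) {
    Get<Kernel>(idx).Run(ctx, std::forward<OutInArgs>(out_in_args)...);
  }

 private:
  std::vector<SlotSketch> slots_;
};

// Hypothetical kernel: multiplies every element by a factor.
struct ScaleKernelSketch {
  ReqSketch Setup(CtxSketch &, const std::vector<float> &in, float) {
    ReqSketch req;
    req.output_size = in.size();
    return req;
  }
  void Run(CtxSketch &, std::vector<float> &out, const std::vector<float> &in, float factor) {
    out.resize(in.size());
    for (std::size_t i = 0; i < in.size(); i++)
      out[i] = in[i] * factor;
  }
};

// Usage walk-through; call it from main().
inline void demo_manager() {
  ManagerSketch mgr;
  CtxSketch ctx;
  mgr.Resize(1);
  mgr.CreateOrGet<ScaleKernelSketch>(0);
  std::vector<float> in = {1.0f, 2.0f, 3.0f}, out;
  ReqSketch &req = mgr.Setup<ScaleKernelSketch>(0, ctx, in, 2.0f);
  assert(req.output_size == 3);
  mgr.Run<ScaleKernelSketch>(0, ctx, out, in, 2.0f);
  assert(out.size() == 3 && out[2] == 6.0f);
}
```

Because Setup and Run are templated on the kernel type, the caller names the same type in every call for a given slot; a mismatch surfaces as an exception rather than undefined behavior.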
Code Reference
Source Location
- Repository: NVIDIA_DALI
- File: dali/kernels/kernel_manager.h
- Lines: 1-212
Signature
struct AnyKernelInstance {
  KernelRequirements requirements;
  std::unique_ptr<void, void(*)(void*)> instance = { nullptr, free };

  template <typename Kernel, typename... Args>
  Kernel &create_or_get(Args&&... args);

  template <typename Kernel>
  Kernel &get();

  template <typename Kernel>
  static void delete_kernel(void *ptr);

  explicit operator bool() const noexcept;
};

class DLL_PUBLIC KernelManager {
 public:
  void Resize(size_t num_instances);

  template <typename Kernel, typename... Args>
  void Resize(size_t num_instances, const Args&... args);

  template <typename Kernel, typename... Args>
  void Initialize(const Args&... args);

  void Reset();

  template <typename Kernel, typename... ConstructorArgs>
  Kernel &CreateOrGet(int instance_idx, ConstructorArgs &&...args);

  template <typename Kernel>
  Kernel &Get(int instance_idx);

  KernelRequirements &GetRequirements(int instance_idx) noexcept;
  const KernelRequirements &GetRequirements(int instance_idx) const noexcept;

  size_t NumInstances() const noexcept;

  template <typename Kernel, typename... InArgs>
  KernelRequirements &Setup(int instance_idx, KernelContext &context, InArgs &&...in_args);

  template <typename Kernel, typename... OutInArgs>
  void Run(int instance_idx, KernelContext &context, OutInArgs &&...out_in_args);

 private:
  SmallVector<AnyKernelInstance, 1> instances;
};
Import
#include "dali/kernels/kernel_manager.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| num_instances | size_t | Yes | Number of kernel instance slots to create; typically one per sample or minibatch |
| instance_idx | int | Yes | Zero-based index of the kernel instance to operate on |
| context | KernelContext& | Yes | Execution context with CUDA stream and optional scratchpad |
| in_args | InArgs&&... | Yes | Inputs and arguments forwarded to Kernel::Setup |
| out_in_args | OutInArgs&&... | Yes | Outputs, inputs, and arguments forwarded to Kernel::Run |
| args (constructor) | Args&&... | No | Arguments forwarded to the kernel's constructor during creation |
Outputs
| Name | Type | Description |
|---|---|---|
| (kernel reference) | Kernel& | Reference to the created or retrieved kernel instance (from CreateOrGet/Get) |
| requirements | KernelRequirements& | Reference to cached requirements (from Setup/GetRequirements) |
| (void) | void | Run executes the kernel; outputs written through output tensor views |
Usage Examples
Managing Per-Sample Kernels
#include "dali/kernels/kernel_manager.h"

// MyResizeKernel, stream, input_views, output_views and params are
// placeholders assumed to be defined by the surrounding operator.
KernelManager kmgr;

// Create slots for 32 samples, each initialized with MyResizeKernel
kmgr.Resize<MyResizeKernel>(32);

KernelContext ctx;
ctx.gpu.stream = stream;

// Set up each instance; the returned requirements are cached in the manager
for (int i = 0; i < 32; i++) {
  kmgr.Setup<MyResizeKernel>(i, ctx, input_views[i], params[i]);
}

// Run each instance (scratchpad created automatically if ctx.scratchpad is null)
for (int i = 0; i < 32; i++) {
  kmgr.Run<MyResizeKernel>(i, ctx, output_views[i], input_views[i], params[i]);
}
Dynamic Kernel Type Selection
#include "dali/kernels/kernel_manager.h"

KernelManager kmgr;
kmgr.Resize(num_samples);

for (int i = 0; i < num_samples; i++) {
  if (use_bilinear[i]) {
    kmgr.CreateOrGet<BilinearResizeKernel>(i);
  } else {
    kmgr.CreateOrGet<NearestResizeKernel>(i);
  }
}