Implementation:NVIDIA DALI KernelManager

Knowledge Sources	NVIDIA_DALI
Domains	Kernels, GPU_Computing
Last Updated	2026-02-08 16:00 GMT

Overview

Manages multiple type-erased kernel instances, providing a uniform interface to create, configure, and run kernels whose concrete types are selected at runtime.

Description

The KernelManager class provides type erasure for DALI kernels, allowing operators to manage collections of kernel instances without knowing their concrete types at the point of storage. It maintains a SmallVector of AnyKernelInstance slots, where each slot holds a type-erased kernel object (via a unique_ptr<void> with a typed deleter) along with its cached KernelRequirements.

The AnyKernelInstance helper struct is the building block for type erasure. It uses a unique_ptr<void, void(*)(void*)> where the deleter function pointer encodes the concrete kernel type. The create_or_get method checks the deleter to determine if the existing instance matches the requested type, creating a new one only if needed. The get method performs a similar type check and throws std::logic_error on mismatches.
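
The deleter-as-type-tag idea described above can be illustrated outside of DALI. The following is a minimal sketch, not the DALI source: `AnySlot`, `delete_as`, and `no_delete` are hypothetical names, and the slot stores arbitrary types rather than kernels. It assumes that each instantiation of the deleter template has a distinct address, which is what makes the comparison usable as a type check.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for AnyKernelInstance (illustration only).
struct AnySlot {
  using Deleter = void (*)(void *);

  // Placeholder deleter for an empty slot; it is never called on a real object.
  static void no_delete(void *) {}

  // One deleter per concrete type; its address doubles as a runtime type tag.
  template <typename T>
  static void delete_as(void *ptr) { delete static_cast<T *>(ptr); }

  std::unique_ptr<void, Deleter> instance{nullptr, no_delete};

  // Reuse the stored object if it already has type T; otherwise replace it.
  template <typename T, typename... Args>
  T &create_or_get(Args &&...args) {
    Deleter wanted = delete_as<T>;
    if (instance.get_deleter() != wanted) {
      instance = std::unique_ptr<void, Deleter>(
          new T(std::forward<Args>(args)...), wanted);
    }
    return *static_cast<T *>(instance.get());
  }

  // Return the stored object, or throw if the slot is empty or holds another type.
  template <typename T>
  T &get() {
    Deleter wanted = delete_as<T>;
    if (instance.get_deleter() != wanted)
      throw std::logic_error("slot does not hold the requested type");
    return *static_cast<T *>(instance.get());
  }
};
```

In this sketch, a second `create_or_get<T>` call with the same `T` returns the existing object and ignores the new constructor arguments, while a call with a different type destroys the old object through its own deleter before storing the new one.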

The KernelManager exposes templated Setup and Run methods that forward to the underlying kernel instances. Notably, Run automatically creates a temporary DynamicScratchpad if the context's scratchpad pointer is null, using the context's GPU stream for stream-ordered allocation. This provides a convenient fallback that ensures kernels always have access to temporary memory without requiring callers to manage scratchpad lifetimes explicitly.
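
The scratchpad fallback can be sketched in isolation as follows. This is an illustration under stated assumptions, not the DALI implementation: `Context`, `Scratchpad`, and `RunWithScratchpad` are hypothetical stand-ins, the scratchpad allocates host memory instead of stream-ordered device memory, and restoring the context's pointer to null after the call is an assumption about how a temporary would be cleaned up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scratchpad: owns every block it hands out.
struct Scratchpad {
  std::vector<std::vector<char>> blocks;
  void *Alloc(std::size_t bytes) {
    blocks.emplace_back(bytes);
    return blocks.back().data();
  }
};

// Hypothetical stand-in for KernelContext, reduced to the scratchpad pointer.
struct Context {
  Scratchpad *scratchpad = nullptr;
};

// Calls kernel_run(ctx), supplying a temporary scratchpad if the caller gave none.
template <typename Fn>
void RunWithScratchpad(Context &ctx, Fn &&kernel_run) {
  if (ctx.scratchpad) {  // caller-provided scratchpad: use it untouched
    kernel_run(ctx);
    return;
  }
  Scratchpad temporary;  // lives only for the duration of this call
  ctx.scratchpad = &temporary;
  // Reset the pointer on every exit path so it never dangles.
  struct Restore {
    Context &c;
    ~Restore() { c.scratchpad = nullptr; }
  } restore{ctx};
  kernel_run(ctx);
}
```

Because `restore` is declared after `temporary`, it is destroyed first, so the context stops pointing at the temporary scratchpad before that scratchpad's memory is released.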

Usage

Use KernelManager in DALI operator implementations where different kernel types may be selected at runtime (e.g., based on input data types or backend). Call Resize to set the number of kernel instances (typically one per sample or minibatch), then Initialize or CreateOrGet to populate slots with concrete kernel types. Use Setup to compute requirements for each instance, and Run to execute them. The manager handles the lifecycle of kernel instances and their requirements.

Code Reference

Source Location

Repository: NVIDIA_DALI
File: dali/kernels/kernel_manager.h
Lines: 1-212

Signature

struct AnyKernelInstance {
  KernelRequirements requirements;
  std::unique_ptr<void, void(*)(void*)> instance = { nullptr, free };

  template <typename Kernel, typename... Args>
  Kernel &create_or_get(Args&&... args);

  template <typename Kernel>
  Kernel &get();

  template <typename Kernel>
  static void delete_kernel(void *ptr);

  explicit operator bool() const noexcept;
};

class DLL_PUBLIC KernelManager {
 public:
  void Resize(size_t num_instances);

  template <typename Kernel, typename... Args>
  void Resize(size_t num_instances, const Args&... args);

  template <typename Kernel, typename... Args>
  void Initialize(const Args&... args);

  void Reset();

  template <typename Kernel, typename... ConstructorArgs>
  Kernel &CreateOrGet(int instance_idx, ConstructorArgs &&...args);

  template <typename Kernel>
  Kernel &Get(int instance_idx);

  KernelRequirements &GetRequirements(int instance_idx) noexcept;
  const KernelRequirements &GetRequirements(int instance_idx) const noexcept;

  size_t NumInstances() const noexcept;

  template <typename Kernel, typename... InArgs>
  KernelRequirements &Setup(int instance_idx, KernelContext &context, InArgs &&...in_args);

  template <typename Kernel, typename... OutInArgs>
  void Run(int instance_idx, KernelContext &context, OutInArgs &&...out_in_args);

 private:
  SmallVector<AnyKernelInstance, 1> instances;
};

Import

#include "dali/kernels/kernel_manager.h"

I/O Contract

Inputs

Name	Type	Required	Description
num_instances	`size_t`	Yes	Number of kernel instance slots to create; typically one per sample or minibatch
instance_idx	`int`	Yes	Zero-based index of the kernel instance to operate on
context	`KernelContext&`	Yes	Execution context with CUDA stream and optional scratchpad
in_args	`InArgs&&...`	Yes	Inputs and arguments forwarded to `Kernel::Setup`
out_in_args	`OutInArgs&&...`	Yes	Outputs, inputs, and arguments forwarded to `Kernel::Run`
args (constructor)	`ConstructorArgs&&...` (`CreateOrGet`) or `const Args&...` (`Resize`/`Initialize`)	No	Arguments forwarded to the kernel's constructor during creation

Outputs

Name	Type	Description
(kernel reference)	`Kernel&`	Reference to the created or retrieved kernel instance (from `CreateOrGet`/`Get`)
requirements	`KernelRequirements&`	Reference to cached requirements (from `Setup`/`GetRequirements`)
(void)	`void`	`Run` executes the kernel; outputs written through output tensor views

Usage Examples

Managing Per-Sample Kernels

#include "dali/kernels/kernel_manager.h"

KernelManager kmgr;

// Create slots for 32 samples and initialize each with MyResizeKernel
kmgr.Resize<MyResizeKernel>(32);

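KernelContext ctx;
// `stream` is a CUDA stream assumed to be created by the caller
ctx.gpu.stream = stream;

// Setup each instance (input_views and params are assumed to be prepared by the caller)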
for (int i = 0; i < 32; i++) {
  kmgr.Setup<MyResizeKernel>(i, ctx, input_views[i], params[i]);
}

// Run each instance (scratchpad created automatically if ctx.scratchpad is null)
for (int i = 0; i < 32; i++) {
  kmgr.Run<MyResizeKernel>(i, ctx, output_views[i], input_views[i], params[i]);
}

Dynamic Kernel Type Selection

#include "dali/kernels/kernel_manager.h"

KernelManager kmgr;
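// Allocate empty slots; the concrete kernel type is chosen per sample below
kmgr.Resize(num_samples);

// Later Setup/Run/Get calls must name the same kernel type as the slot holds;
// a mismatch throws std::logic_error
for (int i = 0; i < num_samples; i++) {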
  if (use_bilinear[i]) {
    kmgr.CreateOrGet<BilinearResizeKernel>(i);
  } else {
    kmgr.CreateOrGet<NearestResizeKernel>(i);
  }
}

Related Pages

Environment:NVIDIA_DALI_CUDA_GPU_Environment
