Implementation:NVIDIA DALI KernelManager

From Leeroopedia


Knowledge Sources
Domains Kernels, GPU_Computing
Last Updated 2026-02-08 16:00 GMT

Overview

Manages multiple type-erased kernel instances, providing a uniform interface to create, configure, and run kernels whose concrete types are selected at runtime.

Description

The KernelManager class provides type erasure for DALI kernels, allowing operators to manage collections of kernel instances without knowing their concrete types at the point of storage. It maintains a SmallVector of AnyKernelInstance slots, where each slot holds a type-erased kernel object (via a unique_ptr<void> with a typed deleter) along with its cached KernelRequirements.

The AnyKernelInstance helper struct is the building block for type erasure. It uses a unique_ptr<void, void(*)(void*)> where the deleter function pointer encodes the concrete kernel type. The create_or_get method checks the deleter to determine if the existing instance matches the requested type, creating a new one only if needed. The get method performs a similar type check and throws std::logic_error on mismatches.
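The deleter-as-type-tag idea described above can be illustrated with a small standalone sketch. This is not the DALI source: AnySlot, delete_as, KernelA, and KernelB are hypothetical names introduced here, and the only assumption carried over from the description is that the deleter function pointer identifies the stored type.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for AnyKernelInstance (illustration only, not DALI code).
struct AnySlot {
  // One instantiation per concrete type; its address serves as the type tag.
  template <typename T>
  static void delete_as(void *ptr) { delete static_cast<T *>(ptr); }

  static void no_op(void *) {}

  std::unique_ptr<void, void (*)(void *)> instance{nullptr, no_op};

  // Reuses the stored object if it already has type T; otherwise replaces it.
  template <typename T, typename... Args>
  T &create_or_get(Args &&...args) {
    void (*deleter)(void *) = delete_as<T>;
    if (!instance || instance.get_deleter() != deleter) {
      // Move assignment destroys the old object with its own (old) deleter.
      instance = std::unique_ptr<void, void (*)(void *)>(
          new T(std::forward<Args>(args)...), deleter);
    }
    return *static_cast<T *>(instance.get());
  }

  // Returns the stored object, or throws if the slot is empty or holds another type.
  template <typename T>
  T &get() {
    void (*deleter)(void *) = delete_as<T>;
    if (!instance)
      throw std::logic_error("slot is empty");
    if (instance.get_deleter() != deleter)
      throw std::logic_error("slot holds a different type");
    return *static_cast<T *>(instance.get());
  }
};

// Two unrelated placeholder "kernel" types.
struct KernelA {
  explicit KernelA(int v) : value(v) {}
  int value;
};

struct KernelB {
  explicit KernelB(double s) : scale(s) {}
  double scale;
};

// Small walkthrough: same type is reused, a different type is rejected by get().
inline bool slot_demo() {
  AnySlot slot;
  KernelA &first = slot.create_or_get<KernelA>(7);
  KernelA &again = slot.create_or_get<KernelA>(99);  // existing object is returned
  bool reused = (&first == &again) && again.value == 7;
  bool rejected = false;
  try {
    slot.get<KernelB>();
  } catch (const std::logic_error &) {
    rejected = true;
  }
  return reused && rejected;
}
```

In this sketch, requesting the same type twice returns the same object and ignores the new constructor arguments, while requesting a different type through create_or_get destroys the old object and constructs a new one. Comparing function pointers avoids RTTI, at the cost of relying on each instantiation having a distinct address.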

The KernelManager exposes templated Setup and Run methods that forward to the underlying kernel instances. Notably, Run automatically creates a temporary DynamicScratchpad if the context's scratchpad pointer is null, using the context's GPU stream for stream-ordered allocation. This provides a convenient fallback that ensures kernels always have access to temporary memory without requiring callers to manage scratchpad lifetimes explicitly.
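The scratchpad fallback in Run can be sketched roughly as follows. Everything here is a simplified, host-only stand-in: Scratchpad, HostScratchpad, Context, SumKernel, and RunWithFallback are hypothetical names, and the stream-ordered GPU allocation performed by DynamicScratchpad is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal scratchpad interface (not the DALI one).
struct Scratchpad {
  virtual void *Alloc(std::size_t bytes) = 0;
  virtual ~Scratchpad() = default;
};

// Host-memory scratchpad: every block lives until the scratchpad is destroyed.
struct HostScratchpad : Scratchpad {
  std::vector<std::vector<char>> blocks;
  void *Alloc(std::size_t bytes) override {
    blocks.emplace_back(bytes);
    return blocks.back().data();
  }
};

// Hypothetical stand-in for KernelContext.
struct Context {
  Scratchpad *scratchpad = nullptr;
};

// If the caller supplied no scratchpad, install a temporary one for this call only.
template <typename Kernel, typename... Args>
void RunWithFallback(Kernel &kernel, Context &ctx, Args &&...args) {
  if (ctx.scratchpad) {
    kernel.Run(ctx, std::forward<Args>(args)...);
    return;
  }
  HostScratchpad temporary;
  ctx.scratchpad = &temporary;
  // Clears the pointer on scope exit (including exceptions), before `temporary` dies.
  struct Restore {
    Context &ctx;
    ~Restore() { ctx.scratchpad = nullptr; }
  } restore{ctx};
  kernel.Run(ctx, std::forward<Args>(args)...);
}

// Placeholder kernel that needs temporary memory: sums 0..n-1 via a scratch buffer.
struct SumKernel {
  void Run(Context &ctx, int &out, int n) {
    int *tmp = static_cast<int *>(ctx.scratchpad->Alloc(n * sizeof(int)));
    for (int i = 0; i < n; i++) tmp[i] = i;
    out = 0;
    for (int i = 0; i < n; i++) out += tmp[i];
  }
};
```

The point of the pattern is that a caller-provided scratchpad is left untouched, while a missing one is replaced only for the duration of the call, so the context never keeps a pointer to a destroyed object.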

Usage

Use KernelManager in DALI operator implementations where different kernel types may be selected at runtime (e.g., based on input data types or backend). Call Resize to set the number of kernel instances (typically one per sample or minibatch), then Initialize or CreateOrGet to populate slots with concrete kernel types. Use Setup to compute requirements for each instance, and Run to execute them. The manager handles the lifecycle of kernel instances and their requirements.

Code Reference

Source Location

Signature

struct AnyKernelInstance {
  KernelRequirements requirements;
  std::unique_ptr<void, void(*)(void*)> instance = { nullptr, free };

  template <typename Kernel, typename... Args>
  Kernel &create_or_get(Args&&... args);

  template <typename Kernel>
  Kernel &get();

  template <typename Kernel>
  static void delete_kernel(void *ptr);

  explicit operator bool() const noexcept;
};

class DLL_PUBLIC KernelManager {
 public:
  void Resize(size_t num_instances);

  template <typename Kernel, typename... Args>
  void Resize(size_t num_instances, const Args&... args);

  template <typename Kernel, typename... Args>
  void Initialize(const Args&... args);

  void Reset();

  template <typename Kernel, typename... ConstructorArgs>
  Kernel &CreateOrGet(int instance_idx, ConstructorArgs &&...args);

  template <typename Kernel>
  Kernel &Get(int instance_idx);

  KernelRequirements &GetRequirements(int instance_idx) noexcept;
  const KernelRequirements &GetRequirements(int instance_idx) const noexcept;

  size_t NumInstances() const noexcept;

  template <typename Kernel, typename... InArgs>
  KernelRequirements &Setup(int instance_idx, KernelContext &context, InArgs &&...in_args);

  template <typename Kernel, typename... OutInArgs>
  void Run(int instance_idx, KernelContext &context, OutInArgs &&...out_in_args);

 private:
  SmallVector<AnyKernelInstance, 1> instances;
};

Import

#include "dali/kernels/kernel_manager.h"

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| num_instances | size_t | Yes | Number of kernel instance slots to create; typically one per sample or minibatch |
| instance_idx | int | Yes | Zero-based index of the kernel instance to operate on |
| context | KernelContext& | Yes | Execution context with CUDA stream and optional scratchpad |
| in_args | InArgs&&... | Yes | Inputs and arguments forwarded to Kernel::Setup |
| out_in_args | OutInArgs&&... | Yes | Outputs, inputs, and arguments forwarded to Kernel::Run |
| args (constructor) | const Args&... or ConstructorArgs&&... | No | Arguments passed to the kernel's constructor during creation (by const reference in Resize and Initialize, forwarded in CreateOrGet) |

Outputs

| Name | Type | Description |
|---|---|---|
| (kernel reference) | Kernel& | Reference to the created or retrieved kernel instance (from CreateOrGet/Get) |
| requirements | KernelRequirements& | Reference to cached requirements (from Setup/GetRequirements) |
| (void) | void | Run executes the kernel; outputs are written through output tensor views |

Usage Examples

Managing Per-Sample Kernels

#include "dali/kernels/kernel_manager.h"

KernelManager kmgr;

// Create slots for 32 samples and initialize each with a MyResizeKernel instance
kmgr.Resize<MyResizeKernel>(32);

KernelContext ctx;
ctx.gpu.stream = stream;

// Setup each instance
for (int i = 0; i < 32; i++) {
  kmgr.Setup<MyResizeKernel>(i, ctx, input_views[i], params[i]);
}

// Run each instance (scratchpad created automatically if ctx.scratchpad is null)
for (int i = 0; i < 32; i++) {
  kmgr.Run<MyResizeKernel>(i, ctx, output_views[i], input_views[i], params[i]);
}

Dynamic Kernel Type Selection

#include "dali/kernels/kernel_manager.h"

KernelManager kmgr;
kmgr.Resize(num_samples);

for (int i = 0; i < num_samples; i++) {
  if (use_bilinear[i]) {
    kmgr.CreateOrGet<BilinearResizeKernel>(i);
  } else {
    kmgr.CreateOrGet<NearestResizeKernel>(i);
  }
}
