Implementation:NVIDIA DALI NVML Utilities

Knowledge Sources	NVIDIA_DALI
Domains	Utilities, GPU_Management
Last Updated	2026-02-08 16:00 GMT

Overview

Implements NVML-based GPU management functions including driver version queries, CUDA device-to-NVML handle resolution, CPU affinity mask retrieval, and CPU affinity setting.

Description

The NVML utilities implementation in dali/util/nvml.cc provides the concrete implementations for the GPU management functions declared in dali/util/nvml.h. The module is organized into three main functional areas: driver version queries, device handle resolution, and CPU affinity management.

The impl::GetDriverVersion and impl::GetCudaDriverVersion functions query the NVIDIA driver version (reported by NVML as a string and returned as a float) and the CUDA driver version integer, respectively, via NVML system-level queries. Both functions check whether NVML is initialized before querying and return 0 if it is not. The inline wrappers declared in the header cache these values in static local variables on first call.
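As an illustration of the string-to-float step, here is a minimal sketch of reducing a driver version string to a float with a zero fallback. ParseDriverVersion is a hypothetical helper, not a function from dali/util/nvml.cc, and the use of std::strtof is an assumption about how the conversion could be done.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not part of DALI): converts a driver version string
// such as "535.104.05" to a float, keeping only the leading "major.minor" part.
// Returns 0 when the string is missing or does not start with a number,
// mirroring the "0 if NVML is not initialized" convention.
inline float ParseDriverVersion(const char *version) {
  if (version == nullptr) return 0.0f;
  char *end = nullptr;
  const float parsed = std::strtof(version, &end);  // parsing stops at the second '.'
  return end == version ? 0.0f : parsed;
}
```

A caller would pass the string obtained from NVML and treat a result of 0 as "version unknown".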

The nvmlGetDeviceHandleForCUDA function resolves a CUDA device index to an NVML device handle by converting the CUDA device UUID to a string and performing a UUID-based lookup. It handles MIG (Multi-Instance GPU) instances by retrying with a "MIG-" prefix when the initial lookup with the "GPU-" prefix fails. The internal uuid_str helper formats the 16-byte UUID into the standard dash-separated hexadecimal representation.
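The UUID formatting and prefix retry can be sketched as follows. FormatUuid and ResolveUuid are hypothetical names used for illustration only; the lowercase hex output and the `lookup` callable (standing in for the NVML UUID lookup) are assumptions, not the code in dali/util/nvml.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the internal uuid_str helper: formats 16 raw bytes
// as 8-4-4-4-12 hexadecimal digits separated by dashes.
inline std::string FormatUuid(const std::uint8_t *bytes) {
  static const char kHex[] = "0123456789abcdef";
  std::string out;
  out.reserve(36);
  for (int i = 0; i < 16; ++i) {
    // Dashes precede bytes 4, 6, 8 and 10, giving groups of 4, 2, 2, 2 and 6 bytes.
    if (i == 4 || i == 6 || i == 8 || i == 10) out += '-';
    out += kHex[bytes[i] >> 4];
    out += kHex[bytes[i] & 0x0F];
  }
  return out;
}

// Tries "GPU-<uuid>" first and falls back to "MIG-<uuid>".
// `lookup` is an assumed callable that returns true when a device with the
// given UUID string exists. Returns the matching string, or an empty string.
template <typename Lookup>
std::string ResolveUuid(const std::uint8_t *bytes, Lookup lookup) {
  const std::string uuid = FormatUuid(bytes);
  for (const char *prefix : {"GPU-", "MIG-"}) {
    const std::string candidate = prefix + uuid;
    if (lookup(candidate)) return candidate;
  }
  return std::string();
}
```

Passing the lookup as a callable keeps the sketch independent of NVML; in real code the callable would wrap the NVML UUID-to-handle query.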

The CPU affinity functions use the NVML device affinity API to determine which CPU cores have optimal locality to a given GPU. GetNVMLAffinityMask retrieves the NVML-recommended CPU mask for the current CUDA device and intersects it with the thread's existing affinity mask. On CUDA 11+, it uses nvmlDeviceGetCpuAffinityWithinScope with socket scope; otherwise it falls back to nvmlDeviceGetCpuAffinity. SetCPUAffinity either pins the calling thread to a specific core or applies the NVML-recommended affinity, with validation and warning messages for invalid or empty masks.
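The intersection step can be illustrated with a small Linux-only sketch. IntersectAffinity is a hypothetical helper; it assumes the NVML recommendation arrives as an array of unsigned long bitmask words (bit j of word i meaning CPU i * bits-per-word + j) and that the thread's current mask has already been read into a cpu_set_t. It is not the code in dali/util/nvml.cc.

```cpp
#include <sched.h>   // cpu_set_t and the CPU_* macros (Linux/glibc)
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: writes into `out` the CPUs that are both recommended
// (bit set in `recommended`) and present in the thread's `current` mask.
inline void IntersectAffinity(cpu_set_t *out,
                              const std::vector<unsigned long> &recommended,
                              const cpu_set_t &current,
                              std::size_t num_cpus) {
  const std::size_t bits_per_word = sizeof(unsigned long) * 8;
  CPU_ZERO(out);
  for (std::size_t cpu = 0;
       cpu < num_cpus && cpu < static_cast<std::size_t>(CPU_SETSIZE); ++cpu) {
    const std::size_t word = cpu / bits_per_word;
    if (word >= recommended.size()) break;  // no more recommendation bits
    const bool is_recommended =
        ((recommended[word] >> (cpu % bits_per_word)) & 1UL) != 0;
    if (is_recommended && CPU_ISSET(cpu, &current)) CPU_SET(cpu, out);
  }
}
```

A caller can check CPU_COUNT(out) afterwards; an empty result corresponds to the empty-mask case for which SetCPUAffinity emits a warning.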

Usage

This implementation is used internally by DALI to optimize CPU-GPU data transfer performance by setting appropriate CPU affinity for worker threads. The driver version queries support compatibility checks and feature gating. The device handle resolution enables all NVML device-specific queries by bridging the CUDA and NVML device identifier spaces.

Code Reference

Source Location

Repository: NVIDIA_DALI
File: dali/util/nvml.cc
Lines: 1-168

Signature

namespace dali {
namespace nvml {
namespace impl {

float GetDriverVersion();
int GetCudaDriverVersion();

}  // namespace impl

nvmlDevice_t nvmlGetDeviceHandleForCUDA(int cuda_idx);

void GetNVMLAffinityMask(cpu_set_t *mask, size_t num_cpus);

void SetCPUAffinity(int core = -1);

}  // namespace nvml
}  // namespace dali

Import

#include "dali/util/nvml.h"

I/O Contract

Inputs

Name	Type	Required	Description
cuda_idx	`int`	Yes (nvmlGetDeviceHandleForCUDA)	CUDA device index to resolve to an NVML handle
mask	`cpu_set_t*`	Yes (GetNVMLAffinityMask)	Output CPU set to receive the intersection of NVML-recommended and current affinity
num_cpus	`size_t`	Yes (GetNVMLAffinityMask)	Total number of CPUs configured in the system
core	`int`	No (SetCPUAffinity)	Specific CPU core to pin to; -1 (default) uses NVML-recommended affinity
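For the explicit core argument, a minimal sketch of the validation and single-core mask construction is shown here. BuildPinMask is a hypothetical helper and the bounds check against num_cpus is an assumption about what "validation" covers; the actual checks in SetCPUAffinity may differ.

```cpp
#include <sched.h>   // cpu_set_t and the CPU_* macros (Linux/glibc)
#include <cassert>
#include <cstddef>

// Hypothetical helper: builds a mask containing only `core`.
// Returns false and leaves the mask empty when `core` is negative or out of range.
inline bool BuildPinMask(cpu_set_t *mask, int core, std::size_t num_cpus) {
  CPU_ZERO(mask);
  if (core < 0 || static_cast<std::size_t>(core) >= num_cpus ||
      core >= CPU_SETSIZE) {
    return false;
  }
  CPU_SET(core, mask);
  return true;
}
```

On Linux, a mask built this way could then be applied to the calling thread with pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &mask).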

Outputs

Name	Type	Description
return value (GetDriverVersion)	`float`	NVIDIA driver version as a float (0 if NVML not initialized)
return value (GetCudaDriverVersion)	`int`	CUDA driver version integer (0 if NVML not initialized)
return value (nvmlGetDeviceHandleForCUDA)	`nvmlDevice_t`	NVML device handle for the specified CUDA device
mask (GetNVMLAffinityMask)	`cpu_set_t`	Populated CPU affinity bitmask (intersection of NVML recommendation and current thread affinity)

Usage Examples

Getting Driver Version Information

#include "dali/util/nvml.h"

// NVML must be initialized first; otherwise both queries return 0
dali::nvml::Init();

float driver_version = dali::nvml::impl::GetDriverVersion();
int cuda_driver_version = dali::nvml::impl::GetCudaDriverVersion();

// driver_version is e.g. 535.104
// cuda_driver_version is e.g. 12020

dali::nvml::Shutdown();
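
The integer in the example above follows the usual CUDA encoding of 1000 * major + 10 * minor, so 12020 corresponds to CUDA 12.2. A minimal sketch of decoding it, using a hypothetical CudaVersion struct and DecodeCudaVersion helper that are not part of DALI:

```cpp
#include <cassert>

// Hypothetical value type for illustration only.
struct CudaVersion {
  int major_version;
  int minor_version;
};

// Splits an encoded CUDA version (1000 * major + 10 * minor) into its parts.
inline CudaVersion DecodeCudaVersion(int encoded) {
  return CudaVersion{encoded / 1000, (encoded % 1000) / 10};
}
```

This form is convenient for feature gating, for example comparing major_version against a minimum required release.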

Setting CPU Affinity for GPU Locality

#include "dali/util/nvml.h"

dali::nvml::Init();

// Use NVML-recommended CPU affinity for the current CUDA device
dali::nvml::SetCPUAffinity();

// Or pin to a specific CPU core
dali::nvml::SetCPUAffinity(8);

dali::nvml::Shutdown();

Resolving CUDA Device to NVML Handle

#include "dali/util/nvml.h"

dali::nvml::Init();

// Get NVML handle for CUDA device 0 (supports MIG instances)
nvmlDevice_t device = dali::nvml::nvmlGetDeviceHandleForCUDA(0);

dali::nvml::Shutdown();

Related Pages

Environment:NVIDIA_DALI_CUDA_GPU_Environment
