Implementation:NVIDIA DALI NVML Utilities

From Leeroopedia


Knowledge Sources
Domains: Utilities, GPU_Management
Last Updated: 2026-02-08 16:00 GMT

Overview

Implements NVML-based GPU management functions including driver version queries, CUDA device-to-NVML handle resolution, CPU affinity mask retrieval, and CPU affinity setting.

Description

The NVML utilities implementation in dali/util/nvml.cc provides the concrete implementations for the GPU management functions declared in dali/util/nvml.h. The module is organized into three main functional areas: driver version queries, device handle resolution, and CPU affinity management.

The impl::GetDriverVersion and impl::GetCudaDriverVersion functions return the NVIDIA driver version as a float and the CUDA driver version as an integer, respectively, both obtained through NVML system queries. Each function first checks whether NVML is initialized and returns 0 when it is not. The inline wrappers declared in the header cache these values on first call in static local variables, so the underlying queries run at most once per process.
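The caching pattern described above can be sketched with a function-local static. This is a minimal illustration, not DALI's code: QueryDriverVersionString, ParseDriverVersion, and CachedDriverVersion are hypothetical names, the query is a stub that returns a fixed string, and converting the string with std::strtof is an assumption about how a version such as "535.104.05" might become a float.

```cpp
#include <cstdlib>
#include <string>

// Counts how many times the (stubbed) query runs, so caching can be observed.
static int g_query_count = 0;

// Hypothetical stand-in for the NVML driver version query.
static std::string QueryDriverVersionString() {
  ++g_query_count;
  return "535.104.05";
}

// Assumed conversion: std::strtof stops at the second dot,
// so "535.104.05" parses as approximately 535.104.
float ParseDriverVersion(const std::string &version) {
  return std::strtof(version.c_str(), nullptr);
}

// The static local is initialized on the first call only;
// later calls return the stored value without querying again.
float CachedDriverVersion() {
  static const float version = ParseDriverVersion(QueryDriverVersionString());
  return version;
}
```

Calling CachedDriverVersion repeatedly triggers the stubbed query once. Initialization of a function-local static is thread-safe since C++11, which is one reason this pattern suits values that do not change during the process lifetime.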

The nvmlGetDeviceHandleForCUDA function resolves a CUDA device index to an NVML device handle by converting the CUDA device UUID to a string and performing a UUID-based lookup. It handles MIG (Multi-Instance GPU) instances by retrying with a "MIG-" prefix when the initial lookup with the "GPU-" prefix fails. The internal uuid_str helper formats the 16-byte UUID into the standard dash-separated hexadecimal representation.
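The UUID formatting step can be sketched as follows. FormatUuid is a hypothetical name and signature chosen for illustration; the actual uuid_str helper may differ. The sketch takes unsigned bytes so that values above 0x7f are not sign-extended when printed.

```cpp
#include <cstdio>
#include <string>

// Format 16 raw UUID bytes as "<prefix>xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx".
std::string FormatUuid(const std::string &prefix,
                       const unsigned char (&bytes)[16]) {
  std::string out = prefix;
  char hex[3];  // two hex digits plus the terminating NUL
  for (int i = 0; i < 16; ++i) {
    // Dashes separate the 4-2-2-2-6 byte groups of the standard layout.
    if (i == 4 || i == 6 || i == 8 || i == 10) out += '-';
    std::snprintf(hex, sizeof(hex), "%02x", bytes[i]);
    out += hex;
  }
  return out;
}
```

With this shape, the retry described above amounts to formatting the same bytes twice, once with "GPU-" and once with the MIG prefix, and passing each string to the UUID lookup in turn.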

The CPU affinity functions use the NVML device affinity API to determine which CPU cores have optimal locality to a given GPU. GetNVMLAffinityMask retrieves the NVML-recommended CPU mask for the current CUDA device and intersects it with the thread's existing affinity mask. On CUDA 11+, it uses nvmlDeviceGetCpuAffinityWithinScope with socket scope; otherwise it falls back to nvmlDeviceGetCpuAffinity. SetCPUAffinity either pins the calling thread to a specific core or applies the NVML-recommended affinity, with validation and warning messages for invalid or empty masks.
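The mask intersection can be sketched with the glibc cpu_set_t macros. This is an illustration for Linux only: WordsToCpuSet and IntersectAffinity are hypothetical helper names, and the sketch assumes the recommended mask arrives as an array of unsigned long words, which is the layout the NVML affinity calls fill in. The NVML query itself is omitted.

```cpp
#include <sched.h>   // cpu_set_t and the CPU_* macros (Linux)
#include <climits>
#include <cstddef>

// Convert a bitmask stored as unsigned long words into a cpu_set_t,
// keeping only CPUs with an index below num_cpus.
void WordsToCpuSet(const unsigned long *words, size_t num_words,
                   size_t num_cpus, cpu_set_t *out) {
  const size_t bits = sizeof(unsigned long) * CHAR_BIT;
  CPU_ZERO(out);
  for (size_t cpu = 0; cpu < num_cpus && cpu < num_words * bits; ++cpu) {
    if ((words[cpu / bits] >> (cpu % bits)) & 1UL) CPU_SET(cpu, out);
  }
}

// Keep only the CPUs that are both recommended and already allowed
// for the thread.
void IntersectAffinity(const cpu_set_t *recommended, const cpu_set_t *current,
                       cpu_set_t *out) {
  CPU_AND(out, recommended, current);
}
```

In a real caller, the current mask would typically come from sched_getaffinity or pthread_getaffinity_np. Intersecting matters because a container or a parent process may already have restricted the thread to a subset of cores, and an empty result is the case that calls for a warning rather than an attempt to apply the mask.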

Usage

This implementation is used internally by DALI to optimize CPU-GPU data transfer performance by setting appropriate CPU affinity for worker threads. The driver version queries support compatibility checks and feature gating. The device handle resolution enables all NVML device-specific queries by bridging the CUDA and NVML device identifier spaces.

Code Reference

Source Location

dali/util/nvml.cc, with declarations in dali/util/nvml.h

Signature

namespace dali {
namespace nvml {
namespace impl {

float GetDriverVersion();
int GetCudaDriverVersion();

}  // namespace impl

nvmlDevice_t nvmlGetDeviceHandleForCUDA(int cuda_idx);

void GetNVMLAffinityMask(cpu_set_t *mask, size_t num_cpus);

void SetCPUAffinity(int core = -1);

}  // namespace nvml
}  // namespace dali

Import

#include "dali/util/nvml.h"

I/O Contract

Inputs

Name | Type | Required | Description
cuda_idx | int | Yes (nvmlGetDeviceHandleForCUDA) | CUDA device index to resolve to an NVML handle
mask | cpu_set_t* | Yes (GetNVMLAffinityMask) | Output CPU set to receive the intersection of NVML-recommended and current affinity
num_cpus | size_t | Yes (GetNVMLAffinityMask) | Total number of CPUs configured in the system
core | int | No (SetCPUAffinity) | Specific CPU core to pin to; -1 (default) uses NVML-recommended affinity

Outputs

Name | Type | Description
return value (GetDriverVersion) | float | NVIDIA driver version as a float (0 if NVML not initialized)
return value (GetCudaDriverVersion) | int | CUDA driver version integer (0 if NVML not initialized)
return value (nvmlGetDeviceHandleForCUDA) | nvmlDevice_t | NVML device handle for the specified CUDA device
mask (GetNVMLAffinityMask) | cpu_set_t | Populated CPU affinity bitmask (intersection of NVML recommendation and current thread affinity)

Usage Examples

Getting Driver Version Information

#include "dali/util/nvml.h"

dali::nvml::Init();

float driver_version = dali::nvml::impl::GetDriverVersion();
int cuda_driver_version = dali::nvml::impl::GetCudaDriverVersion();

// driver_version is e.g. 535.104
// cuda_driver_version is e.g. 12020

dali::nvml::Shutdown();
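
For compatibility checks, the integer CUDA driver version can be split into major and minor parts. The sketch below relies on the standard CUDA encoding of 1000 * major + 10 * minor, so 12020 corresponds to 12.2. CudaVersion, DecodeCudaVersion, and DriverAtLeast are hypothetical helpers, not part of the DALI API.

```cpp
// Decoded form of a CUDA version integer such as 12020.
struct CudaVersion {
  int major_version;
  int minor_version;
};

// CUDA encodes versions as 1000 * major + 10 * minor.
CudaVersion DecodeCudaVersion(int encoded) {
  return {encoded / 1000, (encoded % 1000) / 10};
}

// True if the encoded driver version is at least the required major.minor.
bool DriverAtLeast(int encoded, int required_major, int required_minor) {
  return encoded >= required_major * 1000 + required_minor * 10;
}
```

A feature gate could then read DriverAtLeast(cuda_driver_version, 11, 0). Since both query functions return 0 when NVML is not initialized, a caller may want to treat 0 as "unknown" rather than as a very old driver.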

Setting CPU Affinity for GPU Locality

#include "dali/util/nvml.h"

dali::nvml::Init();

// Use NVML-recommended CPU affinity for the current CUDA device
dali::nvml::SetCPUAffinity();

// Alternatively, pin the calling thread to a specific CPU core
// (this replaces the affinity applied by the call above)
dali::nvml::SetCPUAffinity(8);

dali::nvml::Shutdown();
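
The kind of validation a pin-to-core call might perform can be sketched as follows. This is a Linux-only illustration under stated assumptions: PinCallingThreadToCore is a hypothetical function, and using sched_setaffinity with a pid of 0 (which targets the calling thread) is a choice made for this sketch, not a description of how SetCPUAffinity is implemented.

```cpp
#include <sched.h>   // cpu_set_t, CPU_* macros, sched_setaffinity (Linux)
#include <cerrno>
#include <cstdio>
#include <cstring>

// Pin the calling thread to one core. Returns false and prints a warning
// if the core index is out of range or the OS rejects the request.
bool PinCallingThreadToCore(int core, int num_cpus) {
  if (core < 0 || core >= num_cpus || core >= CPU_SETSIZE) {
    std::fprintf(stderr, "Requested core %d is outside [0, %d)\n",
                 core, num_cpus);
    return false;
  }
  cpu_set_t mask;
  CPU_ZERO(&mask);
  CPU_SET(core, &mask);
  // A pid of 0 applies the mask to the calling thread.
  if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
    std::fprintf(stderr, "sched_setaffinity failed: %s\n",
                 std::strerror(errno));
    return false;
  }
  return true;
}
```

Returning a status instead of throwing keeps a failed pin non-fatal, which matches the warning-based handling of invalid or empty masks described in the Description section.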

Resolving CUDA Device to NVML Handle

#include "dali/util/nvml.h"

dali::nvml::Init();

// Get NVML handle for CUDA device 0 (supports MIG instances)
nvmlDevice_t device = dali::nvml::nvmlGetDeviceHandleForCUDA(0);

dali::nvml::Shutdown();
