Implementation:NVIDIA DALI NVML Utilities
| Knowledge Sources | |
|---|---|
| Domains | Utilities, GPU_Management |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Implements NVML-based GPU management functions including driver version queries, CUDA device-to-NVML handle resolution, CPU affinity mask retrieval, and CPU affinity setting.
Description
The NVML utilities implementation in dali/util/nvml.cc provides the concrete implementations for the GPU management functions declared in dali/util/nvml.h. The module is organized into three main functional areas: driver version queries, device handle resolution, and CPU affinity management.
The impl::GetDriverVersion and impl::GetCudaDriverVersion functions query NVML for the NVIDIA driver version (reported as a string and returned as a float) and the CUDA driver version (an integer), respectively. Both functions check whether NVML is initialized before attempting the query and return 0 if it is not. The inline wrappers declared in the header cache each value on first call in a static local variable.
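The caching behavior described above can be illustrated with a function-local static. This is a minimal sketch, not the DALI header: the `sketch` namespace, `QueryDriverVersionUncached`, `QueryCount`, and the hard-coded version value are all hypothetical stand-ins used so the example runs without NVML.

```cpp
namespace sketch {

// Hypothetical call counter, used only to show how often the query runs.
inline int &QueryCount() {
  static int count = 0;
  return count;
}

// Hypothetical stand-in for the uncached impl:: query. It does not talk to
// NVML; it returns a placeholder value and records that it was called.
inline float QueryDriverVersionUncached() {
  ++QueryCount();
  return 535.104f;  // placeholder, not read from a real driver
}

// Wrapper that caches the result in a function-local static. The initializer
// runs once, on the first call; since C++11 that initialization is thread-safe.
inline float GetDriverVersionCached() {
  static const float version = QueryDriverVersionUncached();
  return version;
}

}  // namespace sketch
```

Under this pattern, whatever the first call returns is what every later call sees. If the real wrappers behave the same way, a 0 returned because NVML was not yet initialized would also be retained, so initializing NVML before the first query seems the safer order.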
The nvmlGetDeviceHandleForCUDA function resolves a CUDA device index to an NVML device handle by converting the CUDA device UUID to a string and performing a UUID-based lookup. It handles MIG (Multi-Instance GPU) instances by retrying with a "MIG-" prefix when the initial lookup with the "GPU-" prefix fails. The internal uuid_str helper formats the 16-byte UUID into the standard dash-separated hexadecimal representation.
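The UUID formatting and the prefix retry can be sketched without NVML. In the sketch below, `UuidToString` and `ResolveByUuid` are hypothetical names, the lowercase hex output and the exact "MIG-" spelling are assumptions, and the `lookup` callback stands in for the real UUID-based NVML lookup.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string>

namespace sketch {

// Formats 16 raw bytes as 8-4-4-4-12 hexadecimal groups separated by dashes.
// Lowercase digits are an assumption of this sketch.
inline std::string UuidToString(const std::array<std::uint8_t, 16> &bytes) {
  static const char kHex[] = "0123456789abcdef";
  std::string out;
  out.reserve(36);  // 32 hex digits + 4 dashes
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    // Dashes go before bytes 4, 6, 8 and 10.
    if (i == 4 || i == 6 || i == 8 || i == 10) out.push_back('-');
    out.push_back(kHex[bytes[i] >> 4]);
    out.push_back(kHex[bytes[i] & 0x0F]);
  }
  return out;
}

// Tries the "GPU-" prefix first and falls back to "MIG-".
// `lookup` is a hypothetical callback returning true when the prefixed UUID
// resolves to a device; on success the matching string is stored in *matched.
template <typename Lookup>
bool ResolveByUuid(const std::string &uuid, Lookup &&lookup,
                   std::string *matched) {
  for (const char *prefix : {"GPU-", "MIG-"}) {
    std::string candidate = prefix + uuid;
    if (lookup(candidate)) {
      *matched = candidate;
      return true;
    }
  }
  return false;
}

}  // namespace sketch
```

Passing the lookup as a callback keeps the retry order testable on a machine without a GPU; in the real function the lookup would be an NVML call and the result a device handle rather than a string.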
The CPU affinity functions use the NVML device affinity API to determine which CPU cores have optimal locality to a given GPU. GetNVMLAffinityMask retrieves the NVML-recommended CPU mask for the current CUDA device and intersects it with the thread's existing affinity mask. On CUDA 11+, it uses nvmlDeviceGetCpuAffinityWithinScope with socket scope; otherwise it falls back to nvmlDeviceGetCpuAffinity. SetCPUAffinity either pins the calling thread to a specific core or applies the NVML-recommended affinity, with validation and warning messages for invalid or empty masks.
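The mask handling described above comes down to two steps: turning the NVML bitmask into a `cpu_set_t`, then intersecting it with the mask the thread already has. The following is a minimal Linux/glibc sketch of those steps only. `WordsToCpuSet` and `IntersectMasks` are hypothetical helpers, and the word layout (one bit per CPU, packed into `unsigned long` words) follows the NVML documentation for the CPU affinity query rather than anything confirmed from nvml.cc.

```cpp
#include <sched.h>  // cpu_set_t and CPU_* macros (Linux/glibc specific)

#include <climits>
#include <cstddef>
#include <vector>

namespace sketch {

// Converts a bitmask stored as unsigned long words into a cpu_set_t.
// Bit b of word w represents CPU (w * bits_per_word + b). Only the first
// num_cpus CPUs are considered.
inline void WordsToCpuSet(const std::vector<unsigned long> &words,
                          std::size_t num_cpus, cpu_set_t *out) {
  constexpr std::size_t kBits = sizeof(unsigned long) * CHAR_BIT;
  CPU_ZERO(out);
  const std::size_t limit = static_cast<std::size_t>(CPU_SETSIZE);
  for (std::size_t cpu = 0; cpu < num_cpus && cpu < limit; ++cpu) {
    const std::size_t word = cpu / kBits;
    if (word >= words.size()) break;
    if ((words[word] >> (cpu % kBits)) & 1UL) CPU_SET(cpu, out);
  }
}

// Keeps only the CPUs present in both the recommended mask and the mask the
// thread is currently allowed to run on.
inline void IntersectMasks(const cpu_set_t &recommended,
                           const cpu_set_t &current, cpu_set_t *out) {
  CPU_AND(out, &recommended, &current);
}

}  // namespace sketch
```

In a real caller the `current` mask would presumably come from a call such as `sched_getaffinity` or `pthread_getaffinity_np`. Intersecting rather than overwriting means a thread already restricted by a container or `taskset` is never widened beyond that restriction, and an empty result signals that the recommendation and the restriction do not overlap.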
Usage
This implementation is used internally by DALI to optimize CPU-GPU data transfer performance by setting appropriate CPU affinity for worker threads. The driver version queries support compatibility checks and feature gating. The device handle resolution enables all NVML device-specific queries by bridging the CUDA and NVML device identifier spaces.
Code Reference
Source Location
- Repository: NVIDIA_DALI
- File: dali/util/nvml.cc
- Lines: 1-168
Signature
```cpp
namespace dali {
namespace nvml {
namespace impl {
float GetDriverVersion();
int GetCudaDriverVersion();
}  // namespace impl
nvmlDevice_t nvmlGetDeviceHandleForCUDA(int cuda_idx);
void GetNVMLAffinityMask(cpu_set_t *mask, size_t num_cpus);
void SetCPUAffinity(int core = -1);
}  // namespace nvml
}  // namespace dali
```
Import
```cpp
#include "dali/util/nvml.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| cuda_idx | int | Yes (nvmlGetDeviceHandleForCUDA) | CUDA device index to resolve to an NVML handle |
| mask | cpu_set_t* | Yes (GetNVMLAffinityMask) | Output CPU set to receive the intersection of NVML-recommended and current affinity |
| num_cpus | size_t | Yes (GetNVMLAffinityMask) | Total number of CPUs configured in the system |
| core | int | No (SetCPUAffinity) | Specific CPU core to pin to; -1 (default) uses NVML-recommended affinity |
Outputs
| Name | Type | Description |
|---|---|---|
| return value (GetDriverVersion) | float | NVIDIA driver version as a float (0 if NVML not initialized) |
| return value (GetCudaDriverVersion) | int | CUDA driver version integer (0 if NVML not initialized) |
| return value (nvmlGetDeviceHandleForCUDA) | nvmlDevice_t | NVML device handle for the specified CUDA device |
| mask (GetNVMLAffinityMask) | cpu_set_t | Populated CPU affinity bitmask (intersection of NVML recommendation and current thread affinity) |
Usage Examples
Getting Driver Version Information
```cpp
#include "dali/util/nvml.h"

dali::nvml::Init();
float driver_version = dali::nvml::impl::GetDriverVersion();
int cuda_driver_version = dali::nvml::impl::GetCudaDriverVersion();
// driver_version is e.g. 535.104
// cuda_driver_version is e.g. 12020
dali::nvml::Shutdown();
```
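An integer such as 12020 follows the usual CUDA packing of 1000 * major + 10 * minor, so it decodes to 12.2. The helpers below are a small sketch of how a caller might unpack the value or gate a feature on it. `CudaVersion`, `DecodeCudaVersion`, and `DriverAtLeast` are hypothetical names, not part of dali/util/nvml.h.

```cpp
namespace sketch {

// Decoded form of the packed CUDA driver version integer.
struct CudaVersion {
  int major_version;
  int minor_version;
};

// Splits the packed integer (1000 * major + 10 * minor) into its parts.
inline CudaVersion DecodeCudaVersion(int packed) {
  return CudaVersion{packed / 1000, (packed % 1000) / 10};
}

// Hypothetical feature gate: true when the packed driver version is at least
// the requested major.minor. A packed value of 0 (NVML not initialized)
// fails every gate with a positive requirement.
inline bool DriverAtLeast(int packed, int major_version, int minor_version) {
  return packed >= major_version * 1000 + minor_version * 10;
}

}  // namespace sketch
```

Because the packed form preserves ordering, the gate can compare integers directly and only needs `DecodeCudaVersion` when the result is shown to a user.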
Setting CPU Affinity for GPU Locality
```cpp
#include "dali/util/nvml.h"

dali::nvml::Init();
// Use NVML-recommended CPU affinity for the current CUDA device
dali::nvml::SetCPUAffinity();
// Or pin to a specific CPU core
dali::nvml::SetCPUAffinity(8);
dali::nvml::Shutdown();
```
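For readers curious what pinning to an explicit core involves at the OS level, the following Linux/glibc sketch validates a core index, warns on a bad value, and applies a single-CPU mask to the calling thread. It covers only the explicit-core case, not the NVML-recommended path used for -1. `PinCallingThread` and `FirstAllowedCpu` are hypothetical helpers, and `sched_setaffinity` is used here for simplicity; this page does not confirm which system call nvml.cc itself uses.

```cpp
#include <sched.h>  // cpu_set_t, CPU_* macros, sched_setaffinity (Linux/glibc)

#include <cstdio>

namespace sketch {

// Pins the calling thread to `core`. Returns false, after printing a warning,
// when the index is out of range or the kernel rejects the mask.
inline bool PinCallingThread(int core, int num_cpus) {
  if (core < 0 || core >= num_cpus || core >= CPU_SETSIZE) {
    std::fprintf(stderr, "warning: core %d is outside [0, %d)\n", core,
                 num_cpus);
    return false;
  }
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(core, &set);
  // A pid of 0 addresses the calling thread.
  if (sched_setaffinity(0, sizeof(set), &set) != 0) {
    std::perror("sched_setaffinity");
    return false;
  }
  return true;
}

// Returns the lowest CPU index the calling thread may run on, or -1 on error.
inline int FirstAllowedCpu() {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &set)) return cpu;
  }
  return -1;
}

}  // namespace sketch
```

Checking the index before building the mask keeps an invalid request from silently producing an empty set, which the kernel would reject anyway with a less helpful error.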
Resolving CUDA Device to NVML Handle
```cpp
#include "dali/util/nvml.h"

dali::nvml::Init();
// Get NVML handle for CUDA device 0 (supports MIG instances)
nvmlDevice_t device = dali::nvml::nvmlGetDeviceHandleForCUDA(0);
dali::nvml::Shutdown();
```