
Environment: Recommenders GPU CUDA Environment (Recommenders team)

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Deep_Learning, GPU
Last Updated: 2026-02-10 00:00 GMT

Overview

NVIDIA GPU environment with CUDA support, TensorFlow 2.8-2.15, PyTorch 2.0+, and nvidia-ml-py for GPU-accelerated recommendation models.

Description

This environment extends the core Python dependencies with GPU-specific packages for deep learning models. It supports two frameworks: TensorFlow (used by NCF, NRMS, DeepRec, Wide&Deep models) and PyTorch (used by SASRec, SSE-PT, EmbeddingDotBias models). TensorFlow models use the `tf.compat.v1` graph execution mode with `GPUOptions(allow_growth=True)` for dynamic GPU memory allocation. GPU detection is provided via `numba.cuda` and `torch.cuda` with fallback logic.

Usage

Use this environment for any deep learning workflow, including NCF training/prediction, NRMS news recommendation, DeepRec sequential models, and the GPU-accelerated benchmarking paths. It is required when the `recommenders[gpu]` extra is installed.

System Requirements

Category | Requirement | Notes
OS | Linux (Ubuntu 24.04 in Docker) | GPU Docker image uses `nvidia/cuda:12.6.1-devel-ubuntu24.04`
Hardware | NVIDIA GPU with CUDA support | Tested on Azure STANDARD_NC6S_V2 (Tesla P100) and GeForce GTX 1660 Ti
VRAM | >= 6 GB | Benchmark reference machine uses 6 GB (GTX 1660 Ti)
RAM | >= 30 GB | Benchmark reference uses 30 GB
CPUs | >= 4 | Benchmark reference uses 4 CPUs
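A quick way to check a machine against these requirements is to query `nvidia-smi`, which ships with the NVIDIA driver. This is a minimal sketch using only the standard library; an empty result simply means no driver (and therefore no usable GPU) is present:

```python
import subprocess

def list_gpus():
    """Return [(name, total_vram_mib), ...] via nvidia-smi, or [] without a driver."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []  # nvidia-smi missing or failed: no NVIDIA driver on this machine
    gpus = []
    for line in out.strip().splitlines():
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem)))
    return gpus

if __name__ == "__main__":
    gpus = list_gpus()
    for name, mem in gpus:
        status = "OK" if mem >= 6 * 1024 else "below 6 GB minimum"
        print(f"{name}: {mem} MiB ({status})")
    if not gpus:
        print("No NVIDIA GPU detected")
```

On the benchmark reference machine this would report the GTX 1660 Ti as meeting the 6 GB VRAM floor.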

Dependencies

GPU Python Packages

  • `nvidia-ml-py` >= 11.525.84
  • `tensorflow` >= 2.8.4, != 2.9.0.*, != 2.9.1, != 2.9.2, != 2.10.0.*, < 2.16 (pinned due to security and breaking changes, issue #2073)
  • `tf-slim` >= 1.1.0
  • `torch` >= 2.0.1, < 3
  • `numpy` < 1.25.0 (Python <= 3.8 only, additional GPU constraint)
  • `spacy` <= 3.7.5 (Python <= 3.8 only)
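The TensorFlow specifier above is intricate. As an illustration only, its logic can be re-implemented as a small pure-Python check for plain numeric versions (this is a sketch, not how pip evaluates specifiers; pip uses the `packaging` library):

```python
def tf_version_ok(version: str) -> bool:
    """Check a numeric TF version against >=2.8.4, !=2.9.0.*, !=2.9.1,
    !=2.9.2, !=2.10.0.*, <2.16 (pre-release suffixes are not handled)."""
    parts = tuple(int(p) for p in version.split("."))
    padded = (parts + (0, 0, 0))[:3]  # normalize to (major, minor, patch)
    # The != exclusions all reduce to these (major, minor, patch) prefixes.
    excluded = {(2, 9, 0), (2, 9, 1), (2, 9, 2), (2, 10, 0)}
    return (2, 8, 4) <= padded < (2, 16, 0) and padded not in excluded
```

For example, `tf_version_ok("2.15.1")` is `True`, while `tf_version_ok("2.9.1")` and `tf_version_ok("2.16.0")` are both `False`.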

System Packages (Docker)

  • CUDA Toolkit (12.6.1 in Docker, or system-installed)
  • cuDNN (version detected via header files or `torch.backends.cudnn.version()`)

Credentials

No GPU-specific credentials required. Standard NVIDIA driver installation is sufficient.

Quick Install

# Install with GPU extras
pip install "recommenders[gpu]"

# Or install all extras
pip install "recommenders[all]"

Code Evidence

GPU count detection with torch/numba fallback from `recommenders/utils/gpu_utils.py:18-34`:

def get_number_gpus():
    try:
        import torch
        return torch.cuda.device_count()
    except (ImportError, ModuleNotFoundError):
        pass
    try:
        import numba
        return len(numba.cuda.gpus)
    except Exception:
        return 0

CUDA version detection with platform-specific fallback from `recommenders/utils/gpu_utils.py:71-100`:

def get_cuda_version():
    try:
        import torch
        return torch.version.cuda
    except (ImportError, ModuleNotFoundError):
        if sys.platform == "win32":
            candidate = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v*\\version.txt"
            path_list = glob.glob(candidate)
            if path_list:
                path = path_list[0]
        elif sys.platform == "linux" or sys.platform == "darwin":
            path = "/usr/local/cuda/version.txt"
        # ... (excerpt truncated: the remainder reads the version string from `path`)

TensorFlow GPU memory growth from `recommenders/models/newsrec/models/base_model.py:61-65`:

# set GPU use with on demand growth
gpu_options = tf.compat.v1.GPUOptions(allow_growth=True)
sess = tf.compat.v1.Session(
    config=tf.compat.v1.ConfigProto(gpu_options=gpu_options)
)
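For code running in eager mode rather than `tf.compat.v1` graphs, the same on-demand growth behavior can be requested through the TF2 API. This is a sketch (not taken from the repository), guarded so it degrades gracefully when TensorFlow is not installed:

```python
def enable_memory_growth():
    """Enable on-demand GPU memory growth; return the GPU count, or None if TF is absent."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow not installed on this machine
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before the runtime initializes any GPU.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```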

TensorFlow version pinning from `setup.py:58`:

"tensorflow>=2.8.4,!=2.9.0.*,!=2.9.1,!=2.9.2,!=2.10.0.*,<2.16",
# Fixed TF due to constant security problems and breaking changes #2073

GPU validation test from `tests/unit/examples/test_notebooks_gpu.py:14-15`:

def test_gpu_vm():
    assert get_number_gpus() >= 1

Common Errors

Error Message | Cause | Solution
`CudaSupportError` | No NVIDIA GPU or drivers not installed | Install NVIDIA drivers and CUDA toolkit
TF models fail with TF > 2.10.1 | xDeepFM and SUM models break on newer TF (issue #2018) | Use TF < 2.16 as pinned in setup.py
GPU notebook tests disabled (issue #1883) | Multiple GPU notebook tests known to fail | Check issue tracker for resolution status
`No CUDA available` (logger.info) | GPU `clear_memory` called without CUDA | Expected on CPU-only machines; no action needed

Compatibility Notes

  • TensorFlow: All TF-based models use `tf.compat.v1` graph execution mode (eager execution disabled). TF versions 2.9.0-2.9.2 and 2.10.0.x are explicitly excluded due to bugs.
  • PyTorch: Models auto-detect GPU via `torch.cuda.is_available()` and fall back to CPU.
  • Docker: GPU Docker image based on `nvidia/cuda:12.6.1-devel-ubuntu24.04`. CPU image uses `buildpack-deps:24.04`.
  • Azure CI: GPU tests run on Azure STANDARD_NC6S_V2 (6 vCPUs, 112 GB RAM, 1 NVIDIA Tesla P100).
  • Benchmark reference: 4 CPUs, 30 GB RAM, 1 GeForce GTX 1660 Ti (6 GB VRAM).
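The PyTorch CPU fallback described above typically amounts to a device-selection idiom like the following (a minimal sketch, guarded so it also works where `torch` is not installed):

```python
def pick_device() -> str:
    """Return "cuda" when a usable GPU is present, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch at all: only CPU code paths apply
    return "cuda" if torch.cuda.is_available() else "cpu"
```

A model and its tensors would then be moved with, e.g., `model.to(pick_device())`.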
