Environment:Recommenders team Recommenders GPU CUDA Environment
Knowledge Sources
| Field | Value |
|---|---|
| Domains | Infrastructure, Deep_Learning, GPU |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
NVIDIA GPU environment with CUDA support, TensorFlow 2.8-2.15, PyTorch 2.0+, and nvidia-ml-py for GPU-accelerated recommendation models.
Description
This environment extends the core Python dependencies with GPU-specific packages for deep learning models. It supports two frameworks: TensorFlow (used by NCF, NRMS, DeepRec, Wide&Deep models) and PyTorch (used by SASRec, SSE-PT, EmbeddingDotBias models). TensorFlow models use the `tf.compat.v1` graph execution mode with `GPUOptions(allow_growth=True)` for dynamic GPU memory allocation. GPU detection is provided via `numba.cuda` and `torch.cuda` with fallback logic.
Usage
Use this environment for any deep learning workflow including NCF training/prediction, NRMS news recommendation, DeepRec sequential models, and the GPU-accelerated benchmarking paths. Required when `recommenders[gpu]` extra is installed.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu 24.04 in Docker) | GPU Docker image uses `nvidia/cuda:12.6.1-devel-ubuntu24.04` |
| Hardware | NVIDIA GPU with CUDA support | Tested on Azure STANDARD_NC6S_V2 (Tesla P100) and GeForce GTX 1660 Ti |
| VRAM | >= 6 GB | Benchmark reference machine uses 6 GB (GTX 1660 Ti) |
| RAM | >= 30 GB | Benchmark reference uses 30 GB |
| CPUs | >= 4 | Benchmark reference uses 4 CPUs |
Dependencies
GPU Python Packages
- `nvidia-ml-py` >= 11.525.84
- `tensorflow` >= 2.8.4, != 2.9.0.*, != 2.9.1, != 2.9.2, != 2.10.0.*, < 2.16 (pinned due to security and breaking changes, issue #2073)
- `tf-slim` >= 1.1.0
- `torch` >= 2.0.1, < 3
- `numpy` < 1.25.0 (Python <= 3.8 only, additional GPU constraint)
- `spacy` <= 3.7.5 (Python <= 3.8 only)
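To check which of these packages are actually present in an environment, a generic stdlib sketch (not a tool the repo ships; the function name is illustrative) can query installed versions:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# e.g. installed_version("tensorflow") yields a version string such as
# "2.15.1" when TF is installed, and None on a CPU-only or bare environment.
```

Comparing the returned strings against the pins above (e.g. `tensorflow` in `>=2.8.4,<2.16`) is left to tools like `pip check` or `packaging.specifiers`.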
System Packages (Docker)
- CUDA Toolkit (12.6.1 in Docker, or system-installed)
- cuDNN (version detected via header files or `torch.backends.cudnn.version()`)
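The `torch.backends.cudnn.version()` lookup mentioned above can be sketched as a guarded query (an illustrative helper mirroring, not reproducing, the repo's detection logic; it degrades to `None` when PyTorch or CUDA is unavailable):

```python
def detect_cudnn_version():
    """Best-effort cuDNN version lookup via PyTorch; None when unavailable."""
    try:
        import torch
    except ImportError:
        return None  # PyTorch not installed
    if torch.cuda.is_available():
        # Returns an integer such as 8902 for cuDNN 8.9.2
        return torch.backends.cudnn.version()
    return None
```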
Credentials
No GPU-specific credentials required. Standard NVIDIA driver installation is sufficient.
Quick Install
```bash
# Install with GPU extras
pip install "recommenders[gpu]"

# Or install all extras
pip install "recommenders[all]"
```
Code Evidence
GPU count detection with torch/numba fallback from `recommenders/utils/gpu_utils.py:18-34`:
```python
def get_number_gpus():
    try:
        import torch

        return torch.cuda.device_count()
    except (ImportError, ModuleNotFoundError):
        pass
    try:
        import numba

        return len(numba.cuda.gpus)
    except Exception:
        return 0
```
CUDA version detection with platform-specific fallback from `recommenders/utils/gpu_utils.py:71-100`:
```python
def get_cuda_version():
    try:
        import torch

        return torch.version.cuda
    except (ImportError, ModuleNotFoundError):
        if sys.platform == "win32":
            candidate = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v*\\version.txt"
            path_list = glob.glob(candidate)
            if path_list:
                path = path_list[0]
        elif sys.platform == "linux" or sys.platform == "darwin":
            path = "/usr/local/cuda/version.txt"
```
TensorFlow GPU memory growth from `recommenders/models/newsrec/models/base_model.py:61-65`:
```python
# set GPU use with on demand growth
gpu_options = tf.compat.v1.GPUOptions(allow_growth=True)
sess = tf.compat.v1.Session(
    config=tf.compat.v1.ConfigProto(gpu_options=gpu_options)
)
```
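For reference, TF2's eager-mode API offers an equivalent on-demand growth setting, `tf.config.experimental.set_memory_growth`. This is a hedged alternative sketch, not what the repo's `tf.compat.v1` graph-mode code uses; the helper name is illustrative:

```python
def enable_memory_growth():
    """Sketch: request on-demand GPU memory growth via the TF2 config API.

    Returns True when at least one GPU was configured, False when TF is
    absent or no GPU is visible.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return False  # TensorFlow not installed; nothing to configure
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must be called before any GPU memory is allocated in the process
        tf.config.experimental.set_memory_growth(gpu, True)
    return bool(gpus)
```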
TensorFlow version pinning from `setup.py:58`:
"tensorflow>=2.8.4,!=2.9.0.*,!=2.9.1,!=2.9.2,!=2.10.0.*,<2.16",
# Fixed TF due to constant security problems and breaking changes #2073
GPU validation test from `tests/unit/examples/test_notebooks_gpu.py:14-15`:
```python
def test_gpu_vm():
    assert get_number_gpus() >= 1
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `CudaSupportError` | No NVIDIA GPU or drivers not installed | Install NVIDIA drivers and CUDA toolkit |
| TF models fail with TF > 2.10.1 | xDeepFM and SUM models break on newer TF (issue #2018) | Use TF < 2.16 as pinned in setup.py |
| GPU notebook tests disabled (issue #1883) | Multiple GPU notebook tests known to fail | Check issue tracker for resolution status |
| `No CUDA available` (logger.info) | GPU clear_memory called without CUDA | Expected on CPU-only machines; no action needed |
Compatibility Notes
- TensorFlow: All TF-based models use `tf.compat.v1` graph execution mode (eager execution disabled). TF versions 2.9.0-2.9.2 and 2.10.0.x are explicitly excluded due to bugs.
- PyTorch: Models auto-detect GPU via `torch.cuda.is_available()` and fall back to CPU.
- Docker: GPU Docker image based on `nvidia/cuda:12.6.1-devel-ubuntu24.04`. CPU image uses `buildpack-deps:24.04`.
- Azure CI: GPU tests run on Azure STANDARD_NC6S_V2 (6 vCPUs, 112 GB RAM, 1 NVIDIA Tesla P100).
- Benchmark reference: 4 CPUs, 30 GB RAM, 1 GeForce GTX 1660 Ti (6 GB VRAM).
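The PyTorch auto-detection note above can be sketched as a small device-resolution helper (illustrative only, not the repo's exact code; it also tolerates a missing `torch` install by behaving like a CPU-only machine):

```python
def resolve_device():
    """Pick "cuda" when a GPU is usable, otherwise fall back to "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch absent: behave like a CPU-only machine
    return "cuda" if torch.cuda.is_available() else "cpu"
```

A model would then be moved with something like `model.to(resolve_device())`, so the same code path runs on both GPU and CPU hosts.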
Related Pages
- Implementation:Recommenders_team_Recommenders_NCF_Init_And_Fit
- Implementation:Recommenders_team_Recommenders_NCF_Predict
- Implementation:Recommenders_team_Recommenders_NRMSModel_Init
- Implementation:Recommenders_team_Recommenders_BaseModel_Fit
- Implementation:Recommenders_team_Recommenders_BaseModel_Run_Eval
- Implementation:Recommenders_team_Recommenders_BaseModel_Run_Fast_Eval