
Environment: Pyro CUDA GPU Acceleration

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, GPU_Computing
Last Updated: 2026-02-09 09:00 GMT

Overview

Optional CUDA GPU environment for accelerating Pyro inference, particularly MCMC sampling and large-scale SVI with neural network guides.

Description

This environment extends the core Python/PyTorch environment with NVIDIA CUDA support. While Pyro runs on CPU by default, GPU acceleration significantly improves performance for MCMC (HMC/NUTS) inference and neural network-based guides (VAEs, amortized inference). The Docker configuration supports both CPU and CUDA builds via configurable arguments.

Usage

Use this environment when running MCMC inference on large models, VAE training with neural network encoders/decoders, or any workflow where PyTorch GPU acceleration provides a speedup. Tests can be directed to run on GPU via the `PYRO_DEVICE` environment variable.

System Requirements

Category | Requirement | Notes
OS | Linux (Ubuntu 24.04 in Docker) | Docker image uses `ubuntu:24.04` as base
Hardware | NVIDIA GPU | Any CUDA-capable GPU
Driver | NVIDIA driver | Must be compatible with the installed CUDA toolkit
Software | CUDA toolkit | Via PyTorch CUDA wheel (e.g., cu118, cu121)

Dependencies

System Packages

  • NVIDIA GPU driver (host system)
  • CUDA toolkit (bundled with PyTorch CUDA wheels)
  • `magma-cuda` (optional, for Docker source builds)

Python Packages

  • `torch` >= 2.0 (CUDA variant)
  • `torchvision` >= 0.15.0 (CUDA variant, optional)
  • `torchaudio` (CUDA variant, optional)

Credentials

No credentials required for GPU usage.

Quick Install

# Install PyTorch with CUDA 11.8 support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install Pyro
pip install pyro-ppl

# Verify GPU is available
python -c "import torch; print(torch.cuda.is_available())"

Code Evidence

Test configuration environment variable for device selection from `tests/conftest.py:12-14`:

DTYPE = getattr(torch, os.environ.get("PYRO_DTYPE", "float64"))
torch.set_default_dtype(DTYPE)
torch.set_default_device(os.environ.get("PYRO_DEVICE", "cpu"))

Docker CUDA build support from `docker/Dockerfile:5-12`:

ARG base_img=ubuntu:24.04
FROM ${base_img}

# Optional args
ARG python_version=3
ARG pyro_branch=release
ARG pytorch_whl=cpu
ARG pytorch_branch=release

PyTorch CUDA installation in the Docker install script, from `docker/install.sh:16`:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/${pytorch_whl}

CUDA tensor cloning for multiprocessing from `pyro/infer/mcmc/api.py:560`:

# XXX we clone CUDA tensor args to resolve the issue "Invalid device pointer"
args = [arg.detach() if torch.is_tensor(arg) else arg for arg in args]

Common Errors

Error Message | Cause | Solution
`RuntimeError: CUDA error: no kernel image is available` | GPU compute capability mismatch | Install a PyTorch build that targets your GPU architecture
`RuntimeError: CUDA out of memory` | Insufficient GPU VRAM | Reduce model or batch size, or run that step on CPU
`Invalid device pointer` during MCMC multiprocessing | CUDA tensor pointers invalidated across processes | Pyro detaches tensors internally; make sure you are on a recent Pyro version

Compatibility Notes

  • CPU default: Pyro defaults to CPU (`PYRO_DEVICE=cpu`); set `PYRO_DEVICE=cuda` for GPU testing
  • MCMC multiprocessing: Multi-chain MCMC on GPU requires tensor detaching/cloning to avoid pointer invalidation across processes
  • Docker: The official Docker image supports both CPU and CUDA via the `pytorch_whl` build argument
  • PYRO_DTYPE: Default test dtype is `float64`; set `PYRO_DTYPE=float32` for faster GPU computation at reduced precision
