Environment: Pyro-ppl Distributed Training
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Distributed_Computing |
| Last Updated | 2026-02-09 09:00 GMT |
Overview
Optional distributed training environment using Horovod or PyTorch Lightning for scaling Pyro SVI across multiple GPUs or nodes.
Description
This environment extends the core Pyro setup with distributed training capabilities. Pyro supports two distributed backends: Horovod, via `pyro.optim.HorovodOptimizer`, which wraps any `PyroOptim` and all-reduces gradients across workers, and PyTorch Lightning, where the ELBO loss is trained like any other loss inside a `LightningModule` (as in `examples/svi_lightning.py`). Both are optional extras that enable multi-GPU and multi-node training for SVI workflows.
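A rough sketch of the Horovod path, assuming `horovod[pytorch]` and a working MPI install; the model and training loop here are illustrative placeholders, not part of the environment itself:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam, HorovodOptimizer

import horovod.torch as hvd

hvd.init()  # one process per GPU, launched via horovodrun/mpirun
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())

def model(data):
    loc = pyro.sample("loc", dist.Normal(0.0, 10.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(loc, 1.0), obs=data)

guide = AutoNormal(model)

# HorovodOptimizer wraps a PyroOptim so gradient updates are
# synchronized across all Horovod ranks.
optim = HorovodOptimizer(Adam({"lr": 1e-3}))
svi = SVI(model, guide, optim, loss=Trace_ELBO())

data = torch.randn(1000)  # each rank would normally load its own data shard
for step in range(100):
    loss = svi.step(data)
    if hvd.rank() == 0 and step % 20 == 0:
        print(f"step {step} loss {loss:.2f}")
```

Pyro's `examples/svi_horovod.py` shows the full pattern, including sharding data across ranks with a distributed sampler.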
Usage
Use this environment when scaling SVI training across multiple GPUs or compute nodes. This is typically needed for large-scale variational inference where a single GPU is insufficient.
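A typical launch for each backend, assuming a training script named `train.py` (hypothetical) and four GPUs on one host:

```shell
# Horovod: the horovodrun launcher starts one process per GPU
horovodrun -np 4 python train.py

# Lightning: launch normally; the Trainer spawns its own workers,
# e.g. Trainer(accelerator="gpu", devices=4, strategy="ddp")
python train.py
```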
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | Horovod requires Linux; Lightning supports Linux/macOS |
| Hardware | Multiple GPUs or nodes | Single GPU does not require distributed training |
| Network | High-bandwidth interconnect | For multi-node training (e.g., InfiniBand, NVLink) |
| MPI | OpenMPI or similar | Required by Horovod for inter-process communication |
Dependencies
System Packages
- MPI implementation (e.g., OpenMPI) for Horovod
- NCCL for GPU-to-GPU communication
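Quick checks that the system-level pieces are visible before building Horovod, assuming OpenMPI and a CUDA build of PyTorch:

```shell
# MPI implementation and version
mpirun --version

# NCCL version bundled with the installed PyTorch build
python -c "import torch; print(torch.cuda.nccl.version())"
```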
Python Packages
For Horovod:
- `horovod[pytorch]` >= 0.19
For Lightning:
- `lightning` (no version constraint)
Credentials
No credentials required for distributed training itself. Cluster job schedulers may require separate authentication.
Quick Install
# Option 1: Horovod backend
pip install pyro-ppl[horovod]
# Option 2: PyTorch Lightning backend
pip install pyro-ppl[lightning]
Code Evidence
Horovod extras definition from `setup.py:141`:
"horovod": ["horovod[pytorch]>=0.19"],
Lightning extras definition from `setup.py:142`:
"lightning": ["lightning"],
Horovod lazy import from `pyro/optim/horovod.py:36-37`:
def optim_constructor(params, **pt_kwargs) -> Optimizer:
import horovod.torch as hvd # type: ignore
Lightning import in example from `examples/svi_lightning.py:18`:
import lightning.pytorch as pl
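A condensed sketch of that example's pattern: calling an ELBO object as `Trace_ELBO()(model, guide)` yields a `torch.nn.Module` whose parameters Lightning can optimize like any other model. The class name and hyperparameters below are illustrative, following `examples/svi_lightning.py` rather than a fixed Pyro API:

```python
import lightning.pytorch as pl
import pyro
import torch

class PyroLightningModule(pl.LightningModule):
    def __init__(self, loss_fn, lr=1e-3):
        super().__init__()
        self.loss_fn = loss_fn  # an ELBO module: pyro.infer.Trace_ELBO()(model, guide)
        self.lr = lr

    def training_step(self, batch, batch_idx):
        loss = self.loss_fn(batch)  # the ELBO behaves like any training loss
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.loss_fn.parameters(), lr=self.lr)

# Usage sketch: warm up once so Pyro parameters exist, then hand off to Lightning.
# loss_fn = pyro.infer.Trace_ELBO()(model, guide)
# loss_fn(mini_batch)  # initializes parameters before the Trainer starts
# pl.Trainer(accelerator="gpu", devices=4, strategy="ddp").fit(
#     PyroLightningModule(loss_fn), train_dataloaders=...)
```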
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'horovod'` | Horovod not installed | `pip install pyro-ppl[horovod]`; may need MPI installed first |
| `ModuleNotFoundError: No module named 'lightning'` | Lightning not installed | `pip install pyro-ppl[lightning]` |
| Horovod compilation errors | Missing MPI or NCCL | Install OpenMPI and NCCL before installing Horovod |
Compatibility Notes
- Horovod: Requires MPI and NCCL. Installation can be complex; see Horovod documentation for platform-specific instructions
- Lightning: Simpler installation; works with standard PyTorch distributed backend
- SVI only: Distributed training applies to SVI workflows; MCMC uses a different parallelism strategy (multi-chain via Python multiprocessing)