
Environment: Pyro (pyro-ppl) Distributed Training

From Leeroopedia


Knowledge Sources

  • Domains: Infrastructure, Distributed_Computing
  • Last Updated: 2026-02-09 09:00 GMT

Overview

Optional distributed training environment using Horovod or PyTorch Lightning for scaling Pyro SVI across multiple GPUs or nodes.

Description

This environment extends the core Pyro setup with distributed training capabilities. Pyro supports two distributed backends: Horovod (via `HorovodOptimizer` wrapping `PyroOptim`) and PyTorch Lightning (via standard Lightning integration). These are optional extras that enable multi-GPU and multi-node training for SVI workflows.
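As a rough sketch of the Horovod path (the function name, learning rate, and the model/guide arguments here are this page's own placeholders, not Pyro source), an ordinary `PyroOptim` is wrapped in `HorovodOptimizer` and handed to `SVI` as usual:

```python
def build_distributed_svi(model, guide, lr=0.01):
    """Sketch: wrap a Pyro optimizer for Horovod data parallelism.

    Imports are deferred, mirroring Pyro's own lazy import of
    horovod.torch, so this function is importable without Horovod.
    """
    import horovod.torch as hvd  # optional dependency
    from pyro.infer import SVI, Trace_ELBO
    from pyro.optim import Adam, HorovodOptimizer

    hvd.init()  # one process per GPU; establishes rank and world size
    # HorovodOptimizer wraps any PyroOptim; gradients are averaged
    # across workers after each backward pass.
    optimizer = HorovodOptimizer(Adam({"lr": lr}))
    return SVI(model, guide, optimizer, loss=Trace_ELBO())
```

Each process runs this same code; Horovod handles the gradient exchange, so the training loop body is unchanged from single-GPU SVI.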

Usage

Use this environment when scaling SVI training across multiple GPUs or compute nodes. This is typically needed for large-scale variational inference where a single GPU is insufficient.
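For the Lightning route, the pattern (loosely mirroring `examples/svi_lightning.py`; the class and argument names below are this sketch's own, and it assumes model and guide are `nn.Module`/`PyroModule` instances so Lightning can find their parameters) is to compute the ELBO inside `training_step` and let the `Trainer` handle devices and DDP:

```python
def build_lightning_module(model, guide, lr=0.01):
    """Sketch: drive Pyro SVI from a PyTorch Lightning module."""
    import lightning.pytorch as pl
    import torch
    from pyro.infer import Trace_ELBO

    class SVILightningModule(pl.LightningModule):
        def __init__(self):
            super().__init__()
            self.model = model
            self.guide = guide
            # differentiable_loss returns a tensor Lightning can backprop
            self.loss_fn = Trace_ELBO().differentiable_loss

        def training_step(self, batch, batch_idx):
            return self.loss_fn(self.model, self.guide, *batch)

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=lr)

    return SVILightningModule()
```

Scaling out is then a `Trainer` configuration concern (e.g. device count and strategy) rather than a change to the inference code.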

System Requirements

Category | Requirement | Notes
OS | Linux | Horovod requires Linux; Lightning supports Linux and macOS
Hardware | Multiple GPUs or nodes | A single GPU does not require distributed training
Network | High-bandwidth interconnect | Needed for multi-node training (e.g., InfiniBand, NVLink)
MPI | OpenMPI or similar | Required by Horovod for inter-process communication

Dependencies

System Packages

  • MPI implementation (e.g., OpenMPI) for Horovod
  • NCCL for GPU-to-GPU communication

Python Packages

For Horovod:

  • `horovod[pytorch]` >= 0.19

For Lightning:

  • `lightning` (no version constraint)
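Since both backends are optional extras, code that must run in either environment can probe for them with the standard library before importing anything heavy. A minimal sketch (helper names are this page's own; the horovod-first preference is an arbitrary assumption):

```python
import importlib.util


def available_backends():
    """Report which optional distributed backends are importable,
    without actually importing them."""
    return {
        "horovod": importlib.util.find_spec("horovod") is not None,
        "lightning": importlib.util.find_spec("lightning") is not None,
    }


def choose_backend(backends=None):
    """Pick an installed backend, preferring Horovod; None if neither
    is available (single-process SVI still works in that case)."""
    if backends is None:
        backends = available_backends()
    if backends.get("horovod"):
        return "horovod"
    if backends.get("lightning"):
        return "lightning"
    return None
```

This mirrors Pyro's own approach of keeping the backends as lazy, optional imports rather than hard dependencies.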

Credentials

No credentials required for distributed training itself. Cluster job schedulers may require separate authentication.

Quick Install

# Option 1: Horovod backend (quotes keep zsh from globbing the brackets)
pip install "pyro-ppl[horovod]"

# Option 2: PyTorch Lightning backend
pip install "pyro-ppl[lightning]"
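Installation alone does not start distributed training; each backend has its own launcher. A sketch of typical launch commands (the script names are placeholders, and exact flags vary by Lightning version and cluster setup):

```shell
# Horovod: one process per GPU, here 4 processes on one host
horovodrun -np 4 python svi_horovod.py

# Lightning: the Trainer spawns workers itself; pass device/strategy
# options through your script's CLI (hypothetical argument names)
python svi_lightning.py --accelerator gpu --devices 4 --strategy ddp
```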

Code Evidence

Horovod extras definition from `setup.py:141`:

"horovod": ["horovod[pytorch]>=0.19"],

Lightning extras definition from `setup.py:142`:

"lightning": ["lightning"],

Horovod lazy import from `pyro/optim/horovod.py:36-37`:

def optim_constructor(params, **pt_kwargs) -> Optimizer:
    import horovod.torch as hvd  # type: ignore

Lightning import in example from `examples/svi_lightning.py:18`:

import lightning.pytorch as pl

Common Errors

Error Message | Cause | Solution
`ModuleNotFoundError: No module named 'horovod'` | Horovod not installed | `pip install "pyro-ppl[horovod]"`; MPI may need to be installed first
`ModuleNotFoundError: No module named 'lightning'` | Lightning not installed | `pip install "pyro-ppl[lightning]"`
Horovod compilation errors during install | Missing MPI or NCCL | Install OpenMPI and NCCL before installing Horovod

Compatibility Notes

  • Horovod: Requires MPI and NCCL. Installation can be complex; see Horovod documentation for platform-specific instructions
  • Lightning: Simpler installation; works with standard PyTorch distributed backend
  • SVI only: Distributed training applies to SVI workflows; MCMC uses a different parallelism strategy (multi-chain via Python multiprocessing)
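For contrast with the SVI backends above, the MCMC multi-chain strategy needs neither Horovod nor Lightning; a minimal sketch (toy hyperparameter values assumed, import kept lazy as elsewhere on this page):

```python
def run_parallel_chains(model, num_chains=4):
    """Sketch: MCMC parallelism runs one Python process per chain
    via multiprocessing, independent of Horovod/Lightning."""
    from pyro.infer import MCMC, NUTS

    kernel = NUTS(model)
    # num_chains > 1 spawns worker processes, one chain each
    return MCMC(kernel, num_samples=500, warmup_steps=200,
                num_chains=num_chains)
```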
