Workflow:Pyro ppl Pyro MCMC Inference

From Leeroopedia


Knowledge Sources
Domains Probabilistic_Programming, MCMC, Bayesian_Modeling
Last Updated 2026-02-09 09:00 GMT

Overview

End-to-end process for performing Markov Chain Monte Carlo (MCMC) inference in Pyro using the No-U-Turn Sampler (NUTS) or Hamiltonian Monte Carlo (HMC) to draw samples from the posterior distribution.

Description

This workflow describes posterior sampling with Pyro's MCMC framework. Unlike variational inference, which produces a parametric approximation, MCMC generates samples that are asymptotically exact draws from the target posterior. The process covers defining a probabilistic model, conditioning on observed data, selecting and configuring a sampling kernel (NUTS or HMC), running the sampler with warmup adaptation, computing diagnostics (effective sample size, R-hat), and generating posterior predictive samples. NUTS chooses the trajectory length automatically and, like HMC, adapts its step size during warmup, making it the default choice for most continuous models.

Usage

Execute this workflow when you need high-fidelity posterior samples and can afford the computational cost. MCMC is preferred over SVI when posterior accuracy is critical, when the posterior is multimodal or has complex geometry, when you need reliable uncertainty quantification, or when the dataset is small enough that MCMC runtime is acceptable. Typical use cases include hierarchical models, model comparison, and scientific inference where calibrated posteriors are essential.

Execution Steps

Step 1: Define the Probabilistic Model

Write a model function using Pyro's probabilistic primitives that specifies prior distributions over parameters and the likelihood of observed data. Use pyro.sample for random variables, pyro.plate for conditional independence, and pyro.deterministic for derived quantities you want to track in the posterior trace. The model should be a pure function of its arguments (data and hyperparameters).

Key considerations:

  • Every latent parameter to be inferred needs a pyro.sample statement
  • Use pyro.deterministic for derived quantities you want to track
  • MCMC works with continuous latent variables (discrete variables require marginalization or enumeration)
  • The model should not call pyro.param; that primitive is for learnable parameters such as those of a variational guide

Step 2: Condition on Observed Data

Attach observed data to the model either by passing obs keyword arguments directly to sample statements, or by using pyro.poutine.condition to externally fix sample sites to observed values. The conditioning approach separates data from model specification, making the model more reusable.

Key considerations:

  • The obs= keyword directly within pyro.sample is the simplest approach
  • poutine.condition allows conditioning without modifying the model function
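  • Observed values must match (or broadcast to) the shape of the distribution at each observation site
  • Missing data can be handled with Pyro's NaN-masked distributions (for example NanMaskedNormal) or by masking the affected entries with poutine.mask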

Step 3: Select and Configure the MCMC Kernel

Choose between NUTS (No-U-Turn Sampler) and HMC (Hamiltonian Monte Carlo) as the sampling kernel. NUTS is the default choice as it automatically adapts trajectory length. Configure kernel parameters including step size, target accept probability, maximum tree depth (for NUTS), and trajectory length (for HMC). Optionally apply reparameterization strategies to improve sampling geometry.

Key considerations:

  • NUTS is preferred in most cases for its automatic trajectory length adaptation
  • target_accept_prob controls step size adaptation (default 0.8, higher values give smaller steps)
  • max_tree_depth limits NUTS tree expansion (default 10)
  • Reparameterization (LocScale, Haar, NeuTra) can dramatically improve sampling efficiency
  • jit_compile=True traces the potential function with the PyTorch JIT, which can speed up sampling for some models

Step 4: Configure the MCMC Sampler

Create the MCMC object specifying the number of warmup steps, the number of posterior samples, and optionally multiple chains. Warmup steps are used for step size and mass matrix adaptation and are discarded. Multiple chains enable convergence diagnostics via R-hat statistics.

Key considerations:

  • Warmup should be at least 200-500 steps for adequate adaptation
  • num_samples determines the posterior sample count (typically 500-2000)
  • Multiple chains (num_chains > 1) enable R-hat convergence checks
  • mp_context can be set for parallel chain execution
  • initial_params can seed the sampler near a good starting point

Step 5: Run the Sampler

Execute mcmc.run() passing any arguments that the model function expects (e.g., data tensors). The sampler performs warmup adaptation (step size and mass matrix tuning) followed by posterior sampling. Progress is reported during execution.

Key considerations:

  • Pass the same data arguments used in the model function
  • Warmup phase adapts step size via dual averaging and estimates mass matrix
  • Divergent transitions indicate problematic posterior geometry
  • Runtime scales with model complexity and number of samples

Step 6: Compute Diagnostics

Evaluate sampling quality using mcmc.summary() for parameter statistics and mcmc.diagnostics() for chain health. Key diagnostics include effective sample size (n_eff), R-hat convergence statistic, and divergent transition count. R-hat values above 1.05 indicate potential non-convergence; n_eff should be a substantial fraction of total samples.

Key considerations:

  • R-hat < 1.05 for all parameters indicates convergence
  • Effective sample size should be >100 for reliable estimates
  • Divergent transitions suggest reparameterization is needed
  • mcmc.get_samples() returns a dictionary of posterior sample tensors

Step 7: Generate Posterior Predictive Samples

Use the Predictive utility class with the posterior samples to generate predictions on new data. Predictive runs the model forward using each posterior sample, producing a distribution of predictions that captures both parameter uncertainty and observation noise.

Key considerations:

  • Predictive accepts the model and posterior_samples dictionary
  • return_sites controls which sample sites are returned
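  • num_samples is ignored when posterior_samples is supplied (the leading dimension of the samples sets the count); to thin, slice the posterior_samples tensors before passing them in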
  • Posterior predictive checks compare predictions to held-out data
