Principle:Pyro ppl Pyro Simulator Based Inference
| Knowledge Sources | |
|---|---|
| Domains | Simulation-Based Inference, Likelihood-Free Inference, Scientific Computing |
| Last Updated | 2026-02-09 09:00 GMT |
Overview
Simulator-based inference inverts scientific simulators by using importance sampling or other inference techniques to estimate posterior distributions over simulator parameters given observed data, even when the simulator's likelihood function is intractable.
Description
Many scientific domains use simulators -- computer programs that model physical, biological, or social processes. These simulators take parameters as input and produce synthetic data as output. The forward simulation is straightforward, but the inverse problem -- inferring which parameters could have produced observed real-world data -- is challenging because:
- The simulator may not have a tractable likelihood function (it may involve random number generation, discretization, or procedural logic).
- The mapping from parameters to data may be complex, nonlinear, and stochastic.
- The simulator may be computationally expensive.
Simulator-based inference (also called likelihood-free inference) addresses this by leveraging the ability to run the simulator forward to generate synthetic data, then using statistical techniques to infer parameters.
Key approaches include:
Importance sampling with learned proposals: Run the simulator many times with parameters drawn from a proposal distribution. Weight each simulation by how well it matches the observed data. The proposal can be improved iteratively by training a neural network to propose parameters likely to produce data similar to the observation.
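Approximate Bayesian Computation (ABC): Accept simulator runs where the synthetic data is "close enough" to the observed data (within some tolerance epsilon). As epsilon approaches 0, the accepted samples converge to the true posterior (exactly so only when the comparison uses the full data or sufficient summary statistics), but the acceptance rate falls, so the computational cost rises.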
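The ABC idea above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the toy `simulate` function, the uniform prior on [-5, 5], the sample-mean `summary`, and the two tolerance values are all hypothetical choices made for the example.

```python
import numpy as np

def simulate(theta, rng, n=50):
    """Hypothetical toy simulator: n noisy draws centred on theta."""
    return rng.normal(loc=theta, scale=1.0, size=n)

def summary(data):
    """Summary statistic: the sample mean (sufficient for this toy model)."""
    return data.mean()

def abc_rejection(observed, epsilon, num_simulations, rng):
    """Keep prior draws whose simulated summary is within epsilon of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(num_simulations):
        theta = rng.uniform(-5.0, 5.0)            # draw from the assumed uniform prior
        s_sim = summary(simulate(theta, rng))     # run the simulator forward
        if abs(s_sim - s_obs) <= epsilon:         # accept only "close enough" runs
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(0)
observed = simulate(1.5, rng)                     # stand-in for real observed data

loose = abc_rejection(observed, epsilon=1.0, num_simulations=5000, rng=rng)
tight = abc_rejection(observed, epsilon=0.1, num_simulations=5000, rng=rng)
print("accepted with epsilon=1.0:", len(loose))
print("accepted with epsilon=0.1:", len(tight))
```

Shrinking epsilon concentrates the accepted parameters around values consistent with the observation, but far fewer runs are accepted for the same simulation budget, which is the cost trade-off described above.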
In Pyro, simulator-based inference is expressed naturally: the simulator is written as a probabilistic program, and importance sampling or other inference algorithms are applied. The inclined plane example demonstrates this pattern with a simple physics simulator.
Usage
Use simulator-based inference when:
- The generative model is a complex simulator without a tractable likelihood.
- You can run the simulator forward but cannot compute p(data | parameters) in closed form.
- You need to calibrate simulator parameters to match observed experimental data.
- The simulator involves stochastic processes, numerical solvers, or procedural generation.
- You work in a simulation-heavy scientific field such as particle physics, cosmology, epidemiology, or ecology.
Theoretical Basis
Inference as inverting a simulator:
# Simulator: data = simulate(theta, noise)
# where theta = parameters, noise = the simulator's internal randomness (e.g. fixed by a random seed)
# Goal: p(theta | data_observed)
# Problem: p(data | theta) is intractable (no closed-form likelihood)
# But we CAN:
# 1. Sample theta from prior: theta ~ p(theta)
# 2. Run simulator: data_sim = simulate(theta)
# 3. Compare data_sim to data_observed
Importance sampling approach:
# Proposal: theta_i ~ q(theta) for i = 1, ..., N
# Simulate: data_i = simulate(theta_i)
# If we can define a distance or soft likelihood:
#   w_i = kernel(data_i, data_observed) * p(theta_i) / q(theta_i)
# Posterior approximation:
#   p(theta | data) approx sum_i w_bar_i * delta(theta - theta_i)
#   where w_bar_i = w_i / sum_j w_j
# The kernel measures similarity:
#   kernel(d1, d2) = exp(-||summary(d1) - summary(d2)||^2 / (2*epsilon^2))
#   summary(): sufficient statistics or learned summary statistics
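The weighting scheme above can be sketched with NumPy as a self-normalized importance sampler. Everything concrete here is an assumption made for illustration: the toy `simulate` function, the Gaussian prior and proposal, the sample-mean `summary`, and the value of `epsilon`. Weights are computed in log space to avoid numerical underflow.

```python
import numpy as np

def simulate(theta, rng, n=50):
    """Hypothetical toy simulator: n noisy draws centred on theta."""
    return rng.normal(theta, 1.0, size=n)

def summary(data):
    """Summary statistic: the sample mean."""
    return data.mean()

def log_normal_pdf(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)

rng = np.random.default_rng(1)
observed = simulate(1.5, rng)            # stand-in for real observed data
s_obs = summary(observed)

N, epsilon = 5000, 0.1
prior_mean, prior_std = 0.0, 3.0         # p(theta), assumed for this sketch
prop_mean, prop_std = 1.0, 1.5           # q(theta), assumed for this sketch

# theta_i ~ q(theta), then data_i = simulate(theta_i)
theta = rng.normal(prop_mean, prop_std, size=N)
s_sim = np.array([summary(simulate(t, rng)) for t in theta])

# log w_i = log kernel + log p(theta_i) - log q(theta_i)
log_kernel = -((s_sim - s_obs) ** 2) / (2 * epsilon ** 2)
log_w = (log_kernel
         + log_normal_pdf(theta, prior_mean, prior_std)
         - log_normal_pdf(theta, prop_mean, prop_std))

# Normalize: w_bar_i = w_i / sum_j w_j
w = np.exp(log_w - log_w.max())
w_bar = w / w.sum()

posterior_mean = np.sum(w_bar * theta)
ess = 1.0 / np.sum(w_bar ** 2)           # effective sample size diagnostic
print("posterior mean:", posterior_mean, "ESS:", ess)
```

The effective sample size gives a rough sense of how many of the N simulations actually contribute; a proposal that places more mass near the posterior raises it, which is the motivation for learned proposals.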
Probabilistic program formulation:
# Express the simulator as a Pyro model.
# prior_distribution, simulator, and noise_scale are placeholders to be supplied by the user.
import pyro
import pyro.distributions as dist
from pyro.infer import Importance

def model(observed_data):
    # Prior over parameters
    theta = pyro.sample("theta", prior_distribution)
    # Run the simulator (may involve multiple stochastic steps)
    simulated_data = simulator(theta)
    # Soft likelihood (observation model)
    pyro.sample("obs", dist.Normal(simulated_data, noise_scale), obs=observed_data)
    return theta

# Inference via importance sampling (guide=None proposes from the prior):
# importance = Importance(model, guide=None, num_samples=1000)
# posterior = importance.run(observed_data)
# Or via CSIS (compiled sequential importance sampling):
# Train a neural proposal network, then use it for importance sampling
Inclined plane example:
# Physics simulator: box sliding down an inclined plane
# Parameter: mu = friction coefficient (incline angle phi and plane length are fixed and known)
# Simulator: integrates equations of motion
# acceleration = g * (sin(phi) - mu * cos(phi))   (the box does not slide if this is <= 0)
# position(t) = 0.5 * acceleration * t^2   (box released from rest)
# Observed: noisy measurements of the descent time T
# Inference: given the observed descent times, infer mu
# Uses importance sampling with the simulator as the generative model
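The following NumPy sketch illustrates one simple variant of this setup without depending on Pyro. It fixes the incline angle (called `phi` here) and the plane length, and infers only the friction coefficient `mu` from noisy descent times. The constants (g = 9.81, a 2 m plane at 30 degrees, 20 ms timing noise, a uniform prior on [0, 1]) and the synthetic observations are assumptions made for illustration.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def simulate_descent_time(mu, length=2.0, phi=np.pi / 6, dt=0.005):
    """Euler-integrate a box sliding from rest; return the time to travel `length`."""
    acceleration = G * (np.sin(phi) - mu * np.cos(phi))
    if acceleration <= 0.0:
        return 1.0e5  # friction too large: the box never slides, so return a huge time
    t = velocity = displacement = 0.0
    while displacement < length:
        displacement += velocity * dt
        velocity += acceleration * dt
        t += dt
    return t

def log_likelihood(observed_times, simulated_time, sigma):
    """Gaussian timing-error model, up to an additive constant."""
    return -0.5 * np.sum(((observed_times - simulated_time) / sigma) ** 2)

rng = np.random.default_rng(0)
true_mu, sigma = 0.3, 0.02
# Synthetic "lab" data: 20 noisy descent-time measurements
observed_times = simulate_descent_time(true_mu) + sigma * rng.normal(size=20)

# Importance sampling with the prior as proposal: weights reduce to the likelihood
mu_samples = rng.uniform(0.0, 1.0, size=2000)
log_w = np.array([
    log_likelihood(observed_times, simulate_descent_time(m), sigma)
    for m in mu_samples
])
w_bar = np.exp(log_w - log_w.max())
w_bar /= w_bar.sum()

posterior_mean = np.sum(w_bar * mu_samples)
print("true mu:", true_mu, "posterior mean:", posterior_mean)
```

Because the simulator is only ever run forward, the same code would work if the Euler loop were replaced by a more elaborate numerical solver; only the comparison between simulated and observed descent times enters the weights.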