Principle:Pyro ppl Pyro Observation Conditioning

Metadata

Field	Value
Page Type	Principle
Knowledge Sources	Repo (Pyro)
Domains	Bayesian_Inference, Probabilistic_Programming
Last Updated	2026-02-09 12:00 GMT

Overview

Observation conditioning fixes selected random variables of a probabilistic model to observed data. It is the core operation in Bayesian inference: it updates prior beliefs with observed evidence to produce posterior distributions over latent variables.

Description

Observation conditioning is the foundational operation that connects a probabilistic model to real-world data. In Bayesian inference, a model specifies a joint distribution over latent variables theta and observed variables y. Conditioning on observed data y = y_obs yields the posterior distribution p(theta | y_obs), which represents updated beliefs about the latent variables given the evidence.

In Pyro, conditioning can be accomplished through two complementary mechanisms:

Inline Conditioning with obs=

The most direct way to condition on data is to pass the observed values directly to pyro.sample via the obs keyword argument:

pyro.sample("obs", dist.Normal(mu, sigma), obs=data)

When obs is provided, the sample site:

Does not draw a random sample from the distribution.
Uses the provided observed value instead and scores it under the distribution (computes its log probability).
Contributes that log probability to the model's joint log density, which inference algorithms use.

This approach is simple and explicit, but it requires modifying the model code to include the observations.

Programmatic Conditioning with poutine.condition

Pyro's effect handler system provides a more flexible mechanism through poutine.condition. This handler intercepts sample sites at runtime and replaces their values with observed data from an external dictionary:

conditioned_model = poutine.condition(model, data={"obs": data})

The conditioned model behaves identically to a model where obs=data was passed inline, but without modifying the original model code. This separation of model and data is particularly useful for:

MCMC inference: The same model is run many times with different parameter proposals but the same observed data.
Model reuse: The same model can be conditioned on different datasets without modification.
Programmatic conditioning: Data can be bound to sample sites dynamically based on runtime logic.
Testing and debugging: Models can be tested in their unconditional form and then conditioned for inference.

How Conditioning Works

Under the hood, both mechanisms achieve the same effect through Pyro's message-passing (poutine) system:

When a pyro.sample statement is executed, it generates a message containing the distribution, site name, and other metadata.
With obs=, the message is created with its value field already set to the observed data. With poutine.condition, the handler intercepts the message for each site named in its data and sets the value field to the observed data.
In both cases the is_observed flag on the message is set to True, so no sample is drawn at that site.
When the execution is recorded in a trace, the log probability log p(y_obs | theta) of the observed value is computed for that site.
Inference algorithms use this log probability as part of the joint log density.

Conditioning on a Trace

In addition to conditioning on a dictionary of tensors, poutine.condition can accept a Trace object. This allows conditioning on an entire execution trace from a previous model run, which is useful for replay-based inference methods and debugging.

Usage

Observation conditioning is used whenever Bayesian inference is performed:

Standard Bayesian inference: Conditioning a generative model on observed data to obtain posterior samples via MCMC or variational inference.
Posterior predictive checks: Conditioning on observed data, sampling from the posterior, then generating predictions to compare against the observations.
Model composition: Building complex models by conditioning submodels on shared latent variables or observed data.
Amortized inference: Using poutine.condition to programmatically bind different datasets to the same model architecture.
Missing data: Selectively conditioning on available observations while leaving missing values as latent (unconditioned) sample sites.

Theoretical Basis

Bayes' Theorem

Conditioning implements Bayes' theorem:

p(theta | y_obs) = p(y_obs | theta) * p(theta) / p(y_obs)

where:

p(theta) is the prior distribution over latent variables.
p(y_obs | theta) is the likelihood of the observed data given the latent variables.
p(y_obs) = integral p(y_obs | theta) p(theta) d theta is the marginal likelihood (evidence).
p(theta | y_obs) is the posterior distribution.

In practice, the normalizing constant p(y_obs) is often intractable, which is why MCMC and variational inference methods work with the unnormalized posterior p(y_obs | theta) * p(theta).
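When theta takes only a few values, the integral for p(y_obs) becomes a finite sum and Bayes' theorem can be evaluated exactly. The following sketch uses plain Python and a made-up coin example (three candidate biases, a uniform prior, and one specific sequence with three heads and one tail); none of these numbers come from the source.

```python
# Candidate values of the latent variable theta (probability of heads).
thetas = [0.25, 0.5, 0.75]
prior = [1 / 3, 1 / 3, 1 / 3]  # p(theta), uniform by assumption

heads, tails = 3, 1  # y_obs: one specific flip sequence with 3 heads, 1 tail

# Likelihood p(y_obs | theta) of that sequence for each theta.
likelihood = [t**heads * (1 - t) ** tails for t in thetas]

# Unnormalized posterior p(y_obs | theta) * p(theta).
unnormalized = [lik * p for lik, p in zip(likelihood, prior)]

# Evidence p(y_obs): here a sum over theta instead of an integral.
evidence = sum(unnormalized)

# Posterior p(theta | y_obs).
posterior = [u / evidence for u in unnormalized]

for t, post in zip(thetas, posterior):
    print(f"theta={t}: posterior={post:.4f}")
```

The posterior shifts mass toward theta = 0.75, and only the final division needs the evidence; the ranking of the candidates is already fixed by the unnormalized values.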

Log Joint Density

Pyro's inference algorithms operate on the log joint density:

log p(theta, y_obs) = log p(theta) + log p(y_obs | theta)

Conditioning adds the term log p(y_obs | theta) to the trace by scoring the observed value under the specified distribution at the conditioned sample site. The resulting log joint density is the fundamental quantity used by MCMC kernels (HMC, NUTS), which take its negative as the potential energy.

Related Pages

Implementation:Pyro_ppl_Pyro_Poutine_Condition
