Implementation:Pyro ppl Pyro SV DKL

Property	Value
Implementation Type	Pattern Doc
Source File	`examples/contrib/gp/sv-dkl.py`
Module	pyro.contrib.gp
Pyro Features	`pyro.contrib.gp.kernels.Warping`, `pyro.contrib.gp.models.VariationalSparseGP`, `pyro.contrib.gp.kernels.RBF`, `pyro.contrib.gp.likelihoods.Binary`, `pyro.contrib.gp.likelihoods.MultiClass`, `TraceMeanField_ELBO`
Paper	Wilson et al. (2016), "Stochastic Variational Deep Kernel Learning"
Dataset	MNIST

Overview

This file demonstrates Stochastic Variational Deep Kernel Learning, which combines a Convolutional Neural Network (CNN) with a Gaussian Process (GP) for image classification on MNIST. The key idea is to create a "deep kernel" by warping an RBF kernel with a CNN feature extractor.
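As a rough illustration of what "warping" means here, the sketch below composes an RBF kernel with a feature map in plain Python. The `feature_map` function is a hypothetical stand-in for the CNN, and `warped_rbf` is an illustrative name rather than part of Pyro's API; the example itself uses `gp.kernels.Warping` for this step.

```python
import math

def rbf(u, v, lengthscale=1.0, variance=1.0):
    """RBF kernel between two feature vectors given as lists of floats."""
    sq_dist = sum(((a - b) / lengthscale) ** 2 for a, b in zip(u, v))
    return variance * math.exp(-0.5 * sq_dist)

def feature_map(x):
    """Hypothetical stand-in for the CNN: maps a raw input to 2 features."""
    return [sum(x) / len(x), max(x) - min(x)]

def warped_rbf(x1, x2):
    """Deep-kernel pattern: apply the feature map first, then the base kernel."""
    return rbf(feature_map(x1), feature_map(x2))

x_a = [0.0, 0.5, 1.0]
x_b = [1.0, 0.5, 0.0]   # same mean and range as x_a, so identical features
x_c = [0.0, 0.0, 3.0]   # different mean and range

k_ab = warped_rbf(x_a, x_b)   # inputs differ, features match: maximal similarity
k_ac = warped_rbf(x_a, x_c)   # features differ: similarity decays
```

The point of the pattern is that similarity is measured between learned features, not raw pixels, so two inputs that look different can still be treated as close by the GP.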

The architecture:

CNN: Two convolutional layers followed by two fully-connected layers, mapping 28x28 images to 10-dimensional feature vectors.
Deep Kernel: gp.kernels.Warping(rbf, iwarping_fn=cnn) composes the RBF kernel with the CNN, so the kernel operates in the CNN's learned feature space.
VariationalSparseGP: Enables mini-batch training through a small set of inducing points. Because the CNN is part of the kernel, the inducing points lie in the original image space rather than in the CNN's feature space.
Likelihoods: Binary likelihood for binary classification, MultiClass (Categorical) for 10-class MNIST.
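The layer sizes can be checked with a little arithmetic. The sketch below assumes each 5x5 convolution (stride 1, no padding) is followed by 2x2 max pooling, which is consistent with the `nn.Linear(320, 50)` layer in the code reference; the helper names are illustrative.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    """Spatial size after a square convolution."""
    return (size + 2 * padding - kernel_size) // stride + 1

def pool_out(size, window):
    """Spatial size after non-overlapping max pooling."""
    return size // window

size = 28                                  # MNIST images are 28x28
size = pool_out(conv_out(size, 5), 2)      # conv1 (5x5) -> 24, pool -> 12
size = pool_out(conv_out(size, 5), 2)      # conv2 (5x5) -> 8, pool -> 4
flat_features = 20 * size * size           # conv2 has 20 output channels

print(flat_features)  # 320
```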

The model achieves ~98.45% accuracy on 10-class MNIST and ~99.41% on binary MNIST after 16 epochs.

Code Reference

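import torch
import torch.nn as nn
import torch.nn.functional as F
import pyro.contrib.gp as gp
import pyro.infer as infer

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        # Two conv -> max-pool -> ReLU stages, then flatten to 20 * 4 * 4 = 320 features
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2(x), 2))
        x = x.view(-1, 320)
        return self.fc2(F.relu(self.fc1(x)))

cnn = CNN()

# Create deep kernel: the RBF kernel is evaluated on the CNN's 10-dimensional outputs
rbf = gp.kernels.RBF(input_dim=10, lengthscale=torch.ones(10))
deep_kernel = gp.kernels.Warping(rbf, iwarping_fn=cnn)

# Create sparse variational GP. Xu (inducing images), likelihood, and latent_shape
# are built elsewhere in the script; X and y are placeholders that are replaced
# for each mini-batch with gpmodule.set_data(data, target).
gpmodule = gp.models.VariationalSparseGP(
    X=Xu, y=None, kernel=deep_kernel, Xu=Xu,
    likelihood=likelihood, latent_shape=latent_shape,
    num_data=60000, whiten=True, jitter=2e-6)

# Optimizer and loss used by the training loop (args comes from argparse)
optimizer = torch.optim.Adam(gpmodule.parameters(), lr=args.lr)
elbo = infer.TraceMeanField_ELBO()
loss_fn = elbo.differentiable_loss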
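
The `num_data=60000` argument matters because the model only sees one mini-batch at a time. A minimal sketch of the idea, in plain Python with illustrative names and assuming the usual stochastic variational scaling: the log-likelihood summed over a batch is multiplied by `num_data / batch_size` so that it estimates the full-dataset term of the ELBO. The per-example values below are made up for illustration.

```python
def scaled_batch_log_lik(batch_log_liks, num_data):
    """Estimate the full-data log-likelihood from one mini-batch."""
    scale = num_data / len(batch_log_liks)
    return scale * sum(batch_log_liks)

num_data = 60000          # size of the MNIST training set
batch_size = 64           # default --batch-size
scale = num_data / batch_size   # factor applied to each batch's likelihood term

# Illustrative per-example log-likelihoods for a batch of 4 (not real outputs),
# treated as a sample from a toy dataset of 8 points.
estimate = scaled_batch_log_lik([-0.5, -0.25, -1.0, -0.25], num_data=8)
```

Without this rescaling, the KL term over the inducing variables would be weighted too heavily relative to the data term when training on small batches.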

I/O Contract

Parameter	Type	Description
`--num-inducing`	`int`	Number of inducing points (default: 70)
`--binary`	flag	Binary classification (odd/even digits)
`--batch-size`	`int`	Training batch size (default: 64)
`--epochs`	`int`	Number of training epochs (default: 10)
`--lr`	`float`	Learning rate (default: 0.01)
`--cuda`	flag	Enable CUDA (GPU) training
`--jit`	flag	Use JIT-compiled ELBO
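
The flags above can be sketched with `argparse` as follows. This is a reconstruction from the table, not a copy of the script's parser; the `--cuda` flag is taken from the usage examples on this page, and `build_parser` is an illustrative name.

```python
import argparse

def build_parser():
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="SV-DKL on MNIST (sketch)")
    parser.add_argument("--num-inducing", type=int, default=70)
    parser.add_argument("--binary", action="store_true")
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--cuda", action="store_true")
    parser.add_argument("--jit", action="store_true")
    return parser

# Parse an explicit argument list so the sketch runs outside a shell.
args = build_parser().parse_args(["--binary", "--epochs", "16"])
```

Note that `argparse` converts dashes to underscores, so `--num-inducing` is read as `args.num_inducing`.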

Output:

Test accuracy per epoch
Training time per epoch
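
For reference, test accuracy is the percentage of correct predictions. The sketch below shows that computation in plain Python, including the odd/even relabeling used by `--binary`; the predictions are made-up values and the function names are illustrative, not taken from the script.

```python
def to_binary(labels):
    """Map digit labels 0-9 to parity: 0 for even, 1 for odd."""
    return [label % 2 for label in labels]

def accuracy(predictions, targets):
    """Percentage of predictions that match the targets."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return 100.0 * correct / len(targets)

digits = [3, 8, 1, 4, 7]
targets = to_binary(digits)          # parity labels for the digits above
predictions = [1, 0, 0, 0, 1]        # hypothetical model outputs
acc = accuracy(predictions, targets)
```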

Usage Examples

# 10-class MNIST classification
python sv-dkl.py --epochs 16 --lr 0.01

# Binary classification (odd vs even)
python sv-dkl.py --binary --epochs 10

# With GPU and JIT
python sv-dkl.py --cuda --jit

Related Pages

Pyro_ppl_Pyro_GP_BayesOpt - GP-based Bayesian optimization
Pyro_ppl_Pyro_GP_TimeSeries - GP-based time series models
