Principle:Scikit learn Scikit learn Gaussian Process
| Knowledge Sources | |
|---|---|
| Domains | Supervised Learning, Bayesian Inference |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Gaussian processes define a distribution over functions, providing a non-parametric Bayesian approach to regression and classification with built-in uncertainty quantification.
Description
A Gaussian Process (GP) is a collection of random variables, any finite number of which follow a joint Gaussian distribution. GPs are fully specified by a mean function and a covariance (kernel) function, which encodes assumptions about the function being modeled (smoothness, periodicity, length scale). They solve the problem of making predictions with well-calibrated uncertainty estimates without committing to a fixed parametric form. GPs are particularly valuable in settings where uncertainty quantification is critical, such as Bayesian optimization, active learning, and safety-critical applications.
Usage
Use Gaussian Process Regression (GPR) when you need both predictions and uncertainty estimates, when the dataset is small to moderate in size (GPs scale as ), and when the function is expected to be smooth. Use Gaussian Process Classification (GPC) for probabilistic classification with uncertainty estimates. The choice of kernel is critical: use the RBF kernel for smooth functions, the Matern kernel for functions with varying smoothness, periodic kernels for periodic patterns, and composite kernels (sums and products) for complex structure. GPs are not suitable for very large datasets without approximation methods.
Theoretical Basis
Gaussian Process is defined as:
where is the mean function (often zero) and is the covariance (kernel) function.
Gaussian Process Regression: Given training data with noise model , , the predictive distribution at test points is:
where:
Gaussian Process Classification: For binary classification, the latent function is passed through a sigmoid (or probit) link function:
Since the posterior over is no longer Gaussian, approximate inference is needed (Laplace approximation or Expectation Propagation).
Common Kernel Functions:
RBF (Squared Exponential):
Matern kernel:
where is the modified Bessel function. The parameter controls smoothness.
Rational Quadratic:
Hyperparameter optimization is performed by maximizing the log marginal likelihood:
This provides a principled, Bayesian approach to model selection that automatically balances data fit and model complexity (Occam's razor).