Principle: Scikit-learn Kernel Methods
| Knowledge Sources | |
|---|---|
| Domains | Supervised Learning, Non-Parametric Methods |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Kernel methods map data into high-dimensional feature spaces via kernel functions and perform linear algorithms in those spaces, enabling non-linear models with strong theoretical foundations.
Description
Kernel methods extend linear algorithms to handle non-linear relationships by implicitly operating in a high-dimensional (or even infinite-dimensional) feature space defined by a kernel function. The kernel trick avoids explicit computation of the high-dimensional feature vectors, instead computing inner products directly via the kernel function. This enables linear methods like ridge regression, PCA, and SVMs to learn non-linear patterns while retaining their theoretical properties. Kernel approximation and random projection methods provide scalable alternatives that explicitly construct approximate low-dimensional feature maps, enabling kernel-like performance with linear algorithms at reduced computational cost.
Usage
Use Kernel Ridge Regression when you want a non-linear regression model that combines the kernel trick with ridge regularization, particularly when exact kernel evaluation is feasible (small to moderate datasets). Use kernel approximation methods (Nystroem, Random Fourier Features) when the dataset is too large for exact kernel methods but you want to approximate kernel-based learning using explicit feature maps with linear models. Use random projection methods (Gaussian random projection, sparse random projection) when you need fast, data-independent dimensionality reduction that approximately preserves pairwise distances, as guaranteed by the Johnson-Lindenstrauss lemma.
Theoretical Basis
Kernel Trick: A kernel function computes the inner product in a feature space without explicit transformation:

$$k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$$

where $\phi: \mathcal{X} \to \mathcal{H}$ maps inputs to a (potentially infinite-dimensional) Hilbert space $\mathcal{H}$. Mercer's theorem ensures that any positive semi-definite kernel corresponds to such a mapping.
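As a concrete sketch of the trick, the degree-2 polynomial kernel $(x^\top y + 1)^2$ on 2-D inputs has an explicit 6-dimensional feature map; the Gram matrix computed through that map matches the one computed directly by the kernel (toy data chosen here for illustration):

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))  # five toy 2-D inputs

# Explicit feature map for k(x, y) = (x . y + 1)^2 in two dimensions:
# phi(x) = [1, sqrt(2)x1, sqrt(2)x2, x1^2, x2^2, sqrt(2)x1x2]
def phi(x):
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

Phi = np.array([phi(x) for x in X])
G_explicit = Phi @ Phi.T  # Gram matrix via explicit features

# Kernel trick: the same Gram matrix without ever forming phi(x)
G_kernel = polynomial_kernel(X, degree=2, gamma=1.0, coef0=1.0)
assert np.allclose(G_explicit, G_kernel)
```

For an RBF kernel the feature space is infinite-dimensional, so only the kernel-trick side of this comparison remains computable.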
Kernel Ridge Regression: Combines the kernel trick with ridge regression. The dual coefficients and prediction function are:

$$\hat{\alpha} = (K + \lambda I)^{-1} y, \qquad \hat{f}(x) = \sum_{i=1}^{n} \hat{\alpha}_i \, k(x, x_i)$$

where $K_{ij} = k(x_i, x_j)$ is the kernel matrix and $\lambda$ is the regularization parameter. The dual formulation has complexity $O(n^3)$ for $n$ samples, independent of the (possibly infinite) feature space dimensionality.
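A minimal sketch with scikit-learn's `KernelRidge` (synthetic sine data and hyperparameters chosen purely for illustration; `alpha` plays the role of $\lambda$ above):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)  # noisy sine

model = KernelRidge(kernel="rbf", alpha=0.1, gamma=1.0).fit(X, y)

# One dual coefficient per training sample, as in the dual solution above
print(model.dual_coef_.shape)  # (200,)
print(model.score(X, y))       # training R^2
```

Because the fitted model stores one dual coefficient per sample, prediction cost grows with the training set size, which is why exact kernel ridge is best suited to small or moderate datasets.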
Kernel Approximation: Explicit feature maps $\hat{\phi}: \mathbb{R}^d \to \mathbb{R}^D$ approximate the kernel:

$$k(x, x') \approx \hat{\phi}(x)^\top \hat{\phi}(x')$$
Random Fourier Features (RBF Sampler): For shift-invariant kernels $k(x, x') = k(x - x')$, Bochner's theorem guarantees:

$$k(x - x') = \int p(\omega)\, e^{i \omega^\top (x - x')}\, d\omega \approx \hat{\phi}(x)^\top \hat{\phi}(x')$$

where $\hat{\phi}(x) = \sqrt{2/D}\,\big[\cos(\omega_1^\top x + b_1), \dots, \cos(\omega_D^\top x + b_D)\big]^\top$ with $\omega_j \sim p(\omega)$ and $b_j \sim \mathrm{Uniform}(0, 2\pi)$. For the RBF kernel $k(x, x') = \exp(-\gamma \|x - x'\|^2)$, $p(\omega) = \mathcal{N}(0, 2\gamma I)$.
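A sketch of the approximation quality with `RBFSampler` (toy data; the Monte Carlo error shrinks roughly as $1/\sqrt{D}$ as `n_components` grows):

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
gamma = 0.5

# Monte Carlo feature map with D = 2000 random Fourier features
Z = RBFSampler(gamma=gamma, n_components=2000, random_state=0).fit_transform(X)

K_approx = Z @ Z.T                    # phi_hat(x)^T phi_hat(x')
K_exact = rbf_kernel(X, gamma=gamma)  # exp(-gamma * ||x - x'||^2)
print(np.abs(K_approx - K_exact).max())  # small; decreases with larger D
```

Since `Z` is an ordinary feature matrix, any linear estimator (e.g. `SGDClassifier` or `Ridge`) can be trained on it to mimic kernel-based learning at linear cost.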
Nystroem Approximation: Approximates the kernel matrix using a subset of $m$ landmark points:

$$K \approx K_{n,m} K_{m,m}^{+} K_{m,n}$$

where $K_{n,m}$ is the kernel matrix between all $n$ points and the landmarks, $K_{m,m}$ is the kernel matrix among landmarks, and $K_{m,m}^{+}$ is its pseudo-inverse. The corresponding explicit feature map is $\hat{\phi}(x) = K_{m,m}^{-1/2}\,[k(x, z_1), \dots, k(x, z_m)]^\top$ for landmarks $z_1, \dots, z_m$.
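A sketch with scikit-learn's `Nystroem` transformer (toy data; landmark count and `gamma` are illustrative choices):

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))

# 100 landmark points sampled from the 300 training points
ny = Nystroem(kernel="rbf", gamma=0.1, n_components=100, random_state=0)
Z = ny.fit_transform(X)  # explicit features, shape (300, 100)

K_approx = Z @ Z.T       # realizes K_{n,m} K_{m,m}^+ K_{m,n}
K_exact = rbf_kernel(X, gamma=0.1)
print(Z.shape, np.abs(K_approx - K_exact).max())
```

Unlike random Fourier features, Nystroem is data-dependent (the landmarks come from the training set) and works with any kernel, not only shift-invariant ones.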
Random Projection: Projects data from $\mathbb{R}^d$ to $\mathbb{R}^k$ (with $k \ll d$) using a random matrix $R \in \mathbb{R}^{k \times d}$:

$$x' = R x$$

The Johnson-Lindenstrauss Lemma guarantees that for $k = O(\varepsilon^{-2} \log n)$, pairwise distances among $n$ points are preserved within a factor of $(1 \pm \varepsilon)$ with high probability:

$$(1 - \varepsilon)\,\|u - v\|^2 \;\leq\; \|Ru - Rv\|^2 \;\leq\; (1 + \varepsilon)\,\|u - v\|^2$$

Gaussian random projection uses $R_{ij} \sim \mathcal{N}(0, 1/k)$. Sparse random projection uses a sparse matrix with entries in $\{-\sqrt{s/k},\, 0,\, +\sqrt{s/k}\}$ with probabilities $\{1/(2s),\, 1 - 1/s,\, 1/(2s)\}$, providing computational savings while maintaining the JL guarantee.
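A sketch of both projection types and the JL dimension bound (toy high-dimensional data; the point and dimension counts are illustrative):

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.random_projection import (GaussianRandomProjection,
                                       SparseRandomProjection,
                                       johnson_lindenstrauss_min_dim)

# Minimum target dimension the JL lemma requires for 1000 points at eps = 0.3
k_min = johnson_lindenstrauss_min_dim(n_samples=1000, eps=0.3)
print(k_min)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10_000))  # 100 points in d = 10000 dimensions

for proj in (GaussianRandomProjection(n_components=1000, random_state=0),
             SparseRandomProjection(n_components=1000, dense_output=True,
                                    random_state=0)):
    Xp = proj.fit_transform(X)
    mask = ~np.eye(len(X), dtype=bool)  # ignore zero self-distances
    ratios = euclidean_distances(Xp)[mask] / euclidean_distances(X)[mask]
    # pairwise-distance ratios concentrate around 1, as the JL lemma predicts
    print(type(proj).__name__, ratios.min(), ratios.max())
```

Both transformers are data-independent: `fit` only draws the random matrix $R$, so the projection can be reused on new data without retraining.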