

Principle: Scikit-learn Kernel Methods

From Leeroopedia


Knowledge Sources
Domains Supervised Learning, Non-Parametric Methods
Last Updated 2026-02-08 15:00 GMT

Overview

Kernel methods map data into high-dimensional feature spaces via kernel functions and perform linear algorithms in those spaces, enabling non-linear models with strong theoretical foundations.

Description

Kernel methods extend linear algorithms to handle non-linear relationships by implicitly operating in a high-dimensional (or even infinite-dimensional) feature space defined by a kernel function. The kernel trick avoids explicit computation of the high-dimensional feature vectors, instead computing inner products directly via the kernel function. This enables linear methods like ridge regression, PCA, and SVMs to learn non-linear patterns while retaining their theoretical properties. Kernel approximation and random projection methods provide scalable alternatives that explicitly construct approximate low-dimensional feature maps, enabling kernel-like performance with linear algorithms at reduced computational cost.

Usage

Use Kernel Ridge Regression when you want a non-linear regression model that combines the kernel trick with ridge regularization, particularly when exact kernel evaluation is feasible (small to moderate datasets). Use kernel approximation methods (Nystroem, Random Fourier Features) when the dataset is too large for exact kernel methods but you want to approximate kernel-based learning using explicit feature maps with linear models. Use random projection methods (Gaussian random projection, sparse random projection) when you need fast, data-independent dimensionality reduction that approximately preserves pairwise distances, as guaranteed by the Johnson-Lindenstrauss lemma.

Theoretical Basis

Kernel Trick: A kernel function $k(x, x')$ computes the inner product in a feature space without explicit transformation:

$$k(x, x') = \langle \phi(x), \phi(x') \rangle$$

where $\phi : \mathcal{X} \to \mathcal{H}$ maps inputs to a (potentially infinite-dimensional) Hilbert space. Mercer's theorem ensures that any positive semi-definite kernel corresponds to such a mapping.
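As a quick illustration of Mercer's condition, this minimal sketch uses scikit-learn's `rbf_kernel` to check that an RBF kernel matrix is positive semi-definite (the data and the `gamma` value are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.randn(50, 3)  # 50 arbitrary points in R^3

# K_ij = exp(-gamma * ||x_i - x_j||^2), an inner product <phi(x_i), phi(x_j)>
K = rbf_kernel(X, X, gamma=0.5)

# Mercer: a valid kernel matrix is positive semi-definite,
# so all eigenvalues are non-negative (up to numerical error)
eigvals = np.linalg.eigvalsh(K)
print(eigvals.min())
```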

Kernel Ridge Regression: Combines the kernel trick with ridge regression:

$$\hat{f}(x) = k(x, X)\,(K + \alpha I)^{-1} y$$

where $K_{ij} = k(x_i, x_j)$ is the kernel matrix and $\alpha$ is the regularization parameter. The dual formulation has complexity $O(n^3)$ for $n$ samples, independent of the (possibly infinite) feature-space dimensionality.
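A minimal sketch of this dual formulation with scikit-learn's `KernelRidge`; the toy sine data and the `alpha`/`gamma` values are illustrative choices, not recommendations:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(200)  # noisy non-linear target

# Solves (K + alpha*I) c = y in the dual; predictions are k(x, X) @ c
model = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5)
model.fit(X, y)
print(model.score(X, y))  # R^2 on the training data
```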

Kernel Approximation: Explicit feature maps $\hat{\phi}$ approximate the kernel:

$$k(x, x') \approx \hat{\phi}(x)^T \hat{\phi}(x')$$

Random Fourier Features (RBF Sampler): For shift-invariant kernels, Bochner's theorem guarantees:

$$k(x - x') = \int e^{i\omega^T (x - x')}\, p(\omega)\, d\omega \approx \frac{2}{D} \sum_{j=1}^{D} \cos(\omega_j^T x + b_j)\cos(\omega_j^T x' + b_j)$$

where $\omega_j \sim p(\omega)$ and $b_j \sim \mathrm{Uniform}(0, 2\pi)$. For the RBF kernel, $p(\omega) = \mathcal{N}(0, 2\gamma I)$.
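A sketch comparing this random Fourier feature approximation to the exact RBF kernel, using scikit-learn's `RBFSampler`; the dataset, `gamma`, and `n_components` are arbitrary illustrative values:

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
gamma = 0.3

# Rows of Z are the explicit random Fourier feature maps phi_hat(x)
sampler = RBFSampler(gamma=gamma, n_components=2000, random_state=0)
Z = sampler.fit_transform(X)

K_exact = rbf_kernel(X, gamma=gamma)
K_approx = Z @ Z.T  # phi_hat(x)^T phi_hat(x')
err = np.abs(K_exact - K_approx).max()
print(err)  # shrinks as n_components grows
```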

Nystroem Approximation: Approximates the kernel matrix using a subset of $m$ landmark points:

$$K \approx C W^{-1} C^T$$

where $C$ is the $n \times m$ kernel matrix between all points and the landmarks, and $W$ is the $m \times m$ kernel matrix among the landmarks.
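The same comparison can be sketched for the Nystroem approximation with scikit-learn's `Nystroem` transformer (again with arbitrary illustrative data and parameters):

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.randn(300, 4)
gamma = 0.2

# Feature map built from m=100 landmark points; Z @ Z.T ~= C W^-1 C^T
ny = Nystroem(kernel="rbf", gamma=gamma, n_components=100, random_state=0)
Z = ny.fit_transform(X)

K_exact = rbf_kernel(X, gamma=gamma)
K_approx = Z @ Z.T
mean_err = np.abs(K_exact - K_approx).mean()
print(mean_err)
```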

Random Projection: Projects data from $\mathbb{R}^d$ to $\mathbb{R}^k$ using a random matrix $R$:

$$x' = \frac{1}{\sqrt{k}}\, R x$$

The Johnson-Lindenstrauss lemma guarantees that for $k = O(\varepsilon^{-2} \log n)$, pairwise distances are preserved within a factor of $(1 \pm \varepsilon)$ with high probability:

$$(1 - \varepsilon)\,\|x_i - x_j\|^2 \le \|x_i' - x_j'\|^2 \le (1 + \varepsilon)\,\|x_i - x_j\|^2$$

Gaussian random projection uses $R_{ij} \sim \mathcal{N}(0, 1)$. Sparse random projection uses a sparse matrix with entries in $\{-1, 0, +1\}$ drawn with appropriate probabilities, providing computational savings while maintaining the JL guarantee.
