Overview
Reference for scikit-learn's kernel approximation transformers, which build approximate kernel feature maps from random Fourier features, count sketches, and the Nystroem method.
Description
This module provides several transformer classes that approximate kernel feature maps, so that linear methods can be applied to data that would otherwise call for a nonlinear kernel. PolynomialCountSketch approximates the polynomial kernel using Tensor Sketch, computed with FFTs. RBFSampler approximates the RBF kernel using random Fourier features (Random Kitchen Sinks). SkewedChi2Sampler approximates the skewed chi-squared kernel, also by random sampling of its Fourier transform. AdditiveChi2Sampler approximates the additive chi-squared kernel by sampling the Fourier transform of the kernel at regular intervals, so it involves no randomness. Nystroem constructs an approximate kernel map for an arbitrary kernel using a subset of the training data as a basis.
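To make the idea of an approximate feature map concrete, the following minimal sketch compares the exact RBF kernel matrix with the inner products of the features produced by RBFSampler. The synthetic data, the value of gamma, and the choice of 2000 components are assumptions made for illustration only.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.metrics.pairwise import rbf_kernel

# Synthetic data, used only for illustration.
rng = np.random.RandomState(0)
X = rng.normal(size=(50, 5))

gamma = 0.5
exact = rbf_kernel(X, gamma=gamma)  # full 50 x 50 kernel matrix

# Explicit feature map whose inner products approximate the kernel.
sampler = RBFSampler(gamma=gamma, n_components=2000, random_state=0)
Z = sampler.fit_transform(X)
approx = Z @ Z.T

max_abs_error = np.abs(exact - approx).max()
print(f"max abs error: {max_abs_error:.3f}")
```

The error generally shrinks as n_components grows, at the price of a wider feature matrix.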
Usage
Use kernel approximation transformers when you need kernel-based features but want to avoid computing and storing the full kernel matrix, whose size grows quadratically with the number of samples. Combine these transformers with a linear estimator such as SGDClassifier or LinearSVC in a pipeline to approach the accuracy of a kernel SVM at a training cost closer to that of a linear model. The quality of the approximation depends on n_components.
Code Reference
Source Location
Signature
class PolynomialCountSketch(
    ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator
):
    # Approximates: K(X, Y) = (gamma * <X, Y> + coef0)^degree

class RBFSampler(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator):
    # Approximates the RBF kernel using random Fourier features

class SkewedChi2Sampler(
    ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator
):
    # Approximates the skewed chi-squared kernel

class AdditiveChi2Sampler(TransformerMixin, BaseEstimator):
    # Approximates the additive chi2 kernel

class Nystroem(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator):
    # Approximates a kernel map using a subset of training data
Import
from sklearn.kernel_approximation import (
    PolynomialCountSketch,
    RBFSampler,
    SkewedChi2Sampler,
    AdditiveChi2Sampler,
    Nystroem,
)
I/O Contract
Inputs (RBFSampler)
| Name | Type | Required | Description |
| gamma | float or 'scale' | No | Parameter of the RBF kernel (default=1.0) |
| n_components | int | No | Number of Monte Carlo samples per original feature; dimensionality of the output (default=100) |
| random_state | int, RandomState, or None | No | Random state for reproducibility |
Inputs (Nystroem)
| Name | Type | Required | Description |
| kernel | str or callable | No | Kernel function name or callable (default='rbf') |
| gamma | float or None | No | Gamma parameter for RBF, laplacian, polynomial, sigmoid kernels |
| coef0 | float or None | No | Zero coefficient for polynomial and sigmoid kernels |
| degree | float or None | No | Degree of the polynomial kernel |
| n_components | int | No | Number of features to construct (default=100) |
| random_state | int, RandomState, or None | No | Random state for reproducibility |
| n_jobs | int or None | No | Number of parallel jobs for pairwise kernel computation |
Inputs (PolynomialCountSketch)
| Name | Type | Required | Description |
| gamma | float | No | Parameter of the polynomial kernel (default=1.0) |
| degree | int | No | Degree of the polynomial kernel (default=2) |
| coef0 | int | No | Constant term of the polynomial kernel (default=0) |
| n_components | int | No | Dimensionality of the output feature space (default=100) |
| random_state | int, RandomState, or None | No | Random state for reproducibility |
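The following sketch, under the assumption of small synthetic unit-norm inputs, compares PolynomialCountSketch against the exact kernel K(X, Y) = (gamma * <X, Y> + coef0)^degree computed by polynomial_kernel. The parameter values are illustrative, not recommendations.

```python
import numpy as np
from sklearn.kernel_approximation import PolynomialCountSketch
from sklearn.metrics.pairwise import polynomial_kernel

# Synthetic data, used only for illustration.
rng = np.random.RandomState(0)
X = rng.normal(size=(30, 6))
# Unit-norm rows keep the kernel values in a small range.
X /= np.linalg.norm(X, axis=1, keepdims=True)

params = dict(degree=2, gamma=1.0, coef0=1)
exact = polynomial_kernel(X, **params)

# The same kernel parameters are passed to the sketch.
pcs = PolynomialCountSketch(n_components=4000, random_state=0, **params)
Z = pcs.fit_transform(X)
approx = Z @ Z.T

mean_abs_error = np.abs(exact - approx).mean()
print(f"mean abs error: {mean_abs_error:.3f}")
```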
Outputs
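| Name | Type | Description |
| transform(X) | ndarray of shape (n_samples, n_components) | Transformed feature matrix approximating the kernel map; for AdditiveChi2Sampler the width is n_features * (2 * sample_steps - 1) instead |
| n_features_in_ | int | Number of features seen during fit |
| components_ | ndarray of shape (n_components, n_features) | Subset of training points used to construct the feature map (Nystroem) |
| random_offset_ | ndarray of shape (n_components,) | Random offset for the feature map (RBFSampler, SkewedChi2Sampler) |
| random_weights_ | ndarray of shape (n_features, n_components) | Random weights for the feature map (RBFSampler, SkewedChi2Sampler) |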
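The fitted attributes differ by class, which a short sketch can show by inspecting shapes. The tiny non-negative dataset and the small n_components values below are assumptions chosen so the shapes are easy to read.

```python
import numpy as np
from sklearn.kernel_approximation import AdditiveChi2Sampler, Nystroem, RBFSampler

# Non-negative synthetic data, since AdditiveChi2Sampler expects X >= 0.
rng = np.random.RandomState(0)
X = rng.uniform(size=(20, 3))

# RBFSampler stores random weights and offsets, not training points.
rbf = RBFSampler(n_components=8, random_state=0).fit(X)
print(rbf.random_weights_.shape)  # (n_features, n_components) = (3, 8)
print(rbf.random_offset_.shape)   # (n_components,) = (8,)

# Nystroem stores the sampled training rows in components_.
nys = Nystroem(n_components=5, random_state=0).fit(X)
print(nys.components_.shape)      # (n_components, n_features) = (5, 3)

# AdditiveChi2Sampler widens each feature into 2 * sample_steps - 1 columns.
chi2 = AdditiveChi2Sampler(sample_steps=2)
Z_chi2 = chi2.fit_transform(X)
print(Z_chi2.shape)               # (20, 3 * (2 * 2 - 1)) = (20, 9)
```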
Usage Examples
Basic Usage
from sklearn.datasets import load_iris
from sklearn.kernel_approximation import Nystroem, RBFSampler
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Using RBFSampler with a linear classifier
pipe = Pipeline([
    ("rbf_feature", RBFSampler(gamma=1.0, n_components=100, random_state=42)),
    ("clf", SGDClassifier(random_state=42)),
])
pipe.fit(X_train, y_train)
print(f"RBFSampler + SGD accuracy: {pipe.score(X_test, y_test):.3f}")

# Using Nystroem with a linear classifier
pipe2 = Pipeline([
    ("nystroem", Nystroem(kernel="rbf", n_components=50, random_state=42)),
    ("clf", SGDClassifier(random_state=42)),
])
pipe2.fit(X_train, y_train)
print(f"Nystroem + SGD accuracy: {pipe2.score(X_test, y_test):.3f}")
Related Pages