Implementation:Scikit learn Scikit learn RandomProjection
| Knowledge Sources | |
|---|---|
| Domains | Dimensionality Reduction, Random Projection |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for dimensionality reduction through random projection provided by scikit-learn.
Description
The random_projection module provides transformers that reduce the dimensionality of data by random projection, trading a controlled amount of accuracy for faster processing and smaller models. GaussianRandomProjection uses a dense random matrix whose entries are drawn from a Gaussian distribution, while SparseRandomProjection uses a sparse random matrix that is more memory-efficient and faster to apply. Both are grounded in the Johnson-Lindenstrauss lemma, which states that, with high probability, pairwise Euclidean distances between samples are preserved up to a relative distortion controlled by eps. The module also provides johnson_lindenstrauss_min_dim, which computes the minimum number of components needed to meet that guarantee for a given number of samples and eps.
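As a rough illustration of what johnson_lindenstrauss_min_dim computes, the sketch below evaluates the bound n_components >= 4 ln(n_samples) / (eps^2 / 2 - eps^3 / 3) stated in the scikit-learn documentation, using only the standard library. The helper name jl_min_dim is hypothetical and not part of scikit-learn, and truncating the result to an integer is an assumption about how the library rounds.

```python
import math


def jl_min_dim(n_samples: int, eps: float = 0.1) -> int:
    """Hypothetical helper: Johnson-Lindenstrauss lower bound on n_components."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    # Bound: 4 * ln(n_samples) / (eps^2 / 2 - eps^3 / 3)
    denominator = eps**2 / 2 - eps**3 / 3
    # Assumption: truncate toward zero to obtain an integer dimension
    return int(4 * math.log(n_samples) / denominator)


# The bound grows quickly as the tolerated distortion eps shrinks
for eps in (0.5, 0.1):
    print(f"eps={eps}: {jl_min_dim(1_000_000, eps)} components")
```

Note that the bound depends only on the number of samples and eps, not on the original number of features.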
Usage
Use random projection when you need fast, computationally efficient dimensionality reduction that approximately preserves pairwise distances. It is particularly useful for high-dimensional data where PCA would be too expensive, and as a preprocessing step before applying algorithms sensitive to the curse of dimensionality. SparseRandomProjection is preferred for very large datasets due to its memory efficiency.
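The distance-preservation claim can be checked empirically. The following is a minimal sketch, assuming synthetic Gaussian data and a loose eps of 0.5 so that the "auto" target dimension stays below the number of features; the data shape and seeds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.random_projection import GaussianRandomProjection

# Synthetic data (assumption): 50 samples in 5000 dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5000))

# n_components="auto" derives the target dimension from eps and n_samples
grp = GaussianRandomProjection(n_components="auto", eps=0.5, random_state=0)
X_proj = grp.fit_transform(X)

# Compare squared pairwise distances before and after projection
upper = np.triu_indices(X.shape[0], k=1)
d_orig = pairwise_distances(X)[upper]
d_proj = pairwise_distances(X_proj)[upper]
ratios = (d_proj / d_orig) ** 2

print(f"Projected to {grp.n_components_} components")
print(f"Squared-distance ratio range: {ratios.min():.3f} to {ratios.max():.3f}")
```

Ratios close to 1 indicate that distances survived the projection; the lemma suggests they should mostly fall within 1 ± eps.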
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/random_projection.py
Signature
class GaussianRandomProjection(BaseRandomProjection):
    def __init__(
        self,
        n_components="auto",
        *,
        eps=0.1,
        compute_inverse_components=False,
        random_state=None,
    ):

class SparseRandomProjection(BaseRandomProjection):
    def __init__(
        self,
        n_components="auto",
        *,
        density="auto",
        eps=0.1,
        dense_output=False,
        compute_inverse_components=False,
        random_state=None,
    ):

def johnson_lindenstrauss_min_dim(n_samples, *, eps=0.1):
Import
from sklearn.random_projection import GaussianRandomProjection
from sklearn.random_projection import SparseRandomProjection
from sklearn.random_projection import johnson_lindenstrauss_min_dim
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n_components | int or "auto" | No | Dimensionality of the target projection space. "auto" computes from eps and n_samples using the Johnson-Lindenstrauss lemma. Default is "auto". |
| eps | float | No | Maximum distortion rate as defined by the Johnson-Lindenstrauss lemma. Used when n_components is "auto". Default is 0.1. |
| compute_inverse_components | bool | No | Whether to compute the pseudo-inverse of the components for inverse_transform. Default is False. |
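| random_state | int, RandomState instance or None | No | Controls the pseudo-random number generator used to draw the projection matrix at fit time. Pass an int for reproducible output. Default is None. |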
| density | float or "auto" | No | Ratio of non-zero components in the random projection matrix (SparseRandomProjection only). Default is "auto" (1/sqrt(n_features)). |
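| dense_output | bool | No | If True, the projected output is always a dense array, even when the input is sparse (SparseRandomProjection only). If False, the output is sparse when the input is sparse. Default is False. |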
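The effect of density and dense_output can be seen in a short sketch. It assumes a SciPy CSR matrix as input and reads the fitted density_ attribute of SparseRandomProjection; the matrix sizes are arbitrary.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.random_projection import SparseRandomProjection

# Sparse input (assumption): 200 samples, 10000 features, 1% non-zero entries
X_sparse = sp.random(200, 10_000, density=0.01, format="csr", random_state=0)

# Default dense_output=False: sparse input yields sparse output
srp = SparseRandomProjection(n_components=100, random_state=0)
out_sparse = srp.fit_transform(X_sparse)
print("sparse output:", sp.issparse(out_sparse))

# density="auto" resolves to 1 / sqrt(n_features)
print("density_:", srp.density_, "expected:", 1 / np.sqrt(X_sparse.shape[1]))

# dense_output=True forces a dense ndarray even for sparse input
srp_dense = SparseRandomProjection(
    n_components=100, dense_output=True, random_state=0
)
out_dense = srp_dense.fit_transform(X_sparse)
print("dense output type:", type(out_dense).__name__)
```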
Outputs
| Name | Type | Description |
|---|---|---|
| X_transformed | ndarray or sparse matrix of shape (n_samples, n_components) | The projected data in the lower-dimensional space. |
| components_ | ndarray or sparse matrix of shape (n_components, n_features) | The random projection matrix. |
| n_components_ | int | The concrete number of components computed when n_components is "auto". |
Usage Examples
Basic Usage
import numpy as np
from sklearn.random_projection import (
    GaussianRandomProjection,
    SparseRandomProjection,
    johnson_lindenstrauss_min_dim,
)

# Compute the minimum number of components the Johnson-Lindenstrauss
# lemma requires for 1000 samples at eps=0.1
min_dim = johnson_lindenstrauss_min_dim(n_samples=1000, eps=0.1)
print(f"Minimum components for eps=0.1: {min_dim}")

# Gaussian random projection with an explicit target dimension.
# 50 components is far below the bound above, so the eps=0.1
# distortion guarantee does not apply here.
rng = np.random.RandomState(42)  # seeded for reproducible data
X = rng.rand(100, 1000)
grp = GaussianRandomProjection(n_components=50, random_state=42)
X_projected = grp.fit_transform(X)
print(f"Original shape: {X.shape}, Projected shape: {X_projected.shape}")

# Sparse random projection with the same target dimension
srp = SparseRandomProjection(n_components=50, random_state=42)
X_sparse_proj = srp.fit_transform(X)
print(f"Sparse projected shape: {X_sparse_proj.shape}")
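The example above sets n_components explicitly. With the default n_components="auto", fitting raises a ValueError when the Johnson-Lindenstrauss bound exceeds the number of input features, which is common for small eps. A minimal sketch of handling that case follows; the fallback dimension of 50 is an arbitrary choice and carries no eps guarantee.

```python
import numpy as np
from sklearn.random_projection import (
    SparseRandomProjection,
    johnson_lindenstrauss_min_dim,
)

rng = np.random.default_rng(0)
X = rng.random((100, 1000))

# The bound for 100 samples at eps=0.1 is larger than the 1000 features
required = johnson_lindenstrauss_min_dim(n_samples=X.shape[0], eps=0.1)
print(f"JL bound: {required}, n_features: {X.shape[1]}")

auto_failed = False
try:
    SparseRandomProjection(n_components="auto", eps=0.1, random_state=0).fit(X)
except ValueError as exc:
    auto_failed = True
    print(f"'auto' fit failed: {exc}")

# Fall back to an explicit, smaller target dimension (arbitrary choice)
srp = SparseRandomProjection(n_components=50, random_state=0)
X_small = srp.fit_transform(X)
print(f"Fallback projected shape: {X_small.shape}")
```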