Implementation:Rapidsai Cuml RandomProjection
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Dimensionality_Reduction |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
The random_projection module provides GPU-accelerated implementations of Gaussian and Sparse random projection for dimensionality reduction, along with the Johnson-Lindenstrauss minimum dimension utility function.
Description
This module contains three main components:
johnson_lindenstrauss_min_dim(n_samples, eps=0.1) -- A utility function that computes the minimum number of random projection components needed to preserve pairwise distances between n_samples points to within a relative distortion of eps, based on the Johnson-Lindenstrauss lemma.
GaussianRandomProjection -- Reduces dimensionality by multiplying input data with a dense random matrix whose components are drawn from N(0, 1/n_components). It extends _BaseRandomProjection and generates a dense Gaussian random matrix at fit time.
SparseRandomProjection -- Reduces dimensionality through a sparse random projection matrix, offering embedding quality comparable to Gaussian random projection while using much less memory. Each entry of the sparse matrix takes one of three values (-sqrt(s/n_components), 0, +sqrt(s/n_components)), where s = 1/density. The density parameter controls sparsity and defaults to 1/sqrt(n_features).
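The two entry distributions described above can be compared with a small pure-Python sketch. It is an illustration only, not cuML's GPU implementation: the helper names gaussian_entry and sparse_entry are hypothetical, and the probabilities assigned to the three sparse values (density/2 for each non-zero value, 1 - density for zero) are an assumption taken from the usual sparse random projection convention, since the text above lists only the values.

```python
import math
import random


def gaussian_entry(rng, n_components):
    """Draw one matrix entry from N(0, 1/n_components)."""
    # random.gauss takes a standard deviation, so use sqrt of the variance.
    return rng.gauss(0.0, 1.0 / math.sqrt(n_components))


def sparse_entry(rng, n_components, density):
    """Draw one matrix entry from the three-value distribution, s = 1/density."""
    s = 1.0 / density
    scale = math.sqrt(s / n_components)
    u = rng.random()
    # Assumed probabilities: density/2 for each sign, 1 - density for zero.
    if u < density / 2:
        return -scale
    if u < density:
        return scale
    return 0.0


n_features = 1000
n_components = 50
density = 1.0 / math.sqrt(n_features)  # the documented 'auto' default

rng = random.Random(0)
# One row of each kind of projection matrix (length n_features).
gauss_row = [gaussian_entry(rng, n_components) for _ in range(n_features)]
sparse_row = [sparse_entry(rng, n_components, density) for _ in range(n_features)]

nonzero = sum(1 for v in sparse_row if v != 0.0)
print(f"non-zero entries in the sparse row: {nonzero} of {n_features}")
```

Under these assumptions both distributions have zero mean and variance 1/n_components, which is why the sparse matrix can stand in for the dense Gaussian one while storing only a small fraction of its entries.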
Both projection classes support automatic component count selection via the Johnson-Lindenstrauss lemma when n_components='auto'. They accept both dense and sparse input matrices and delegate to cuML's internal array handling for GPU computation.
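The automatic component count can be illustrated with a plain-Python sketch of the Johnson-Lindenstrauss bound. It assumes the commonly documented form of the bound, n_components >= 4 ln(n_samples) / (eps^2/2 - eps^3/3), truncated to an integer. The function name jl_min_dim is hypothetical and is not part of cuML, and the library's own validation or rounding may differ.

```python
import math


def jl_min_dim(n_samples, eps=0.1):
    """Hypothetical helper: minimum components from the JL bound (assumed form)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must be in (0, 1)")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    # Assumed bound: 4 ln(n_samples) / (eps^2 / 2 - eps^3 / 3)
    denominator = eps**2 / 2 - eps**3 / 3
    return int(4 * math.log(n_samples) / denominator)


print(jl_min_dim(1000, eps=0.1))  # 5920
print(jl_min_dim(1000, eps=0.5))
```

Note that the bound depends only on the number of samples and eps, not on the number of input features, and that it grows quickly as eps shrinks. For small datasets the resulting count can exceed the original feature count, in which case an explicit n_components is the more practical choice.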
Usage
Use GaussianRandomProjection or SparseRandomProjection when you need fast, approximate dimensionality reduction that preserves pairwise distances. SparseRandomProjection is preferred when memory efficiency matters or when the projection matrix needs to be sparse. These methods are useful as preprocessing steps for downstream ML algorithms when the original feature space is very high-dimensional.
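To make the memory argument concrete, the following sketch estimates how many values each kind of projection matrix has to store. It is back-of-the-envelope arithmetic, not a measurement of cuML: the helper projection_matrix_entries is hypothetical, and it counts stored values only, ignoring the index arrays a real sparse format also needs.

```python
import math


def projection_matrix_entries(n_components, n_features, density=None):
    """Hypothetical helper: (dense entry count, expected sparse non-zero count)."""
    dense_entries = n_components * n_features
    if density is None:
        # Mirror the documented 'auto' default of 1/sqrt(n_features).
        density = 1.0 / math.sqrt(n_features)
    expected_nonzeros = density * dense_entries
    return dense_entries, expected_nonzeros


dense_entries, expected_nonzeros = projection_matrix_entries(50, 1000)
print(f"dense entries: {dense_entries}")
print(f"expected sparse non-zeros: {expected_nonzeros:.0f}")
```

With 1000 input features the default density is about 0.03, so the sparse matrix is expected to hold roughly 3% of the values a dense Gaussian matrix of the same shape would, and the saving grows with the number of features.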
Code Reference
Source Location
- Repository: Rapidsai_Cuml
- File: python/cuml/cuml/random_projection/random_projection.py
Signature
def johnson_lindenstrauss_min_dim(n_samples, eps=0.1)
class GaussianRandomProjection(_BaseRandomProjection):
    def __init__(
        self,
        n_components="auto",
        *,
        eps=0.1,
        random_state=None,
        output_type=None,
        verbose=False,
    )
class SparseRandomProjection(_BaseRandomProjection):
    def __init__(
        self,
        n_components="auto",
        *,
        density="auto",
        eps=0.1,
        dense_output=False,
        random_state=None,
        output_type=None,
        verbose=False,
    )
Import
from cuml.random_projection import GaussianRandomProjection
from cuml.random_projection import SparseRandomProjection
from cuml.random_projection.random_projection import johnson_lindenstrauss_min_dim
I/O Contract
Inputs (GaussianRandomProjection)
| Name | Type | Required | Description |
|---|---|---|---|
| n_components | int or 'auto' | No | Target projection dimensionality. If 'auto', determined by Johnson-Lindenstrauss lemma. Default is 'auto'. |
| eps | float | No | Maximum distortion rate for JL lemma when n_components='auto'. Must be in (0, 1). Default is 0.1. |
| random_state | int, RandomState, or None | No | Controls random number generation for the projection matrix. Default is None. |
| output_type | str or None | No | Return results in the indicated output type. |
| verbose | int or bool | No | Sets logging level. Default is False. |
Inputs (SparseRandomProjection)
| Name | Type | Required | Description |
|---|---|---|---|
| n_components | int or 'auto' | No | Target projection dimensionality. If 'auto', determined by Johnson-Lindenstrauss lemma. Default is 'auto'. |
| density | float or 'auto' | No | Ratio of non-zero components in the projection matrix, in the range (0, 1]. If 'auto', set to 1/sqrt(n_features). Default is 'auto'. |
| eps | float | No | Maximum distortion rate for JL lemma when n_components='auto'. Default is 0.1. |
| dense_output | bool | No | If True, output is always dense even for sparse inputs. Default is False. |
| random_state | int, RandomState, or None | No | Controls random number generation. Default is None. |
| output_type | str or None | No | Return results in the indicated output type. |
| verbose | int or bool | No | Sets logging level. Default is False. |
Outputs
| Name | Type | Description |
|---|---|---|
| n_components_ | int | Concrete number of components computed (relevant when n_components='auto'). |
| components_ | array or sparse matrix of shape (n_components, n_features) | Random matrix used for the projection. |
| density_ | float | (SparseRandomProjection only) Concrete density value computed from 'auto'. |
| n_features_in_ | int | Number of features seen during fit. |
Usage Examples
Gaussian Random Projection
from cuml.random_projection import GaussianRandomProjection
from cuml.datasets import make_blobs
# Create high-dimensional data
X, _ = make_blobs(n_samples=200, n_features=1000, random_state=42)
# Project to 50 dimensions
model = GaussianRandomProjection(n_components=50, random_state=42)
X_new = model.fit_transform(X)
print(X_new.shape) # (200, 50)
Sparse Random Projection
from cuml.random_projection import SparseRandomProjection
from cuml.datasets import make_blobs
# Create high-dimensional data
X, _ = make_blobs(n_samples=200, n_features=1000, random_state=42)
# Project using sparse random matrix
model = SparseRandomProjection(n_components=50, random_state=42)
X_new = model.fit_transform(X)
print(X_new.shape) # (200, 50)
Auto Component Selection
from cuml.random_projection import GaussianRandomProjection
from cuml.random_projection.random_projection import johnson_lindenstrauss_min_dim
# Compute minimum dimension for 1000 samples with 10% distortion
min_dim = johnson_lindenstrauss_min_dim(n_samples=1000, eps=0.1)
print(f"Minimum components needed: {min_dim}")
# Or let the model determine it automatically; the resolved value is stored in n_components_ after fitting
model = GaussianRandomProjection(n_components="auto", eps=0.1)