Implementation:Rapidsai Cuml RandomProjection

Knowledge Sources	Rapidsai_Cuml
Domains	Machine_Learning, Dimensionality_Reduction
Last Updated	2026-02-08 12:00 GMT

Overview

The random_projection module provides GPU-accelerated implementations of Gaussian and sparse random projection for dimensionality reduction, along with a utility function that computes the Johnson-Lindenstrauss minimum dimension.

Description

This module contains three main components:

johnson_lindenstrauss_min_dim(n_samples, eps) -- A utility function that computes the minimum number of random projection components needed to preserve the pairwise distances between n_samples points to within a relative distortion of eps, based on the Johnson-Lindenstrauss lemma. The guarantee is probabilistic: it holds with high probability over the random draw of the projection matrix.

GaussianRandomProjection -- Reduces dimensionality by multiplying input data with a dense random matrix whose components are drawn from N(0, 1/n_components). It extends _BaseRandomProjection and generates a dense Gaussian random matrix at fit time.

SparseRandomProjection -- Reduces dimensionality with a sparse random projection matrix, giving embedding quality comparable to Gaussian random projection while using much less memory. Each entry of the matrix takes one of three values: -sqrt(s/n_components) with probability 1/(2s), 0 with probability 1 - 1/s, and +sqrt(s/n_components) with probability 1/(2s), where s = 1/density. The density parameter controls sparsity and defaults to 1/sqrt(n_features).

Both projection classes support automatic component count selection via the Johnson-Lindenstrauss lemma when n_components='auto'. They accept both dense and sparse input matrices and delegate to cuML's internal array handling for GPU computation.

Usage

Use GaussianRandomProjection or SparseRandomProjection when you need fast, approximate dimensionality reduction that preserves pairwise distances. SparseRandomProjection is preferred when memory efficiency matters or when the projection matrix needs to be sparse. These methods are useful as preprocessing steps for downstream ML algorithms when the original feature space is very high-dimensional.

Code Reference

Source Location

Repository: Rapidsai_Cuml
File: python/cuml/cuml/random_projection/random_projection.py

Signature

def johnson_lindenstrauss_min_dim(n_samples, eps=0.1)

class GaussianRandomProjection(_BaseRandomProjection):
    def __init__(
        self,
        n_components="auto",
        *,
        eps=0.1,
        random_state=None,
        output_type=None,
        verbose=False,
    )

class SparseRandomProjection(_BaseRandomProjection):
    def __init__(
        self,
        n_components="auto",
        *,
        density="auto",
        eps=0.1,
        dense_output=False,
        random_state=None,
        output_type=None,
        verbose=False,
    )

Import

from cuml.random_projection import GaussianRandomProjection
from cuml.random_projection import SparseRandomProjection
from cuml.random_projection.random_projection import johnson_lindenstrauss_min_dim

I/O Contract

Inputs (GaussianRandomProjection)

Name	Type	Required	Description
n_components	int or 'auto'	No	Target projection dimensionality. If 'auto', determined by Johnson-Lindenstrauss lemma. Default is 'auto'.
eps	float	No	Maximum distortion rate for JL lemma when n_components='auto'. Must be in (0, 1). Default is 0.1.
random_state	int, RandomState, or None	No	Controls random number generation for the projection matrix. Default is None.
output_type	str or None	No	Return results in the indicated output type. Default is None.
verbose	int or bool	No	Sets logging level. Default is False.

Inputs (SparseRandomProjection)

Name	Type	Required	Description
n_components	int or 'auto'	No	Target projection dimensionality. If 'auto', determined by Johnson-Lindenstrauss lemma. Default is 'auto'.
density	float or 'auto'	No	Ratio of non-zero components in the projection matrix, in the range (0, 1]. If 'auto', set to 1/sqrt(n_features). Default is 'auto'.
eps	float	No	Maximum distortion rate for JL lemma when n_components='auto'. Must be in (0, 1). Default is 0.1.
dense_output	bool	No	If True, output is always dense even for sparse inputs. Default is False.
random_state	int, RandomState, or None	No	Controls random number generation. Default is None.
output_type	str or None	No	Return results in the indicated output type. Default is None.
verbose	int or bool	No	Sets logging level. Default is False.

Outputs

Name	Type	Description
n_components_	int	Concrete number of components computed (relevant when n_components='auto').
components_	array or sparse matrix (n_components, n_features)	Random matrix used for the projection.
density_	float	(SparseRandomProjection only) Concrete density value computed from 'auto'.
n_features_in_	int	Number of features seen during fit.
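
The fitted attributes n_components_ and density_ can be read as the resolved forms of the 'auto' constructor arguments. The sketch below shows one way that resolution could work, using only the rules stated on this page. The helper name resolve_auto_params is hypothetical and is not part of cuML, and the Johnson-Lindenstrauss formula used here is the standard one, assumed rather than copied from the cuML source.

```python
import math


def resolve_auto_params(n_samples, n_features, n_components="auto",
                        density="auto", eps=0.1):
    """Hypothetical helper (not a cuML API): resolve 'auto' settings."""
    if n_components == "auto":
        # Standard JL bound: 4 ln(n_samples) / (eps^2 / 2 - eps^3 / 3)
        n_components = int(
            4 * math.log(n_samples) / (eps ** 2 / 2 - eps ** 3 / 3)
        )
    if density == "auto":
        # Default density stated on this page: 1 / sqrt(n_features)
        density = 1.0 / math.sqrt(n_features)
    return n_components, density


n_components_, density_ = resolve_auto_params(n_samples=200, n_features=1000)
# components_ would then have shape (n_components_, n_features)
print(n_components_, round(density_, 4))
```

Note that the resolved component count depends on n_samples and eps only. For a small sample count with a tight eps it can exceed n_features, in which case the projection no longer reduces dimensionality.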

Usage Examples

Gaussian Random Projection

from cuml.random_projection import GaussianRandomProjection
from cuml.datasets import make_blobs

# Create high-dimensional data
X, _ = make_blobs(n_samples=200, n_features=1000, random_state=42)

# Project to 50 dimensions
model = GaussianRandomProjection(n_components=50, random_state=42)
X_new = model.fit_transform(X)
print(X_new.shape)  # (200, 50)
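
To make the construction behind GaussianRandomProjection more concrete, here is a dependency-free CPU sketch of the idea described earlier: draw a dense matrix with entries from N(0, 1/n_components) and multiply a sample by it. The function names gaussian_projection_matrix and project are hypothetical, and this is an illustration of the technique under those assumptions, not the cuML implementation.

```python
import math
import random


def gaussian_projection_matrix(n_components, n_features, seed=0):
    """Dense (n_components, n_features) matrix with N(0, 1/n_components) entries."""
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(n_components)  # standard deviation for variance 1/n_components
    return [[rng.gauss(0.0, std) for _ in range(n_features)]
            for _ in range(n_components)]


def project(x, matrix):
    """Project one sample x (length n_features) onto len(matrix) dimensions."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


n_features, n_components = 500, 200
data_rng = random.Random(1)
x = [data_rng.uniform(-1.0, 1.0) for _ in range(n_features)]

R = gaussian_projection_matrix(n_components, n_features)
x_new = project(x, R)

# The squared norm is preserved in expectation, so this ratio should be close to 1.
ratio = sum(v * v for v in x_new) / sum(v * v for v in x)
print(len(x_new), round(ratio, 3))
```

The 1/n_components variance is what keeps the expected squared norm of the projected vector equal to that of the input, which is why distances are approximately preserved.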

Sparse Random Projection

from cuml.random_projection import SparseRandomProjection
from cuml.datasets import make_blobs

# Create high-dimensional data
X, _ = make_blobs(n_samples=200, n_features=1000, random_state=42)

# Project using sparse random matrix
model = SparseRandomProjection(n_components=50, random_state=42)
X_new = model.fit_transform(X)
print(X_new.shape)  # (200, 50)
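
The three-valued distribution used by SparseRandomProjection can be sketched on the CPU as follows. The function name sparse_projection_matrix is hypothetical, and the matrix is stored as plain nested lists purely for readability; a real implementation would presumably keep it in a sparse format. Entries are +/- sqrt(s/n_components) with probability 1/(2s) each and 0 otherwise, where s = 1/density.

```python
import math
import random


def sparse_projection_matrix(n_components, n_features, density="auto", seed=0):
    """Illustrative three-valued random matrix; not the cuML implementation."""
    if density == "auto":
        density = 1.0 / math.sqrt(n_features)
    if not 0.0 < density <= 1.0:
        raise ValueError(f"density must be in (0, 1], got {density!r}")
    s = 1.0 / density
    scale = math.sqrt(s / n_components)
    rng = random.Random(seed)

    def draw():
        u = rng.random()
        if u < density / 2:   # probability 1/(2s)
            return -scale
        if u < density:       # probability 1/(2s)
            return scale
        return 0.0            # probability 1 - 1/s

    return [[draw() for _ in range(n_features)] for _ in range(n_components)]


R = sparse_projection_matrix(n_components=20, n_features=100)
nonzero = sum(1 for row in R for v in row if v != 0.0)
print(f"fraction of non-zero entries: {nonzero / (20 * 100):.3f}")
```

With n_features=100 the 'auto' density is 0.1, so roughly one entry in ten is non-zero, which is where the memory saving over a dense Gaussian matrix comes from.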

Auto Component Selection

from cuml.random_projection import GaussianRandomProjection
from cuml.random_projection.random_projection import johnson_lindenstrauss_min_dim

# Compute minimum dimension for 1000 samples with 10% distortion
min_dim = johnson_lindenstrauss_min_dim(n_samples=1000, eps=0.1)
print(f"Minimum components needed: {min_dim}")

# Or let the model determine it automatically
model = GaussianRandomProjection(n_components="auto", eps=0.1)
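
For intuition about what johnson_lindenstrauss_min_dim returns, the sketch below evaluates the standard Johnson-Lindenstrauss bound, n_components >= 4 ln(n_samples) / (eps^2/2 - eps^3/3), in plain Python. The name jl_min_dim is hypothetical, and the assumption that cuML uses this same formula (as scikit-learn's function of the same name does) is not confirmed by this page.

```python
import math


def jl_min_dim(n_samples, eps=0.1):
    """Hypothetical stand-in for the JL bound; not the cuML function."""
    if not 0.0 < eps < 1.0:
        raise ValueError(f"eps must be in (0, 1), got {eps!r}")
    if n_samples <= 0:
        raise ValueError(f"n_samples must be positive, got {n_samples!r}")
    # Denominator of the bound: eps^2 / 2 - eps^3 / 3
    denominator = (eps ** 2) / 2 - (eps ** 3) / 3
    # Truncate to an integer number of components
    return int(4 * math.log(n_samples) / denominator)


# The bound grows slowly with n_samples but quickly as eps shrinks.
for eps in (0.5, 0.1):
    print(eps, jl_min_dim(1_000_000, eps))
```

Because the bound ignores the number of input features, it is worth comparing the result against n_features before relying on n_components="auto".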
