Implementation:Rapidsai Cuml RandomProjection

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Dimensionality_Reduction
Last Updated 2026-02-08 12:00 GMT

Overview

The random_projection module provides GPU-accelerated implementations of Gaussian and sparse random projection for dimensionality reduction, along with a utility function that computes the Johnson-Lindenstrauss minimum dimension.

Description

This module contains three main components:

johnson_lindenstrauss_min_dim(n_samples, eps=0.1) -- A utility function that computes the minimum number of random projection components needed for pairwise distances to be preserved, with good probability, within a relative distortion of eps, based on the Johnson-Lindenstrauss lemma.

GaussianRandomProjection -- Reduces dimensionality by multiplying the input data with a dense random matrix whose entries are drawn from N(0, 1/n_components), that is, zero mean and variance 1/n_components. It extends _BaseRandomProjection and generates the dense Gaussian random matrix at fit time.

SparseRandomProjection -- Reduces dimensionality with a sparse random projection matrix, giving embedding quality comparable to Gaussian random projection while using much less memory. Each matrix entry takes one of three values, -sqrt(s/n_components), 0, or +sqrt(s/n_components), where s = 1/density. The density parameter sets the fraction of non-zero entries and defaults to 1/sqrt(n_features).
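The two matrix distributions described above can be sketched on the CPU with plain NumPy. This is an illustration under stated assumptions, not cuML's implementation: the helper names gaussian_matrix and sparse_matrix are hypothetical, the sparse matrix is stored as a dense array for simplicity, and the probabilities density/2, 1 - density, density/2 for the three values follow the usual sparse random projection construction (as documented for scikit-learn) rather than anything stated on this page.

```python
import numpy as np

def gaussian_matrix(n_components, n_features, rng):
    """Dense matrix with entries drawn from N(0, 1/n_components)."""
    std = 1.0 / np.sqrt(n_components)  # standard deviation, so variance is 1/n_components
    return rng.normal(0.0, std, size=(n_components, n_features))

def sparse_matrix(n_components, n_features, rng, density=None):
    """Three-valued matrix (-v, 0, +v), kept dense here for simplicity."""
    if density is None:
        density = 1.0 / np.sqrt(n_features)  # mirrors the 'auto' default
    s = 1.0 / density
    v = np.sqrt(s / n_components)
    # Assumed probabilities: non-zero with probability `density`, split evenly by sign.
    return rng.choice(
        [-v, 0.0, v],
        size=(n_components, n_features),
        p=[density / 2, 1.0 - density, density / 2],
    )

rng = np.random.default_rng(0)
G = gaussian_matrix(50, 1000, rng)
S = sparse_matrix(50, 1000, rng)
print(G.shape, S.shape)
print("fraction of non-zeros in S:", np.count_nonzero(S) / S.size)
print("entry variance (G, S):", G.var(), S.var())
```

Under these assumptions both matrices have entries with variance 1/n_components, which is why the two projections scale distances in the same way on average.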

Both projection classes support automatic component count selection via the Johnson-Lindenstrauss lemma when n_components='auto'. They accept both dense and sparse input matrices and delegate to cuML's internal array handling for GPU computation.
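The automatic component count can be illustrated with a small pure-Python sketch of the Johnson-Lindenstrauss bound. It assumes cuML uses the same bound as scikit-learn's function of the same name, n_components >= 4 ln(n_samples) / (eps^2/2 - eps^3/3), truncated to an integer. The helper name jl_min_dim is hypothetical and is not part of cuML.

```python
import math

def jl_min_dim(n_samples, eps=0.1):
    """JL bound on n_components, truncated to an integer (assumed formula)."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must be in (0, 1)")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    denominator = eps**2 / 2 - eps**3 / 3
    return int(4 * math.log(n_samples) / denominator)

# The bound depends only on n_samples and eps, not on the number of features.
for eps in (0.5, 0.1):
    print(eps, jl_min_dim(1_000_000, eps))
```

Under this formula, 1000 samples at eps=0.1 give a bound of 5920 components. That is larger than many feature spaces, so automatic selection only reduces dimensionality when the input has more features than the bound.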

Usage

Use GaussianRandomProjection or SparseRandomProjection when you need fast, approximate dimensionality reduction that preserves pairwise distances. SparseRandomProjection is preferred when memory efficiency matters or when the projection matrix needs to be sparse. These methods are useful as preprocessing steps for downstream ML algorithms when the original feature space is very high-dimensional.
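The distance-preservation claim can be checked empirically. The following is a CPU-only NumPy sketch, not the cuML code path: it builds a Gaussian projection matrix by hand, projects random data, and compares squared pairwise distances before and after projection. The sizes and the helper name pairwise_sq_dists are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, n_components = 20, 2000, 1000

X = rng.normal(size=(n_samples, n_features))
# Gaussian projection matrix with entries from N(0, 1/n_components)
R = rng.normal(0.0, 1.0 / np.sqrt(n_components), size=(n_components, n_features))
X_new = X @ R.T  # shape (n_samples, n_components)

def pairwise_sq_dists(A):
    """Squared Euclidean distances between all distinct pairs of rows."""
    sq = (A * A).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    i, j = np.triu_indices(len(A), k=1)
    return D[i, j]

# Ratios near 1 mean the projection preserved the pairwise distances.
ratios = pairwise_sq_dists(X_new) / pairwise_sq_dists(X)
print("min ratio:", ratios.min(), "max ratio:", ratios.max())
```

The ratios cluster around 1, and their spread shrinks as n_components grows, which is the trade-off the eps parameter expresses.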

Code Reference

Source Location

  • Repository: Rapidsai_Cuml
  • File: python/cuml/cuml/random_projection/random_projection.py

Signature

def johnson_lindenstrauss_min_dim(n_samples, eps=0.1)

class GaussianRandomProjection(_BaseRandomProjection):
    def __init__(
        self,
        n_components="auto",
        *,
        eps=0.1,
        random_state=None,
        output_type=None,
        verbose=False,
    )

class SparseRandomProjection(_BaseRandomProjection):
    def __init__(
        self,
        n_components="auto",
        *,
        density="auto",
        eps=0.1,
        dense_output=False,
        random_state=None,
        output_type=None,
        verbose=False,
    )

Import

from cuml.random_projection import GaussianRandomProjection
from cuml.random_projection import SparseRandomProjection
from cuml.random_projection.random_projection import johnson_lindenstrauss_min_dim

I/O Contract

Inputs (GaussianRandomProjection)

| Name | Type | Required | Description |
|------|------|----------|-------------|
| n_components | int or 'auto' | No | Target projection dimensionality. If 'auto', determined by Johnson-Lindenstrauss lemma. Default is 'auto'. |
| eps | float | No | Maximum distortion rate for JL lemma when n_components='auto'. Must be in (0, 1). Default is 0.1. |
| random_state | int, RandomState, or None | No | Controls random number generation for the projection matrix. Default is None. |
| output_type | str or None | No | Return results in the indicated output type. |
| verbose | int or bool | No | Sets logging level. Default is False. |

Inputs (SparseRandomProjection)

| Name | Type | Required | Description |
|------|------|----------|-------------|
| n_components | int or 'auto' | No | Target projection dimensionality. If 'auto', determined by Johnson-Lindenstrauss lemma. Default is 'auto'. |
| density | float or 'auto' | No | Ratio of non-zero components in the projection matrix. Must be in (0, 1]. If 'auto', set to 1/sqrt(n_features). Default is 'auto'. |
| eps | float | No | Maximum distortion rate for JL lemma when n_components='auto'. Default is 0.1. |
| dense_output | bool | No | If True, output is always dense even for sparse inputs. Default is False. |
| random_state | int, RandomState, or None | No | Controls random number generation. Default is None. |
| output_type | str or None | No | Return results in the indicated output type. |
| verbose | int or bool | No | Sets logging level. Default is False. |

Outputs

| Name | Type | Description |
|------|------|-------------|
| n_components_ | int | Concrete number of components computed (relevant when n_components='auto'). |
| components_ | array or sparse matrix (n_components, n_features) | Random matrix used for the projection. |
| density_ | float | (SparseRandomProjection only) Concrete density value computed from 'auto'. |
| n_features_in_ | int | Number of features seen during fit. |

Usage Examples

Gaussian Random Projection

from cuml.random_projection import GaussianRandomProjection
from cuml.datasets import make_blobs

# Create high-dimensional data
X, _ = make_blobs(n_samples=200, n_features=1000, random_state=42)

# Project to 50 dimensions
model = GaussianRandomProjection(n_components=50, random_state=42)
X_new = model.fit_transform(X)
print(X_new.shape)  # (200, 50)

Sparse Random Projection

from cuml.random_projection import SparseRandomProjection
from cuml.datasets import make_blobs

# Create high-dimensional data
X, _ = make_blobs(n_samples=200, n_features=1000, random_state=42)

# Project using sparse random matrix
model = SparseRandomProjection(n_components=50, random_state=42)
X_new = model.fit_transform(X)
print(X_new.shape)  # (200, 50)

Auto Component Selection

from cuml.random_projection import GaussianRandomProjection
from cuml.random_projection.random_projection import johnson_lindenstrauss_min_dim

# Compute minimum dimension for 1000 samples with 10% distortion
min_dim = johnson_lindenstrauss_min_dim(n_samples=1000, eps=0.1)
print(f"Minimum components needed: {min_dim}")

# Or let the model determine it automatically
model = GaussianRandomProjection(n_components="auto", eps=0.1)
