Implementation:Rapidsai Cuml PCA UMAP TSNE Configuration

Knowledge Sources	cuML cuML Docs
Domains	Machine_Learning, GPU_Computing
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete tool for configuring PCA, UMAP, and t-SNE dimensionality reduction algorithms provided by the cuML library.

Description

These constructors initialize the three primary dimensionality reduction estimators in cuML. Each constructor accepts algorithm-specific hyperparameters that control the behavior of the reduction:

PCA.__init__ configures the number of components, SVD solver strategy (full eigendecomposition vs. iterative Jacobi), whitening, and convergence tolerance.
UMAP.__init__ configures the neighborhood graph construction (n_neighbors, metric, build algorithm), embedding optimization (n_epochs, learning_rate, min_dist, spread), and initialization strategy.
TSNE.__init__ configures the probability distribution parameters (perplexity, early/late exaggeration), the approximation method (exact, Barnes-Hut, or FFT), and gradient descent parameters (learning_rate, pre_momentum, post_momentum).

Usage

Import and instantiate these classes when setting up a dimensionality reduction pipeline. Choose the class based on whether you need linear (PCA) or nonlinear (UMAP, t-SNE) reduction, then configure the hyperparameters for your specific dataset and goals.
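As a rough illustration of the selection step, the sketch below maps a short name to the import paths listed on this page and instantiates the chosen class lazily. The REDUCERS table and the make_reducer helper are hypothetical names introduced here for illustration and are not part of cuML; the import itself only succeeds where cuML and a supported GPU are available.

```python
import importlib

# Import paths as listed in the Import sections of this page.
REDUCERS = {
    "pca": ("cuml.decomposition", "PCA"),   # linear reduction
    "umap": ("cuml.manifold", "UMAP"),      # nonlinear reduction
    "tsne": ("cuml.manifold", "TSNE"),      # nonlinear reduction
}


def make_reducer(kind, **params):
    """Instantiate the estimator named by `kind` (hypothetical helper)."""
    key = kind.lower()
    if key not in REDUCERS:
        raise ValueError(
            f"unknown reducer {kind!r}; expected one of {sorted(REDUCERS)}"
        )
    module_name, class_name = REDUCERS[key]
    # Deferred import: cuML is only needed once an estimator is requested.
    module = importlib.import_module(module_name)
    # All hyperparameters are passed by keyword, matching the signatures.
    return getattr(module, class_name)(**params)
```

On a machine with cuML installed, a call such as make_reducer("umap", n_neighbors=15, min_dist=0.1) would return a configured UMAP estimator; unknown names fail early with a ValueError before any import is attempted.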

Code Reference

PCA.__init__

Source Location

Repository: cuML
File: python/cuml/cuml/decomposition/pca.pyx
Lines: 323-341

Signature

def __init__(
    self,
    *,
    copy=True,
    iterated_power=15,
    n_components=None,
    svd_solver='auto',
    tol=1e-7,
    verbose=False,
    whiten=False,
    output_type=None,
):
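The bare * in the signature above makes every hyperparameter keyword-only. The stand-in class below (a hypothetical PCAConfig, not the cuML class) copies that parameter list to show the effect without needing a GPU: positional arguments are rejected with a TypeError.

```python
class PCAConfig:
    """Stand-in that mirrors the keyword-only parameter list shown above."""

    def __init__(self, *, copy=True, iterated_power=15, n_components=None,
                 svd_solver='auto', tol=1e-7, verbose=False, whiten=False,
                 output_type=None):
        params = dict(locals())  # snapshot of the arguments received
        params.pop("self")
        self.params = params


def accepts_positional():
    """Return True only if the constructor accepts a positional argument."""
    try:
        PCAConfig(3)  # attempt to pass n_components positionally
    except TypeError:
        return False
    return True


# Keyword form, as required by the signature.
config = PCAConfig(n_components=3, whiten=True)
```

The same convention applies to the UMAP and TSNE signatures listed on this page, which also begin with a bare *.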

Import

from cuml import PCA
# or
from cuml.decomposition import PCA

UMAP.__init__

Source Location

Repository: cuML
File: python/cuml/cuml/manifold/umap/umap.pyx
Lines: 1052-1111

Signature

def __init__(
    self,
    *,
    n_neighbors=15,
    n_components=2,
    metric="euclidean",
    metric_kwds=None,
    n_epochs=None,
    learning_rate=1.0,
    min_dist=0.1,
    spread=1.0,
    set_op_mix_ratio=1.0,
    local_connectivity=1.0,
    repulsion_strength=1.0,
    negative_sample_rate=5,
    transform_queue_size=4.0,
    init="spectral",
    a=None,
    b=None,
    target_n_neighbors=-1,
    target_weight=0.5,
    target_metric="categorical",
    hash_input=False,
    random_state=None,
    precomputed_knn=None,
    callback=None,
    build_algo="auto",
    build_kwds=None,
    device_ids=None,
    verbose=False,
    output_type=None,
):

Import

from cuml import UMAP
# or
from cuml.manifold import UMAP

TSNE.__init__

Source Location

Repository: cuML
File: python/cuml/cuml/manifold/t_sne.pyx
Lines: 507-557

Signature

def __init__(
    self,
    *,
    n_components=2,
    perplexity=30.0,
    early_exaggeration=12.0,
    late_exaggeration=1.0,
    learning_rate=200.0,
    max_iter=1000,
    n_iter_without_progress=300,
    min_grad_norm=1e-07,
    metric='euclidean',
    metric_params=None,
    init='random',
    random_state=None,
    method='fft',
    angle=0.5,
    n_neighbors=90,
    perplexity_max_iter=100,
    exaggeration_iter=250,
    pre_momentum=0.5,
    post_momentum=0.8,
    learning_rate_method='adaptive',
    square_distances=True,
    precomputed_knn=None,
    verbose=False,
    output_type=None,
):

Import

from cuml import TSNE
# or
from cuml.manifold import TSNE

I/O Contract

PCA Inputs

Name	Type	Required	Description
copy	bool	No (default True)	If True, copies data then removes mean. False may overwrite input with mean-centered version.
iterated_power	int	No (default 15)	Number of iterations for the Jacobi solver. More iterations yield higher accuracy at slower speed.
n_components	int or None	No (default None)	Number of top K singular vectors to keep. If None, keeps min(n_samples, n_features).
svd_solver	str	No (default 'auto')	One of 'full', 'jacobi', or 'auto'. 'full' uses eigendecomposition; 'jacobi' is iterative and faster but less accurate.
tol	float	No (default 1e-7)	Convergence tolerance for Jacobi solver. Smaller values increase accuracy but slow convergence.
verbose	int or bool	No (default False)	Sets logging level.
whiten	bool	No (default False)	If True, divides components by singular values and multiplies by sqrt(n_samples) for unit variance.
output_type	str or None	No (default None)	Output data type format ('array', 'dataframe', 'cupy', 'numpy', etc.).
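The n_components and svd_solver rules in the table can be checked before an estimator is built. The following is a minimal sketch in plain Python; effective_n_components and check_svd_solver are hypothetical helpers, and the upper-bound check is an assumption of this sketch (it mirrors the usual PCA constraint) rather than documented cuML behavior.

```python
# Allowed solver names, taken from the svd_solver row of the table.
PCA_SVD_SOLVERS = {"full", "jacobi", "auto"}


def effective_n_components(n_components, n_samples, n_features):
    """Number of components kept, following the n_components row."""
    limit = min(n_samples, n_features)
    if n_components is None:
        return limit  # documented default: keep min(n_samples, n_features)
    if not 1 <= n_components <= limit:
        # Assumption of this sketch: values outside [1, limit] are rejected.
        raise ValueError(f"n_components must be between 1 and {limit}")
    return n_components


def check_svd_solver(svd_solver):
    """Reject solver names that the table does not list."""
    if svd_solver not in PCA_SVD_SOLVERS:
        raise ValueError(f"svd_solver must be one of {sorted(PCA_SVD_SOLVERS)}")
    return svd_solver
```

For a dataset with 1000 samples and 64 features, leaving n_components as None resolves to 64 under this rule.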

UMAP Inputs

Name	Type	Required	Description
n_neighbors	float	No (default 15)	Size of local neighborhood for manifold approximation. Larger values give a more global view of the data; typical range 2-100.
n_components	int	No (default 2)	Dimension of the target embedding space.
metric	str	No (default 'euclidean')	Distance metric. Supports 'euclidean', 'manhattan', 'cosine', 'correlation', 'chebyshev', 'minkowski', 'hamming', 'jaccard', and others.
metric_kwds	dict or None	No (default None)	Arguments for parameterized metrics (e.g., Minkowski p).
n_epochs	int or None	No (default None)	Number of training epochs. None selects automatically (200 for large, 500 for small datasets).
learning_rate	float	No (default 1.0)	Initial learning rate for embedding optimization.
min_dist	float	No (default 0.1)	Minimum distance between embedded points. Smaller values produce tighter clusters.
spread	float	No (default 1.0)	Effective scale of embedded points.
init	str or array-like	No (default 'spectral')	Initialization method: 'spectral', 'random', or an array-like of initial positions.
build_algo	str	No (default 'auto')	KNN build algorithm: 'auto', 'brute_force_knn', or 'nn_descent'.
random_state	int or None	No (default None)	Seed for reproducible embeddings.
hash_input	bool	No (default False)	Hash training input to return exact embeddings on transform of same data.
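A few of the UMAP constraints in the table lend themselves to a quick pre-flight check. The sketch below is one possible way to do that in plain Python; check_umap_params is a hypothetical helper, and it treats the 2-100 range for n_neighbors as a recommendation that produces a message rather than an error.

```python
# Allowed string values, taken from the build_algo and init rows.
UMAP_BUILD_ALGOS = {"auto", "brute_force_knn", "nn_descent"}
UMAP_INIT_STRINGS = {"spectral", "random"}


def check_umap_params(n_neighbors=15, n_components=2, init="spectral",
                      build_algo="auto"):
    """Return a list of human-readable issues; an empty list means none found."""
    issues = []
    if not 2 <= n_neighbors <= 100:
        issues.append(f"n_neighbors={n_neighbors} is outside the range 2-100")
    if n_components < 1:
        issues.append("n_components must be at least 1")
    # Only string values are checked; array-like init is passed through.
    if isinstance(init, str) and init not in UMAP_INIT_STRINGS:
        issues.append(f"init={init!r} is not 'spectral' or 'random'")
    if build_algo not in UMAP_BUILD_ALGOS:
        issues.append(f"build_algo={build_algo!r} is not a listed KNN algorithm")
    return issues
```

Calling check_umap_params() with the defaults returns an empty list, while a value such as build_algo="ivf" (not listed in the table) is reported.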

TSNE Inputs

Name	Type	Required	Description
n_components	int	No (default 2)	Output dimensionality. Currently only 2 is supported.
perplexity	float	No (default 30.0)	Related to the number of nearest neighbors. Larger datasets usually need larger values; typical range 5-50.
early_exaggeration	float	No (default 12.0)	Controls space between clusters during early optimization.
late_exaggeration	float	No (default 1.0)	Controls cluster separation after exaggeration_iter iterations (FFT only).
learning_rate	float	No (default 200.0)	Learning rate, typically between 10 and 1000.
max_iter	int	No (default 1000)	Maximum number of optimization iterations.
method	str	No (default 'fft')	Algorithm: 'fft' and 'barnes_hut' are fast approximations; 'exact' is more accurate but slower.
angle	float	No (default 0.5)	Speed/accuracy trade-off for Barnes-Hut. Range 0.0-1.0.
metric	str	No (default 'euclidean')	Distance metric. Supports 'euclidean', 'manhattan', 'cosine', 'correlation', 'chebyshev', 'minkowski', 'sqeuclidean'.
init	str	No (default 'random')	Initialization: 'random' or 'pca'.
random_state	int or None	No (default None)	Seed for initialization. Note: results are not fully deterministic.
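Several t-SNE parameters in the table only matter for one method: angle is described as a Barnes-Hut trade-off, and late_exaggeration is marked FFT only. The sketch below flags combinations that look inconsistent with those descriptions. check_tsne_params is a hypothetical helper, and the idea that a non-default value is ignored by the other methods is an inference from the table, not confirmed cuML behavior.

```python
# Allowed method names, taken from the method row of the table.
TSNE_METHODS = {"fft", "barnes_hut", "exact"}


def check_tsne_params(n_components=2, method="fft", angle=0.5,
                      late_exaggeration=1.0):
    """Return a list of human-readable issues; an empty list means none found."""
    issues = []
    if n_components != 2:
        issues.append("n_components: only 2 is currently supported")
    if method not in TSNE_METHODS:
        issues.append(f"method={method!r} is not one of {sorted(TSNE_METHODS)}")
    if not 0.0 <= angle <= 1.0:
        issues.append(f"angle={angle} is outside the range 0.0-1.0")
    # Inference from the table: angle is a Barnes-Hut parameter.
    if method != "barnes_hut" and angle != 0.5:
        issues.append("angle was changed but method is not 'barnes_hut'")
    # Inference from the table: late_exaggeration is FFT only.
    if method != "fft" and late_exaggeration != 1.0:
        issues.append("late_exaggeration was changed but method is not 'fft'")
    return issues
```

A check like this is cheap to run before constructing the estimator and makes silent no-op settings easier to spot.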

Outputs

Name	Type	Description
PCA instance	PCA	Configured PCA estimator ready for fitting.
UMAP instance	UMAP	Configured UMAP estimator ready for fitting.
TSNE instance	TSNE	Configured TSNE estimator ready for fitting.

Usage Examples

PCA Configuration

from cuml.decomposition import PCA

# Basic PCA with 3 components
pca = PCA(n_components=3)

# PCA with Jacobi solver for faster computation
pca_fast = PCA(n_components=50, svd_solver='jacobi', iterated_power=20, tol=1e-5)

# PCA with whitening for downstream linear models
pca_white = PCA(n_components=10, whiten=True)

UMAP Configuration

from cuml.manifold import UMAP

# Basic 2D visualization
umap = UMAP(n_components=2, n_neighbors=15, min_dist=0.1)

# Tighter clusters with more neighbors
umap_tight = UMAP(n_neighbors=50, min_dist=0.01, spread=1.0, n_epochs=500)

# Reproducible embedding with NN Descent for large data
umap_repro = UMAP(random_state=42, build_algo='nn_descent')

TSNE Configuration

from cuml.manifold import TSNE

# Basic 2D t-SNE with FFT approximation
tsne = TSNE(n_components=2, method='fft')

# Higher perplexity for larger datasets
tsne_large = TSNE(perplexity=50.0, learning_rate=500.0, max_iter=2000)

# Exact algorithm for small datasets
tsne_exact = TSNE(method='exact', perplexity=15.0, random_state=42)

Related Pages

Implements Principle

Principle:Rapidsai_Cuml_Dimensionality_Reduction_Algorithm_Selection

Requires Environment
