Implementation:Rapidsai Cuml PCA UMAP TSNE Configuration

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, GPU_Computing
Last Updated: 2026-02-08 00:00 GMT

Overview

This page documents the constructors used to configure the PCA, UMAP, and t-SNE dimensionality reduction estimators provided by the cuML library.

Description

These constructors initialize the three primary dimensionality reduction estimators in cuML. Each constructor accepts algorithm-specific hyperparameters that control the behavior of the reduction:

  • PCA.__init__ configures the number of components, SVD solver strategy (full eigendecomposition vs. iterative Jacobi), whitening, and convergence tolerance.
  • UMAP.__init__ configures the neighborhood graph construction (n_neighbors, metric, build algorithm), embedding optimization (n_epochs, learning_rate, min_dist, spread), and initialization strategy.
  • TSNE.__init__ configures the probability distribution parameters (perplexity, early/late exaggeration), the gradient computation method (exact, or the Barnes-Hut and FFT approximations), and gradient descent parameters (learning_rate, momentum).

Usage

Import and instantiate these classes when setting up a dimensionality reduction pipeline. Choose the class based on whether you need linear (PCA) or nonlinear (UMAP, t-SNE) reduction, then configure the hyperparameters for your specific dataset and goals.
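As a rough illustration of the selection step described above, the sketch below maps a reduction kind to the import path listed on this page and instantiates the class lazily. The names `REDUCERS`, `make_reducer`, and `is_linear` are hypothetical helpers introduced here, not part of cuML; only the module paths and class names come from the Import sections of this page.

```python
import importlib

# Hypothetical lookup table: reduction kind -> (module path, class name).
# The paths mirror the "Import" sections on this page.
REDUCERS = {
    "pca": ("cuml.decomposition", "PCA"),   # linear reduction
    "umap": ("cuml.manifold", "UMAP"),      # nonlinear reduction
    "tsne": ("cuml.manifold", "TSNE"),      # nonlinear reduction
}


def is_linear(kind):
    """Return True only for the linear method (PCA)."""
    return kind == "pca"


def make_reducer(kind, **hyperparams):
    """Import the estimator class on demand and pass hyperparameters by keyword.

    Raises ImportError if cuML is not installed and KeyError for an unknown kind.
    """
    module_path, class_name = REDUCERS[kind]
    module = importlib.import_module(module_path)
    return getattr(module, class_name)(**hyperparams)
```

With cuML installed, `make_reducer("pca", n_components=3)` would be equivalent to `PCA(n_components=3)`. Deferring the import keeps the table itself usable on machines without a GPU.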

Code Reference

PCA.__init__

Source Location

  • Repository: cuML
  • File: python/cuml/cuml/decomposition/pca.pyx
  • Lines: 323-341

Signature

def __init__(
    self,
    *,
    copy=True,
    iterated_power=15,
    n_components=None,
    svd_solver='auto',
    tol=1e-7,
    verbose=False,
    whiten=False,
    output_type=None,
):
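The bare `*` in the signature above makes every hyperparameter keyword-only. The sketch below demonstrates this on a stand-in function that copies the parameter list shown above; `pca_init_stub` is a hypothetical placeholder, not the cuML constructor, so the result applies to the real class only if the installed release matches this signature.

```python
import inspect


def pca_init_stub(self, *, copy=True, iterated_power=15, n_components=None,
                  svd_solver='auto', tol=1e-7, verbose=False, whiten=False,
                  output_type=None):
    """Stand-in with the same parameter list as the signature above; does nothing."""


params = inspect.signature(pca_init_stub).parameters

# Collect every parameter that can only be passed by name.
keyword_only = [name for name, p in params.items()
                if p.kind is inspect.Parameter.KEYWORD_ONLY]
defaults = {name: params[name].default for name in keyword_only}

# Passing n_components positionally is rejected by Python itself.
try:
    pca_init_stub(None, 3)
    positional_ok = True
except TypeError:
    positional_ok = False
```

In practice this means writing `PCA(n_components=3)` rather than `PCA(3)`.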

Import

from cuml import PCA
# or
from cuml.decomposition import PCA

UMAP.__init__

Source Location

  • Repository: cuML
  • File: python/cuml/cuml/manifold/umap/umap.pyx
  • Lines: 1052-1111

Signature

def __init__(
    self,
    *,
    n_neighbors=15,
    n_components=2,
    metric="euclidean",
    metric_kwds=None,
    n_epochs=None,
    learning_rate=1.0,
    min_dist=0.1,
    spread=1.0,
    set_op_mix_ratio=1.0,
    local_connectivity=1.0,
    repulsion_strength=1.0,
    negative_sample_rate=5,
    transform_queue_size=4.0,
    init="spectral",
    a=None,
    b=None,
    target_n_neighbors=-1,
    target_weight=0.5,
    target_metric="categorical",
    hash_input=False,
    random_state=None,
    precomputed_knn=None,
    callback=None,
    build_algo="auto",
    build_kwds=None,
    device_ids=None,
    verbose=False,
    output_type=None,
):

Import

from cuml import UMAP
# or
from cuml.manifold import UMAP

TSNE.__init__

Source Location

  • Repository: cuML
  • File: python/cuml/cuml/manifold/t_sne.pyx
  • Lines: 507-557

Signature

def __init__(
    self,
    *,
    n_components=2,
    perplexity=30.0,
    early_exaggeration=12.0,
    late_exaggeration=1.0,
    learning_rate=200.0,
    max_iter=1000,
    n_iter_without_progress=300,
    min_grad_norm=1e-07,
    metric='euclidean',
    metric_params=None,
    init='random',
    random_state=None,
    method='fft',
    angle=0.5,
    n_neighbors=90,
    perplexity_max_iter=100,
    exaggeration_iter=250,
    pre_momentum=0.5,
    post_momentum=0.8,
    learning_rate_method='adaptive',
    square_distances=True,
    precomputed_knn=None,
    verbose=False,
    output_type=None,
):

Import

from cuml import TSNE
# or
from cuml.manifold import TSNE

I/O Contract

PCA Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| copy | bool | No (default True) | If True, copies data then removes mean. False may overwrite input with mean-centered version. |
| iterated_power | int | No (default 15) | Number of iterations for the Jacobi solver. More iterations yield higher accuracy at slower speed. |
| n_components | int or None | No (default None) | Number of top K singular vectors to keep. If None, keeps min(n_samples, n_features). |
| svd_solver | str | No (default 'auto') | One of 'full', 'jacobi', or 'auto'. 'full' uses eigendecomposition; 'jacobi' is iterative and faster but less accurate. |
| tol | float | No (default 1e-7) | Convergence tolerance for Jacobi solver. Smaller values increase accuracy but slow convergence. |
| verbose | int or bool | No (default False) | Sets logging level. |
| whiten | bool | No (default False) | If True, divides components by singular values and multiplies by sqrt(n_samples) for unit variance. |
| output_type | str or None | No (default None) | Output data type format ('array', 'dataframe', 'cupy', 'numpy', etc.). |
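To see why the whiten scaling in the table yields unit variance, the short derivation below may help. The symbols are assumptions introduced here: $X_c$ is the mean-centered $n \times d$ data matrix, $U S V^\top$ its SVD, and the subscript $k$ keeps the top $k$ components with nonzero singular values. Variance is taken as the population variance (dividing by $n$); whether cuML normalizes by $n$ or $n-1$ internally is not stated on this page.

```latex
\begin{align}
X_c &= U S V^\top, \qquad \mathbf{1}^\top X_c = 0 \\
Z &= X_c V_k = U_k S_k \\
Z_w &= \sqrt{n}\, Z S_k^{-1} = \sqrt{n}\, U_k \\
\mathbf{1}^\top U_k &= \mathbf{1}^\top X_c V_k S_k^{-1} = 0 \\
\operatorname{Var}\big[(Z_w)_{\cdot j}\big]
  &= \frac{1}{n} \sum_{i=1}^{n} \big(\sqrt{n}\, U_{ij}\big)^2
   = \sum_{i=1}^{n} U_{ij}^2 = 1
\end{align}
```

Each whitened column has zero mean (fourth line) and unit length scaled by $\sqrt{n}$, so its variance is exactly 1 under these assumptions.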

UMAP Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| n_neighbors | float | No (default 15) | Size of local neighborhood for manifold approximation. Range 2-100. |
| n_components | int | No (default 2) | Dimension of the target embedding space. |
| metric | str | No (default 'euclidean') | Distance metric. Supports 'euclidean', 'manhattan', 'cosine', 'correlation', 'chebyshev', 'minkowski', 'hamming', 'jaccard', and others. |
| metric_kwds | dict or None | No (default None) | Arguments for parameterized metrics (e.g., Minkowski p). |
| n_epochs | int or None | No (default None) | Number of training epochs. None selects automatically (200 for large, 500 for small datasets). |
| learning_rate | float | No (default 1.0) | Initial learning rate for embedding optimization. |
| min_dist | float | No (default 0.1) | Minimum distance between embedded points. Smaller values produce tighter clusters. |
| spread | float | No (default 1.0) | Effective scale of embedded points. |
| init | str or array-like | No (default 'spectral') | Initialization method: 'spectral', 'random', or an array-like of initial positions. |
| build_algo | str | No (default 'auto') | KNN build algorithm: 'auto', 'brute_force_knn', or 'nn_descent'. |
| random_state | int or None | No (default None) | Seed for reproducible embeddings. |
| hash_input | bool | No (default False) | Hash training input to return exact embeddings on transform of same data. |
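The ranges and allowed values in the UMAP table can be checked before an estimator is constructed. The sketch below is a hypothetical pre-flight helper (`check_umap_params` is not a cuML function) that encodes only the constraints stated in the table; cuML may enforce different or additional rules.

```python
# Allowed values taken from the table above.
VALID_BUILD_ALGOS = {"auto", "brute_force_knn", "nn_descent"}
VALID_INIT_NAMES = {"spectral", "random"}


def check_umap_params(params):
    """Return a list of problems found in a dict of UMAP keyword arguments.

    An empty list means none of the documented constraints were violated.
    Missing keys fall back to the defaults listed in the table.
    """
    problems = []

    n_neighbors = params.get("n_neighbors", 15)
    if not 2 <= n_neighbors <= 100:
        problems.append(f"n_neighbors={n_neighbors} is outside the documented range 2-100")

    build_algo = params.get("build_algo", "auto")
    if build_algo not in VALID_BUILD_ALGOS:
        problems.append(f"build_algo={build_algo!r} is not one of {sorted(VALID_BUILD_ALGOS)}")

    init = params.get("init", "spectral")
    # Only string values are checked; array-like initial positions pass through.
    if isinstance(init, str) and init not in VALID_INIT_NAMES:
        problems.append(f"init={init!r} is not 'spectral' or 'random'")

    return problems
```

A typical use would be `problems = check_umap_params(kwargs)` followed by `UMAP(**kwargs)` only when the list is empty.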

TSNE Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| n_components | int | No (default 2) | Output dimensionality. Currently only 2 is supported. |
| perplexity | float | No (default 30.0) | Related to number of nearest neighbors. Larger values for larger datasets. Range 5-50. |
| early_exaggeration | float | No (default 12.0) | Controls space between clusters during early optimization. |
| late_exaggeration | float | No (default 1.0) | Controls cluster separation after exaggeration_iter iterations (FFT only). |
| learning_rate | float | No (default 200.0) | Learning rate, typically between 10 and 1000. |
| max_iter | int | No (default 1000) | Maximum number of optimization iterations. |
| method | str | No (default 'fft') | Algorithm: 'fft' (fast), 'barnes_hut' (fast approximation), or 'exact' (accurate but slow). |
| angle | float | No (default 0.5) | Speed/accuracy trade-off for Barnes-Hut. Range 0.0-1.0. |
| metric | str | No (default 'euclidean') | Distance metric. Supports 'euclidean', 'manhattan', 'cosine', 'correlation', 'chebyshev', 'minkowski', 'sqeuclidean'. |
| init | str | No (default 'random') | Initialization: 'random' or 'pca'. |
| random_state | int or None | No (default None) | Seed for initialization. Note: results are not fully deterministic. |
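Two of the t-SNE parameters in the table are tied to a specific method: angle is described as a Barnes-Hut trade-off and late_exaggeration as FFT only. The sketch below flags settings that, going by those descriptions, would have no effect for the chosen method. `METHOD_SPECIFIC` and `ignored_params` are hypothetical names used for illustration and do not reflect how cuML validates its arguments.

```python
# Parameter -> the only method the table associates it with.
METHOD_SPECIFIC = {
    "angle": "barnes_hut",
    "late_exaggeration": "fft",
}


def ignored_params(params):
    """List explicitly set parameters that the table ties to a different method.

    The method defaults to 'fft', matching the constructor signature.
    """
    method = params.get("method", "fft")
    return sorted(
        name
        for name, required_method in METHOD_SPECIFIC.items()
        if name in params and required_method != method
    )
```

For example, setting angle together with method='exact' would be reported, which can catch tuning effort spent on a parameter that the selected algorithm does not read.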

Outputs

| Name | Type | Description |
|---|---|---|
| PCA instance | PCA | Configured PCA estimator ready for fitting. |
| UMAP instance | UMAP | Configured UMAP estimator ready for fitting. |
| TSNE instance | TSNE | Configured TSNE estimator ready for fitting. |

Usage Examples

PCA Configuration

from cuml.decomposition import PCA

# Basic PCA with 3 components
pca = PCA(n_components=3)

# PCA with Jacobi solver for faster computation
pca_fast = PCA(n_components=50, svd_solver='jacobi', iterated_power=20, tol=1e-5)

# PCA with whitening for downstream linear models
pca_white = PCA(n_components=10, whiten=True)

UMAP Configuration

from cuml.manifold import UMAP

# Basic 2D visualization
umap = UMAP(n_components=2, n_neighbors=15, min_dist=0.1)

# Tighter clusters (small min_dist) with a larger neighborhood (n_neighbors=50)
umap_tight = UMAP(n_neighbors=50, min_dist=0.01, spread=1.0, n_epochs=500)

# Reproducible embedding with NN Descent for large data
umap_repro = UMAP(random_state=42, build_algo='nn_descent')

TSNE Configuration

from cuml.manifold import TSNE

# Basic 2D t-SNE with FFT approximation
tsne = TSNE(n_components=2, method='fft')

# Higher perplexity for larger datasets
tsne_large = TSNE(perplexity=50.0, learning_rate=500.0, max_iter=2000)

# Exact algorithm for small datasets
tsne_exact = TSNE(method='exact', perplexity=15.0, random_state=42)
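The configurations above only construct estimators. A minimal end-to-end sketch, assuming cuML, NumPy, and a compatible GPU are available, might fit each estimator on the same random matrix and compare the embedding shapes. The `configs` dictionary, the `run_all` helper, and the 500 x 20 random input are illustrative assumptions; the import is guarded so the snippet also runs where cuML is missing.

```python
# Keyword arguments reused from the basic examples above.
configs = {
    "PCA": {"n_components": 3},
    "UMAP": {"n_components": 2, "n_neighbors": 15, "min_dist": 0.1},
    "TSNE": {"n_components": 2, "method": "fft"},
}


def run_all(X):
    """Fit each estimator on X and return the embedding shapes (needs cuML and a GPU)."""
    from cuml.decomposition import PCA
    from cuml.manifold import TSNE, UMAP

    classes = {"PCA": PCA, "UMAP": UMAP, "TSNE": TSNE}
    return {
        name: classes[name](**kwargs).fit_transform(X).shape
        for name, kwargs in configs.items()
    }


try:
    import numpy as np

    # Synthetic float32 data: 500 samples with 20 features.
    X = np.random.default_rng(0).random((500, 20), dtype=np.float32)
    shapes = run_all(X)
except ImportError:
    shapes = None  # cuML or NumPy is not installed, so nothing was fitted
```

When the fit succeeds, each shape should have 500 rows and as many columns as the configured n_components.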

Related Pages

Implements Principle

Requires Environment
