Implementation:Rapidsai Cuml PCA UMAP TSNE Configuration
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, GPU_Computing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool for configuring PCA, UMAP, and t-SNE dimensionality reduction algorithms provided by the cuML library.
Description
These constructors initialize the three primary dimensionality reduction estimators in cuML. Each constructor accepts algorithm-specific hyperparameters that control the behavior of the reduction:
- PCA.__init__ configures the number of components, SVD solver strategy (full eigendecomposition vs. iterative Jacobi), whitening, and convergence tolerance.
- UMAP.__init__ configures the neighborhood graph construction (n_neighbors, metric, build algorithm), embedding optimization (n_epochs, learning_rate, min_dist, spread), and initialization strategy.
- TSNE.__init__ configures the probability distribution parameters (perplexity, early/late exaggeration), the gradient computation method (exact, Barnes-Hut, or FFT), and gradient descent parameters (learning_rate, pre_momentum, post_momentum).
Usage
Import and instantiate these classes when setting up a dimensionality reduction pipeline. Choose the class based on whether you need linear (PCA) or nonlinear (UMAP, t-SNE) reduction, then configure the hyperparameters for your specific dataset and goals.
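The choice described above can be sketched as a small lookup from a reduction goal to an estimator and its import path. The helper name `pick_reducer` and the goal labels are hypothetical and exist only for illustration; the import paths are the ones listed under Code Reference. The sketch does not import cuML, so it runs without a GPU.

```python
# Hypothetical helper: maps a reduction goal to a cuML estimator name and
# the module it is imported from (import paths as listed on this page).
REDUCERS = {
    "linear": ("cuml.decomposition", "PCA"),
    "nonlinear_umap": ("cuml.manifold", "UMAP"),
    "nonlinear_tsne": ("cuml.manifold", "TSNE"),
}


def pick_reducer(goal, **hyperparams):
    """Return (module, class_name, kwargs) for the requested goal."""
    if goal not in REDUCERS:
        raise ValueError(f"unknown goal {goal!r}; expected one of {sorted(REDUCERS)}")
    module, class_name = REDUCERS[goal]
    return module, class_name, dict(hyperparams)


module, class_name, kwargs = pick_reducer("linear", n_components=3)
print(f"from {module} import {class_name}; {class_name}(**{kwargs})")
```

On a machine with cuML installed, the returned tuple could be resolved with `importlib.import_module(module)` and `getattr`, and the class instantiated with `**kwargs`.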
Code Reference
PCA.__init__
Source Location
- Repository: cuML
- File: python/cuml/cuml/decomposition/pca.pyx
- Lines: 323-341
Signature
def __init__(
    self,
    *,
    copy=True,
    iterated_power=15,
    n_components=None,
    svd_solver='auto',
    tol=1e-7,
    verbose=False,
    whiten=False,
    output_type=None,
):
Import
from cuml import PCA
# or
from cuml.decomposition import PCA
UMAP.__init__
Source Location
- Repository: cuML
- File: python/cuml/cuml/manifold/umap/umap.pyx
- Lines: 1052-1111
Signature
def __init__(
    self,
    *,
    n_neighbors=15,
    n_components=2,
    metric="euclidean",
    metric_kwds=None,
    n_epochs=None,
    learning_rate=1.0,
    min_dist=0.1,
    spread=1.0,
    set_op_mix_ratio=1.0,
    local_connectivity=1.0,
    repulsion_strength=1.0,
    negative_sample_rate=5,
    transform_queue_size=4.0,
    init="spectral",
    a=None,
    b=None,
    target_n_neighbors=-1,
    target_weight=0.5,
    target_metric="categorical",
    hash_input=False,
    random_state=None,
    precomputed_knn=None,
    callback=None,
    build_algo="auto",
    build_kwds=None,
    device_ids=None,
    verbose=False,
    output_type=None,
):
Import
from cuml import UMAP
# or
from cuml.manifold import UMAP
TSNE.__init__
Source Location
- Repository: cuML
- File: python/cuml/cuml/manifold/t_sne.pyx
- Lines: 507-557
Signature
def __init__(
    self,
    *,
    n_components=2,
    perplexity=30.0,
    early_exaggeration=12.0,
    late_exaggeration=1.0,
    learning_rate=200.0,
    max_iter=1000,
    n_iter_without_progress=300,
    min_grad_norm=1e-07,
    metric='euclidean',
    metric_params=None,
    init='random',
    random_state=None,
    method='fft',
    angle=0.5,
    n_neighbors=90,
    perplexity_max_iter=100,
    exaggeration_iter=250,
    pre_momentum=0.5,
    post_momentum=0.8,
    learning_rate_method='adaptive',
    square_distances=True,
    precomputed_knn=None,
    verbose=False,
    output_type=None,
):
Import
from cuml import TSNE
# or
from cuml.manifold import TSNE
I/O Contract
PCA Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| copy | bool | No (default True) | If True, copies the data before removing the mean. If False, the input may be overwritten with its mean-centered version. |
| iterated_power | int | No (default 15) | Number of iterations for the Jacobi solver. More iterations yield higher accuracy at slower speed. |
| n_components | int or None | No (default None) | Number of top K singular vectors to keep. If None, keeps min(n_samples, n_features). |
| svd_solver | str | No (default 'auto') | One of 'full', 'jacobi', or 'auto'. 'full' uses eigendecomposition; 'jacobi' is iterative and faster but less accurate. |
| tol | float | No (default 1e-7) | Convergence tolerance for Jacobi solver. Smaller values increase accuracy but slow convergence. |
| verbose | int or bool | No (default False) | Sets logging level. |
| whiten | bool | No (default False) | If True, divides components by singular values and multiplies by sqrt(n_samples) for unit variance. |
| output_type | str or None | No (default None) | Output data type format ('array', 'dataframe', 'cupy', 'numpy', etc.). |
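
As a rough illustration of the PCA table above, the following sketch checks a set of keyword arguments before they are handed to the constructor. `check_pca_kwargs` is a hypothetical helper, not part of cuML. The allowed solver names and the `min(n_samples, n_features)` fallback come from the table; the upper bound on `n_components` and the positivity checks are assumptions.

```python
ALLOWED_SVD_SOLVERS = {"auto", "full", "jacobi"}  # values listed in the table


def check_pca_kwargs(n_samples, n_features, n_components=None,
                     svd_solver="auto", iterated_power=15, tol=1e-7):
    """Validate PCA hyperparameters and resolve n_components=None."""
    if svd_solver not in ALLOWED_SVD_SOLVERS:
        raise ValueError(f"svd_solver must be one of {sorted(ALLOWED_SVD_SOLVERS)}")
    limit = min(n_samples, n_features)
    if n_components is None:
        # Table: None keeps min(n_samples, n_features) components
        n_components = limit
    elif not 1 <= n_components <= limit:
        # Assumption: the same limit also bounds an explicit value
        raise ValueError(f"n_components must be in [1, {limit}]")
    # Assumption: non-positive iteration counts or tolerances are never useful
    if iterated_power < 1 or tol <= 0:
        raise ValueError("iterated_power must be >= 1 and tol must be > 0")
    return {"n_components": n_components, "svd_solver": svd_solver,
            "iterated_power": iterated_power, "tol": tol}


kwargs = check_pca_kwargs(n_samples=1000, n_features=64)
print(kwargs["n_components"])
```

The returned dictionary is meant to be passed on as `PCA(**kwargs)`; `iterated_power` and `tol` only matter when the Jacobi solver is selected.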
UMAP Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n_neighbors | float | No (default 15) | Size of the local neighborhood used for manifold approximation. Values between 2 and 100 are typical. |
| n_components | int | No (default 2) | Dimension of the target embedding space. |
| metric | str | No (default 'euclidean') | Distance metric. Supports 'euclidean', 'manhattan', 'cosine', 'correlation', 'chebyshev', 'minkowski', 'hamming', 'jaccard', and others. |
| metric_kwds | dict or None | No (default None) | Arguments for parameterized metrics (e.g., Minkowski p). |
| n_epochs | int or None | No (default None) | Number of training epochs. None selects automatically (200 for large, 500 for small datasets). |
| learning_rate | float | No (default 1.0) | Initial learning rate for embedding optimization. |
| min_dist | float | No (default 0.1) | Minimum distance between embedded points. Smaller values produce tighter clusters. |
| spread | float | No (default 1.0) | Effective scale of embedded points. |
| init | str or array-like | No (default 'spectral') | Initialization method: 'spectral', 'random', or an array-like of initial positions. |
| build_algo | str | No (default 'auto') | KNN build algorithm: 'auto', 'brute_force_knn', or 'nn_descent'. |
| random_state | int or None | No (default None) | Seed for reproducible embeddings. |
| hash_input | bool | No (default False) | Hash training input to return exact embeddings on transform of same data. |
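
The automatic `n_epochs` selection and the `n_neighbors` range from the UMAP table can be sketched as follows. Both function names are hypothetical. The 200 and 500 epoch counts come from the table; the 10,000-sample cutoff between "small" and "large" is an assumption and may not match the value cuML uses internally.

```python
def resolve_n_epochs(n_epochs, n_samples, large_cutoff=10_000):
    """Pick an epoch count when n_epochs is None.

    The 200/500 values come from the table; large_cutoff is an assumed
    threshold for what counts as a "large" dataset.
    """
    if n_epochs is not None:
        return n_epochs
    return 200 if n_samples > large_cutoff else 500


def check_n_neighbors(n_neighbors, low=2, high=100):
    """Return True when n_neighbors lies in the 2-100 range from the table."""
    return low <= n_neighbors <= high


print(resolve_n_epochs(None, n_samples=2_000))
print(resolve_n_epochs(None, n_samples=1_000_000))
print(check_n_neighbors(15))
```

An explicit `n_epochs` always wins over the automatic choice, which is why the examples on this page can set `n_epochs=500` regardless of dataset size.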
TSNE Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n_components | int | No (default 2) | Output dimensionality. Currently only 2 is supported. |
| perplexity | float | No (default 30.0) | Related to the number of nearest neighbors considered. Larger datasets usually call for larger values; 5 to 50 is the typical range. |
| early_exaggeration | float | No (default 12.0) | Controls space between clusters during early optimization. |
| late_exaggeration | float | No (default 1.0) | Controls cluster separation after exaggeration_iter iterations (FFT only). |
| learning_rate | float | No (default 200.0) | Learning rate, typically between 10 and 1000. |
| max_iter | int | No (default 1000) | Maximum number of optimization iterations. |
| method | str | No (default 'fft') | Algorithm: 'fft' (fast), 'barnes_hut' (fast approximation), or 'exact' (accurate but slow). |
| angle | float | No (default 0.5) | Speed/accuracy trade-off for Barnes-Hut. Range 0.0-1.0. |
| metric | str | No (default 'euclidean') | Distance metric. Supports 'euclidean', 'manhattan', 'cosine', 'correlation', 'chebyshev', 'minkowski', 'sqeuclidean'. |
| init | str | No (default 'random') | Initialization: 'random' or 'pca'. |
| random_state | int or None | No (default None) | Seed for initialization. Note: results are not fully deterministic. |
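
The constraints in the t-SNE table can be collected into a small pre-flight check. `check_tsne_kwargs` is a hypothetical helper, not a cuML function. It reports problems as messages instead of raising, on the assumption that an out-of-range perplexity deserves a warning and is not necessarily an error.

```python
ALLOWED_METHODS = {"fft", "barnes_hut", "exact"}  # values listed in the table


def check_tsne_kwargs(n_components=2, perplexity=30.0, method="fft", angle=0.5):
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    if n_components != 2:
        problems.append("n_components: only 2 is currently supported")
    if method not in ALLOWED_METHODS:
        problems.append(f"method: expected one of {sorted(ALLOWED_METHODS)}")
    # angle is only consulted by the Barnes-Hut method
    if method == "barnes_hut" and not 0.0 <= angle <= 1.0:
        problems.append("angle: expected a value in [0.0, 1.0] for Barnes-Hut")
    if not 5.0 <= perplexity <= 50.0:
        problems.append("perplexity: outside the typical 5-50 range")
    return problems


for message in check_tsne_kwargs(n_components=3, method="barnes_hut", angle=1.5):
    print(message)
```

An empty list means the arguments are consistent with the table and can be passed to `TSNE(...)` unchanged.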
Outputs
| Name | Type | Description |
|---|---|---|
| PCA instance | PCA | Configured PCA estimator ready for fitting. |
| UMAP instance | UMAP | Configured UMAP estimator ready for fitting. |
| TSNE instance | TSNE | Configured TSNE estimator ready for fitting. |
Usage Examples
PCA Configuration
from cuml.decomposition import PCA
# Basic PCA with 3 components
pca = PCA(n_components=3)
# PCA with Jacobi solver for faster computation
pca_fast = PCA(n_components=50, svd_solver='jacobi', iterated_power=20, tol=1e-5)
# PCA with whitening for downstream linear models
pca_white = PCA(n_components=10, whiten=True)
UMAP Configuration
from cuml.manifold import UMAP
# Basic 2D visualization
umap = UMAP(n_components=2, n_neighbors=15, min_dist=0.1)
# Tighter clusters with more neighbors
umap_tight = UMAP(n_neighbors=50, min_dist=0.01, spread=1.0, n_epochs=500)
# Reproducible embedding with NN Descent for large data
umap_repro = UMAP(random_state=42, build_algo='nn_descent')
TSNE Configuration
from cuml.manifold import TSNE
# Basic 2D t-SNE with FFT approximation
tsne = TSNE(n_components=2, method='fft')
# Higher perplexity for larger datasets
tsne_large = TSNE(perplexity=50.0, learning_rate=500.0, max_iter=2000)
# Exact algorithm for small datasets
tsne_exact = TSNE(method='exact', perplexity=15.0, random_state=42)
Related Pages
Implements Principle
Requires Environment