Implementation:Scikit learn Scikit learn TSNE
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Dimensionality Reduction |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for T-distributed Stochastic Neighbor Embedding (t-SNE) visualization provided by scikit-learn.
Description
This module implements the t-SNE algorithm for visualizing high-dimensional data in 2D or 3D space. t-SNE converts pairwise similarities between data points into joint probability distributions and minimizes the KL divergence between the high-dimensional and low-dimensional distributions. The module supports both the exact O(N^2) algorithm and the Barnes-Hut O(N log N) approximation. It includes helper functions for computing joint probabilities, KL divergence gradients, and integration with PCA for initialization.
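The objective described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the module's implementation: the helper names (`gaussian_joint_p`, `student_t_joint_q`, `kl_divergence`) are hypothetical, and a single fixed Gaussian bandwidth `sigma` stands in for the per-point bandwidth search that the perplexity parameter normally drives.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def gaussian_joint_p(X, sigma=1.0):
    """Joint probabilities p_ij in the input space.

    Assumption: one fixed bandwidth for all points, instead of the
    per-point binary search on perplexity used by real t-SNE.
    """
    d2 = squareform(pdist(X, "sqeuclidean"))
    P = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(P, 0.0)                 # a point is not its own neighbor
    P /= P.sum(axis=1, keepdims=True)        # conditional p_{j|i}, rows sum to 1
    return (P + P.T) / (2.0 * X.shape[0])    # symmetrize into joint p_ij


def student_t_joint_q(Y):
    """Joint probabilities q_ij in the embedding (Student-t, 1 d.o.f.)."""
    d2 = squareform(pdist(Y, "sqeuclidean"))
    Q = 1.0 / (1.0 + d2)
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), summed over pairs with non-zero p_ij."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))   # toy high-dimensional data
Y = rng.normal(size=(50, 2))   # a random, unoptimized 2D embedding

P = gaussian_joint_p(X)
Q = student_t_joint_q(Y)
kl = kl_divergence(P, Q)
print("KL(P || Q) for a random embedding:", kl)
```

t-SNE moves the rows of `Y` by gradient descent so that this quantity decreases; the sketch only evaluates the cost for a fixed `Y`.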
Usage
Use t-SNE primarily for data visualization and exploratory analysis of high-dimensional datasets. It excels at revealing local structure and clusters. Note that t-SNE is non-parametric: the estimator provides fit and fit_transform but no transform method, so new data points cannot be embedded without refitting.
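The practical consequence of the non-parametric design can be checked directly. The following sketch uses the Iris dataset purely as a small stand-in; the variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X, _ = load_iris(return_X_y=True)

tsne = TSNE(n_components=2, random_state=0)

# TSNE offers fit() and fit_transform(), but no transform() for unseen data.
has_transform = hasattr(tsne, "transform")
print("has transform():", has_transform)

# To embed additional points, refit on the full dataset that includes them.
X_all = tsne.fit_transform(X)
print("Embedded shape:", X_all.shape)
```

Because every call to `fit_transform` optimizes a fresh embedding, coordinates from two separate fits are not directly comparable, even with the same `random_state` if the input data differ.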
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/manifold/_t_sne.py
Signature
class TSNE(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator):
    """T-distributed Stochastic Neighbor Embedding."""

    def __init__(
        self,
        n_components=2,
        *,
        perplexity=30.0,
        early_exaggeration=12.0,
        learning_rate="auto",
        max_iter=None,
        n_iter_without_progress=300,
        min_grad_norm=1e-7,
        metric="euclidean",
        metric_params=None,
        init="pca",
        verbose=0,
        random_state=None,
        method="barnes_hut",
        angle=0.5,
        n_jobs=None,
    ):
        ...
Import
from sklearn.manifold import TSNE
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | array-like of shape (n_samples, n_features) or (n_samples, n_samples) | Yes | Input data, or a precomputed square distance matrix when metric='precomputed' |
| n_components | int | No | Target dimensionality, typically 2 or 3 (default: 2) |
| perplexity | float | No | Related to the number of nearest neighbors considered for each point; larger datasets usually require a larger perplexity, and the value must be less than n_samples (default: 30.0) |
| early_exaggeration | float | No | Controls cluster tightness in early iterations (default: 12.0) |
| learning_rate | float or str | No | Learning rate for optimization; 'auto' recommended (default: 'auto') |
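| method | str | No | Gradient algorithm: 'barnes_hut' (O(N log N) approximation, requires n_components < 4) or 'exact' (O(N^2)) (default: 'barnes_hut') |
| metric | str or callable | No | Distance metric; 'precomputed' treats X as a distance matrix (default: 'euclidean') |
| init | str or ndarray | No | Initialization: 'pca', 'random', or an array of shape (n_samples, n_components) (default: 'pca') |
| random_state | int, RandomState instance, or None | No | Seed or generator controlling the random initialization, for reproducibility (default: None) |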
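As an illustration of the second accepted shape for X, the sketch below passes a precomputed distance matrix. The choice of Manhattan distance and of synthetic blob data is arbitrary; note that `init="pca"` cannot be combined with `metric="precomputed"`, so `init="random"` is used here.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

# Synthetic data: 120 points in 10 dimensions, 3 clusters.
X, _ = make_blobs(n_samples=120, n_features=10, centers=3, random_state=0)

# Square (n_samples, n_samples) matrix of non-negative distances.
D = pairwise_distances(X, metric="manhattan")

tsne = TSNE(
    n_components=2,
    metric="precomputed",  # interpret the input as distances, not features
    init="random",         # PCA initialization needs raw features
    perplexity=20.0,       # must stay below n_samples
    random_state=0,
)
emb = tsne.fit_transform(D)
print("Distance matrix shape:", D.shape)
print("Embedded shape:", emb.shape)
```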
Outputs
| Name | Type | Description |
|---|---|---|
| embedding_ | ndarray of shape (n_samples, n_components) | Low-dimensional embedding |
| kl_divergence_ | float | Final KL divergence between distributions |
| n_features_in_ | int | Number of features in input data |
| n_iter_ | int | Number of iterations run |
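The attributes in the table above become available after fitting. A minimal sketch, again using Iris as a small placeholder dataset:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X, _ = load_iris(return_X_y=True)

# fit() returns the estimator itself; results are stored as attributes.
tsne = TSNE(n_components=2, random_state=0).fit(X)

print("embedding_ shape:", tsne.embedding_.shape)
print("kl_divergence_:", tsne.kl_divergence_)
print("n_features_in_:", tsne.n_features_in_)
print("n_iter_:", tsne.n_iter_)
```

`fit_transform(X)` returns the same array that is stored in `embedding_`, so either access pattern works.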
Usage Examples
Basic Usage
from sklearn.manifold import TSNE
from sklearn.datasets import load_digits
# Load digits dataset
X, y = load_digits(return_X_y=True)
# Apply t-SNE
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_embedded = tsne.fit_transform(X)
print("Embedded shape:", X_embedded.shape)
print("KL divergence:", tsne.kl_divergence_)
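Comparing Methods

The two values of the method parameter can be tried side by side. The sketch below restricts the digits data to its first 200 samples (an arbitrary choice to keep the exact O(N^2) run short) and stores each result in a hypothetical `results` dictionary.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X_small = X[:200]  # small subset so the exact method stays fast

results = {}
for method in ("barnes_hut", "exact"):
    tsne = TSNE(n_components=2, perplexity=30.0, method=method, random_state=42)
    emb = tsne.fit_transform(X_small)
    results[method] = (emb.shape, tsne.kl_divergence_)
    print(f"{method:>10}: shape={emb.shape}, KL={tsne.kl_divergence_:.3f}")
```

The embeddings from the two methods generally differ in their coordinates; what should be compared is the neighborhood structure they reveal, not point positions.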