Implementation:Scikit learn Scikit learn TSNE

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Machine Learning, Dimensionality Reduction
Last Updated	2026-02-08 15:00 GMT

Overview

Concrete tool for T-distributed Stochastic Neighbor Embedding (t-SNE) visualization provided by scikit-learn.

Description

This module implements the t-SNE algorithm for visualizing high-dimensional data in 2D or 3D space. t-SNE converts pairwise similarities between data points into joint probability distributions and minimizes the Kullback-Leibler (KL) divergence between the high-dimensional and low-dimensional distributions. The module supports both the exact O(N^2) algorithm and the Barnes-Hut O(N log N) approximation. It includes helper functions for computing joint probabilities and KL divergence gradients, and it can use PCA to initialize the embedding.
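The objective described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the scikit-learn implementation: the function names are hypothetical, and a single fixed Gaussian bandwidth `sigma` stands in for the per-point bandwidths that t-SNE normally derives from the perplexity.

```python
import numpy as np


def pairwise_sq_dists(Z):
    """Squared Euclidean distances between all rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def gaussian_joint_probabilities(X, sigma=1.0):
    """High-dimensional similarities P (assumption: one global sigma)."""
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)  # a point is not its own neighbor
    return P / P.sum()


def student_t_joint_probabilities(Y):
    """Low-dimensional similarities Q from a Student-t kernel (1 degree of freedom)."""
    Q = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q), summed over pairs where P is non-zero."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


rng = np.random.RandomState(0)
X = rng.normal(size=(20, 5))  # synthetic high-dimensional points
Y = rng.normal(size=(20, 2))  # a candidate 2D embedding (random here)

P = gaussian_joint_probabilities(X, sigma=2.0)
Q = student_t_joint_probabilities(Y)
print("KL(P || Q):", kl_divergence(P, Q))
```

t-SNE moves the rows of `Y` by gradient descent to reduce this quantity; the sketch only evaluates it for a fixed `Y`.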

Usage

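Use t-SNE primarily for data visualization and exploratory analysis of high-dimensional datasets. It excels at revealing local structure and clusters. Note that t-SNE is non-parametric: it learns an embedding for the fitted samples only, not a mapping from input space. Accordingly, TSNE provides fit and fit_transform but no transform method, so embedding new data points requires refitting.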
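The non-parametric limitation can be checked directly. The following is a minimal sketch on synthetic data (the arrays and variable names are illustrative assumptions): it confirms that the estimator has no `transform` method and shows one way to place later-arriving points, namely refitting on the combined data.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)
X_old = rng.normal(size=(60, 10))  # synthetic data, for illustration only
X_new = rng.normal(size=(5, 10))   # points that arrive after the first fit

tsne = TSNE(n_components=2, perplexity=10, random_state=0)

# The estimator exposes fit_transform but no separate transform method.
has_fit_transform = hasattr(tsne, "fit_transform")
has_transform = hasattr(tsne, "transform")
print("fit_transform:", has_fit_transform, "| transform:", has_transform)

# To embed the new points, refit on old and new data together.
X_all = np.vstack([X_old, X_new])
embedding_all = tsne.fit_transform(X_all)
embedding_new = embedding_all[len(X_old):]
print("Embedding of new points:", embedding_new.shape)
```

Refitting changes the coordinates of the old points as well, so embeddings from separate fits are not directly comparable.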

Code Reference

Source Location

Repository: scikit-learn
File: sklearn/manifold/_t_sne.py

Signature

class TSNE(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator):
    """T-distributed Stochastic Neighbor Embedding."""

    def __init__(
        self,
        n_components=2,
        *,
        perplexity=30.0,
        early_exaggeration=12.0,
        learning_rate="auto",
        max_iter=None,
        n_iter_without_progress=300,
        min_grad_norm=1e-7,
        metric="euclidean",
        metric_params=None,
        init="pca",
        verbose=0,
        random_state=None,
        method="barnes_hut",
        angle=0.5,
        n_jobs=None,
    ):
        ...

Import

from sklearn.manifold import TSNE

I/O Contract

Inputs

Name	Type	Required	Description
X	array-like of shape (n_samples, n_features) or (n_samples, n_samples)	Yes	Input data, or a precomputed square distance matrix when metric='precomputed'
n_components	int	No	Target dimensionality, typically 2 or 3 (default: 2)
perplexity	float	No	Related to the number of nearest neighbors considered for each point; larger datasets usually need a larger perplexity, and the value must be smaller than n_samples (default: 30.0)
early_exaggeration	float	No	Controls cluster tightness in early iterations (default: 12.0)
learning_rate	float or str	No	Learning rate for optimization; 'auto' recommended (default: 'auto')
method	str	No	Algorithm: 'barnes_hut' (O(N log N), requires n_components < 4) or 'exact' (O(N^2)) (default: 'barnes_hut')
metric	str or callable	No	Distance metric, or 'precomputed' if X is a distance matrix (default: 'euclidean')
init	str or ndarray	No	Initialization: 'pca', 'random', or array (default: 'pca')
random_state	int or None	No	Random state for reproducibility (default: None)
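The precomputed-distance input form listed in the table can be sketched as follows. The data are synthetic and the variable names are illustrative. Note that init is set to 'random' here because PCA initialization does not apply to a distance matrix.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 8))  # synthetic data, for illustration only

# Square (n_samples, n_samples) matrix of pairwise distances.
D = pairwise_distances(X, metric="euclidean")

tsne = TSNE(
    n_components=2,
    metric="precomputed",  # tells TSNE that the input is a distance matrix
    init="random",         # PCA init cannot be used with precomputed distances
    perplexity=10,
    random_state=0,
)
embedding = tsne.fit_transform(D)
print("Distance matrix shape:", D.shape)
print("Embedded shape:", embedding.shape)
```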

Outputs

Name	Type	Description
embedding_	ndarray of shape (n_samples, n_components)	Low-dimensional embedding
kl_divergence_	float	Final KL divergence between distributions
n_features_in_	int	Number of features in input data
n_iter_	int	Number of iterations run

Usage Examples

Basic Usage

from sklearn.manifold import TSNE
from sklearn.datasets import load_digits

# Load digits dataset
X, y = load_digits(return_X_y=True)

# Apply t-SNE
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_embedded = tsne.fit_transform(X)
print("Embedded shape:", X_embedded.shape)
print("KL divergence:", tsne.kl_divergence_)
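As a follow-up to the basic example, the two values of the method parameter can be compared on a small subset. This is a sketch only: the subset size of 200 is an arbitrary choice to keep the exact O(N^2) algorithm fast, and the resulting KL divergences depend on the scikit-learn version and random seed.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X_small = X[:200]  # small subset so the exact method stays quick

results = {}
for method in ("barnes_hut", "exact"):
    tsne = TSNE(n_components=2, perplexity=30, method=method, random_state=42)
    embedding = tsne.fit_transform(X_small)
    results[method] = (embedding.shape, float(tsne.kl_divergence_))

for method, (shape, kl) in results.items():
    print(f"{method}: shape={shape}, KL divergence={kl:.3f}")
```

On larger datasets the Barnes-Hut approximation is usually the practical choice, since the cost of the exact method grows quadratically with the number of samples.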

Related Pages

Principle:Scikit_learn_Scikit_learn_Manifold_Learning
