Implementation:Scikit learn Scikit learn TSNE

From Leeroopedia


Knowledge Sources
Domains Machine Learning, Dimensionality Reduction
Last Updated 2026-02-08 15:00 GMT

Overview

Concrete tool for T-distributed Stochastic Neighbor Embedding (t-SNE) visualization provided by scikit-learn.

Description

This module implements the t-SNE algorithm for visualizing high-dimensional data in 2D or 3D space. t-SNE converts pairwise similarities between data points into joint probability distributions and minimizes the KL divergence between the high-dimensional and low-dimensional distributions. The module supports both the exact O(N^2) algorithm and the Barnes-Hut O(N log N) approximation. It includes helper functions for computing joint probabilities, KL divergence gradients, and integration with PCA for initialization.
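The objective described above can be sketched in a few lines of NumPy. This is an illustrative simplification, not scikit-learn's internal code: the helper names are hypothetical, and the high-dimensional probabilities use a single fixed Gaussian bandwidth (`sigma`), whereas real t-SNE tunes a per-point bandwidth to match the requested perplexity.

```python
import numpy as np


def gaussian_joint_probabilities(X, sigma=1.0):
    """Joint probabilities P from a fixed-bandwidth Gaussian kernel.

    Assumption: one global sigma. Real t-SNE searches a bandwidth per
    point so that each conditional distribution matches the perplexity.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    affinities = np.exp(-sq_dists / (2.0 * sigma**2))
    np.fill_diagonal(affinities, 0.0)  # a point is not its own neighbor
    return affinities / affinities.sum()


def student_t_joint_probabilities(Y):
    """Joint probabilities Q in the embedding, using a Student-t kernel (1 d.o.f.)."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    affinities = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(affinities, 0.0)
    return affinities / affinities.sum()


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) summed over all pairs with non-zero P."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # toy high-dimensional data
Y = rng.normal(size=(50, 2))   # a random, unoptimized 2D embedding

P = gaussian_joint_probabilities(X)
Q = student_t_joint_probabilities(Y)
kl_value = kl_divergence(P, Q)
print("KL(P || Q) for a random embedding:", kl_value)
```

t-SNE moves the points in `Y` by gradient descent to reduce this quantity. Both matrices here are dense, which corresponds to the exact O(N^2) setting.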

Usage

Use t-SNE primarily for data visualization and exploratory analysis of high-dimensional datasets. It excels at revealing local structure and clusters. Note that t-SNE is non-parametric: the estimator provides fit and fit_transform but no transform method, so new data points cannot be embedded without refitting.
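A minimal sketch of what the refitting constraint means in practice, using random toy data (an assumption made purely for illustration). New points are embedded by stacking them with the existing data and fitting again.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X_old = rng.normal(size=(80, 10))  # data seen so far (toy values)
X_new = rng.normal(size=(20, 10))  # points that arrive later

tsne = TSNE(n_components=2, perplexity=10, random_state=0)

# The estimator offers fit() and fit_transform(), but no transform().
has_transform = hasattr(tsne, "transform")
print("TSNE has transform():", has_transform)

# To place the new points, refit on the combined data.
X_all = np.vstack([X_old, X_new])
embedding_all = tsne.fit_transform(X_all)
embedding_new = embedding_all[len(X_old):]
print("Embedding of new points:", embedding_new.shape)
```

Refitting recomputes every coordinate, so the positions of the older points will generally differ from those of an earlier fit.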

Code Reference

Source Location

Signature

class TSNE(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator):
    """T-distributed Stochastic Neighbor Embedding."""

    def __init__(
        self,
        n_components=2,
        *,
        perplexity=30.0,
        early_exaggeration=12.0,
        learning_rate="auto",
        max_iter=None,
        n_iter_without_progress=300,
        min_grad_norm=1e-7,
        metric="euclidean",
        metric_params=None,
        init="pca",
        verbose=0,
        random_state=None,
        method="barnes_hut",
        angle=0.5,
        n_jobs=None,
    ):
        ...
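The defaults in the signature above can differ between scikit-learn releases (the iteration-count parameter in particular has changed over time). A small sketch for checking what the installed version actually uses:

```python
import sklearn
from sklearn.manifold import TSNE

# get_params() returns the constructor arguments and their current values.
params = TSNE().get_params()

print("scikit-learn version:", sklearn.__version__)
for name in sorted(params):
    print(f"{name} = {params[name]!r}")
```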

Import

from sklearn.manifold import TSNE

I/O Contract

Inputs

Name | Type | Required | Description
X | array-like of shape (n_samples, n_features) or (n_samples, n_samples) | Yes | Input data, or a precomputed distance matrix when metric='precomputed'
n_components | int | No | Target dimensionality, typically 2 or 3 (default: 2)
perplexity | float | No | Related to the number of nearest neighbors; larger datasets usually need a larger perplexity, and the value must be less than n_samples (default: 30.0)
early_exaggeration | float | No | Controls cluster tightness in early iterations (default: 12.0)
learning_rate | float or str | No | Learning rate for optimization; 'auto' recommended (default: 'auto')
method | str | No | Algorithm: 'barnes_hut' (O(N log N), requires n_components < 4) or 'exact' (O(N^2)) (default: 'barnes_hut')
metric | str or callable | No | Distance metric; 'precomputed' treats X as a distance matrix (default: 'euclidean')
init | str or ndarray | No | Initialization: 'pca', 'random', or an array of shape (n_samples, n_components) (default: 'pca')
random_state | int, RandomState instance or None | No | Random state for reproducibility (default: None)
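As a sketch of the precomputed-distance input form of X, the example below builds a Manhattan distance matrix on random toy data (an illustrative assumption) and passes it with metric="precomputed". PCA initialization cannot be used with a precomputed distance matrix, so init="random" is set explicitly.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))  # toy data

# Square (n_samples, n_samples) matrix of non-negative distances.
D = pairwise_distances(X, metric="manhattan")

tsne = TSNE(
    n_components=2,
    metric="precomputed",  # interpret the input as distances, not features
    init="random",         # 'pca' is not available for precomputed input
    perplexity=10,         # must be smaller than n_samples
    random_state=0,
)
embedding = tsne.fit_transform(D)
print("Embedded shape:", embedding.shape)
```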

Outputs

Name | Type | Description
embedding_ | ndarray of shape (n_samples, n_components) | Low-dimensional embedding
kl_divergence_ | float | Final KL divergence between distributions
n_features_in_ | int | Number of features in input data
n_iter_ | int | Number of iterations run
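The fitted attributes listed above can be inspected after calling fit or fit_transform. A minimal sketch on a 300-sample subset of the digits dataset (the subset size is an arbitrary choice to keep the run short):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X = X[:300]  # small subset so the example runs quickly

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_embedded = tsne.fit_transform(X)

# fit_transform returns the same values that are stored in embedding_.
print("embedding_ shape:", tsne.embedding_.shape)
print("n_features_in_:", tsne.n_features_in_)
print("n_iter_:", tsne.n_iter_)
print("kl_divergence_:", tsne.kl_divergence_)
```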

Usage Examples

Basic Usage

from sklearn.manifold import TSNE
from sklearn.datasets import load_digits

# Load digits dataset
X, y = load_digits(return_X_y=True)

# Apply t-SNE
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_embedded = tsne.fit_transform(X)
print("Embedded shape:", X_embedded.shape)
print("KL divergence:", tsne.kl_divergence_)
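One possible follow-up, offered as a sketch rather than a required step: scikit-learn's `trustworthiness` function scores how well local neighborhoods are preserved in an embedding, on a scale from 0 to 1. The subset size and `n_neighbors=5` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE, trustworthiness

X, y = load_digits(return_X_y=True)
X = X[:500]  # subset to keep the run short

X_embedded = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X)

# Values near 1 mean that neighbors in the embedding were also
# neighbors in the original feature space.
score = trustworthiness(X, X_embedded, n_neighbors=5)
print("Trustworthiness:", score)
```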
