Principle:Scikit learn Scikit learn Manifold Learning

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Unsupervised Learning, Dimensionality Reduction
Last Updated	2026-02-08 15:00 GMT

Overview

Manifold learning discovers the low-dimensional structure underlying high-dimensional data by preserving local or global geometric properties of the data manifold.

Description

Manifold learning methods are non-linear dimensionality reduction techniques based on the assumption that high-dimensional data often lies on or near a lower-dimensional manifold embedded in the ambient space. Unlike linear methods such as PCA, manifold learning can uncover curved, non-linear structures. These techniques solve the problem of visualizing and understanding the intrinsic geometry of complex datasets where linear projections fail to capture meaningful structure. They sit within unsupervised learning and are primarily used for exploratory data analysis and visualization, though some methods also produce embeddings useful for downstream tasks.

Usage

Use t-SNE for visualizing high-dimensional data in 2D or 3D, particularly when local cluster structure is important -- it excels at preserving neighborhood relationships. Use Isomap when the data lies on a globally smooth manifold and geodesic distances should be preserved. Use Locally Linear Embedding (LLE) when local linearity is a reasonable assumption and computational efficiency matters. Use MDS (Multidimensional Scaling) when you want to preserve pairwise distances in the low-dimensional embedding. Use Spectral Embedding for graph-structured data or when the data's connectivity structure is more meaningful than its Euclidean geometry.

Theoretical Basis

t-SNE (t-distributed Stochastic Neighbor Embedding):

Compute pairwise affinities in high-dimensional space using a Gaussian kernel:
$p_{j | i} = \frac{\exp (- ‖ x_{i} - x_{j} ‖^{2} / 2 σ_{i}^{2})}{\sum_{k \neq i} \exp (- ‖ x_{i} - x_{k} ‖^{2} / 2 σ_{i}^{2})}$

Symmetrize: $p_{i j} = (p_{j | i} + p_{i | j}) / 2 n$
Define affinities in low-dimensional space using a Student-t distribution with one degree of freedom:
$q_{i j} = \frac{(1 + ‖ y_{i} - y_{j} ‖^{2})^{- 1}}{\sum_{k \neq l} (1 + ‖ y_{k} - y_{l} ‖^{2})^{- 1}}$
Minimize the KL divergence: $K L (P ‖ Q) = \sum_{i \neq j} p_{i j} \log \frac{p_{i j}}{q_{i j}}$

The heavy-tailed Student-t distribution in the low-dimensional space alleviates the crowding problem that afflicts Gaussian-based methods.

Isomap preserves geodesic distances:

Construct a nearest neighbor graph.
Compute shortest-path (geodesic) distances between all pairs using Dijkstra's or Floyd-Warshall algorithm.
Apply classical MDS to the geodesic distance matrix.

Locally Linear Embedding (LLE):

Express each point as a weighted linear combination of its $k$ nearest neighbors by minimizing:
$\sum_{i} ‖ x_{i} - \sum_{j} W_{i j} x_{j} ‖^{2}$
Find the low-dimensional embedding $Y$ that best preserves these weights:
$\min_{Y} \sum_{i} ‖ y_{i} - \sum_{j} W_{i j} y_{j} ‖^{2}$
This reduces to an eigenvalue problem on the matrix $M = (I - W)^{T} (I - W)$ .

Multidimensional Scaling (MDS):

Classical MDS: Finds an embedding that preserves inner products derived from the distance matrix via double centering. Equivalent to PCA when distances are Euclidean.
Metric MDS (Stress minimization): Minimizes:

$Stress = \sqrt{\sum_{i < j} (d_{i j} - ‖ y_{i} - y_{j} ‖)^{2}}$

Spectral Embedding:

Construct an affinity (similarity) graph from the data.
Compute the graph Laplacian $L = D - W$ (or its normalized variant).
Embed using the eigenvectors corresponding to the smallest non-zero eigenvalues of $L$ .

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment