Principle:Scikit learn Scikit learn Dimensionality Reduction

From Leeroopedia


Knowledge Sources
Domains Unsupervised Learning, Feature Engineering
Last Updated 2026-02-08 15:00 GMT

Overview

Dimensionality reduction transforms high-dimensional data into a lower-dimensional representation while preserving as much meaningful structure as possible.

Description

Dimensionality reduction techniques address the curse of dimensionality by projecting data from a high-dimensional feature space into a lower-dimensional subspace. They mitigate the computational cost, overfitting risk, and difficulty of visualization that arise when working with many features. These methods fall broadly into linear techniques (which find linear projections) and non-linear techniques (which capture more complex structure). Dimensionality reduction appears in both unsupervised learning and feature engineering pipelines.

Usage

Use dimensionality reduction when the number of features is large relative to the number of samples, when you need to visualize high-dimensional data, or when downstream models suffer from overfitting due to excessive features. PCA is the default first choice for general-purpose linear reduction. Use NMF when data is non-negative and parts-based decomposition is meaningful (e.g., topic modeling, image decomposition). Use ICA when the goal is to recover statistically independent source signals. Use Truncated SVD for sparse data (e.g., text corpora) where centering is impractical.
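As a rough illustration of the "PCA as first choice" guidance, the sketch below standardizes a feature matrix and keeps enough principal components to retain 95% of the variance. The dataset (scikit-learn's bundled digits) and the 0.95 threshold are assumptions chosen for the example, not recommendations from this page.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: 1797 samples with 64 pixel features.
X, _ = load_digits(return_X_y=True)

# A float n_components in (0, 1) asks PCA to keep the smallest number of
# components whose cumulative explained variance reaches that fraction.
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
)
Z = pipe.fit_transform(X)

pca = pipe.named_steps["pca"]
print(X.shape[1], "features reduced to", pca.n_components_)
```

The reduced matrix Z can then be passed to a downstream model in place of X; whether that helps with overfitting depends on the data and should be checked with cross-validation.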

Theoretical Basis

Principal Component Analysis (PCA) finds orthogonal directions of maximum variance. Given a centered data matrix X with n samples (rows) and d features (columns), PCA computes the eigendecomposition of the covariance matrix:

$$\Sigma = \frac{1}{n-1} X^\top X = V \Lambda V^\top$$

The top $k$ eigenvectors form the projection matrix $V_k$, and the projected data is $Z = X V_k$. The fraction of variance retained is $\sum_{i=1}^{k} \lambda_i \big/ \sum_{i=1}^{d} \lambda_i$.
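The derivation above can be sketched directly in NumPy and compared against scikit-learn's PCA. The toy data, the choice k = 2, and all variable names are assumptions made for illustration; scikit-learn itself uses an SVD-based solver rather than an explicit covariance eigendecomposition in most cases.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 samples, 5 correlated features.
X_raw = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Center the data, as the derivation assumes.
X = X_raw - X_raw.mean(axis=0)
n, d = X.shape

# Covariance matrix and its eigendecomposition (eigh: symmetric input).
cov = X.T @ X / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)

# eigh returns eigenvalues in ascending order; reorder to descending.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
V_k = eigvecs[:, :k]            # projection matrix (d x k)
Z = X @ V_k                     # projected data (n x k)
retained = eigvals[:k].sum() / eigvals.sum()

# Reference fit; PCA centers the data internally.
pca = PCA(n_components=k).fit(X_raw)
Z_ref = pca.transform(X_raw)
```

The two projections should agree up to the sign of each component, since an eigenvector and its negation are equally valid.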

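Singular Value Decomposition (SVD) decomposes $X = U \Sigma V^\top$, where $\Sigma$ here denotes the diagonal matrix of singular values rather than a covariance matrix. Truncated SVD retains only the top $k$ singular values, yielding the best rank-$k$ approximation in the Frobenius norm. This is especially useful for sparse matrices since it does not require centering, which would otherwise destroy sparsity.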
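A minimal sketch of truncated SVD on a sparse matrix follows. The random sparse matrix, k = 5, and the use of the ARPACK solver (chosen here for accuracy on a small example; scikit-learn's default is a randomized solver) are assumptions for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Hypothetical sparse data: 100 x 50 with about 5% non-zero entries.
X_sparse = sparse.random(100, 50, density=0.05, format="csr", random_state=0)

k = 5
svd = TruncatedSVD(n_components=k, algorithm="arpack", random_state=0)
Z = svd.fit_transform(X_sparse)      # Z = U_k * Sigma_k, no centering applied

# Rank-k reconstruction and its Frobenius error.
X_dense = X_sparse.toarray()
X_k = Z @ svd.components_
err = np.linalg.norm(X_dense - X_k, "fro")

# Full singular value spectrum, computed densely only for comparison.
s_full = np.linalg.svd(X_dense, compute_uv=False)
```

For the best rank-k approximation, the squared Frobenius error equals the sum of the squared discarded singular values, which gives a quick sanity check on the fit.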

Non-negative Matrix Factorization (NMF) approximates $X \approx WH$ where $X, W, H \ge 0$. The non-negativity constraint produces additive, parts-based representations. The objective minimizes:

$$\| X - WH \|_F^2$$

subject to $W \ge 0$, $H \ge 0$.
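The factorization can be tried out with scikit-learn's NMF as sketched below. The random non-negative matrix, the number of components, and the initialization scheme are assumptions picked for the example.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((60, 20))            # hypothetical data, non-negative by construction

nmf = NMF(n_components=4, init="nndsvda", max_iter=500, random_state=0)
W = nmf.fit_transform(X)            # sample-by-component weights (60 x 4)
H = nmf.components_                 # component-by-feature parts (4 x 20)

# Frobenius norm of the residual X - WH.
err = np.linalg.norm(X - W @ H, "fro")
```

Both factors should come back non-negative, so each row of X is approximated by an additive combination of the rows of H.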

Independent Component Analysis (ICA) assumes the observed data is a linear mixture of statistically independent sources: $X = AS$. FastICA recovers the unmixing matrix $W$ such that $S = WX$ by maximizing the non-Gaussianity of the recovered components.
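A small blind source separation sketch with FastICA is shown below. The two synthetic source signals and the mixing matrix are assumptions made up for the example. Note that scikit-learn stores samples as rows, so the mixture is written as X = S Aᵀ rather than X = AS.

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two hypothetical non-Gaussian sources: a sine wave and a square wave.
t = np.linspace(0, 8, 2000)
S_true = np.column_stack([np.sin(2 * t), np.sign(np.sin(3 * t))])

# Assumed mixing matrix; rows of X are observed mixtures.
A = np.array([[1.0, 0.5],
              [0.5, 2.0]])
X = S_true @ A.T

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)        # estimated sources (2000 x 2)

# ICA recovers sources only up to order, sign, and scale, so compare
# using absolute correlations between true and estimated signals.
corr = np.abs(np.corrcoef(S_true.T, S_est.T)[:2, 2:])
```

Each row of corr should contain one entry close to 1, indicating which estimated component matches each true source.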

Incremental PCA processes data in mini-batches, enabling PCA on datasets that do not fit in memory.
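The mini-batch workflow can be sketched with IncrementalPCA.partial_fit. Here the full array is held in memory and split into ten chunks purely for illustration; in practice each batch would presumably be read from disk or a stream.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
# Hypothetical data standing in for a dataset too large to load at once.
X = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 10))

ipca = IncrementalPCA(n_components=3)

# Feed the estimator one batch at a time; each batch must contain
# at least n_components samples.
for batch in np.array_split(X, 10):
    ipca.partial_fit(batch)

Z = ipca.transform(X)
```

The result approximates what a full PCA fit would produce, with the quality of the approximation depending on batch size and data.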

Kernel PCA applies the kernel trick to perform PCA in a high-dimensional feature space implicitly defined by a kernel function $k(x_i, x_j)$, capturing non-linear structure.
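A minimal Kernel PCA sketch on data with non-linear structure follows. The concentric-circles dataset, the RBF kernel, and gamma = 10 are assumptions for illustration; the kernel and its parameters usually need tuning for a given problem.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric rings: structure that no linear projection can unfold.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# RBF kernel: k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
Z = kpca.fit_transform(X)
```

Plotting Z colored by y is a reasonable way to check whether the chosen kernel makes the two rings easier to separate than in the original coordinates.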

Sparse PCA adds an $\ell_1$ penalty to the components to produce sparse loadings, yielding more interpretable principal components.
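The effect of the penalty on the loadings can be inspected as sketched below. The synthetic regression dataset is only a convenient source of features, and the default penalty strength (alpha = 1) is an assumption; larger values should give sparser components.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.decomposition import SparsePCA

# Assumed example data: 200 samples, 30 features (targets are ignored).
X, _ = make_friedman1(n_samples=200, n_features=30, random_state=0)

spca = SparsePCA(n_components=5, random_state=0)
Z = spca.fit_transform(X)

# Fraction of loadings driven exactly to zero by the l1 penalty.
zero_fraction = np.mean(spca.components_ == 0)
print(f"{zero_fraction:.0%} of loadings are zero")
```

Each component then depends on only a few original features, which is what makes the loadings easier to interpret than dense PCA loadings.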

Dictionary Learning finds a sparse representation of data in terms of an overcomplete basis (dictionary), minimizing $\| X - DA \|_F^2 + \alpha \| A \|_1$.
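A sketch of learning an overcomplete dictionary with scikit-learn is given below. The random data, the 12-atom dictionary for 8 features, and the penalty values are assumptions. scikit-learn stores samples as rows, so the fitted model reads X ≈ A D with the sparse codes A on the left.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))        # hypothetical data: 50 samples, 8 features

# Overcomplete: more atoms (12) than features (8).
dl = DictionaryLearning(
    n_components=12,
    alpha=0.5,                      # l1 penalty used while learning
    max_iter=50,
    transform_algorithm="lasso_lars",
    transform_alpha=0.5,            # l1 penalty used when encoding
    random_state=0,
)
A = dl.fit(X).transform(X)          # sparse codes (50 x 12)
D = dl.components_                  # dictionary atoms (12 x 8)

zero_fraction = np.mean(A == 0)
```

Because the dictionary is overcomplete, each sample is expected to use only a subset of the atoms, which shows up as exact zeros in A.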

Factor Analysis models observed variables as linear combinations of latent factors plus Gaussian noise with a diagonal covariance, distinguishing shared variance from variable-specific variance.
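The generative model can be simulated and fitted as in the sketch below. The number of factors, the loadings, and the per-variable noise levels are all invented for the example.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n, d, k = 500, 6, 2

F = rng.normal(size=(n, k))               # latent factors
L = rng.normal(size=(k, d))               # factor loadings
noise_std = np.linspace(0.5, 2.0, d)      # a different noise level per variable
X = F @ L + rng.normal(size=(n, d)) * noise_std

fa = FactorAnalysis(n_components=k, random_state=0).fit(X)
Z = fa.transform(X)                       # estimated factor scores

# Model-implied covariance: shared part plus diagonal specific part.
shared = fa.components_.T @ fa.components_
specific = np.diag(fa.noise_variance_)
model_cov = shared + specific
```

The fitted noise_variance_ holds one value per observed variable, which is the variable-specific variance that plain PCA does not model separately.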

Latent Dirichlet Allocation (LDA) is a generative probabilistic model for collections of discrete data (e.g., text corpora) that represents each document as a mixture of topics and each topic as a distribution over words.
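A toy topic-modeling sketch is shown below. The six-document corpus and the choice of two topics are assumptions for illustration; a corpus this small will not produce meaningful topics, but it shows the shapes of the two distributions involved.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical miniature corpus.
docs = [
    "cats and dogs are popular pets",
    "dogs chase cats in the garden",
    "pets need food and care",
    "stocks and bonds are financial assets",
    "investors trade stocks on the market",
    "the market reacts to financial news",
]

# LDA works on word counts, not on TF-IDF weights.
counts = CountVectorizer().fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(counts)     # each row: topic mixture of one document

# Normalize topic-word pseudo-counts into per-topic word distributions.
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
```

Each row of doc_topic and each row of topic_word sums to 1, matching the description of documents as mixtures of topics and topics as distributions over words.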
