
Implementation:Scikit learn Scikit learn TruncatedSVD

From Leeroopedia


Knowledge Sources
Domains Dimensionality Reduction, Natural Language Processing
Last Updated 2026-02-08 15:00 GMT

Overview

A concrete tool, provided by scikit-learn, for dimensionality reduction using truncated SVD (known as Latent Semantic Analysis when applied to term-document matrices).

Description

TruncatedSVD performs linear dimensionality reduction by means of truncated singular value decomposition (SVD). Unlike PCA, it does not center the data before computing the SVD, so sparse input stays sparse and can be processed efficiently. It supports two algorithms: a fast randomized SVD solver and a "naive" solver that uses ARPACK as an eigensolver on X X^T or X^T X, whichever is more efficient. When applied to term-document matrices such as term-count or tf-idf matrices, truncated SVD is known as Latent Semantic Analysis (LSA).
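The absence of centering can be checked directly. The sketch below is illustrative only: the random sparse matrix, the variable names, and the choice of k are assumptions rather than part of the original page. It compares the singular values found by both solvers with those of a plain, uncentered dense SVD.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Hypothetical toy data: a mostly-zero matrix standing in for real sparse input.
rng = np.random.default_rng(0)
dense = rng.random((50, 20))
dense[dense < 0.8] = 0.0          # keep roughly 20% of the entries
X = sparse.csr_matrix(dense)

k = 5
# Reference: top-k singular values of the uncentered matrix.
reference = np.linalg.svd(dense, compute_uv=False)[:k]

# Both solvers accept the sparse matrix directly; no densification or centering.
svd_arpack = TruncatedSVD(n_components=k, algorithm="arpack", random_state=0).fit(X)
svd_random = TruncatedSVD(n_components=k, algorithm="randomized", random_state=0).fit(X)

print("dense SVD :", reference)
print("arpack    :", svd_arpack.singular_values_)
print("randomized:", svd_random.singular_values_)
```

The ARPACK result should agree with the dense reference to numerical precision, while the randomized solver returns an approximation whose quality depends on n_iter and n_oversamples.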

Usage

Use TruncatedSVD when working with sparse data, particularly term-frequency or tf-idf matrices from text processing pipelines. It is the standard technique for Latent Semantic Analysis (LSA) in information retrieval, document similarity, and text classification. It is also useful as a general-purpose dimensionality reduction tool when the input is sparse and centering would destroy sparsity.
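A minimal LSA sketch for document similarity might look like the following. The corpus, the pipeline layout, and the use of Normalizer so that dot products become cosine similarities are illustrative assumptions, not something prescribed by this page.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are friends",
    "the cat and the dog sat together",
]

# tf-idf -> truncated SVD (LSA) -> unit-length rows.
lsa = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),
    Normalizer(copy=False),
)
X_lsa = lsa.fit_transform(docs)

# With unit-length rows, the dot product is the cosine similarity.
similarity = X_lsa @ X_lsa.T
print(np.round(similarity, 2))
```

On a real corpus a much larger n_components would normally be used; 2 is chosen here only because the toy vocabulary is tiny.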

Code Reference

Source Location

Signature

class TruncatedSVD(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator):
    def __init__(
        self,
        n_components=2,
        *,
        algorithm="randomized",
        n_iter=5,
        n_oversamples=10,
        power_iteration_normalizer="auto",
        random_state=None,
        tol=0.0,
    ):

Import

from sklearn.decomposition import TruncatedSVD

I/O Contract

Inputs

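Name | Type | Required | Description
n_components | int | No | Desired dimensionality of the output data (default=2). For LSA, a value of 100 is recommended.
algorithm | str | No | SVD solver to use: 'arpack' or 'randomized' (default='randomized').
n_iter | int | No | Number of iterations for the randomized SVD solver (default=5). Not used by ARPACK.
n_oversamples | int | No | Number of oversamples for the randomized SVD solver (default=10). Not used by ARPACK.
power_iteration_normalizer | str | No | Power iteration normalizer for the randomized SVD solver: 'auto', 'QR', 'LU', or 'none' (default='auto'). Not used by ARPACK.
random_state | int, RandomState instance, or None | No | Controls the solver's randomness; pass an int for reproducible results (default=None).
tol | float | No | Tolerance for ARPACK; 0 means machine precision (default=0.0). Ignored by the randomized solver.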

Outputs

Name | Type | Description
components_ | ndarray of shape (n_components, n_features) | The right singular vectors of the input data (the rows of V^T in X = U S V^T).
explained_variance_ | ndarray of shape (n_components,) | Variance of the training samples after projection onto each component.
explained_variance_ratio_ | ndarray of shape (n_components,) | Fraction of the total variance explained by each selected component.
singular_values_ | ndarray of shape (n_components,) | Singular values corresponding to each selected component.
n_features_in_ | int | Number of features seen during fit.
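The relationships between these fitted attributes can be checked numerically. The following is a sketch on made-up dense data; it uses the ARPACK solver so that the identities hold to numerical precision, whereas with the randomized solver they would hold only approximately.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical dense toy data.
rng = np.random.default_rng(0)
X = rng.random((20, 8))

svd = TruncatedSVD(n_components=3, algorithm="arpack", random_state=0)
X_t = svd.fit_transform(X)

# components_ holds one right singular vector per row; the rows are orthonormal.
components_shape = svd.components_.shape
gram = svd.components_ @ svd.components_.T

# Singular values are the 2-norms of the columns of the transformed data.
col_norms = np.linalg.norm(X_t, axis=0)

# explained_variance_ is the per-column variance of the transformed data,
# and the ratio divides it by the summed per-feature variance of X.
var_per_component = X_t.var(axis=0)
total_var = X.var(axis=0).sum()

print(components_shape)
print(col_norms, svd.singular_values_)
print(var_per_component / total_var, svd.explained_variance_ratio_)
```

Because the data is not centered, the first component typically captures much of the mean of the data, so the explained variance ratios need not be sorted in decreasing order.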

Usage Examples

Basic Usage

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Small toy corpus.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are friends",
]

# Build a sparse tf-idf matrix of shape (n_documents, n_terms).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)

# Project the documents onto the top 2 singular directions.
svd = TruncatedSVD(n_components=2, random_state=42)
X_reduced = svd.fit_transform(X)
print(X_reduced.shape)  # (3, 2)
print(svd.explained_variance_ratio_)  # fraction of variance per component
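Projecting Unseen Documents

Once fitted, the same vectorizer and SVD can be applied to new text. The sketch below repeats the setup from the basic example so that it is self-contained; the new document is a made-up example, and inverse_transform is shown only to illustrate mapping back to the original term space.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are friends",
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)
svd = TruncatedSVD(n_components=2, random_state=42).fit(X)

# Hypothetical unseen document; it must go through the fitted vectorizer first.
new_doc = ["the cat and the dog"]
X_new = vectorizer.transform(new_doc)

# Project into the 2-dimensional latent space.
X_new_reduced = svd.transform(X_new)
print(X_new_reduced.shape)  # (1, 2)

# Map back to term space: a dense low-rank approximation of the tf-idf row.
X_new_approx = svd.inverse_transform(X_new_reduced)
print(X_new_approx.shape)
```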
