Implementation:Scikit learn Scikit learn TruncatedSVD

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Dimensionality Reduction, Natural Language Processing
Last Updated	2026-02-08 15:00 GMT

Overview

Concrete tool for dimensionality reduction using truncated SVD, provided by scikit-learn. In text applications the technique is known as Latent Semantic Analysis (LSA).

Description

TruncatedSVD performs linear dimensionality reduction by means of truncated singular value decomposition (SVD). Unlike PCA, it does not center the data before computing the SVD, so it can work efficiently with sparse matrices. It supports two algorithms: a fast randomized SVD solver, and a "naive" algorithm that uses ARPACK as an eigensolver on X * X.T or X.T * X, whichever is more efficient. When applied to term-document matrices such as count or tf-idf matrices, truncated SVD is known as Latent Semantic Analysis (LSA).
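The absence of centering can be checked directly. The following is a minimal sketch, not a canonical recipe: the toy matrix and all variable names are assumptions made for illustration. It fits TruncatedSVD on a SciPy CSR matrix and compares transform against a plain projection onto components_ with no mean subtracted.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Hypothetical toy data: a mostly-zero 50 x 20 matrix stored in CSR format.
rng = np.random.RandomState(0)
dense = rng.rand(50, 20)
dense[dense < 0.9] = 0.0
X_sparse = sparse.csr_matrix(dense)

# The sparse matrix is passed directly; this code never densifies it.
svd = TruncatedSVD(n_components=3, random_state=0)
svd.fit(X_sparse)

# With no mean subtracted, transform is a projection onto components_.
projected = svd.transform(X_sparse)
manual = X_sparse @ svd.components_.T

print(type(projected).__name__, projected.shape)
print(np.allclose(projected, manual))
```

The reduced output is a dense array even though the input is sparse, which is usually acceptable because n_components is small relative to the number of features.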

Usage

Use TruncatedSVD when working with sparse data, particularly term-frequency or tf-idf matrices from text processing pipelines. It is the standard technique for Latent Semantic Analysis (LSA) in information retrieval, document similarity, and text classification. It is also useful as a general-purpose dimensionality reduction tool when the input is sparse and centering would destroy sparsity.
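A typical LSA workflow can be sketched as a pipeline. This is one common arrangement under stated assumptions, not the only way to do it: the four-document corpus is invented for illustration, and the L2 normalisation step after the SVD is a choice made here so that cosine similarity reduces to a dot product.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical four-document corpus, used only for illustration.
corpus = [
    "the cat chased the mouse",
    "the mouse ran from the cat",
    "the market rallied as the stocks rose",
    "the stocks fell and the market dropped",
]

# tf-idf -> truncated SVD -> L2 normalisation of each document vector.
lsa = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),
    Normalizer(copy=False),
)
X_lsa = lsa.fit_transform(corpus)

# Pairwise document similarity in the reduced space.
similarity = cosine_similarity(X_lsa)
print(X_lsa.shape)
print(np.round(similarity, 2))
```

Two components are used only because the corpus is tiny; on a realistic vocabulary a much larger n_components is the usual starting point.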

Code Reference

Source Location

Repository: scikit-learn
File: sklearn/decomposition/_truncated_svd.py

Signature

class TruncatedSVD(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEstimator):
    def __init__(
        self,
        n_components=2,
        *,
        algorithm="randomized",
        n_iter=5,
        n_oversamples=10,
        power_iteration_normalizer="auto",
        random_state=None,
        tol=0.0,
    ):

Import

from sklearn.decomposition import TruncatedSVD

I/O Contract

Inputs

Name	Type	Required	Description
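n_components	int	No	Desired dimensionality of output data (default=2). Must be strictly less than the number of features for 'arpack', and less than or equal to it for 'randomized'. The default is useful for visualisation; for LSA, a value of 100 is recommended.
algorithm	str	No	SVD solver: 'arpack' or 'randomized' (default='randomized').
n_iter	int	No	Number of iterations for the randomized SVD solver (default=5). Not used by ARPACK.
n_oversamples	int	No	Number of oversamples for the randomized SVD solver (default=10). Not used by ARPACK.
power_iteration_normalizer	str	No	Power iteration normalizer for the randomized SVD solver: 'auto', 'QR', 'LU', or 'none' (default='auto'). Not used by ARPACK.
random_state	int, RandomState instance or None	No	Seed or generator used by the solver; pass an int for reproducible results (default=None).
tol	float	No	Tolerance for ARPACK; 0 means machine precision (default=0.0). Ignored by the randomized SVD solver.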

Outputs

Name	Type	Description
components_	ndarray of shape (n_components, n_features)	The right singular vectors of the input data (the leading n_components rows of V^T in X ≈ U S V^T).
explained_variance_	ndarray of shape (n_components,)	Variance of the training samples after projection onto each component.
explained_variance_ratio_	ndarray of shape (n_components,)	Fraction of the total variance explained by each selected component.
singular_values_	ndarray of shape (n_components,)	Singular values corresponding to each selected component.
n_features_in_	int	Number of features seen during fit.
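The fitted attributes can be inspected after a call to fit or fit_transform. The sketch below uses a small random matrix as a stand-in for real data (an assumption for illustration) and checks the shapes listed in the table, along with the relationship between explained_variance_ and the transformed data.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical non-negative data: 8 samples, 6 features.
rng = np.random.RandomState(0)
X = rng.rand(8, 6)

svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)

print(svd.components_.shape)                # (n_components, n_features)
print(svd.singular_values_.shape)           # (n_components,)
print(svd.explained_variance_ratio_.shape)  # (n_components,)
print(svd.n_features_in_)                   # number of columns in X

# explained_variance_ is the per-column variance of the transformed data.
print(np.allclose(svd.explained_variance_, X_reduced.var(axis=0)))
```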

Usage Examples

Basic Usage

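from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Small toy corpus: one string per document.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are friends",
]

# Build a sparse tf-idf matrix of shape (n_documents, n_terms).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)

# Reduce the tf-idf matrix to 2 latent dimensions (LSA).
# random_state fixes the seed of the randomized solver for reproducibility.
svd = TruncatedSVD(n_components=2, random_state=42)
X_reduced = svd.fit_transform(X)
print(X_reduced.shape)  # (3, 2)

# Fraction of total variance captured by each component.
print(svd.explained_variance_ratio_)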

Related Pages

Principle:Scikit_learn_Scikit_learn_Dimensionality_Reduction
