Implementation:Scikit learn Scikit learn PCA

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Dimensionality Reduction, Feature Extraction
Last Updated	2026-02-08 15:00 GMT

Overview

Concrete tool for Principal Component Analysis (linear dimensionality reduction via SVD) provided by scikit-learn.

Description

PCA performs linear dimensionality reduction using a Singular Value Decomposition (SVD) of the data to project it onto a lower-dimensional space. The input data is centered, but not scaled, for each feature before the SVD is applied. Depending on the solver configuration and the shape of the input, it uses the LAPACK implementation of the full SVD, a covariance-based eigenvalue decomposition, a randomized truncated SVD following the method of Halko et al. 2009, or the ARPACK implementation of the truncated SVD, which also accepts sparse inputs. PCA can also select the number of components automatically, using either Minka's MLE or an explained-variance threshold.
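As a rough check of the description above, the sketch below compares PCA with the 'full' solver against a manual SVD of the centered (but unscaled) data. The synthetic data and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): 50 samples, 4 features
# with deliberately different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

pca = PCA(n_components=2, svd_solver="full").fit(X)

# Manual route: center each feature (no scaling), then take the SVD.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# The singular values should agree; the principal axes agree up to sign.
same_singular_values = np.allclose(pca.singular_values_, S[:2])
same_axes = np.allclose(np.abs(pca.components_), np.abs(Vt[:2]))
print(same_singular_values, same_axes)
```

The comparison uses absolute values because the sign of each principal axis is arbitrary.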

Usage

Use PCA when you need to reduce the dimensionality of your data while preserving as much variance as possible. It is commonly used for data visualization, noise reduction, and feature extraction, and as a preprocessing step in machine learning pipelines.
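A minimal sketch of PCA as a preprocessing step in a pipeline is shown below. The dataset (iris), the scaler, and the classifier are assumptions chosen for illustration; StandardScaler is included because PCA centers the data but does not scale it.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale features, project onto 2 principal components, then classify.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X, y)

# Slicing off the classifier gives the 2-D projection, e.g. for plotting.
X_2d = model[:-1].transform(X)
print(X_2d.shape)
print(model.score(X, y))  # training accuracy, not a held-out estimate
```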

Code Reference

Source Location

Repository: scikit-learn
File: sklearn/decomposition/_pca.py

Signature

class PCA(_BasePCA):
    def __init__(
        self,
        n_components=None,
        *,
        copy=True,
        whiten=False,
        svd_solver="auto",
        tol=0.0,
        iterated_power="auto",
        n_oversamples=10,
        power_iteration_normalizer="auto",
        random_state=None,
    ):

Import

from sklearn.decomposition import PCA

I/O Contract

Inputs

Name	Type	Required	Description
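n_components	int, float, 'mle', or None	No	Number of components to keep. A float in (0, 1) selects the smallest number of components whose cumulative explained variance ratio exceeds that value. 'mle' uses Minka's MLE. If None, all components are kept (default=None).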
copy	bool	No	If False, data passed to fit is overwritten (default=True).
whiten	bool	No	When True, components are scaled to ensure uncorrelated outputs with unit variance (default=False).
svd_solver	str	No	SVD solver: 'auto', 'full', 'covariance_eigh', 'arpack', or 'randomized' (default='auto').
tol	float	No	Tolerance for singular values with arpack solver (default=0.0).
iterated_power	int or 'auto'	No	Number of iterations for randomized SVD solver (default='auto').
n_oversamples	int	No	Number of oversamples for randomized SVD solver (default=10).
power_iteration_normalizer	str	No	Power iteration normalizer: 'auto', 'QR', 'LU', or 'none' (default='auto').
random_state	int, RandomState instance, or None	No	Random state used by the 'arpack' and 'randomized' solvers. Pass an int for reproducible results (default=None).
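The sketch below illustrates three of the parameter combinations listed above: a float n_components (variance threshold), n_components='mle', and the 'randomized' solver with a fixed random_state. The synthetic low-rank data is an assumption made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (assumption): 3 strong latent directions embedded in
# 10 features, plus a small amount of isotropic noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

# Keep as many components as needed to explain more than 95% of the variance.
pca_ratio = PCA(n_components=0.95, svd_solver="full").fit(X)
print(pca_ratio.n_components_, pca_ratio.explained_variance_ratio_.sum())

# Let Minka's MLE estimate the number of components.
pca_mle = PCA(n_components="mle", svd_solver="full").fit(X)
print(pca_mle.n_components_)

# Randomized solver; random_state makes the result repeatable.
pca_rand = PCA(n_components=3, svd_solver="randomized", random_state=0).fit(X)
print(pca_rand.components_.shape)
```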

Outputs

Name	Type	Description
components_	ndarray of shape (n_components, n_features)	Principal axes in feature space (directions of maximum variance).
explained_variance_	ndarray of shape (n_components,)	Amount of variance explained by each selected component.
explained_variance_ratio_	ndarray of shape (n_components,)	Proportion of variance explained by each selected component.
singular_values_	ndarray of shape (n_components,)	Singular values corresponding to each selected component.
mean_	ndarray of shape (n_features,)	Per-feature empirical mean.
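n_components_	int	Number of components actually used. Estimated from the data when n_components is 'mle' or a float in (0, 1); otherwise equal to n_components, or to min(n_samples, n_features) when n_components is None.
noise_variance_	float	Estimated noise covariance under the probabilistic PCA model; equal to the average of the smallest eigenvalues of the covariance matrix of X that are not retained.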
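The fitted attributes above are related to one another. The following sketch, on assumed random data, checks a few of these relationships numerically: the link between singular values and explained variance, the orthonormality of the principal axes, and the projection performed by transform.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random data (assumption for illustration): 100 samples, 5 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
n_samples = X.shape[0]

pca = PCA(n_components=3, svd_solver="full").fit(X)

# explained_variance_ equals singular_values_**2 / (n_samples - 1).
var_from_sv = pca.singular_values_ ** 2 / (n_samples - 1)

# explained_variance_ratio_ divides by the total variance of all features.
total_var = X.var(axis=0, ddof=1).sum()
ratio_manual = pca.explained_variance_ / total_var

# The rows of components_ are orthonormal.
gram = pca.components_ @ pca.components_.T

# Without whitening, transform centers with mean_ and projects onto components_.
Z_manual = (X - pca.mean_) @ pca.components_.T
Z = pca.transform(X)

# inverse_transform maps back to feature space (lossy when components are dropped).
X_back = pca.inverse_transform(Z)
print(np.allclose(var_from_sv, pca.explained_variance_), X_back.shape)
```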

Usage Examples

Basic Usage

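import numpy as np
from sklearn.decomposition import PCA

# Four samples with three features each. The rows are collinear,
# so essentially all of the variance lies along a single direction.
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)

# Fit PCA and project the data onto the first two principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)  # (4, 2)
print(pca.explained_variance_ratio_)  # fraction of variance per component
print(pca.singular_values_)  # singular values of the centered data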
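
Whitening

The whiten parameter from the Inputs table can be illustrated with a short sketch. It assumes synthetic correlated data built from a hypothetical mixing matrix, and compares the covariance of the projected data with and without whitening.

```python
import numpy as np
from sklearn.decomposition import PCA

# Correlated 3-feature data built from a hypothetical mixing matrix.
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 0.5], [0.0, 0.5, 1.0]])
X = rng.normal(size=(500, 3)) @ mixing

Z_plain = PCA(n_components=2, svd_solver="full").fit_transform(X)
Z_white = PCA(n_components=2, svd_solver="full", whiten=True).fit_transform(X)

# Both projections are uncorrelated; only the whitened one has unit variance.
cov_plain = np.cov(Z_plain, rowvar=False)
cov_white = np.cov(Z_white, rowvar=False)
print(np.round(cov_plain, 3))
print(np.round(cov_white, 3))
```

Whitening discards the relative variance scale of the components, which some downstream estimators prefer and others do not.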
