Implementation:Scikit learn Scikit learn PCA

From Leeroopedia


Knowledge Sources
Domains: Dimensionality Reduction, Feature Extraction
Last Updated: 2026-02-08 15:00 GMT

Overview

Concrete tool for Principal Component Analysis (linear dimensionality reduction via SVD) provided by scikit-learn.

Description

PCA performs linear dimensionality reduction by applying a Singular Value Decomposition (SVD) to the data and projecting it onto a lower-dimensional space. The input data is centered, but not scaled, for each feature before the SVD is applied. Depending on the svd_solver setting and the shape of the input, it uses the LAPACK implementation of the full SVD, an eigenvalue decomposition of the covariance matrix, a randomized truncated SVD by the method of Halko et al. 2009, or the ARPACK implementation of the truncated SVD. Sparse input is supported only by some solvers, such as 'arpack' and 'covariance_eigh'. PCA can also select the number of components automatically, using either Minka's MLE or an explained-variance threshold.
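
The "centered but not scaled" behaviour can be checked against a hand-rolled SVD. The following is a minimal sketch on synthetic data (the array `X` and the variable names are illustrative assumptions, not part of the library); it forces svd_solver="full" so the comparison is with the LAPACK path described above.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (assumption): 50 samples, 4 features with different spreads.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

# Center each feature (no scaling), then take the thin SVD.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

pca = PCA(n_components=2, svd_solver="full").fit(X)

# The fitted mean is the per-feature mean used for centering.
mean_match = np.allclose(pca.mean_, X.mean(axis=0))
# The leading singular values agree with the manual SVD.
sv_match = np.allclose(pca.singular_values_, S[:2])
# The principal axes agree up to the sign of each row.
axes_match = np.allclose(np.abs(pca.components_), np.abs(Vt[:2]))

print(mean_match, sv_match, axes_match)
```

The comparison uses absolute values because the sign of each singular vector is arbitrary, and the library applies its own sign convention.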

Usage

Use PCA when you need to reduce the dimensionality of your data while preserving as much variance as possible. It is widely used for data visualization, noise reduction, and feature extraction, and as a preprocessing step in machine learning pipelines.
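
As one possible illustration of the preprocessing use case, the sketch below places PCA inside a pipeline on the bundled Iris dataset. The choice of StandardScaler and LogisticRegression is an assumption made for the example; because PCA centers but does not scale its input, standardizing first is a common option when features have different units, not a requirement.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale, reduce to two components, then classify (illustrative choices).
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)
pipe.fit(X, y)

# The first two steps alone give 2-D coordinates suitable for plotting.
X_2d = pipe[:-1].transform(X)
train_accuracy = pipe.score(X, y)

print(X_2d.shape)
print(train_accuracy)
```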

Code Reference

Source Location

Signature

class PCA(_BasePCA):
    def __init__(
        self,
        n_components=None,
        *,
        copy=True,
        whiten=False,
        svd_solver="auto",
        tol=0.0,
        iterated_power="auto",
        n_oversamples=10,
        power_iteration_normalizer="auto",
        random_state=None,
    ):

Import

from sklearn.decomposition import PCA

I/O Contract

Inputs

Name | Type | Required | Description
n_components | int, float, or 'mle' | No | Number of components to keep (default=None, which keeps all components). A float in (0, 1) keeps the smallest number of components whose cumulative explained variance exceeds that fraction. 'mle' uses Minka's MLE to estimate the dimension.
copy | bool | No | If False, data passed to fit is overwritten (default=True).
whiten | bool | No | When True, components are scaled to ensure uncorrelated outputs with unit component-wise variance (default=False).
svd_solver | str | No | SVD solver: 'auto', 'full', 'covariance_eigh', 'arpack', or 'randomized' (default='auto').
tol | float | No | Tolerance for singular values computed by the 'arpack' solver (default=0.0).
iterated_power | int or 'auto' | No | Number of power-method iterations for the 'randomized' solver (default='auto').
n_oversamples | int | No | Number of oversamples for the 'randomized' solver (default=10).
power_iteration_normalizer | str | No | Power iteration normalizer for the 'randomized' solver: 'auto', 'QR', 'LU', or 'none' (default='auto').
random_state | int, RandomState instance, or None | No | Random state for reproducibility; used by the 'arpack' and 'randomized' solvers (default=None).
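
The n_components and whiten options can be sketched as follows. The data is synthetic and the variable names are hypothetical; the example pins svd_solver="full" for the fractional and 'mle' cases to keep the behaviour explicit.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (assumption): 6 features driven by 2 latent directions plus noise.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

# Float in (0, 1): keep enough components to explain at least 95% of the variance.
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X)
k_95 = int(pca_95.n_components_)
cumulative_ratio = np.cumsum(pca_95.explained_variance_ratio_)

# 'mle': let Minka's MLE estimate the number of components.
pca_mle = PCA(n_components="mle", svd_solver="full").fit(X)
k_mle = int(pca_mle.n_components_)

# whiten=True: projected training data has unit variance per component.
pca_white = PCA(n_components=2, whiten=True, svd_solver="full").fit(X)
Z = pca_white.transform(X)
z_var = Z.var(axis=0, ddof=1)

print(k_95, cumulative_ratio[-1])
print(k_mle)
print(z_var)
```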

Outputs

Name | Type | Description
components_ | ndarray of shape (n_components, n_features) | Principal axes in feature space (directions of maximum variance).
explained_variance_ | ndarray of shape (n_components,) | Amount of variance explained by each selected component.
explained_variance_ratio_ | ndarray of shape (n_components,) | Proportion of variance explained by each selected component.
singular_values_ | ndarray of shape (n_components,) | Singular values corresponding to each selected component.
mean_ | ndarray of shape (n_features,) | Per-feature empirical mean, estimated from the training set.
n_components_ | int | Number of components actually used. Estimated from the data when n_components is 'mle' or a float in (0, 1); otherwise it equals n_components, or min(n_samples, n_features) when n_components is None.
noise_variance_ | float | Estimated noise covariance following the probabilistic PCA model: the average of the (min(n_features, n_samples) - n_components) smallest eigenvalues of the covariance matrix of X.
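
The fitted attributes are related to one another in ways that can be verified numerically. The sketch below uses synthetic data and the 'full' solver (both are assumptions for the example) and checks each relationship with np.allclose.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic correlated data (assumption): 100 samples, 5 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))
n_samples = X.shape[0]

pca = PCA(n_components=3, svd_solver="full").fit(X)

# explained_variance_ = singular_values_**2 / (n_samples - 1)
ev_ok = np.allclose(pca.explained_variance_,
                    pca.singular_values_ ** 2 / (n_samples - 1))

# explained_variance_ratio_ = explained_variance_ / total variance of X
total_var = X.var(axis=0, ddof=1).sum()
ratio_ok = np.allclose(pca.explained_variance_ratio_,
                       pca.explained_variance_ / total_var)

# Rows of components_ are orthonormal.
ortho_ok = np.allclose(pca.components_ @ pca.components_.T, np.eye(3))

# transform centers with mean_ and projects onto components_.
proj_ok = np.allclose(pca.transform(X), (X - pca.mean_) @ pca.components_.T)

# noise_variance_ is the mean of the discarded eigenvalues (full solver).
pca_all = PCA(svd_solver="full").fit(X)
noise_ok = np.allclose(pca.noise_variance_,
                       pca_all.explained_variance_[3:].mean())

print(ev_ok, ratio_ok, ortho_ok, proj_ok, noise_ok)
```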

Usage Examples

Basic Usage

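import numpy as np
from sklearn.decomposition import PCA

# Four samples, three features.
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)
pca = PCA(n_components=2)  # keep the first two principal components
X_reduced = pca.fit_transform(X)  # fit the model and project X in one step
print(X_reduced.shape)  # (4, 2)
print(pca.explained_variance_ratio_)  # fraction of total variance per component
print(pca.singular_values_)  # singular value per component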
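
Reconstruction

The toy matrix in the basic example has rank one after centering, because every row differs from the previous one by the same offset. A single component should therefore capture essentially all of the variance, and inverse_transform should recover the original data. The sketch below checks this; the variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)

# Keep a single component and map the projection back to feature space.
pca = PCA(n_components=1, svd_solver="full").fit(X)
X_proj = pca.transform(X)
X_back = pca.inverse_transform(X_proj)

# Largest absolute reconstruction error over all entries.
max_err = np.abs(X - X_back).max()
first_ratio = pca.explained_variance_ratio_[0]

print(X_proj.shape)
print(first_ratio)
print(max_err)
```

On data that is not low rank, the reconstruction error grows as fewer components are kept, which is the basis for using PCA as a lossy compression or denoising step.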
