Implementation:Scikit learn Scikit learn PCA
| Knowledge Sources | |
|---|---|
| Domains | Dimensionality Reduction, Feature Extraction |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Concrete tool for Principal Component Analysis (linear dimensionality reduction via SVD) provided by scikit-learn.
Description
PCA performs linear dimensionality reduction by using the Singular Value Decomposition (SVD) of the data to project it onto a lower-dimensional space. Each feature is centered, but not scaled, before the SVD is applied. Depending on svd_solver and the shape of the input, PCA uses the LAPACK implementation of the full SVD, an eigenvalue decomposition of the covariance matrix, a randomized truncated SVD following Halko et al. 2009, or the ARPACK truncated SVD (via scipy.sparse.linalg.svds), which can also handle sparse input. PCA can also choose the number of components automatically, either with Minka's MLE or by keeping just enough components to exceed a given fraction of the explained variance.
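A minimal sketch of the relationship described above, assuming NumPy's np.linalg.svd as the reference decomposition and randomly generated toy data: centering X and taking its thin SVD should reproduce singular_values_, components_ (up to the sign of each row) and explained_variance_ of a PCA fitted with svd_solver='full'. It illustrates the math rather than scikit-learn's internal code path.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: 50 samples, 4 features with decreasing spread.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

# Manual PCA: center each column (no scaling), then take a thin SVD.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# scikit-learn PCA with the LAPACK full-SVD solver, keeping every component.
pca = PCA(n_components=4, svd_solver="full").fit(X)

# Singular vectors are only defined up to sign, so align signs row by row.
signs = np.sign(np.sum(pca.components_ * Vt, axis=1))

print(np.allclose(pca.singular_values_, S))
print(np.allclose(pca.components_, Vt * signs[:, None]))
# Explained variance is the squared singular value over (n_samples - 1).
print(np.allclose(pca.explained_variance_, S**2 / (X.shape[0] - 1)))
```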
Usage
Use PCA when you need to reduce the dimensionality of your data while preserving as much of its variance as possible. It is widely used for data visualization, noise reduction, feature extraction, and as a preprocessing step in machine learning pipelines. Because PCA centers but does not scale the input, features measured on very different scales are usually standardized first.
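As a sketch of the preprocessing use case, the pipeline below standardizes the features, reduces them with PCA, and fits a classifier. The synthetic dataset from make_classification, the choice of LogisticRegression, and the 5 retained components are assumptions made only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration: 20 features, 5 of them informative.
X, y = make_classification(n_samples=200, n_features=20, n_informative=5, random_state=0)

# PCA only centers the data, so standardize first, then reduce, then classify.
pipe = make_pipeline(StandardScaler(), PCA(n_components=5), LogisticRegression())
pipe.fit(X, y)

# All steps except the final estimator map X into the 5-dimensional PCA space.
Z = pipe[:-1].transform(X)
print(Z.shape)  # (200, 5)
print(pipe.named_steps["pca"].explained_variance_ratio_.sum())
print(pipe.score(X, y))
```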
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/decomposition/_pca.py
Signature
class PCA(_BasePCA):
    def __init__(
        self,
        n_components=None,
        *,
        copy=True,
        whiten=False,
        svd_solver="auto",
        tol=0.0,
        iterated_power="auto",
        n_oversamples=10,
        power_iteration_normalizer="auto",
        random_state=None,
    ):
Import
from sklearn.decomposition import PCA
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
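| n_components | int, float, 'mle', or None | No | Number of components to keep. If None, all min(n_samples, n_features) components are kept (one fewer with the 'arpack' solver). A float in (0, 1) keeps the smallest number of components whose cumulative explained variance ratio exceeds that value. 'mle' estimates the dimension with Minka's MLE. Float and 'mle' values are not supported by the 'arpack' and 'randomized' solvers. |
| copy | bool | No | If False, the data passed to fit is overwritten, so fit(X).transform(X) may not give the expected result; use fit_transform(X) instead (default=True). |
| whiten | bool | No | When True, transformed outputs are divided by the square root of explained_variance_ so they are uncorrelated with unit component-wise variance; this discards the relative variance scales of the components (default=False). |
| svd_solver | str | No | SVD solver: 'auto', 'full', 'covariance_eigh', 'arpack', or 'randomized' (default='auto'). |
| tol | float | No | Tolerance for singular values computed by the 'arpack' solver (default=0.0). |
| iterated_power | int or 'auto' | No | Number of power iterations used by the 'randomized' solver (default='auto'). |
| n_oversamples | int | No | Number of additional random vectors sampled by the 'randomized' solver to improve conditioning (default=10). |
| power_iteration_normalizer | str | No | Normalizer applied during the power iterations of the 'randomized' solver: 'auto', 'QR', 'LU', or 'none' (default='auto'). |
| random_state | int, RandomState instance, or None | No | Seed used by the 'arpack' and 'randomized' solvers; pass an int for reproducible results (default=None). |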
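The following sketch exercises several of the inputs above on hypothetical random data: a fractional n_components, 'mle', a seeded randomized solver, and whitening. The data scales and thresholds are arbitrary choices; the exact number of components selected depends on the data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 200 samples, 10 features whose spread halves each column.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10)) * (0.5 ** np.arange(10))

# Float in (0, 1): keep the fewest components whose cumulative
# explained variance ratio exceeds 0.95.
pca_frac = PCA(n_components=0.95, svd_solver="full").fit(X)
print(pca_frac.n_components_, pca_frac.explained_variance_ratio_.sum())

# 'mle': let Minka's MLE choose the dimension.
pca_mle = PCA(n_components="mle", svd_solver="full").fit(X)
print(pca_mle.n_components_)

# Randomized solver: the same random_state gives the same components.
rand_a = PCA(n_components=3, svd_solver="randomized", random_state=0).fit(X)
rand_b = PCA(n_components=3, svd_solver="randomized", random_state=0).fit(X)
print(np.allclose(rand_a.components_, rand_b.components_))

# whiten=True: each projected component has sample variance close to 1.
Z = PCA(n_components=3, whiten=True, svd_solver="full").fit_transform(X)
print(Z.var(axis=0, ddof=1))
```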
Outputs
| Name | Type | Description |
|---|---|---|
| components_ | ndarray of shape (n_components, n_features) | Principal axes in feature space (directions of maximum variance), sorted by decreasing explained_variance_. |
| explained_variance_ | ndarray of shape (n_components,) | Variance explained by each selected component; equal to the n_components largest eigenvalues of the covariance matrix of X. |
| explained_variance_ratio_ | ndarray of shape (n_components,) | Fraction of the total variance explained by each selected component; sums to 1.0 when all components are kept. |
| singular_values_ | ndarray of shape (n_components,) | Singular values corresponding to each selected component. |
| mean_ | ndarray of shape (n_features,) | Per-feature empirical mean of the training data. |
| n_components_ | int | Number of components actually kept; estimated from the data when n_components is 'mle' or a float. |
| noise_variance_ | float | Estimated noise variance under the probabilistic PCA model of Tipping and Bishop (1999): the mean of the covariance eigenvalues that were not kept (0.0 if none were discarded). |
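A small check of how the fitted attributes relate to one another, assuming whiten=False, svd_solver='full', and hypothetical correlated random data. The noise_variance_ comparison relies on the full solver averaging the eigenvalues of the covariance matrix that were discarded.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical correlated data: 100 samples, 5 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))

pca = PCA(n_components=3, svd_solver="full").fit(X)

# mean_ is the per-feature empirical mean of the training data.
assert np.allclose(pca.mean_, X.mean(axis=0))

# transform() projects the centered data onto the principal axes.
Z = pca.transform(X)
assert np.allclose(Z, (X - pca.mean_) @ pca.components_.T)

# explained_variance_ = singular_values_**2 / (n_samples - 1).
assert np.allclose(pca.explained_variance_, pca.singular_values_**2 / (X.shape[0] - 1))

# explained_variance_ratio_ divides by the total variance of all features.
total_var = X.var(axis=0, ddof=1).sum()
assert np.allclose(pca.explained_variance_ratio_, pca.explained_variance_ / total_var)

# noise_variance_ averages the variance of the 5 - 3 = 2 discarded directions.
all_var = PCA(svd_solver="full").fit(X).explained_variance_
assert np.isclose(pca.noise_variance_, all_var[3:].mean())

print(pca.n_components_)  # 3
```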
Usage Examples
Basic Usage
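import numpy as np
from sklearn.decomposition import PCA
# Four samples, three features. Each row is the previous row plus (3, 3, 3), so the
# centered data has rank 1 and the second component carries (numerically) zero variance.
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=float)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)  # fit, then project onto the 2 principal axes
print(X_reduced.shape)  # (4, 2)
print(pca.explained_variance_ratio_)  # approximately [1, 0]
print(pca.singular_values_)  # approximately [11.62, 0]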
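Building on the basic example, this sketch shows the noise-reduction use case: project onto a few components and map back to feature space with inverse_transform. The rank-2 signal plus Gaussian noise is synthetic data assumed for illustration; how much noise is removed depends on how well the kept components capture the signal.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: a rank-2 signal in 10 dimensions plus small Gaussian noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(300, 2)) @ rng.normal(size=(2, 10))
X_noisy = signal + 0.1 * rng.normal(size=signal.shape)

# Keep the two dominant components, then map back to the original 10 features.
pca = PCA(n_components=2).fit(X_noisy)
X_denoised = pca.inverse_transform(pca.transform(X_noisy))

# Compare both versions against the clean signal.
err_noisy = np.mean((X_noisy - signal) ** 2)
err_denoised = np.mean((X_denoised - signal) ** 2)
print(err_noisy, err_denoised)
```

Because the noise is spread over all 10 dimensions while the signal lives in 2, discarding the other 8 directions removes most of the noise and little of the signal.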