Implementation:Scikit learn Scikit learn SparsePCA
| Knowledge Sources | |
|---|---|
| Domains | Dimensionality Reduction, Sparse Coding |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
SparsePCA is scikit-learn's estimator for sparse principal component analysis (sparse PCA).
Description
SparsePCA finds a set of sparse components that can optimally reconstruct the data, combining PCA's ability to extract the most important directions with an L1 sparsity penalty on the components. The alpha parameter controls the amount of sparsity: higher values yield sparser components. Internally, it uses dictionary learning to extract the components. The same module also provides MiniBatchSparsePCA, a faster but less accurate variant based on mini-batch dictionary learning.
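To make the role of alpha concrete, the following minimal sketch fits SparsePCA with several alpha values and reports the fraction of loadings that are exactly zero. The synthetic dataset (make_friedman1), the alpha grid, and the name zero_fraction are illustrative assumptions, not part of the original page; the exact fractions depend on the data and the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.decomposition import SparsePCA

# Synthetic data chosen only for illustration (assumption, not from the source page).
X, _ = make_friedman1(n_samples=100, n_features=15, random_state=0)

zero_fraction = {}
for alpha in (0.2, 1.0, 2.0):
    spca = SparsePCA(n_components=4, alpha=alpha, random_state=0)
    spca.fit(X)
    # Fraction of entries in components_ that are exactly zero.
    zero_fraction[alpha] = float(np.mean(spca.components_ == 0))
    print(f"alpha={alpha}: {zero_fraction[alpha]:.0%} of loadings are exactly zero")
```

Larger alpha values are expected to give a higher zero fraction, although the relationship need not be strictly monotone on every dataset.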
Usage
Use SparsePCA when you need interpretable principal components in which many loadings are exactly zero. It suits applications where you want to see which features contribute to each component, such as genomics, neuroimaging, and financial data analysis, where sparse, interpretable factors are preferred over dense PCA components.
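One way to read off which features contribute to each component is to list the nonzero loadings per row of components_. The sketch below assumes synthetic data and hypothetical feature names (feature_0, feature_1, ...); with real data you would substitute your own column names.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.decomposition import SparsePCA

# Synthetic data and placeholder feature names (both are illustrative assumptions).
X, _ = make_friedman1(n_samples=100, n_features=12, random_state=0)
feature_names = [f"feature_{j}" for j in range(X.shape[1])]

spca = SparsePCA(n_components=3, alpha=1, random_state=0).fit(X)

contributors = {}
for k, component in enumerate(spca.components_):
    # Indices of the features with a nonzero loading on component k.
    active = np.flatnonzero(component)
    contributors[k] = [(feature_names[j], float(component[j])) for j in active]
    print(f"component {k}: {contributors[k]}")
```

Features absent from a component's list have a loading of exactly zero and play no part in that component.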
Code Reference
Source Location
- Repository: scikit-learn
- File: sklearn/decomposition/_sparse_pca.py
Signature
class SparsePCA(_BaseSparsePCA):
    def __init__(
        self,
        n_components=None,
        *,
        alpha=1,
        ridge_alpha=0.01,
        max_iter=1000,
        tol=1e-8,
        method="lars",
        n_jobs=None,
        U_init=None,
        V_init=None,
        verbose=False,
        random_state=None,
    ):
Import
from sklearn.decomposition import SparsePCA
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n_components | int | No | Number of sparse atoms to extract. Defaults to n_features if None. |
| alpha | float | No | Sparsity controlling parameter. Higher values lead to sparser components (default=1). |
| ridge_alpha | float | No | Amount of ridge shrinkage for the transform method (default=0.01). |
| max_iter | int | No | Maximum number of iterations (default=1000). |
| tol | float | No | Tolerance for the stopping condition (default=1e-8). |
| method | str | No | Optimization method: 'lars' or 'cd' (default='lars'). |
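| n_jobs | int | No | Number of parallel jobs to run (default=None). |
| U_init | ndarray of shape (n_samples, n_components) | No | Initial values for the loadings, for warm restarts. Only used if both U_init and V_init are given (default=None). |
| V_init | ndarray of shape (n_components, n_features) | No | Initial values for the components, for warm restarts. Only used if both U_init and V_init are given (default=None). |
| verbose | int or bool | No | Controls verbosity (default=False). |
| random_state | int, RandomState instance or None | No | Random state for reproducibility (default=None). |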
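As a rough illustration of two of the parameters above, the sketch below selects the coordinate-descent solver (method="cd") and then varies ridge_alpha, which the table describes as affecting only the transform method. The dataset, the chosen values, and the name norms are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.decomposition import SparsePCA

# Synthetic data used only for illustration (assumption).
X, _ = make_friedman1(n_samples=100, n_features=15, random_state=0)

# Use coordinate descent instead of the default LARS solver.
spca = SparsePCA(n_components=4, alpha=1, method="cd", max_iter=200, random_state=0)
spca.fit(X)

norms = {}
for ridge_alpha in (0.01, 10.0):
    # ridge_alpha is applied at transform time, so the fitted components are reused.
    spca.set_params(ridge_alpha=ridge_alpha)
    codes = spca.transform(X)
    norms[ridge_alpha] = float(np.linalg.norm(codes))
    print(f"ridge_alpha={ridge_alpha}: norm of transformed data = {norms[ridge_alpha]:.3f}")
```

Stronger ridge shrinkage pulls the transformed coordinates toward zero, so the norm for the larger ridge_alpha should not exceed the norm for the smaller one.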
Outputs
| Name | Type | Description |
|---|---|---|
| components_ | ndarray of shape (n_components, n_features) | Sparse components extracted from the data. |
| error_ | ndarray | Vector of errors at each iteration. |
| n_components_ | int | Estimated number of components. |
| n_iter_ | int | Number of iterations run. |
| mean_ | ndarray of shape (n_features,) | Per-feature empirical mean. |
| n_features_in_ | int | Number of features seen during fit. |
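The fitted attributes listed above can be inspected directly after fitting. The sketch below assumes synthetic data and additionally computes an approximate reconstruction as codes @ components_ + mean_; that reconstruction formula is an assumption of this example, not something stated on this page.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.decomposition import SparsePCA

# Synthetic data used only for illustration (assumption).
X, _ = make_friedman1(n_samples=100, n_features=15, random_state=0)

spca = SparsePCA(n_components=4, alpha=1, random_state=0)
codes = spca.fit_transform(X)

print(spca.components_.shape)   # (4, 15)
print(spca.n_components_)       # 4
print(spca.n_features_in_)      # 15
print(spca.n_iter_, len(spca.error_))  # iterations run, number of recorded errors
print(np.allclose(spca.mean_, X.mean(axis=0)))  # mean_ is the per-feature mean

# Approximate reconstruction from the sparse components (illustrative assumption).
X_hat = codes @ spca.components_ + spca.mean_
rel_error = float(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
print(f"relative reconstruction error: {rel_error:.3f}")
```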
Usage Examples
Basic Usage
import numpy as np
from sklearn.decomposition import SparsePCA
# Seeded random data so the example is reproducible
rng = np.random.default_rng(0)
X = rng.random((20, 10))
spca = SparsePCA(n_components=5, alpha=1, random_state=0)
X_transformed = spca.fit_transform(X)
print(X_transformed.shape) # (20, 5)
print(spca.components_.shape) # (5, 10)
# Count the loadings that are exactly zero (sparse components contain many)
print(np.sum(spca.components_ == 0))