Implementation:Rapidsai Cuml IncrementalPCA

From Leeroopedia
Revision as of 16:27, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Rapidsai_Cuml_IncrementalPCA.md)


Knowledge Sources
Domains: Machine_Learning, Dimensionality_Reduction
Last Updated: 2026-02-08 12:00 GMT

Overview

IncrementalPCA provides a GPU-accelerated implementation of Incremental Principal Component Analysis that performs linear dimensionality reduction using SVD in a memory-efficient, batch-wise manner.

Description

The IncrementalPCA class extends the cuML PCA class to perform incremental principal component analysis (IPCA). Unlike standard PCA, which requires the full dataset to be loaded into memory, IncrementalPCA processes data in mini-batches, making it suitable for datasets that do not fit in GPU memory. It supports both dense and sparse (CSR) input matrices.

The algorithm centers the input data (but does not scale it) before applying SVD. It maintains running statistics across batches via the partial_fit method, allowing streaming or out-of-core learning. The computational overhead per SVD call is O(batch_size * n_features^2), and only 2 * batch_size samples are held in memory at a time. The implementation is based on sklearn.decomposition.IncrementalPCA from scikit-learn 0.23.1 and uses the incremental PCA model from Ross et al. (2008).
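
The batch-wise update described above can be sketched on the CPU with NumPy. This is a minimal illustration of the Ross et al. style update in the form used by scikit-learn's IncrementalPCA, not cuML's GPU code; the function name `partial_fit_step` and the `state` dictionary are hypothetical and exist only for this example.

```python
import numpy as np


def partial_fit_step(state, X_batch, n_components):
    """Fold one batch into a running PCA state (hypothetical helper, CPU sketch)."""
    X_batch = np.asarray(X_batch, dtype=np.float64)
    n_new = X_batch.shape[0]
    batch_mean = X_batch.mean(axis=0)

    if state is None:
        # First batch: plain centering, nothing to merge with yet.
        n_total = n_new
        mean = batch_mean
        stacked = X_batch - batch_mean
    else:
        n_seen = state["n_samples_seen"]
        n_total = n_seen + n_new
        # Running mean over everything seen so far.
        mean = (n_seen * state["mean"] + n_new * batch_mean) / n_total
        # Extra row that accounts for the shift between old and new means.
        correction = np.sqrt(n_seen * n_new / n_total) * (state["mean"] - batch_mean)
        stacked = np.vstack([
            state["singular_values"][:, None] * state["components"],
            X_batch - batch_mean,
            correction[None, :],
        ])

    # Data is centered but not scaled; the SVD input has
    # n_components + batch rows + 1 rows after the first batch.
    _, S, Vt = np.linalg.svd(stacked, full_matrices=False)
    return {
        "n_samples_seen": n_total,
        "mean": mean,
        "components": Vt[:n_components],
        "singular_values": S[:n_components],
    }


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))

state = None
for start in range(0, X.shape[0], 100):
    state = partial_fit_step(state, X[start:start + 100], n_components=6)

# Reference: SVD of the fully centered data in one shot.
S_full = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
max_abs_diff = float(np.max(np.abs(state["singular_values"] - S_full)))
```

Because all six components are kept here, the batch-wise singular values should agree with the one-shot SVD up to floating-point error. With fewer components than features, each step discards some variance and the result becomes an approximation.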

Usage

Use IncrementalPCA when the dataset is too large to fit entirely in GPU memory, when data arrives in batches as a stream, or when the input is a sparse matrix. It also uses less memory than a full PCA, at the cost of an approximate result.
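
A rough back-of-the-envelope check can help decide whether batching is needed. The sketch below compares the size of a dense dataset with the roughly 2 * batch_size samples that the Description says are held at a time. The dataset shape, the float32 dtype, and the helper name `dense_bytes` are illustrative assumptions, and real GPU usage will include additional workspace.

```python
def dense_bytes(n_rows, n_features, itemsize=4):
    """Bytes needed for a dense n_rows x n_features array (float32 by default)."""
    return n_rows * n_features * itemsize


n_samples, n_features = 50_000_000, 512   # hypothetical dataset shape
batch_size = 5 * n_features               # documented default when batch_size=None

full_gib = dense_bytes(n_samples, n_features) / 2**30
working_gib = dense_bytes(2 * batch_size, n_features) / 2**30

print(f"full dataset:      {full_gib:.1f} GiB")
print(f"~2 batches in use: {working_gib:.4f} GiB")
```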

Code Reference

Source Location

  • Repository: Rapidsai_Cuml
  • File: python/cuml/cuml/decomposition/incremental_pca.py

Signature

class IncrementalPCA(PCA):
    def __init__(
        self,
        *,
        n_components=None,
        whiten=False,
        copy=True,
        batch_size=None,
        verbose=False,
        output_type=None,
    )

Import

from cuml.decomposition import IncrementalPCA

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| n_components | int or None | No | Number of components to keep. If None, set to min(n_samples, n_features). |
| whiten | bool | No | If True, de-correlates components by dividing by singular values and multiplying by sqrt(n_samples). Default is False. |
| copy | bool | No | If False, X will be overwritten to save memory. Default is True. |
| batch_size | int or None | No | Number of samples per batch for fit(). If None, defaults to 5 * n_features. |
| verbose | int or bool | No | Sets logging level. Default is False. |
| output_type | str or None | No | Return results in the indicated output type (e.g., 'cupy', 'numpy', 'cudf'). |
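
To make the batch_size default concrete, the sketch below resolves the effective batch size and yields the row slices that a batch-wise fit would walk through. `resolve_batch_size` and `iter_batches` are hypothetical helpers written for illustration; cuML's internal batching may differ, for example in how it treats a short final batch.

```python
def resolve_batch_size(batch_size, n_features):
    """Return the effective batch size, using 5 * n_features when unset."""
    return 5 * n_features if batch_size is None else batch_size


def iter_batches(n_samples, batch_size):
    """Yield consecutive row slices of at most batch_size rows."""
    for start in range(0, n_samples, batch_size):
        yield slice(start, min(start + batch_size, n_samples))


n_samples, n_features = 1000, 4

default_size = resolve_batch_size(None, n_features)
default_batches = list(iter_batches(n_samples, default_size))

explicit_batches = list(iter_batches(n_samples, resolve_batch_size(200, n_features)))

print(default_size, len(default_batches))   # 20 50
print(len(explicit_batches))                # 5
```

For the 1000 x 4 matrix used in the usage example, the default would give 50 batches of 20 rows, while batch_size=200 gives 5 batches.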

Outputs

| Name | Type | Description |
|---|---|---|
| components_ | array (n_components, n_features) | Principal axes in feature space, representing the directions of maximum variance. |
| explained_variance_ | array (n_components,) | Variance explained by each selected component. |
| explained_variance_ratio_ | array (n_components,) | Fraction of the total variance explained by each selected component. |
| singular_values_ | array (n_components,) | Singular values corresponding to each selected component. |
| mean_ | array (n_features,) | Per-feature empirical mean, aggregated over calls to partial_fit. |
| var_ | array (n_features,) | Per-feature empirical variance, aggregated over calls to partial_fit. |
| noise_variance_ | float | Estimated noise covariance following the Probabilistic PCA model. |
| n_components_ | int | The estimated number of components. |
| n_samples_seen_ | int | The number of samples processed by the estimator. |
| batch_size_ | int | Inferred batch size from the batch_size parameter. |
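
The fitted attributes are related to one another by the usual PCA conventions. The NumPy sketch below uses a one-shot SVD on the CPU to show those relationships as they are defined in scikit-learn's PCA family. It assumes cuML follows the same conventions and does not call cuML itself; the variable names are analogues chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic data with a different spread per feature.
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])
n_samples = X.shape[0]

mean = X.mean(axis=0)                             # analogue of mean_
_, S, Vt = np.linalg.svd(X - mean, full_matrices=False)

components = Vt                                   # analogue of components_
singular_values = S                               # analogue of singular_values_
explained_variance = S**2 / (n_samples - 1)       # analogue of explained_variance_
explained_variance_ratio = S**2 / np.sum(S**2)    # fractions, not percentages

# Projecting the centered data gives columns whose sample variance
# equals the explained variance of each component.
projected = (X - mean) @ components.T
projected_var = projected.var(axis=0, ddof=1)
```

The ratios sum to 1 here only because all five components are kept; with fewer components the sum is the share of total variance that the retained components capture.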

Usage Examples

Basic Usage

from cuml.decomposition import IncrementalPCA
import cupyx.scipy.sparse

# Create a sparse random matrix in CSR format (1000 samples, 4 features)
X = cupyx.scipy.sparse.random(1000, 4, format='csr', density=0.07, random_state=5)

# Fit IncrementalPCA with 2 components and batch size of 200
ipca = IncrementalPCA(n_components=2, batch_size=200)
ipca.fit(X)

# Access results
print(ipca.components_)
print(ipca.singular_values_)
print(ipca.explained_variance_)
print(ipca.explained_variance_ratio_)

# Project the fitted data onto the two principal components
X_transformed = ipca.transform(X)

Incremental Fitting with partial_fit

from cuml.decomposition import IncrementalPCA
import cupy as cp

ipca = IncrementalPCA(n_components=2)

# Simulate streaming data in batches
for i in range(5):
    X_batch = cp.random.rand(100, 10).astype(cp.float32)
    ipca.partial_fit(X_batch)

# Transform data after incremental fitting
X_new = cp.random.rand(50, 10).astype(cp.float32)
X_transformed = ipca.transform(X_new)
