

Principle:Scikit learn Scikit learn Cross Decomposition

From Leeroopedia


Knowledge Sources
Domains Supervised Learning, Dimensionality Reduction
Last Updated 2026-02-08 15:00 GMT

Overview

Cross decomposition methods find the fundamental relations between two multivariate datasets by projecting them into shared latent spaces that maximize their covariance or correlation.

Description

Cross decomposition techniques simultaneously decompose two data matrices X and Y to find latent components that capture the maximum covariance (PLS) or correlation (CCA) between them. Unlike ordinary regression, which models Y directly as a function of X, cross decomposition finds joint latent structures shared by both datasets. These methods model relationships between two high-dimensional sets of variables, and are particularly useful when the variables within each set are highly correlated (multicollinear). Cross decomposition sits at the intersection of dimensionality reduction and regression, and is commonly applied in chemometrics, neuroimaging, and genomics.

Usage

Use Partial Least Squares (PLS) Regression when predicting a multivariate response Y from a high-dimensional predictor X and ordinary least squares fails due to multicollinearity or high dimensionality. PLS is the standard method in chemometrics for relating spectral measurements to chemical properties. Use Canonical Correlation Analysis (CCA) when the goal is to find maximally correlated linear combinations of two sets of variables, without a predictor-response distinction. CCA is common in neuroscience for relating brain activity patterns to behavioral measures.

Theoretical Basis

Partial Least Squares (PLS) Regression: PLS finds weight vectors $w_k$ and $c_k$ that maximize the covariance between the latent components. For column-centered X and Y:

$$(w_k, c_k) = \arg\max_{w,c} \operatorname{cov}(Xw, Yc) = \arg\max_{w,c} w^T X^T Y c$$

subject to $\|w\| = 1$, $\|c\| = 1$.
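
For the first component, the maximizer of $w^T X^T Y c$ over unit vectors is the leading pair of singular vectors of $X^T Y$, and the maximum equals the largest singular value. The NumPy sketch below checks this numerically on synthetic, column-centered data; the data and variable names are illustrative assumptions.

```python
import numpy as np

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = X[:, :2] @ np.array([[1.0, 0.5], [-0.5, 1.0]]) + 0.1 * rng.normal(size=(100, 2))

# Center the columns so that w^T X^T Y c is proportional to cov(Xw, Yc).
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# Leading singular vectors of X^T Y give the first pair of weights.
U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
w, c = U[:, 0], Vt[0]
best = w @ Xc.T @ Yc @ c  # objective value at the SVD solution

# Random unit-norm candidates should never beat the SVD solution.
trials = []
for _ in range(1000):
    w_r = rng.normal(size=5)
    w_r /= np.linalg.norm(w_r)
    c_r = rng.normal(size=2)
    c_r /= np.linalg.norm(c_r)
    trials.append(w_r @ Xc.T @ Yc @ c_r)
max_random = max(trials)

print(best >= max_random)
```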

The PLS algorithm (NIPALS):

  1. Compute weight vectors $w = X^T Y c / \|X^T Y c\|$ and $c = Y^T X w / \|Y^T X w\|$ iteratively.
  2. Compute scores: $t = Xw$ (X-scores), $u = Yc$ (Y-scores).
  3. Compute loadings: $p = X^T t / (t^T t)$, $q = Y^T t / (t^T t)$.
  4. Deflate: $X \leftarrow X - t p^T$, $Y \leftarrow Y - t q^T$.
  5. Repeat for each component.

The regression coefficients are then: $\hat{B} = W (P^T W)^{-1} Q^T$.
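
The steps above can be sketched in NumPy as follows. This is a minimal illustration, not scikit-learn's implementation: the function name pls_nipals, the initialization of c, the stopping rule, and the absence of column scaling are all assumptions. As a sanity check, using as many components as X has columns should reproduce the ordinary least squares coefficients on centered data.

```python
import numpy as np

def pls_nipals(X, Y, n_components, max_iter=500, tol=1e-10):
    """Hypothetical helper: returns B with shape (n_features, n_targets)."""
    X = X - X.mean(axis=0)  # work on centered copies
    Y = Y - Y.mean(axis=0)
    W, P, Q = [], [], []
    for _ in range(n_components):
        c = np.zeros(Y.shape[1])
        c[0] = 1.0  # assumed start: first column of Y
        for _ in range(max_iter):
            # Step 1: alternate the two normalized weight updates.
            w = X.T @ (Y @ c)
            w /= np.linalg.norm(w)
            c_new = Y.T @ (X @ w)
            c_new /= np.linalg.norm(c_new)
            converged = np.linalg.norm(c_new - c) < tol
            c = c_new
            if converged:
                break
        t = X @ w                 # Step 2: X-scores (u = Y @ c is not needed below)
        p = X.T @ t / (t @ t)     # Step 3: loadings
        q = Y.T @ t / (t @ t)
        X = X - np.outer(t, p)    # Step 4: deflation
        Y = Y - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    # B = W (P^T W)^{-1} Q^T, computed with a linear solve instead of an inverse.
    return W @ np.linalg.solve(P.T @ W, Q.T)

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = X[:, :2] @ np.array([[1.0, 0.5], [-0.5, 1.0]]) + 0.1 * rng.normal(size=(100, 2))

B_two = pls_nipals(X, Y, n_components=2)
B_full = pls_nipals(X, Y, n_components=5)
Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
B_ols = np.linalg.lstsq(Xc, Yc, rcond=None)[0]
print(np.allclose(B_full, B_ols, atol=1e-6))
```

With fewer components than columns, B_two is a regularized estimate restricted to the two-dimensional latent subspace.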

Canonical Correlation Analysis (CCA): CCA finds pairs of linear combinations with maximum correlation:

$$(w_k, c_k) = \arg\max_{w,c} \operatorname{corr}(Xw, Yc) = \arg\max_{w,c} \frac{w^T \Sigma_{XY} c}{\sqrt{w^T \Sigma_{XX} w}\,\sqrt{c^T \Sigma_{YY} c}}$$

The solution involves the generalized eigenvalue problem:

$$\Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} w = \rho^2 w$$

where ρ is the canonical correlation. CCA differs from PLS in that it maximizes correlation rather than covariance, making it invariant to scaling of the variables.
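
A small NumPy sketch of the eigenvalue formulation, assuming the sample covariance matrices are invertible (more samples than variables, no exact collinearity). The helper name canonical_correlations and the synthetic data are illustrative assumptions. The second half rescales the columns of X to show that the canonical correlations stay the same while the covariance-based PLS objective does not.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Hypothetical helper: canonical correlations in decreasing order."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)
    # M = Sxx^{-1} Sxy Syy^{-1} Syx; its eigenvalues are the squared correlations.
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    rho2 = np.sort(np.clip(np.linalg.eigvals(M).real, 0.0, 1.0))[::-1]
    k = min(X.shape[1], Y.shape[1])
    return np.sqrt(rho2[:k])

def leading_covariance(X, Y):
    """Largest singular value of the cross-covariance (first PLS objective value)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return np.linalg.svd(Xc.T @ Yc / (X.shape[0] - 1), compute_uv=False)[0]

# Synthetic data (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
Y = X[:, :3] @ rng.normal(size=(3, 3)) + 0.5 * rng.normal(size=(200, 3))

scales = np.array([1.0, 10.0, 0.1, 5.0])  # arbitrary per-column rescaling of X
rho = canonical_correlations(X, Y)
rho_scaled = canonical_correlations(X * scales, Y)

print("canonical correlations unchanged:", np.allclose(rho, rho_scaled))
print("covariance objective unchanged:",
      np.isclose(leading_covariance(X, Y), leading_covariance(X * scales, Y)))
```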

Comparison:

Method | Objective | Best for
PLS | Max covariance | Prediction, high-dimensional X
CCA | Max correlation | Finding shared structure, equal dimensionality
