Principle:Scikit learn Scikit learn Cross Decomposition
| Knowledge Sources | |
|---|---|
| Domains | Supervised Learning, Dimensionality Reduction |
| Last Updated | 2026-02-08 15:00 GMT |
Overview
Cross decomposition methods find the fundamental relations between two multivariate datasets by projecting them into shared latent spaces that maximize their covariance or correlation.
Description
Cross decomposition techniques simultaneously decompose two data matrices $X$ and $Y$ to find latent components that capture the maximum covariance between them. Unlike standard regression, which predicts $Y$ from $X$, cross decomposition finds joint latent structures shared by both datasets. These methods model relationships between two high-dimensional sets of variables, particularly when the variables within each set are highly correlated (multicollinear). Cross decomposition sits at the intersection of dimensionality reduction and regression and is commonly applied in chemometrics, neuroimaging, and genomics.
Usage
Use Partial Least Squares (PLS) Regression when predicting a multivariate response from a high-dimensional predictor and ordinary least squares fails due to multicollinearity or high dimensionality. PLS is the standard method in chemometrics for relating spectral measurements to chemical properties. Use Canonical Correlation Analysis (CCA) when the goal is to find maximally correlated linear combinations of two sets of variables, without a predictor-response distinction. CCA is common in neuroscience for relating brain activity patterns to behavioral measures.
Theoretical Basis
Partial Least Squares (PLS) Regression: PLS finds weight vectors $w$ and $c$ that maximize the covariance between the latent components $Xw$ and $Yc$:

$$\max_{w,\,c} \; \operatorname{Cov}(Xw,\, Yc)$$

subject to $\|w\| = 1$, $\|c\| = 1$.
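To make the covariance objective concrete, the following sketch uses the standard result that, for column-centered data, the unit-norm weight pair maximizing the covariance of $Xw$ and $Yc$ is given by the leading left and right singular vectors of $X^\top Y$. The synthetic data and the helper names `score_covariance` and `random_unit` are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 5))
Y = X[:, :2] @ rng.normal(size=(2, 3)) + 0.5 * rng.normal(size=(n, 3))

# Center both blocks so that inner products of scores are covariances.
X0 = X - X.mean(axis=0)
Y0 = Y - Y.mean(axis=0)

def score_covariance(w, c):
    """Sample covariance between the latent scores X0 @ w and Y0 @ c."""
    return (X0 @ w) @ (Y0 @ c) / (n - 1)

# Leading singular vectors of the cross-product matrix X0^T Y0.
U, s, Vt = np.linalg.svd(X0.T @ Y0, full_matrices=False)
w_best, c_best = U[:, 0], Vt[0]
best_cov = score_covariance(w_best, c_best)

def random_unit(dim):
    """Draw a random direction and normalize it to unit length."""
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# No random unit-norm weight pair should exceed the SVD solution.
trial_covs = [score_covariance(random_unit(5), random_unit(3)) for _ in range(1000)]

print("covariance at SVD solution:", best_cov)
print("best of 1000 random pairs: ", max(trial_covs))
```

The random search is only a sanity check on the constrained maximum; it plays no part in how PLS is actually computed.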
The PLS algorithm (NIPALS):
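- Compute weight vectors $w$ and $c$ iteratively.
- Compute scores: $t = Xw$ (X-scores), $u = Yc$ (Y-scores).
- Compute loadings: $p = X^\top t / (t^\top t)$, $q = Y^\top t / (t^\top t)$.
- Deflate: $X \leftarrow X - t p^\top$, $Y \leftarrow Y - t q^\top$.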
- Repeat for each component.
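The regression coefficients are then $B = W (P^\top W)^{-1} Q^\top$, where the columns of $W$, $P$, and $Q$ collect the weight vectors $w$, the X-loadings $p$, and the Y-loadings $q$ of the successive components.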
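The NIPALS steps above can be sketched in plain NumPy as follows. This is a minimal illustration, not scikit-learn's implementation: it centers but does not scale the columns, the function name `pls_nipals` and its `max_iter` and `tol` parameters are hypothetical, and the Y-scores are initialized from the first column of $Y$ as one common choice among several.

```python
import numpy as np

def pls_nipals(X, Y, n_components, max_iter=500, tol=1e-10):
    """Illustrative NIPALS PLS regression; returns coefficients for centered data."""
    Xk = X - X.mean(axis=0)
    Yk = Y - Y.mean(axis=0)
    W, P, Q = [], [], []
    for _ in range(n_components):
        u = Yk[:, [0]]          # initial Y-scores (assumed initialization)
        w_old = None
        for _ in range(max_iter):
            w = Xk.T @ u
            w /= np.linalg.norm(w)      # unit-norm X-weights
            t = Xk @ w                  # X-scores
            c = Yk.T @ t
            c /= np.linalg.norm(c)      # unit-norm Y-weights
            u = Yk @ c                  # Y-scores
            if w_old is not None and np.linalg.norm(w - w_old) < tol:
                break
            w_old = w
        tt = float(t.T @ t)
        p = Xk.T @ t / tt               # X-loadings
        q = Yk.T @ t / tt               # Y-loadings (regression on t)
        Xk = Xk - t @ p.T               # deflate X
        Yk = Yk - t @ q.T               # deflate Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = (np.hstack(m) for m in (W, P, Q))
    return W @ np.linalg.inv(P.T @ W) @ Q.T

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
Y = X @ rng.normal(size=(4, 2)) + 0.1 * rng.normal(size=(60, 2))

# With as many components as (linearly independent) features, the PLS fit
# coincides with ordinary least squares on the centered data.
B = pls_nipals(X, Y, n_components=4)
X0, Y0 = X - X.mean(axis=0), Y - Y.mean(axis=0)
B_ols = np.linalg.lstsq(X0, Y0, rcond=None)[0]
print(np.allclose(B, B_ols, atol=1e-6))
```

Using fewer components than features is where PLS differs from least squares: the fit is then restricted to the span of the first few X-scores, which is what stabilizes it under multicollinearity.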
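Canonical Correlation Analysis (CCA): CCA finds pairs of linear combinations $Xa$ and $Yb$ with maximum correlation:

$$\max_{a,\,b} \; \operatorname{Corr}(Xa,\, Yb)$$

The solution involves the generalized eigenvalue problem:

$$\Sigma_{XY}\, \Sigma_{YY}^{-1}\, \Sigma_{YX}\, a = \rho^2\, \Sigma_{XX}\, a$$

where $\rho$ is the canonical correlation, $\Sigma_{XX}$ and $\Sigma_{YY}$ are the covariance matrices of $X$ and $Y$, and $\Sigma_{XY} = \Sigma_{YX}^\top$ is their cross-covariance. CCA differs from PLS in that it maximizes correlation rather than covariance, making it invariant to scaling of the variables.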
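The sketch below solves the generalized eigenvalue problem for the first canonical pair directly with `scipy.linalg.eigh` and checks the scale-invariance claim numerically. It assumes sample covariance matrices that are well conditioned (more samples than variables); the function name `first_canonical_pair` and the synthetic data are hypothetical, and scikit-learn's `CCA` uses an iterative algorithm rather than this direct route.

```python
import numpy as np
from scipy.linalg import eigh

def first_canonical_pair(X, Y):
    """Solve Sxy Syy^{-1} Syx a = rho^2 Sxx a for the largest rho."""
    X0 = X - X.mean(axis=0)
    Y0 = Y - Y.mean(axis=0)
    n = len(X0)
    Sxx = X0.T @ X0 / (n - 1)
    Syy = Y0.T @ Y0 / (n - 1)
    Sxy = X0.T @ Y0 / (n - 1)
    A = Sxy @ np.linalg.solve(Syy, Sxy.T)
    A = (A + A.T) / 2                      # symmetrize against round-off
    eigvals, eigvecs = eigh(A, Sxx)        # eigenvalues in ascending order
    rho = np.sqrt(eigvals[-1])
    a = eigvecs[:, -1]
    b = np.linalg.solve(Syy, Sxy.T @ a)    # b is proportional to Syy^{-1} Syx a
    return rho, a, b

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 1))  # hypothetical shared signal
X = np.hstack([z + 0.5 * rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])
Y = np.hstack([z + 0.5 * rng.normal(size=(n, 1)), rng.normal(size=(n, 1))])

rho, a, b = first_canonical_pair(X, Y)
empirical_corr = np.corrcoef(X @ a, Y @ b)[0, 1]

# Rescaling the columns of X should leave the canonical correlation unchanged.
rho_scaled, _, _ = first_canonical_pair(X * np.array([100.0, 0.01, 3.0]), Y)

print("rho from eigenproblem:   ", rho)
print("correlation of the scores:", empirical_corr)
print("rho after rescaling X:    ", rho_scaled)
```

The same rescaling would change the covariance-based PLS objective, which is why PLS inputs are usually standardized first.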
Comparison:
| Method | Objective | Best for |
|---|---|---|
| PLS | Max covariance | Prediction, high-dimensional |
| CCA | Max correlation | Finding shared structure, equal dimensionality |