

Implementation:Rapidsai Cuml MultiGPU Eigendecomposition

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Linear_Algebra
Last Updated 2026-02-08 12:00 GMT

Overview

Provides multi-GPU eigendecomposition functions for symmetric matrices, supporting both divide-and-conquer and Jacobi methods with data distributed across multiple GPUs.

Description

The eig.hpp header declares multi-node, multi-GPU (MNMG) eigendecomposition routines in the MLCommon::LinAlg::opg namespace. These functions take large symmetric matrices whose partitions are distributed across GPU ranks.

Two eigendecomposition methods are provided:

  • eigDC (Divide and Conquer): Gathers the full input matrix at rank 0 and performs eigendecomposition sequentially using the divide-and-conquer algorithm. Available for both float and double precision.
  • eigJacobi (Jacobi): Similar gathering strategy but uses the Jacobi iterative method for eigendecomposition. Also available for both float and double precision.
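To make the Jacobi approach concrete, the following is a minimal single-threaded sketch of the classical cyclic Jacobi eigenvalue iteration on a small dense symmetric matrix. It is not the code that eigJacobi runs (the library performs the decomposition on the GPU); the function name `jacobi_eigen`, the row-major host storage, and the tolerance defaults are assumptions made for illustration only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cyclic Jacobi eigenvalue iteration for a dense symmetric n x n matrix.
// `a` is row-major and is overwritten: on return its diagonal holds the
// eigenvalues. `v` (row-major, n x n) receives the eigenvectors as columns.
// Returns the number of full sweeps that were performed.
// Hypothetical helper for illustration; not part of cuML or RAFT.
int jacobi_eigen(std::vector<double>& a, std::vector<double>& v, std::size_t n,
                 double tol = 1e-12, int max_sweeps = 50) {
  v.assign(n * n, 0.0);
  for (std::size_t i = 0; i < n; ++i) v[i * n + i] = 1.0;

  for (int sweep = 0; sweep < max_sweeps; ++sweep) {
    // The sum of squared off-diagonal entries measures how far `a` is
    // from being diagonal; stop once it is negligible.
    double off = 0.0;
    for (std::size_t p = 0; p < n; ++p)
      for (std::size_t q = p + 1; q < n; ++q) off += a[p * n + q] * a[p * n + q];
    if (off < tol) return sweep;

    for (std::size_t p = 0; p < n; ++p) {
      for (std::size_t q = p + 1; q < n; ++q) {
        const double apq = a[p * n + q];
        if (apq == 0.0) continue;
        // Choose the rotation (c, s) that zeroes the entry a[p][q].
        const double theta = (a[q * n + q] - a[p * n + p]) / (2.0 * apq);
        const double sign = (theta >= 0.0) ? 1.0 : -1.0;
        const double t = sign / (std::fabs(theta) + std::sqrt(theta * theta + 1.0));
        const double c = 1.0 / std::sqrt(t * t + 1.0);
        const double s = t * c;
        // A <- A * J (updates columns p and q).
        for (std::size_t k = 0; k < n; ++k) {
          const double akp = a[k * n + p], akq = a[k * n + q];
          a[k * n + p] = c * akp - s * akq;
          a[k * n + q] = s * akp + c * akq;
        }
        // A <- J^T * A (updates rows p and q).
        for (std::size_t k = 0; k < n; ++k) {
          const double apk = a[p * n + k], aqk = a[q * n + k];
          a[p * n + k] = c * apk - s * aqk;
          a[q * n + k] = s * apk + c * aqk;
        }
        // V <- V * J accumulates the eigenvectors.
        for (std::size_t k = 0; k < n; ++k) {
          const double vkp = v[k * n + p], vkq = v[k * n + q];
          v[k * n + p] = c * vkp - s * vkq;
          v[k * n + q] = s * vkp + c * vkq;
        }
      }
    }
  }
  return max_sweeps;
}
```

Each rotation zeroes one off-diagonal pair, and repeated sweeps drive the matrix toward diagonal form. This iterative structure is what distinguishes the Jacobi method from the divide-and-conquer approach.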

Both methods accept partitioned input data as a vector of Matrix::Data pointers with a Matrix::PartDescriptor describing the distribution layout. The functions operate within the RAFT communicator framework using MPI ranks and CUDA streams.
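The gathering step described above can be pictured with a host-only sketch. It assumes, purely for illustration, that each partition is a block of consecutive rows stored row-major. The `HostPart` struct and both helper functions are hypothetical stand-ins and do not correspond to Matrix::Data, Matrix::PartDescriptor, or any RAFT communicator call.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical host-side stand-in for one partition: a block of consecutive
// rows of the global matrix, stored row-major and owned by `rank`.
struct HostPart {
  int rank;
  std::size_t n_rows;
  std::vector<float> values;  // n_rows * n_cols entries
};

// Concatenates the row blocks, in the order given, into one row-major
// n_cols x n_cols matrix, mimicking what a gather to rank 0 would assemble.
std::vector<float> assemble_full_matrix(const std::vector<HostPart>& parts,
                                        std::size_t n_cols) {
  std::vector<float> full;
  full.reserve(n_cols * n_cols);
  std::size_t total_rows = 0;
  for (const HostPart& p : parts) {
    if (p.values.size() != p.n_rows * n_cols)
      throw std::invalid_argument("partition size does not match its row count");
    full.insert(full.end(), p.values.begin(), p.values.end());
    total_rows += p.n_rows;
  }
  if (total_rows != n_cols)
    throw std::invalid_argument("partitions do not form a square matrix");
  return full;
}

// Returns true when the assembled row-major n x n matrix is symmetric
// within `tol`, which the eigendecomposition routines require of the input.
bool is_symmetric(const std::vector<float>& m, std::size_t n, float tol = 1e-6f) {
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = i + 1; j < n; ++j)
      if (std::fabs(m[i * n + j] - m[j * n + i]) > tol) return false;
  return true;
}
```

In the real routines the partitions live in device memory and are moved between ranks through the communicator; the sketch only shows how per-rank row counts determine where each block lands in the assembled matrix.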

Usage

Use these functions to compute the eigenvalues and eigenvectors of a large symmetric matrix that is partitioned across GPUs. The divide-and-conquer method (eigDC) is generally faster for larger matrices, while the Jacobi method (eigJacobi) may be more numerically stable for certain matrix types. Because both routines gather the full matrix at rank 0, the assembled N x N matrix has to fit in the memory available to that rank. These functions are commonly used as building blocks for PCA, spectral methods, and other algorithms that require eigendecomposition at scale.

Code Reference

Source Location

  • Repository: Rapidsai_Cuml
  • File: cpp/include/cuml/prims/opg/linalg/eig.hpp

Signature

namespace MLCommon {
namespace LinAlg {
namespace opg {

void eigDC(const raft::handle_t& h,
           float* eigenValues,
           float* eigenVectors,
           std::vector<Matrix::Data<float>*>& inParts,
           Matrix::PartDescriptor& desc,
           int myRank,
           cudaStream_t stream);

void eigDC(const raft::handle_t& h,
           double* eigenValues,
           double* eigenVectors,
           std::vector<Matrix::Data<double>*>& inParts,
           Matrix::PartDescriptor& desc,
           int myRank,
           cudaStream_t stream);

void eigJacobi(const raft::handle_t& h,
               float* eigenValues,
               float* eigenVectors,
               std::vector<Matrix::Data<float>*>& inParts,
               Matrix::PartDescriptor& desc,
               int myRank,
               cudaStream_t stream);

void eigJacobi(const raft::handle_t& h,
               double* eigenValues,
               double* eigenVectors,
               std::vector<Matrix::Data<double>*>& inParts,
               Matrix::PartDescriptor& desc,
               int myRank,
               cudaStream_t stream);

} // namespace opg
} // namespace LinAlg
} // namespace MLCommon

Import

#include <cuml/prims/opg/linalg/eig.hpp>

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| h | const raft::handle_t& | Yes | RAFT handle carrying the communicator used for multi-GPU coordination |
| inParts | std::vector<Matrix::Data<T>*>& | Yes | Vector of pointers to Matrix::Data<T> objects describing this rank's local partitions of the input symmetric matrix [N x N] |
| desc | Matrix::PartDescriptor& | Yes | Descriptor defining the partitioning layout of the input matrix across ranks |
| myRank | int | Yes | MPI rank of the current process |
| stream | cudaStream_t | Yes | CUDA stream for asynchronous execution |

Outputs

| Name | Type | Description |
|------|------|-------------|
| eigenValues | T* (float or double) | Device array holding the N computed eigenvalues |
| eigenVectors | T* (float or double) | Device array of size N x N holding the N computed eigenvectors, each of length N |
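A simple way to sanity-check the outputs after copying them to the host is to measure how well each pair satisfies A v = λ v. The sketch below assumes column-major storage with eigenvector j in column j; that layout is an assumption for illustration and should be confirmed against the library before relying on it. The function name `max_eigen_residual` is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the largest absolute entry of A*v_j - lambda_j*v_j over all j.
// `a` and `eigenvectors` are n x n column-major host arrays; column j of
// `eigenvectors` is taken to be the eigenvector paired with eigenvalues[j].
// Hypothetical helper for illustration; not part of cuML.
double max_eigen_residual(const std::vector<double>& a,
                          const std::vector<double>& eigenvalues,
                          const std::vector<double>& eigenvectors,
                          std::size_t n) {
  double worst = 0.0;
  for (std::size_t j = 0; j < n; ++j) {
    const double* v = &eigenvectors[j * n];  // start of column j
    for (std::size_t i = 0; i < n; ++i) {
      double av = 0.0;
      // Element (i, k) of a column-major matrix lives at index k * n + i.
      for (std::size_t k = 0; k < n; ++k) av += a[k * n + i] * v[k];
      worst = std::max(worst, std::fabs(av - eigenvalues[j] * v[i]));
    }
  }
  return worst;
}
```

A residual close to the working precision suggests the decomposition and the assumed layout are consistent; a large residual usually points to a layout mismatch or a non-symmetric input.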

Usage Examples

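#include <cuda_runtime.h>
#include <vector>
#include <cuml/prims/opg/linalg/eig.hpp>

// Set up the multi-GPU handle; it must carry an initialized communicator
raft::handle_t handle;
cudaStream_t stream;
cudaStreamCreate(&stream);
int myRank = 0;  // MPI rank of this process (0 shown for illustration)

// Partitioned input: symmetric matrix of size N x N
std::vector<MLCommon::Matrix::Data<float>*> inParts;
MLCommon::Matrix::PartDescriptor desc;
// ... populate inParts and desc with partition info ...

// Allocate device buffers for the results
int N = 1000;
float* d_eigenValues  = nullptr;  // device array [N]
float* d_eigenVectors = nullptr;  // device array [N * N]
cudaMalloc(&d_eigenValues, sizeof(float) * N);
cudaMalloc(&d_eigenVectors, sizeof(float) * N * N);

// Eigendecomposition using divide-and-conquer
MLCommon::LinAlg::opg::eigDC(handle, d_eigenValues, d_eigenVectors,
                             inParts, desc, myRank, stream);

// Or using the Jacobi method (this overwrites the same output buffers)
MLCommon::LinAlg::opg::eigJacobi(handle, d_eigenValues, d_eigenVectors,
                                 inParts, desc, myRank, stream);

// Wait for the queued work to finish, then release resources
cudaStreamSynchronize(stream);
cudaFree(d_eigenValues);
cudaFree(d_eigenVectors);
cudaStreamDestroy(stream);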
