Implementation:Rapidsai Cuml Sign Flip MG
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Dimensionality_Reduction |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Provides multi-node multi-GPU (MNMG) sign-flip operations that stabilize the sign of eigenvectors produced by PCA and truncated SVD (tSVD) decompositions.
Description
Eigenvectors are determined only up to sign: if v is an eigenvector, so is -v. This ambiguity can produce inconsistent results across runs or across nodes in a distributed setting. The functions in this header resolve it by flipping the sign of eigenvector columns so that the entry with the largest absolute value in each column is positive, which makes the output deterministic and reproducible.
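The rule described above can be sketched on the host for a single, non-distributed matrix. This is only an illustration of the sign convention, assuming column-major storage; the helper name `flip_columns_by_max_abs` is hypothetical and is not part of cuML, and the library's GPU implementation may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper, not a cuML function.
// `m` holds an n_rows x n_cols matrix in column-major order (an assumption).
// For each column, locate the entry with the largest absolute value and,
// if that entry is negative, negate the whole column.
void flip_columns_by_max_abs(std::vector<double>& m, std::size_t n_rows, std::size_t n_cols)
{
  assert(m.size() == n_rows * n_cols);
  if (n_rows == 0) { return; }
  for (std::size_t c = 0; c < n_cols; ++c) {
    double* col         = m.data() + c * n_rows;
    std::size_t arg_max = 0;
    for (std::size_t r = 1; r < n_rows; ++r) {
      if (std::fabs(col[r]) > std::fabs(col[arg_max])) { arg_max = r; }
    }
    if (col[arg_max] < 0.0) {
      for (std::size_t r = 0; r < n_rows; ++r) { col[r] = -col[r]; }
    }
  }
}
```

Because negating a column maps v to -v, the result spans the same directions as the input; only the representative chosen for each direction changes.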
Two function families are provided:
- sign_flip_components_u: Flips signs using the input data matrix to determine the dominant sign. Takes additional parameters for the number of samples, the number of features, and an option to center the input data. This variant is used when the context of the full U matrix (left singular vectors) is needed.
- sign_flip: A simpler variant that flips signs based directly on the input data and the component matrix, requiring only the number of components.
Both families have float and double overloads and operate on distributed data described by MLCommon::Matrix::PartDescriptor, accepting multiple CUDA streams for concurrent execution across GPU partitions.
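When the data is split into row partitions, a per-column sign decision has to be made globally so that every partition agrees on it. The host-only sketch below illustrates that idea under stated assumptions: `HostPartition`, `dominant_entries`, and `global_column_signs` are hypothetical names, the partitions are column-major, and the sequential loop stands in for whatever cross-GPU reduction the library performs, which the source does not describe.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one row partition of a distributed matrix
// (column-major, n_rows x n_cols). This is not a cuML type.
struct HostPartition {
  std::vector<double> values;
  std::size_t n_rows;
};

// For every column, return the signed entry with the largest magnitude
// found in any partition (0.0 if there is no data).
std::vector<double> dominant_entries(const std::vector<HostPartition>& parts, std::size_t n_cols)
{
  std::vector<double> best(n_cols, 0.0);
  for (const HostPartition& p : parts) {
    assert(p.values.size() == p.n_rows * n_cols);
    for (std::size_t c = 0; c < n_cols; ++c) {
      for (std::size_t r = 0; r < p.n_rows; ++r) {
        const double v = p.values[c * p.n_rows + r];
        if (std::fabs(v) > std::fabs(best[c])) { best[c] = v; }
      }
    }
  }
  return best;
}

// Turn the dominant entries into one multiplier per column:
// -1 where the dominant entry is negative, +1 otherwise.
std::vector<int> global_column_signs(const std::vector<HostPartition>& parts, std::size_t n_cols)
{
  const std::vector<double> best = dominant_entries(parts, n_cols);
  std::vector<int> signs(n_cols, 1);
  for (std::size_t c = 0; c < n_cols; ++c) {
    if (best[c] < 0.0) { signs[c] = -1; }
  }
  return signs;
}
```

Every partition would then apply the same multipliers, which is what keeps the result consistent across nodes.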
Usage
Use these functions after performing a distributed PCA or tSVD decomposition in a multi-GPU environment to ensure that the resulting eigenvectors have consistent signs across all nodes. This is essential for reproducibility in multi-node training pipelines.
Code Reference
Source Location
- Repository: Rapidsai_Cuml
- File: cpp/include/cuml/decomposition/sign_flip_mg.hpp
Signature
namespace ML {
namespace PCA {
namespace opg {
void sign_flip_components_u(raft::handle_t& handle,
                            std::vector<MLCommon::Matrix::Data<float>*>& input_data,
                            MLCommon::Matrix::PartDescriptor& input_desc,
                            float* components,
                            std::size_t n_samples,
                            std::size_t n_features,
                            std::size_t n_components,
                            cudaStream_t* streams,
                            std::uint32_t n_stream,
                            bool center);
void sign_flip_components_u(raft::handle_t& handle,
                            std::vector<MLCommon::Matrix::Data<double>*>& input_data,
                            MLCommon::Matrix::PartDescriptor& input_desc,
                            double* components,
                            std::size_t n_samples,
                            std::size_t n_features,
                            std::size_t n_components,
                            cudaStream_t* streams,
                            std::uint32_t n_stream,
                            bool center);
void sign_flip(raft::handle_t& handle,
               std::vector<MLCommon::Matrix::Data<float>*>& input_data,
               MLCommon::Matrix::PartDescriptor& input_desc,
               float* components,
               std::size_t n_components,
               cudaStream_t* streams,
               std::uint32_t n_stream);
void sign_flip(raft::handle_t& handle,
               std::vector<MLCommon::Matrix::Data<double>*>& input_data,
               MLCommon::Matrix::PartDescriptor& input_desc,
               double* components,
               std::size_t n_components,
               cudaStream_t* streams,
               std::uint32_t n_stream);
};  // end namespace opg
};  // end namespace PCA
};  // end namespace ML
Import
#include <cuml/decomposition/sign_flip_mg.hpp>
I/O Contract
Inputs (sign_flip_components_u)
| Name | Type | Required | Description |
|---|---|---|---|
| handle | raft::handle_t& | Yes | cuML handle for GPU resource management |
| input_data | std::vector<MLCommon::Matrix::Data<T>*>& | Yes | Distributed input matrix partitions on device |
| input_desc | MLCommon::Matrix::PartDescriptor& | Yes | MNMG descriptor for the input partitions |
| n_samples | std::size_t | Yes | Total number of rows in the input matrix |
| n_features | std::size_t | Yes | Number of columns in the input/components matrix |
| n_components | std::size_t | Yes | Number of rows in the components matrix |
| streams | cudaStream_t* | Yes | Array of CUDA streams for concurrent execution |
| n_stream | std::uint32_t | Yes | Number of CUDA streams |
| center | bool | Yes | Whether to center the input data by columns |
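As an illustration of what centering by columns means for the `center` flag, the sketch below subtracts each column's mean from a host matrix. It assumes column-major storage and a single, non-distributed matrix; `center_columns` is a hypothetical helper, not a cuML function, and in a distributed setting the mean would have to be taken over the rows of all partitions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper, not a cuML function.
// `x` holds an n_samples x n_features matrix in column-major order (an assumption).
// Each column has its mean subtracted, so every column sums to zero afterwards
// (up to floating-point rounding).
void center_columns(std::vector<double>& x, std::size_t n_samples, std::size_t n_features)
{
  assert(x.size() == n_samples * n_features);
  if (n_samples == 0) { return; }
  for (std::size_t f = 0; f < n_features; ++f) {
    double* col = x.data() + f * n_samples;
    double sum  = 0.0;
    for (std::size_t s = 0; s < n_samples; ++s) { sum += col[s]; }
    const double mean = sum / static_cast<double>(n_samples);
    for (std::size_t s = 0; s < n_samples; ++s) { col[s] -= mean; }
  }
}
```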
Inputs (sign_flip)
| Name | Type | Required | Description |
|---|---|---|---|
| handle | raft::handle_t& | Yes | cuML handle for GPU resource management |
| input_data | std::vector<MLCommon::Matrix::Data<T>*>& | Yes | Distributed input matrix partitions on device |
| input_desc | MLCommon::Matrix::PartDescriptor& | Yes | MNMG descriptor for the input partitions |
| n_components | std::size_t | Yes | Number of columns in the components matrix |
| streams | cudaStream_t* | Yes | Array of CUDA streams for concurrent execution |
| n_stream | std::uint32_t | Yes | Number of CUDA streams |
Outputs
| Name | Type | Description |
|---|---|---|
| components | float*/double* | Device pointer to the components matrix, modified in-place with corrected signs |
Usage Examples
#include <cuml/decomposition/sign_flip_mg.hpp>
#include <cuml/prims/opg/matrix/data.hpp>
#include <cuml/prims/opg/matrix/part_descriptor.hpp>
#include <raft/core/handle.hpp>
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdint>
#include <vector>
void stabilize_pca_signs(raft::handle_t& handle,
                         std::vector<MLCommon::Matrix::Data<float>*>& input_data,
                         MLCommon::Matrix::PartDescriptor& input_desc,
                         float* components,
                         std::size_t n_components) {
  // Create CUDA streams for parallel execution (return codes are not checked in this example)
  const std::uint32_t n_streams = 4;
  std::vector<cudaStream_t> streams(n_streams);
  for (std::uint32_t i = 0; i < n_streams; i++) {
    cudaStreamCreate(&streams[i]);
  }
  // Stabilize eigenvector signs across multiple GPUs; `components` is modified in place
  ML::PCA::opg::sign_flip(handle,
                          input_data,
                          input_desc,
                          components,
                          n_components,
                          streams.data(),
                          n_streams);
  // Wait for queued work on every stream, then release the streams
  for (std::uint32_t i = 0; i < n_streams; i++) {
    cudaStreamSynchronize(streams[i]);
    cudaStreamDestroy(streams[i]);
  }
}