Implementation:Rapidsai Cuml Sign Flip MG

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, Dimensionality_Reduction
Last Updated: 2026-02-08 12:00 GMT

Overview

Provides multi-node multi-GPU (MNMG) sign-flip operations for stabilizing the sign of eigenvectors produced by PCA and tSVD decompositions.

Description

Eigenvectors are determined only up to sign: if v is an eigenvector, so is -v. This ambiguity can cause inconsistent results across runs or across nodes in a distributed setting. The functions in this header resolve it by flipping the sign of eigenvector columns so that the entry with the largest absolute value in each column is positive, which makes the output deterministic and reproducible.
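The sign convention described above can be illustrated on the host. The following is a minimal single-process sketch, not the cuML implementation: sign_flip_columns_host is a hypothetical helper introduced here for illustration, and the column-major layout of the matrix is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side helper (not part of cuML). Treats `m` as a
// column-major n_rows x n_cols matrix and flips each column so that its
// largest-magnitude entry is non-negative.
void sign_flip_columns_host(std::vector<float>& m, std::size_t n_rows, std::size_t n_cols)
{
  assert(m.size() == n_rows * n_cols);
  if (n_rows == 0) { return; }

  for (std::size_t col = 0; col < n_cols; ++col) {
    float* column = m.data() + col * n_rows;

    // Locate the entry with the largest absolute value in this column.
    std::size_t pivot = 0;
    for (std::size_t row = 1; row < n_rows; ++row) {
      if (std::fabs(column[row]) > std::fabs(column[pivot])) { pivot = row; }
    }

    // If that entry is negative, negate the whole column.
    if (column[pivot] < 0.0f) {
      for (std::size_t row = 0; row < n_rows; ++row) {
        column[row] = -column[row];
      }
    }
  }
}
```

Under this convention a matrix and its column-wise negation map to the same result, which is why the ambiguity between v and -v no longer affects the output.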

Two function families are provided:

  • sign_flip_components_u: Flips signs using the input data matrix to determine the dominant sign. Accepts additional parameters for number of samples, features, and an option to center the input data. This variant is used when the full U matrix (left singular vectors) context is needed.
  • sign_flip: A simpler variant that flips signs based on the input data and component matrix directly, requiring only the number of components.

Both families have float and double overloads and operate on distributed data described by MLCommon::Matrix::PartDescriptor, accepting multiple CUDA streams for concurrent execution across GPU partitions.

Usage

Use these functions after performing a distributed PCA or tSVD decomposition in a multi-GPU environment to ensure that the resulting eigenvectors have consistent signs across all nodes. This is essential for reproducibility in multi-node training pipelines.

Code Reference

Source Location

  • Repository: Rapidsai_Cuml
  • File: cpp/include/cuml/decomposition/sign_flip_mg.hpp

Signature

namespace ML {
namespace PCA {
namespace opg {

void sign_flip_components_u(raft::handle_t& handle,
                            std::vector<MLCommon::Matrix::Data<float>*>& input_data,
                            MLCommon::Matrix::PartDescriptor& input_desc,
                            float* components,
                            std::size_t n_samples,
                            std::size_t n_features,
                            std::size_t n_components,
                            cudaStream_t* streams,
                            std::uint32_t n_stream,
                            bool center);

void sign_flip_components_u(raft::handle_t& handle,
                            std::vector<MLCommon::Matrix::Data<double>*>& input_data,
                            MLCommon::Matrix::PartDescriptor& input_desc,
                            double* components,
                            std::size_t n_samples,
                            std::size_t n_features,
                            std::size_t n_components,
                            cudaStream_t* streams,
                            std::uint32_t n_stream,
                            bool center);

void sign_flip(raft::handle_t& handle,
               std::vector<MLCommon::Matrix::Data<float>*>& input_data,
               MLCommon::Matrix::PartDescriptor& input_desc,
               float* components,
               std::size_t n_components,
               cudaStream_t* streams,
               std::uint32_t n_stream);

void sign_flip(raft::handle_t& handle,
               std::vector<MLCommon::Matrix::Data<double>*>& input_data,
               MLCommon::Matrix::PartDescriptor& input_desc,
               double* components,
               std::size_t n_components,
               cudaStream_t* streams,
               std::uint32_t n_stream);

};  // end namespace opg
};  // end namespace PCA
};  // end namespace ML

Import

#include <cuml/decomposition/sign_flip_mg.hpp>

I/O Contract

Inputs (sign_flip_components_u)

Name          Type                                      Required  Description
handle        raft::handle_t&                           Yes       cuML handle for GPU resource management
input_data    std::vector<MLCommon::Matrix::Data<T>*>&  Yes       Distributed input matrix partitions on device
input_desc    MLCommon::Matrix::PartDescriptor&         Yes       MNMG descriptor for the input partitions
n_samples     std::size_t                               Yes       Total number of rows in the input matrix
n_features    std::size_t                               Yes       Number of columns in the input/components matrix
n_components  std::size_t                               Yes       Number of rows in the components matrix
streams       cudaStream_t*                             Yes       Array of CUDA streams for concurrent execution
n_stream      std::uint32_t                             Yes       Number of CUDA streams
center        bool                                      Yes       Whether to center the input data by columns

Inputs (sign_flip)

Name          Type                                      Required  Description
handle        raft::handle_t&                           Yes       cuML handle for GPU resource management
input_data    std::vector<MLCommon::Matrix::Data<T>*>&  Yes       Distributed input matrix partitions on device
input_desc    MLCommon::Matrix::PartDescriptor&         Yes       MNMG descriptor for the input partitions
n_components  std::size_t                               Yes       Number of columns in the components matrix
streams       cudaStream_t*                             Yes       Array of CUDA streams for concurrent execution
n_stream      std::uint32_t                             Yes       Number of CUDA streams

Outputs

Name          Type            Description
components    float*/double*  Device pointer to the components matrix, modified in-place with corrected signs

Usage Examples

#include <cuml/decomposition/sign_flip_mg.hpp>
#include <cuml/prims/opg/matrix/data.hpp>
#include <cuml/prims/opg/matrix/part_descriptor.hpp>
#include <raft/core/handle.hpp>

#include <cuda_runtime.h>

#include <cstddef>
#include <cstdint>
#include <vector>

void stabilize_pca_signs(raft::handle_t& handle,
                         std::vector<MLCommon::Matrix::Data<float>*>& input_data,
                         MLCommon::Matrix::PartDescriptor& input_desc,
                         float* components,
                         std::size_t n_components) {
    // Create CUDA streams for parallel execution (the count of 4 is arbitrary).
    // Return codes of the CUDA calls are not checked here, for brevity.
    constexpr std::uint32_t n_streams = 4;
    std::vector<cudaStream_t> streams(n_streams);
    for (auto& stream : streams) {
        cudaStreamCreate(&stream);
    }

    // Stabilize eigenvector signs across multiple GPUs;
    // components is modified in place
    ML::PCA::opg::sign_flip(handle,
                            input_data,
                            input_desc,
                            components,
                            n_components,
                            streams.data(),
                            n_streams);

    // Wait for outstanding work on each stream, then release it
    for (auto& stream : streams) {
        cudaStreamSynchronize(stream);
        cudaStreamDestroy(stream);
    }
}
