Implementation:Rapidsai Cuml Coordinate Descent MG

Knowledge Sources	Rapidsai_Cuml
Domains	Machine_Learning, Linear_Models
Last Updated	2026-02-08 12:00 GMT

Overview

Provides a multi-node multi-GPU (MNMG) coordinate descent solver for fitting and predicting with L1/L2-regularized linear regression models (lasso, ridge, and elastic-net) in cuML.

Description

The cd_mg.hpp header declares the multi-GPU coordinate descent API in the ML::CD::opg namespace. It provides two main operations:

fit: Trains an elastic-net regression model using coordinate descent across distributed data partitions. The solver minimizes the elastic-net objective, which combines L1 and L2 regularization: alpha sets the overall penalty strength and l1_ratio sets the split between the two terms (0.0 = ridge, 1.0 = lasso). Overloads are provided for float and double precision, and each returns the number of iterations the solver ran.
predict: Performs distributed prediction using a trained model's coefficients and intercept, computing predictions across multiple data partitions.
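
To make the fit step concrete, the following is a minimal single-process CPU sketch of coordinate descent for the elastic-net objective. It is not the cuML implementation. It assumes the scikit-learn-style objective (1/(2n))·||y - Xw||² + alpha·l1_ratio·||w||₁ + 0.5·alpha·(1 - l1_ratio)·||w||², row-major storage, no intercept, and no shuffling. The names cd_fit_reference and soft_threshold are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Soft-thresholding operator: shrinks z toward zero by gamma.
inline double soft_threshold(double z, double gamma) {
  if (z > gamma) return z - gamma;
  if (z < -gamma) return z + gamma;
  return 0.0;
}

// Hypothetical reference solver (assumption: not the cuML code path).
// X is row-major [n_rows x n_cols]; no intercept is fitted, for brevity.
// Returns the number of epochs actually run.
int cd_fit_reference(const std::vector<double>& X,
                     const std::vector<double>& y,
                     std::size_t n_rows, std::size_t n_cols,
                     double alpha, double l1_ratio,
                     int epochs, double tol,
                     std::vector<double>& coef) {
  assert(X.size() == n_rows * n_cols && y.size() == n_rows);
  const double l1 = alpha * l1_ratio;          // L1 penalty weight
  const double l2 = alpha * (1.0 - l1_ratio);  // L2 penalty weight
  const double n  = static_cast<double>(n_rows);

  coef.assign(n_cols, 0.0);
  std::vector<double> residual(y);  // y - X * coef, with coef == 0 initially

  int epoch = 0;
  while (epoch < epochs) {
    ++epoch;
    double max_change = 0.0;
    for (std::size_t j = 0; j < n_cols; ++j) {
      // Correlation of column j with the partial residual (column j removed).
      double rho = 0.0, sq_norm = 0.0;
      for (std::size_t i = 0; i < n_rows; ++i) {
        const double x = X[i * n_cols + j];
        rho += x * (residual[i] + x * coef[j]);
        sq_norm += x * x;
      }
      rho /= n;
      sq_norm /= n;

      const double denom    = sq_norm + l2;
      const double new_coef = denom > 0.0 ? soft_threshold(rho, l1) / denom : 0.0;
      const double delta    = new_coef - coef[j];

      // Keep the residual consistent with the updated coefficient.
      for (std::size_t i = 0; i < n_rows; ++i) {
        residual[i] -= X[i * n_cols + j] * delta;
      }
      coef[j]    = new_coef;
      max_change = std::max(max_change, std::fabs(delta));
    }
    if (max_change < tol) break;  // early stop once updates are negligible
  }
  return epoch;
}
```

In a distributed setting, the per-column sums (rho and sq_norm here) would be the quantities that need combining across ranks; how cuML performs that reduction is not shown in this sketch.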

Both functions accept data distributed across GPU ranks using MLCommon::Matrix::Data pointers and MLCommon::Matrix::PartDescriptor / MLCommon::Matrix::RankSizePair for partition metadata. This allows processing datasets too large for a single GPU.
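
The partition metadata can be pictured with a small host-side sketch. The struct RankSize below is a hypothetical stand-in for the (rank, size) pairs carried by the descriptor, not the actual MLCommon::Matrix::RankSizePair type, and local_parts and total_rows are illustrative helpers. It assumes partitions are stacked row-wise in list order to form the global matrix.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in: which rank owns a partition and how many rows it has.
struct RankSize {
  int rank;
  std::size_t n_rows;
};

// A partition owned by the calling rank, located within the global matrix.
struct LocalPart {
  std::size_t global_row_offset;
  std::size_t n_rows;
};

// Total number of rows across all partitions on all ranks.
std::size_t total_rows(const std::vector<RankSize>& parts) {
  std::size_t total = 0;
  for (const auto& p : parts) total += p.n_rows;
  return total;
}

// Partitions owned by my_rank, with their row offsets in the global matrix
// (assumption: partitions are stacked row-wise in list order).
std::vector<LocalPart> local_parts(const std::vector<RankSize>& parts, int my_rank) {
  std::vector<LocalPart> out;
  std::size_t offset = 0;
  for (const auto& p : parts) {
    if (p.rank == my_rank) out.push_back({offset, p.n_rows});
    offset += p.n_rows;
  }
  return out;
}
```

Under this picture, each rank passes only the partitions it owns in input_data and labels, while the descriptor describes the full layout across all ranks.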

Usage

Use this API when fitting ridge regression, lasso, or elastic-net models on datasets distributed across multiple GPUs in a cluster environment. This is the multi-GPU variant of the single-GPU coordinate descent solver. Typical use cases include large-scale linear regression with regularization where the data does not fit on a single GPU.

Code Reference

Source Location

Repository: Rapidsai_Cuml
File: cpp/include/cuml/solvers/cd_mg.hpp

Signature

namespace ML {
namespace CD {
namespace opg {

int fit(raft::handle_t& handle,
        std::vector<MLCommon::Matrix::Data<float>*>& input_data,
        MLCommon::Matrix::PartDescriptor& input_desc,
        std::vector<MLCommon::Matrix::Data<float>*>& labels,
        float* coef,
        float* intercept,
        bool fit_intercept,
        int epochs,
        float alpha,
        float l1_ratio,
        bool shuffle,
        float tol,
        bool verbose);

int fit(raft::handle_t& handle,
        std::vector<MLCommon::Matrix::Data<double>*>& input_data,
        MLCommon::Matrix::PartDescriptor& input_desc,
        std::vector<MLCommon::Matrix::Data<double>*>& labels,
        double* coef,
        double* intercept,
        bool fit_intercept,
        int epochs,
        double alpha,
        double l1_ratio,
        bool shuffle,
        double tol,
        bool verbose);

void predict(raft::handle_t& handle,
             MLCommon::Matrix::RankSizePair** rank_sizes,
             size_t n_parts,
             MLCommon::Matrix::Data<float>** input,
             size_t n_rows,
             size_t n_cols,
             float* coef,
             float intercept,
             MLCommon::Matrix::Data<float>** preds,
             bool verbose);

void predict(raft::handle_t& handle,
             MLCommon::Matrix::RankSizePair** rank_sizes,
             size_t n_parts,
             MLCommon::Matrix::Data<double>** input,
             size_t n_rows,
             size_t n_cols,
             double* coef,
             double intercept,
             MLCommon::Matrix::Data<double>** preds,
             bool verbose);

} // namespace opg
} // namespace CD
} // namespace ML

Import

#include <cuml/solvers/cd_mg.hpp>

I/O Contract

Inputs

fit

Name	Type	Required	Description
handle	raft::handle_t&	Yes	cuML handle with multi-GPU communicator
input_data	std::vector<MLCommon::Matrix::Data<T>*>&	Yes	Vector of data partitions for this rank
input_desc	MLCommon::Matrix::PartDescriptor&	Yes	Descriptor for the input data partitioning
labels	std::vector<MLCommon::Matrix::Data<T>*>&	Yes	Vector of label partitions for this rank
fit_intercept	bool	Yes	Whether to fit an intercept term
epochs	int	Yes	Maximum number of coordinate descent iterations
alpha	T	Yes	Regularization strength parameter
l1_ratio	T	Yes	Ratio of L1 to total regularization (0.0 = ridge, 1.0 = lasso)
shuffle	bool	Yes	Whether to shuffle coordinate order each epoch
tol	T	Yes	Convergence tolerance for early stopping
verbose	bool	Yes	Whether to enable verbose logging

predict

Name	Type	Required	Description
handle	raft::handle_t&	Yes	cuML handle
rank_sizes	MLCommon::Matrix::RankSizePair**	Yes	Partition size info for all ranks
n_parts	size_t	Yes	Number of partitions
input	MLCommon::Matrix::Data<T>**	Yes	Array of input data partitions
n_rows	size_t	Yes	Total number of rows
n_cols	size_t	Yes	Number of features
coef	T*	Yes	Device pointer to learned coefficients [n_cols]
intercept	T	Yes	Learned intercept value
verbose	bool	Yes	Whether to enable verbose logging

Outputs

Name	Type	Description
coef (fit)	T*	Device array of learned regression coefficients [n_cols]
intercept (fit)	T*	Pointer to the scalar intercept value written by fit
return value (fit)	int	Number of iterations the solver ran
preds (predict)	MLCommon::Matrix::Data<T>**	Array of prediction partitions across ranks
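
As a rough picture of what predict produces, the sketch below applies a linear model to each partition on the host: every output row is the dot product of the input row with coef, plus the intercept. HostPartition and predict_reference are hypothetical names, and column-major storage within each partition is an assumption made for this example; the real function operates on device memory through MLCommon::Matrix::Data.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical host-side partition: n_rows x n_cols values, column-major (assumed).
struct HostPartition {
  std::vector<double> values;
  std::size_t n_rows;
};

// Returns one prediction vector per input partition:
// preds[p][i] = intercept + sum_j X_p(i, j) * coef[j].
std::vector<std::vector<double>> predict_reference(
    const std::vector<HostPartition>& parts,
    std::size_t n_cols,
    const std::vector<double>& coef,
    double intercept) {
  std::vector<std::vector<double>> preds;
  preds.reserve(parts.size());
  for (const auto& part : parts) {
    std::vector<double> out(part.n_rows, intercept);
    for (std::size_t j = 0; j < n_cols; ++j) {
      for (std::size_t i = 0; i < part.n_rows; ++i) {
        out[i] += part.values[j * part.n_rows + i] * coef[j];
      }
    }
    preds.push_back(std::move(out));
  }
  return preds;
}
```

Each output partition has the same number of rows as its input partition, which matches the one-to-one pairing of input and preds in the signature.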

Usage Examples

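#include <cuml/solvers/cd_mg.hpp>

#include <cuda_runtime.h>

#include <vector>

// The handle must already carry a multi-GPU communicator (setup not shown).
raft::handle_t handle;

// Distributed input data and labels
std::vector<MLCommon::Matrix::Data<float>*> input_data;
std::vector<MLCommon::Matrix::Data<float>*> labels;
MLCommon::Matrix::PartDescriptor desc;
// ... populate partitions ...

int n_cols = 100;
float* d_coef = nullptr;  // device array [n_cols]
cudaMalloc(&d_coef, n_cols * sizeof(float));
float intercept = 0.0f;   // written by fit

// Fit elastic-net model with alpha=1.0, l1_ratio=0.5
int n_iter = ML::CD::opg::fit(handle, input_data, desc, labels,
                               d_coef, &intercept,
                               true,    // fit_intercept
                               1000,    // epochs
                               1.0f,    // alpha
                               0.5f,    // l1_ratio
                               true,    // shuffle
                               1e-4f,   // tol
                               false);  // verbose

// ... use d_coef and intercept, e.g. with ML::CD::opg::predict ...

cudaFree(d_coef);  // release the coefficient buffer when done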

Related Pages

Environment:Rapidsai_Cuml_CUDA_GPU
