Implementation:Rapidsai Cuml Coordinate Descent MG
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Linear_Models |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Provides a multi-node, multi-GPU (MNMG) coordinate descent solver for fitting and predicting with regularized linear models (ridge, lasso, and elastic-net) in cuML.
Description
The cd_mg.hpp header declares the multi-GPU coordinate descent API in the ML::CD::opg namespace. It provides two main operations:
- fit: Trains a ridge/elastic-net regression model using coordinate descent optimization across distributed data partitions. The solver minimizes the elastic-net objective, which combines L1 and L2 regularization controlled by the alpha and l1_ratio parameters. It supports both float and double precision and returns the number of iterations run.
- predict: Performs distributed prediction using a trained model's coefficients and intercept, computing predictions across multiple data partitions.
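To make the role of alpha, l1_ratio, epochs, and tol concrete, here is a minimal single-process CPU sketch of cyclic coordinate descent on an elastic-net objective. It is not the cuML implementation: the objective scaling (the scikit-learn convention), the column-major layout, the stopping rule, and the names soft_threshold and cd_elastic_net are all assumptions made for illustration, and the sketch omits the intercept, shuffling, and all multi-GPU communication.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Soft-thresholding operator: shrinks z toward zero by gamma.
inline double soft_threshold(double z, double gamma) {
  if (z > gamma) return z - gamma;
  if (z < -gamma) return z + gamma;
  return 0.0;
}

// Hypothetical reference solver (not part of cuML). Minimizes
//   (1 / (2n)) * ||y - X w||^2 + alpha * l1_ratio * ||w||_1
//                              + 0.5 * alpha * (1 - l1_ratio) * ||w||^2
// X is assumed column-major with n rows and p columns.
// Writes the coefficients to w and returns the number of epochs run.
int cd_elastic_net(const std::vector<double>& X, const std::vector<double>& y,
                   std::size_t n, std::size_t p, double alpha, double l1_ratio,
                   int epochs, double tol, std::vector<double>& w) {
  assert(X.size() == n * p && y.size() == n);
  const double l1 = alpha * l1_ratio;          // L1 penalty weight
  const double l2 = alpha * (1.0 - l1_ratio);  // L2 penalty weight
  w.assign(p, 0.0);
  std::vector<double> residual(y);  // residual = y - X w, with w = 0 initially

  int it = 0;
  while (it < epochs) {
    ++it;
    double max_change = 0.0;
    for (std::size_t j = 0; j < p; ++j) {
      const double* xj = &X[j * n];
      double sq_norm = 0.0, rho = 0.0;
      for (std::size_t i = 0; i < n; ++i) {
        sq_norm += xj[i] * xj[i];
        // Correlation of column j with the residual that excludes w[j].
        rho += xj[i] * (residual[i] + xj[i] * w[j]);
      }
      const double denom = sq_norm / n + l2;
      const double w_new =
          denom > 0.0 ? soft_threshold(rho / n, l1) / denom : 0.0;
      const double delta = w_new - w[j];
      for (std::size_t i = 0; i < n; ++i) residual[i] -= xj[i] * delta;
      w[j] = w_new;
      max_change = std::fmax(max_change, std::fabs(delta));
    }
    // Simplified stopping rule: largest coefficient update in this epoch.
    if (max_change <= tol) break;
  }
  return it;
}
```

With l1_ratio = 1.0 the L2 weight vanishes and the update reduces to the lasso soft-thresholding step; with l1_ratio = 0.0 the threshold is zero and only the ridge shrinkage in the denominator remains.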
Both functions accept data distributed across GPU ranks using MLCommon::Matrix::Data pointers and MLCommon::Matrix::PartDescriptor / MLCommon::Matrix::RankSizePair for partition metadata. This allows processing datasets too large for a single GPU.
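The kind of bookkeeping that partition metadata enables can be sketched on the CPU. The RankSize struct below is a hypothetical stand-in that pairs an owning rank with a row count; it is not the real MLCommon::Matrix::RankSizePair or PartDescriptor, and the helper names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for per-partition metadata:
// the rank that owns a block of rows and the number of rows in it.
struct RankSize {
  int rank;
  std::size_t size;
};

// Total number of rows across all partitions on all ranks.
std::size_t total_rows(const std::vector<RankSize>& parts) {
  std::size_t total = 0;
  for (const RankSize& p : parts) total += p.size;
  return total;
}

// Number of rows held locally by a given rank.
std::size_t rows_owned_by(const std::vector<RankSize>& parts, int rank) {
  assert(rank >= 0);
  std::size_t rows = 0;
  for (const RankSize& p : parts) {
    if (p.rank == rank) rows += p.size;
  }
  return rows;
}

// Global starting row of each partition, in partition order.
std::vector<std::size_t> row_offsets(const std::vector<RankSize>& parts) {
  std::vector<std::size_t> offsets;
  offsets.reserve(parts.size());
  std::size_t next = 0;
  for (const RankSize& p : parts) {
    offsets.push_back(next);
    next += p.size;
  }
  return offsets;
}
```

For example, three partitions of 100, 50, and 25 rows owned by ranks 0, 1, and 0 describe a 175-row dataset of which rank 0 holds 125 rows locally.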
Usage
Use this API when fitting ridge regression, lasso, or elastic-net models on datasets distributed across multiple GPUs in a cluster environment. This is the multi-GPU variant of the single-GPU coordinate descent solver. Typical use cases include large-scale linear regression with regularization where the data does not fit on a single GPU.
Code Reference
Source Location
- Repository: Rapidsai_Cuml
- File: cpp/include/cuml/solvers/cd_mg.hpp
Signature
namespace ML {
namespace CD {
namespace opg {
int fit(raft::handle_t& handle,
std::vector<MLCommon::Matrix::Data<float>*>& input_data,
MLCommon::Matrix::PartDescriptor& input_desc,
std::vector<MLCommon::Matrix::Data<float>*>& labels,
float* coef,
float* intercept,
bool fit_intercept,
int epochs,
float alpha,
float l1_ratio,
bool shuffle,
float tol,
bool verbose);
int fit(raft::handle_t& handle,
std::vector<MLCommon::Matrix::Data<double>*>& input_data,
MLCommon::Matrix::PartDescriptor& input_desc,
std::vector<MLCommon::Matrix::Data<double>*>& labels,
double* coef,
double* intercept,
bool fit_intercept,
int epochs,
double alpha,
double l1_ratio,
bool shuffle,
double tol,
bool verbose);
void predict(raft::handle_t& handle,
MLCommon::Matrix::RankSizePair** rank_sizes,
size_t n_parts,
MLCommon::Matrix::Data<float>** input,
size_t n_rows,
size_t n_cols,
float* coef,
float intercept,
MLCommon::Matrix::Data<float>** preds,
bool verbose);
void predict(raft::handle_t& handle,
MLCommon::Matrix::RankSizePair** rank_sizes,
size_t n_parts,
MLCommon::Matrix::Data<double>** input,
size_t n_rows,
size_t n_cols,
double* coef,
double intercept,
MLCommon::Matrix::Data<double>** preds,
bool verbose);
} // namespace opg
} // namespace CD
} // namespace ML
Import
#include <cuml/solvers/cd_mg.hpp>
I/O Contract
Inputs
fit
| Name | Type | Required | Description |
|---|---|---|---|
| handle | raft::handle_t& | Yes | cuML handle with multi-GPU communicator |
| input_data | std::vector<MLCommon::Matrix::Data<T>*>& | Yes | Vector of data partitions for this rank |
| input_desc | MLCommon::Matrix::PartDescriptor& | Yes | Descriptor for the input data partitioning |
| labels | std::vector<MLCommon::Matrix::Data<T>*>& | Yes | Vector of label partitions for this rank |
| fit_intercept | bool | Yes | Whether to fit an intercept term |
| epochs | int | Yes | Maximum number of coordinate descent epochs (passes over the coefficients) |
| alpha | T | Yes | Regularization strength parameter |
| l1_ratio | T | Yes | Ratio of L1 to total regularization (0.0 = ridge, 1.0 = lasso) |
| shuffle | bool | Yes | Whether to shuffle coordinate order each epoch |
| tol | T | Yes | Convergence tolerance for early stopping |
| verbose | bool | Yes | Whether to enable verbose logging |
predict
| Name | Type | Required | Description |
|---|---|---|---|
| handle | raft::handle_t& | Yes | cuML handle |
| rank_sizes | MLCommon::Matrix::RankSizePair** | Yes | Partition size info for all ranks |
| n_parts | size_t | Yes | Number of partitions |
| input | MLCommon::Matrix::Data<T>** | Yes | Array of input data partitions |
| n_rows | size_t | Yes | Total number of rows |
| n_cols | size_t | Yes | Number of features |
| coef | T* | Yes | Device pointer to learned coefficients [n_cols] |
| intercept | T | Yes | Learned intercept value |
| verbose | bool | Yes | Whether to enable verbose logging |
Outputs
| Name | Type | Description |
|---|---|---|
| coef (fit) | T* | Device array of learned regression coefficients [n_cols] |
| intercept (fit) | T* | Pointer to the scalar intercept value written by fit |
| return value (fit) | int | Number of iterations the solver ran |
| preds (predict) | MLCommon::Matrix::Data<T>** | Array of prediction partitions across ranks |
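What predict computes for each partition can be illustrated with a CPU reference: every output row is the dot product of the input row with coef, plus the intercept. This is a sketch only; the function name predict_partition and the column-major layout are assumptions, and the real API operates on device memory across ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for one partition (not part of cuML).
// X is assumed column-major with `rows` rows and `n_cols` columns.
// Returns preds[i] = intercept + sum_j X(i, j) * coef[j].
std::vector<double> predict_partition(const std::vector<double>& X,
                                      std::size_t rows, std::size_t n_cols,
                                      const std::vector<double>& coef,
                                      double intercept) {
  assert(X.size() == rows * n_cols && coef.size() == n_cols);
  std::vector<double> preds(rows, intercept);
  for (std::size_t j = 0; j < n_cols; ++j) {
    for (std::size_t i = 0; i < rows; ++i) {
      preds[i] += X[j * rows + i] * coef[j];
    }
  }
  return preds;
}
```

Because each partition's predictions depend only on that partition's rows and the shared coef and intercept, the computation needs no communication between partitions once the model is available on every rank.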
Usage Examples
#include <cuml/solvers/cd_mg.hpp>
raft::handle_t handle;
// Distributed input data and labels
std::vector<MLCommon::Matrix::Data<float>*> input_data;
std::vector<MLCommon::Matrix::Data<float>*> labels;
MLCommon::Matrix::PartDescriptor desc;
// ... populate partitions ...
int n_cols = 100;
float* d_coef = nullptr; // device array [n_cols]; must be allocated before calling fit
float intercept = 0.0f; // receives the fitted intercept
// Fit elastic-net model with alpha=1.0, l1_ratio=0.5
int n_iter = ML::CD::opg::fit(handle, input_data, desc, labels,
d_coef, &intercept,
true, // fit_intercept
1000, // epochs
1.0f, // alpha
0.5f, // l1_ratio
true, // shuffle
1e-4f, // tol
false); // verbose