Implementation:Rapidsai Cuml Make Regression
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Synthetic_Data_Generation |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Generates synthetic regression datasets on the GPU, equivalent to scikit-learn's sklearn.datasets.make_regression.
Description
The ML::Datasets::make_regression function creates a synthetic regression problem with configurable numbers of samples, features, informative features, and targets. It outputs a row-major feature matrix and corresponding target values, and optionally writes the ground-truth coefficients used to generate the data. Additional controls include a bias term, effective rank for introducing correlations, tail strength for singular value profiles, Gaussian noise, shuffling, and a random seed for reproducibility.
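The relationship between the outputs can be sketched on the CPU. The block below is an illustration only, not cuML's implementation: it assumes the scikit-learn-style linear model (targets are features times coefficients, plus bias, plus Gaussian noise, with non-zero coefficients drawn uniformly from [0, 100)), skips shuffling and effective_rank, and uses the hypothetical names `RegressionData` and `make_regression_reference`.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical CPU-side container; not part of cuML.
struct RegressionData {
  std::vector<float> X;     // [n_rows x n_cols], row-major
  std::vector<float> y;     // [n_rows x n_targets], row-major
  std::vector<float> coef;  // [n_cols x n_targets], row-major
};

// Sketch of the assumed model: y = X * coef + bias + noise * N(0, 1).
// Only the first n_informative rows of coef are non-zero (no shuffling here).
RegressionData make_regression_reference(std::size_t n_rows, std::size_t n_cols,
                                         std::size_t n_informative, std::size_t n_targets,
                                         float bias, float noise, unsigned seed) {
  std::mt19937 rng(seed);
  std::normal_distribution<float> normal(0.0f, 1.0f);
  std::uniform_real_distribution<float> uniform(0.0f, 100.0f);

  RegressionData d;
  d.X.resize(n_rows * n_cols);
  d.y.assign(n_rows * n_targets, bias);     // start every target at the bias
  d.coef.assign(n_cols * n_targets, 0.0f);  // uninformative features keep a zero weight

  // Well-conditioned features: independent standard normal draws.
  for (float& x : d.X) x = normal(rng);

  // Non-zero ground-truth weights for the informative features only.
  const std::size_t k = std::min(n_informative, n_cols);
  for (std::size_t j = 0; j < k; ++j)
    for (std::size_t t = 0; t < n_targets; ++t)
      d.coef[j * n_targets + t] = uniform(rng);

  // Targets: linear combination of the features plus optional Gaussian noise.
  for (std::size_t i = 0; i < n_rows; ++i) {
    for (std::size_t t = 0; t < n_targets; ++t) {
      float acc = 0.0f;
      for (std::size_t j = 0; j < n_cols; ++j)
        acc += d.X[i * n_cols + j] * d.coef[j * n_targets + t];
      d.y[i * n_targets + t] += acc + noise * normal(rng);
    }
  }
  return d;
}
```

With noise set to 0, each target equals the dot product of its feature row with the coefficients plus the bias, which is a convenient property for checking a fitted model against the ground truth.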
Four overloads are provided, one for each combination of precision (float or double) and index type (int or int64_t).
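One practical way to choose between the index types is to check whether the total element count of the feature matrix fits in a 32-bit int. The helper below is a conservative heuristic under that assumption, not a rule documented by cuML, and `fits_int_index` is a hypothetical name.

```cpp
#include <cstdint>
#include <limits>

// Returns true when n_rows * n_cols elements can be counted with an int.
// Assumes both arguments are non-negative. Division is used instead of
// multiplication so the check itself cannot overflow.
bool fits_int_index(std::int64_t n_rows, std::int64_t n_cols) {
  const std::int64_t max_int = std::numeric_limits<int>::max();
  if (n_rows == 0 || n_cols == 0) return true;
  return n_rows <= max_int / n_cols;
}
```

For example, a 500 x 20 matrix passes the check, while a 100000 x 100000 matrix does not and would call for the int64_t overloads.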
Usage
Use this function to generate synthetic regression data on the GPU for testing linear models (OLS, Ridge, Lasso), benchmarking solvers, or prototyping regression pipelines. It provides a fast GPU-native alternative to scikit-learn's make_regression.
Code Reference
Source Location
- Repository: Rapidsai_Cuml
- File: cpp/include/cuml/datasets/make_regression.hpp
Signature
namespace ML {
namespace Datasets {
void make_regression(const raft::handle_t& handle,
float* out,
float* values,
int64_t n_rows,
int64_t n_cols,
int64_t n_informative,
float* coef = nullptr,
int64_t n_targets = 1LL,
float bias = 0.0f,
int64_t effective_rank = -1LL,
float tail_strength = 0.5f,
float noise = 0.0f,
bool shuffle = true,
uint64_t seed = 0ULL);
void make_regression(const raft::handle_t& handle,
double* out,
double* values,
int64_t n_rows,
int64_t n_cols,
int64_t n_informative,
double* coef = nullptr,
int64_t n_targets = 1LL,
double bias = 0.0,
int64_t effective_rank = -1LL,
double tail_strength = 0.5,
double noise = 0.0,
bool shuffle = true,
uint64_t seed = 0ULL);
void make_regression(const raft::handle_t& handle,
float* out,
float* values,
int n_rows,
int n_cols,
int n_informative,
float* coef = nullptr,
int n_targets = 1,
float bias = 0.0f,
int effective_rank = -1,
float tail_strength = 0.5f,
float noise = 0.0f,
bool shuffle = true,
uint64_t seed = 0ULL);
void make_regression(const raft::handle_t& handle,
double* out,
double* values,
int n_rows,
int n_cols,
int n_informative,
double* coef = nullptr,
int n_targets = 1,
double bias = 0.0,
int effective_rank = -1,
double tail_strength = 0.5,
double noise = 0.0,
bool shuffle = true,
uint64_t seed = 0ULL);
} // namespace Datasets
} // namespace ML
Import
#include <cuml/datasets/make_regression.hpp>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| handle | const raft::handle_t& | Yes | cuML handle for GPU resource management |
| n_rows | int64_t / int | Yes | Number of samples to generate |
| n_cols | int64_t / int | Yes | Number of features per sample |
| n_informative | int64_t / int | Yes | Number of informative features (non-zero coefficients) |
| coef | float*/double* | No (default nullptr) | Device pointer to store ground-truth coefficients [n_cols x n_targets]; nullptr to skip |
| n_targets | int64_t / int | No (default 1) | Number of target values per sample |
| bias | float/double | No (default 0.0) | Scalar bias added to the generated target values |
| effective_rank | int64_t / int | No (default -1) | Approximate rank of the feature matrix, used to introduce correlations between features; -1 generates well-conditioned data |
| tail_strength | float/double | No (default 0.5) | Relative importance of the fat noisy tail in singular values when effective_rank != -1 |
| noise | float/double | No (default 0.0) | Standard deviation of Gaussian noise added to targets |
| shuffle | bool | No (default true) | Whether to shuffle the samples and features |
| seed | uint64_t | No (default 0) | Seed for the random number generator |
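To make the interaction between effective_rank and tail_strength concrete, the sketch below computes the singular value profile that scikit-learn's make_low_rank_matrix uses. That cuML follows the same formula is an assumption based on the stated equivalence with scikit-learn and has not been checked against the cuML source; `singular_profile` is a hypothetical name.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// scikit-learn's make_low_rank_matrix profile; assumed (not verified) to match cuML.
// s[i] = (1 - tail_strength) * exp(-(i / effective_rank)^2)
//      + tail_strength * exp(-0.1 * i / effective_rank)
std::vector<double> singular_profile(std::size_t n, double effective_rank,
                                     double tail_strength) {
  std::vector<double> s(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double r = static_cast<double>(i) / effective_rank;
    const double low_rank = (1.0 - tail_strength) * std::exp(-r * r);  // fast-decaying bell
    const double tail = tail_strength * std::exp(-0.1 * r);            // slow-decaying fat tail
    s[i] = low_rank + tail;
  }
  return s;
}
```

Under this profile the first singular value is always 1, and a larger tail_strength shifts weight from the fast-decaying bell to the slowly decaying tail, so more directions beyond effective_rank keep noticeable variance.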
Outputs
| Name | Type | Description |
|---|---|---|
| out | float*/double* | Device pointer to the generated feature matrix [n_rows x n_cols], row-major |
| values | float*/double* | Device pointer to the generated target matrix [n_rows x n_targets], row-major |
| coef (optional) | float*/double* | Device pointer to the coefficients [n_cols x n_targets], row-major (if non-null) |
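Because all three buffers are row-major, the flat offset of any element follows from its row, its column, and the number of columns. A minimal helper, with the hypothetical name `row_major_index`, might look like this:

```cpp
#include <cstddef>

// Flat offset of element (row, col) in a row-major matrix with n_cols columns.
constexpr std::size_t row_major_index(std::size_t row, std::size_t col,
                                      std::size_t n_cols) {
  return row * n_cols + col;
}
```

Feature j of sample i then sits at out[row_major_index(i, j, n_cols)], target t of sample i at values[row_major_index(i, t, n_targets)], and the coefficient linking feature j to target t at coef[row_major_index(j, t, n_targets)]. The buffers live in device memory, so these offsets apply inside kernels or after copying the data to the host.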
Usage Examples
#include <cuml/datasets/make_regression.hpp>
#include <raft/core/handle.hpp>
#include <cuda_runtime.h>  // cudaMalloc, cudaFree

void generate_regression_data() {
    raft::handle_t handle;

    int64_t n_rows = 500;
    int64_t n_cols = 20;
    int64_t n_informative = 10;

    // Allocate device memory (sizes assume n_targets == 1)
    float* X;
    float* y;
    float* coef;
    cudaMalloc(&X, n_rows * n_cols * sizeof(float));  // features [n_rows x n_cols]
    cudaMalloc(&y, n_rows * sizeof(float));           // targets [n_rows x 1]
    cudaMalloc(&coef, n_cols * sizeof(float));        // coefficients [n_cols x 1]

    // Generate a regression dataset with 10 informative features
    ML::Datasets::make_regression(handle, X, y,
                                  n_rows, n_cols, n_informative,
                                  coef,    // store ground-truth coefficients
                                  1LL,     // n_targets
                                  0.0f,    // bias
                                  -1LL,    // effective_rank (well-conditioned)
                                  0.5f,    // tail_strength
                                  0.1f,    // noise std-dev
                                  true,    // shuffle
                                  42ULL);  // seed

    // Wait for generation to finish on the handle's stream
    handle.sync_stream();

    // Use X, y, coef for regression model training/testing...

    cudaFree(X);
    cudaFree(y);
    cudaFree(coef);
}