Implementation:Rapidsai Cuml Make Regression

Knowledge Sources	Rapidsai_Cuml
Domains	Machine_Learning, Synthetic_Data_Generation
Last Updated	2026-02-08 12:00 GMT

Overview

Generates synthetic regression datasets on the GPU. It is the cuML C++ counterpart of scikit-learn's sklearn.datasets.make_regression.

Description

The ML::Datasets::make_regression function creates a synthetic regression problem with configurable numbers of samples, features, informative features, and targets. It outputs a row-major feature matrix and corresponding target values, and optionally writes the ground-truth coefficients used to generate the data. Additional controls include a bias term, effective rank for introducing correlations, tail strength for singular value profiles, Gaussian noise, shuffling, and a random seed for reproducibility.
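
The linear model behind the description above can be sketched on the CPU. This is a minimal illustration, not cuML's implementation: the names RegressionData and make_regression_cpu are hypothetical, the coefficient range is arbitrary, and shuffling and effective_rank are left out. It only shows how targets relate to features, informative coefficients, bias, and noise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical host-side container; not part of cuML.
struct RegressionData {
  std::vector<float> X;     // [n_rows x n_cols], row-major
  std::vector<float> y;     // [n_rows x n_targets], row-major
  std::vector<float> coef;  // [n_cols x n_targets], row-major
};

// CPU sketch of the generative model: y = X * coef + bias + noise.
// Only the first n_informative rows of coef are non-zero (no shuffling here).
RegressionData make_regression_cpu(std::size_t n_rows, std::size_t n_cols,
                                   std::size_t n_informative, std::size_t n_targets,
                                   float bias, float noise, std::uint64_t seed)
{
  assert(n_informative <= n_cols);
  std::mt19937_64 rng(seed);
  std::normal_distribution<float> gauss(0.0f, 1.0f);
  std::uniform_real_distribution<float> unif(0.0f, 100.0f);  // arbitrary scale

  RegressionData d;
  d.X.resize(n_rows * n_cols);
  d.y.assign(n_rows * n_targets, bias);     // start every target at the bias
  d.coef.assign(n_cols * n_targets, 0.0f);  // uninformative features stay at zero

  for (auto& x : d.X) x = gauss(rng);
  for (std::size_t j = 0; j < n_informative; ++j)
    for (std::size_t t = 0; t < n_targets; ++t)
      d.coef[j * n_targets + t] = unif(rng);

  for (std::size_t i = 0; i < n_rows; ++i) {
    for (std::size_t t = 0; t < n_targets; ++t) {
      float acc = 0.0f;
      for (std::size_t j = 0; j < n_informative; ++j)
        acc += d.X[i * n_cols + j] * d.coef[j * n_targets + t];
      d.y[i * n_targets + t] += acc;
      if (noise > 0.0f) d.y[i * n_targets + t] += noise * gauss(rng);
    }
  }
  return d;
}
```

With noise set to 0, every target in this sketch is an exact linear function of the informative features, which is what makes the ground-truth coefficients useful for checking a fitted model.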

Four overloads are provided, one for each combination of value precision (float or double) and index type (int or int64_t).
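
How the compiler picks among the four overloads can be illustrated with stand-in functions. The index_overload stubs below are hypothetical and mimic only the pointer and index types of the real signatures. They suggest that keeping all size arguments in one integer type is the simplest way to get the intended overload.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins: same pointer/index type pattern as the four overloads,
// but they only report which candidate was selected.
inline std::string index_overload(float*, std::int64_t, std::int64_t) { return "float/int64_t"; }
inline std::string index_overload(double*, std::int64_t, std::int64_t) { return "double/int64_t"; }
inline std::string index_overload(float*, int, int) { return "float/int"; }
inline std::string index_overload(double*, int, int) { return "double/int"; }

// All-int sizes with a float buffer select the float/int candidate.
inline std::string pick_with_int()
{
  float* out = nullptr;
  int n_rows = 500, n_cols = 20;
  return index_overload(out, n_rows, n_cols);
}

// All-int64_t sizes with a double buffer select the double/int64_t candidate.
inline std::string pick_with_int64()
{
  double* out = nullptr;
  std::int64_t n_rows = 500, n_cols = 20;
  return index_overload(out, n_rows, n_cols);
}

// Mixing index types, e.g. index_overload(out, std::int64_t{500}, 20), does not
// compile: each candidate needs one integral conversion, so the call is ambiguous.
```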

Usage

Use this function to generate synthetic regression data on the GPU for testing linear models (OLS, Ridge, Lasso), benchmarking solvers, or prototyping regression pipelines. It provides a fast GPU-native alternative to scikit-learn's make_regression.

Code Reference

Source Location

Repository: Rapidsai_Cuml
File: cpp/include/cuml/datasets/make_regression.hpp

Signature

namespace ML {
namespace Datasets {

void make_regression(const raft::handle_t& handle,
                     float* out,
                     float* values,
                     int64_t n_rows,
                     int64_t n_cols,
                     int64_t n_informative,
                     float* coef            = nullptr,
                     int64_t n_targets      = 1LL,
                     float bias             = 0.0f,
                     int64_t effective_rank = -1LL,
                     float tail_strength    = 0.5f,
                     float noise            = 0.0f,
                     bool shuffle           = true,
                     uint64_t seed          = 0ULL);

void make_regression(const raft::handle_t& handle,
                     double* out,
                     double* values,
                     int64_t n_rows,
                     int64_t n_cols,
                     int64_t n_informative,
                     double* coef           = nullptr,
                     int64_t n_targets      = 1LL,
                     double bias            = 0.0,
                     int64_t effective_rank = -1LL,
                     double tail_strength   = 0.5,
                     double noise           = 0.0,
                     bool shuffle           = true,
                     uint64_t seed          = 0ULL);

void make_regression(const raft::handle_t& handle,
                     float* out,
                     float* values,
                     int n_rows,
                     int n_cols,
                     int n_informative,
                     float* coef         = nullptr,
                     int n_targets       = 1,
                     float bias          = 0.0f,
                     int effective_rank  = -1,
                     float tail_strength = 0.5f,
                     float noise         = 0.0f,
                     bool shuffle        = true,
                     uint64_t seed       = 0ULL);

void make_regression(const raft::handle_t& handle,
                     double* out,
                     double* values,
                     int n_rows,
                     int n_cols,
                     int n_informative,
                     double* coef         = nullptr,
                     int n_targets        = 1,
                     double bias          = 0.0,
                     int effective_rank   = -1,
                     double tail_strength = 0.5,
                     double noise         = 0.0,
                     bool shuffle         = true,
                     uint64_t seed        = 0ULL);

}  // namespace Datasets
}  // namespace ML

Import

#include <cuml/datasets/make_regression.hpp>

I/O Contract

Inputs

Name	Type	Required	Description
handle	const raft::handle_t&	Yes	RAFT handle that manages the GPU resources (such as the CUDA stream) used by the call
n_rows	int64_t / int	Yes	Number of samples to generate
n_cols	int64_t / int	Yes	Number of features per sample
n_informative	int64_t / int	Yes	Number of informative features (non-zero coefficients)
coef	float* / double*	No (default nullptr)	Device pointer that receives the ground-truth coefficients [n_cols x n_targets]; pass nullptr to skip
n_targets	int64_t / int	No (default 1)	Number of target values per sample
bias	float/double	No (default 0.0)	Scalar bias added to the generated target values
effective_rank	int64_t / int	No (default -1)	Approximate rank of the feature matrix, used to introduce correlations between features; -1 generates well-conditioned data
tail_strength	float/double	No (default 0.5)	Relative importance of the fat noisy tail of the singular value profile; used only when effective_rank != -1
noise	float/double	No (default 0.0)	Standard deviation of Gaussian noise added to targets
shuffle	bool	No (default true)	Whether to shuffle the samples and features
seed	uint64_t	No (default 0)	Seed for the random number generator
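
To make effective_rank and tail_strength more concrete, the sketch below computes the singular value profile that scikit-learn's make_low_rank_matrix uses: a fast-decaying bell-shaped part plus a slowly decaying exponential tail. Whether cuML applies exactly the same formula is an assumption here, and singular_profile is a hypothetical helper name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Singular value profile in the style of scikit-learn's make_low_rank_matrix:
//   s_i = (1 - tail_strength) * exp(-(i / effective_rank)^2)
//       + tail_strength * exp(-0.1 * i / effective_rank)
// Assumption: the GPU implementation follows the same shape.
std::vector<double> singular_profile(std::size_t n, double effective_rank, double tail_strength)
{
  assert(effective_rank > 0.0);
  assert(tail_strength >= 0.0 && tail_strength <= 1.0);
  std::vector<double> s(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double r        = static_cast<double>(i) / effective_rank;
    const double low_rank = (1.0 - tail_strength) * std::exp(-r * r);  // dominant directions
    const double tail     = tail_strength * std::exp(-0.1 * r);        // fat noisy tail
    s[i]                  = low_rank + tail;
  }
  return s;
}
```

A larger tail_strength keeps the later singular values from vanishing, so the generated features are less sharply confined to an effective_rank-dimensional subspace.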

Outputs

Name	Type	Description
out	float* / double*	Device pointer to the generated feature matrix [n_rows x n_cols], row-major
values	float* / double*	Device pointer to the generated target matrix [n_rows x n_targets], row-major
coef (optional)	float* / double*	Device pointer to the ground-truth coefficients [n_cols x n_targets], row-major; written only if non-null
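
The shapes in the table translate directly into allocation sizes and element offsets. The helpers below are a small sketch under that reading; BufferSizes, regression_buffer_sizes, and row_major_index are hypothetical names, not part of cuML.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Element counts (not bytes) for the three device buffers.
struct BufferSizes {
  std::size_t out;     // n_rows * n_cols
  std::size_t values;  // n_rows * n_targets
  std::size_t coef;    // n_cols * n_targets
};

inline BufferSizes regression_buffer_sizes(std::int64_t n_rows, std::int64_t n_cols,
                                           std::int64_t n_targets)
{
  assert(n_rows > 0 && n_cols > 0 && n_targets > 0);
  const auto r = static_cast<std::size_t>(n_rows);
  const auto c = static_cast<std::size_t>(n_cols);
  const auto t = static_cast<std::size_t>(n_targets);
  return BufferSizes{r * c, r * t, c * t};
}

// Offset of element (row, col) in a row-major matrix with row_length columns.
inline std::size_t row_major_index(std::size_t row, std::size_t col, std::size_t row_length)
{
  return row * row_length + col;
}
```

Multiply each count by sizeof(float) or sizeof(double) to get the byte size to allocate.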

Usage Examples

#include <cuml/datasets/make_regression.hpp>
#include <raft/core/handle.hpp>
#include <cuda_runtime.h>  // cudaMalloc, cudaFree
#include <cstdint>         // int64_t

void generate_regression_data() {
    raft::handle_t handle;

    int64_t n_rows = 500;
    int64_t n_cols = 20;
    int64_t n_informative = 10;

    // Allocate device memory; with n_targets = 1, y holds n_rows values and coef holds n_cols values
    float* X;
    float* y;
    float* coef;
    cudaMalloc(&X, n_rows * n_cols * sizeof(float));
    cudaMalloc(&y, n_rows * sizeof(float));
    cudaMalloc(&coef, n_cols * sizeof(float));

    // Generate a regression dataset with 10 informative features
    ML::Datasets::make_regression(handle, X, y,
                                  n_rows, n_cols, n_informative,
                                  coef,        // store ground-truth coefficients
                                  1LL,         // n_targets
                                  0.0f,        // bias
                                  -1LL,        // effective_rank (well-conditioned)
                                  0.5f,        // tail_strength
                                  0.1f,        // noise std-dev
                                  true,        // shuffle
                                  42ULL);      // seed

    handle.sync_stream();

    // Use X, y, coef for regression model training/testing...

    cudaFree(X);
    cudaFree(y);
    cudaFree(coef);
}
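
After copying X, y, and coef back to the host (for example with cudaMemcpy), a quick sanity check is to recompute the targets from the coefficients. The helper below is a host-only sketch with a hypothetical name, max_abs_residual. It assumes the returned coefficients are aligned with the columns of the feature matrix after shuffling, which this page does not state explicitly.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest |y - (X * coef + bias)| over all samples and targets.
// X: [n_rows x n_cols], y: [n_rows x n_targets], coef: [n_cols x n_targets], all row-major.
double max_abs_residual(const std::vector<float>& X, const std::vector<float>& y,
                        const std::vector<float>& coef, std::size_t n_rows,
                        std::size_t n_cols, std::size_t n_targets, float bias)
{
  double worst = 0.0;
  for (std::size_t i = 0; i < n_rows; ++i) {
    for (std::size_t t = 0; t < n_targets; ++t) {
      double pred = bias;
      for (std::size_t j = 0; j < n_cols; ++j)
        pred += static_cast<double>(X[i * n_cols + j]) * coef[j * n_targets + t];
      worst = std::max(worst, std::fabs(static_cast<double>(y[i * n_targets + t]) - pred));
    }
  }
  return worst;
}
```

With noise = 0 the residual should be near floating-point rounding error; with noise > 0, as in the example above, it should be on the order of a few noise standard deviations.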

Related Pages

Environment:Rapidsai_Cuml_CUDA_GPU
