
Implementation:Rapidsai Cuml Make Regression

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Synthetic_Data_Generation
Last Updated 2026-02-08 12:00 GMT

Overview

Generates synthetic regression datasets directly on the GPU; it is the cuML counterpart of scikit-learn's sklearn.datasets.make_regression.

Description

The ML::Datasets::make_regression function creates a synthetic regression problem with configurable numbers of samples, features, informative features, and targets. It outputs a row-major feature matrix and corresponding target values, and optionally writes the ground-truth coefficients used to generate the data. Additional controls include a bias term, effective rank for introducing correlations, tail strength for singular value profiles, Gaussian noise, shuffling, and a random seed for reproducibility.
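
The generative model described above can be sketched on the host for illustration. This is a minimal CPU sketch, not cuML's GPU implementation: the names HostRegression and make_regression_host are hypothetical, the coefficient scale (uniform in [0, 100)) is an assumption borrowed from scikit-learn's behaviour, and shuffling and the effective_rank path are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Host-side illustration only. The struct and function names are hypothetical
// and this is not cuML's GPU implementation.
struct HostRegression {
    std::vector<float> X;     // [n_rows x n_cols], row-major
    std::vector<float> y;     // [n_rows x n_targets], row-major
    std::vector<float> coef;  // [n_cols x n_targets], row-major
};

HostRegression make_regression_host(std::int64_t n_rows, std::int64_t n_cols,
                                    std::int64_t n_informative, std::int64_t n_targets,
                                    float bias, float noise, std::uint64_t seed)
{
    assert(n_rows > 0 && n_cols > 0 && n_targets > 0);
    assert(n_informative >= 0 && n_informative <= n_cols);

    std::mt19937_64 rng(seed);
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    std::uniform_real_distribution<float> weight(0.0f, 100.0f);  // assumed scale

    HostRegression d;
    d.X.resize(static_cast<std::size_t>(n_rows * n_cols));
    d.y.resize(static_cast<std::size_t>(n_rows * n_targets));
    d.coef.assign(static_cast<std::size_t>(n_cols * n_targets), 0.0f);

    // Features: independent standard-normal draws (the effective_rank == -1 case).
    for (float& v : d.X) v = gauss(rng);

    // Only the first n_informative features get non-zero coefficients.
    for (std::int64_t c = 0; c < n_informative; ++c)
        for (std::int64_t t = 0; t < n_targets; ++t)
            d.coef[c * n_targets + t] = weight(rng);

    // Targets: y = X * coef + bias, plus Gaussian noise with std-dev `noise`.
    for (std::int64_t r = 0; r < n_rows; ++r) {
        for (std::int64_t t = 0; t < n_targets; ++t) {
            float acc = bias;
            for (std::int64_t c = 0; c < n_informative; ++c)
                acc += d.X[r * n_cols + c] * d.coef[c * n_targets + t];
            if (noise > 0.0f) acc += noise * gauss(rng);
            d.y[r * n_targets + t] = acc;
        }
    }
    return d;  // shuffling of samples and features is omitted here
}
```

With noise set to 0, every target is an exact linear function of the informative features, which makes the returned coefficients a convenient ground truth when checking a fitted model.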

Four overloads are provided, covering each combination of single or double precision with int or int64_t index types, so callers can match the precision and index width of the buffers they already have.
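
How the compiler chooses among the four overloads can be illustrated with stand-in functions. The sketch below is hypothetical: the functions in overload_sketch copy only the pointer and index parameter types from the signatures and return a label instead of generating data.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins: they copy only the pointer and index parameter types
// of the four overloads and return a label instead of generating data.
namespace overload_sketch {

inline std::string make_regression(float*, float*, std::int64_t, std::int64_t, std::int64_t)
{
    return "float, int64_t";
}
inline std::string make_regression(double*, double*, std::int64_t, std::int64_t, std::int64_t)
{
    return "double, int64_t";
}
inline std::string make_regression(float*, float*, int, int, int)
{
    return "float, int";
}
inline std::string make_regression(double*, double*, int, int, int)
{
    return "double, int";
}

// All index arguments are int64_t, so the int64_t overload is an exact match.
inline std::string with_int64_float()
{
    float* out    = nullptr;
    float* values = nullptr;
    std::int64_t n_rows = 500, n_cols = 20, n_informative = 10;
    return make_regression(out, values, n_rows, n_cols, n_informative);
}

// All index arguments are int, so the int overload is an exact match.
inline std::string with_int_double()
{
    double* out    = nullptr;
    double* values = nullptr;
    int n_rows = 500, n_cols = 20, n_informative = 10;
    return make_regression(out, values, n_rows, n_cols, n_informative);
}

// Mixing widths, e.g. make_regression(out, values, n_rows_i64, 20, 10), is
// typically rejected as ambiguous: each candidate needs a conversion somewhere.

}  // namespace overload_sketch
```

The practical takeaway is to keep n_rows, n_cols, and n_informative in a single index type so that exactly one overload matches.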

Usage

Use this function to generate synthetic regression data on the GPU for testing linear models (OLS, Ridge, Lasso), benchmarking solvers, or prototyping regression pipelines. Because the features and targets are written straight to device memory, it avoids generating data with scikit-learn's make_regression on the host and then copying it to the GPU.

Code Reference

Source Location

  • Repository: Rapidsai_Cuml
  • File: cpp/include/cuml/datasets/make_regression.hpp

Signature

namespace ML {
namespace Datasets {

void make_regression(const raft::handle_t& handle,
                     float* out,
                     float* values,
                     int64_t n_rows,
                     int64_t n_cols,
                     int64_t n_informative,
                     float* coef            = nullptr,
                     int64_t n_targets      = 1LL,
                     float bias             = 0.0f,
                     int64_t effective_rank = -1LL,
                     float tail_strength    = 0.5f,
                     float noise            = 0.0f,
                     bool shuffle           = true,
                     uint64_t seed          = 0ULL);

void make_regression(const raft::handle_t& handle,
                     double* out,
                     double* values,
                     int64_t n_rows,
                     int64_t n_cols,
                     int64_t n_informative,
                     double* coef           = nullptr,
                     int64_t n_targets      = 1LL,
                     double bias            = 0.0,
                     int64_t effective_rank = -1LL,
                     double tail_strength   = 0.5,
                     double noise           = 0.0,
                     bool shuffle           = true,
                     uint64_t seed          = 0ULL);

void make_regression(const raft::handle_t& handle,
                     float* out,
                     float* values,
                     int n_rows,
                     int n_cols,
                     int n_informative,
                     float* coef         = nullptr,
                     int n_targets       = 1,
                     float bias          = 0.0f,
                     int effective_rank  = -1,
                     float tail_strength = 0.5f,
                     float noise         = 0.0f,
                     bool shuffle        = true,
                     uint64_t seed       = 0ULL);

void make_regression(const raft::handle_t& handle,
                     double* out,
                     double* values,
                     int n_rows,
                     int n_cols,
                     int n_informative,
                     double* coef         = nullptr,
                     int n_targets        = 1,
                     double bias          = 0.0,
                     int effective_rank   = -1,
                     double tail_strength = 0.5,
                     double noise         = 0.0,
                     bool shuffle         = true,
                     uint64_t seed        = 0ULL);

}  // namespace Datasets
}  // namespace ML

Import

#include <cuml/datasets/make_regression.hpp>

I/O Contract

Inputs

Name | Type | Required | Description
handle | const raft::handle_t& | Yes | RAFT handle that manages the GPU resources used by the call
n_rows | int64_t / int | Yes | Number of samples to generate
n_cols | int64_t / int | Yes | Number of features per sample
n_informative | int64_t / int | Yes | Number of informative features (those with non-zero coefficients)
coef | float* / double* | No (default nullptr) | Device pointer that receives the ground-truth coefficients [n_cols x n_targets]; pass nullptr to skip
n_targets | int64_t / int | No (default 1) | Number of target values per sample
bias | float / double | No (default 0.0) | Scalar bias added to the generated target values
effective_rank | int64_t / int | No (default -1) | Approximate rank of the feature matrix, used to introduce correlations between features; -1 generates well-conditioned data
tail_strength | float / double | No (default 0.5) | Relative importance of the fat noisy tail of the singular value profile; used only when effective_rank != -1
noise | float / double | No (default 0.0) | Standard deviation of the Gaussian noise added to the targets
shuffle | bool | No (default true) | Whether to shuffle the samples and the features
seed | uint64_t | No (default 0) | Seed for the random number generator
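
To make effective_rank and tail_strength more concrete, the sketch below computes the singular value profile that scikit-learn documents for its low-rank generator (make_low_rank_matrix): a bell-shaped low-rank part plus an exponentially decaying tail. That cuML uses exactly this profile is an assumption; the function name singular_value_profile is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper. Profile taken from scikit-learn's make_low_rank_matrix:
//   s_i = (1 - tail_strength) * exp(-(i / effective_rank)^2)
//       + tail_strength * exp(-0.1 * i / effective_rank)
// Whether cuML matches it term for term is an assumption.
std::vector<double> singular_value_profile(std::int64_t n,
                                           std::int64_t effective_rank,
                                           double tail_strength)
{
    assert(n > 0 && effective_rank > 0);
    assert(tail_strength >= 0.0 && tail_strength <= 1.0);

    std::vector<double> s(static_cast<std::size_t>(n));
    for (std::int64_t i = 0; i < n; ++i) {
        const double x        = static_cast<double>(i) / static_cast<double>(effective_rank);
        const double low_rank = (1.0 - tail_strength) * std::exp(-x * x);
        const double tail     = tail_strength * std::exp(-0.1 * x);
        s[static_cast<std::size_t>(i)] = low_rank + tail;
    }
    return s;
}
```

A tail_strength near 0 makes the singular values fall off sharply beyond effective_rank, while a value near 1 keeps a long, slowly decaying tail, so more directions in feature space carry noticeable variance.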

Outputs

Name | Type | Description
out | float* / double* | Device pointer to the generated feature matrix [n_rows x n_cols], row-major
values | float* / double* | Device pointer to the generated target matrix [n_rows x n_targets], row-major
coef (optional) | float* / double* | Device pointer to the coefficients [n_cols x n_targets], row-major (written only if non-null)
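
The shapes listed above determine how many elements each device buffer must hold. The following is a small host-side sketch of that arithmetic and of row-major indexing; BufferSizes, regression_buffer_sizes, and row_major_index are hypothetical helper names, not part of cuML.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for sizing the buffers passed to make_regression.
struct BufferSizes {
    std::size_t out_elems;     // n_rows * n_cols
    std::size_t values_elems;  // n_rows * n_targets
    std::size_t coef_elems;    // n_cols * n_targets
};

inline BufferSizes regression_buffer_sizes(std::int64_t n_rows,
                                           std::int64_t n_cols,
                                           std::int64_t n_targets)
{
    assert(n_rows > 0 && n_cols > 0 && n_targets > 0);
    return BufferSizes{static_cast<std::size_t>(n_rows * n_cols),
                       static_cast<std::size_t>(n_rows * n_targets),
                       static_cast<std::size_t>(n_cols * n_targets)};
}

// Offset of element (row, col) in a row-major matrix with n_columns columns.
inline std::size_t row_major_index(std::int64_t row, std::int64_t col, std::int64_t n_columns)
{
    assert(row >= 0 && col >= 0 && col < n_columns);
    return static_cast<std::size_t>(row * n_columns + col);
}
```

Multiply each element count by sizeof(float) or sizeof(double) to get the byte size for the allocation. Note that values and coef both grow with n_targets, so a buffer sized for one target is too small once n_targets exceeds 1.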

Usage Examples

#include <cuml/datasets/make_regression.hpp>
#include <raft/core/handle.hpp>

#include <cuda_runtime.h>  // cudaMalloc, cudaFree

#include <cstdint>  // int64_t

void generate_regression_data() {
    raft::handle_t handle;

    int64_t n_rows = 500;
    int64_t n_cols = 20;
    int64_t n_informative = 10;

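    // Allocate device memory. The sizes of y and coef assume n_targets == 1;
    // CUDA error checking is omitted for brevity.
    float* X    = nullptr;
    float* y    = nullptr;
    float* coef = nullptr;
    cudaMalloc(&X, n_rows * n_cols * sizeof(float));  // [n_rows x n_cols]
    cudaMalloc(&y, n_rows * sizeof(float));           // [n_rows x 1]
    cudaMalloc(&coef, n_cols * sizeof(float));        // [n_cols x 1]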

    // Generate a regression dataset with 10 informative features
    ML::Datasets::make_regression(handle, X, y,
                                  n_rows, n_cols, n_informative,
                                  coef,        // store ground-truth coefficients
                                  1LL,         // n_targets
                                  0.0f,        // bias
                                  -1LL,        // effective_rank (well-conditioned)
                                  0.5f,        // tail_strength
                                  0.1f,        // noise std-dev
                                  true,        // shuffle
                                  42ULL);      // seed

    handle.sync_stream();

    // Use X, y, coef for regression model training/testing...

    cudaFree(X);
    cudaFree(y);
    cudaFree(coef);
}
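
One way to sanity-check the generated data is to copy X, y, and coef back to the host (for example with cudaMemcpy) and confirm that the residual y - (X * coef + bias) is on the order of the requested noise. The helper below is a hypothetical host-side sketch; it assumes the returned coefficients are in the same feature order as the columns of the feature matrix.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side check, not part of cuML. All matrices are row-major:
// X is [n_rows x n_cols], y is [n_rows x n_targets], coef is [n_cols x n_targets].
double residual_rms(const std::vector<float>& X,
                    const std::vector<float>& y,
                    const std::vector<float>& coef,
                    std::int64_t n_rows, std::int64_t n_cols, std::int64_t n_targets,
                    float bias)
{
    assert(n_rows > 0 && n_cols > 0 && n_targets > 0);
    assert(X.size() == static_cast<std::size_t>(n_rows * n_cols));
    assert(y.size() == static_cast<std::size_t>(n_rows * n_targets));
    assert(coef.size() == static_cast<std::size_t>(n_cols * n_targets));

    double sum_sq = 0.0;
    for (std::int64_t r = 0; r < n_rows; ++r) {
        for (std::int64_t t = 0; t < n_targets; ++t) {
            // Prediction from the ground-truth linear model.
            double pred = bias;
            for (std::int64_t c = 0; c < n_cols; ++c)
                pred += static_cast<double>(X[r * n_cols + c]) * coef[c * n_targets + t];
            const double err = static_cast<double>(y[r * n_targets + t]) - pred;
            sum_sq += err * err;
        }
    }
    // Root-mean-square residual over all targets.
    return std::sqrt(sum_sq / static_cast<double>(n_rows * n_targets));
}
```

For the example above, which requests a noise standard deviation of 0.1 and a bias of 0, a result close to 0.1 would be consistent with the documented model; a much larger value would suggest a layout or ordering mismatch in how the buffers were read.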
