Implementation:Rapidsai Cuml Diabetes Dataset

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, Datasets
Last Updated: 2026-02-08 12:00 GMT

Overview

A hardcoded C++ header containing the Diabetes regression dataset (442 samples, 10 features) as static constant vectors for use in cuML unit tests and benchmarks.

Description

diabetes.h provides the diabetes dataset (originally from Efron et al., 2004) embedded directly in the source as constant data, so no data files need to be read at run time. Because the values live in std::vector<float> objects, the vectors are constructed at program startup rather than at compile time. The dataset contains 442 observations of 10 baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements), each of which has been mean-centered and scaled to unit variance. The target is a quantitative measure of disease progression one year after baseline.
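
The centering and scaling described above can be checked on the host before a test relies on it. The following is a minimal sketch, not part of diabetes.h: ColumnStats and column_stats are hypothetical names, and the code makes no assumption about which scaling convention the embedded values follow. It only reports the column mean and the sum of squared deviations so that a test can compare them with whatever it expects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not part of diabetes.h).
struct ColumnStats {
  double mean;        // arithmetic mean of the column
  double sum_sq_dev;  // sum of squared deviations from the mean
};

// Computes statistics for one column of a row-major matrix stored in a
// flat vector. The population variance is sum_sq_dev / n_rows.
inline ColumnStats column_stats(const std::vector<float>& data,
                                std::size_t n_rows,
                                std::size_t n_cols,
                                std::size_t col)
{
  assert(n_rows > 0 && col < n_cols);
  assert(data.size() == n_rows * n_cols);

  double sum = 0.0;
  for (std::size_t i = 0; i < n_rows; ++i) {
    sum += data[i * n_cols + col];
  }
  const double mean = sum / static_cast<double>(n_rows);

  double sum_sq_dev = 0.0;
  for (std::size_t i = 0; i < n_rows; ++i) {
    const double d = data[i * n_cols + col] - mean;
    sum_sq_dev += d * d;
  }
  return ColumnStats{mean, sum_sq_dev};
}
```

With the real header, a call such as column_stats(diabetes, n_samples, n_features, 0) would report the statistics of the first feature; a mean close to zero indicates the column is centered.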

The data is stored in two std::vector<float> constants within the MLCommon::Datasets::Diabetes namespace:

  • diabetes -- A flattened vector of shape 442 x 10 containing the feature matrix in row-major order. Values are already standardized (zero mean, unit variance).
  • diabetes_y -- A vector of 442 continuous target values representing disease progression.

Two additional constants provide the dataset dimensions:

  • n_samples = 442
  • n_features = 10
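
Since the feature matrix is flattened in row-major order, element (i, j) sits at offset i * n_features + j. The sketch below illustrates that indexing with two hypothetical helpers (at_row_major and extract_column are illustrative names, not functions provided by diabetes.h).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns element (row, col) of a row-major matrix
// stored in a flat vector with n_cols columns.
inline float at_row_major(const std::vector<float>& data,
                          std::size_t n_cols,
                          std::size_t row,
                          std::size_t col)
{
  assert(col < n_cols);
  assert(row * n_cols + col < data.size());
  return data[row * n_cols + col];
}

// Hypothetical helper: copies one feature column out of the flat buffer.
inline std::vector<float> extract_column(const std::vector<float>& data,
                                         std::size_t n_rows,
                                         std::size_t n_cols,
                                         std::size_t col)
{
  std::vector<float> out(n_rows);
  for (std::size_t i = 0; i < n_rows; ++i) {
    out[i] = at_row_major(data, n_cols, i, col);
  }
  return out;
}
```

For example, at_row_major(diabetes, n_features, 5, 2) would read the third feature of the sixth sample. Algorithms that expect column-major input need a transpose of the buffer rather than a plain copy.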

Usage

Use this dataset for unit testing regression algorithms (e.g., ridge regression, LASSO, elastic net) and for benchmarking where a pre-standardized regression dataset is needed without file dependencies.
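
A regression unit test typically compares a solver's output against an independently computed reference. As a minimal illustration, the sketch below computes the closed-form ridge coefficient for a single feature on the host. The function name ridge_1d_reference is hypothetical and not part of cuML, and the penalty convention (alpha added directly to the sum of squares) is an assumption that should be matched to the solver under test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not part of cuML): ridge coefficient for
// one feature. x and y are centered inside the function, so the intercept is
// handled implicitly and is not penalized.
//   w = sum(xc * yc) / (sum(xc * xc) + alpha)
inline double ridge_1d_reference(const std::vector<float>& x,
                                 const std::vector<float>& y,
                                 double alpha)
{
  assert(!x.empty() && x.size() == y.size());
  const std::size_t n = x.size();

  double mean_x = 0.0;
  double mean_y = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    mean_x += x[i];
    mean_y += y[i];
  }
  mean_x /= static_cast<double>(n);
  mean_y /= static_cast<double>(n);

  double sxy = 0.0;
  double sxx = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    const double dx = x[i] - mean_x;
    const double dy = y[i] - mean_y;
    sxy += dx * dy;
    sxx += dx * dx;
  }
  assert(sxx + alpha > 0.0);  // guards against a constant column with alpha == 0
  return sxy / (sxx + alpha);
}
```

A test could extract one feature column from diabetes, fit the GPU solver on that column alone, and compare its coefficient with this value within a floating-point tolerance.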

Code Reference

Source Location

  • Repository: Rapidsai_Cuml
  • File: cpp/src_prims/datasets/diabetes.h

Signature

namespace MLCommon {
namespace Datasets {
namespace Diabetes {

const std::vector<float> diabetes = { /* 442 * 10 = 4420 float values */ };
const std::vector<float> diabetes_y = { /* 442 float values */ };

static const int n_samples  = 442;
static const int n_features = 10;

} // namespace Diabetes
} // namespace Datasets
} // namespace MLCommon

Import

#include <datasets/diabetes.h>

I/O Contract

Inputs

Name | Type | Required | Description
(none) | -- | -- | This is a static data header with no runtime inputs.

Outputs

Name | Type | Description
diabetes | const std::vector<float> | Flattened row-major feature matrix of shape (442, 10), pre-standardized
diabetes_y | const std::vector<float> | Continuous target vector of length 442
n_samples | static const int | Number of samples (442)
n_features | static const int | Number of features (10)

Usage Examples

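#include <datasets/diabetes.h>
#include <raft/util/cudart_utils.hpp>  // raft::update_device; header path depends on the RAFT version
#include <rmm/device_uvector.hpp>

// Assumption: `stream` is a valid CUDA stream owned by the caller (e.g., a test fixture).

// Access the dataset
const auto& X = MLCommon::Datasets::Diabetes::diabetes;
const auto& y = MLCommon::Datasets::Diabetes::diabetes_y;
const int n = MLCommon::Datasets::Diabetes::n_samples;   // 442
const int p = MLCommon::Datasets::Diabetes::n_features;  // 10

// Copy to device memory for regression (X stays in row-major order)
rmm::device_uvector<float> d_X(n * p, stream);
rmm::device_uvector<float> d_y(n, stream);
raft::update_device(d_X.data(), X.data(), n * p, stream);
raft::update_device(d_y.data(), y.data(), n, stream);
// The copies are enqueued on `stream`; work on other streams must synchronize with it first.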
