Implementation:Rapidsai Cuml Boston Dataset

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, Datasets
Last Updated: 2026-02-08 12:00 GMT

Overview

A C++ header that hardcodes the Boston Housing dataset (506 samples, 13 features) as constant vectors for use in cuML unit tests and benchmarks.

Description

boston.h embeds the classic Boston Housing dataset directly in the source as constant data, so tests and benchmarks need no file I/O or network access. Because the data lives in std::vector<float> objects, the vectors are populated during static initialization at program startup rather than at compile time. The dataset contains 506 observations of 13 housing-related features (crime rate, average number of rooms, property tax rate, etc.) and a continuous target variable (median home value in $1000s).

The data is stored in two std::vector<float> constants within the MLCommon::Datasets::Boston namespace:

  • boston -- A flattened vector of 506 x 13 = 6578 values holding the feature matrix in row-major order.
  • boston_y -- A vector of 506 target values (median home values).

Two additional constants provide the dataset dimensions:

  • n_samples = 506
  • n_features = 13
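
The row-major layout described above means that feature j of sample i sits at offset i * n_features + j in the flattened vector. The following is a minimal sketch of that indexing, not code from boston.h: at_row_major and extract_column are hypothetical helpers introduced here for illustration, and they take a generic vector so the snippet compiles without the cuML headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of boston.h): read element (row, col) from a
// flattened row-major matrix that has n_cols columns.
inline float at_row_major(const std::vector<float>& data,
                          std::size_t n_cols,
                          std::size_t row,
                          std::size_t col)
{
  assert(col < n_cols);
  assert(row * n_cols + col < data.size());
  return data[row * n_cols + col];
}

// Hypothetical helper: copy a single feature column out of the flattened
// row-major matrix into its own contiguous vector.
inline std::vector<float> extract_column(const std::vector<float>& data,
                                         std::size_t n_rows,
                                         std::size_t n_cols,
                                         std::size_t col)
{
  assert(data.size() == n_rows * n_cols);
  assert(col < n_cols);
  std::vector<float> out(n_rows);
  for (std::size_t i = 0; i < n_rows; ++i) {
    out[i] = data[i * n_cols + col];
  }
  return out;
}
```

With the real header included, a call such as at_row_major(MLCommon::Datasets::Boston::boston, MLCommon::Datasets::Boston::n_features, i, j) would read feature j of sample i under the same layout assumption.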

Usage

Use this dataset for unit testing regression algorithms (e.g., linear regression, ridge regression, decision tree regressors) and for quick benchmarking where a well-known dataset is needed without file dependencies.
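
One way a regression unit test might use the target vector is to compare a model's error against a trivial baseline. The sketch below is illustrative only: mean_squared_error and mean_baseline_mse are hypothetical host-side helpers, not part of cuML or boston.h, and any threshold used in a real test is a choice for the test author.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: mean squared error between targets and predictions.
inline double mean_squared_error(const std::vector<float>& y_true,
                                 const std::vector<float>& y_pred)
{
  assert(y_true.size() == y_pred.size());
  assert(!y_true.empty());
  double sum = 0.0;
  for (std::size_t i = 0; i < y_true.size(); ++i) {
    const double d = static_cast<double>(y_true[i]) - static_cast<double>(y_pred[i]);
    sum += d * d;
  }
  return sum / static_cast<double>(y_true.size());
}

// Hypothetical baseline: the MSE obtained by always predicting the mean target.
inline double mean_baseline_mse(const std::vector<float>& y_true)
{
  assert(!y_true.empty());
  double mean = 0.0;
  for (float v : y_true) {
    mean += static_cast<double>(v);
  }
  mean /= static_cast<double>(y_true.size());
  const std::vector<float> y_pred(y_true.size(), static_cast<float>(mean));
  return mean_squared_error(y_true, y_pred);
}
```

A test could then copy a regressor's predictions back to the host and assert that mean_squared_error(boston_y, predictions) is lower than mean_baseline_mse(boston_y), which checks that the model learned something beyond the mean.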

Code Reference

Source Location

Signature

namespace MLCommon {
namespace Datasets {
namespace Boston {

const std::vector<float> boston = { /* 506 * 13 = 6578 float values */ };
const std::vector<float> boston_y = { /* 506 float values */ };

static const int n_samples  = 506;
static const int n_features = 13;

} // namespace Boston
} // namespace Datasets
} // namespace MLCommon

Import

#include <datasets/boston.h>

I/O Contract

Inputs

Name | Type | Required | Description
(none) | -- | -- | This is a static data header with no runtime inputs.

Outputs

Name | Type | Description
boston | const std::vector<float> | Flattened feature matrix of shape (506, 13)
boston_y | const std::vector<float> | Target vector of length 506
n_samples | int | Number of samples (506)
n_features | int | Number of features (13)

Usage Examples

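#include <datasets/boston.h>

#include <raft/util/cudart_utils.hpp>  // raft::update_device; header path may differ by RAFT version
#include <rmm/device_uvector.hpp>      // rmm::device_uvector

// Assumes `stream` is a valid CUDA stream (cudaStream_t) created by the caller.

// Access the host-side dataset
const auto& X = MLCommon::Datasets::Boston::boston;    // row-major, 506 * 13 values
const auto& y = MLCommon::Datasets::Boston::boston_y;  // 506 target values
int n = MLCommon::Datasets::Boston::n_samples;         // 506
int p = MLCommon::Datasets::Boston::n_features;        // 13

// Allocate device buffers and enqueue the host-to-device copies on `stream`
rmm::device_uvector<float> d_X(n * p, stream);
rmm::device_uvector<float> d_y(n, stream);
raft::update_device(d_X.data(), X.data(), n * p, stream);
raft::update_device(d_y.data(), y.data(), n, stream);

// Work that consumes d_X and d_y should run on `stream` or synchronize with it first.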
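
The feature vector is stored in row-major order, so a routine that expects column-major input needs the data transposed before the copy. Whether a given routine expects column-major data is an assumption to check against that routine's documentation. The sketch below uses a hypothetical host-side helper, to_col_major, which is not part of boston.h, RAFT, or RMM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: convert a flattened row-major matrix of shape
// (n_rows, n_cols) into a flattened column-major matrix of the same shape.
inline std::vector<float> to_col_major(const std::vector<float>& row_major,
                                       std::size_t n_rows,
                                       std::size_t n_cols)
{
  assert(row_major.size() == n_rows * n_cols);
  std::vector<float> col_major(row_major.size());
  for (std::size_t i = 0; i < n_rows; ++i) {
    for (std::size_t j = 0; j < n_cols; ++j) {
      // Element (i, j): offset i * n_cols + j in row-major order,
      // offset j * n_rows + i in column-major order.
      col_major[j * n_rows + i] = row_major[i * n_cols + j];
    }
  }
  return col_major;
}
```

The converted vector can be copied to the device in the same way as X above, passing its data() pointer and the unchanged element count n * p.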
