

Implementation:Rapidsai Cuml Breast Cancer Dataset

From Leeroopedia
Revision as of 16:26, 16 February 2026 by Admin (Auto-imported from implementations/Rapidsai_Cuml_Breast_Cancer_Dataset.md)


Knowledge Sources
Domains: Machine_Learning, Datasets
Last Updated: 2026-02-08 12:00 GMT

Overview

A hardcoded C++ header containing the Wisconsin Breast Cancer diagnostic dataset (569 samples, 30 features) as static constant vectors for use in cuML unit tests and benchmarks.

Description

breast_cancer.h embeds the Wisconsin Diagnostic Breast Cancer (WDBC) dataset directly in the source as constant data. The dataset contains 569 observations of 30 numeric features computed from digitized images of fine needle aspirates of breast masses. The features describe the cell nuclei through 10 measurements (radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension), each summarized three ways: mean, standard error, and "worst" (the mean of the three largest values), giving 10 × 3 = 30 features.
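Because the 30 columns come from 10 measurements times 3 statistics, a column index can be decoded into a feature name. The sketch below assumes, without confirmation from the header, that the columns follow scikit-learn's load_breast_cancer order: the 10 means, then the 10 standard errors, then the 10 worst values. feature_name is a hypothetical helper, not part of cuML.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: maps a column index (0..29) of the breast_cancer matrix
// to a readable name. ASSUMES scikit-learn column order and naming:
// "mean <m>" for columns 0-9, "<m> error" for 10-19, "worst <m>" for 20-29.
std::string feature_name(std::size_t column)
{
  static const std::array<const char*, 10> measurements = {
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry", "fractal dimension"};

  const std::size_t stat = column / measurements.size();  // 0 = mean, 1 = error, 2 = worst
  const std::size_t meas = column % measurements.size();  // which nucleus measurement
  const std::string m    = measurements[meas];

  switch (stat) {
    case 0: return "mean " + m;
    case 1: return m + " error";
    case 2: return "worst " + m;
    default: return "";  // column >= 30 is out of range
  }
}
```

Decoding names this way is only useful for logging or debugging test failures; the header itself stores no feature names.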

The data is stored in two std::vector<float> constants within the MLCommon::Datasets::BreastCancer namespace:

  • breast_cancer -- A flattened vector of shape 569 x 30 containing the feature matrix in row-major order.
  • breast_cancer_y -- A vector of 569 binary target values (0 = malignant, 1 = benign).

Two additional constants provide the dataset dimensions:

  • n_samples = 569
  • n_features = 30
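Since breast_cancer is flattened in row-major order, element (i, j) sits at index i * n_features + j. The hypothetical helpers below (element_at and check_shapes, neither part of the header) sketch that indexing and a shape check a test might run once before using the data.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reads element (row, col) of a flattened row-major matrix
// such as breast_cancer, where n_cols would be n_features (30).
inline float element_at(const std::vector<float>& X, std::size_t n_cols,
                        std::size_t row, std::size_t col)
{
  return X[row * n_cols + col];  // row-major: the columns of one row are contiguous
}

// Hypothetical sanity check: the flattened matrix must hold exactly
// n_rows * n_cols values and the label vector exactly n_rows values.
inline void check_shapes(const std::vector<float>& X, const std::vector<float>& y,
                         std::size_t n_rows, std::size_t n_cols)
{
  if (X.size() != n_rows * n_cols) throw std::runtime_error("feature matrix has wrong size");
  if (y.size() != n_rows) throw std::runtime_error("label vector has wrong size");
}
```

With the real header, a test could call check_shapes(breast_cancer, breast_cancer_y, n_samples, n_features) and then read element_at(breast_cancer, n_features, i, j).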

Usage

Use this dataset to unit-test binary classifiers (e.g., SVM, logistic regression, random forest) and for quick benchmarks that need a well-known classification dataset without reading any files from disk.
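When unit-testing a classifier on this dataset, a test typically compares predictions with breast_cancer_y. The sketch below is a hypothetical host-side accuracy helper (binary_accuracy is not part of cuML); it assumes predictions have been copied back to the host as float 0/1 values, like the labels.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical test helper: fraction of predictions that match the 0/1 labels.
// Both vectors hold float labels, so each value is thresholded at 0.5 before
// comparing classes.
inline double binary_accuracy(const std::vector<float>& y_true, const std::vector<float>& y_pred)
{
  if (y_true.empty() || y_true.size() != y_pred.size())
    throw std::invalid_argument("label vectors must be non-empty and of equal length");

  std::size_t correct = 0;
  for (std::size_t i = 0; i < y_true.size(); ++i) {
    const int truth = y_true[i] >= 0.5f ? 1 : 0;
    const int pred  = y_pred[i] >= 0.5f ? 1 : 0;
    if (truth == pred) ++correct;
  }
  return static_cast<double>(correct) / static_cast<double>(y_true.size());
}
```

A test would then assert that the accuracy exceeds a threshold chosen for the algorithm under test; no particular threshold is implied here.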

Code Reference

Source Location

  • Repository: Rapidsai_Cuml
  • File: cpp/src_prims/datasets/breast_cancer.h

Signature

#include <vector>

namespace MLCommon {
namespace Datasets {
namespace BreastCancer {

const std::vector<float> breast_cancer = { /* 569 * 30 = 17070 float values */ };
const std::vector<float> breast_cancer_y = { /* 569 float values */ };

static const int n_samples  = 569;
static const int n_features = 30;

} // namespace BreastCancer
} // namespace Datasets
} // namespace MLCommon

Import

#include <datasets/breast_cancer.h>

I/O Contract

Inputs

Name     Type  Required  Description
(none)   --    --        This is a static data header with no runtime inputs.

Outputs

Name             Type                      Description
breast_cancer    const std::vector<float>  Flattened row-major feature matrix of shape (569, 30)
breast_cancer_y  const std::vector<float>  Binary target vector of length 569 (0 = malignant, 1 = benign)
n_samples        const int                 Number of samples (569)
n_features       const int                 Number of features (30)
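breast_cancer_y stores its labels as float. If an estimator under test expects integer class ids (an assumption to verify against that estimator's API), the labels can be converted on the host first. to_int_labels below is a hypothetical helper, not part of the header.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper: converts float 0/1 labels (as in breast_cancer_y) into
// integer class ids. Any value other than exactly 0 or 1 is rejected, since it
// would indicate corrupted or mismatched data.
inline std::vector<int> to_int_labels(const std::vector<float>& y)
{
  std::vector<int> out;
  out.reserve(y.size());
  for (float v : y) {
    if (v != 0.0f && v != 1.0f) throw std::invalid_argument("label is not 0 or 1");
    out.push_back(static_cast<int>(v));
  }
  return out;
}
```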

Usage Examples

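#include <datasets/breast_cancer.h>

#include <raft/core/handle.hpp>
#include <raft/util/cudart_utils.hpp>
#include <rmm/device_uvector.hpp>

int main()
{
  namespace bc = MLCommon::Datasets::BreastCancer;

  raft::handle_t handle;
  auto stream = handle.get_stream();

  // Host-side references to the embedded dataset
  const auto& X = bc::breast_cancer;    // 569 * 30 features, row-major
  const auto& y = bc::breast_cancer_y;  // 569 labels (0 = malignant, 1 = benign)
  const int n   = bc::n_samples;        // 569
  const int p   = bc::n_features;       // 30

  // Copy to device memory, e.g. for SVM training on the same stream
  rmm::device_uvector<float> d_X(n * p, stream);
  rmm::device_uvector<float> d_y(n, stream);
  raft::update_device(d_X.data(), X.data(), n * p, stream);
  raft::update_device(d_y.data(), y.data(), n, stream);

  handle.sync_stream();  // wait for the copies to finish
  return 0;
}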
