Implementation:Rapidsai Cuml Digits Dataset

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Datasets
Last Updated 2026-02-08 12:00 GMT

Overview

A hardcoded C++ header containing the Digits handwritten digit recognition dataset (1797 samples, 64 features) as static constant vectors for use in cuML unit tests and benchmarks.

Description

digits.h provides a copy of the scikit-learn Digits dataset (based on the UCI ML hand-written digits dataset), embedded directly in the header as constant data. The dataset contains 1797 samples, each an 8x8 pixel image of a handwritten digit (0-9) with every pixel intensity stored as a float. Each image is flattened into a 64-dimensional feature vector.

The data is stored in two std::vector<float> constants within the MLCommon::Datasets::Digits namespace:

  • digits -- A flattened vector of shape 1797 x 64 containing the feature matrix in row-major order. Pixel values range from 0.0 to 16.0.
  • digits_y -- A vector of 1797 integer target values (0 through 9) stored as floats.

Two additional constants provide the dataset dimensions:

  • n_samples = 1797
  • n_features = 64
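Because the feature matrix is flattened in row-major order, feature j of sample i sits at offset i * n_features + j. The sketch below illustrates that indexing. The helpers feature_at and pixel_at are hypothetical names, not part of digits.h, and pixel_at assumes the 64 features are the rows of the 8x8 image laid end to end.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of digits.h): value of feature `j` for
// sample `i` in a row-major flattened matrix with `n_features` columns.
inline float feature_at(const std::vector<float>& data, int n_features, int i, int j)
{
  assert(i >= 0 && j >= 0 && j < n_features);
  const std::size_t idx = static_cast<std::size_t>(i) * n_features + j;
  assert(idx < data.size());
  return data[idx];
}

// Hypothetical helper: pixel (row, col) of the 8x8 image for sample `i`.
// Assumes the 64 features are the image rows laid end to end.
inline float pixel_at(const std::vector<float>& data, int i, int row, int col)
{
  constexpr int kSide = 8;
  assert(row >= 0 && row < kSide && col >= 0 && col < kSide);
  return feature_at(data, kSide * kSide, i, row * kSide + col);
}
```

With the real header included, a call such as feature_at(MLCommon::Datasets::Digits::digits, MLCommon::Datasets::Digits::n_features, i, j) would read one value from the embedded matrix.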

Usage

Use this dataset for unit testing multi-class classification algorithms (e.g., SVM, k-NN, random forest), dimensionality reduction methods (e.g., PCA, t-SNE, UMAP), and clustering algorithms. The 64-dimensional feature space makes it suitable for testing algorithms that operate on moderate-dimensional data.
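Some routines may expect column-major input rather than the row-major layout used by this header. Which layout a given algorithm expects is not stated here, so treat the need for conversion as an assumption to verify against the API under test. A minimal host-side sketch of the conversion, using a hypothetical helper to_col_major:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of digits.h): convert a row-major
// n_rows x n_cols matrix into column-major order on the host.
inline std::vector<float> to_col_major(const std::vector<float>& row_major,
                                       int n_rows,
                                       int n_cols)
{
  assert(row_major.size() == static_cast<std::size_t>(n_rows) * n_cols);
  std::vector<float> out(row_major.size());
  for (int i = 0; i < n_rows; ++i) {
    for (int j = 0; j < n_cols; ++j) {
      // Element (i, j): row-major offset i * n_cols + j,
      // column-major offset j * n_rows + i.
      out[static_cast<std::size_t>(j) * n_rows + i] =
        row_major[static_cast<std::size_t>(i) * n_cols + j];
    }
  }
  return out;
}
```

For the Digits data the call would take n_rows = n_samples and n_cols = n_features. The result can then be copied to the device in place of the original vector.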

Code Reference

Source Location

Signature

namespace MLCommon {
namespace Datasets {
namespace Digits {

const std::vector<float> digits = { /* 1797 * 64 = 115008 float values */ };
const std::vector<float> digits_y = { /* 1797 float values */ };

static const int n_samples  = 1797;
static const int n_features = 64;

} // namespace Digits
} // namespace Datasets
} // namespace MLCommon

Import

#include <datasets/digits.h>

I/O Contract

Inputs

Name | Type | Required | Description
(none) | -- | -- | This is a static data header with no runtime inputs.

Outputs

Name | Type | Description
digits | const std::vector<float> | Flattened feature matrix of shape (1797, 64), pixel values 0-16
digits_y | const std::vector<float> | Multi-class target vector of length 1797 (values 0-9)
n_samples | int | Number of samples (1797)
n_features | int | Number of features (64)
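Since digits_y holds the class labels as floats, code that needs integer labels has to convert them first. The following is a minimal sketch under that assumption. labels_to_int and class_counts are hypothetical helpers, not part of digits.h.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of digits.h): convert float-encoded class
// labels to int, rounding to the nearest integer.
inline std::vector<int> labels_to_int(const std::vector<float>& y)
{
  std::vector<int> out;
  out.reserve(y.size());
  for (float v : y) {
    const int label = static_cast<int>(std::lround(v));
    assert(label >= 0 && label <= 9);  // Digits has ten classes, 0 through 9
    out.push_back(label);
  }
  return out;
}

// Hypothetical helper: number of samples per class, useful as a sanity
// check before running a classifier test.
inline std::array<int, 10> class_counts(const std::vector<int>& labels)
{
  std::array<int, 10> counts{};  // zero-initialized
  for (int label : labels) { ++counts[label]; }
  return counts;
}
```

Passing MLCommon::Datasets::Digits::digits_y to labels_to_int would yield a vector of n_samples integer labels.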

Usage Examples

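#include <datasets/digits.h>

#include <raft/util/cudart_utils.hpp>  // raft::update_device (header path may vary by RAFT version)
#include <rmm/device_uvector.hpp>      // rmm::device_uvector

// `stream` is assumed to be a CUDA stream supplied by the surrounding test,
// for example one obtained from a RAFT handle.

// Access the dataset
const auto& X = MLCommon::Datasets::Digits::digits;
const auto& y = MLCommon::Datasets::Digits::digits_y;
int n = MLCommon::Datasets::Digits::n_samples;   // 1797
int p = MLCommon::Datasets::Digits::n_features;  // 64

// Allocate device buffers and copy features (n * p floats) and labels (n floats)
rmm::device_uvector<float> d_X(n * p, stream);
rmm::device_uvector<float> d_y(n, stream);
raft::update_device(d_X.data(), X.data(), n * p, stream);
raft::update_device(d_y.data(), y.data(), n, stream);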
