Implementation:Rapidsai Cuml Make Blobs

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, Synthetic_Data_Generation
Last Updated: 2026-02-08 12:00 GMT

Overview

Generates synthetic clustered datasets on the GPU, equivalent to scikit-learn's sklearn.datasets.make_blobs.

Description

The ML::Datasets::make_blobs function creates isotropic Gaussian blobs for clustering benchmarks and testing. It generates a feature matrix and corresponding label array on-device. Callers can control the number of samples, features, and clusters, optionally providing pre-defined cluster centers and per-cluster standard deviations. The function supports both row-major and column-major output layouts, shuffling of data and labels, and configurable bounding boxes for randomly generated cluster centers.
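To make the generation model concrete, the following is a minimal host-side sketch of isotropic Gaussian blobs: centers are drawn uniformly from the bounding box, and each sample is its cluster center plus Gaussian noise. It is not the cuML implementation. The names HostBlobs and make_blobs_host are hypothetical, the round-robin label assignment is an assumption, and shuffling, per-cluster standard deviations, user-supplied centers, and column-major output are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Host-side illustration only; this is NOT the cuML implementation.
struct HostBlobs {
    std::vector<float> data;      // row-major [n_rows x n_cols]
    std::vector<int64_t> labels;  // [n_rows]
    std::vector<float> centers;   // row-major [n_clusters x n_cols]
};

// Hypothetical helper: draws centers uniformly from the bounding box, then
// adds isotropic Gaussian noise around the center of each sample's cluster.
inline HostBlobs make_blobs_host(int64_t n_rows, int64_t n_cols, int64_t n_clusters,
                                 float cluster_std    = 1.f,
                                 float center_box_min = -10.f,
                                 float center_box_max = 10.f,
                                 uint64_t seed        = 0ULL)
{
    assert(n_rows > 0 && n_cols > 0 && n_clusters > 0);
    assert(cluster_std > 0.f && center_box_min < center_box_max);

    std::mt19937_64 rng(seed);
    std::uniform_real_distribution<float> center_dist(center_box_min, center_box_max);
    std::normal_distribution<float> noise(0.f, cluster_std);

    HostBlobs blobs;
    blobs.centers.resize(static_cast<std::size_t>(n_clusters * n_cols));
    for (float& c : blobs.centers) { c = center_dist(rng); }

    blobs.data.resize(static_cast<std::size_t>(n_rows * n_cols));
    blobs.labels.resize(static_cast<std::size_t>(n_rows));
    for (int64_t i = 0; i < n_rows; ++i) {
        const int64_t k = i % n_clusters;  // assumption: round-robin cluster assignment
        blobs.labels[static_cast<std::size_t>(i)] = k;
        for (int64_t j = 0; j < n_cols; ++j) {
            // Same standard deviation in every dimension, hence "isotropic".
            blobs.data[static_cast<std::size_t>(i * n_cols + j)] =
                blobs.centers[static_cast<std::size_t>(k * n_cols + j)] + noise(rng);
        }
    }
    return blobs;
}
```

A call such as make_blobs_host(1000, 2, 5, 1.f, -10.f, 10.f, 42ULL) returns host vectors with the same shapes that the GPU function writes to device memory. The random streams differ, so the values will not match cuML's output for the same seed.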

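Four overloads cover every combination of single or double precision (float, double) with 32-bit or 64-bit integer types (int, int64_t). In each overload the label pointer and the size parameters (n_rows, n_cols, n_clusters) share the same integer type, so callers can match the index type used by downstream code.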
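One practical way to choose between the int and int64_t overloads is to check whether every flat index into the feature matrix fits in a 32-bit int. The helper below is a hypothetical sketch, not part of cuML, and it assumes the caller wants n_rows * n_cols to be representable in the chosen index type.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of cuML): returns true when every flat index
// into an [n_rows x n_cols] matrix is representable as a 32-bit int.
inline bool fits_int_index(int64_t n_rows, int64_t n_cols)
{
    assert(n_rows >= 0 && n_cols >= 0);
    if (n_rows == 0 || n_cols == 0) { return true; }
    const int64_t int_max = std::numeric_limits<int>::max();
    // Divide instead of multiplying so the check itself cannot overflow.
    return n_rows <= int_max / n_cols;
}
```

When fits_int_index returns false for the intended shape, allocating the labels as int64_t and passing int64_t sizes selects the 64-bit overloads.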

Usage

Use this function to generate synthetic clustered data on the GPU for testing clustering algorithms (K-Means, DBSCAN, etc.), benchmarking, or prototyping. Because the feature matrix and labels are written directly to device memory, CUDA-based workflows avoid the host-side generation and host-to-device copy that scikit-learn's make_blobs would require.

Code Reference

Source Location

  • Repository: Rapidsai_Cuml
  • File: cpp/include/cuml/datasets/make_blobs.hpp

Signature

namespace ML {
namespace Datasets {

void make_blobs(const raft::handle_t& handle,
                float* out,
                int64_t* labels,
                int64_t n_rows,
                int64_t n_cols,
                int64_t n_clusters,
                bool row_major                 = true,
                const float* centers           = nullptr,
                const float* cluster_std       = nullptr,
                const float cluster_std_scalar = 1.f,
                bool shuffle                   = true,
                float center_box_min           = -10.f,
                float center_box_max           = 10.f,
                uint64_t seed                  = 0ULL);

void make_blobs(const raft::handle_t& handle,
                double* out,
                int64_t* labels,
                int64_t n_rows,
                int64_t n_cols,
                int64_t n_clusters,
                bool row_major                  = true,
                const double* centers           = nullptr,
                const double* cluster_std       = nullptr,
                const double cluster_std_scalar = 1.0,
                bool shuffle                    = true,
                double center_box_min           = -10.0,
                double center_box_max           = 10.0,
                uint64_t seed                   = 0ULL);

void make_blobs(const raft::handle_t& handle,
                float* out,
                int* labels,
                int n_rows,
                int n_cols,
                int n_clusters,
                bool row_major                 = true,
                const float* centers           = nullptr,
                const float* cluster_std       = nullptr,
                const float cluster_std_scalar = 1.f,
                bool shuffle                   = true,
                float center_box_min           = -10.f,
                float center_box_max           = 10.0,
                uint64_t seed                  = 0ULL);

void make_blobs(const raft::handle_t& handle,
                double* out,
                int* labels,
                int n_rows,
                int n_cols,
                int n_clusters,
                bool row_major                  = true,
                const double* centers           = nullptr,
                const double* cluster_std       = nullptr,
                const double cluster_std_scalar = 1.0,
                bool shuffle                    = true,
                double center_box_min           = -10.0,
                double center_box_max           = 10.0,
                uint64_t seed                   = 0ULL);

}  // namespace Datasets
}  // namespace ML

Import

#include <cuml/datasets/make_blobs.hpp>

I/O Contract

Inputs

Name | Type | Required | Description
handle | const raft::handle_t& | Yes | RAFT handle used for GPU resource management
n_rows | int64_t / int | Yes | Number of data samples to generate
n_cols | int64_t / int | Yes | Number of features per sample
n_clusters | int64_t / int | Yes | Number of clusters (classes) to generate
row_major | bool | No (default true) | Whether the output matrix is stored in row-major layout; false selects column-major
centers | const float* / const double* | No (default nullptr) | Pre-defined cluster centers on device [n_clusters x n_cols]; nullptr for random generation
cluster_std | const float* / const double* | No (default nullptr) | Per-cluster standard deviations on device [n_clusters]; nullptr to use cluster_std_scalar
cluster_std_scalar | float / double | No (default 1.0) | Standard deviation shared by all clusters (used when cluster_std is nullptr)
shuffle | bool | No (default true) | Whether to shuffle the generated data and labels
center_box_min | float / double | No (default -10.0) | Minimum value for randomly generated cluster centers
center_box_max | float / double | No (default 10.0) | Maximum value for randomly generated cluster centers
seed | uint64_t | No (default 0) | Seed for the random number generator
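The row_major flag changes where element (row, col) lives in the flat output buffer. The sketch below shows the two standard index formulas, assuming a dense matrix with no padding between rows or columns. The function flat_index is a hypothetical helper for illustration, not a cuML API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of cuML): flat offset of element (row, col)
// in a dense [n_rows x n_cols] matrix with no padding.
inline std::size_t flat_index(int64_t row, int64_t col,
                              int64_t n_rows, int64_t n_cols,
                              bool row_major)
{
    assert(row >= 0 && row < n_rows && col >= 0 && col < n_cols);
    // Row-major: consecutive features of one sample are adjacent.
    // Column-major: consecutive samples of one feature are adjacent.
    const int64_t idx = row_major ? row * n_cols + col : col * n_rows + row;
    return static_cast<std::size_t>(idx);
}
```

For a 3 x 2 matrix, element (1, 0) sits at offset 2 in row-major layout and at offset 1 in column-major layout. Code that reads a host copy of the output needs to use the formula matching the row_major value passed to make_blobs.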

Outputs

Name | Type | Description
out | float* / double* | Caller-allocated device pointer that receives the generated feature matrix [n_rows x n_cols]
labels | int64_t* / int* | Caller-allocated device pointer that receives the generated label vector [n_rows]
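After copying the labels back to the host, a quick sanity check is to count the samples per cluster. The sketch below works on a host-side std::vector and assumes labels are zero-based cluster indices, as in scikit-learn's make_blobs. The name cluster_sizes is hypothetical, and the device-to-host copy is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of cuML): counts how many samples carry each
// label. Works for both label types used by the overloads (int and int64_t).
template <typename LabelT>
std::vector<std::size_t> cluster_sizes(const std::vector<LabelT>& host_labels,
                                       std::size_t n_clusters)
{
    std::vector<std::size_t> counts(n_clusters, 0);
    for (LabelT label : host_labels) {
        // Assumption: labels are zero-based and strictly below n_clusters.
        assert(label >= 0 && static_cast<std::size_t>(label) < n_clusters);
        ++counts[static_cast<std::size_t>(label)];
    }
    return counts;
}
```

The counts sum to n_rows, and an empty bucket reveals a cluster that received no samples.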

Usage Examples

#include <cuml/datasets/make_blobs.hpp>
#include <raft/core/handle.hpp>

#include <cuda_runtime.h>  // cudaMalloc, cudaFree

#include <cstdint>  // int64_t

void generate_clustering_data() {
    raft::handle_t handle;

    int64_t n_rows = 1000;
    int64_t n_cols = 2;
    int64_t n_clusters = 5;

    // Allocate device memory
    float* data;
    int64_t* labels;
    cudaMalloc(&data, n_rows * n_cols * sizeof(float));
    cudaMalloc(&labels, n_rows * sizeof(int64_t));

    // Generate 5-cluster blob dataset
    ML::Datasets::make_blobs(handle, data, labels,
                             n_rows, n_cols, n_clusters,
                             true,      // row_major
                             nullptr,   // centers (random)
                             nullptr,   // cluster_std (use scalar)
                             1.0f,      // cluster_std_scalar
                             true,      // shuffle
                             -10.0f,    // center_box_min
                             10.0f,     // center_box_max
                             42ULL);    // seed

    // Block until the work queued on the handle's stream has finished
    handle.sync_stream();

    // Use data and labels for clustering experiments...

    cudaFree(data);
    cudaFree(labels);
}
