Implementation:Rapidsai Cuml KMeans DBSCAN HDBSCAN Init

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, Clustering
Last Updated: 2026-02-08 00:00 GMT

Overview

A concrete reference for configuring cuML's GPU-accelerated KMeans, DBSCAN, and HDBSCAN clustering estimators through their constructor parameters.

Description

These constructors initialize the three primary clustering estimators in cuML:

  • KMeans.__init__ configures the number of clusters, initialization method (scalable-k-means++, random), convergence tolerance, and batch processing parameters.
  • DBSCAN.__init__ configures the neighborhood distance epsilon, minimum samples for core points, distance metric, and memory batch budget.
  • HDBSCAN.__init__ configures the minimum cluster size, cluster selection method (EOM vs leaf), KNN build algorithm, and prediction data generation.

Usage

Import and instantiate these classes to create clustering estimators. Configure hyperparameters based on the dataset size, expected cluster count, and noise characteristics.
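
As a rough illustration of that choice, the sketch below maps dataset traits to an estimator name and constructor keyword arguments. The function suggest_estimator and its decision rules are hypothetical and not part of cuML; only the parameter names and default values are taken from the signatures documented on this page.

```python
def suggest_estimator(n_clusters=None, noisy=False, variable_density=False):
    """Hypothetical rule of thumb; returns (estimator name, constructor kwargs)."""
    if variable_density:
        # HDBSCAN is the option for clusters of differing density.
        return "HDBSCAN", {"min_cluster_size": 5, "cluster_selection_method": "eom"}
    if noisy or n_clusters is None:
        # DBSCAN discovers clusters by density and needs no cluster count.
        return "DBSCAN", {"eps": 0.5, "min_samples": 5}
    # KMeans suits a known cluster count.
    return "KMeans", {"n_clusters": n_clusters}


name, kwargs = suggest_estimator(n_clusters=5)
print(name, kwargs)
```

The returned keyword arguments can be passed to the matching class, for example KMeans(**kwargs).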

Code Reference

KMeans.__init__

Source Location

  • Repository: cuML
  • File: python/cuml/cuml/cluster/kmeans.pyx
  • Lines: 482-504

Signature

def __init__(
    self,
    *,
    n_clusters=8,
    max_iter=300,
    tol=1e-4,
    verbose=False,
    random_state=None,
    init='scalable-k-means++',
    n_init='auto',
    oversampling_factor=2.0,
    max_samples_per_batch=32768,
    output_type=None,
):

Import

from cuml import KMeans
# or
from cuml.cluster import KMeans

DBSCAN.__init__

Source Location

  • Repository: cuML
  • File: python/cuml/cuml/cluster/dbscan.pyx
  • Lines: 281-299

Signature

def __init__(
    self,
    *,
    eps=0.5,
    min_samples=5,
    metric='euclidean',
    algorithm='brute',
    verbose=False,
    max_mbytes_per_batch=None,
    output_type=None,
    calc_core_sample_indices=True,
):

Import

from cuml import DBSCAN
# or
from cuml.cluster import DBSCAN

HDBSCAN.__init__

Source Location

  • Repository: cuML
  • File: python/cuml/cuml/cluster/hdbscan/hdbscan.pyx
  • Lines: 802-836

Signature

def __init__(
    self,
    *,
    min_cluster_size=5,
    min_samples=None,
    cluster_selection_epsilon=0.0,
    max_cluster_size=0,
    metric='euclidean',
    alpha=1.0,
    p=None,
    cluster_selection_method='eom',
    allow_single_cluster=False,
    gen_min_span_tree=False,
    verbose=False,
    output_type=None,
    prediction_data=False,
    build_algo='brute_force',
    build_kwds=None,
    device_ids=None,
):

Import

from cuml import HDBSCAN
# or
from cuml.cluster import HDBSCAN

I/O Contract

KMeans Inputs

Name                   Type        Required                           Description
n_clusters             int         No (default 8)                     Number of clusters to form.
max_iter               int         No (default 300)                   Maximum number of Lloyd iterations.
tol                    float       No (default 1e-4)                  Convergence threshold on center shift.
init                   str         No (default 'scalable-k-means++')  Initialization: 'scalable-k-means++', 'k-means||', 'k-means++', or 'random'.
n_init                 int or str  No (default 'auto')                Number of initializations to run.
oversampling_factor    float       No (default 2.0)                   Factor for scalable k-means++ oversampling.
max_samples_per_batch  int         No (default 32768)                 Samples per distance computation batch.
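
To make the roles of max_iter and tol concrete, here is a toy one-dimensional Lloyd loop in plain Python. It is a sketch only: the function lloyd_1d is hypothetical, and the stopping rule (largest center shift below tol) is an assumption about how such a threshold is typically applied, not a description of cuML's internal criterion.

```python
def lloyd_1d(points, centers, max_iter=300, tol=1e-4):
    """Toy 1-D Lloyd loop; returns (centers, iterations run)."""
    centers = list(centers)
    for n_iter in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            groups[nearest].append(x)
        # Update step: move each center to the mean of its group.
        new_centers = [
            sum(g) / len(g) if g else c for g, c in zip(groups, centers)
        ]
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:  # assumed stopping rule: largest center shift below tol
            break
    return centers, n_iter


centers, n_iter = lloyd_1d([0.0, 0.2, 0.4, 10.0, 10.2, 10.4], centers=[0.0, 10.0])
print(centers, n_iter)
```

On this data the loop stops after two iterations: the first moves the centers to roughly 0.2 and 10.2, and the second produces no further shift. A larger tol stops earlier at the cost of less precise centers, while max_iter caps the work when the shift never drops below tol.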

DBSCAN Inputs

Name                  Type           Required                  Description
eps                   float          No (default 0.5)          Neighborhood radius: the maximum distance between two samples for them to be considered neighbors.
min_samples           int            No (default 5)            Minimum number of neighbors for a core sample.
metric                str            No (default 'euclidean')  Distance metric: 'euclidean', 'cosine', or 'precomputed'.
max_mbytes_per_batch  float or None  No (default None)         Memory budget in MB per batch for distance computation. None uses all available GPU memory.
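
The interaction between eps and min_samples can be sketched with a brute-force core-sample test in plain Python. The function core_sample_indices below is hypothetical, and it assumes that a point counts as its own neighbor; it illustrates the definition and says nothing about how cuML batches the distance computation on the GPU.

```python
import math


def core_sample_indices(points, eps=0.5, min_samples=5):
    """Brute-force core-point test; the point itself counts as a neighbor (assumption)."""
    core = []
    for i, p in enumerate(points):
        # Count every point within distance eps of p, including p itself.
        neighbors = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbors >= min_samples:
            core.append(i)
    return core


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(core_sample_indices(points, eps=0.3, min_samples=4))  # [0, 1, 2, 3]
```

The isolated point at (5.0, 5.0) has no neighbor within eps, so it is not a core sample. Raising min_samples to 5 leaves no core samples at all in this toy dataset.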

HDBSCAN Inputs

Name                      Type  Required                    Description
min_cluster_size          int   No (default 5)              Minimum number of samples in a cluster.
cluster_selection_method  str   No (default 'eom')          Cluster extraction: 'eom' (Excess of Mass) or 'leaf'.
build_algo                str   No (default 'brute_force')  KNN graph construction: 'brute_force' or 'nn_descent'.
prediction_data           bool  No (default False)          If True, caches data needed for approximate_predict.
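
Because two of these parameters accept only a fixed set of strings, it can help to check them before constructing the estimator. The helper hdbscan_kwargs below is hypothetical and not part of cuML; it only checks values against the options listed in the HDBSCAN Inputs table, and the import is guarded so the snippet also runs where cuML is unavailable.

```python
ALLOWED = {
    "cluster_selection_method": {"eom", "leaf"},
    "build_algo": {"brute_force", "nn_descent"},
}


def hdbscan_kwargs(**kwargs):
    """Hypothetical helper: reject option values not listed in the inputs table."""
    for name, allowed in ALLOWED.items():
        if name in kwargs and kwargs[name] not in allowed:
            raise ValueError(f"{name}={kwargs[name]!r} not in {sorted(allowed)}")
    return kwargs


params = hdbscan_kwargs(min_cluster_size=15, cluster_selection_method="leaf")

try:
    from cuml.cluster import HDBSCAN  # requires a RAPIDS install and a CUDA GPU

    model = HDBSCAN(**params)
except Exception:  # cuML missing or no usable GPU
    model = None  # params can still be stored and applied later
```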

Outputs

Name              Type     Description
KMeans instance   KMeans   Configured KMeans estimator ready for fitting.
DBSCAN instance   DBSCAN   Configured DBSCAN estimator ready for fitting.
HDBSCAN instance  HDBSCAN  Configured HDBSCAN estimator ready for fitting.

Usage Examples

from cuml.cluster import KMeans, DBSCAN, HDBSCAN

# KMeans for known cluster count
kmeans = KMeans(n_clusters=5, max_iter=500, init='scalable-k-means++')

# DBSCAN for density-based discovery
dbscan = DBSCAN(eps=0.3, min_samples=10, metric='euclidean')

# HDBSCAN for variable-density clusters
hdbscan = HDBSCAN(min_cluster_size=15, cluster_selection_method='eom', prediction_data=True)
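
The constructors above only store configuration; clustering happens when the estimator is fitted. The following sketch, which assumes a working RAPIDS install with a CUDA GPU, fits each configured estimator on synthetic data via fit_predict. The data, the configs dictionary, and the guarded import are illustrative choices, not part of the original page.

```python
configs = {
    "KMeans": {"n_clusters": 5, "max_iter": 500},
    "DBSCAN": {"eps": 0.3, "min_samples": 10},
    "HDBSCAN": {"min_cluster_size": 15},
}
labels = {}

try:
    import numpy as np
    import cuml.cluster as cluster  # requires a RAPIDS install and a CUDA GPU

    # Synthetic float32 data, used only for illustration.
    X = np.random.default_rng(0).normal(size=(1000, 2)).astype(np.float32)
    for name, kwargs in configs.items():
        estimator = getattr(cluster, name)(**kwargs)
        # DBSCAN and HDBSCAN label noise points as -1.
        labels[name] = estimator.fit_predict(X)
except Exception as exc:  # cuML missing or no usable GPU
    print(f"Skipping fit: {exc}")

print(sorted(labels))
```

Where cuML is unavailable, the snippet prints the reason and leaves labels empty rather than failing.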

Related Pages

Implements Principle

Requires Environment
